Archive for the ‘web’ Category

Scalable Counters for Web Applications

Saturday, June 28th, 2008

So you need to provide a count or counter for your web application, but you want it to scale. The naive approach is simply to run select count(*) from table. That will fail under load, because every request has to scan your entire collection.

The first question to ask is: do you need exact counts, or will approximate counts be good enough? I bet in many situations an approximate count will be perfectly reasonable. Think about the use case of tracking web hits. When you’re talking about millions of hits, what is the difference between 1,000,000 and 1,000,001? Of course, only your business expert will know whether approximate or exact answers are required. The decision, though, is crucial, because it’s the difference between an easy implementation and a hard (costly) one.

Let’s say, for the purposes of this article, that you need very close to accurate counts, plus you need to scale a lot. The first step is to pre-calculate the count and cache the result. When a new web hit occurs, grab the current count, add one, and put it back. This approach will scale for a while, but the chance of missing a count goes up as load goes up. Because we’re not explicitly locking the row (locking can be expensive), two requests can read the same value, and the last one to write the record back to the database wins, silently discarding the other’s increment.
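
To make the lost update concrete, here is a deterministic replay of two requests interleaving on a cached count. The dict named cache is a hypothetical stand-in for whatever cache entry or database row holds the counter; a real race depends on timing, but this is the interleaving that loses a hit.

```python
# Hypothetical in-memory stand-in for a cached counter row.
cache = {"hits": 0}

# Requests A and B each "grab the current count" before either writes...
seen_by_a = cache["hits"]  # A reads 0
seen_by_b = cache["hits"]  # B also reads 0

# ...then each adds one and puts it back. The last writer wins.
cache["hits"] = seen_by_a + 1  # A writes 1
cache["hits"] = seen_by_b + 1  # B writes 1 as well, so A's hit is lost

print(cache["hits"])  # 1, although two hits occurred
```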

The next option is to wrap the “grab record, increment, put record back” sequence inside a locking transaction, so that only one writer can access the counter at a time. This guarantees an accurate count, but it will greatly slow down the site as contention around the single counter increases.
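
A minimal sketch of the locking approach, using Python’s built-in sqlite3 module as the datastore. The table and function names are made up for illustration, and SQLite locks the whole database for writing rather than a single row, so treat it as a stand-in; on databases with row-level locks, SELECT ... FOR UPDATE inside a transaction plays the same role.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO counters (name, value) VALUES ('hits', 0)")


def increment(conn: sqlite3.Connection, name: str) -> int:
    """Read, increment and write back the counter inside one locking transaction."""
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so no other
    # connection can write between our SELECT and our UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        (value,) = conn.execute(
            "SELECT value FROM counters WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "UPDATE counters SET value = ? WHERE name = ?", (value + 1, name)
        )
        conn.execute("COMMIT")
        return value + 1
    except Exception:
        conn.execute("ROLLBACK")
        raise


for _ in range(5):
    increment(conn, "hits")
```

Every writer now queues behind the same lock, which is exactly the contention the post warns about.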

The third option, and the best one, is to split the counter up into smaller counters (partitions). When it’s time to get the full, single count, simply read all the counter partitions and add them up. For very high loads, increase the number of partitions. The theory is that adding up 100 partitions on read is quick, while writers now have 100 different counters to lock around instead of one.
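
Here is a rough in-process sketch of a partitioned counter, assuming one threading.Lock per partition as a stand-in for per-record locking in a real datastore. The class name ShardedCounter is hypothetical, and partitions are picked with random.randrange for brevity rather than the hashing approach described next.

```python
import random
import threading


class ShardedCounter:
    """A counter split into partitions, each guarded by its own lock (a sketch)."""

    def __init__(self, num_shards: int = 100) -> None:
        self._values = [0] * num_shards
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def increment(self) -> None:
        # Pick a partition at random; a writer only contends with other
        # writers that happened to pick the same partition.
        i = random.randrange(len(self._values))
        with self._locks[i]:
            self._values[i] += 1

    def total(self) -> int:
        # The full count is the sum of all partitions. Each partition is read
        # under its own lock, so a total taken while writes are in flight may
        # miss increments that land mid-sum.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._values[i]
        return total


def worker(counter: ShardedCounter, n: int) -> None:
    for _ in range(n):
        counter.increment()


counter = ShardedCounter(num_shards=100)
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.total())  # 8000 once all writers have finished
```

Once writes stop, the sum is exact; while they are in flight it is "very close to accurate", which matches the requirement stated above.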

How do you pick which partition to increment? One easy way is to hash the request’s timestamp (or some other part of the request that changes frequently) and take the result modulo the number of partitions in the system. The theory here is that as the number of concurrent requests increases, they spread their load across the partitions.
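
A small sketch of the hash-then-mod selection, assuming nanosecond timestamps and 100 partitions (both hypothetical choices). It uses hashlib rather than Python’s built-in hash(), because hash() on strings is salted per process, so different servers would map the same timestamp to different partitions.

```python
import hashlib
import time

NUM_PARTITIONS = 100  # hypothetical; should match the number of counter partitions


def pick_partition(request_timestamp_ns: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a request timestamp to a partition index in [0, num_partitions)."""
    # Hash first, so timestamps that share a coarse clock resolution (e.g. all
    # multiples of 1000) still spread across partitions instead of clumping.
    digest = hashlib.sha1(str(request_timestamp_ns).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


partition = pick_partition(time.time_ns())
```

Taking the raw timestamp modulo 100 would send every timestamp that is a multiple of 1000 to partition 0; hashing first avoids that kind of clumping.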

In any scalable web system, reads should be by key, and writes are expensive. Do whatever you can to read a single object by key, and minimize your writes. Minimize the contention around objects in the data store, too. Realize that ad hoc queries can almost always be implemented by pre-calculating the answers, so that an ad hoc query becomes simply retrieving a record by key (instead of scanning through all rows and computing the answer as you go).
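
As an illustration of turning an ad hoc query into a read by key, here is a sketch with Python’s sqlite3 module. The hits and page_counts tables are hypothetical; the point is that the count is maintained at write time, so answering it later is a primary-key lookup rather than a scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hits (id INTEGER PRIMARY KEY, page TEXT NOT NULL);
    -- Pre-calculated answer: one row per page, read by primary key.
    CREATE TABLE page_counts (page TEXT PRIMARY KEY, total INTEGER NOT NULL);
    """
)


def record_hit(page: str) -> None:
    # Pay the cost at write time: log the hit and bump the pre-calculated
    # count in one transaction (`with conn` commits on success).
    with conn:
        conn.execute("INSERT INTO hits (page) VALUES (?)", (page,))
        conn.execute(
            "INSERT OR IGNORE INTO page_counts (page, total) VALUES (?, 0)", (page,)
        )
        conn.execute("UPDATE page_counts SET total = total + 1 WHERE page = ?", (page,))


def count_by_scan(page: str) -> int:
    # The ad hoc version: computes the answer by scanning the hits table.
    return conn.execute("SELECT COUNT(*) FROM hits WHERE page = ?", (page,)).fetchone()[0]


def count_by_key(page: str) -> int:
    # The pre-calculated version: a single lookup by primary key.
    row = conn.execute("SELECT total FROM page_counts WHERE page = ?", (page,)).fetchone()
    return row[0] if row else 0


for p in ["/", "/about", "/"]:
    record_hit(p)
print(count_by_key("/"), count_by_scan("/"))  # 2 2
```

The trade-off is an extra write per hit, and a popular page’s row becomes a single contended counter, which is where the partitioned counters come back in.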

For more on this technique, I recommend the excellent video Building Scalable Web Applications with Google App Engine.

Proposed Enhancements For Web Browsers

Tuesday, June 10th, 2008

I was listening to Muxtape (quite possibly the best user interface for a web application). I use Firefox and I have lots of tabs open. Muxtape is playing in the background, in another tab. Sometimes a song comes up on Muxtape that I don’t like and wish I could skip; however, I’m in “the flow” and don’t want to leave my current tab or application.

I’d like to propose an enhancement for web browsers to simplify the interaction with tabs in the background. Web applications should be able to specify a small menu of commands which can be executed from the tab without having to pull that tab into the foreground.

I’d love to be able to right-click on the tab and see options such as “Skip”, “Pause”, “Back”, or “Repeat”. This context menu is specific to each tab, and clicking any of the options would call a JavaScript function.

I envision this being easy to specify as part of HTML 5’s larger effort to address the modern-day requirements of web applications that want to offer a richer experience.

Addressing Doubts about REST

Friday, March 21st, 2008

Thinking about REST, or need to address some lingering concerns about adopting it? I found the article Addressing Doubts about REST full of pragmatic, down-to-earth answers and advice for comparing REST and WS-* (or RPC).

Nothing new, but a solid collection of answers to how you should think about REST and how you should apply it to your system.

I especially appreciated how the author points out that if you are worried about transactions across your systems, that’s probably a design smell, and you want to rethink your approach. Never expect transactions to work reliably across many systems. Instead, build in logic to recover from error states. This is not a REST issue, but a large-system design issue. REST simply makes it easy to pass the state of your resources between systems.

Scaling Web Applications

Thursday, July 26th, 2007

Sam Ruby, via Tim Bray, has collected a list of presentations and documents on scaling web applications. As Tim said, this is “everything anybody knows” on the subject.

I’m interested in large-scale data crunching as we build out our data warehouse. It’s tricky for us, as we have one machine to do all of our data crunching, so we are definitely constrained by I/O. To really solve this issue on a single machine, we need to be smart with our disks and spread the data out to ensure parallel reads.

As I read through these presentations and reports, I’m always trying to map them back down to one machine with maybe four disks and two dual-core processors.

Of course, I can just rent a Hadoop cluster.

Note to Amazon EC2: Install an EC2 instance on the DoD .mil network so we can use it, too!

RESTifying a Real World J2EE Application

Sunday, June 17th, 2007

RESTify DayTrader, in which Joe Gregorio converts a real world J2EE application’s interface into a REST interface.

Perfect example of how to do REST with real life requirements.

A brief history of Consensus, 2PC and Transaction Commit.

Wednesday, June 13th, 2007

A brief history of Consensus, 2PC and Transaction Commit, in which Mark Mc Keown attempts to keep us all in sync with the history of consensus across processes in a distributed system.

Excellent read. Thanks Mark!

Interview and Book Excerpt from RESTful Web Services

Friday, June 1st, 2007

InfoQ has an Interview and Book Excerpt from the book RESTful Web Services, the new book published by O’Reilly. I’ve ordered my copy from Amazon already, and I’m looking forward to reading it.

The interview also links to a sample chapter from the book, titled The Resource Oriented Architecture.

A great quote from the interview, by Sam Ruby:

I also wanted a book that rose above the “we are 733T, WS are the Sux0rs” zealotry that, sadly, one too often hears.

Congrats to Leonard and Sam on their book!

Google Apps to Operate Offline?

Wednesday, May 30th, 2007

By the looks of things at Google, I’d guess that Google Gears is part of Google’s plan to take their Google Apps offline.

Google Gears bundles SQLite, and provides strategies for syncing data back to the web server.

WADL Does Web Application Service Description Restfully

Monday, May 28th, 2007

WADL is the Web Application Description Language, an XML specification for describing RESTful web services. Seeing as the current version of the spec is a mere 31 pages, I am excited to see this move forward. Many people consider the lack of a description specification for REST web services a setback, so a simple spec like WADL can help win over converts. I can see a future version of Rails generating WADL documents natively.

Why Correct Web Architecture Matters

Saturday, May 19th, 2007

Patrick Mueller explores why Twitter is slow. The answer? It turns out that Web Architecture actually matters. If you are writing a Web Application (yes, that’s capital W and A), then you need to understand Web Architecture.

Hats off to Tim Bray for trumpeting Web Architecture in today’s keynote at RailsConf, and specifically the Atom Publishing Protocol as a great example of correct Web Architecture.