Scalable Counters for Web Applications
So you need to provide a count or counter for your web application, but you want it to scale. The naive approach would be to simply select count(*) from table. That will fail under load because it requires scanning your entire collection.
The first question you need to ask is, Do you need exact counts or will approximate counts be good enough? I bet in many situations, an approximate count will be perfectly reasonable. Think about the use case of tracking web hits. When you’re talking about millions of hits, what is the difference between 1,000,000 and 1,000,001? Of course, only your business expert will know if approximate or exact answers are required. The decision, though, is crucial because it’s the difference between an easy implementation and a hard (costly) implementation.
Let’s say, for the purposes of this article, that you’ll need very close to accurate counts, plus you need to scale a lot. The first step is to pre-calculate the count, and cache the results. When a new web hit occurs, grab the current count, add one, and put it back. This approach will scale for a while, but the chance of missing a count goes up as load goes up. Because we’re not explicitly locking on the row (which can be expensive), the last person to write the record back to the database wins.
The next option is to wrap the “grab record, increment, put record back” inside a locking transaction. This will ensure that only one writer can access the counter at a time. This ensures an accurate count, but will greatly slow down the site as contention around the single counter increases.
The third option, and the best option, is to split the counter up into smaller counters. When it’s time to get the full, single count, simply grab all the counter partitions and add them up. For very high loads, increase the number of partitions. The theory is it’s quick to add up 100 partitions, while you’re providing 100 different counters to lock around.
How do you pick which partition to increment? One easy way is to create a hash of the timestamp (or some other part of the request that changes frequently) of the request, and mod it on the number of partitions in the system. The theory here is you’ll be spreading the load across the partitions as the number of concurrent requests increases.
In any scalable web system, reads should be by key and writes are expensive. Do whatever you can to read a single object by a key, and minimize your writes. Minimize the contention around objects in the data store, too. Realize that ad hoc queries can almost always be implemented by pre-calculating the answers, so that an ad hoc query is simply retrieving a record by a key (instead of scanning through all rows, computing the answer as you go.)
For more on this technique, I recommend the excellent video Builing Scalable Web Applications with Google App Engine.
June 29th, 2008 at 3:46 pm
Here’s how I’d tackle this problem…although my mind is thinking in terms of Java solution.
First, you need a single counting entity. Static session bean type thing.
When it is created, it loads an initial count from a persistent store and sets in it in a variable.
When it is asked to get/increment count it would (in a synchronized/locking method)
1. Increment the counter variable
2. Throw a (small) “hit” event onto a message queue
3. Return the new value
Problem 1 solved..how to count fast in a synchronized manor.
Problem 2 is left…persisting the data. There are a few options here. The most simple option would to have a group of message beans read the queue and write the records to a DB.
This method should be able to scale very well. For outputting a new “hit” value, no DB read is required.
A fail-over system could be implemented to handle counts for downtime situations with no data loss.
Thanks for giving my brain a simple yet complex problem to crunch on this evening.
Are you the Seth Ladd from Manlius, NY?
Peter Daly
June 29th, 2008 at 9:24 pm
Hi Peter,
Yes, from Manlius.
Second, I’d caution about the message queue. Where are you going to persist those messages? Are you concerned about possibly losing them?
Third, if you are using Java, you want to look at AtomicLong.
Fourth, if you have a group of message beans (man, that’s old school. you haven’t switched to Spring yet?
are they going to write to the same counter? If so, you have the same contention issue. Or are you going to use partitioned counters?
June 30th, 2008 at 1:27 am
Message queue persistence - depends on the requirements…if minor losses are not a problem, a RAM based queue may be acceptable.
Counter persistence - again, depends on the requirements. Message beans would the easiest, but for better scalability I would implement a custom consumer for the queue that would read the queue and write many hits at once as a single DB transaction. Even without partitioning, that “write many at once” method can be tweaked to persist millions of records an hour based on similar personal experience. Primary bottleneck would be bandwidth to the DB or disk IO of the DB. If this becomes the bottleneck, then sure, partition it.
-Pete