A few weekends ago, I got up at the crack of dawn and headed out to the first (annual, I hope) Ontario Linux Fest. The admission price of $40 clearly signalled that this was a grassroots gathering of Linux hobbyists, but I’m sure many of those in attendance were also professional developers and/or system administrators. Although some of the talks were more show-and-tell than I would have hoped, I had to keep in mind the target audience, and I still learned a few things, particularly regarding the optimization of high-traffic websites – thanks to Khalid Baheyeldin for his talk on this topic.
You might think that I know a thing or two about building high-traffic websites and you’d be right, in a sense; I mean, I do work for CBC.ca and one of my duties is to keep everything humming along at our origin. But most of our site, thankfully, is static HTML – that’s a design philosophy we arrived at after many painful years of executing heavyweight Java code for every news story. I do recognize, however, that this isn’t a feasible design for high-traffic, database-intensive websites like Wikipedia, Facebook, YouTube, and a lot of these other innovative new web properties that represent an upgrade to the Internet’s version number. The new challenge that immediately follows from the demands of "user-generated content" is "how do we make our infrastructure highly scalable and reliable even though we need to execute heavyweight code on each page request?"
Basically the challenges are on two levels: given a standard LAMP stack, a) how does one deploy a caching tier in front of the stack to reduce unnecessary hits to it, and b) what enhancements can one make to the "M" (the database tier) to improve redundancy, scalability under load, and disaster recovery? I’m sad to report that there is no panacea for either of these problems, but there are certainly a lot of tools out there. Not all of them are good.
The Caching Tier
Let’s start with the caching tier, since caching is an oft-misunderstood problem domain. To my knowledge there is not yet a set of good practices around caching, particularly to answer the question of “what to cache?” For example, Khalid recommended that PHP administrators use APC to cache and optimize intermediate PHP code. But the first thing APC’s own web page says is that "This extension is not bundled with PHP." Well, why the heck not? After your PHP site has fallen over is hardly the best time to be implementing APC. I predict that eventually, APC (or maybe another competing opcode cache technology) will be the default in PHP. Perhaps that will happen after they iron out all the kinks in APC, like the fact that it sometimes segfaults Apache and presents a "White Screen of Death" to the end user. Like I say, these technologies are not mature – but eventually the widespread adoption of PHP will force their maturation.
There is also the question of HTTP caching and acceleration, currently not a well-understood space either. For years, people (like the Wikimedia Foundation) have been hijacking the forward proxy Squid as a reverse proxy, and using it to buffer requests. Apache 2.2.x also has this functionality now, albeit with fewer tunables. Neither of these options is particularly robust, partly because they don’t have a lot of cache management tools (e.g. how do you purge an object from the cache?). Squid has some management tools, but its performance is suboptimal because it fights the kernel and the VM subsystem – for an enlightened perspective on this, read Poul-Henning Kamp’s paper on Programming Like it’s 1970.
A promising contender in this field is the Varnish HTTP accelerator, which I’ve mentioned previously. Again, it is not yet a mature product, although many otherwise risk-averse folks are using it in production. The fact that they are doing so speaks to the yawning void in the marketplace.
What’s my recommendation about caching? If you run a high-traffic website that requires the execution of a bunch of code, you need an intelligent cache – whether that’s memcache as a big, distributed hashtable for generated objects, or APC for your intermediate PHP, or even a commercial solution like Zend Platform that bundles a lot of these features together. For your infrastructure, buy a ton of low-end boxes with little redundancy, but kit them out with RAM. This space changes so frequently that by the time your three-year warranty runs out with HP/IBM/Dell/whoever, you should have changed or at least tried several different caching tools – in production. I also expect that as time goes on, the core applications that run within the LAMP stack will become more cache-aware; many apps are already becoming memcache-aware, and this will only increase.
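To make the "big hashtable for generated objects" idea concrete, here is a minimal sketch of the usual look-aside pattern: check the cache first, and only run the heavyweight code on a miss. It is written in Python rather than PHP purely for illustration, and the ObjectCache class is a hypothetical in-process stand-in for a memcache-style store, not the API of any real client library; the key name and the 60-second lifetime are likewise assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ObjectCache:
    """Hypothetical in-process stand-in for a memcache-style key/value store."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (expiry time, cached value)
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str, ttl: float) -> None:
        self._store[key] = (self._clock() + ttl, value)

    def delete(self, key: str) -> None:
        # Explicit purge, e.g. after the underlying content is edited.
        self._store.pop(key, None)


def cached_render(cache: ObjectCache, key: str,
                  render: Callable[[], str], ttl: float = 60.0) -> str:
    """Return the cached object if present; otherwise generate and store it."""
    value = cache.get(key)
    if value is None:
        value = render()  # the expensive part: database queries, templating
        cache.set(key, value, ttl)
    return value


if __name__ == "__main__":
    cache = ObjectCache()
    render_count = 0

    def render_story() -> str:
        global render_count
        render_count += 1
        return "<h1>Story 42</h1>"

    cached_render(cache, "story:42", render_story)
    cached_render(cache, "story:42", render_story)
    print(render_count)  # the second request was served from the cache
```

The hard part is not this lookup but deciding what goes in the cache, how long it may live, and who calls delete when the content changes, which is exactly the "what to cache" question raised earlier.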
The Database Tier
To meet the requirements of availability, redundancy, scalability and disaster recovery for a large website, one needs to build more than one database server. That much is clear, but beyond that, not much else is. I can almost categorically state that none of the solutions that currently exist will meet all of the above requirements 100% with no human intervention (especially in case of a disaster). Clustering technology, particularly for MySQL, is not yet at the stage where it can meet the capacity demands of a high-traffic website. Part of that is due to the design philosophy of keeping all the clustered tables in RAM. Obviously this is not going to work for a site like Facebook.
Aside from some small players like uni/cluster (functionally sensible but rather invasive on one’s environment), the only proven option for open-source databases is replication, whether we are talking about MySQL replication or Slony-I for PostgreSQL. Unfortunately, replication is always a master-slave relationship; you can only have one master, but as many slaves as you want. Clearly, for sites that have a lot of user-generated content, the writes are going to be a bottleneck since they always have to go through the database master. And there’s no really good solution for that. You could have several masters, each holding a piece of the database, with the application being smart enough to detect which master to dispatch writes to based on the required DML; or you could just buy the biggest, baddest DB server you can afford to be your master, and cross your fingers. From an engineering perspective, both are suboptimal. I predict that future work will concentrate on adapting clustering to address scalability requirements in this space rather than just redundancy.
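As a rough sketch of the "several masters, each holding a piece of the database" option, the application needs a small routing layer: writes for a given key always go to the master that owns it, while reads can be spread across that master's slaves. The Python below is illustrative only; the Shard and Router classes, the host names, and the choice of hashing a key such as a user ID are all assumptions, not the scheme of any particular site or database driver.

```python
import hashlib
import itertools
from typing import Iterator, List


class Shard:
    """One master plus its read-only slaves (host names are hypothetical)."""

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        # With no slaves configured, reads fall back to the master.
        self._replicas: Iterator[str] = itertools.cycle(replicas or [master])

    def next_replica(self) -> str:
        # Simple round-robin over the slaves.
        return next(self._replicas)


class Router:
    """Maps a partitioning key to the shard that owns it."""

    def __init__(self, shards: List[Shard]) -> None:
        if not shards:
            raise ValueError("at least one shard is required")
        self._shards = shards

    def _shard_for(self, key: str) -> Shard:
        # A stable hash, so every app server picks the same shard for a key.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self._shards)
        return self._shards[index]

    def host_for_write(self, key: str) -> str:
        # INSERT/UPDATE/DELETE must go through the owning master.
        return self._shard_for(key).master

    def host_for_read(self, key: str) -> str:
        # SELECTs can be served by any slave of the owning shard.
        return self._shard_for(key).next_replica()


if __name__ == "__main__":
    router = Router([
        Shard("db-a-master", ["db-a-slave1", "db-a-slave2"]),
        Shard("db-b-master", ["db-b-slave1"]),
    ])
    print(router.host_for_write("user:1001"))
    print(router.host_for_read("user:1001"))
```

Even this toy version shows why the approach is suboptimal: modulo hashing reshuffles most keys when a shard is added, queries that span shards have to be stitched together in the application, and a read sent to a slave may lag behind the write that just went to its master.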
Now that we’ve moved beyond building websites out of simple HTML, we’re into the marshy areas of trying to build infrastructure sensibly to support high-traffic, database-driven websites. The current state of affairs for these tools isn’t pretty, but it will get better.
The most important thing that you should do, as a system administrator of such a site, is to keep abreast of what’s happening on the development front. Better tools will come along; it’s just a matter of time. Moreover, keep abreast of what others are doing; in particular, the Wikipedias and the Facebooks and Bloggers of the world. Chances are that some of them have run up against exactly the same barriers that you have, and in some cases, they have deeper pockets to be able to experiment with solutions and the flexibility to throw them out when they don’t work. If you’re a small startup, you might not have that luxury, so piggyback on the efforts of others. The good thing about coming second to the finish line is that the front-runner has already done the hard work for you.