Chef, devops, and the death of system administration

Last night, at a meeting of NYLUG, the New York City Linux Users’ Group, I watched Sean O’Meara whip through a presentation about Chef, the system configuration management (CM) tool. I was impressed. The last few times I tried to play with automation tools like cfengine and Puppet, I got very frustrated by their complexity. The folks at Opscode have definitely succeeded at bringing simplicity (as much as can be had) to the CM space.

But what struck me after hearing Sean had nothing to do with Chef. Instead, I came to the conclusion that pure systems administration is eventually going to die out as a profession. The developer is now king (or queen), and that’s not a bad thing. Continue reading

/usr/bin/vmware-config.pl gone!

I upgraded to VMware Workstation 6.5 recently, and now /usr/bin/vmware-config.pl is gone. I only discovered this when I updated my kernel for a security fix, and lo and behold, the old method of building the vmnet, vmmon, etc. modules for the new kernel no longer applies!

Others are having the same problem, and I can’t find a sensible solution other than uninstalling and reinstalling VMware. It seems the geniuses over at the Evil Machine Corporation have decided to replace vmware-config.pl with some sort of GUI called vmware-modconfig that doesn’t seem to work right.
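
For what it’s worth, the replacement workflow seems to be the new tool’s console mode; something like the following is supposed to rebuild the modules without the GUI, though your mileage may vary:

# rebuild the VMware kernel modules (vmmon, vmnet, etc.) against the
# running kernel; requires the matching kernel headers to be installed
sudo vmware-modconfig --console --install-all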

Why can’t people just leave working tools alone — or at least preserve the familiar interface for people who don’t want to wade through 300 pages of PDFs to figure out how to fix the breakage?

adventures in configuration management with Puppet

I’ve started investigating higher-order system configuration management tools, in particular Puppet, in order to help manage CBC’s web infrastructure. About two years ago, the only player in this space was cfengine, which at the time struck me as quite functional, but also quite arcane. Puppet seems much easier to learn. But I’ve come up against a fundamental problem that I’m hoping more experienced Puppet users can help me with: how does one force synchronous updates to Puppet clients? In this entry, I’ll explain my use case, and hopefully get some answers as to whether Puppet is the right tool for the job.
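
To be concrete about what I mean, the best I can find so far are manual pushes. A rough sketch, for a Puppet of roughly this vintage (the hostname is a placeholder, and puppetrun only works if the client daemon is listening):

# on a client: force an immediate one-shot catalog run
puppetd --test

# from the puppetmaster: kick a specific client into updating now
# (placeholder hostname; the client must be running with listening enabled)
puppetrun --host web01.example.com

Continue reading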

re-implementing Cacti

Earlier this year, we were forced into decommissioning our Cacti installation after the server it was hosted on suffered a catastrophic failure (it literally melted down). The server was an ancient Compaq ProLiant DL320 with an older HP SmartArray RAID controller, so we had no feasible way of recovering either the RRDs or the MySQL database off it.

Nevertheless, we figured our trending needs would be met by another trending solution, whose name I will withhold. It does the job of monitoring devices over SNMP just fine, but it cannot pull data from external scripts, which is essential for us to monitor things such as the thread states on our Apache servers. Consequently, we have decided to rebuild an instance of Cacti for these needs.
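
To give a flavour of what I mean by external scripts: Cacti’s data input methods can run an arbitrary script and graph whatever name:value pairs it prints to stdout. A minimal sketch against Apache’s mod_status (the URL and field names here are assumptions; they vary by Apache version and configuration):

#!/bin/sh
# scrape Apache mod_status in machine-readable form and emit
# name:value pairs in the format Cacti data input methods expect;
# adjust the URL and field names for your Apache version
URL="http://localhost/server-status?auto"
curl -s "$URL" | awk '
    /^BusyWorkers:/ { busy = $2 }
    /^IdleWorkers:/ { idle = $2 }
    END { printf "busy:%s idle:%s\n", busy, idle }
'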

Continue reading

static vs. dynamic content: a footnote

My colleague Blake recently wrote an article on the occasion of the decommissioning of NewsDelivery, a dynamic content display engine that until recently ran all the news stories on CBC.ca. I can’t speak for any of our alumni, but I think all of us at CBC.ca have learned one lesson:

Large websites should never, ever render content dynamically.

It’s amazing how many content management systems still do not grasp this principle. On a busy site, especially one that is liable to be Slashdotted or visited heavily (say, on 11 September 2001), you do not want to be executing Java/ASP/Smalltalk/FORTRAN/whatever code every time someone visits a story. In short, you do not want CPU usage to rise in proportion to the number of visitors you have.

What you do want is to make the content rendering "system" as simple as possible; in the ideal case, you can barely call it a system. For content rendering, CBC.ca now mostly uses bare Apache instances with server-side includes, meaning that aside from the core Apache engine, no other code needs to be executed every time you view a story.
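
To illustrate just how simple that is, a story page under this scheme boils down to little more than the following (with mod_include enabled; the paths here are hypothetical):

<!--#include virtual="/includes/masthead.html" -->
<!--#include virtual="/includes/story-body.html" -->
<!--#include virtual="/includes/footer.html" -->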

This seems like a very simple principle, but many other news sites still haven’t grasped it. I can almost guarantee that if there is another 9/11-scale event, sites that use a servlet-based dynamic execution system, like The Globe and Mail and The Toronto Star, will fall over under heavy load far sooner than CBC.ca. But I don’t really blame those organizations for choosing, for example, Fatwire Content Server (as in the Star’s case), because a news organization’s primary need is to create content. Displaying it is a separate problem entirely, and the shame should be on the vendor for tying the two so closely together.

send CBC.ca news alerts to your cell phone with procmail and awk

CBC.ca has a News Alerts mailing list where you can subscribe to get breaking news delivered to your e-mail inbox. Unfortunately, the news alert e-mail has a lot of extra gunge before and after the body of the actual alert: a header saying “Breaking news from CBC” and then some trailing text about how to unsubscribe from our mailing lists, and so on. I use this procmail recipe to strip out the extra stuff and send just the body of the message to my cell phone. You might find it useful too!
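
Roughly, the recipe takes this shape (the sender address, marker strings, and SMS gateway address here are placeholders; adjust for your own carrier):

:0
* ^From:.*newsalerts@cbc\.ca
{
    # keep only the lines between the "Breaking news from CBC" header
    # and the trailing unsubscribe boilerplate
    :0 fbw
    | awk '/Breaking news from CBC/ { body = 1; next } /unsubscribe/ { body = 0 } body'

    # forward the trimmed message to the phone via an
    # e-mail-to-SMS gateway (placeholder address)
    :0
    ! 5551234567@sms.example.com
}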

Continue reading

why has CORBA failed?

There’s a great article in this month’s ACM Queue entitled The Rise and Fall of CORBA. Since it’s authored by Michi Henning, who worked on CORBA as part of the OMG’s architecture board, and subsequently became an ORB implementer, consultant, and author of a book on CORBA Programming with C++, I had to take notice. The article itself isn’t available online, so I’m sorry I can’t suggest that you read it — instead you’ll just have to put up with my opinions, peppered with some quotes from the article.

Continue reading

Google for system logs

I’ve been playing around with Splunk recently, which I’d bill as “Google for your system logs.” It’s much more than a simple search engine, but that’s the simplest way to describe what it does: it aggregates log data from multiple sources and lets you search, correlate data in time, and post (anonymized) snippets from your log data on Splunk Base for others to see.

For our little shop, Splunk is probably overkill; I have about 30 servers (physical and virtual) to manage, and I have not found myself needing the functionality, per se. But it’s still a neat tool. I wish we’d had something like this at my previous job, in particular to index log4j entries from misbehaving Java applications. Trying to sift through data from six Java servers and six webservers in real time to figure out why the site was tanking was nearly impossible, and it often led to live hacks on production to disable dumb ideas that were taking the site down.

Now that I’ve posted all those HREFs, I wonder if Google will take down the site when it next indexes my journal. 🙂

publishing free/busy information in Evolution

Ximian Evolution can publish Free/Busy information by using WebDAV, but this doesn’t seem to be documented anywhere I could find. Here’s what I did to set it up:

  • Set up a WebDAV-compliant webserver. I installed mod_dav for Apache 1.3.x.
  • Configure DAV properly, and make sure that the directory you are enabling DAV for is writable by the webserver user (see the sketch after this list for the Apache side).
  • Configure Evolution. Select Tools > Settings, then Free/Busy Publishing. Click Add URL and in Publishing Location type in http://your-server-name/your-location/. Don’t forget to supply the username and password you set up for DAV.
  • You’ll get no diagnostics from Evolution when the publishing occurs, so you’ll have to check the webserver logs to see if it succeeded or failed.
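
For reference, the Apache side amounts to something along these lines (the paths, filenames, and realm name here are illustrative):

# httpd.conf (Apache 1.3 with mod_dav loaded)
DAVLockDB /var/lock/apache/DAVLock

<Directory /var/www/freebusy>
    DAV On
    AuthType Basic
    AuthName "Free/Busy publishing"
    AuthUserFile /etc/apache/dav.passwd
    # require a password only for writes, so the published
    # free/busy file stays readable by other calendar clients
    <Limit PUT>
        Require valid-user
    </Limit>
</Directory>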

connecting Tomcat and Apache

Please bear with me while I engage in the following diatribe about: “Why Is It So Darn Difficult to Connect Apache and Tomcat?” Anyone who has worked with mod_jk/mod_jk2 and their ilk knows that connecting Apache and Tomcat over AJP (Apache JServ Protocol) is probably one of the more difficult server configuration tasks out there.

A little history: When Tomcat was still Apache/JServ (way back in the day), there was a mod_jserv that managed the AJP pipe between the front-end HTTP server (i.e. Apache HTTPD 1.x) and the back-end application server (JServ). Eventually, this evolved into mod_jk for the first series of Tomcat application servers.

All well and good, and the configuration is fairly straightforward, up to the point of actually talking to your web application: the dreaded JkMount syntax. An example directive looks like this:


JkMount /examples/* worker1

There are a number of problems with this syntax. First, it unnecessarily ties the paths used on the front-end to the paths the web application uses on the back-end. So, for instance, I have no way to specify that I actually want to map “/julians_examples” on the front-end to “/examples” on the back-end. Want to do that? Sorry — time to institute some kind of mod_rewrite hackery (see the sketch below). Secondly, the “*” doesn’t mean what you think it means! It’s not a true wildcard, so you can’t selectively map resources; for instance, I can’t say JkMount /examples/foo* to map all resources starting with foo to the application server. That tells AJP to look for a resource literally matching “/examples/foo*”, which of course fails, since no resource has an asterisk in its name.
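
For the record, the hackery in question looks something like this (using the same worker and paths as in the example above):

RewriteEngine On
# translate the public path onto the back-end context path, then pass
# it through ([PT]) so that Apache's URI mapping, and hence JkMount,
# sees the rewritten URI
RewriteRule ^/julians_examples/(.*)$ /examples/$1 [PT]

JkMount /examples/* worker1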

Ok, so along comes mod_jk2, which is supposed to be a refactoring of mod_jk. It has certain improvements, such as the ability to talk over a shared UNIX socket (instead of the network-based AJP protocol), and the configuration is simplified again. But the web application mapping problem persists! The syntax to map the front-end to the back-end looks like this:


<Location "/foo">
JkUriSet worker ajp13:backend-server:8009
</Location>

ARGH! Still no way to specify that the front-end /foo should be mapped to some other back-end path!

Why is this so difficult? And why have so many connector projects (like mod_webapp) died? A few years ago, I looked into mod_webapp’s WARP protocol, and it seemed like a breath of fresh air compared to the antique AJP13 protocol. What happened to it?

I should mention as a postscript that maybe, maybe, in HTTPd 2.1, the new mod_proxy_ajp will solve my problems. Its syntax looks like this:


<Location /examples/>
ProxyPass ajp://backend-server:8009/examples/
</Location>

Wow! Finally, a way to map something on the front-end to a path that may well be different on the back-end.

I don’t understand why it’s taken us ten years (and counting) to get to this state. Is it just me that thinks this is totally bonkers?

As a footnote to this, I get the sense that AJP13 is a very poorly documented protocol that is still around simply due to momentum. Take these statements from its documentation, for example:

  • “This describes the Apache JServ Protocol version 1.3 (hereafter ajp13). There is, apparently, no current documentation of how the protocol works.”
  • “In general, the C code which Shachor wrote is very clean and comprehensible (if almost totally undocumented).”
  • “I also don’t know why certain design decisions were made. Where I was able, I’ve offered some possible justifications for certain choices, but those are only my guesses.”

Undocumented code? Unjustifiable design decisions? Little current documentation about how the protocol works?

It’s things like this that are killing us in the Open Source community. I find it pretty difficult to pitch Tomcat as a worthy alternative to IBM WebSphere or BEA WebLogic when we have this kind of cruft sitting around, pretending to be an "enterprise-worthy" solution.