When Application and Library Cookbooks Fail

Posted by Julian Dunn on January 23, 2013

Apologies in advance if you’re not interested in a post about the guts of Opscode Chef.

I recently started to adopt Bryan Berry’s application & library cookbook model as outlined in his excellent and funny blog post, "How to Write Reusable Chef Cookbooks, Gangnam Style". But I quickly ran into a blocker, because people are trying to solve problems using the compile phase and not the execute phase of Chef. Perhaps this calls into question the entire viability of compile-phase providers like chef_gem.

Let’s recap Bryan’s post quickly. He argues for vastly reducing the number and complexity of bespoke cookbooks that sysadmins have to maintain by leveraging existing, high-quality community cookbooks as “libraries”. (The most common source of these library cookbooks is, of course, the Opscode Cookbooks GitHub repo.) Then, by creating a small bespoke cookbook (the “application” cookbook) that overrides certain attributes & behaviors, you can achieve the desired customizations for your own environment. As a trivial example, suppose you had a popular web application called “instachef” that just got bought by MyFace, and you need to handle more traffic in the Apache VirtualHost that’s fronting it. Well, you might create an instachef::server recipe that does nothing more than

node.set['apache']['prefork']['maxclients'] = 31337
include_recipe "apache2"

thereby overriding the default attributes in the apache2 cookbook but re-using all the logic that Opscode and other community members have put in there. Seems logical, right?

I wanted to do the same thing with the PostgreSQL cookbook: that is, I wanted to wrap the community’s cookbook with one of my own that sets up Yum repos from the PostgreSQL Global Development Group (PGDG) and then installs PostgreSQL 9.2 instead of PostgreSQL 8.x on CentOS servers. The PGDG recipe looks something like this:

The run_list in my Berkshelf-managed Vagrant VM is then just “recipe[smpostgresql::pgdg], recipe[postgresql::server]”. This works great: I get PostgreSQL 9.2 installed on the VM. So far, so good.

Things break down, however, when I want to use the database and database_user LWRPs to manage a set of databases and users. To use these LWRPs, which run in the execute phase, I need to have the “pg” RubyGem installed in the compile phase. But the “pg” gem has native extensions that must be compiled against the headers of the PostgreSQL version I want, and I can’t retrieve those for PostgreSQL 9.2 until the execute phase, when the pgdg recipe sets up a Yum repo for me to retrieve them from! Argh, a chicken-and-egg problem.
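To make the ordering concrete, here is a toy model in plain Ruby. It is only a sketch and not how Chef is actually implemented: the method name simulate_chef_run, the log strings, and the lambda-based "resource collection" are all made up for illustration. The point it shows is that anything done at compile time happens before any queued resource runs, no matter where it appears in the run_list.

```ruby
# Toy model of a Chef run (hypothetical; not Chef's real internals).
def simulate_chef_run
  log = []        # records the order in which work actually happens
  resources = []  # stand-in for the resource collection built during compile

  # --- compile phase: recipes are evaluated top to bottom ---

  # The pgdg recipe declares a resource, so its work is only queued.
  resources << -> { log << "execute: add PGDG yum repo" }

  # A compile-time gem install does its work immediately instead of queueing it.
  log << "compile: build pg gem against PostgreSQL headers"

  # postgresql::server also just queues its package install.
  resources << -> { log << "execute: install PostgreSQL 9.2 packages" }

  # --- execute phase: queued resources run in declaration order ---
  resources.each(&:call)

  log
end

puts simulate_chef_run
```

The gem build is logged first even though the repo recipe was declared earlier, which is exactly why the 9.2 headers are not there when the gem needs them.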

I think the root cause of this problem is that people are abusing the compile phase of the Chef run to do things that would normally be done in the execute phase. Just look at the source code of postgresql::ruby: it’s almost an entire recipe, forcibly run in the compile phase. Whenever I see code that breaks the boundaries between execute and compile, I think something’s seriously wrong.

I don’t know the internals of Chef and I’m no Ruby expert, so I don’t know how viable my solution is. But conceptually, this would all be solved if chef_gem were an execute-time resource only. That way, work that actually belongs in the execute phase, like installing postgresql92-devel and a C compiler, could happen before the gem is installed. It’s almost like we need lazy evaluation of the require "pg" call: defer it until the execute phase, at which point the LoadError could be rescued, the gem installation could proceed, and the require could be retried.
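Here is a rough sketch of that rescue-and-retry idea in plain Ruby, outside of Chef entirely. The helper name require_or_install is hypothetical, and the "installer" is just a block; inside a real Chef run it would have to be whatever installs the headers, the compiler, and the gem. To keep the sketch runnable without PostgreSQL, the usage below substitutes a fake library written to a temp directory.

```ruby
require "tmpdir"

# Hypothetical helper: try to load a library; if it is missing, run the
# supplied installer block once and then retry the require.
def require_or_install(feature)
  attempts = 0
  begin
    require feature
  rescue LoadError
    attempts += 1
    raise if attempts > 1  # the installer ran but the library still won't load
    yield                  # e.g. install headers, a compiler, then the gem
    Gem.clear_paths        # let RubyGems notice anything newly installed
    retry
  end
end

# Usage with a stand-in library (assumption: "fake_pg_client" does not exist).
dir = Dir.mktmpdir
$LOAD_PATH.unshift(dir)

require_or_install("fake_pg_client") do
  # Stand-in for the real installation work.
  File.write(File.join(dir, "fake_pg_client.rb"), "FAKE_PG_LOADED = true\n")
end

puts defined?(FAKE_PG_LOADED) ? "loaded after install" : "not loaded"
```

The retry is limited to one attempt so that a failed installation surfaces as the original LoadError instead of looping forever.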

I’m interested to know what other Chef practitioners think. In the meantime, I’m working around the issue by simply avoiding using the LWRPs.