
Migrating from Sympa to Google Groups for Business

One of my last projects for 2014 is to move Chef’s mailing lists off our old Sympa server and onto Google Groups for Business. This migration was codified in Chef RFC 028 a few months ago, but we wanted to hold its implementation until the migration to the chef.io domain was completed.

Moving the list of subscribers is fairly straightforward, but migrating the list archives has been enough of a chore that I thought I would document the required steps. This way, if anyone else is faced with a similar task, they do not need to spend hours Googling and/or banging their head against the wall of the Google Groups Migration API, where there are dragons — lots of them.

What is the Google Groups Migration API?

The Google Groups Migration API is a REST interface that allows you to post RFC822 message bodies, one by one, so that they show up in the archives of a group. The main advantage of doing a migration this way rather than using direct SMTP to the Google Groups SMTP server is that it doesn’t blast the emails to subscribers on the new mailing list, so you could conceivably do it any time. Additionally, the Migration API is reportedly much more reliable than SMTP, and it is also idempotent — reposting the same message body to the API won’t create a duplicate entry in the archive.
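To make the shape of a single archive insert concrete, here is a minimal Python 3 sketch that only builds the request and does not send it. The endpoint is the one that appears in the migrator’s log output later in this post. The `message/rfc822` content type and the helper name `build_archive_insert_request` are my assumptions, and the OAuth2 `Authorization` header that a real call needs is left out.

```python
from urllib.parse import quote

# Upload endpoint as it appears in the migrator's log output.
UPLOAD_BASE = "https://www.googleapis.com/upload/groups/v1/groups"


def build_archive_insert_request(group_email, rfc822_bytes):
    """Return (method, url, headers, body) for one archive insert.

    Nothing is sent here; a real request also needs an OAuth2 bearer token.
    The function name and return shape are illustrative, not part of any library.
    """
    # The group address goes into the URL path, so "@" becomes "%40".
    url = "{}/{}/archive?uploadType=media&alt=json".format(
        UPLOAD_BASE, quote(group_email, safe="")
    )
    headers = {
        "Content-Type": "message/rfc822",  # assumed media type for the raw message
        "Content-Length": str(len(rfc822_bytes)),
    }
    return "POST", url, headers, rfc822_bytes
```

Each message is one such request, which is why a large archive takes a long time to migrate.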

Unfortunately, the Migration API is only available if you are using Google Apps for Business; it is not available for regular, public Google Groups (those ending in @googlegroups.com).

Getting Started

To use the Google Groups Migration API, you need to perform all operations as a domain administrator. First, let’s log into the domain’s administration control panel and add Groups for Business to our domain. (Confusingly, you can have Groups in your domain without Groups for Business.) Click Apps on the administration homepage:

[Screenshot: Apps on the administration homepage]

If you don’t see Groups for Business in the ensuing list, click the + sign at the top to show all available services:

[Screenshot: adding services to the domain]

Now let’s create a group in Google Groups for Business by visiting groups.google.com/a/<your-domain>. I’m not going to walk through that procedure here; it’s pretty similar to creating a public Google Group (although of course there are more settings – e.g. you can choose the scope of who can view and post to your group).

Set up a Migration Project in Google Developer Console

Now we need to set up a migration project in the Google Developer Console to get access to the Migration API. This was by far the most confusing part for me — I thought, why do I need a “project” in order to do this? Anyway, visit the developer console and create a new project. It doesn’t matter what the name or description are:

[Screenshot: creating a new project in the Developer Console]

Once the project is created, we need to authorize it to access the Migration API. Click APIs under “APIs & Auth” in the left-hand navigation, scroll down to Groups Migration API, and turn it on:

[Screenshot: turning on the Groups Migration API]

Finally, we need an OAuth2 client ID for the API. Again, under “APIs & Auth”, click “Credentials” and then the “Create a Client ID” button:

[Screenshot: the “Create a Client ID” button]

It’s important to select “Installed application” in the ensuing dialog.

[Screenshot: selecting “Installed application”]

You will get prompted to set up a consent screen (developer name, responsible party, etc.) — this doesn’t really matter since this isn’t an application we’re offering to the public, but it is a mandatory procedure. Once you finish filling out the consent screen, you can complete creating the client ID, and you’ll see it in your list of clients:

[Screenshot: the new client ID in the list of clients]

Now you can click “Download JSON” — this is the set of secrets you’ll pass to the migrator application.
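Before handing the downloaded file to the migrator, it can be worth a quick sanity check that it really describes an “Installed application” client. The sketch below assumes the usual client-secrets layout, with a top-level `installed` key holding `client_id` and `client_secret`. The function names are hypothetical and are not taken from the migrator script.

```python
import json


def validate_client_secrets(secrets):
    """Return the "installed" section of a parsed client-secrets document."""
    # Assumption: installed-application clients live under a top-level
    # "installed" key; other client types use a different key (e.g. "web").
    if "installed" not in secrets:
        raise ValueError(
            "expected an 'installed' client, found: {}".format(sorted(secrets))
        )
    client = secrets["installed"]
    missing = [key for key in ("client_id", "client_secret") if key not in client]
    if missing:
        raise ValueError("client secrets are missing: {}".format(missing))
    return client


def load_installed_client(path):
    """Read the downloaded JSON file at `path` and validate it."""
    with open(path) as handle:
        return validate_client_secrets(json.load(handle))
```

If the check fails because the client was created with a different type, recreating it as “Installed application” is simpler than debugging the OAuth flow later.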

The Migrator Application

There isn’t a prepackaged application for doing this work. I updated the example code from Google’s enterprise deployments repo so that it takes a fileglob instead of having to be called on individual messages. I also had to monkey around with the httplib2 object directly to give it a valid SSL certificate chain, because certificate validation was failing. (Sorry, I hardcoded the certificate settings right in the script itself, but I’m sure you can make it more elegant.) Here’s my copy of the script: https://gist.github.com/juliandunn/1eb80f4f909b8c87c663

Note that there are some Python dependencies you need to install to be able to run the script. Install those with pip or easy_install or whatever your poison is.

A Special Note About Message Migration Order

Migrated messages will show up in the archive in the order they were migrated. It’s up to you to make sure you’re feeding the migrator with messages in oldest-to-newest order, which is why I am sorting the fileglob in the Python script.

For Sympa specifically, the archive filenames are not zero-padded, so you need to handle that. In other words, a sorted fileglob will look like [“1”, “100”, “101”, “102”, … , “2”, “200”, “201”, … , “3” … ] which is probably not what you want, so use a shell trick like this to zero-pad the filenames before processing:

$ cat <<'EOF' > ~/zeropad
#!/bin/bash
# Print the filename given as $1 with its first run of digits zero-padded to 3 places.
num=`expr match "$1" '[^0-9]*\([0-9]\+\).*'`
paddednum=`printf "%03d" "$num"`
echo "${1/$num/$paddednum}"
EOF
$ cd archive-parent ; for h in * ; do cd "$h" ; for i in * ; do mv "$i" "$(bash ~/zeropad "$i")"; done ; cd .. ; done

Yeah, yeah, not the most elegant, and I could probably have done similar trickery in Python, but remember, this is a one-off 🙂
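For anyone who would rather not rename files, the same ordering can be had in Python with a “natural” sort key that compares runs of digits by numeric value. This is a minimal Python 3 sketch and is not what my script does. The names `natural_key` and `ordered_messages` are made up for illustration, and it assumes, as above, that numeric filename order matches chronological order.

```python
import glob
import re


def natural_key(path):
    """Sort key that compares runs of digits numerically."""
    # re.split with a capturing group alternates text and digit chunks, so
    # "chef_2009-01/100" becomes ["chef_", 2009, "-", 1, "/", 100, ""].
    # Odd positions are always the digit runs.
    parts = re.split(r"(\d+)", path)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


def ordered_messages(pattern):
    """Expand a fileglob and return the matches in natural order."""
    return sorted(glob.glob(pattern), key=natural_key)
```

With this key, `sorted(["1", "100", "2"], key=natural_key)` puts "2" before "100", so no zero-padding is needed.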

Run the Migrator Application

Start a tmux session first! The migration takes a long time and you don’t want it to fail in the middle just because your Internet connection went down. If you don’t know what tmux is, have a look at the previous link — I’ll wait. 🙂

Once inside a tmux session, run the migrator:

$ ./chef_groups_migration.py -m 'work/chef-sympa/chef_*/*' -d yourdomain.com -g yourlist@yourdomain.com
Credentials are invalid or do not exist.
This function, oauth2client.tools.run(), and the use of the gflags library are deprecated and will be removed in a future version of the library.
Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fapps.groups.migration&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code&client_id=fake.apps.googleusercontent.com&access_type=offline

Enter verification code:

Like the application says, visit the URL in your browser and you will be asked to perform the 3-legged OAuth authentication. Copy and paste the resultant verification code, and the migration will start:

Successfully retrieved access token
Authentication successful.
URL being requested: GET https://www.googleapis.com/discovery/v1/apis/groupsmigration/v1/rest
Attempting to insert work/chef-sympa/chef_2009-01/001
URL being requested: POST https://www.googleapis.com/upload/groups/v1/groups/yourlist%40yourdomain.com/archive?uploadType=media&alt=json
Response Code: SUCCESS
...
...

Don’t worry if the migration fails at some point; occasionally Google Groups will throw a “503 Backend Error”. Because all the message bodies are checksummed by Google, the operation is idempotent so you can restart it with a narrower fileglob wherever it fails. (Or you could also catch exceptions and retry within the script, but again, I’m going for quick-and-dirty here.)
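The catch-and-retry alternative could look roughly like the sketch below: a generic wrapper that retries with exponential backoff. `TransientError` is a hypothetical stand-in for whatever exception your HTTP client raises on a 503, and `insert` is whatever callable posts one message. Neither name comes from the migrator script or from Google’s libraries.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for the error raised on a 503 response."""


def insert_with_retry(insert, path, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call insert(path), retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return insert(path)
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller see the failure
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * (2 ** attempt))
```

Because reposting the same message does not create a duplicate, retrying a request whose outcome is unknown is safe here.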

Wrapping Up, Other Tools

As with most projects, this “one-day project” consumed most of my week. On the way to shaving this yak, I found several tools that were somewhat helpful:

  • Got Your Back is a suite of Python scripts for migrating email in and out of Google Mail and Google Groups for Business. Unfortunately, it didn’t work for this case, because while it can read mbox files (which I would have happily converted the individual Sympa digests into, using formail), it only knows how to restore them to a Google Mail account, and not a group.
  • Sympa Data Extractor is a great Java program for sucking data out of Sympa, using Firefox and Selenium to automate the administration web interface. The downside is that Selenium is very particular about the version of Firefox in use, so I wasted a bunch of time getting a compatible combination working, and in the end it didn’t gain me that much because I only had two lists to migrate. However, if you had a Sympa server with dozens of lists, automating the download of the user lists and archives would be a great timesaver.
  • Dito GAM is a suite of command-line tools to help administrators work with Google Apps. If I had to actually administer a Google Apps domain as part of my day job I would definitely look into it. I ended up not needing it. Again, if I had dozens of lists to migrate, I could see piping the subscriber list to a GAM script.

Also, if you’re a Chef user: yes, we will be announcing the cutover of Sympa to these new Google Groups in the near future, once we have dealt with the other administrivia & configuration of said lists.