64-bit Xen considered harmful

Recently at work, we tried to implement Xen on Intel Xeon, running a 64-bit dom0/domU. I have to say that this failed horribly, so I’m writing this post to warn others off it. My colleague Gabriel worked hard to migrate everything back to a 32-bit environment, so kudos to him.

The specific symptom we saw while running 64-bit Xen was that the domUs would crash and reboot randomly under (or after) high load. One of our domUs is a development server, which also runs CruiseControl, a continuous integration system. This means that every minute, CruiseControl wakes up, does a cvs update to see if there are any changes, and then recompiles the project(s) if needed. Periodically we started to see error messages like
Bad pte = 32971e067, process = cvs, vm_flags = 100077, vaddr = b7f34000
 [] vm_normal_page+0xb7/0xd3
 [] unmap_vmas+0x3d1/0x761
 [] unmap_region+0x8a/0xf0
 [] do_munmap+0x148/0x19b
 [] sys_munmap+0x33/0x41
 [] syscall_call+0x7/0xb
 =======================

After a few of these, the domU would reboot. It seems like others are having the same problem on 64-bit Xen. This user was running CentOS 5.1, which is basically what we’re running (we have the real deal Red Hat Enterprise Linux 5.1).

As I said, migrating the domU back to a 32-bit dom0 seemed to fix this, so let this be a fair warning to others thinking of running a 64-bit dom0.
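If you want to see which processes are tripping the error before a domU goes down, the “Bad pte” lines can be tallied from the kernel log. What follows is only a rough sketch: it assumes the log lines look exactly like the excerpt quoted above, and the function name tally_bad_pte is made up for illustration.

```python
import re
from collections import Counter

# Matches the header line of the kernel message quoted in the post.
# Assumption: the field layout is identical to that excerpt.
BAD_PTE = re.compile(
    r"Bad pte = (?P<pte>[0-9a-f]+), "
    r"process = (?P<process>[^,]+), "
    r"vm_flags = (?P<vm_flags>[0-9a-f]+), "
    r"vaddr = (?P<vaddr>[0-9a-f]+)"
)


def tally_bad_pte(lines):
    """Count "Bad pte" messages per process name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = BAD_PTE.search(line)
        if match:
            counts[match.group("process")] += 1
    return counts
```

You would feed it the lines of whatever file your syslog writes kernel messages to, for example tally_bad_pte(open(path)) with path pointing at that file.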

can’t sa-update after a recent SpamAssassin upgrade?

I got bitten by this bug after upgrading to SpamAssassin 3.2.4 recently. It seems that the GnuPG key shipped with SA precludes the verification of signatures from updates downloaded using sa-update, due to some esoteric defect with the OpenPGP design. Anyway, the point is that attempting to download new signatures using sa-update results in the following error:

error: GPG validation failed!
The update downloaded successfully, but the GPG signature verification failed.
channel: GPG validation failed, channel failed

(How many times can one say the word “failed” before I get the message?)

Anyway, it looks like the SA folks have corrected the problem with their key but it’s only available in SVN trunk so you have to perform the following magic incantation:

$ sudo gpg --homedir /usr/local/etc/mail/spamassassin/sa-update-keys --delete-key 0x5244ec45
$ wget -O - 'http://cvs.apache.org/viewvc/spamassassin/trunk/rules/sa-update-pubkey.txt?revision=610699' | sudo gpg --homedir /usr/local/etc/mail/spamassassin/sa-update-keys --import -

That assumes you’re using FreeBSD — adjust your paths appropriately.

The bug is still open and will be fixed in the next version (boy, if I had a nickel for every time I’ve heard that from vendors…)

SDF celebrates 20th anniversary

On June 16th, SDF Public Access UNIX system will celebrate its 20th anniversary!

Twenty years ago, SDF-1 was a 300 bps dialup BBS running on an Apple ][e computer system, and has evolved over time into a twelve node DEC Alpha cluster running the NetBSD operating system. SDF users, of which I am one (keymaker@), pride themselves on the fact that theirs is one of the last bastions of “the real INTERNET”, out of the reach and scope of the commercialism and advertising of the DOT COM entities. I recall fondly the days before commercial traffic was permitted on the NSFNet, and oftentimes wish that we could return to those days when everyone knew their proverbial neighbours.

If you’re interested in SDF, lifetime membership is very affordable at $36. You can find out more information about SDF here. You won’t find any fancy Web 2.0 widgets, but you can definitely still use Gopher 1.0!

“lp0: on fire” explained

Ever get the above message on your Unix/Linux machine? This awesome explanation shows you from where the error originates.

California Gubernatorial Race

Now that the California gubernatorial race has turned into a complete circus sideshow, with both Arnold Schwarzenegger and Larry Flynt of Hustler running, I’m suggesting that Darl McBride should mount a campaign, as well. Since the state of California isn’t doing so well financially, he can mount frivolous lawsuits against other states in an attempt to prop up the economy.

In fact, he could have the State of CalifOrnia (SCO) claim to own the copyright to the concept of rolling blackouts, which they purchased from PG&E. Then, he can sue, say, Idaho, for initiating blackouts without paying proper licensing fees.

Or perhaps, after IBM’s lawyers are finished breaking his spine on the Catherine wheel, he’ll just have to find another ailing public company in need of a business model that involves suing people.

SCO

I’m just waiting for SCO to declare itself in violation of its own trademarks, and sue itself.

XFree86 “Crisis”

So there’s this big flap about whether or not XFree86 should be forked. Doesn’t it seem like we go through this every few months with every other large open source project? I mean some operating systems are a direct result of forking. And then you have Linux with its -dj, -ac, -my_dog_spot branches, and myriads of different releases — 2.0.x, 2.2.x, 2.4.x, 2.5.x. It’s crazy. Not that the Linux development model (a/k/a complete and utter chaos) should be emulated by anyone.

Mike Harris has an interesting diary entry on why people are so fed up with XFree86, but my point, as I’ve made it above, is that the problems the XFree86 project has are endemic to any large open source project. After a while, any “core” development team becomes so insular it becomes a little “old boys’ club”, and unless there are folks willing to help reverse that trend, you end up with a lot of people outside core being very pissed off, and threatening to fork the code, etc. By and large I think code forks are a Bad Thing except in cases where the project is trying to do two different technical things at the same time. But forking code due to the inability of people to cooperate, and due to the core team becoming so insular, is not beneficial to anyone.

IP block renumbering day

… it’s only our internal 10.10.10 netblock, but still, a lot of grunt work.

I managed to reconfigure all the switches without locking anyone (myself included) out, and MRTG didn’t complain that much. All that remains is to renumber the dev server, and hopefully doing a perl -p -i -e 's/\b10\.10\.10\.20\b/10.10.10.5/' /etc/* will do the trick.

Then the IT department can appropriate that new Cisco (the one which has about 4 out of 10 ports in use, when a bunch of bimaps on our firewall could do the trick) for ourselves 🙂
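A note on that renumbering substitution: in a regex, an unescaped dot matches any character, and without some kind of boundary the old address also matches the front of longer ones like 10.10.10.200. Here is a small sketch of the difference. The sample text and the renumber helper are invented for illustration, and the digit lookarounds are just one way to guard the match.

```python
import re

OLD = "10.10.10.20"
NEW = "10.10.10.5"

# Unescaped: every "." matches any character, and nothing stops the
# match from ending in the middle of a longer address.
loose = re.compile(OLD)

# Escaped dots, plus lookarounds so the address is not part of a
# longer run of digits on either side.
strict = re.compile(r"(?<!\d)" + re.escape(OLD) + r"(?!\d)")


def renumber(text):
    """Replace only exact occurrences of OLD with NEW (hypothetical helper)."""
    return strict.sub(NEW, text)


sample = "dev 10.10.10.20\nprinter 10.10.10.200\nserial 10310410520\n"
loose_result = loose.sub(NEW, sample)
strict_result = renumber(sample)
```

With the loose pattern, the printer address and the unrelated serial number both get mangled. The strict one only touches the dev server line.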