rebuilding the FreeBSD file server

I finally took an evening to upgrade my aging Compaq AP550 fileserver (FreeBSD 4.11-STABLE) to FreeBSD 6.x. Even with some good planning (as any good IT person should do), there were still a few problems.

The first thing I did was to acquire a new pair of 80GB drives and attach them to the RAID controller as a new RAID-1 set. The array build took about 45 minutes. I tried to boot FreeBSD 6.2-RC2 off a CD and the system froze halfway through the boot. I booted in verbose mode and found a whole bunch of ACPI strangeness; it seemed the motherboard temperature sensor was giving weird values and causing acpi_tz0 to thrash. Given the age of the box (pre-2001) I decided to boot with ACPI off and that seemed to work just fine. (Eventually I disabled “Thermal Monitoring” in the BIOS to get around this problem.)

Armed with the old fstab I mounted and tarred up partitions & directories of interest such as /etc, /usr/local/etc, and so on. The contents of these directories helped me reproduce all the configuration from the old server’s software with ease. Within a couple hours I had amavisd-new, postfix, and so on working again. My goal was to at least get e-mail flowing again and NFS working (so I could use my regular workstation).
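For anyone wanting to do the same, the tarring step can be sketched roughly as below. The function name, the /mnt/old mount point and the archive path are all made up for illustration; the only real requirement is that the old partitions are already mounted somewhere.

```sh
#!/bin/sh
# Hypothetical helper: archive selected directories from an old root
# filesystem that is already mounted (read-only is a good idea).
#   $1   = mount point of the old root (assumed, e.g. /mnt/old)
#   $2   = tarball to create
#   rest = directories relative to the old root
backup_configs() {
    old_root=$1
    archive=$2
    shift 2
    # -C switches into the old root first, so the archive stores
    # relative paths such as etc/rc.conf instead of /mnt/old/etc/rc.conf.
    tar -czf "$archive" -C "$old_root" "$@"
}

# Example call (all paths are assumptions):
# backup_configs /mnt/old /root/old-configs.tgz etc usr/local/etc
```

Storing relative paths makes it easy to unpack the archive into a scratch directory and diff individual files against the fresh install, rather than overwriting the new /etc wholesale.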

Overnight I initiated a world and kernel rebuild to get up to 6.2-STABLE. In the morning, I installed the new world and kernel, and I was done! Or was I?
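The rebuild follows the usual source-upgrade order, which can be sketched as below. This is not a transcript of what I typed: the wrapper functions, the DRY_RUN switch and the GENERIC kernel config are assumptions for illustration, and the Handbook recommends dropping to single-user mode between installing the kernel and installing the world.

```sh
#!/bin/sh
# Sketch of the usual FreeBSD source-upgrade sequence.
# SRCDIR, KERNCONF and DRY_RUN are illustrative knobs, not FreeBSD settings.
SRCDIR=${SRCDIR:-/usr/src}
KERNCONF=${KERNCONF:-GENERIC}

run() {
    # Always print the step; execute it in $SRCDIR unless DRY_RUN is set.
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        ( cd "$SRCDIR" && "$@" )
    fi
}

# The long part: safe to leave running overnight.
build_phase() {
    run make buildworld &&
    run make buildkernel KERNCONF="$KERNCONF"
}

# Install the new kernel, then reboot (ideally into single-user mode).
install_kernel() {
    run make installkernel KERNCONF="$KERNCONF"
}

# After the reboot: merge config changes and install the new world.
install_world() {
    run mergemaster -p &&
    run make installworld &&
    run mergemaster
}
```

Setting DRY_RUN=1 before calling any of the functions just prints the steps, which is a cheap way to double-check the order before committing to a multi-hour build.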

The new kernel panicked. Uh-oh…

I quickly booted the old kernel and configured dumps in /etc/rc.conf:

dumpdev="AUTO"

Then I rebooted into the broken kernel and watched it dump core to my swap partition. When I rebooted with the working kernel, there was a vmcore.0 in /var/crash, as expected.

Armed with the debugging information, I was all ready to poke at it with kgdb, but I thought I would check the freebsd-stable mailing list first to see if anyone else had reported the problem. Sure enough, they had.
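Had the list come up empty, the kgdb session would have started roughly like this. The helper below is a hypothetical convenience for picking the newest numbered dump, and the kernel.debug path in the comment is an assumption that depends on the kernel config name and on the kernel having been built with debug symbols.

```sh
#!/bin/sh
# Hypothetical helper: print the highest-numbered vmcore.N in a crash
# directory (default /var/crash). Non-numeric suffixes are skipped.
latest_vmcore() {
    dir=${1:-/var/crash}
    best=-1
    for f in "$dir"/vmcore.*; do
        [ -e "$f" ] || continue
        n=${f##*.}                      # text after the last dot
        case $n in
            ''|*[!0-9]*) continue ;;    # not a plain number, ignore it
        esac
        if [ "$n" -gt "$best" ]; then
            best=$n
        fi
    done
    if [ "$best" -ge 0 ]; then
        echo "$dir/vmcore.$best"
    else
        return 1
    fi
}

# Example (kernel path is an assumption):
# kgdb /usr/obj/usr/src/sys/GENERIC/kernel.debug "$(latest_vmcore)"
# then, at the (kgdb) prompt, "bt" prints the backtrace of the panic.
```

The numeric comparison matters once there are ten or more dumps, since a plain alphabetical sort would put vmcore.10 before vmcore.2.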

It seems a patch to tcp_subr.c was missed, and I just happened to cvsup at a bad time (innocent bystander). I must admit that I was quite stunned to see -STABLE panic, because usually it’s quite… stable. Still, human error happens. I’m just thankful that the problem wasn’t due to my aging hardware!