
another one bites the dust

Today I sent another failed SATA hard drive back to Seagate. You might recall that I have a bad track record with SATA drives: after purchasing this PC about two years ago, I went through about three Western Digital SATA drives (all replaced under warranty after failing) until I finally got fed up about six months ago and bought a pair of Seagate Barracuda 250 GB drives. One of them failed after three months, and the other died just this past weekend. Fortunately, my PC is RAID-1 protected (and I have all the data backed up on DLT), but seriously, why are SATA drives so prone to failure?

To give you some context for my astonishment at the SATA failure rate, I have a 120 GB Maxtor PATA drive in the same computer that I use as a temporary scratch disk. It’s about three years old and predates this computer. It’s been powered on just as much as any of these SATA drives, and yet it keeps on chugging. I also have a RAID-1 mirror in my main file server (aphrodite), backed by a pair of old Western Digital 80 GB PATA drives. I’ve had that set up since 2003, and in all that time the array has lost exactly one disk.

I’ve been hard-pressed to find any papers comparing failure rates between SATA and PATA, although I did find one interesting USENIX paper from FAST’07 that suggests there is no discernible difference in failure rates between FC, SCSI and SATA. The lack of SATA-versus-PATA comparisons could be because PATA is an end-of-life technology and SATA is its intended successor, so there is little incentive to study the older interface.

My conjecture is that hard disks are failing more frequently due to increased spindle speed and miniaturization, and also increasingly poor quality control. The latter seems evident from the length of the warranties that drive manufacturers issue these days. Where five-year warranties were formerly the norm, most manufacturers will now provide only a three-year warranty. Moreover, since drives are so inexpensive these days, manufacturers are probably assuming a high abandonment rate after failure, i.e. consumers will skip the warranty claim and just buy a replacement drive with double the capacity for the same price they paid for the failed disk. This of course neglects the intangible cost of lost data, but clearly that does not factor into the manufacturers’ bottom line.

I can’t think of a good way to protect myself from disk failure given my bad track record, other than continuing to use RAID-1. For now, whenever I have to specify a system built on commodity SATA drives, I will likely include RAID-1, even though it adds another ~$300 to the cost of the system. $300 is still better than spending $1000+ on data recovery services after the fact.
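
To put rough numbers on that trade-off, here is a minimal Python sketch of the break-even arithmetic. It takes the ~$300 mirror cost and the $1000 recovery cost from above and makes two simplifying assumptions of my own: that the mirror prevents the data loss entirely, and that a failure without it leads to exactly one recovery bill. The function names are hypothetical, chosen only for this illustration.

```python
def break_even_failure_probability(mirror_cost: float, recovery_cost: float) -> float:
    """Return the failure probability at which the mirror pays for itself.

    Assumption: without the mirror, a failure costs exactly `recovery_cost`;
    with the mirror, a failure costs nothing extra.
    """
    if recovery_cost <= 0:
        raise ValueError("recovery_cost must be positive")
    return mirror_cost / recovery_cost


def mirror_is_worth_it(p_failure: float,
                       mirror_cost: float = 300.0,
                       recovery_cost: float = 1000.0) -> bool:
    """True if the expected recovery bill exceeds the cost of the mirror."""
    if not 0.0 <= p_failure <= 1.0:
        raise ValueError("p_failure must be between 0 and 1")
    return p_failure * recovery_cost > mirror_cost


if __name__ == "__main__":
    threshold = break_even_failure_probability(300.0, 1000.0)
    print(f"Break-even failure probability: {threshold:.0%}")
    # 0.5 is an illustrative guess, not a measured failure rate.
    print("Mirror worth it at 50% failure odds:", mirror_is_worth_it(0.5))
```

Under those assumptions the mirror pays for itself once the odds of a data-losing failure over the life of the system exceed 30%. Since $1000 is the low end of the recovery estimate, the real threshold would be lower still.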