
Amazon S3 backups: a proof-of-concept

Recently, I decided to experiment with Amazon Web Services’ Simple Storage Service (S3) for online backups. This was prompted by my DLT7000 tape drive dying; when I discovered that the repair would cost nearly $400, I decided to do a cost-benefit analysis, using the S3 platform as a proof-of-concept, before sending the drive off to the shop. Today’s post will review the results of that analysis.

Some Background Information…

I use Amanda to run backups twice a week on a fourteen-tape rotation (tapecycle 14 tapes), specifying that I want full backups of everything at least every four weeks (dumpcycle 4 weeks). This gives me (almost) two full copies of all my data; two complete dump cycles would need sixteen tapes (eight runs each), which I didn’t have when I set it up. Each DLTIV tape (in DLT7000 mode) holds at least 35GB of data uncompressed; I generally assume each tape will hold 40GB of real data.

Replicating The Setup Using S3

Amanda 2.6.x provides a new Device API to abstract backup devices, so that Amanda can talk to more than just tapes or even virtual tapes; it can now use CDs, DVDs, and of course, Amazon S3, since someone has written an S3 driver using the Device API. My server platform, FreeBSD, does not yet include Amanda 2.6.x, but there is a port in the works, so I took the plunge and installed the testing port.

The Amanda S3 implementation emulates a tape changer, with each "tape" being a directory within an S3 bucket. Following the howto, I set up a changer.conf with the following configuration (I’ve changed the actual public key name for security):

multieject 0
gravity 0
needeject 0
ejectdelay 0
statefile /usr/local/var/amanda/changer-status
firstslot 1
lastslot 14

slot  1  s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-01
slot  2  s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-02
slot  3  s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-03
slot  4  s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-04
slot  5  s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-05
slot  6  s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-06
slot  7  s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-07
slot  8  s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-08
slot  9  s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-09
slot  10 s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-10
slot  11 s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-11
slot  12 s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-12
slot  13 s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-13
slot  14 s3:0KGS4MZZ7G0WZY4AA803-backups/s3/slot-14
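The fourteen slot lines above differ only in their number, so they could be generated rather than typed. Here is a minimal Python sketch; the function name `slot_lines` is made up for illustration, and the bucket name is the same placeholder used above, not a real key:

```python
# Hypothetical helper: builds the "slot" lines of a changer.conf like the one above.
# BUCKET is the placeholder bucket name from the post, not a real access key.
BUCKET = "0KGS4MZZ7G0WZY4AA803-backups"


def slot_lines(bucket, first, last):
    """Return one 'slot N s3:<bucket>/s3/slot-NN' line per slot number."""
    return [
        f"slot  {n:<2} s3:{bucket}/s3/slot-{n:02d}"
        for n in range(first, last + 1)
    ]


# firstslot 1, lastslot 14, matching the configuration above
print("\n".join(slot_lines(BUCKET, 1, 14)))
```

Changing the rotation size then only means changing the two numbers (and `lastslot` in the config).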

Most of the other configuration items from the howto were replicated as-is.

Functional Testing Results

I found no major flaws in the functionality of backing up to S3 using Amanda; everything worked as advertised, to the credit of the Amanda development team. However, in addition to the technical pros and cons of backing up to the Internet, I wanted to analyze whether this is a feasible solution on a speed and cost basis.

Analysis of Backup Time

It’s immediately clear that backing up 40GB of data to the Internet using a consumer-grade DSL link is going to take forever.

Therefore this solution is not really suitable for home use. But even over a 10Mbps SDSL link, backing up 40GB still takes a while: about nine hours at full line rate, before any protocol overhead.
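To put rough numbers on the transfer times, here is a back-of-the-envelope sketch in Python. It assumes decimal units (1 GB = 10^9 bytes), a link running at its full rated speed, and no protocol overhead, so real backups would be slower. The 768 kbps upstream figure for consumer DSL is an assumption for illustration only; it is not a measurement from my own line.

```python
def transfer_hours(size_gb, link_mbps):
    """Hours needed to upload size_gb gigabytes over a link_mbps link.

    Idealized: decimal units, full line rate, no protocol overhead.
    """
    bits = size_gb * 8 * 1000**3           # gigabytes -> bits
    seconds = bits / (link_mbps * 1000**2)  # megabits/s -> bits/s
    return seconds / 3600


# 10 Mbps SDSL, as discussed above
print(f"10 Mbps SDSL: {transfer_hours(40, 10):.1f} hours")

# Assumed 768 kbps consumer DSL upstream (illustrative figure only)
print(f"768 kbps DSL: {transfer_hours(40, 0.768) / 24:.1f} days")
```

Under these assumptions a full 40GB tape takes about 8.9 hours over the SDSL link and several days over the consumer DSL upstream.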

Analysis of Costs

S3’s pricing model is clearly described on their website and is a combination of actual storage cost plus inbound/outbound bandwidth. (I will neglect the cost of requests for the purposes of this discussion, as it is negligible.) Obviously, the actual cost of backing up to S3 will depend on how frequently your data changes, and therefore on the size of each backup.

For my use case, let’s assume a uniform backup size for each virtual tape ("slot") of 0.25 * 40 GB = 10 GB (in order to make the arithmetic easy). This results in:

10 GB/backup * 2 backups/week * 4 weeks/month * $0.10/GB = $8/month for data transfer

If the "virtual tape library" is fully populated with data, ongoing storage costs are:

14 tapes * 10 GB/tape * $0.150/GB/month = $21/month for storage

resulting in a total bill of $29/month.
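The same arithmetic as a small Python sketch, so the assumptions (backup size, frequency, number of slots) can be varied. The function and constant names are made up for illustration, and the prices are the ones quoted in this post, which will not match current S3 pricing:

```python
# Prices as quoted in this post (USD); request costs are ignored.
TRANSFER_USD_PER_GB = 0.10
STORAGE_USD_PER_GB_MONTH = 0.15


def monthly_transfer_cost(gb_per_backup, backups_per_week, weeks_per_month=4):
    """Cost of uploading every backup made in one month."""
    return gb_per_backup * backups_per_week * weeks_per_month * TRANSFER_USD_PER_GB


def monthly_storage_cost(slots, gb_per_slot):
    """Cost of keeping a fully populated virtual tape library for one month."""
    return slots * gb_per_slot * STORAGE_USD_PER_GB_MONTH


transfer = monthly_transfer_cost(gb_per_backup=10, backups_per_week=2)
storage = monthly_storage_cost(slots=14, gb_per_slot=10)
print(f"transfer ${transfer:.2f} + storage ${storage:.2f} = ${transfer + storage:.2f}/month")
```

With a full 40 GB in every slot instead of 10 GB, both figures scale by four.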

Conclusions

What this exercise has shown me is that S3 is a good backup option only if you have both a high-speed link and very little data to back up: not just because of the speed factor, but because the operating costs are high. It’s great that commodity storage is now so cheap that pseudo-HA storage can be had for $0.15/GB/month, but throw enough GB and months at it and you will still have a sizeable bill at the end. In this example, by not using S3, I will recoup my $400 investment in repairing the DLT drive in a little over a year, which is why Nortek has my drive for repair. I probably will end up using S3 to back up my office computer, though; my home directory contains less than 10 GB and I probably only need to keep one or two copies for disaster recovery.
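The "little over a year" figure comes from dividing the repair cost by the monthly S3 bill. As a rough Python sketch (the function name is hypothetical, and it ignores everything except the $400 repair and the $29/month estimate, such as tape media or electricity):

```python
def months_to_break_even(one_time_cost_usd, monthly_cost_usd):
    """Months of avoided monthly cost needed to pay back a one-time cost."""
    if monthly_cost_usd <= 0:
        raise ValueError("monthly cost must be positive")
    return one_time_cost_usd / monthly_cost_usd


# $400 drive repair versus the $29/month S3 estimate
months = months_to_break_even(400, 29)
print(f"break-even after about {months:.1f} months")
```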

Comments

  1. Hi Julian,
    I've gone through a similar calculation recently, also motivated by a DLT drive dying. I have a library and a TON of tapes, but I am not sure it's worth repairing. Acanac, a DSL provider, is offering 100GB of FTP space with their service. They are also providing a virtual PC (an account on their CentOS box with remote X). Both can be used for backup free of bandwidth or storage cost. DSL speed is still an issue, however.

  2. I wound up sending the drive to the shop to try and get at least another year of use out of it before I switch to LTO. I think the outcome of this exercise is to validate that online backups (over a consumer-grade pipe) are only useful if you have a small amount of data. Comprehensive and frequent backups of your entire system are infeasible.

    I am, however, using S3 for backing up my office computer, since $WORK has such an enormous pipe. Consequently, it only takes me about 10-15 minutes to back up my entire hard drive. I estimate the monthly cost will be around $8-10.