Jeremy Zawodny’s blog post from early last month has prompted me to look at offsite backup solutions again. Currently I am backing up all my websites, from various servers and accounts, to my home server using rsnapshot, running at 4am every morning. So far so good, and I love the flexibility of rsnapshot. I guess if one of my servers dies, it would be trivial to re-populate another server, move the DNS records, and start serving again.
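
For the record, the whole setup is just a cron entry plus a few lines of rsnapshot.conf, along these lines (the hostnames and paths here are made up, and note that rsnapshot wants its fields tab-separated):

    # crontab: take a "daily" snapshot at 4am every morning
    0 4 * * *  /usr/bin/rsnapshot daily

    # excerpt from rsnapshot.conf
    snapshot_root   /backup/snapshots/
    interval        daily   7
    interval        weekly  4
    # pull each website over ssh
    backup          user@web1.example.com:/var/www/        web1/
    backup          user@web2.example.com:/home/user/www/  web2/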

Moreover, the cost of running a home server is in fact less than what Jeremy has calculated. My home server (a 1GHz Duron + 3 smaller disks) uses less juice, but more importantly, it needs to be running anyway regardless of whether I am using it to perform backups or not, as it also provides a few other services, like acting as a file server for my home network.

My home file server backup is another matter. Currently all files are sitting on a RAID 1 array, with home directories rsync’ed to another drive on a daily basis. But for big media files (my photo archives, home videos, etc.), there is no live backup at the moment.
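
The daily home directory copy is just another cron one-liner, roughly like this (the mount point is a placeholder):

    # mirror home directories to the second drive every night
    0 3 * * *  rsync -a --delete /home/ /mnt/backup/home/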

I really need to back up those files. My daughter’s photos and videos are way more valuable than all my websites combined. I need to have them on some storage, somewhere, as long as it’s not at home.

So I decided to give Amazon’s Simple Storage Service (S3) a try, since there are so many good reviews of it.

I basically installed two clients, S3Fox and JungleDisk. One is a Firefox extension that lets me manage S3 buckets and files. The other is a local WebDAV server that pushes data to S3. However, after an evening of attempts, I decided to give up. There is no way I am going to use S3 to back up my data.

  • Middleware makes things “indeterministic”. I tried to copy a folder of 1,000 photos to JungleDisk, so it could translate the directory structure to S3’s schema and upload the files for me. However, halfway through I got a few internal errors from Amazon, but JungleDisk kept on going. Now, is that photo backed up? Not sure. Having a middle-man like JungleDisk that translates WebDAV calls to S3’s service API calls does make things a little bit “indeterministic” sometimes.
  • Uploading is slow. From Sydney, Australia to Amazon’s servers, uploading seems to max out at around 30 Kbytes/sec. That is slow, especially when you are considering backing up photos and videos, so I simply gave up without waiting for the photos to finish (see the back-of-envelope after this list).
  • Extra complexity. In all the scripts and apps I have seen so far, everyone is trying to implement a filesystem, or a pseudo file system, on top of S3’s bucket system. Moreover, there is little compatibility between products. A directory tree uploaded by s3sync.rb cannot be understood by JungleDisk, for example.
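
To put that 30 Kbytes/sec into perspective, here is a quick back-of-envelope calculation, assuming a hypothetical 10GB photo archive:

    # 10GB in Kbytes, divided by 30 Kbytes/sec, converted to hours
    $ echo $((10 * 1024 * 1024 / 30 / 3600)) hours
    97 hours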

I know it is cheap, as you only pay for what you use. I know it is cool to use a web service to back up your files. I know Amazon can really take care of my files securely. But it is just too much work for me.

So I ended up backing up my files to my DreamHost account, via scp and rsync. Since I already have a very under-utilised DreamHost account, I am not really paying another $9.95 per month for it. Moreover:

  • rsync and ssh are faster, much more reliable, and easier to understand. They are also easily scriptable; see the sketch after this list.
  • Uploading across the Pacific to DreamHost’s server in LA runs at 95 Kbytes/sec, a big improvement over S3!
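
The whole job now boils down to a single rsync invocation in cron, something like this (the account name, host, and paths are placeholders):

    # push the photo and video archives over ssh every night at 2am;
    # -a preserves permissions and timestamps, -z compresses on the wire
    0 2 * * *  rsync -az --delete /data/photos /data/videos user@myaccount.dreamhost.com:backup/

And since rsync only sends the deltas, after the first full upload the nightly runs should be quick.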

See my Cacti graph below. I think it maxed out my ADSL2+ connection’s upstream bandwidth.

[Figure: Cacti traffic graph for uploading to DreamHost]

Looks like I have just found myself an offsite backup solution.