Hard Disk Crashes – Are You Prepared?

Via TechNation: The Podcast Network, probably the world’s first podcasting network, went offline on Saturday due to technical issues, namely one or more crashed hard drives on their dedicated server. As of now (Monday, 48 hours later), the site is still not back up.

On Saturday Dec 20, the hard drive on TPN’s server suddenly died. We are in the process of restoring and re-building all of our sites and will have all of the shows back online asap.

Hard drive crashes: it’s not if but when. When it actually happens, are you prepared for it? Especially when that drive holds every file of your online business, how much downtime can you afford, and how much are you willing to pay to reduce it?

If you do not run mission-critical applications and have a very low budget (like me), here are some simple things you can do that do not cost a lot to implement.

1. Have a Contingency Recovery Plan

Even when you have top-of-the-line hardware with redundant power supplies and RAID’ed disk arrays, there is still a possibility that you’ll lose all your files to an accident (a natural disaster, a security breach, or a fat-fingered sysadmin typing in rm -rf /). So always having a recovery plan in mind is a good thing. For those working on Amazon EC2, not having persistent local storage is taken for granted, and smart ways to preserve data and recover servers quickly have sprung up naturally. Maybe all sysadmins and website owners should have the same attitude.

It’s a good idea to come up with and document a checklist of “todo’s” for when disaster happens, so you won’t miss something during the panic.

Cost: $0 (but lots of thinking)

2. Backups — as Frequent as You Can Afford

It’s something that has been emphasised again and again: make sure you have backups! Moreover, NEVER rely on your hosting provider to provide backups for you if you are on shared hosting or a VPS, because (1) you can never be sure that the exact files you want backed up have actually been backed up, (2) the backups are usually kept within the same data centre (or even on the same computer, God forbid), which is useless if access to that provider has been cut, and (3) it’s much faster to restore data yourself than to fire off a few support requests.

How frequent is frequent enough? At least once a day in my case. You definitely do not want to restore from a database that’s more than 2 weeks old. Some backup tools like rdiff-backup and rsnapshot also let you keep a few rolling backups, so you can have something like “daily backups from the last 7 days”.
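To make the “last 7 days” idea concrete, here is a minimal sketch of the pruning decision in Python. It is not how rdiff-backup or rsnapshot work internally; the function name and the one-backup-per-day assumption are mine, purely for illustration.

```python
from datetime import date, timedelta


def backups_to_prune(backup_dates, today, keep_days=7):
    """Return the backup dates that fall outside the rolling window.

    Assumes one backup per calendar day (an assumption for this sketch).
    With keep_days=7, today's backup and the six before it are kept.
    """
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in backup_dates if d <= cutoff)


def backups_to_keep(backup_dates, today, keep_days=7):
    """Return the backup dates still inside the rolling window."""
    pruned = set(backups_to_prune(backup_dates, today, keep_days))
    return sorted(d for d in backup_dates if d not in pruned)
```

You would feed it the dates of the backups you currently hold, then delete whatever comes back from backups_to_prune. The real tools handle this for you, so this is only useful if you roll your own.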

I use rsync to push backups to my DreamHost account’s backup user. Since I already have a DreamHost account, backups of 50GB or less do not cost me anything extra. There are many alternative rsync backup storage providers. I have personally used BQInternet; they are pretty good, reasonably priced, and support rdiff-backup.
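For illustration, an rsync push like this can be wrapped in a few lines of Python and run from cron. This is only a sketch: the user, host and directory names you pass in are placeholders, not my real setup, and it assumes rsync and SSH key login are already working.

```python
import subprocess


def build_rsync_command(source_dir, remote_user, remote_host, remote_dir):
    """Build an rsync-over-SSH command that copies source_dir to the remote.

    -a keeps permissions and timestamps, -z compresses data in transit.
    --delete is left out on purpose so that a file deleted by mistake on
    the server is not also wiped from the backup copy.
    """
    # A trailing slash tells rsync to copy the directory's contents.
    source = source_dir.rstrip("/") + "/"
    destination = f"{remote_user}@{remote_host}:{remote_dir}"
    return ["rsync", "-az", "-e", "ssh", source, destination]


def run_backup(command):
    """Run the command and return its exit code (0 means success)."""
    return subprocess.run(command, check=False).returncode
```

Something like run_backup(build_rsync_command("/var/www", "backupuser", "backup.example.com", "site-backups/")) would do the push; a non-zero return code is your cue to go and investigate.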

One final thing about backups: check their status regularly and make sure you can actually restore from them. You might wish to run something on a weekly basis to verify that everything you need has been backed up. The data restoration procedure should also be part of (1) Contingency Recovery Plan, as there is no point in keeping regular backups if you can’t restore them. And it is not just about restoring, but about restoring the data quickly onto new servers in case of disaster, so your downtime is limited.
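One possible way to do that weekly check is to restore the backup into a scratch directory and compare checksums against the live files. The sketch below assumes both trees are reachable as local directories; the function names are made up for this example.

```python
import hashlib
from pathlib import Path


def checksum_tree(root):
    """Map each file's relative path under root to its SHA-256 digest."""
    root = Path(root)
    sums = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch,
            # but read in chunks if you have very large files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            sums[path.relative_to(root).as_posix()] = digest
    return sums


def find_problems(source_root, restored_root):
    """Return (missing, changed) relative paths in the restored copy."""
    source = checksum_tree(source_root)
    restored = checksum_tree(restored_root)
    missing = sorted(set(source) - set(restored))
    changed = sorted(
        p for p in source if p in restored and source[p] != restored[p]
    )
    return missing, changed
```

If both lists come back empty, the restore matches the live files. Bear in mind that files modified since the last backup will legitimately show up as changed.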

Cost: $5/month for 10GB (BQInternet).

3. Serving Large Media Files from the Cloud

I don’t do podcasting or video casting because I sound like crap and look like an idiot in front of a camera. However, in the case of The Podcast Network, they ought to serve their podcast MP3s from a cloud storage service like Amazon S3/CloudFront or Mosso Cloud Files. If you have hundreds of GB of media files, it makes more sense to have them served directly from the cloud, which probably would have replicated your files multiple times already. Instead of waiting to be restored from off-site backups (which can take weeks), your big media files can continue to be served from the cloud even while your main server is being rebuilt from scratch.

That means you can probably get away without backing up those files (which will make backing up/restoring much faster). Or just push them up to two different cloud service providers.

Cost: $0.15/GB/month storage + $0.10/GB inbound and $0.17/GB outbound data transfer (Amazon S3).
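To get a feel for what those rates add up to, here is a rough calculator. It uses only the three per-GB prices quoted above and ignores per-request fees and any pricing tiers, so treat the result as a ballpark figure. The function name is mine.

```python
# Per-GB rates as quoted in this post; check current pricing before relying on them.
STORAGE_PER_GB_MONTH = 0.15
TRANSFER_IN_PER_GB = 0.10
TRANSFER_OUT_PER_GB = 0.17


def monthly_s3_cost(stored_gb, uploaded_gb, downloaded_gb):
    """Estimate a monthly bill in USD from storage and transfer volumes."""
    total = (
        stored_gb * STORAGE_PER_GB_MONTH
        + uploaded_gb * TRANSFER_IN_PER_GB
        + downloaded_gb * TRANSFER_OUT_PER_GB
    )
    return round(total, 2)
```

For example, storing 100GB of shows, uploading 5GB of new episodes and serving 200GB of downloads in a month works out to 15 + 0.50 + 34 = $49.50.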

4. DR Servers — Ready but not Deployed

Paying for one server can be expensive for some people, and having to pay for a live redundant server would be unthinkable. Building a disaster recovery (DR) solution that keeps two servers always in sync with each other is another complex topic, and it will probably stay out of reach for most amateur webmasters.

However, it is still a good idea to know some providers that can instantly (or very, very quickly, depending on how panicked you are) provision a new server when you need to execute your recovery plan. That’s where VPS shines: many virtual private server providers can provision a server instantly when you sign up, so your downtime is minimised.

Cost: $0 (but I recommend Linode and SliceHost)

5. Constant Server Monitoring

What we are trying to do here is quickly re-deploy the backups once you realise that an unrecoverable disaster has happened to your website/server. But how do you know that a site is down? It has happened to me before: my site was down for 10+ hours, and I only realised it when one of my users emailed me. You should be the first one to be notified when your site is down, so you can quickly determine the cause and decide whether to execute your recovery plan.

There are many site monitoring services, and I have used both Pingdom and Site24x7. Both provide good service. If you have quite a few sites/servers to monitor, then Pingdom might be the cheaper choice.
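If you would rather start with something home-grown, a bare-bones check can look like the sketch below. It is nowhere near what Pingdom or Site24x7 do, and it has to run from a different machine than the one being monitored. The “alert after 3 failures in a row” rule is my own assumption to avoid false alarms from a single blip.

```python
import urllib.request


def is_site_up(url, timeout=10):
    """Return True if the URL answers with a 2xx/3xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # Covers connection errors, timeouts and HTTP 4xx/5xx responses
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return False


def needs_alert(history, threshold=3):
    """Return True if the last `threshold` checks all failed.

    history is a list of booleans, oldest first, one per check.
    """
    recent = history[-threshold:]
    return len(recent) == threshold and not any(recent)
```

Run is_site_up from cron every few minutes, append the result to a history list, and send yourself an email or SMS when needs_alert returns True.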

Cost: $9.95/month (Pingdom).

That’s pretty much the minimum you need to do if your business depends on its websites being online. Do you have any good tips on reducing downtime cheaply?