What do you do when you have regular traffic spikes? Say, once a month, traffic increases threefold for 12 hours after your company sends out the monthly newsletter. Your current web server barely copes with the regular load. Do you go out and buy 2 more dedicated servers just for those 12 hours a month? It wouldn’t be too economical to pay for 2 extra servers that sit idle most of the time, would it?

Judd Vinet of ArchLinux (one of my favourites, btw) has recently written an article that tackles this very issue, Web clustering with Amazon EC2, in which extra servers are hired from Amazon EC2 on a part-time basis to handle surges in traffic. A semi-automated system, discussed in the article, has been built to make the task of “summoning new servers” much easier.

It’s quite a detailed write-up with a lot of good information. Basically, this is what Judd is trying to achieve:

  • Start from an existing cluster of 2 web servers using LVS. Use Pound as the web load balancer.
  • Start EC2 instances; on start-up, each instance phones home to register itself.
  • The master server (where Pound runs) adds the new EC2 instances and grants database permissions to allow connections from EC2.
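
The register-and-grant step could be sketched roughly as below. This is not the code from Judd's article, just a minimal illustration. The `Master` class, the `webapp` database name and the `web` user are all made up for this example, and creating the MySQL account and its password is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical registry kept on the master server (where Pound runs)."""

    backends: set = field(default_factory=set)

    def register(self, ip: str) -> bool:
        """Record an instance that phoned home; return True if it is new."""
        if ip in self.backends:
            return False
        self.backends.add(ip)
        return True

    def deregister(self, ip: str) -> None:
        """Forget an instance that has been shut down."""
        self.backends.discard(ip)

    def pound_backends(self, port: int = 80) -> str:
        """Render one Pound BackEnd stanza per registered instance."""
        stanzas = [
            f"BackEnd\n    Address {ip}\n    Port {port}\nEnd"
            for ip in sorted(self.backends)
        ]
        return "\n".join(stanzas)

    def grant_statements(self, db: str = "webapp", user: str = "web") -> list:
        """Build one MySQL GRANT per instance (db and user names are assumed)."""
        return [
            f"GRANT SELECT, INSERT, UPDATE, DELETE ON {db}.* TO '{user}'@'{ip}';"
            for ip in sorted(self.backends)
        ]
```

In a real setup the stanzas would be written into the Service section of the Pound config and the grants run against the database, followed by a restart of Pound.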

This seems to be quite a solid approach to scaling up your web site instantly. Fire up a few EC2 instances before an expected traffic surge, and shut them down after the storm. You only pay for the time and bandwidth you have used!

At the end of the article, Judd mentions that a fully automated system will be their next step, so that they don’t need to manually activate/deactivate EC2 instances: instances should simply start or stop as traffic builds up or slows down. That would be nice for unexpected traffic such as being Slashdotted or Dugg.
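
To give a feel for what the automated decision might look like, here is a small sketch. It is only my guess at the simplest possible rule, not anything from the article. The function name and the requests-per-second figures are assumptions for illustration.

```python
import math


def extra_instances(load_rps: float, capacity_rps: float, base_servers: int) -> int:
    """Return how many EC2 instances are needed on top of the permanent servers.

    load_rps:      current request rate (assumed to be measured at the balancer)
    capacity_rps:  request rate one server can handle (an assumed figure)
    base_servers:  number of permanent, always-on web servers
    """
    if capacity_rps <= 0:
        raise ValueError("capacity_rps must be positive")
    needed = math.ceil(load_rps / capacity_rps)
    return max(0, needed - base_servers)


# Two permanent servers that are just coping with 200 req/s between them;
# a threefold spike to 600 req/s would call for 4 extra instances.
print(extra_instances(600, 100, 2))
```

A real system would also want some damping, so that instances are not started and stopped on every small wobble in traffic.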

Who said the dedicated server vendors shouldn’t be worried?

However, a few problems were also encountered during this exercise: web traffic load balancing and database access. Pound is used in the example, but I think other “web fronts” like Nginx should work just as well. I have never used Pound, but from the example given, a 2-second delay is needed to cleanly stop the old process before a new one can be started. I guess that would be unnecessary with Nginx or Lighttpd, where you can gracefully reload the server without causing any delay.

However, the biggest problem I have with the proxy approach, which is actually an issue with EC2 itself, is that you are effectively paying double for data transfer. You pay Amazon $0.20/GB for data transfer coming in to or going out from EC2. At the same time, you pay whoever hosts your reverse-proxy servers for the traffic going out to your customers. The problem wouldn’t exist if you could run your reverse proxy right inside EC2, as traffic between EC2 instances is free. Except that is not trivial, at least not until persistence and static IPs are implemented.
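
A quick back-of-the-envelope calculation shows the double charge. The $0.20/GB EC2 rate is the one quoted above. The hosting provider's rate is an assumed number, and the (much smaller) request traffic flowing into EC2 is ignored.

```python
EC2_RATE_PER_GB = 0.20  # EC2 data transfer rate quoted in this post


def proxied_cost(response_gb: float, host_rate_per_gb: float) -> float:
    """Cost of serving response_gb through a reverse proxy outside EC2.

    Every response byte is billed twice: once leaving EC2 for the proxy,
    and once leaving the proxy's host for the customer.
    """
    ec2_leg = response_gb * EC2_RATE_PER_GB
    host_leg = response_gb * host_rate_per_gb
    return ec2_leg + host_leg


# 100 GB of responses with an assumed $0.10/GB at the proxy's host:
# $20 to Amazon plus $10 to the host.
print(f"${proxied_cost(100, 0.10):.2f}")
```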

Database access over the public Internet is also not trivial to get right. For a busy website, even 20ms of latency between the web server and the DB server can cause quite a noticeable degradation in performance. A locally replicated DB would definitely be much better. But somehow “part-time”, “on-demand” and “MySQL replication” all sound like an oxymoron together to me.
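
To see why 20ms matters, multiply it by the number of queries a page makes one after another. The figure of 30 queries per page below is an assumption picked for illustration, not a measurement.

```python
def added_page_latency_ms(queries_per_page: int, round_trip_ms: int) -> int:
    """Extra time per page when each sequential query pays one round trip."""
    return queries_per_page * round_trip_ms


# A page issuing 30 sequential queries over a 20ms link spends
# 600ms just waiting on the network.
print(added_page_latency_ms(30, 20))
```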

Conclusion? I guess that’s why EC2 is still in beta, and is still pretty much “developers only”. But no doubt it is full of potential.