Setting Up a Part-Time Web Cluster with Amazon's EC2

Tagged in: Amazon Web Services

What do you do when you have a regular traffic spike? Say that once a month, traffic triples for 12 hours after your company sends out its monthly newsletter, and your current web server barely copes with the regular load. Do you go out and buy 2 more dedicated servers just for those 12 hours a month? Paying for 2 extra servers that sit idle most of the time wouldn't be very economical, would it?

Judd Vinet of ArchLinux (one of my favourite distributions, btw) recently wrote an article, "Web clustering with Amazon EC2", that tackles this very issue: extra servers are hired from Amazon EC2 on a part-time basis to absorb traffic surges. The article also describes a semi-automated system built to make the task of "summoning new servers" much easier.

It's quite a detailed write-up with a lot of good information. In short, this is what Judd is trying to achieve:

  • An existing 2-server web cluster using LVS, with Pound as the web load balancer.
  • EC2 instances are started, and on startup each instance phones home to register itself.
  • The master server (where Pound runs) adds the new EC2 instances to the pool and grants them database permissions so they can connect from EC2.
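The master-side bookkeeping in the last two steps can be sketched roughly as below. This is not Judd's code: the function names, the `webapp` database, the `web` user and the IP addresses are all hypothetical, and the sketch only builds the Pound `Service`/`BackEnd` text and the MySQL `GRANT` statement rather than applying them. Note that on MySQL 8.0 and later, `GRANT` no longer creates accounts implicitly, so the `'web'@'<ip>'` account would have to be created first.

```python
import ipaddress


def register_instance(registry: set[str], reported_ip: str) -> bool:
    """Record an instance that phoned home; return True if it was not already known."""
    ip = str(ipaddress.ip_address(reported_ip))  # raises ValueError on malformed input
    if ip in registry:
        return False
    registry.add(ip)
    return True


def render_pound_service(backends: set[str], port: int = 80) -> str:
    """Render a Pound Service section with one BackEnd block per registered server."""
    lines = ["Service"]
    for ip in sorted(backends):  # sorted so the generated config is deterministic
        lines += ["    BackEnd", f"        Address {ip}", f"        Port {port}", "    End"]
    lines.append("End")
    return "\n".join(lines)


def grant_statement(database: str, user: str, ip: str) -> str:
    """Build a MySQL GRANT that lets `user` connect from the instance's address only."""
    ip = str(ipaddress.ip_address(ip))  # refuse to interpolate anything that is not an IP
    return f"GRANT SELECT, INSERT, UPDATE, DELETE ON `{database}`.* TO '{user}'@'{ip}';"


if __name__ == "__main__":
    pool = {"10.0.0.1", "10.0.0.2"}  # hypothetical dedicated web servers
    new_ip = "203.0.113.7"           # documentation-range address standing in for an EC2 instance
    if register_instance(pool, new_ip):
        print(render_pound_service(pool))
        print(grant_statement("webapp", "web", new_ip))
```

The generated `Service` section would still need to be placed inside a `ListenHTTP` block of the Pound configuration before Pound is restarted.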

It seems a solid approach to scaling up your website instantly: fire up a few EC2 instances before an expected traffic surge and shut them down after the storm. You pay only for the time and bandwidth you have actually used!
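To put rough numbers on the newsletter scenario, here is a small back-of-the-envelope calculator. The prices are assumptions: $0.10 per instance-hour (the price EC2 charged for its small instance at launch) and a purely hypothetical $100/month for a dedicated server. Bandwidth is deliberately left out here, since it is billed separately per GB.

```python
def part_time_cost(instances: int, hours_per_month: float, hourly_rate: float) -> float:
    """Monthly compute cost of renting `instances` servers for `hours_per_month` hours each."""
    return instances * hours_per_month * hourly_rate


def break_even_hours(monthly_dedicated_cost: float, hourly_rate: float) -> float:
    """Hours per month at which one rented instance costs as much as one dedicated server."""
    return monthly_dedicated_cost / hourly_rate


# Assumed figures, not quotes: adjust to your own provider's prices.
EC2_HOURLY = 0.10          # $ per instance-hour (EC2 small instance at launch)
DEDICATED_MONTHLY = 100.0  # $ per dedicated server per month (hypothetical)

if __name__ == "__main__":
    spike = part_time_cost(instances=2, hours_per_month=12, hourly_rate=EC2_HOURLY)
    dedicated = 2 * DEDICATED_MONTHLY
    print(f"2 EC2 instances for 12 h: ${spike:.2f}/month vs 2 dedicated servers: ${dedicated:.2f}/month")
    print(f"Break-even: {break_even_hours(DEDICATED_MONTHLY, EC2_HOURLY):.0f} instance-hours/month")
```

Under these assumptions an instance would have to run for about 1000 hours a month, longer than the month itself, before renting became more expensive than owning.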

At the end of the article, Judd mentioned that a fully automated system will be their next step, so they won't need to activate and deactivate EC2 instances manually: instances would simply start as traffic builds up and stop as it slows down. That would be nice for unexpected traffic, such as being Slashdotted or Dugg.
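The core decision in such a fully automated system could look something like the sketch below, which assumes you can measure the current request rate and know roughly how many requests per second one server handles. The function name, the 80% headroom factor and the capacity figures are hypothetical choices for illustration, not part of Judd's system.

```python
import math


def target_extra_instances(
    req_per_sec: float,
    per_server_capacity: float,
    base_servers: int,
    max_extra: int,
    headroom: float = 0.8,
) -> int:
    """How many EC2 instances to run on top of the permanent (dedicated) servers.

    Each server is only planned to run at `headroom` of its measured capacity,
    leaving slack for the time it takes a new instance to boot.
    """
    if per_server_capacity <= 0 or not 0 < headroom <= 1:
        raise ValueError("capacity must be positive and headroom in (0, 1]")
    usable = per_server_capacity * headroom
    total_needed = math.ceil(req_per_sec / usable)
    # Never go below zero extras, and never exceed the budgeted maximum.
    return max(0, min(max_extra, total_needed - base_servers))


if __name__ == "__main__":
    # Hypothetical: 2 dedicated servers, each good for ~100 req/s, at most 5 extra instances.
    for load in (100, 300, 10_000):
        print(load, "req/s ->", target_extra_instances(load, 100, 2, 5), "extra instance(s)")
```

A real controller would also want some hysteresis or a cool-down period, so that load hovering around a threshold does not start and stop instances every few minutes.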

Who said the dedicated server vendors shouldn’t be worried?

However, a few problems were also encountered during this exercise: web traffic load balancing and database access. Pound was used in the example, but I think other web front ends such as Nginx should have no problem either. I have never used Pound, but judging from the example given, a 2-second delay is needed to cleanly stop the old process before the new one can be started. I guess that delay would be unnecessary with Nginx or Lighttpd, which can reload their configuration gracefully without dropping connections.
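With Nginx, the "add a backend" step could be sketched as: regenerate an `upstream` block, write it atomically so Nginx never reads a half-written file, test the configuration, then ask Nginx to reload. On reload Nginx starts new worker processes with the new configuration while the old workers finish their in-flight requests. The upstream name `web_cluster`, the backend IPs and the idea of keeping the block in its own included file are assumptions for illustration.

```python
import os
import subprocess
import tempfile


def render_upstream(name: str, backends: list[str], port: int = 80) -> str:
    """Render an Nginx upstream block listing every backend web server."""
    servers = "".join(f"    server {ip}:{port};\n" for ip in sorted(backends))
    return f"upstream {name} {{\n{servers}}}\n"


def write_atomically(path: str, text: str) -> None:
    """Write `text` to a temp file in the same directory, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)  # clean up the temp file if anything failed before the rename
        raise


def reload_nginx() -> None:
    """Validate the configuration, then ask the running Nginx master to reload gracefully."""
    subprocess.run(["nginx", "-t"], check=True)
    subprocess.run(["nginx", "-s", "reload"], check=True)


if __name__ == "__main__":
    conf = render_upstream("web_cluster", ["10.0.0.1", "10.0.0.2", "203.0.113.7"])
    print(conf)
    # On a real front end (path is hypothetical):
    # write_atomically("/etc/nginx/conf.d/upstream.conf", conf); reload_nginx()
```

The main `server` block would then point at the pool with `proxy_pass http://web_cluster;`.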

However, the biggest problem I have with the proxy approach, which is really an issue with EC2 itself, is that you effectively pay twice for data transfer. You pay Amazon $0.20/GB for data going into or out of EC2, and at the same time you pay whoever hosts your reverse-proxy servers for the traffic going out to your customers. The problem wouldn't exist if you could run your reverse proxy inside EC2, since traffic between EC2 instances is free. Except that is not trivial, at least not until persistent storage and static IPs are implemented.
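A quick way to see the "paying double" effect: every GB of response served through an outside proxy leaves EC2 once (to the proxy) and then leaves your own host again (to the customer). The sketch below uses the $0.20/GB EC2 rate quoted above; the proxy host's per-GB rate is a hypothetical parameter, and inbound request traffic is ignored to keep the arithmetic simple.

```python
def monthly_transfer_cost(
    gb_served: float,
    ec2_rate: float,
    proxy_host_rate: float,
    proxy_inside_ec2: bool,
) -> float:
    """Outbound transfer cost for `gb_served` GB of responses per month (inbound ignored)."""
    if proxy_inside_ec2:
        # Instance-to-instance traffic is free; only the final hop to customers leaves EC2.
        return gb_served * ec2_rate
    # Responses leave EC2 to reach your proxy, then leave your host again to reach customers.
    return gb_served * ec2_rate + gb_served * proxy_host_rate


if __name__ == "__main__":
    EC2_RATE = 0.20   # $/GB, as quoted in the article
    HOST_RATE = 0.10  # $/GB, hypothetical rate charged by the proxy's hosting company
    for inside in (False, True):
        cost = monthly_transfer_cost(100, EC2_RATE, HOST_RATE, proxy_inside_ec2=inside)
        print(f"proxy inside EC2={inside}: ${cost:.2f} for 100 GB")
```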

Database access over the public Internet is also not trivial to get right. For a busy website, even 20 ms of latency between the web server and the DB server can noticeably degrade performance. A locally replicated DB would definitely be much better, but somehow "part-time", "on-demand" and "MySQL replication" all sound like oxymorons to me.
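Why 20 ms matters: the latency is paid on every round trip, and a typical dynamic page issues many queries one after another. The sketch below assumes a page that runs its queries sequentially in a single blocking worker; the 30 queries per page, 100 ms of rendering time and 0.5 ms LAN round trip are hypothetical figures chosen only to illustrate the multiplication.

```python
def added_latency_ms(queries_per_page: int, round_trip_ms: float) -> float:
    """Extra time per page when each of the page's queries waits for one network round trip."""
    return queries_per_page * round_trip_ms


def pages_per_second_per_worker(render_ms: float, queries_per_page: int, round_trip_ms: float) -> float:
    """Throughput of one blocking worker that renders a page and runs its queries sequentially."""
    return 1000.0 / (render_ms + added_latency_ms(queries_per_page, round_trip_ms))


if __name__ == "__main__":
    QUERIES, RENDER_MS = 30, 100.0  # hypothetical page profile
    for label, rtt in (("remote DB over the Internet", 20.0), ("local replica", 0.5)):
        print(f"{label}: +{added_latency_ms(QUERIES, rtt):.1f} ms per page, "
              f"{pages_per_second_per_worker(RENDER_MS, QUERIES, rtt):.2f} pages/s per worker")
```

Under these assumptions the remote database adds 600 ms to every page, against 15 ms for a local replica, so each worker process serves several times fewer pages per second.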

Conclusion? I guess that's why EC2 is still in beta and still pretty much "developers only". But there's no doubt it is full of potential.

Comments

Yes, the pay-per-view or pay-per-use model is on the anvil. It is the model best suited to both big businesses and SMEs.

It is almost like the "congestion zone" in London. If you enter the congestion zone, you pay. As simple as that.

EC2 can run an image that does the proxying, and if you keep all the data on S3, you can transparently grant access to each instance of the worker EC2 image. In a sense it locks you in to S3 and EC2, but for some people that really makes sense.

Otherwise, I agree: the bandwidth used by running your own server as the dispatcher could be expensive in itself.

Eric: if you are serving only static files, you might as well set up a CNAME to S3 and serve the data that way. I guess the problem we are trying to solve is how to utilise the computational power provided by EC2 to scale up a busy website.

I've been following the EC2 developer forum on Amazon, and there seem to be some plans for renting static IPs. You could then permanently run one EC2 instance with a static IP as the load balancer, and start or stop other instances depending on the load. Hopefully we will see something like this happen this year.
