Jason Hoffman of Joyent has written an interesting article, Why EC2 isn’t yet a platform for “normal” web applications, which lists the following limitations:
- No IP address persistence (they all function as DHCP clients and are assigned an IP). One has to use dynamic DNS services for a given domain.
- No block storage persistence. When the instance is gone, the data is gone. Yes I know you can send this back regularly to S3, but isn’t that actually a ‘hack’?
- No opportunity for hardware-based load balancing (which happens to be the key to scaling a process based framework like Rails and mentioned above).
- No vertical scaling (you get a 1.7 GHz CPU and 1 GB of RAM, that’s it). Like the block storage problem, this hits databases; we run about 32 GB of ours in memory.
- Can’t run your own kernel or make kernel modifications so there’s no ability for kernel and OS optimizations, and no guarantee that they’ve been done.
- Images have to be uploaded and then moved around their network to find a launching point. This can take several minutes, if not more. Move 100 GB around a busy gigabit network sometime and see.
Some of these points have been raised many times before, such as no IP address persistence and no local storage persistence. These are exactly the points I raised back in August last year. As for the kernel modification in point 5, this issue is certainly not specific to Amazon EC2; most VPS hosts won’t let you run your own kernel anyway. I do not think moving AMIs around, in point 6, is an issue either. I know an EC2 instance can take a few minutes to deploy, but that is still far faster than deploying a dedicated server.
Jason did bring up a very interesting point though on the inability to scale vertically. The common myth around web-based applications is — scaling horizontally is easy, and you can just throw in more hardware to make it run faster. Or at least the PHP, RoR or “share-nothing” folks would want you to believe that. This is simply NOT TRUE, and I have to keep on reminding the sales guys at work that we just cannot throw in more instances to fix the scalability issue.
The easiest way to scale a database system is still to run it on BIGGER hardware. More RAM so database pages can be cached. Faster disks so IO wait can be reduced. That is, if you do not want to go down the path of database partitioning (lots of domain-specific tuning), or spend lots of $$$ on Oracle RAC or DB2 (which are usually beyond the budget of most Web 2.0 startups).
I guess you just cannot get anything one-size-fits-all in this world.

Comments
Reality check for the sales guys aside, I don’t think it matters whether you scale vertically or horizontally; eventually you are going to run into the hard problems. I mean, how much faster disks, processors, or additional RAM can you even get? That stuff gets pretty expensive and there is a limit to how far it can go. Meanwhile, “share-nothing” is usually a misnomer because you always have to share something (usually a database). However, depending on the nature of your application, you may be able to cache a majority of your traffic, in which case throwing more cheap hardware at it is more effective than climbing up the exponential price scale of server hardware. “Share-nothing” doesn’t solve the scaling problem for you, but it gives you a head start.
I would hardly consider anything “scaling” if it can still run on a single piece of hardware. The truly big sites are running orders of magnitude more traffic than you could do on even a modern supercomputer.
Anyway, I’m just arguing against the implication that vertical scaling is somehow inherently better than horizontal—sure it’s easier, but it also has a hard limit. As far as EC2 is concerned, obviously you’re going to want better control over your hardware, especially on the RAM aspect since you can often buy a huge performance gain with just a little more RAM.
Gabe,
When you can scale horizontally, it means the number of transactions, requests, or payload handled is proportional to the number of hardware nodes you have added to the cluster. Of course that is the ideal situation, and as you have said, even a shared-nothing architecture shares a database, which does not scale horizontally easily.
How do you scale a database? You can partition the data space (does not fit many data models, requires domain-specific design, does not scale down), cluster it (has synchronisation issues, does not work for all applications), buy a sophisticated database engine (expensive, not always guaranteed to work), etc. Or sometimes it can be solved by simply throwing more RAM and more CPU cores into the database server.
I am not saying vertical scaling is better. It is like the Dark Side of the Force. Quicker, easier and more seductive: you can quickly get results and make things run faster by upgrading the hardware. However, there is a limit to it. It is just that sometimes it is cheaper to get better hardware than to hire extra software engineers or DBAs to scale it.
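As a minimal sketch of the first option, partitioning the data space: the shard names `db0` to `db3`, the hash-modulo placement and the helper names below are all assumptions made for illustration, not something from Jason's article or from any particular database product.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to database hosts.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(key, shards=SHARDS):
    """Pick the shard that owns a key, using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def group_by_shard(keys, shards=SHARDS):
    """Group keys by owning shard, e.g. to fan a multi-key query out."""
    groups = {name: [] for name in shards}
    for key in keys:
        groups[shard_for(key, shards)].append(key)
    return groups


if __name__ == "__main__":
    for name, keys in group_by_shard(f"user:{i}" for i in range(20)).items():
        print(name, len(keys))
```

Any query that touches keys on several shards has to be fanned out and merged in the application, and changing the number of shards remaps most keys under this simple modulo scheme. That is the "domain specific design" cost in practice.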
I don’t think Jason has a single valid point on his list. A “normal” website would never need more than a shared hosting environment. If we are talking about a site that gets a large amount of traffic and needs to scale I don’t think you can call that “normal” since most sites don’t fall into that category.
1. If you are going to use EC2 you should use more than one instance. Use DNS to round robin between the different IPs or use a frontend server outside of EC2 to redirect to instances that are currently up.
2. You just need to think differently about needing block storage. For a wide range of applications there are very few writes that are critical. For critical writes there is the queueing service.
3. There are other tricks in the bag outside of hardware load balancing, see #1.
4. Again you just need to think differently.
5. From what I’ve seen there are few people who do hard-core tweaks to the kernel to get better performance. You can tweak the OS, just not the kernel.
6. This isn’t a valid point at all. There is a 10G limit on the size of an image and I bet in the majority of cases the image size is less than 2G.
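The "redirect to instances that are currently up" idea from the first point can be sketched roughly as follows. This is only an illustration: the class name `RoundRobinPool`, its methods and the example addresses are hypothetical, and how an instance gets marked down (a health check, a failed request) is left out.

```python
import itertools


class RoundRobinPool:
    """Rotate through instance addresses, skipping ones marked as down."""

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self.healthy = set(self.addresses)
        self._cycle = itertools.cycle(self.addresses)

    def mark_down(self, address):
        # Called by whatever health check is in use (not shown here).
        self.healthy.discard(address)

    def mark_up(self, address):
        if address in self.addresses:
            self.healthy.add(address)

    def next_address(self):
        # Try each address at most once per call before giving up.
        for _ in range(len(self.addresses)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances")


if __name__ == "__main__":
    pool = RoundRobinPool(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    pool.mark_down("10.0.0.2")
    print([pool.next_address() for _ in range(4)])
```

Plain round-robin DNS does the rotation but not the skipping; clients may keep using a cached address of a dead instance until the record expires, which is why the frontend variant needs some notion of health.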
A lot of these issues are brought up again and again because people want to use EC2 for a shared hosting replacement and I bet that wasn’t the use Amazon had in mind when they put it together.
I see this as FUD. You don’t need a $100K box and $100K of network infrastructure to do a lot of the stuff that people make you think you need that type of hardware for.
Sorry Carson but you’re incorrect about labeling what I said as FUD. My problem is with the marketing of EC2 as an “infinitely scalable solution”.
To address your points:
Maybe FUD is too strong but you seem to be trying to create a fear of using EC2. Where is EC2 marketed as an “infinitely scalable solution”?
There are a number of services out there that will do RR DNS and monitor your nodes for you. Most places find it acceptable to drop some number of users when machines fail. In most cases people have their entire system in one physical location regardless of load balancing so there is still a single point of failure. That is why I believe the only shortcoming of EC2 right now is that it is housed in a single datacenter and you have no control over where your node is created.
I should have added the “persistent” part on to the end of that. You have two options:
Replicate your read data to each newly created EC2 as needed from something outside of EC2 or out of S3. They become a forward cache. Use queueing for your writes. Use S3 itself as a database and just think about your data differently. I’m not talking about sticking MySQL on top of S3 but instead using S3 as the database.
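A rough sketch of the first option, where each instance acts as a forward cache for reads and hands writes to a queue. Everything named here is a stand-in: `fetch_from_origin` represents whatever lives outside EC2 or in S3, and Python's in-process `queue.Queue` stands in for a real queueing service.

```python
import queue


class ForwardCacheNode:
    """One instance: serves reads from a local cache, queues writes."""

    def __init__(self, fetch_from_origin, write_queue):
        self._fetch = fetch_from_origin  # hypothetical callable: key -> value
        self._writes = write_queue       # stand-in for a queueing service
        self._cache = {}

    def read(self, key):
        # Pull from the origin only on a cache miss.
        if key not in self._cache:
            self._cache[key] = self._fetch(key)
        return self._cache[key]

    def write(self, key, value):
        # The write survives in the queue even if this instance disappears.
        self._writes.put((key, value))
        self._cache[key] = value


def drain(write_queue, store):
    """Apply queued writes to the persistent store; return how many."""
    applied = 0
    while True:
        try:
            key, value = write_queue.get_nowait()
        except queue.Empty:
            return applied
        store[key] = value
        applied += 1
```

The instance itself holds nothing that cannot be rebuilt, so losing it costs a cold cache rather than data, provided the queue and the origin store are durable.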
Stick a software LB in front of them on the same box like most people do. There isn’t much to it. Or use something different like Java or PHP.
This seems to boil down to a database issue. I’m just saying you don’t need one mega database box but instead need to think differently about how you do your data storage.
Well, you can still change things with sysctl. That opens the way for increasing shared memory size and other tweaks that can speed up userland applications.
I’m saying that the majority of images are way smaller than 10G. I’ve made images in the low hundreds of MB, so this isn’t that much of an issue.
I realise this article is a couple of months old. Amazon just launched two new appliances, with 4 and 8 CPU cores and a stack load more RAM. 8 CPU cores and 15 GB of RAM will definitely allow a database to scale to a reasonable size, although obviously not as far as dedicated hardware.
is it just me or is carson insane? i do not think it is just me.
I don’t think Carson is insane. If buying a hulking database server is the solution to all database problems, why did Google and Amazon take the trouble of inventing their own distributed computing environment, distributed file system AND distributed database systems? For small Web sites trying to grow quickly without complexity, swapping out a simple DB server box for a more powerful one will work fast and best. But for medium Web sites trying to become huge ones, this approach hits a wall quickly. Horizontal scaling via distributed systems is the only way to go.
Amazon now have a database-specific web service, which probably negates the need to use EC2 to spin up a DB server. It seems like it would be more scalable too. I guess unless you really need a specific extension to MySQL or something, you should be fine using that.
My issue with all of these services is how do you get a physical backup of your database? I’d love to use SimpleDB, for instance… but I would really be into it if you could pay for, like, a Blu-ray or tape backup to be mailed to you every 2 weeks or something. Otherwise, I’m kind of iffy on the whole thing.
For all of you concerned about persistent storage, please read the blog post http://blog.mohanjith.net/2008/02/amazon-ec2-with-rock-solid-persistent.html. I was able to get hardware node level data persistence. It’s worth looking at.