Any non-overselling web hosting out there?

“Is there any non-overselling web hosting company out there?” It seems to be one of the most frequently asked questions on WebHostingTalk forums. There do exist a few typical answers. “Yes, my web host doesn’t!” “No, everyone oversells in this industry.” “Don’t buy from DreamHost/Site5/MediaTemple — they oversell like crazy!”

Overselling is indeed a very touchy subject in web hosting. I wrote against it earlier this year when I first discovered the term. Then I read DreamHost’s and Site5’s takes on the issue, and was convinced that overselling is a “business strategy” rather than fraud or trickery.

However, with the popularity of today’s dynamic, database-driven web applications on shared servers, “overselling” is no longer as simple as “having less bandwidth/disk space than the sum of all users’ allocations”. I will take another shot at it in this article.

What is overselling?

First of all, can we define what overselling is? This is what Wikipedia has to say.

Overselling is a term used in web hosting to describe a situation in which a company provides hosting plans that are unsustainable if every of its customers uses the full extent of them. The term is usually referred to the usage of webspace and bandwidth.

I shall take Dan’s example (which, by the way, is excellent and has a very good analysis of the pros & cons of overselling). If you are a web host with an 80GB hard disk, and you are selling more than 80 web hosting accounts with 1GB of storage each, then you are overselling! However, Dan’s definition differs slightly. He said in his opening paragraph:

Overselling basically means to sell beyond the means of delivery.

It seems that “overselling” is actually not just about the actual numbers, but the ability to deliver. Let’s take the following two cases:

  1. If a host promises almost-unlimited space and data transfer, but has always delivered what its clients have requested, is it still overselling?
  2. If a host does not “oversell” on paper, but still becomes unsustainable even before its customers use the full extent of advertised resources, is it overselling?

Many web hosting customers are actually happy about (1): as long as it delivers what I need, I do not really care. After all, that is the very benefit of overselling: better utilisation of the resources, so “some” customers can get more bang for the buck. Not everyone on your server will get digged/slashdotted/farked at the same time. Not everyone is running a YouTube clone. But companies advertise as though they can, to attract all those YouTube wannabes and those who love to have big buffers.

It is a fine business strategy by all means. It has been tried and tested by all types of service industries over the years, and it works great as long as resources are properly managed. A bit of statistics, planning and a big line of credit help.

However, there is another case, where a web hosting service can become unsustainable even if it does not oversell its advertised resources. How can that be the case?! Because running a server, or a farm of them, is a complicated task, and the limiting resource can be something other than the usual advertised “disk space” and “monthly data transfer”.
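The two cases above can be made concrete with a small sketch. Everything in it is made up for illustration: the `Server` and `Account` classes, the 80GB disk borrowed from Dan’s example, and the choice of CPU cores as the unadvertised resource. It only shows that “oversold on paper” and “able to deliver” are two separate questions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    disk_gb: float     # advertised resource
    cpu_cores: float   # hypothetical unadvertised resource


@dataclass
class Account:
    sold_disk_gb: float    # what the plan promises
    used_disk_gb: float    # what the customer actually uses
    used_cpu_cores: float  # average CPU demand, never mentioned in the plan


def oversold_on_paper(server: Server, accounts: List[Account]) -> bool:
    """True if the sum of promised disk space exceeds the physical disk."""
    return sum(a.sold_disk_gb for a in accounts) > server.disk_gb


def can_deliver(server: Server, accounts: List[Account]) -> bool:
    """True if actual demand fits on the server, for every resource we track."""
    disk_ok = sum(a.used_disk_gb for a in accounts) <= server.disk_gb
    cpu_ok = sum(a.used_cpu_cores for a in accounts) <= server.cpu_cores
    return disk_ok and cpu_ok


server = Server(disk_gb=80, cpu_cores=4)

# Case 1: 200 accounts sold 1GB each, but each uses little disk and CPU.
case1 = [Account(1.0, 0.2, 0.01) for _ in range(200)]

# Case 2: only 80 accounts sold 1GB each, but each runs a CPU-hungry site.
case2 = [Account(1.0, 0.5, 0.1) for _ in range(80)]

for name, accounts in (("case 1", case1), ("case 2", case2)):
    print(name,
          "oversold on paper:", oversold_on_paper(server, accounts),
          "can deliver:", can_deliver(server, accounts))
```

In this toy model the first host oversells on paper yet delivers, while the second never oversells its disk and still cannot keep up.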

What are the constraining resources?

Web hosting firms usually sell plans defined by a combination of disk space and monthly data transfer (or bandwidth, if you are on an unmetered dedicated server/VPS). While these are easily understood by customers, they do not give the true picture of what it takes to run a dynamic web site. Once “server-side scripts” were introduced, there were many more types of resources for a web host to consider. Some hosts put a soft limit on these resources, but few publicise it. Thus it is quite possible to exhaust certain resources without exceeding the advertised disk space/monthly transfer, even when those are small, reasonable amounts.

I shall give some examples. My cases are a bit extreme, but they are there just to prove the point.

CPU time

CPU time on the server is by far the most obvious example of a measurable resource, so much so that some hosts have actually started to advertise it and charge for excess usage (MediaTemple Grid Server’s GPU, for example). Many content management systems without careful caching are CPU- or IO-bound, and you can run many “busy” CMS sites that bring a server to its knees without consuming much bandwidth and/or disk space.

Or take this example: a web-based distributed.net client, written in PHP just to show off my 1337-ness! Not much data transfer is needed (input = key range + cipher text, output = hit or miss). Not much disk space is needed either (a 5k PHP file). However, I can be sure that I will be the most unwelcome customer if I run it 24×7 on a shared host.
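To get a feel for how lopsided such a workload is, here is a rough sketch in Python rather than PHP. It is not the distributed.net client: the function name `search_key_range`, the key range and the use of SHA-256 as a stand-in for the real cipher are all my own assumptions. The point is only to compare CPU seconds burned against bytes sent back.

```python
import hashlib
import time


def search_key_range(start, stop, target_digest):
    """Brute-force a key range; return the matching key, or None on a miss."""
    for key in range(start, stop):
        digest = hashlib.sha256(str(key).encode()).hexdigest()
        if digest == target_digest:
            return key
    return None


# Hypothetical work unit: the "secret" key sits at the very end of the range.
target = hashlib.sha256(b"199999").hexdigest()

cpu_before = time.process_time()
found = search_key_range(0, 200_000, target)
cpu_used = time.process_time() - cpu_before

# The whole response is a few bytes, no matter how much CPU was burned.
response = b"hit" if found is not None else b"miss"
print(f"CPU seconds used: {cpu_used:.3f}")
print(f"Response size: {len(response)} bytes")
```

Widen the key range and the CPU time grows with it, while the response stays at a few bytes, which is exactly the kind of usage that disk space and data transfer quotas never see.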

Memory / RAM

Physical memory on a hosted server is also a finite resource shared by all accounts on that server. Servers using an excessive amount of memory usually swap heavily, which degrades overall performance (or sends the server into a “spiral of death”). Hosting companies usually limit the amount of memory each single process can use, by setting PHP’s memory_limit option or ulimit -v, to prevent run-away CGI processes.

But how often do you see web hosting companies explicitly state the maximum virtual memory size for a single FastCGI application? What about the maximum memory for all processes? Across the cluster? It gets more and more difficult to measure, monitor and enforce the policy.
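For the single-process case, this is roughly what a ulimit -v style cap looks like from the inside. It is a minimal sketch, assuming a Linux-like system where RLIMIT_AS is enforced; the helper name `run_with_memory_cap`, the 1GB cap and the exit code 42 are arbitrary choices of mine, not anything a real host is known to use.

```python
import resource
import subprocess
import sys

# Child script: try to grab 4GB in one go, and exit with 42 if refused.
CHILD_CODE = """
try:
    blob = bytearray(4 * 1024**3)
except MemoryError:
    raise SystemExit(42)
"""


def run_with_memory_cap(code, max_bytes):
    """Run `code` in a child Python process whose address space is capped."""
    def apply_cap():
        # Same idea as `ulimit -v`: limit the virtual memory of this process.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    proc = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_cap,   # runs in the child just before exec
        capture_output=True,
    )
    return proc.returncode


exit_code = run_with_memory_cap(CHILD_CODE, 1 * 1024**3)
print("child exit code:", exit_code)
```

Note that this only caps one process at a time. It says nothing about the total across all of an account’s processes, let alone across a cluster, which is where the measuring and enforcing gets hard.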

Disk IO

The majority of database applications are disk IO-bound, and disk access can often be the biggest bottleneck in a shared hosting environment: the disk cache can easily be thrashed when it tries to serve hundreds of independent websites. Try joining a few medium-size SQL tables on non-indexed columns. Do it a few times. Now imagine half the people on your shared server doing the same thing again and again. You can easily stall a server without exhausting your storage and bandwidth, while the CPU is only mildly loaded.

However, disk IO utilisation is difficult to quantify on shared hosting. Have you seen any web hosting company putting “thou shalt not do more than 50 block IO/sec” into their terms of service?
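The join on non-indexed columns is easy to try at home. The sketch below uses Python’s built-in sqlite3 module as a stand-in for the MySQL server a shared host would more likely run, with two made-up tables, `posts` and `comments`. It prints the query plan before and after adding an index; the exact wording of the plan differs between SQLite versions, so treat the output as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, author TEXT)")
conn.execute("CREATE TABLE comments (id INTEGER, author TEXT)")

# Two "medium size" tables with no index on the join column.
rows = [(i, f"user{i % 500}") for i in range(2000)]
conn.executemany("INSERT INTO posts VALUES (?, ?)", rows)
conn.executemany("INSERT INTO comments VALUES (?, ?)", rows)

QUERY = ("SELECT COUNT(*) FROM posts "
         "JOIN comments ON posts.author = comments.author")


def plan(connection, sql):
    """Return the planner's description of how it would run `sql`."""
    plan_rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in plan_rows)


plan_before = plan(conn, QUERY)
conn.execute("CREATE INDEX idx_comments_author ON comments (author)")
plan_after = plan(conn, QUERY)

print("without index:", plan_before)
print("with index:   ", plan_after)
```

Without a persistent index the database has to scan whole tables (or build a throwaway index) on every run, and on a real server with tables bigger than the cache that work lands on the disk. None of it shows up in the storage or bandwidth figures.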

There are many others. Number of files (on some file systems a large number of files can degrade performance), number of concurrent TCP connections (large number of slow connections -> many Apache processes -> bad), number of open file descriptors, etc. They are all finite resources on the server. System integrators and developers would track them, monitor them and optimise their applications for them, but how often do you see hosting companies publicly advertise their limits on the usage of these resources? That would be insane. But as it is in their best interests to keep the servers running smoothly, they have to police the shared hosting accounts to prevent them from hogging CPU/RAM/IO/etc.

And they don’t need to oversell their bandwidth and storage for the policing to happen. A host might still fail to deliver what it has promised because it oversold its CPU time, as “amount of CPU time” is never spelt out in the plan specification. Or maybe it is the amount of disk IO. Or the number of inodes.

So, is there any non-overselling shared hosting?

If “non-overselling” is strictly defined in terms of disk storage and bandwidth, then the answer is probably yes. There are companies that claim to be non-overselling in order to stand out. Some of them do it to limit the number of accounts on a server, reducing the chance that other resources become the constraint.

However,

  1. Is there a way to verify the claim? How do you know that they are in fact having only 100 accounts, instead of 500 accounts on the same server?
  2. Do you have an overselling upstream provider? How do you know that the bandwidth to the dedicated server hasn’t been oversold? Or the co-location service? Or the carrier bandwidth?
  3. Are they overselling the CPU, IO, and other server resources?

Another alternative to “non-overselling hosts” is “pay-only-for-what-you-have-used” hosts. For example, Amazon S3 charges you according to the exact amount of disk space and data transfer you have used. NearlyFreeSpeech.net is another one I am aware of that charges you for the amount you have used.

The biggest issue still remains. What about the resources that are not advertised? Can a non-overselling host guarantee that you can use all the storage space + data transfer, regardless of how much CPU time and how many MySQL connections your apps chew up?

Should companies advertise all possible limits?

Maybe they should, just to protect themselves from being accused of being fraudsters and liars on WHT. However, I don’t think it will ever happen in reality. Those numbers don’t mean anything to a regular web hosting customer, and everyone hates complexity. We want simplicity! Which one would you sign up with? The host with 10 numbers that you should not cross, or the host with two? Or better, the one with none, before it gets shot dead with flaming arrows labelled “die! unlimited bandwidth!”

Okay, maybe I should wrap up

The more I dig into it, the more I realise how impossible it is not to oversell in a shared hosting environment. Shared web hosting is in fact a tricky business, and keeping servers running smoothly requires both proactive (minimise overselling, plan ahead, clustering, etc) and reactive (reduce bottlenecks, fast provisioning to scale up, etc) approaches.

Life as a developer who just writes the software is so much simpler 🙂

If I were signing up for a shared hosting account today, these are the things I would look for regarding overselling.

  1. $10/month for 50GB or 2TB does not bother me. I do not have popular enough sites to use 50GB a month anyway, so a plan with a smaller allocation might be more suitable. Then again, there is no guarantee they haven’t oversold their other resources.
  2. Check the size of the hosting company. 2TB/month offered by a company with 100+ servers is actually different from 2TB/month offered by a 2-server one-man shop. Overselling actually works with larger numbers.
  3. Ask about their policy on other resources. I prefer it if the company has set limits on resources like CPU time, MySQL connections and number of files. Better if those limits are publicly visible.
  4. Search around to see whether someone’s account has been suspended by that host, and for what reason. I actually prefer hosts that suspend their customers and can justify it with numbers. They are the ones taking a reactive approach to server loads.
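Point 2 deserves a bit of arithmetic. Here is a back-of-the-envelope sketch, assuming every customer’s monthly usage is independent with the same mean and standard deviation (a big assumption, and the 5GB/20GB figures are invented). Under that assumption total demand has mean n × mean and standard deviation √n × std, so the safety margin a host must hold per customer shrinks as the customer count grows. The function name `capacity_per_customer` is mine.

```python
import math


def capacity_per_customer(n, mean_gb, std_gb, sigmas=3.0):
    """Capacity per customer needed to cover total demand up to `sigmas`
    standard deviations above the mean, assuming independent customers."""
    total_needed = n * mean_gb + sigmas * math.sqrt(n) * std_gb
    return total_needed / n


# Invented figures: a typical customer uses 5GB/month, with a wide spread.
for n in (10, 100, 1_000, 10_000):
    per_customer = capacity_per_customer(n, mean_gb=5, std_gb=20)
    print(f"{n:>6} customers -> provision {per_customer:.2f}GB each")
```

The small shop has to hold several times the average usage in reserve for each customer, while the large host gets away with a thin margin on top of the average. Real customers are not independent (think of one story hitting many sites at once), so take the exact numbers with a grain of salt.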

Well, that’s enough for today. More later 🙂