Common Sysadmin Mistakes Start-Ups Make

Via Hacker News, here is a blog post from Cloudkick: Three sysadmin mistakes start-ups make.

While being a start-up with scaling issues is a sign that things are going well, sometimes such a small team does not have the expertise to make sure all their servers are in order. We wanted to share a couple pitfalls that we have helped diagnose in hopes to prevent other start-ups from doing the same thing.

It pointed out that

  1. Switching out your Apache with an alternative when you are in trouble is bad.
  2. Don’t use SQLite in production.
  3. fork(2) is one of the most expensive system calls (for some operating systems), and should not be used on every request.

The emphasis is mine, because I do not entirely agree with these arguments.

Switching away from Apache, not when you are in trouble, but right at the start!

I agree that throwing out your well-tuned Apache configuration to switch to a lightweight web server such as Lighttpd or Nginx, just when the system load is climbing steeply because of excessive swapping, is bad. When you are panicking you make silly mistakes, and the result is not just a slow site, but a non-functional site!

However, after testing out both lighty and nginx almost 3 years ago on this blog, I can’t see why a web startup would still be stuck with Apache, unless they are restricted by the Apache modules that they must use (or stuck with shared hosting; huh, for a web startup?!). And seriously, setting up a reverse proxy to Apache is trivial with both Nginx and Lighttpd, so they can easily offload the static assets. Maybe using an alternative web server should be decided up front, instead of at the point of panic.

SQLite is painfully slow at writes

Something does not seem to be right here:

It is important to remember that sqlite is single flat file, which means any operation requires a global lock. Locks will inevitably cause points of contention if the database gets even remotely busy. On top of that, your web server will appear to peg the CPU when under load. This is because of all that contention around the lock.

Well.

  • SQLite requires an exclusive lock when a process tries to write to the DB (more about SQLite locking here), but it can be read by multiple processes concurrently.
  • When a SQLite client tries to write while the file is locked, you’ll usually get a lock conflict error (SQLITE_BUSY, “database is locked”). Many client libraries will wait and retry until a timeout expires.
  • Waiting on locks uses very little CPU time, unless lock contention is the bottleneck in every process.

Still. You would not use a SQLite DB as session storage. You probably won’t even use it as the backend to your CMS. However, it is still an easy way to provide “configuration data” to your application, where data is mostly read from the DB.

Forking in app code is not as bad as you think

Of course not at a per-request level, which is why we have things like mod_php and FastCGI to replace CGI processes. Well, that’s if you are starting a new process with something like os.system('mv foo bar'), where you execute a shell, initialise the new process, and then execute the command.
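
As a rough illustration of that cost, the sketch below moves a file back and forth in two ways: by starting the external mv program for every move, and by calling os.replace() inside the current process. The scratch directory, the helper names and the number of rounds are all hypothetical, and it assumes a *nix-like system with mv on the PATH. Note that subprocess.run() with an argument list skips the shell, so os.system('mv foo bar') would pay for a shell on top of this.

```python
import os
import subprocess
import tempfile
import time

workdir = tempfile.mkdtemp()  # hypothetical scratch directory
foo = os.path.join(workdir, "foo")
bar = os.path.join(workdir, "bar")


def move_with_new_process(src, dst):
    # Starts a new process that loads and runs the external mv program.
    subprocess.run(["mv", src, dst], check=True)


def move_in_process(src, dst):
    # A rename done by the current process; no new process is created.
    os.replace(src, dst)


def time_moves(move, rounds=50):
    # Move foo -> bar -> foo repeatedly and return the elapsed seconds.
    open(foo, "w").close()
    start = time.perf_counter()
    for _ in range(rounds):
        move(foo, bar)
        move(bar, foo)
    return time.perf_counter() - start


spawn_seconds = time_moves(move_with_new_process)
inproc_seconds = time_moves(move_in_process)
print(f"new process per move: {spawn_seconds:.4f}s")
print(f"in-process rename:    {inproc_seconds:.4f}s")
```

The absolute numbers will vary by machine, so treat this as a way to see the gap for yourself rather than as a benchmark.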

On the other hand, a Python call such as os.fork() is actually very useful and has relatively small overhead, as it basically just splits your current process into two. The new process does not need to perform any bootstrap operations; that’s already done! Python modules such as multiprocessing make heavy use of the fork(2) system call.
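
Here is a minimal sketch of that idea, assuming a *nix-like system. The config dictionary is a hypothetical stand-in for whatever expensive start-up work your application does. The child process finds it already in memory after the fork and sends a value back to the parent through a pipe.

```python
import os

# Built once, before the fork. A hypothetical stand-in for expensive setup.
config = {"greeting": "hello"}

read_fd, write_fd = os.pipe()
pid = os.fork()

if pid == 0:
    # Child: `config` is already in memory, nothing to bootstrap again.
    try:
        os.close(read_fd)
        os.write(write_fd, config["greeting"].encode())
        os.close(write_fd)
    finally:
        # Leave immediately so the child never runs the parent's code below.
        os._exit(0)

# Parent: read what the child sent, then reap it so no zombie is left behind.
os.close(write_fd)
with os.fdopen(read_fd, "rb") as pipe:
    child_message = pipe.read().decode()

_, status = os.waitpid(pid, 0)
child_exit_code = os.WEXITSTATUS(status)
print(child_message, child_exit_code)
```

Calling os.waitpid() matters here: it is part of keeping forked children "properly controlled" instead of letting them pile up.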

Really. It is not that bad if it is properly controlled (so it does not create too many processes and send the system into swap) and used in the right way. And, of course, only if you are on a *nix-like system where you have fork(2). Sorry to my Windows-using brethren 🙂