Compiling with DistCC on low memory VPS

Two months ago I wrote an article on tuning GCC parameters for compiling on a low-memory VPS by limiting the maximum memory GCC can use, which is often useful on VPSes based on user beancounters (OpenVZ and non-SLM Virtuozzo). It worked in most cases on my 256MB VPS at VPSLink, but in some circumstances packages just won’t compile.

Last night I was recompiling PHP 5.1.6 and tried to force the maximum heap to stay under 8MB. It turned out GCC was thrashing the heap so badly that it took more than an hour to compile a single file! The CPU was pegged at maximum the whole time, and luckily I have not yet heard any complaints from VPSLink. I guess GCC was spending most of its time allocating and freeing memory instead of compiling the source. In cases like this, limiting the maximum memory just does not work.

So in this article I am looking at another option of compiling source tarballs with GCC on a low-memory VPS — distcc: a distributed C/C++ compiler. We are going to look at distributing the compilation jobs to your own desktop computer.

From its website:

distcc is a program to distribute builds of C, C++, Objective C or Objective C++ code across several machines on a network. distcc should always generate the same results as a local build, is simple to install and use, and is usually much faster than a local compile.

Faster compilation is what distcc was designed for, but that is not what we are trying to achieve here. We want to move the compilation duty, the very task of calling gcc, to another computer that is not constrained by lack of memory and all that UBC silliness, and distcc is a perfect fit for this mission. Note that only the compile step moves; preprocessing and linking still run on the VPS.

In theory you can add any Linux/Unix computer you have access to to this distcc “compile farm”. For example, you might have a dedicated server somewhere, or another VPS, that can be used for compilation. However, in this article I am going to use the desktop computer you normally use to remotely control your VPS as the “compilation box”. It is something I am sure everyone has, and it’s also running Linux, right? :)

In my case, the “compilation box” is a Gentoo Linux virtual machine running inside VMware on my Dell Latitude D600 notebook running Windows XP Pro. It worked out quite well, and I successfully compiled PHP 5.1.6 and many other packages last night.

Getting Started

These steps are taken from Gentoo’s distcc documentation.

First of all, make sure distcc is installed on both your VPS and your “compilation box”.

compilebox ~ # emerge distcc
...
vps ~ # emerge distcc

Start the distcc daemon on your compilation box. We also want it to accept connections only from localhost; more about that later.

compilebox ~ # echo '127.0.0.1' > /etc/distcc/hosts
compilebox ~ # /etc/init.d/distccd start
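
For reference, the daemon's own access list is normally set with distccd's --allow option, while the hosts file is what the distcc client reads. A minimal sketch, assuming Gentoo's /etc/conf.d/distccd and its DISTCCD_OPTS variable (check the file shipped with your distcc version, since its location and defaults may differ):

```
# /etc/conf.d/distccd on the compile box (assumed location and variable name)
# --listen binds the daemon to the loopback interface only;
# --allow restricts which client addresses may submit jobs.
DISTCCD_OPTS="--port 3632 --listen 127.0.0.1 --allow 127.0.0.1"
```

With options like these, the daemon is reachable only from the compile box itself, which is all the SSH tunnel set up later in this article needs.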

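Enable the “distcc” feature for Gentoo Portage on the VPS. The single quotes keep the shell from stripping the double quotes before the line is written.

vps ~ # echo 'FEATURES="distcc"' >> /etc/make.conf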
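
If make.conf already sets FEATURES, a second FEATURES line may replace the earlier value rather than extend it. A slightly more careful approach is to append to the existing value, and to do so only once. This is a minimal sketch: the function name enable_distcc_feature is made up for illustration, and it assumes make.conf expands ${FEATURES} from earlier lines in the same file.

```shell
# Hypothetical helper: add distcc to FEATURES in a make.conf-style file,
# unless a FEATURES line mentioning distcc is already present.
enable_distcc_feature() {
    local conf=${1:-/etc/make.conf}
    # Skip if some FEATURES assignment already mentions distcc.
    if grep -Eq '^[[:space:]]*FEATURES=.*distcc' "$conf" 2>/dev/null; then
        return 0
    fi
    # Single quotes keep ${FEATURES} literal so Portage expands it, not the shell.
    printf '%s\n' 'FEATURES="${FEATURES} distcc"' >> "$conf"
}
```

Calling enable_distcc_feature with no argument targets /etc/make.conf; running it a second time leaves the file unchanged.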

distcc is now set up on both the VPS and your compilation box, i.e. your desktop. Any emerge on the VPS will now call distcc instead of gcc. However, how does the VPS know where to send the compilation requests? How does it know the IP address of your desktop box? What about firewalls?

All of these can be easily solved with a bit of SSH tunnelling.

Setting up the Network

What we are trying to set up is for the distcc client on the VPS to connect to the distcc daemon on your compile box through an SSH tunnel. First of all, we’ll add localhost to the list of compile hosts on the VPS. Not that we want to run a distcc daemon on the VPS; instead we’ll redirect all connections to that port back to our compile box via the SSH tunnel.

vps ~ # distcc-config --set-hosts "127.0.0.1"

We now connect to the VPS from the compile box with extra parameters to enable tunnelling. We’ll forward all connections to localhost:3632 on the VPS to localhost:3632 on the compile box.

compilebox ~ $ ssh -R 3632:localhost:3632 vps
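
Before starting a long build it can help to confirm that the forwarded port is actually reachable from the VPS. This is a minimal sketch, assuming bash, since it relies on bash's /dev/tcp redirection, which other shells may lack. The function name port_open is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
port_open() {
    local host=$1 port=$2
    # Open the connection in a subshell so the descriptor is closed on exit.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Example use on the VPS, in a second shell while the tunnel is up:
#   port_open 127.0.0.1 3632 && echo "tunnel is up" || echo "tunnel is down"
```

This only shows that something is listening on the port. It does not prove the listener is distccd.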

That’s it! Emerge something big on the VPS now (remember not to use any memory-restriction parameters). Provided there is sufficient bandwidth between the VPS and the compile box, you can get glibc, PHP or MySQL compiled on your VPS without running into memory or performance issues.

vps ~ # emerge -avD dev-lang/php

You should see distcc being used in place of gcc, and netstat will show connections to localhost:3632. You can also use distccmon-text on the VPS to monitor compilation progress. I am compiling ncurses in this example:

vps ~ # DISTCC_DIR="/var/tmp/portage/.distcc/" distccmon-text 3
 11741  Preprocess  newdemo.c       127.0.0.1[0]
 11773  Compile     railroad.c      127.0.0.1[0]
 11789  Compile     rain.c          127.0.0.1[0]
 ...

You can also add the remote forwarding to your SSH configuration file so that port 3632 is always forwarded when you connect to your VPS. Add this to ~/.ssh/config on the compile box:

Host vps
    RemoteForward localhost:3632 localhost:3632

So now you can just do ssh root@vps emerge world :)

It is equally easy to set up if you are compiling stuff not in portage (or are unfortunate enough to use other Linux distributions).

vps ~ # tar zxf src.tgz
vps ~ # cd src
vps src # CC=distcc ./configure
...
vps src # make && make install
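
On distributions without Gentoo's distcc-config, the host list can be given through the DISTCC_HOSTS environment variable instead. This is a sketch, assuming the SSH tunnel above is up and that gcc and g++ are the compilers in use. Naming the compiler explicitly after distcc is a choice made here, not something the steps above require.

```shell
# Send jobs to the tunnelled port on localhost (same host list as on Gentoo).
export DISTCC_HOSTS="127.0.0.1"
# Explicit form: distcc followed by the real compiler name.
export CC="distcc gcc"
export CXX="distcc g++"
```

With these exported, ./configure and make in the source directory pick up distcc without further arguments.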

Other Issues

While distcc enables me to compile heavy packages on a low-memory UBC-based VPS, it might not really speed up the process if your desktop/compile box is slow. It is also highly dependent on bandwidth. It took around an hour to compile PHP last night, with distcc constantly pushing around 100kbps from the VPS to the compile box and around 15kbps the other way. It’s a good idea to turn on SSH compression to save traffic. As my VPS is thousands of miles away across the Pacific, latency might also slow down the compilation.
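
Compression can be switched on per connection with ssh -C, or permanently in the SSH configuration. A sketch extending the Host vps entry from earlier (the host alias is the one used in this article):

```
Host vps
    RemoteForward localhost:3632 localhost:3632
    Compression yes
```

How much this saves depends on the source being compiled, so it is worth comparing transfer rates with and without it.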

Another issue I found with distcc is cross-platform compilation. It is possible, but it takes a bit more work to set up my notebook to compile packages for my VPS on SliceHost, which runs 64-bit Linux.

Finally, you cannot rely on distcc to compile GCC itself. GCC bootstraps itself in stages. In the first stage, the host’s existing compiler (distcc) is used to compile a new GCC. In the following stages, the newly built GCC is used to compile itself again, locally. That means that while distcc helps in stage 1, it is still possible to run out of memory/privvmpages afterwards. So far I have not worked out a way to compile GCC 4.1 on my 256MB non-burstable VPSLink account, other than building the binary package elsewhere first.

Despite all the shortcomings, I still consider distcc a useful tool on a memory-limited VZ VPS, especially if you use Gentoo and compile a lot.
