[distcc] Using ccache on distcc server

Wilson Hong
Feb 10, 2018


Building a large C-family project (C, C++, Objective-C, etc.) can be very slow because of long compile times. There are two simple ways to speed up compilation:

  1. ccache: Caches the compiled result (the object file) locally and reuses it when the source code has not changed. This gives a huge improvement for incremental builds.
  2. distcc: Distributes compilation tasks over the network to a pool of machines (distcc servers). Build performance scales roughly linearly with the number of machines you have, though it is still bounded by network and I/O.

People usually combine the two with CCACHE_PREFIX, as this article describes: ccache serves as the local cache, and the compilation is sent to a distcc server only when a rebuild is necessary.
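A minimal client-side sketch of that combination is shown below. The host names are hypothetical placeholders, and the exact host list syntax you need may differ from this simple form.

```shell
# Ask ccache to put "distcc" in front of the real compiler command
# whenever the local cache misses.
export CCACHE_PREFIX=distcc

# Hypothetical build hosts; replace them with your own distcc servers.
export DISTCC_HOSTS="buildhost1 buildhost2"

# A compile that goes through ccache, for example
#   ccache g++ main.cc -c
# is then served from the local cache on a hit and handed to distcc on a miss.
```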

There is a further improvement we can make: let the distcc servers cache object files with ccache too. The flow becomes:

(local)             (remote)
ccache > distcc ==> distcc > ccache > hit?  > return
                                    > miss? > gcc -> cache & return

The concept is straightforward, but it took me a while to find a working solution. Before getting to that, let me first introduce how ccache works. Once ccache is installed, its directory is prepended to the search path: PATH=/usr/local/ccache:$PATH. This is path masquerading: when you run g++ main.cc -c, the machine actually runs:

/usr/local/ccache/g++ main.cc -c

/usr/local/ccache/g++ is a symlink to /usr/bin/ccache. ccache then searches the rest of $PATH for the real compiler, e.g. /usr/bin/g++, so the command is eventually translated to:

/usr/bin/g++ main.cc -c

Before executing that command, ccache compares the preprocessed output (with #include files inlined and macros expanded) against the results of earlier compilations in the local cache, and reuses the cached object file if there is a match. Otherwise, it invokes the compiler to do the job.
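The search for the real compiler can be sketched in a few lines of shell. This only illustrates the idea and is not ccache's actual implementation; find_real_compiler is a hypothetical helper that walks $PATH and skips the masquerade directory.

```shell
# Hypothetical helper: print the first executable named $1 found on $PATH
# that does not live in the masquerade directory $2.
find_real_compiler() {
    name=$1
    skip_dir=$2

    # Split $PATH on ':' into the positional parameters.
    old_ifs=$IFS
    IFS=:
    set -- $PATH
    IFS=$old_ifs

    for dir in "$@"; do
        # Skip the directory that holds the ccache symlinks.
        [ "$dir" = "$skip_dir" ] && continue
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1   # no real compiler found
}
```

With PATH=/usr/local/ccache:/usr/bin, calling find_real_compiler g++ /usr/local/ccache would print /usr/bin/g++, provided that file exists and is executable.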

So when we set CCACHE_PREFIX=distcc, ccache goes through the same steps: it expands the command, looks it up in the local cache and, on a cache miss, invokes the compiler with the prefix distcc:

distcc /usr/bin/g++ main.cc -c

The distcc command then distributes the compilation task to a remote server. The distcc server receives this command and executes it:

/usr/bin/g++ main.cc -c

However, the above command does not use ccache on the server side, because it invokes the compiler directly rather than through ccache. We don't want the distcc server to run /usr/bin/g++ main.cc -c; we want it to run:

/usr/local/ccache/g++ main.cc -c

This means we need to force the distcc server to use ccache even when ccache is not part of the command it receives. Is that achievable? Yes, with the DISTCC_CMDLIST environment variable. From its man page:

DISTCC_CMDLIST

If the environment variable DISTCC_CMDLIST is set, load a list of supported commands from the file named by DISTCC_CMDLIST, and refuse to serve any command whose last DISTCC_CMDLIST_MATCHWORDS last words do not match those of a command in that list. See the comments in src/serve.c.

This allows the distcc server to map one compiler path to another:

/usr/bin/g++  -> /usr/local/ccache/g++
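To make the matching rule from the man page concrete, here is a rough shell sketch. It is not distccd's code: last_words and remap_command are hypothetical helpers, and the sketch assumes that "words" means '/'-separated path components and that one trailing word is compared unless a different count is passed in.

```shell
# Hypothetical helper: print the last $2 '/'-separated words of path $1.
last_words() {
    printf '%s\n' "$1" | awk -F/ -v n="$2" '{
        start = NF - n + 1
        if (start < 1) start = 1
        out = $start
        for (i = start + 1; i <= NF; i++) out = out "/" $i
        print out
    }'
}

# Hypothetical helper: print the first entry of list file $2 whose last
# $3 words (default 1) equal those of command $1; fail if none matches.
remap_command() {
    cmd=$1
    list=$2
    n=${3:-1}
    want=$(last_words "$cmd" "$n")
    while IFS= read -r entry || [ -n "$entry" ]; do
        [ -n "$entry" ] || continue          # ignore blank lines
        if [ "$(last_words "$entry" "$n")" = "$want" ]; then
            printf '%s\n' "$entry"
            return 0
        fi
    done < "$list"
    return 1                                 # command would be refused
}
```

Under these assumptions, a list file containing /usr/local/ccache/g++ turns a request for /usr/bin/g++ into /usr/local/ccache/g++, because both end in g++, while a request for a compiler that is not in the list is refused.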

Bingo! That’s what we want. Here are the instructions:

1. Create a file /home/.distcc/DISTCC_CMDLIST containing:

/usr/local/ccache/g++

2. In /etc/default/distcc, add these lines:

export DISTCC_CMDLIST=/home/.distcc/DISTCC_CMDLIST
export CCACHE_DIR=/home/.ccache
export PATH=/usr/local/ccache:$PATH

Line 1 tells the distcc server to use the DISTCC_CMDLIST file for the mapping.
Line 2 is necessary so that child processes spawned by distcc know where the ccache directory is located.
Line 3 tells the distcc server to use the ccache compiler masquerade.

3. Change the permissions of the ccache directory

sudo chmod 777 /home/.ccache

4. Restart the distcc server

sudo /etc/init.d/distcc restart

Good to go!
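If the server does not seem to use the cache, a quick sanity check of the files from the steps above can help. The function below is a hypothetical helper, not part of distcc or ccache. It only checks that the command list is readable, that every entry in it is executable, and that the cache directory is writable, and it is most meaningful when run as the user the distcc server runs as.

```shell
# Hypothetical helper: check the command list file $1 and cache directory $2.
# Returns 0 if everything looks usable, 1 otherwise.
check_server_setup() {
    cmdlist=$1
    cache_dir=$2
    rc=0

    if [ ! -r "$cmdlist" ]; then
        echo "cannot read command list: $cmdlist" >&2
        return 1
    fi

    # Every non-empty line should name an executable (the ccache symlink).
    while IFS= read -r entry || [ -n "$entry" ]; do
        [ -n "$entry" ] || continue
        if [ ! -x "$entry" ]; then
            echo "not executable: $entry" >&2
            rc=1
        fi
    done < "$cmdlist"

    if [ ! -d "$cache_dir" ] || [ ! -w "$cache_dir" ]; then
        echo "cache directory not writable: $cache_dir" >&2
        rc=1
    fi
    return "$rc"
}
```

With the paths used in this post, the call would be check_server_setup /home/.distcc/DISTCC_CMDLIST /home/.ccache.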

Now we get the benefit of caching not just on the local machine but also on the distcc servers. Caching build results is crucial for heavy C-family projects because compiling is so expensive: even a simple hello-world program can take 20 ms to compile, while with ccache it needs only 1 ms :)

#include <iostream>
using namespace std;
int main() {
    cout << "hello world" << endl;
}
