[distcc] Using ccache on distcc server
Building a large C-family project (C, C++, Objective-C, etc.) can be very slow because of long compile times. There are two simple ways to speed up compilation:
- ccache: Caches the compiled result (the object file) locally and reuses it when the source code has not changed. This gives a huge improvement for incremental builds.
- distcc: Distributes compilation tasks over the network among a pool of machines (distcc servers). Build performance scales roughly linearly with the number of machines you have, though it is still bounded by network and I/O.
People usually combine the two with CCACHE_PREFIX, as this article describes: ccache serves as the local cache, and the compilation is sent to a distcc server only if a rebuild is necessary.
There is a further improvement we can make: let the distcc servers cache object files with ccache as well! The flow becomes:
(local)             (remote)
ccache > distcc ==> distcc > ccache > hit?  > return
                                    > miss? > gcc -> cache & return
The concept is straightforward, but it took me a while to get a working setup. Before getting to that, let me first explain how ccache works. A typical ccache installation puts its masquerade directory at the front of PATH: PATH=/usr/local/ccache:$PATH.
This is path masquerading: when you run g++ main.cc -c, the machine actually runs:
/usr/local/ccache/g++ main.cc -c
/usr/local/ccache/g++ is a symlink to /usr/bin/ccache. ccache then searches the rest of $PATH for the real compiler, e.g. /usr/bin/g++. The command is eventually translated to:
/usr/bin/g++ main.cc -c
Before executing that command, ccache compares the preprocessed output (after #include injection and macro expansion) with the results of previous compilations in the local cache and reuses the cached object file on a match. Otherwise, it invokes the compiler to do the job.
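The masquerade lookup can be imitated with a small stand-in wrapper. This is only a sketch of the mechanism, not ccache itself: the wrapper script, the temporary directories, and the fake g++ are all made up for illustration, and no caching is performed.

```shell
#!/bin/sh
# Sketch of PATH masquerading using stand-ins; no real ccache or g++ is involved.
tmp=$(mktemp -d)
mkdir "$tmp/masq" "$tmp/real"

# Hypothetical "real compiler": just reports how it was called.
cat > "$tmp/real/g++" <<'EOF'
#!/bin/sh
echo "real compiler called with: $*"
EOF

# Hypothetical wrapper playing the role of ccache: it is invoked under the
# compiler's name, skips its own directory, and runs the next match in PATH.
cat > "$tmp/wrapper" <<'EOF'
#!/bin/sh
name=$(basename "$0")
self_dir=$(dirname "$0")
old_ifs=$IFS
IFS=:
for dir in $PATH; do
    IFS=$old_ifs
    [ "$dir" = "$self_dir" ] && continue
    if [ -x "$dir/$name" ]; then
        # A real cache lookup would happen here, before running the compiler.
        exec "$dir/$name" "$@"
    fi
done
echo "wrapper: no real $name found in PATH" >&2
exit 127
EOF

chmod +x "$tmp/real/g++" "$tmp/wrapper"
ln -s "$tmp/wrapper" "$tmp/masq/g++"

# Put the masquerade directory first, as in PATH=/usr/local/ccache:$PATH.
resolved=$(PATH="$tmp/masq:$tmp/real:$PATH"; command -v g++)
output=$(PATH="$tmp/masq:$tmp/real:$PATH"; g++ main.cc -c)
echo "g++ resolves to: $resolved"
echo "$output"
```

The first line printed shows g++ resolving to the symlink in the masquerade directory; the second comes from the stand-in compiler, which the wrapper found further down PATH.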
So when we set CCACHE_PREFIX=distcc, ccache follows the same steps: it expands the command, looks up the local cache, and on a cache miss invokes the compiler with the prefix distcc:
distcc /usr/bin/g++ main.cc -c
The distcc command then distributes the compilation task to remote servers. On the server side, distcc receives this command and executes it:
/usr/bin/g++ main.cc -c
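On the client, this comes down to two environment variables. A minimal sketch, assuming g++ is already masqueraded by ccache as described earlier; the host names are placeholders and not part of the original setup.

```shell
# Ask ccache to run the real compiler through distcc on a cache miss.
export CCACHE_PREFIX=distcc
# Hypothetical build machines; replace with your own distcc servers.
export DISTCC_HOSTS="buildhost1 buildhost2"
# With both set, a normal build goes ccache -> distcc -> remote compiler:
#   g++ main.cc -c
echo "cache misses will run: $CCACHE_PREFIX <real compiler> ... on: $DISTCC_HOSTS"
```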
However, this command does not use ccache on the server side, because it invokes the compiler directly rather than through ccache. We don’t want the distcc server to receive /usr/bin/g++ main.cc -c; we want it to receive:
/usr/local/ccache/g++ main.cc -c
This means we need to force distcc to use ccache even when ccache is not in the command prefix. Is that achievable? Yes, with the environment variable DISTCC_CMDLIST. From its man page:
DISTCC_CMDLIST
If the environment variable DISTCC_CMDLIST is set, load a list of supported commands from the file named by DISTCC_CMDLIST, and refuse to serve any command whose last DISTCC_CMDLIST_MATCHWORDS last words do not match those of a command in that list. See the comments in src/serve.c.
This allows the distcc server to map one compiler path to another:
/usr/bin/g++ -> /usr/local/ccache/g++
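To make the mapping concrete, here is a rough shell imitation of the lookup, going by the man page text above. It assumes that "words" means slash-separated path components and that only the last one is compared (a match count of 1). remap_compiler is a made-up helper, not part of distcc; the real logic lives in src/serve.c.

```shell
#!/bin/sh
# remap_compiler REQUESTED CMDLIST_FILE
# Hypothetical helper: print the list entry whose last path component matches
# that of REQUESTED, or fail if no entry matches (the request would be refused).
remap_compiler() {
    want=$(basename "$1")
    while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        if [ "$(basename "$entry")" = "$want" ]; then
            printf '%s\n' "$entry"
            return 0
        fi
    done < "$2"
    return 1
}

# Temporary stand-in for the DISTCC_CMDLIST file.
cmdlist=$(mktemp)
printf '%s\n' /usr/local/ccache/g++ > "$cmdlist"

mapped=$(remap_compiler /usr/bin/g++ "$cmdlist")
echo "/usr/bin/g++ -> $mapped"

if remap_compiler /usr/bin/clang++ "$cmdlist" >/dev/null; then
    refused=no
else
    refused=yes
fi
echo "clang++ refused: $refused"
rm -f "$cmdlist"
```

Under these assumptions, a request for /usr/bin/g++ is served by /usr/local/ccache/g++, while a compiler that is not in the list is refused.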
Bingo! That’s what we want. Here are the instructions:
1. create a file /home/.distcc/DISTCC_CMDLIST containing:
/usr/local/ccache/g++
2. in /etc/default/distcc, add these lines:
export DISTCC_CMDLIST=/home/.distcc/DISTCC_CMDLIST
export CCACHE_DIR=/home/.ccache
export PATH=/usr/local/ccache:$PATH
Line 1 tells the distcc server to use the DISTCC_CMDLIST file for the mapping.
Line 2 is necessary so that child processes spawned by distcc know where the ccache directory is located.
Line 3 tells the distcc server to use the ccache compiler masquerade.
3. change the ccache directory permissions so that the distcc server process can write to it:
sudo chmod 777 /home/.ccache
4. restart the distcc server:
sudo /etc/init.d/distcc restart
Good to go!
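One way to check that the server-side cache is actually being used is to look at ccache's statistics on a distcc server after a build. A sketch, assuming the CCACHE_DIR from step 2 and that ccache is on the PATH of the user running the check:

```shell
# Cache directory from step 2; export CCACHE_DIR beforehand to check another one.
CCACHE_DIR=${CCACHE_DIR:-/home/.ccache}
export CCACHE_DIR
if command -v ccache >/dev/null 2>&1 && [ -d "$CCACHE_DIR" ]; then
    # -s prints hit/miss statistics; hits should grow as clients rebuild.
    ccache -s || echo "could not read statistics from $CCACHE_DIR"
    checked=yes
else
    echo "ccache or $CCACHE_DIR not found; nothing to check"
    checked=no
fi
```

If the hit counters stay at zero after a client rebuilds the same sources, the server is probably still invoking the compiler directly.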
Now we get the benefits of caching not just on the local machine but also on the distcc servers. Caching build results is crucial for heavy C-family projects because compiling is expensive: even a simple hello world program can take 20 ms to compile, while a ccache hit needs only 1 ms :)
#include <iostream>
using namespace std;
int main() {
    cout << "hello world" << endl;
}