Nocc – A Distributed C++ Compiler (github.com/vkcom)
62 points by gjvc 9 days ago | 33 comments





Some other alternatives; I also hadn't heard of this one:

  - SN-DBS - https://www.snsystems.com/ - used by a lot of game developers, mostly to distribute compilation (but also shader compilation or custom jobs).

  - IncrediBuild - https://www.incredibuild.com/

  - FASTBuild - https://www.fastbuild.org/

  - icecream - https://github.com/icecc/icecream

  - Goma - https://chromium.googlesource.com/infra/goma/client/

  - Bazel / Buck / the like, with various RBE back ends - https://bazel.build/remote/rbe

  - distcc - https://www.distcc.org/

  - ElectricAccelerator - https://docs.cloudbees.com/docs/cloudbees-build-acceleration/11.0/

  - and many others...
I've mostly had experience with IncrediBuild in the past and SN-DBS currently, but colleagues are looking into FASTBuild. Though my personal favorite is Bazel.

We discovered that while IncrediBuild did trap the system calls needed to stream and cache files from the machine doing the build, at the time they did not trap Beep()!

Ha! We used it a really long time ago, and there was something a bit non-standard about their C++ preprocessor - back then it wasn't working with one of our macros. And from the way they displayed messages, and from later searching for those messages in their binary, I think I found Pascal-like strings (e.g. a byte or word with the length, then the characters), so I thought it might have been written in Borland/Delphi Pascal?

From the README, this tool was built because distcc was too slow.

distcc was really slow IME as well. It did not know how to optimally distribute the work across different build nodes. AFAIR distcc simply assumed that all build nodes are created equal, and it also disregarded the load on the nodes, resulting in over-saturation with build jobs.

However, this is something that icecream aimed to solve, and in my case it really did. I can't remember the actual numbers but it provided a major gain and it was super easy to set up.

So I guess it would be more interesting to see how nocc compares to icecream instead of distcc.


Yes. This makes sense.

distcc vs ccache?

Just listing these without any form of comparison is useless.

Giving a comparison - i.e. an actual evaluation - is work that takes paid hours. Also, hardly anyone is ever going to do such an evaluation, and if one is done, it'll be one where you had to scratch a solution because of some limiting factor required by your company (but that factor may not exist elsewhere - for example "support for Windows", or "works with precompiled libraries, does not expect everything to come from source code", etc., etc.)

word salad. write less but say more.

a useless comment lacking any kind of comparison

why don't you say what they meant to say, but better and in fewer words?


Because listing alternatives as a response is like saying "it also rains" when someone remarks on how sunny it is. Yes, it's true, but it's useless without discussing why each one is or isn't preferable, and in what circumstances.

The funny thing about the name of the tool is that it's essentially named after a PHP linter I wrote while working at VK, called NoVerify. I did an internal poll for the best name, and noverify won, because the way to skip the checks done by that linter was "git push --no-verify"...

Thank you for noverify! We used it at $DAYJOB in CI.

No problem! If you don't mind sharing, what are you using it for? I built NoVerify primarily because the VK PHP code base was quite old-school - it relied a lot on global functions, global constants, global variables, etc. - so I needed a way to quickly build an index of everything that's defined, and that required parallelism with shared memory, which PHP doesn't support. It turned out to be quite fast in general as well, but it lacks more advanced features that other PHP linters have (mostly because I spent a few months on it vs. a few years for the others :))

It looks like there are some infelicities documented that really should be fixed:

* fix that silly "got another sha256" bug, it really shouldn't be hard

* optimal job count should be infinite from the perspective of the client's `make`, but the daemon should only send jobs to the servers based on how many jobs they are willing to perform in parallel. Possibly split out the "upload blobs" part from the "actually run the build" part? (see the sketch after this list)

* You should detect compiler version differences, which is essential for reliability. This is nontrivial (wrapper scripts, internal headers) but good solutions exist if you're willing to hard-code what compilers are supported.
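
A rough sketch of that scheduling idea (nothing here is nocc's actual code; Server, compile_on and the rest are made-up names): the daemon accepts every job `make` throws at it, but a per-server semaphore caps how many are in flight on any one box.

```python
import asyncio

# Made-up types, purely to illustrate the idea: the client-side daemon accepts
# an unbounded stream of jobs from `make`, but each server only ever sees as
# many concurrent jobs as it said it is willing to run.
class Server:
    def __init__(self, name, max_parallel_jobs):
        self.name = name
        self.slots = asyncio.Semaphore(max_parallel_jobs)

async def compile_on(server, job):
    # Placeholder for the real work: upload missing blobs, run the remote
    # compile, fetch the object file back.
    ...

async def dispatch(job, servers):
    # Naive policy: take the first server with a free slot; a real scheduler
    # would also weigh current load and which server already has the blobs.
    while True:
        for server in servers:
            if not server.slots.locked():
                async with server.slots:
                    # "upload blobs" and "run the build" could be split here,
                    # so that uploads do not occupy a compile slot.
                    return await compile_on(server, job)
        await asyncio.sleep(0.01)  # every slot busy: this make job simply waits
```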


(2022)

They use KPHP, which generates a lot of huge files, similar to Meta's PHP compiler or my B::C Perl compiler in use at cPanel. In my case the outstanding work was to avoid compilation altogether: split it up into many modules/libs and only compile those which changed. The main advantage is instant startup and less memory. With many shared libs the startup advantage is smaller, but you can still use static libs and a modern fast linker. And avoid -g and use -Os.


Neat! I remember using [something similar][1] at a previous job.

[1]: https://github.com/icecc/icecream


That appears to only handle the distcc half. It looks like the main advantage of nocc is that it integrates the equivalent of ccache, in a way that works even when there are both multiple clients and multiple servers, without the ludicrous complexity of bazel-like tools.

Not true. icecream handles that case very well. Any machine in the network can trigger the build, and all the machines in the network can register themselves as build nodes. It also integrates with ccache.

`ccache icecream` (like `ccache distcc`) only handles the "single client, multiple servers" case.

`icecream ccache` (like `distcc ccache`) only handles the "multiple client, single server" case.

`ccache icecream ccache` (like `ccache distcc ccache`) just means you're wasting even more disk space.

Now, if you configure `ccache` to use remote storage, and you can make it actually work, then that combination will compete with `nocc` (though admittedly the two follow different models). But that's a lot more moving parts.


I am not sure I follow your argument. You said that icecream cannot work in the multi-client, multi-server case. I said that it certainly can, and it can. Perhaps you're arguing that ccache is not usable in that multi-client, multi-server scenario? That I don't know, but it definitely works when ccache is set up on the client machine and that client machine is using any number of server machines for the build jobs.

`ccache icecream` with default configuration means that the build servers have to duplicate work for the common case where multiple clients build the same artifact.

Now, maybe ccache's remote caching is reliable nowadays, despite its shaky reputation (given that ccache wasn't originally designed for it). But it's still more moving parts, and then the magic is in ccache, not icecream.


Did the authors also write their own operating system to avoid the “ludicrous complexity” of the Linux kernel? This project just looks like NIH syndrome and an unwillingness to read documentation.

Distributed C++ build system? It's not a compiler, despite what the README says - it uses g++...

Why not just use Bazel?

What do I do if I want to build something with Bazel that uses libraries on the host machine? Say, I want to make an app that builds against the system-provided Qt6 - does Bazel finally allow this?

Haven't tried it, but you can always `cc_import` a locally installed library and expose it - though not sure how this works across the wire...
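
For system Qt6 specifically, something along these lines might work (an untested sketch - the paths, target names and linked modules are guesses, and it only behaves if every machine involved has the same Qt installed):

```python
# WORKSPACE (sketch) - wrap the system-installed Qt6 headers in a repository.
new_local_repository(
    name = "system_qt6",
    path = "/usr/include/x86_64-linux-gnu/qt6",  # wherever your distro puts the Qt6 headers
    build_file_content = """
cc_library(
    name = "qt6",
    hdrs = glob(["QtCore/**", "QtGui/**", "QtWidgets/**"]),
    includes = ["."],
    linkopts = ["-lQt6Core", "-lQt6Gui", "-lQt6Widgets"],  # link against the system .so files
    visibility = ["//visibility:public"],
)
""",
)
```

Then a `cc_binary` can depend on `@system_qt6//:qt6`. With remote execution the absolute paths obviously don't exist on the workers, which is exactly the "across the wire" problem.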

Then again, this might all be asking for trouble - what if your colleague has a different version installed? Best is to get these artifacts (precompiled and/or in source form) some other way.


> though not sure how this works across the wire

You send all header files to 1,000 boxes. Same as nocc.

Oh, and if you really want, you can make Bazel ship your /usr/bin/gcc and co, too. It is just slow so nobody would like that. Or maybe in 2025 it does not feel slow any more (though still wasteful)?


I think at some point - but I could be misremembering - IncrediBuild did local preprocessing, and then all you need on the other end is the compiler. It might still be an option if preprocessing (for your own codebase) is insignificant vs. compilation, but it's hard to say.

Though it does greatly simplify things, with a little massaging (e.g. all those `#line`/`#pragma` directives - make them deterministic using some path mapping, etc.)
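
To illustrate the path-mapping part (a toy sketch, not what IncrediBuild or nocc actually do): preprocessed output embeds absolute source paths in linemarkers, so remapping them to a fixed prefix makes the output - and any cache key derived from it - machine-independent.

```python
import re

# GCC/Clang preprocessed output carries linemarkers such as:
#   # 42 "/home/alice/project/src/foo.h" 1
# Rewriting the machine-specific prefix gives every machine identical output.
LINEMARKER = re.compile(r'^(#\s*(?:line\s+)?\d+\s+")([^"]*)(".*)$')

def remap_linemarkers(preprocessed, local_prefix, canonical_prefix="/build"):
    def fix(match):
        head, path, tail = match.groups()
        if path.startswith(local_prefix):
            path = canonical_prefix + path[len(local_prefix):]
        return head + path + tail
    return "\n".join(LINEMARKER.sub(fix, line) for line in preprocessed.splitlines())
```

(Modern GCC/Clang can do something similar themselves with -ffile-prefix-map.)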


> what if your colleague has a different version installed?

"colleagues" does not make much sense for an open-source project built by people all around the world with various environments and where the tool should accomodate all of them


Right, I'm giving my thoughts based on where I work, and actually we do enforce the same compiler for everyone - even where it's installed - but still...

Would you say that this tool is needed because C++ is slow to compile, or would it still be useful even with a module system (or whatever would make it a fast-to-compile language)?


