Working with jumbo/unity builds in C/C++ (austinmorlan.com)
65 points by sph 3 months ago | 44 comments



Unreal Engine does unity builds quite well - it will, as part of a prebuild step:

1. Merge related cpp and h files together in groups into monolith files, usually in the order of 10-20 source files merged based on my observations - often entire modules will be grouped together.

2. Exclude any individual files from the monolith that have edits since the last change.

It's a nice middle ground: you don't substantially slow down your incremental builds, since the files you're editing are still compiled individually, plus you get a fairly substantial improvement in full build times.

It's nice in that you can still structure your source in separate cpp/h files with little extra consideration for the unity builds, and it mostly "just works" in both unity mode and with regular builds.

Unfortunately you occasionally do have issues with builds that work in unity but not in individual compilation or vice versa (usually due to arcane #include dependency chains) but they're usually easy to fix.


Unreal's unity builds get very annoying as soon as your team exceeds a handful of developers. You often have code that builds fine on one machine but doesn't on another developer's - it completely depends on what order the UBT picks. This is even more annoying when you have some people on Windows, some on Linux, some on Macs, and they consistently get different results. The fact that most tools (IntelliSense, all clang-based tooling, ...) are broken with UE makes automating the detection of missing includes even harder.

We worked around this by introducing a mandatory pre-merge CI stage that constantly does non-unity builds - but it's costly and not something a small company can often afford (a non-unity build of our UE project is ~20min on a Linux runner, and way more on a Windows one. That adds up fast).

Unreal itself hasn't been non-unity buildable for a very long time. In general, IMHO, unity builds are a testament to the failure of the C++ standards committee to realise that modules and the build model should be part of the standard too. The "one file is a translation unit" model hasn't been adequate for years IMHO - I honestly appreciate how Rust basically imposed cargo as a standard; it was a hard but sane choice.


I agree with your point on the C++ standards, but disagree about your complaints with Unreal's unity issues. There are issues with non-reproducibility in theory, but I've worked on massive projects and the impact is minimal.

> We worked around this by introducing a mandatory pre-merge CI stage that constantly does non-unity builds - but it's costly and not something a small company can often afford (a non-unity build of our UE project is ~20min on a Linux runner, and way more on a Windows one. That adds up fast).

Or you could only build the files that have changed. Even the largest of large files are one minute compiles.

> Unreal itself hasn't been non-unity buildable for a very long time.

Unreal builds in non-unity just fine. There was definitely a time period where it _didn't_, but for the last few years it's been much better than that.


> the impact is minimal.

yes, I must admit it's not the end of the world, but it tends to add an extra layer of gotchas on top of our already crufty legacy codebase (millions of lines of code, >100 devs on 3 different platforms). Which is something I'd rather not have to deal with, honestly.

> Or you could only build the files that have changed

In my experience the UBT is somewhat inconsistent with that. I've seen, multiple times, Mac builds being borked on our main branch because a Windows developer pushed code that built absolutely perfectly on their machine.

> Unreal builds in non-unity just fine. There was definitely a time period where it _didn't_, but for the last few years it's been much better than that.

Last time I tried, 5.1 refused to build non-unity on Linux. Did they fix it in the version after that?


May be worth rethinking your module arrangements - unity builds have been stable for a while.


Sadly our codebase is full of legacy nonsense - which makes modularizing it a chore to say the least.


Have you considered adding something like ccache to your compiler to speed up your CI?


Really? That hasn't been my experience at all. What engine version are you on?

We target windows, linux, xsx and ps5 and have probably 15-20 programmers making contributions daily and probably only hit maybe one or two of these issues per week, and they rarely get checked in as we get them during our mandatory preflight build during code review, similar to what you describe. We run all the preflight on on-prem machines now so the cost is minimized compared to our former cloud solution.

We did a lot of work to modularize our codebase so maybe that is helping?


> We did a lot of work to modularize our codebase so maybe that is helping?

Almost definitely the case. We have 100+ devs on a >1M-line legacy codebase with a lot of cross dependencies, and it's incredibly troublesome without a strict pre-build stage on CI (on-premises of course, it's too expensive to run all of that on AWS)


You mean it does unreal builds.


IMO separation of interface and implementation is one of the "good" things about C/C++ (I used to be confused about it when starting out): it gives you a good overview of a piece of code and how it is supposed to be used. In other languages with no such "feature", you'll have to scroll through hundreds of lines of implementation details you don't care about to understand the interface. You say this as a possible solution:

> You can still use header files if you want to, they’re just no longer strictly necessary. You’re free to put struct definitions and function prototypes into a header file if you’d like.

But is it really? Not enforcing header files across the codebase means that you'll definitely end up with some inconsistency sooner or later that will be hard to deal with.

> The order that you include the source files in all.c matters. In the above example, bar.c had to be included before foo.c because foo.c used a struct and function that was defined in bar.c.

This is just additional overhead that you don't need while implementing something new.

And in general, this goes against how normally C/C++ codebases are structured, I'm sure I'll be hella confused about a file called `all.c`.
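
For anyone who hasn't read it: the article's scheme boils down to something like the sketch below, where the include order carries the dependency (the file names come from the quoted example, the struct/function names are made up):

  /* all.c - the only file handed to the compiler */
  #include "bar.c"   /* defines struct Bar and bar_init(), say */
  #include "foo.c"   /* uses struct Bar, so it must come after bar.c */

  /* built with a single invocation, e.g.: cc all.c -o app */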


I worked at a company which had a Unity-built codebase. The company had actually branched the codebase for two separate products. One of the products kept the .h/.cpp build running in CI in addition to the Unity build; the other product only used the Unity build.

My experience was that there was decent benefit to keeping the .h/.cpp build running. Most 'normal' C++ tools and IDEs are not going to assume you are using a Unity build and tend to choke on it. Even though we never really shipped anything from the non-Unity build, having it around was useful for avoiding 'phantom' errors in the IDE and having static analysis tools work properly.


SQLite calls this an "amalgamation" build

Details here: https://sqlite.org/amalgamation.html

It's also touted as an easy way to embed sqlite.


Unity builds are strictly slower in the most common case - changes to just a few files in a project that has been built previously. This is why Google developed the Blaze build system.


Getting rid of header files for “jumbo builds” is, imho, a bad idea. “I don’t want to update two places” is, imho, insufficient reason. Thankfully this is optional.

Combining many cpp files into a single translation unit is a good idea. More projects should do this. In fact I'd go so far as to say that most non-trivial, popular C++ projects on GitHub could and should probably be boiled down to a single translation unit.

The amount of redundant compiling in C++ is insane. Any project that uses STL is compiling the same crap over and over and over and over and over. Then counting on the linker to deduplicate the billions of cycles of wasted work.
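
The closest thing the language offers today is explicit instantiation - compile a heavily used instantiation once and declare it extern everywhere else. It's manual and only a partial fix, but it cuts down the duplicate work. Rough sketch, with made-up names:

  // widget_list.h
  #include <vector>
  struct Widget { int id; float weight; };
  extern template class std::vector<Widget>;  // promise: instantiated in exactly one TU

  // widget_list.cpp
  #include "widget_list.h"
  template class std::vector<Widget>;         // the single real instantiation

  // Every other .cpp that includes widget_list.h no longer re-instantiates
  // std::vector<Widget>, so there's less redundant work for the linker to discard.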


I don't understand why there's still no solution for this in the STL. Surely compilers like GCC could support a setup where STL instantiations are cached somewhere? If it was just one extra compile flag telling the compiler to cache all template instantiations in some directory however it pleases, that would solve more than STL problems.


Modules, standardised in C++20, are the official solution both for the standard library and for other code but they are not universally well supported by all major compilers and build systems yet.

Transitioning existing code to use modules is also not entirely straightforward, though probably no more problematic than introducing unity builds.
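
For anyone who hasn't looked at them yet, the shape is roughly this (toy sketch; file extensions and build flags differ per compiler):

  // math.cppm - module interface unit
  export module math;
  export int add(int a, int b) { return a + b; }

  // main.cpp
  import math;
  int main() { return add(2, 3); }

The compiler builds the module interface once into a binary form that importers reuse, instead of every translation unit re-parsing a textual header.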


> though probably no more problematic than introducing unity builds.

A "Unity build" really just means typing #include "foo.cpp" a few times. It's trivial.

Meanwhile, neither Clang nor GCC support standard library modules. They have only partial support for modules themselves. C++ module support is non-existent in almost all build systems. https://en.cppreference.com/w/cpp/compiler_support

The idea of C++ modules is great. It's badly needed. In practice I'm not sure if they're ever going to be genuinely functional and widespread. Which makes me sad. Toy projects don't count.


It's been a lot of chicken-and-egg, but C++ modules finally have momentum. FWIW, Clang does support `import std` with `libc++` now. Beyond implementing CMake support for modules, I've worked on laying the groundwork for compilers to be able to provide the required information (P1689), and I've been involved in advocating for support in a number of build systems (e.g., Bazel, xmake, Meson, Tup). Some have been more receptive and made progress; others have had less.

Yes, it's very late, but progress is being made.


> It's unfortunate that "Unity Build" is the prevailing term because it's impossible to do a search without getting a lot of results about the Unity game engine.

I ignored this article on the front page because of that. Only because it stayed on the front page for several hours (and because I care about C++ build issues) did I eventually click on it.


> and because I care about C++ build issues

Look no further, these builds will give you more than enough issues on any sizable project.


Yes, glad I eventually clicked on it!


Unfortunately this isn't consistent with how these terms are typically used. This article only explains the simple case where your program is small enough that you can compile all source files at once. That's not realistic or practical for much larger projects.

Both WebKit and Chromium support unity/jumbo builds. They combine around 20 source files at a time into a single compilation unit, which provides a reasonable tradeoff - making the full build noticeably faster without overflowing RAM and without making the cost of recompiling after a single change too large. You also get lots of parallelism.

Making a unity / jumbo build work for a project with 10,000+ files and millions of lines of code is not simple at all.


Firefox uses “unified” builds, concatenating .cpp files only within each directory. Unified Firefox builds are about 2-5x faster on my Mac.

Non-unified builds are still built to make sure there are no unexpected side effects or accidental header file dependencies.

https://firefox-source-docs.mozilla.org/build/buildsystem/un...


Firefox's unified build was discussed here:

https://news.ycombinator.com/item?id=35825683 - Unity builds lurked into the Firefox Build System (2023)

The page that was referenced by that thread has moved, current location is

https://serge-sans-paille.github.io/pythran-stories/how-unit...


WebKit calls them that too.


> Making a unity / jumbo build work for a project with 10,000+ files and millions of lines of code is not simple at all

For years we rolled our own unity build, but now CMake supports it directly through CMAKE_UNITY_BUILD and CMAKE_UNITY_BUILD_BATCH_SIZE, making it straightforward to enable.

When first enabling it on a large project, you'll run into clashes where different files contain identically named file-scoped functions or variables. Sometimes this reveals copy-pasted code, where the fix is to refactor the duplicate code anyway. Other times you just pick more specific names to avoid the clash.
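
For example (hypothetical names):

  // render.cpp
  static void log_error(const char* msg) { /* ... */ }

  // audio.cpp
  static void log_error(const char* msg) { /* ... */ }  // fine as two separate TUs

  // generated unity source (name is illustrative)
  #include "render.cpp"
  #include "audio.cpp"   // error: redefinition of 'log_error'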

We find unity build gives a solid 3X build speedup. We haven't eliminated header files, and in fact keep one slow CI job building the code without unity to ensure our code still builds either way.


> Chromium support unity/jumbo builds

used to https://groups.google.com/a/chromium.org/g/chromium-dev/c/DP...


  Making a unity / jumbo build work for a project with 10,000+ files and millions of lines of code is not simple at all.
True, but the number of such large projects is tiny (< 100?) compared to the tens of thousands of other smaller projects that might benefit. Good writeup.


I bet there are many more projects than 100. What makes you think the number is that low?


I've worked at companies that had hundreds of code bases, each over a million lines of C++.


One issue the article didn't mention is running out of AST node identifiers in Clang.

Clang uses a 32-bit number to identify AST nodes. If a single translation unit is large enough, it can overflow this and you can get some very weird compilation errors.


Unless your amalgamated source file is getting into the 100s of megabytes to gigabytes range, I doubt that would be an issue.


I suspect extensive use of Boost-style template metaprogramming is a great help in making this goal realistic.


If template instantiations use up AST node identifiers (I don't know whether they do), I can see scenarios where crazily amplified template instantiation chains eat substantial chunks of that range.


They do.


The linked article describes manual approaches that create less reusable code.

Luckily there are unity build tools that do not require manual changes to code and work with normal .h/.cpp files automatically.


This is easily the best overview of unity builds in C/C++ that I've seen. I'll definitely save and reference it in the future.


I agree, I'm glad I read the article. Seeing the title "jumbo/unity builds" I thought this was not for me, but I'm doing some small C++ stuff, so I might try it out.


It's only faster if you are building from scratch, and even then it might not be faster. The resources required to build one huge compilation unit are also higher than compiling separately. If you support this type of build, it should not be the only option.


It is always faster in all cases, in my experience. It shouldn't be, but it is. Try it.


This is silly. If I'm only rebuilding 1 file out of a 200-file source base, I guarantee the non-unity build will be faster. Just the number of characters I have to tokenize in this thought experiment should be enough to convince you. If this isn't the case, then you're doing some sort of n^2, every-header-includes-most-other-headers C++/Boost shenanigans, and you need to stop that rather than doing a unity build.


I've worked with unity builds a lot in my career. Rebuilding a unity module can take 90 seconds to 2 minutes (plus linking), whereas a single file might only take 3-5 seconds on its own.

The best possibility is if your build system can detect which files have changed and exclude them from unity builds, as this gives you the "slow path" the first time you change a file, but from then on you get the fast path behaviour.


> You don’t have to literally have only one translation unit.

Having 2 units per cpu core in your machine is a much better idea.



