
An introduction to deterministic builds with C/C++ - ingve
https://blog.conan.io/2019/09/02/Deterministic-builds-with-C-C++.html
======
W0lf
So, it's actually the first time in the history of Conan where I personally think
it became a useful tool and am using it professionally in several projects for
my clients. It still has a lot of quirks though and one can easily get into
trouble when setting up dependencies across different platforms.

I've looked more closely into Conan several times in the past and also added
bug reports here and there, but it finally became somewhat usable (for me and
my projects at least). Also, I'm quite picky when it comes to CMake
integration, because I don't want my package manager to interfere with my
build system configuration. Conan finally reached a state where one can simply
check in a conanfile.txt in the repository and use the `cmake_paths` generator
to fetch all dependencies before calling CMake to generate a project for you.

I also decided to set up my own Artifactory server, as the _official_ Bintray
comes with a maximum size limit of 500 MB per artifact, which is way too low
considering that Qt5 with Core, Gui and Widgets statically built on Windows
(release build) easily outgrows this limit. However, once you have your
infrastructure set up, it's easy to add build flavors such as sanitizers etc.
for a given set of third-party libraries.

~~~
tbenst
Curious why you use Conan vs Nix? Reproducible builds and dependency
management is the Nix raison d'être.

~~~
lenkite
Nix only runs on Unix operating systems. Conan also runs on Windows.

------
steeve
Bazel [1] gets a lot of this right. Everything the author recommends is done
automatically.

1\. [https://bazel.build/](https://bazel.build/)

~~~
nwlieb
In my experience this is only true for C/C++ (with a decent amount of work to
set up CROSSTOOL properly). As soon as you start to get into Python, and
Python<->C++ interop, it becomes very leaky.

I heard that Google has some tools internally that build the Python
interpreter with Bazel and use that in order to guarantee hermeticity, but
that doesn't seem to be possible with public tooling (at least not without
some major hacks, for example
[https://github.com/bazelbuild/bazel/issues/4286](https://github.com/bazelbuild/bazel/issues/4286)
)

It would be interesting to see how Google manages languages such as Python at
scale (and other languages that have similarly leaky package management).

------
dwheeler
Reproducible builds are extremely important. Most people use pre-compiled code
so it's important to have a way to verify that the pre-compiled code was
generated from the expected source code.

Lots more info is here:
[https://reproducible-builds.org/](https://reproducible-builds.org/)

------
gumby
We had a Cygnus customer back in the 1990s, DSC, whose customer SLA was
measured in minutes/decade (I believe it was less than three minutes of
downtime per decade). They paid extra for long term support on a specific GCC
release. When they submitted a bug report and got a fix they would examine the
binaries to make sure that every change in the binaries could be traced back
to that one patch and nothing else. No upgrades, general bug fixes, or
anything like that.

Ironically, a few years later one of their customers had a multi-hour outage,
and that was the end of DSC (quite a large company).

~~~
ferzul
how is that ironic? what actually caused the downtime?

~~~
gumby
Software bug in their code. The elaborate steps they took to avoid
unanticipated glitches were perhaps not worth the effort.

------
NickGerleman
Another benefit of deterministic builds is that build output caching becomes
more reliable. Microsoft has a system used for this that is heavily used
internally that was recently open sourced:
[https://github.com/microsoft/BuildXL/blob/master/README.md](https://github.com/microsoft/BuildXL/blob/master/README.md)
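
To illustrate why determinism helps caching, here is a minimal sketch of a content-derived cache key. It is not how BuildXL (or any particular tool) computes its keys; the function name `cache_key` and its parameters are hypothetical. The idea is only that if the same inputs always produce the same output, a hash of the inputs can safely stand in for the output.

```python
import hashlib
from typing import Dict, List


def cache_key(compiler_id: str, flags: List[str], sources: Dict[str, bytes]) -> str:
    """Hash everything that is supposed to influence the build output.

    Hypothetical helper for illustration; real build systems track many
    more inputs (environment, toolchain binaries, headers, ...).
    """
    h = hashlib.sha256()
    h.update(compiler_id.encode())
    h.update(b"\0")
    # Flag order can change compiler behaviour, so it is kept as given.
    for flag in flags:
        h.update(flag.encode())
        h.update(b"\0")
    # Sort paths so the key does not depend on dict insertion order.
    for path in sorted(sources):
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha256(sources[path]).digest())
    return h.hexdigest()
```

With a non-deterministic build, two machines can compute the same key and still hold different artifacts, which is what makes a shared cache hard to trust.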

~~~
seanmcdirmid
Hasn’t Google been doing C/C++ build caching for more than a decade now?

~~~
malkia
RBE has been mentioned in the other reply (for bazel, and other compatible
systems). For GN (Chromium, Fuchsia, others) - there is
[https://chromium.googlesource.com/infra/goma/server/](https://chromium.googlesource.com/infra/goma/server/)
and client.

------
Ididntdothis
I work in medical devices and we could use reproducible builds a lot. A lot of
tools will spit out different binaries at each run and it’s really hard to
justify why you think the source files you claim to have built from actually
were used to create the build.

~~~
yitchelle
>> why you think the source files you claim to have built from actually were
used to create the build

Wouldn't a good and robust configuration management plan overcome this
problem?

~~~
human20190310
How do you know your configuration management plan is doing what you think
it's doing if you're getting different output from the same source? It's hard
to tell if the sources of variation in the output are harmless or meaningful.

~~~
yitchelle
Well, if the target is to have the same binary from the same set of sources
every time you do a build, then the configuration management plan is not
working. A good configuration management plan is to ensure that things that
can change are managed. It is not working, it needs to be revised.

In some of the plans I have seen, some deviations are tolerated but those
deviations are spelled out in excruciating detail.

How have you dealt with this issue?

~~~
human20190310
I've dealt with it by wishing I had reproducible builds.

If I'm debugging a production issue and want to create a debug build from the
same source to step through things, it would be nice if I could just build
both debug and release, check the hash on the release to confirm it matches
production, then start debugging, without reference to any external records or
other systems.

Reproducible builds reduce the number of links in the chain needed to verify
what you're really doing.
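
The check itself could be as small as the sketch below, assuming the build really is reproducible and the digest of the production binary is known. The function names and the idea of passing the production digest as a hex string are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_production(rebuilt: Path, production_digest: str) -> bool:
    """True if the locally rebuilt release binary is bit-identical to production.

    `production_digest` is assumed to be a SHA-256 hex string recorded
    at release time or computed from the deployed binary.
    """
    return sha256_of(rebuilt) == production_digest.strip().lower()
```

If this returns True, the debug build made from the same checkout corresponds to the code that is actually running.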

~~~
bmm6o
You could just have debug binaries be included in the output of the build.

~~~
Ididntdothis
Most of the time when somebody says “you could just do x” I stop listening
unless they have intimate knowledge of the situation.

~~~
bmm6o
I don't get it. Are you saying it's too hard or it doesn't solve the problem?
It's worked in my experience.

I mean, my response isn't entirely a rhetorical question. If there's a reason
it wouldn't work I'd be curious what makes his situation different from mine.

~~~
Ididntdothis
“I'd be curious what makes his situation different from mine.”

This should have been the first question to ask yourself before trying to give
advice.

~~~
bmm6o
So I should have said, "Is there a reason you can't just build debug binaries
at the same time as you build release"? I mean, I guess, but it's a little
disappointing that this is the nugget that you've been dancing around.

~~~
Ididntdothis
The word "just" is my pet peeve. Before you say "why don't you just xxx" make
sure you actually understand the problem thoroughly. Otherwise it's impolite
to make a suggestion to "just do ...". If there is one thing I have learned
it's that good devs make sure they really understand the situation before
making suggestions.

------
rwallace
TIL: Microsoft C++ does reproducible builds! I tried it just now and the
/Brepro flag works with the command line compiler.

~~~
mehrdadn
Compiler? or linker?

~~~
rwallace
It works if you just supply it to the compiler.

------
umvi
Is there a reason you can't just achieve deterministic builds using docker
containers as your build environment (which have the appropriate sysroot,
compilers, dependencies, etc. inside)?

~~~
knorker
Yes. Did you read about date, time, and actual calls to RNGs? And
nondeterminism of listdir? I don't think you actually read the article.
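
For the listdir point specifically, the usual mitigation is to sort whatever the filesystem returns before it feeds into the build. A minimal sketch follows; the helper name `list_sources` and the suffix list are made up for illustration. Note that a container does not change this: directory order still depends on the underlying filesystem.

```python
import os
from typing import List, Tuple


def list_sources(root: str, suffixes: Tuple[str, ...] = (".c", ".cpp")) -> List[str]:
    """Collect source files under `root` in a stable, sorted order.

    Hypothetical helper: os.walk and os.listdir return entries in
    arbitrary filesystem order, so both levels are sorted explicitly.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place also fixes the order in which os.walk descends.
        dirnames.sort()
        for name in sorted(filenames):
            if name.endswith(suffixes):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, root))
    return found
```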

~~~
umvi
I did, but those seemed like obscure macros/corner cases almost no one uses.

I was more arguing for use of Docker instead of "Conan" or whatever tool they
were selling.

~~~
knorker
Docker won't help with the issues mentioned. I _still_ don't believe you
actually read it.

~~~
umvi
The article: "watch out for these super obscure corner cases that could impede
you from making deterministic builds. Use our tool to mitigate them!"

Me: "who cares about those corner cases, almost nobody would run into them.
99% of developers can achieve deterministic builds by standardizing
compilers/dependencies inside of a docker image and using that to build the
binaries."

~~~
knorker
I don't disagree that they seem to be pushing their tool for use cases that
are not needed.

Still, your "just do X" doesn't address the majority of the provided "sources
of variation".

------
attilakun
In the "The importance of deterministic builds" section they write:

> Security. Modifying binaries instead of the upstream source code can make
> the changes invisible for the original authors. This can be fatal in safety-
> critical environments such as medical, aerospace and automotive. Promising
> identical results for given inputs allows third parties to come to a
> consensus on a correct result.

I don't understand this argument, or rather what deterministic builds have to
do with this. Isn't modifying the binary instead of the source just a very bad
idea in a high stakes scenario?

~~~
e12e
Authors could publish official builds with a signature - but you can't build
the same binary, so you can't be sure published source and binaries match.

Say you checkout "libmagnificent" from github, browse the source and like what
you see. You can build a binary and see it matches the upstream build. This
gives you some (more) confidence in upstream builds.

It also lets you know you're starting from a known good source, if you want to
make modifications; the changes in the build come from the changes you made to
the source - not through some side effect in the build process.

------
dgellow
I can see the arguments in favor of deterministic/reproducible builds. What
are the arguments against it?

~~~
pjc50
There's no good reason to make builds deliberately nondeterministic, but it is
time-consuming to set up and maintain, and forces tooling choices along the
way.

I suppose it rules out including build date in the binaries.

~~~
dfgdghdf
You can make the date an input to the build and push it to the configuration.
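
One common convention for this is the `SOURCE_DATE_EPOCH` environment variable described on reproducible-builds.org. A minimal sketch of a build script honoring it is below; the function name and the output format are assumptions, and falling back to the current time is a choice that deliberately gives up reproducibility when the variable is unset.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ) -> str:
    """Return the build date as an ISO-8601 UTC string.

    Uses SOURCE_DATE_EPOCH (seconds since the Unix epoch) when set, so
    the date becomes an explicit input; otherwise uses the wall clock.
    """
    epoch = environ.get("SOURCE_DATE_EPOCH")
    seconds = int(epoch) if epoch is not None else int(time.time())
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")
```

The resulting string could then be handed to the compiler as a preprocessor definition instead of relying on `__DATE__` and `__TIME__`.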

------
wiineeth
So I'm planning to learn C++ to get into the HFT industry. Can anyone suggest
what I need to learn to get into HFT as a software engineer?

------
malkia
FYI - The links are all broken - but can still be tracked down - it's just that
the underlying link actually points back to the page.

------
patthebunny
I spent the better part of a year trying to get these switches and changes
working in Windows when I was at MS.

Wasn't full time for a year, but it was a lot of "try to build, oops compiler
error contact them and then wait for a new compiler build. oh look found more
non-determinism".

The fun was when we changed the PE header and started getting pushback from
random teams.

