Outtake: "With reproducible builds, multiple parties can redo this process independently and ensure they all get exactly the same result. We can thus gain confidence that a distributed binary code is indeed coming from a given source code."
For example, the debug information for several languages I've worked with embeds the full path of the directory where the source file was originally compiled. That path can easily differ between two machines, so the builds differ when debug information is turned on but are identical when it's turned off.
However, debug symbols and "debug mode" are two different things. At least in Debian, packages are generally compiled with debug symbols on; the symbols are then automatically split out into a separate package, and the binaries in the final package are stripped as a separate step.
more here: https://reproducible-builds.org/docs/build-path/
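Just to illustrate the build-path issue with a toy Python example (this is not how Debian builds anything, just a sketch): the same source text, "built" under two different hypothetical directories, yields different artifacts purely because the path ends up embedded in the output.

    import marshal

    source = "def greet():\n    return 'hello'\n"

    # Hypothetical paths standing in for two different build machines.
    art_a = marshal.dumps(compile(source, "/home/alice/project/greet.py", "exec"))
    art_b = marshal.dumps(compile(source, "/build/reproducible/greet.py", "exec"))

    # False: the code is identical, but co_filename differs, so the bytes differ.
    print(art_a == art_b)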
I'm very grateful for the work that this project has done and continues to do. Thank you!
I think it's great that we have reached a point where packagers are shifting their mindset from "it works" to "we can reproduce the results", and in more than one package manager.
I tripped over this a couple weeks ago and was both amused and annoyed, since it seemed that packages were being listed in the file in a random order. I'm asking here because it might already be fixed; we're using a slightly old version of the package/repository tools.
You're sorting file names, right? Is this guaranteed to DTRT when packages are updated? (i.e., a version number changing can't result in two packages switching order?)
What does "build reproducibly" even mean in this context?
In my personal case: when I build a repository that has some new packages and some old packages, I can look at the resulting pull request on GitHub and see that the packages which haven't changed have indeed not changed.
What does "build reproducibly" even mean in this context?
Two repositories with the same packages have identical Packages files. Or for me, slightly more generally, when the Packages file changes, it changes as minimally as possible.
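For what it's worth, here's a rough Python sketch of the kind of ordering I mean (hypothetical index format, not the real tooling): sort stanzas by package name rather than by file name, so a version bump that renames the .deb can never make two packages swap places, and unchanged packages produce unchanged lines in the diff.

    # Hypothetical in-memory index: package name -> fields.
    packages = {
        "libfoo": {"Version": "1.2-3", "Filename": "pool/libfoo_1.2-3_amd64.deb"},
        "bar":    {"Version": "0.9-1", "Filename": "pool/bar_0.9-1_amd64.deb"},
    }

    def render_index(packages):
        stanzas = []
        for name in sorted(packages):              # stable key: the package name
            fields = packages[name]
            lines = ["Package: " + name]
            lines += [f"{key}: {value}" for key, value in sorted(fields.items())]
            stanzas.append("\n".join(lines))
        return "\n\n".join(stanzas) + "\n"

    print(render_index(packages))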
Nix does some tricks to improve output reproducibility, like building things in sandboxes with a fixed time and using tarballs without modification dates, but bit-by-bit reproducible output is not their goal. They also don't have the manpower for this.
Currently, a build is produced by a trusted build server for which you have the public key. You look up the build by its input hash, but you have no way to check that what the build server is serving is legit. It's entirely based on trust.
However, with Debian putting so much effort into reproducible output, Nix can benefit too. In the future we would like to get rid of the 'trust-based' build servers and move to a consensus model instead: say, if 3 servers report the same output hash for a given input hash, then we trust that download and avoid compiling from source. If you still don't trust it, you can build from source yourself and check the result.
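In pseudocode-ish Python, the idea is roughly this (names are made up; this is not existing Nix infrastructure):

    def consensus_output_hash(input_hash, servers, quorum=3):
        """Accept a cached binary only if enough independent build servers
        report the same output hash for the given input (derivation) hash."""
        votes = {}
        for server in servers:
            out = server.query_output_hash(input_hash)   # hypothetical API
            if out is not None:
                votes[out] = votes.get(out, 0) + 1
        for output_hash, count in votes.items():
            if count >= quorum:
                return output_hash    # enough agreement: substitute the binary
        return None                   # no consensus: build from source yourself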
Summary: Nix does not do bit-by-bit reproducibility today, but we benefit greatly from the work that Debian is doing. In the future we will look at setting up infrastructure for build servers with an output-hash-based trust model instead of an input-based one, but this will take time.
Feel free to ask any other questions.
I think you are wrong.
The Nix people (and the Guix people, including myself) are also involved in the Reproducible Builds project. I met a couple of them in Berlin last year. It's not just Debian doing this.
I can't speak for Nix but for the Guix project bit-for-bit reproducibility is an explicitly stated goal. It's very important and the reason why Guix is used in several HPC environments as the foundation for reproducible research.
Disclaimer: I'm co-maintainer of GNU Guix and currently at a bioinfo conference where I talked about Guix and reproducibility.
Do I need to install a whole other OS, or can I install Guix in Ubuntu?
At work I'm using the same binaries on a cluster with CentOS 6 and on workstations running a mix of Ubuntu, Fedora, CentOS 7, etc.
GuixSD ("Guix System Distribution") is the variant of the GNU system where the principles of Guix are extended to the operating system, but you don't have to use it if all you want is play with the package manager.
The easiest way to get started is to download "GNU Guix 0.13.0 Binary" for your architecture and follow the instructions at http://www.gnu.org/software/guix/manual/html_node/Binary-Ins....
If you are into Lisp you'll feel right at home with extending Guix. If you don't care for Lisp you might at least find the command line interface to be a little easier to understand than that of Nix, but really: that's a personal preference.
Personally, however, I found it limited to building software (nix-shell for reproducible environments, and building images) and awkward to use for desktop stuff. It was just weird to have two package managers competing. Maybe someone has some neat ideas about how to make it work, but I just moved to NixOS.
NixOS is a GNU/Linux distro that uses Nix and leverages it to build the whole system configuration. There you have generations, right in the bootloader, and you can boot into any of them. Which is extremely nice: I've simply stopped worrying before upgrades. If one fails (unless it breaks the bootloader, which is quite unlikely), I can just roll back to where I was.
Installation is performed from a shell session and requires some knowledge about how GNU/Linux works, just like, e.g., Gentoo or Arch Linux. The essentials are all covered in the manual: https://nixos.org/nixos/manual/index.html#sec-installation and examples of the rest (desktop environments, etc.) can be found online.
Nix/NixOS may not be the best in terms of UI/UX/user-friendliness (some things will look weird until you get used to them - I guess it's the same with any unfamiliar tech), but my impression is that the community there is very nice and very helpful (and maintains a lot of useful documentation).
I haven't used Guix or GuixSD, so can't comment about it.
tl;dr: I went with installing NixOS on a separate partition and mounting my old /home there. No regrets and not looking back. (Whenever I need Debian or Ubuntu compatibility - e.g. to build a .deb package - I just do things in a debootstrapped chroot or in Docker.)
One suggestion: if you have a separate /boot, make sure it's large enough to hold a dozen kernels+initramfses. Like, at least 256MiB.
I had a 128MiB partition and given that every kernel or initramfs change leaves a new copy there, it made things a little inconvenient.
It didn't break anything, just that `nixos-rebuild switch --upgrade` failed every now and then, requiring me to clean up old generations even though / still had plenty of disk space.
This is different in GuixSD. The complete operating system is just another store item. `/boot` hardly grows because all that happens is that a new GRUB menu is installed.
The system is "checked out" from /gnu/store by the init and then booted.
The whole operating-system configuration is just a single Scheme file. GuixSD is unique, I think, in that it has a "system services" composition framework that allows building up a complex graph of system services.
A system service is not to be misunderstood as just a daemon process. It's much more flexible than that.
Here's a very good introduction to the service composition framework:
Once they have reproducible builds, they can easily prove that each binary package was built from the corresponding source code package: just have a third party compile the source code again and generate the binary package, and it should be identical (except for the signature). This reduces the need to trust that the build machines haven't been compromised.
There's a website that describes this project in much more detail as well as how they worked around the various problems they found. https://reproducible-builds.org/
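The verification step itself is conceptually tiny. A sketch in Python, with made-up file names, assuming signatures are detached so they aren't part of the compared artifact:

    import hashlib

    def sha256_of(path):
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Hypothetical file names: the distributed binary vs. an independent rebuild.
    official = sha256_of("libfoo_1.2-3_amd64.deb")
    rebuilt  = sha256_of("rebuild/libfoo_1.2-3_amd64.deb")
    print("reproducible" if official == rebuilt else "MISMATCH - investigate")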
It would certainly be convenient if you could point to a version/snapshot of Debian (or another distribution), and it would then be possible to take your (say, C) source code and compile and run the same binary that was used for the research.
It's true that often getting the algorithm more-or-less right is enough. But the more research is augmented by computing, the more important it becomes to maintain reproducibility - and the more complex and capable these computing environments become (say, a top-100 supercomputer, a C++ software stack on top of MPI, some Fortran numeric libraries, etc.), the harder it is to maintain.
Imagine verifying research done today by repeating experiments in 50 years.
It has taken, and continues to take, a surprising amount of work to make two builds of a program produce the same output.
There are many sources of problems. For example: dates and times stored in the output; output affected by the order in which files are read from a directory (which has no fixed ordering); hash tables keyed on pointers, whose iteration order depends on where objects happen to be stored in memory and so differs between executions; parallel programs behaving differently on different runs; and others.
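A few of those are easy to demonstrate even in a couple of lines of Python (illustrative only): a timestamp embedded in the output, and iteration order that isn't fixed across runs.

    import os, time

    # 1. Embedding the current time makes every build differ.
    header = "/* built at %f */\n" % time.time()

    # 2. With string hash randomization (on by default since Python 3.3),
    #    set iteration order can change between interpreter runs, so any
    #    output derived from it must be sorted to be reproducible.
    symbols = {"init", "main", "cleanup", "parse"}
    manifest = "\n".join(symbols)                   # order may vary across runs
    stable_manifest = "\n".join(sorted(symbols))    # deterministic

    # 3. Directory listing order is filesystem-dependent; sort before use.
    files = sorted(os.listdir("."))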
If we want to claim that open source code is secure on the basis of source code analysis, we need a verifiable build chain, so that the code and binaries the analysis covered are the same as the binaries we get later.
It sounds trivial, but the full paths and timestamps that get added at multiple points in the process are enough to screw this up, and those are the easy problems.
If software has reproducible builds that means that third-parties can independently verify that artifacts have been built safely from sources, without any sneaky stuff snuck in.
Right now we can sign source code and we can sign binaries, but we can't show that a given source produced the binaries. I would feel much happier about installing code if I knew it came from a particular source or author.
Here are several examples:
VLISP for Scheme48 whose papers are here:
C0 compiler + whole stack correctness in Verisoft
CompCert Compiler for C
CakeML Subset of Standard ML
Rockwell-Collins doing crypto DSL compiled to verified CPU
Karger's original paper with the attack from the 1970s:
Myers' landmark work on subversion in high-assurance security from 1980:
A framework I developed while studying Karger, back when I was building secure things:
Right now we're focused on content over presentation, so it will look rough. Hope you all enjoy it or learn something from the projects.
I think the solution is to give the devs who favor such techniques a separate but easy-to-use fuzzing tool set that they can run just like their unit tests, apart from their usual 'build' command. Keep their ability to discover new bugs, but make it separate from the real build.
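Something along these lines, sketched in Python with made-up names: fuzzing lives in its own entry point that devs run on demand, never as part of the normal build, and the seed is printed so anything it finds can be replayed deterministically.

    import random, sys

    def parse(data: bytes):
        # Stand-in for whatever function is under test.
        return data.decode("utf-8", errors="strict")

    def main(iterations=10000):
        seed = random.randrange(2**32)
        rng = random.Random(seed)
        print("fuzzing with seed", seed)
        for _ in range(iterations):
            blob = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
            try:
                parse(blob)
            except UnicodeDecodeError:
                pass                    # expected for garbage input, not a bug
            except Exception:
                print("crash on input %r (replay with seed %d)" % (blob, seed))
                sys.exit(1)

    if __name__ == "__main__":
        main()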