Towards a fully reproducible Debian (lwn.net)
145 points by lamby 4 months ago | 41 comments



I would gladly set a "reproducible_only=yes" or whatever config option to exclude non-reproducible packages from a personal system. This would create a strong incentive to contribute patches. (I'm hoping someone can point out such a setting already exists! It didn't last time I checked.)


So, this actually exists as an extremely WIP patch: https://bugs.debian.org/863622

Underlining that you should not apply this patch; it is simply a demonstration of one potential user interface.


Unfortunately, such a flag would probably result in weird package install/update errors as part of the dependency graph would be missing.


Wouldn't have to be a weird error message. "The package foo cannot be installed because its dependency bar cannot be reproducibly built."


Can a package even be considered reproducible if its dependencies aren't?


I would argue not, but that's the point, right? Do we have a large enough well-connected graph of reproducibly buildable packages, and if not, what should be fixed first?
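A minimal sketch of the check discussed in this subthread, assuming a package only counts as reproducible when everything in its dependency closure is too. The function names, the toy dependency map, and the set of reproducible packages are all hypothetical; this is not how apt or the WIP patch works.

```python
from typing import Dict, List, Optional, Set


def first_unreproducible(
    pkg: str,
    deps: Dict[str, List[str]],
    reproducible: Set[str],
    _seen: Optional[Set[str]] = None,
) -> Optional[str]:
    """Return the first package in pkg's dependency closure (pkg included)
    that is not known to be reproducible, or None if all of them are."""
    seen = set() if _seen is None else _seen
    if pkg in seen:  # already visited; also guards against dependency cycles
        return None
    seen.add(pkg)
    if pkg not in reproducible:
        return pkg
    for dep in deps.get(pkg, []):
        bad = first_unreproducible(dep, deps, reproducible, seen)
        if bad is not None:
            return bad
    return None


def explain(pkg: str, deps: Dict[str, List[str]], reproducible: Set[str]) -> str:
    """Produce a readable message instead of an opaque resolver error."""
    bad = first_unreproducible(pkg, deps, reproducible)
    if bad is None:
        return f"{pkg}: ok"
    if bad == pkg:
        return f"The package {pkg} cannot be installed because it cannot be reproducibly built."
    return (
        f"The package {pkg} cannot be installed because its dependency "
        f"{bad} cannot be reproducibly built."
    )


# Hypothetical data: "bar" is the only package that fails to reproduce.
DEPS = {"foo": ["bar", "baz"], "bar": ["libc"], "baz": [], "libc": []}
REPRODUCIBLE = {"foo", "baz", "libc"}

print(explain("foo", DEPS, REPRODUCIBLE))
print(explain("baz", DEPS, REPRODUCIBLE))
```

Running the same walk over every package would also show which non-reproducible packages block the most others, which is one way to approach "what should be fixed first".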


That depends on whether the dependencies are build dependencies.



The scope here is much larger as it’s every package in the Debian distro, not just the base system.


and pkgsrc has bulk package builds.

same pkgsrc checkout + reproducible base => essentially reproducible full package builds.

yes, some people will find some niggling way to make this not true because the binaries may be somehow altered due to timestamps, yadda, but for all practical purposes w/r/t the actual code and actual executables generated, this is true, and has been true for essentially the entire existence of BSD derivatives using ports systems (1996).


> same pkgsrc checkout + reproducible base => essentially reproducible full package builds.

I would be truly surprised if that were the case. Are there no packages in pkgsrc which embed, e.g., timestamps, or even download things while building[1]?

[1] JDK packages are a typical culprit in this type of situation because Oracle JDK requires an "accept license" prompt.

> yes, some people will find some niggling way to make this not true because the binaries may be somehow altered due to timestamps, yadda, but for all practical purposes w/r/t the actual code and actual executables generated, this is true, and has been true for essentially the entire existence of BSD derivatives using ports systems (1996).

Oh, so you're co-opting "essentially reproducible" to mean "not reproducible". Ok then.

"Reproducible builds" is about verification and isn't just about "close enough".


I thought NixOS solves the full reproducibility problem really well already.

https://nixos.org


It doesn't. NixOS doesn't do reproducible builds, it does reproducible systems.

Reproducible builds are essentially about compiling source code to a binary and always getting the same checksum.

Reproducible systems is about always getting the same system/behavior given the same configuration. Which is what NixOS does.
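To make the "same checksum" criterion concrete, here is a minimal sketch, assuming the outputs of several independent builds are already available as byte strings. The function names are hypothetical and SHA-256 is just one reasonable choice of digest.

```python
import hashlib
from typing import Iterable


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a build artifact."""
    return hashlib.sha256(data).hexdigest()


def builds_match(artifacts: Iterable[bytes]) -> bool:
    """True only if every build produced bit-for-bit identical output."""
    digests = {sha256_hex(a) for a in artifacts}
    return len(digests) == 1


# Two identical builds match; a build with an embedded timestamp does not.
print(builds_match([b"binary v1", b"binary v1"]))
print(builds_match([b"binary v1 built 10:00", b"binary v1 built 10:05"]))
```

Note that this says nothing about behaviour: two artifacts that differ in a single byte may behave identically and still fail the check.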


We did have a bunch of patches to actually make a lot of builds reproducible, but integrating changes like that has been pretty slow. I'm pretty sure a lot of these patches have since been adopted, pulling things much closer to bit-by-bit reproducible.

Unfortunately we have no real metric to keep track of that at the moment, the way Debian does ("XY.Z% are working"). But it looks like basically all distros are finally starting to converge on this.


There was no intention to claim you were not working towards reproducible builds. I have come across several people who conflate what NixOS offers with what reproducible builds are.

However, great that you have been working towards this! Arch Linux has been cooperating with Debian, and we currently build our packages on their infrastructure to produce graphs and test our packages. I assume NixOS can do the same thing if the manpower is present.

https://tests.reproducible-builds.org/archlinux/archlinux.ht...


I thought that getting the same system/behavior would require reproducible builds too. Perhaps I am wrong.


If your build process always produces the exact same binaries, and you always use the exact same environment to run these, then yes, that's reproducible. It's not equivalent though, in that you can easily have a system that always exhibits the same behaviour (what NixOS does) but does not use bit-by-bit equal binaries. For example, some symbols in some library's .so file might be in a different order, or some padding bits that are never read might be zero or random depending on your compiler's choice.


The same behaviour can come out of different binaries - the issue that reproducible builds is solving isn’t that programs have different behaviour from compile to compile, it’s that people can’t prove they were compiled correctly.


Nix doesn't "solve" reproducibility. For example, the thing being packaged may itself build non-deterministically (e.g. if components are built concurrently, and added to a tarball in whichever order they finish). Nix can make it easier to avoid introducing new problems, and it also gives an audit trail of which versions of things were used to build things.

Also Debian doesn't use Nix, so that wouldn't solve the problem for Debian. Nix (along with everyone else) does benefit from the improvements that Debian is making, e.g. finding non-determinism in programs and getting it fixed upstream.
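The tarball example above can be sketched as follows, assuming in-memory files and Python's standard tarfile module. The helper names and file contents are hypothetical. Archiving members in completion order gives order-dependent bytes; sorting the names and pinning the metadata removes that source of non-determinism.

```python
import io
import tarfile
from typing import Dict, List


def build_tar(files: Dict[str, bytes], order: List[str], mtime: int = 0) -> bytes:
    """Write the named files into an uncompressed tar archive, in the given order.

    TarInfo defaults (uid/gid 0, empty owner names, mode 0o644) are left alone
    and mtime is pinned, so only the contents and the member order affect the bytes.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in order:
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def deterministic_tar(files: Dict[str, bytes]) -> bytes:
    """Same archive bytes regardless of the order in which files were produced."""
    return build_tar(files, sorted(files))


FILES = {"a.o": b"object a", "b.o": b"object b"}

# Simulate two concurrent builds whose components finished in different orders.
run1 = build_tar(FILES, ["a.o", "b.o"])
run2 = build_tar(FILES, ["b.o", "a.o"])
print(run1 == run2)                                        # order leaks into the output
print(deterministic_tar(FILES) == deterministic_tar(dict(reversed(list(FILES.items())))))
```

Compression is left out on purpose: gzip, for instance, has its own header timestamp that would need pinning as well.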


Did you read the article? It's about Debian, and it explicitly mentions NixOS. It also mentions certain packages (Emacs, certain secure boot stuff) which at the moment are not reproducible.


So I'm glad this is being done, but the "attack surface" that's being resolved sounds like a bit of a joke...

"Alice, a system administrator who contributes to a Linux distribution, is building her binaries on servers that, unknown to her, have been compromised"

This is a bit of a weird scenario. So they're assuming the only build being compromised is the one that ends up in the repo and no one can confirm that easily. So just have two (or more) identical servers in separate locations under different people's control so they aren't both compromised? No need to fuzz the dates and paths and stuff. Or if you really want, you make a special Debian version (super-stable) with fixed clocks, file paths, etc. where things are easy to reproduce.

"Bob, a privacy-oriented developer, makes a privacy-preserving browser, but is being blackmailed into secretly including vulnerabilities in the binaries he provides. Carol is a free-software user whose laptop is being attacked by an evil maid called Eve, the third developer of the title; each time Carol shares free software with her friends, it is pre-compromised by Eve"

Do Debian contributors just upload a binary blob to the Debian servers and that's it? I thought they were saying they have build servers for every package and the source from which it's built is available.

How about even easier - a bribed/black-mailed Debian maintainer slips in a patch to some of the tens of thousands of packages they tweak, no one notices and it gets distributed to everyone. I'm no expert, but it seems the smarter solution would be to have a much smaller reproducible core/base system (with Debian patches when necessary). Something that can realistically get enough eyeballs on it. And then everything else is built by project maintainers into something like flatpaks/snaps with no patching.


> just have two (or more) identical servers in separate locations under different people's control

"just"... The whole point of reproducible builds is for many 3rd parties to be able to rebuild packages easily and get consistent outputs without having to spend months to set up a perfectly "standard" buildbot.

Many large orgs are already rebuilding stuff anyways - adding security checks is going to be easy.

> No need to fuzz the dates

No, you still need to handle that.

> a bribed/black-mailed Debian maintainer slips in a patch to some of the tens of thousands of packages they tweak

No, other developers and 3rd parties will notice the patch being applied by the build tools.


> other developers and 3rd parties will notice the patch being applied by the build tools.

Not if it is done to the build servers. The code "looks fine".


> So just have two (or more) identical servers in separate locations under different people's control so they aren't both compromised?

How are you going to make them identical without introducing a point where one person could compromise both of them?

> Or if you really want, you make a special Debian version (super-stable) with fixed clocks, file paths, etc. where things are easy to reproduce.

Seems like that would be more work. And it would mean everyone would have to trust whoever maintains this super-stable Debian version, so again you're back to trusting one person.

> How about even easier - a bribed/black-mailed Debian maintainer slips in a patch to some of the tens of thousands of packages they tweak, no one notices and it gets distributed to everyone.

The patches are at least publicly recorded (and I believe signed?) and available for review. Whether anyone is reviewing them is another question, but it's at least possible to do so if you care to. Whereas it's very hard to review the binary build process.


> so they're assuming the only build being compromised is the one that ends up in the repo and no one can confirm that easily. So just have two (or more) identical servers in separate locations under different people's control so they aren't both compromised?

In the end, it should be more like 1000+ servers. It's probably quite trivial for a three-letter agency to hack the notebook of one or two open-source contributors. It's less trivial to hack hundreds or thousands of different servers hosted by different people in different regions, with varying amounts of tinfoil.
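A minimal sketch of how many independent rebuilders could be used, assuming each one publishes the digest of the package it built. The rebuilder names, the digests, and the acceptance policy (a minimum number of agreeing rebuilders and no dissent) are hypothetical choices, not an existing Debian mechanism.

```python
from typing import Dict, List, Tuple


def check_rebuilds(
    official: str, rebuilds: Dict[str, str], min_agree: int
) -> Tuple[bool, List[str]]:
    """Compare the distributed package's digest with independent rebuilds.

    Returns (trusted, dissenters). The package is trusted only if at least
    min_agree rebuilders reproduced the official digest and none disagreed.
    """
    dissenters = sorted(name for name, digest in rebuilds.items() if digest != official)
    agreeing = len(rebuilds) - len(dissenters)
    trusted = agreeing >= min_agree and not dissenters
    return trusted, dissenters


# Hypothetical digests reported by three rebuilders run by different people.
print(check_rebuilds("aa11", {"r1": "aa11", "r2": "aa11", "r3": "aa11"}, min_agree=2))
print(check_rebuilds("aa11", {"r1": "aa11", "r2": "aa11", "r3": "bb22"}, min_agree=2))
```

A single dissenting rebuilder does not prove a compromise, but it is exactly the signal that one compromised build server would otherwise never produce.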


Reading those examples reminded me of TrueCrypt, and its very strange shutdown: https://en.wikipedia.org/wiki/TrueCrypt


> So just have two (or more) identical servers in separate locations under different people's control so they aren't both compromised?

Considering that you are a DNS spoof away from installing packages from a compromised server, how does your proposal tackle the underlying problem?


> Considering that you are a DNS spoof away from installing packages from a compromised server,

Virtually all package managers now verify package and/or metadata signatures. If a server is spoofed and the system downloads a compromised package, the package manager will simply refuse to install it, because it is not signed by your distribution's keys.

Reproducible builds are useful because they reduce the probability of a compromised build system going undetected, by building the package on several independent builders.

For the user it also answers the question: does a distributor actually compile from the source code that they claim to compile from?


Arch Linux is also working on providing tools so our users can download a package, point a script at it and have the package recreated with all the info provided in .BUILDINFO. This includes the source location and a list of packages present on the system when the package was created.

We are a couple of steps away from actually providing this, but we will hopefully get there soon enough.
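As a rough illustration of the first step such a script would need, here is a sketch that reads a .BUILDINFO-style file, assuming it consists of "key = value" lines in which some keys (such as the list of installed packages) repeat. The sample content and the parser name are hypothetical, and this is not the Arch Linux tooling itself.

```python
from typing import Dict, List


def parse_buildinfo(text: str) -> Dict[str, List[str]]:
    """Collect 'key = value' lines; repeated keys accumulate into a list."""
    fields: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(" = ")
        if not sep:  # skip anything that is not a key/value pair
            continue
        fields.setdefault(key, []).append(value)
    return fields


# Hypothetical sample; the field names and values are for illustration only.
SAMPLE = """\
format = 1
pkgname = example
pkgver = 1.0-1
builddate = 1520000000
installed = gcc-7.3.0-1-x86_64
installed = glibc-2.26-11-x86_64
"""

info = parse_buildinfo(SAMPLE)
print(info["pkgname"][0], info["pkgver"][0])
print(info["installed"])
```

With the recorded package list in hand, a rebuild script can recreate the same build environment before compiling and comparing the result.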


> Considering that you are a DNS spoof away from installing packages from a compromised server,

This is false. Debian packages are signed. If the signature fails to validate the package will not install.


Read 'em and weep:

https://blog.packagecloud.io/eng/2018/02/21/attacks-against-...

https://isis.poly.edu/~jcappos/papers/cappos_mirror_ccs_08.p...

"GPG signing a Debian package does nothing because package signatures are not verified by default on any major distribution when packages are installed with apt-get install. See your /etc/dpkg/dpkg.cfg file for an explicit comment to this effect."

And:

"When APT software (such as apt-get, or reprepro) or folks offering APT repositories mention GPG signatures in their documentation they are typically referring to GPG signatures on repository metadata, not on packages themselves. Likewise, when you configure the SignWith option of reprepro (documented here), you are telling reprepro to sign your repository metadata with the specified GPG key; this does not sign any of the packages, though."


> If the repository is served ... with Acquire::AllowDowngradeToInsecureRepositories set to true (the default is false, thankfully)

So you're saying if I explicitly disable an important security feature my system is less secure? No tears yet...

That's a good article. It has information relevant to groups setting up apt repositories but the only issue that potentially applies to using the official Debian repositories is the replay attack and that would work even if you were verifying the signatures of individual packages.
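The distinction in this subthread between signed repository metadata and signed packages can be sketched as a hash chain: the signed metadata pins the digest of a package index, and the index pins the digest of each package. The index format, function names, and data below are hypothetical simplifications, and the signature check on the metadata itself is assumed to have already happened rather than implemented here.

```python
import hashlib
from typing import Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def parse_index(index: bytes) -> Dict[str, str]:
    """Parse a toy index of 'filename digest' lines (not the real APT format)."""
    entries: Dict[str, str] = {}
    for line in index.decode().splitlines():
        name, digest = line.split()
        entries[name] = digest
    return entries


def verify_chain(trusted_index_digest: str, index: bytes, name: str, package: bytes) -> bool:
    """Accept a package only if the index matches the digest taken from the
    signature-verified metadata AND the package matches the digest in the index."""
    if sha256_hex(index) != trusted_index_digest:
        return False
    return parse_index(index).get(name) == sha256_hex(package)


# Hypothetical repository contents.
DEB = b"package payload"
INDEX = f"foo.deb {sha256_hex(DEB)}\n".encode()
TRUSTED = sha256_hex(INDEX)  # stands in for the digest listed in signed metadata

print(verify_chain(TRUSTED, INDEX, "foo.deb", DEB))
print(verify_chain(TRUSTED, INDEX, "foo.deb", b"tampered payload"))
```

This is why a package without its own embedded signature can still be rejected when a mirror tampers with it. It does not help against replaying old but validly signed metadata.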


> GPG signing a Debian package does nothing because package signatures are not verified by default on any major distribution when packages are installed with apt-get install

That's sliiiiiiightly misleading. :/


On both Debian 9.4 and Ubuntu 16.04 (what I have handy):

  # Do not enable debsig-verify by default; since the distribution is not using
  # embedded signatures, debsig-verify would reject all packages.
  no-debsig


While this is intuitive and obvious to those who have used Debian (or rather apt) for decades, it seems that a whole generation of computer professionals don't grasp it.

Modern software development involves pulling tons of libraries (often at runtime) from other locations. A lot of damage was done by things like 'sudo pip install xxx' and 'curl http://blah.com/install.sh|sudo bash'

At least docker's install uses https before downloading some random text from the internet, nowhere near as secure as gpg signing a package though -- compromise the website and you win.


> At least docker's install uses https

Anybody with a fake email address can upload a bunch of backdoored images on dockerhub and there's no way to spot that.


Presumably a given repository is secure? At least as much as a https certificate makes it secure (which means a compromise of a docker certificate or website means a compromise of everything on it)

This isn't about downloading random crap from the internet, it's about having the communication between who you trust (redhat, microsoft, dave smith) being secure.

GPG signing of the code you download means that you are running the code from the person with that GPG key. Doesn't matter if the server the repository is on is compromised via a DNS/https hack, or via some other means. The worst that will happen is you download the compromised software, it doesn't match the person/organization you trust, and you delete it.


They get their own namespace, they can't overwrite random existing images.


> Do Debian contributors just upload a binary blob to the Debian servers and that's it?

The practice has either been eliminated or curtailed but, yes, this was done for a long time.


Curtailed... "Source-only uploads" (the term to Google for) are not required. I suspect they are not even the norm due to not being the defaults for various tools, etc.


Thanks for the clarification. My impression as a Debian user was that it had been mostly curtailed.

It doesn't matter too much... you either trust your contributors or you don't. But it's still a bit icky.



