“Packages should be reproducible” added to Debian Policy (debian.org)
431 points by lamby on Aug 14, 2017 | hide | past | favorite | 43 comments

I helped provide financial backing for the Reproducible Builds project at the Linux Foundation's Core Infrastructure Initiative [0]. Holger, Lunar and the whole team deserve a huge amount of credit for beginning this when it seemed pie-in-the-sky and growing it to now become the standard for a lot of our infrastructure.

[0] https://www.coreinfrastructure.org/projects/reproducible-bui...

Thank you, Dan ;)

>Any packages that absolutely cannot be built in a reproducible way[1] ... [1] Such as random noise added to kernel and firmware data structures during local builds, to be used as a last defense to avoid the herd using same keys effects, etc.

Shouldn't this example of randomness still be pushed to the installation stage, instead of the distribution stage? If Debian's binary package contains a "random" key, then we have a pretty large herd already using it.

> pushed to the installation stage, instead of in the distribution

Indeed. While working on reproducibility in Debian, I found exactly this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833885

Nice. I'm very curious what sort of track you were chasing to spot that.

Well, I did two builds of the package and they had different contents varying only on this secret key :) No special grep required!
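The check really is that simple: build twice and compare. A minimal Python sketch of the idea (the project's real tool, diffoscope, recurses into archives and object files rather than stopping at whole-file hashes):

```python
import hashlib
from pathlib import Path

def tree_hashes(root: str) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    base = Path(root)
    return {
        str(p.relative_to(base)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }

def diff_builds(build_a: str, build_b: str) -> list:
    """List the files whose contents differ between two unpacked build trees."""
    a, b = tree_hashes(build_a), tree_hashes(build_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

In practice the two builds are done in deliberately varied environments (different time, hostname, locale, filesystem ordering) so that any hidden dependence on the environment shows up as a differing file.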

Oh of course. Rebuild the entire distro twice. Didn't think of that :)

grep -i for secret/token/phrase/key/etc. and you'll often find hard-coded defaults =(.

Well even with packages like this, you could put the randomness in the manifest and just change it when you update the package. I don't see why there should be an exception for this if it's build-time randomness.

It's a last defense, it's still a smaller herd than everyone

Right, gizmo686 is arguing for a line of defense at a different place (installation) which I agree would be far more useful.

The point is that when installing, you are not rebuilding. The hypothetical example here is for some kind of randomization that applies, for instance, to the kernel during build time, such as randomization of structure layout. Having to rebuild every kernel you install to get a security patch, on every system you install it on, would be an enormous amount of time spent (there's a reason Gentoo is a fairly niche distro).

Such randomization can mitigate certain kinds of attacks, in particular by making a given attack work differently against each kernel build, which increases the difficulty of writing exploits that work across a broad range of kernel builds. There are about 30 different patch releases of the kernel for Debian Jessie alone; so such randomness, if applied across the different builds of a kernel within a release, across various Debian releases, across different architectures, and also applied by other distros, can substantially increase the difficulty of writing an exploit that affects a substantial fraction of Linux systems on the net.
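A toy model of why per-build layout randomization frustrates cross-build exploits (the real mechanism is GCC's randstruct plugin; the field names and sizes here are invented for illustration):

```python
import random

FIELDS = ["uid", "gid", "caps", "keyring"]  # hypothetical 8-byte struct fields

def struct_layout(build_seed: int) -> dict:
    """Per-build field offsets: deterministic for one build seed,
    shuffled between builds that use different seeds."""
    order = FIELDS[:]
    random.Random(build_seed).shuffle(order)
    return {name: i * 8 for i, name in enumerate(order)}

# An exploit that hard-codes the offset of "caps" taken from one kernel
# build will usually poke the wrong field on a kernel built with a
# different seed, so it no longer works across the whole herd.
```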

That helps provide a form of "herd immunity", where the herd is all deployed Linux systems on the net. It doesn't provide any real protection for a particular build; so it's no harder to write an exploit that targets all system running the exact same kernel. But it does dramatically increase the difficulty of writing a widespread worm which relies on the given exploit, and which can easily spread between a variety of different systems.

Anyhow, that footnote was on a policy provision for explicit exceptions to the general policy. The general policy is that yes, builds should be reproducible, and any randomness should be generated locally. The footnote was giving a few examples of potential exceptions that may be required, as a way of demonstrating that it would be fine to have the general requirement be reproducible builds, with case-by-case, narrow exceptions for cases in which non-reproducible builds provide significant value.

And note that such exceptions may be based on how an upstream project operates. If the upstream kernel has some modules that build non-reproducibly in such a way, it may be more viable to encode an explicit exception to the policy for that case, than to remove that non-reproducibility.

It's important to keep in mind that someone applying a cleanup once accidentally removed almost all entropy from OpenSSL's random number generator on Debian (they even asked for review from upstream, and failed to get it, because of the confusing naming of upstream's mailing lists). Having an excessively rigid policy could mean that Debian maintainers would be forced to remove features that do provide some kind of benefit.

> Having to rebuild every kernel you install to get a security patch, on every system you install it on, would be an enormous amount of time spent

Rebuild? No. Just relink (and link in a freshly generated binary containing the randomness) at boot time. That's what OpenBSD is experimenting with.

You don't always have randomness available at boot

Sure but if it is hard baked into the distro it is no longer random any more than '4' is random.

It's a good thing most non-embedded systems have moved to event/dependency-based inits, then. Otherwise, you'd have to generate RNG state at install or image-personalisation time and store RNG state between reboots to preserve available entropy. In the worst case, the system won't have persistent storage (to store RNG state for the next boot); that's a rather specialised case and, in general, such systems have hardware randomness sources. Either way, it's a rather niche problem.
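Persisting RNG state across reboots is what systemd-random-seed and similar boot scripts do. A minimal sketch of the idea, with an invented file path and simplified handling (a real implementation must also feed the seed into the kernel's entropy pool, not just rewrite the file):

```python
import hashlib
import os

def load_and_refresh_seed(seed_file: str) -> bytes:
    """At boot: read the saved seed, then immediately overwrite it so the
    same seed is never replayed if the machine crashes before the next
    clean shutdown."""
    try:
        with open(seed_file, "rb") as f:
            old = f.read()
    except FileNotFoundError:
        old = b""  # first boot: no saved state yet
    # Mix the old seed with whatever fresh entropy is available right now,
    # so a leaked seed file doesn't fully determine the next boot's state.
    fresh = os.urandom(32)
    with open(seed_file, "wb") as f:
        f.write(hashlib.sha256(old + fresh).digest())
    return old
```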

Is there any way to create some kind of proof-of-work system where people who want to back the project can volunteer computer time to verify builds automatically and serve as a foil to any potential attackers?

Some kind of blockchain-like trust verification system isn't the craziest idea I've ever pitched.

Proof-of-work systems have to be slow to compute but quick to verify. This problem is equally slow to compute and verify, because you also have to compile a package to check someone else's assertion that the source compiles to a given binary.

They necessarily have to be slow to compute, but they don't necessarily have to be quick to verify. The higher the verification cost, the less likely people are to pay it, but that doesn't mean it might not be "cost effective" under some circumstances.

Like if 500 people can independently confirm a build produces a binary with result X then they could all share in the reward, whatever that is. Nanokarmas?

Other systems are designed to be more asymmetric in order to facilitate scale. Crypto coins would never work at all if to verify a possible hash you had to spend days mining to reproduce the work. Spending five minutes compiling a program to achieve consensus isn't a problem.

The problem is in verifying that someone actually did the work and didn't steal someone else's solution. Maybe encrypting the result you get and sending it off in escrow to a centralized verification location would work, and once a sufficient number of solutions are collected the solutions are unsealed and the results shared so everyone can see what happened and raise any objections.

Verifying a proof-of-work has to at least be quicker than computing the proof-of-work. Verifying and computing can't be literally the same operation. That's not a proof-of-work.
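A hashcash-style sketch of that asymmetry (the difficulty and encoding are arbitrary choices for illustration): finding the nonce takes thousands of hash evaluations on average, while checking it takes exactly one.

```python
import hashlib

def compute_pow(header: bytes, difficulty_bits: int) -> int:
    """Slow: search for a nonce whose hash falls below the target."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify_pow(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Fast: a single hash, no matter how long the search above took."""
    target = 1 << (256 - difficulty_bits)
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < target
```

Compiling a package offers no such shortcut: the only known way to check a claimed output hash is to redo the whole build.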

>Like if 500 people can independently confirm a build produces a binary with result X then they could all share in the reward, whatever that is.

In a cryptocurrency blockchain, every single node verifies every block received from another node in order to check whether it should be included in the node's local copy of the blockchain. If computing and verifying are the same operation, it's not "500 people independently verify a source produces a given binary and the rest of the network rewards them", but "the entire network verifies that a source produces a given binary and they all equally pat each other on the back". Unless you only reward the first to verify a build, but then no node would bother spending time verifying blocks made by other people and building off that chain rather than mining its own blocks if they both take as long as each other.

>The problem is in verifying that someone actually did the work and didn't steal someone else's solution.

Bitcoin uses the hash of the rest of the block (which includes the address for the reward of the miner who is doing the proof-of-work) as an input into the proof-of-work such that the result of the proof-of-work is only valid for that input. It's not apparent to me whether there could be room to add an input like that into checking whether a source compiles to a binary. (I thought through whether you could make a Lamport-signature-like scheme involving picking specific intermediate values generated during the compilation that correspond to parts from a pre-committed series of hash pairs, but then I realized it wouldn't work because anyone who does the build once would get all of the intermediate values and be able to create as many of these signatures as they wanted for little effort.)

>Maybe encrypting the result you get and sending it off in escrow to a centralized verification location would work, and once a sufficient number of solutions are collected the solutions are unsealed and the results shared so everyone can see what happened and raise any objections.

Sounds like what you're looking for is some kind of web-of-trust reputation system with a trusted authority rather than a cryptocurrency blockchain. (If you have a trusted authority, then nearly all of the design of a bitcoin-like cryptocurrency is ridiculous dead weight. You can shed nearly everything, you don't need a broadcast-everything blockchain, and you could choose to have really cool things like blind signatures for anonymous transactions.) (Though if you have a trusted authority who can afford to be running build processes, it'd be a lot simpler to just have them do all the build-verifying for you, and you could do away with anything discussed in this post and just have them publish a PGP/HTTPS-encrypted webpage with their results.)

> because you also have to compile a package to check someone else's assertion that the source compiles to a given binary.

You don't have to do that on the server right away, though. Just give the same task to multiple random users and only verify the result when all of the users return the same hash.
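A sketch of that quorum check, assuming builders report their result hashes to some coordinator (all names here are invented for illustration):

```python
from collections import Counter

def consensus_hash(reports: dict, quorum: int):
    """reports maps builder-id -> reported binary hash.  Return the hash
    agreed on by at least `quorum` builders, or None if no hash has
    enough votes yet."""
    counts = Counter(reports.values())
    if not counts:
        return None
    best, votes = counts.most_common(1)[0]
    return best if votes >= quorum else None
```

The hard part, as noted above, is ensuring the reports are independent rather than copied from the first builder to finish; this sketch does nothing about that.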

You could have a system that does that. I'm just saying that it's not a proof-of-work and wouldn't work as the foundation of a decentralized cryptocurrency blockchain. The system that does what you say couldn't be a cryptocurrency blockchain.

You would have to figure out how to score each build. Some would be more expensive than others due to compile time.

Does this include CPU optimization build options? Or do builds using specific CPU instructions still count as "reproducible"?

maybe they should switch to Nix... wait!

I think you may be confusing deterministic reproducible builds, which remove randomness and ensure binaries have the same content hash regardless of who builds them (so you can reproduce what the maintainers did and verify the source and binaries), with a merely reproduced environment where everything still works because the dependencies are included, which seems to be all that Nix promises. In fact, there is at least one open issue to add fully deterministic builds to Nix: https://github.com/NixOS/nixpkgs/issues/9731

Migrating Debian to nix might be possible, if that was desirable. You could have a compatibility layer for a while, and once everything is using the compatibility layer (in about 200 decades?), you deprecate the old package management details.

It was discussed in the Debian mailing list long ago, when Nix was not so polished [1].

Honestly, I think it's a migration really worth making. Nix (and Guix) are quite mature now. The advantages they bring to the table are massive.

The whole Debian ecosystem would become a lot more integrated and robust. It would be possible to develop packages at their own pace, without having to keep all dependencies in sync with the whole package tree. Besides, no more dist-upgrade breaking your whole system. It would look a lot like a rolling release, but with none of its disadvantages.

It would be also possible to turn all Debian flavours into little declarative Nix blurbs. There are countless advantages.

[1] https://lists.debian.org/debian-devel/2013/02/msg00374.html

I think the risk of some software being kept on libraries many versions old and full of security issues is pretty significant, because the major impetus to force an upgrade (that the software won't even function without updating) has gone away.

Should this issue be addressed at a technical level or a policy level? No matter how they manage the distro, some stuff is going to come down to policy and process. To me it seems like the sort of thing that should be handled with package audits.

The solution might be to keep nix for system software only and have third-party developers deploy snaps or flatpaks. Auditing package dependencies against a list of invalidated hashes should be easy enough.

I do think that Nix or something like it is the future. Or at least it will be if the tech community as a whole doesn't screw up.

But I also really appreciate the massive effort that Debian maintainers make, and the sheer number of those maintainers.

Combining whatever human processes Debian have in place to keep that going with Nix would be fantastic. Right now, to use Nix regularly, you really have to be willing to read a lot of Nixpkgs source code.

Edit: I should also note that I do actually currently use Nix on top of Debian for my work machine. Servers are all NixOS machines deployed with NixOps though.

Makes me wonder what adding Guix build support to dpkg (and vice-versa?) would look like, or whether it would even be possible in a coherent way.

I think Nix is the superior package manager, but you have been able to use it with Debian for years now and not many people do, so the problem seems to lie somewhere else...

Nix doesn't integrate with Debian well - you can also use yum on Debian and nobody does that either. If you're using Debian, it's because you want to manage a Debian system with Debian packages.

I use Debian, but not because I like managing the system with Debian packages. I use Debian because it is the community-backed distro.

A community-backed distro, surely? There's also Arch, and others.

Yeah, nix doesn't feel right on debian because it can't manage services and doesn't use the "standard" packaging idioms. Is there someone else with a different experience or a workaround?

Two things:

1. Too little marketing. I only heard of Nix for the first time last year.

2. When you boot a virtual server, the UI usually offers you Debian, Ubuntu and CentOS/RHEL images, sometimes SUSE/SLES or Arch. So again, less visibility, and if the hosting provider does not allow you to bring your own image, it's a pain in the ass to install anything else.

3. (Going off topic for a second: I probably would have switched to Nix(OS) already if it wouldn't entail effectively abandoning the configuration management tool that I maintain.)

I think the marketing aspect has started to pick up recently. Whether through conscious effort or just word of mouth momentum I don't know.

I had this reflex too, but IIRC the reproducible-builds effort does a few things differently from (and beyond) Nix. I can't recall which, but there were meaningful differences.

The short version is that they're different kinds of reproducibility.

The reproducibility that this post is about is for auditability ("verify that the binary originated from the claimed source"), whereas the reproducibility in Nix is for reliability ("ensure that the same package either fails or succeeds on every system in the exact same way, regardless of what the environment looks like").

Both are nice to have, but they are only tangentially related.

Nix does not particularly care about the hash of the result of a build, mostly only about the hash of the inputs.

When you download a package from the binary cache of Nix, you use the input hash. The binary cache contents may differ depending on the system of the build server.

Nix does try to eliminate nondeterminism in its builds, though.
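A toy illustration of input-addressing (real Nix hashes a serialized derivation, which is considerably more involved): the cache key is derived only from what goes into the build, never from the bytes that come out.

```python
import hashlib
import json

def input_hash(name: str, src_hash: str, deps: list, builder: str) -> str:
    """Key a build by its inputs: two machines asking for the same inputs
    get the same cache key, even if their build outputs happen to differ
    byte-for-byte."""
    drv = json.dumps(
        {"name": name, "src": src_hash, "deps": sorted(deps), "builder": builder},
        sort_keys=True,
    )
    return hashlib.sha256(drv.encode()).hexdigest()[:32]
```

This is exactly why input-addressing alone doesn't give the auditability discussed in the article: the key says nothing about whether the cached binary really corresponds to the source.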
