NixOS Reproducible Builds: minimal ISO successfully independently rebuilt (nixos.org)
548 points by CathalMullan on Oct 29, 2023 | 173 comments



Rebuilding the minimal ISO from source is an impressive milestone on the journey to a system that builds from source reproducibly. Guix had an orthogonal but equally impressive milestone on the same journey recently[0], bootstrapping a full compiler toolchain from a single reproducible 357 byte binary without any other binary compiler blobs. These two features may one day soon be combined to reproducibly build a full distribution from source.

[0] https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...


That is amazing and it is great to see there are people out there fighting the good fight (while others ask: "but where's the benefit!? if there's a backdoor, everybody is still going to get the backdoor!").

> it gives us a reliable way to verify the binaries we ship are faithful to their sources

That's the thing many don't understand: it's not about proving that the result is 100% trustable. It's about proving it's 100% faithful to the source. Which means that should monkey business be detected (like a sneaky backdoor), it can be recreated deterministically 100% of the time.

In other words for the bad guys: nowhere to run, nowhere to hide.


To me, the largest benefit isn't even related to "bad guys", but rather in being able to understand and debug issues.

Reproducibility makes bugs more shallow. If Hydra builds a bit-for-bit identical ISO to what you build locally, that means a developer can make a change to the ISO inputs, test it, and know that testing will also apply to the final CI-built one.

If a user reports a bug in the iso, and you want to test if a change fixes it locally, you can start from an identical source-code commit as the iso was built from, make some minimal changes, and debug from there, all without worrying that you're accidentally introducing unintended differences.

It minimizes "but it works on my machine" type issues.


Super clarifying, thank you.


It's not yet as far as the Guix stage0, but there was an interesting talk about bootstrapping nix from TinyCC at NixCon: https://media.ccc.de/v/nixcon-2023-34402-bootstrapping-nix-a...


357 bytes for a bootstrap compiler binary is VERY impressive!


If I remember correctly, this tiny binary is used to (reproducibly) bootstrap the next binary, which bootstraps the next binary, until eventually GCC can be compiled (and compile other software).



To be fair, it is 357 bytes ... plus a POSIX operating system.

Still, that POSIX operating system bit is also being worked on.


Isn't that what builder-hex0 does?

https://github.com/ironmeld/builder-hex0


At 357 bytes, do you need a reproducible binary at all?

I'd think one could hand-document all 357 bytes of machine code and have them be intelligible.


This[0] is basically the hand-documentation of those bytes then. Handwritten ELF header and assembly code.

[0] https://github.com/oriansj/bootstrap-seeds/blob/master/POSIX...


Just had a read of this to see what it did... And I must admit, I don't understand what purpose this is supposed to serve.

All it seems to do is convert hex into binary and dump it to a file. Not sure how that's any more useful than just copying the binary for the next stage directly; after all, this binary had to get onto the system somehow.


The program also strips comment lines.

One could argue that this is just a kind of trick so they can say the next "binary" is actually a "source" file because it happens to be written by a human in ASCII.

Still, the distinction between what is a source and what is a binary becomes blurry at this low level. I believe the next stage of compiling, still written in ASCII-represented machine code with comments, adds support for labels and computes the offsets for jumps to those labels. Then more and more features are added until you have a minimal assembler letting you write somewhat machine-independent code, and you continue working your way up the toolchain.

So at which point does the translation from "source" to "binary" become a real thing and not just a trick of semantics? Is it when we have machine-independent assembly code? Is it when we computed offsets for labelled jumps? Is it when we started stripping comments out of the source code?
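For the curious, the whole translation step being argued about fits in a few lines. Here is a rough Python re-creation of what a hex0-style tool does (not the actual seed, which is handwritten machine code; the exact comment syntax is an assumption):

```python
def hex0(src: str) -> bytes:
    """Strip comments and whitespace, then turn hex pairs into raw bytes."""
    out = bytearray()
    for line in src.splitlines():
        line = line.split("#", 1)[0]   # discard everything after a comment marker
        for token in line.split():     # ignore whitespace between hex pairs
            out += bytes.fromhex(token)
    return bytes(out)

# "7f 45 4c 46" is the ELF magic; the comment is discarded.
print(hex0("7f 45 4c 46  # ELF magic").hex())  # prints "7f454c46"
```

The point of the sketch is how little there is to it: everything this program does, you could verify byte by byte by hand.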


Yeah, I kind of agree, but my issue is with this statement (in the link from the peer post):

> What if we could bootstrap our entire system from only this one hex0 assembler binary seed? We would only ever need to inspect these 500 bytes of computer codes. Every later program is written in a more friendly programming language: Assembly, C, … Scheme.

And my issue is that this isn't true. hex1 isn't written in assembler any more than hex0 is. Both of those bootstrap files can get onto the system simply by ignoring whitespace and anything after #, converting the hex into binary and writing it to a file.

Having hex0 doesn't add anything to the mix, other than being shorter than hex1, because you still have the same initial bootstrap problem of how you can prove that the hex0 binary represents the hex in its source vs the hex1 binary and its source and both have the same problem of needing to prove the hex in the source matches the assembly (and that the program even does what the comments claim).

hex1 is a more useful bootstrap point, because you can use standard system tools to create the binary from the source (e.g. sed) and also compile itself and verify that the files are the same.

Having hex0 and hex1 just means you need to manually verify both rather than just hex1.

I guess my point is that if you have so little trust in your system that you can't e.g. trust "sed" to create the original binary files, or trust the output of "od -x" or "md5sum" to verify the binary files, you also can't trust it enough to verify that the hex in those source files is correct or that the binary files match.


> Having hex0 doesn't add anything to the mix, other than being shorter than hex1, because you still have the same initial bootstrap problem of how you can prove that the hex0 binary represents the hex in its source vs the hex1 binary and its source

Well presumably you toggle hex0 in on the front panel and then type hex1 with the keyboard, which is easier than toggling in the binary of hex1.


It’s the first stage, likely piped, hence the hex output. The context on how it’s called is key: https://github.com/oriansj/stage0-posix-x86/blob/e86bf7d304b


Section 1.6.1 of the GNU Mes manual places these early stages assemblers into context:

https://www.gnu.org/software/mes/manual/mes.html#Stage0


That’s just the first stage. Simple enough to be audited manually.


Or tattooed on oneself! Or etched on a dog tag!


Nix is a great dog name


I get cat vibes from "Nix".


> bootstrapping a full compiler toolchain from a single reproducible 357 byte binary without any other binary compiler blobs.

wtf that is mind-boggling. Thanks for the link.


Classic HN, the top comment in a Nix post is about Guix.

Nix has more packages and advocacy (even if the vast majority of people exposed to nix/guix will never actually use it), but Guix is a lot more interesting to me with the expressive power of scheme on offer.

That said, there are some sharp edges[0] that seem a bit harder to figure out (is this just as inscrutable/difficult as nix?).

Does anyone have some good links with people hacking/working with guix? Maybe some blogs to follow?

I care more about the server use-case and I'm a bit worried about the choice of shepherd over something more widely used like systemd and some of the other libre choices which make Guix what it is. Guix is fine doing what it does, but it seems rather hard to run a Guix-managed system with just a few non-libre parts, which is a non-starter.

Also, as mentioned elsewhere in this thread, the lack-of-package-signing-releases is kind of a problem for me. Being source and binary compatible is awesome, but I just don't have time to follow the source of every single dependency... At some point you have to trust people (and honestly organizations) -- the infrastructure for that is signatures.

Would love to hear of people using Guix in a production environment, but until then it seems like stuff like Fedora CoreOS/Flatcar Linux + going for reproducible containers & binaries is what makes the most sense for the average devops practitioner.

CoreOS/Flatcar are already "cutting edge" as far as I'm concerned for most ops teams (and arguably YAGNI), and it seems like Nix/Guix are just even farther afield.

[EDIT] Nevermind, Guix has a fix for the signature problem, called authorizations![1]

[0]: https://unix.stackexchange.com/questions/698811/in-guix-how-...

[1]: https://guix.gnu.org/manual/devel/en/html_node/Specifying-Ch...


> Does anyone have some good links with people hacking/working with guix?

We've just had a conference about Guix in HPC: https://youtu.be/dT5S72x18R8

This is a recording of a stream for the second day with talks about large scale deployments of Guix System in HPC.


Thanks for this! Going to give it a watch :)


How long does a fully bootstrapped build take?


It obviously depends on the hardware, but IIRC for me maybe 3-4 hours building from the 357 byte seed to the latest GCC.

The early binaries are not very optimized :-)


With caching, just the time to download the artefact.


Doesn't caching completely defeat the point of bootstrapping? How do you know the cached artifact is correct? You have to build it manually to verify that, at which point you're still building manually...


Guix has tooling to verify binaries:

https://guix.gnu.org/en/manual/en/html_node/Invoking-guix-ch...

"guix build --no-grafts --no-substitutes --check foo" will force a local rebuild of package foo and fail if the result is not bit-identical. "guix challenge" will compare your local binaries against multiple cache servers.

I build everything locally and compare my results with the official substitute servers from time to time.


1. hash it

2. rebuild it without the cache

3. hash that

4. compare

Or, trust somebody who has. Inconvenient, but is there any other way to establish trust in the correspondence between code and a binary?
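The four steps above, sketched in Python with a stand-in build function (hypothetical, just to show the shape of the check; a real build is obviously much more than a hash of the source):

```python
import hashlib

def build(source: bytes) -> bytes:
    # Stand-in for a deterministic build: any pure function of the source works.
    return b"\x7fELF" + hashlib.sha256(source).digest()

def matches_cache(source: bytes, cached: bytes) -> bool:
    cached_hash  = hashlib.sha256(cached).hexdigest()    # 1. hash it
    rebuilt      = build(source)                         # 2. rebuild without the cache
    rebuilt_hash = hashlib.sha256(rebuilt).hexdigest()   # 3. hash that
    return cached_hash == rebuilt_hash                   # 4. compare

source = b"int main(void) { return 0; }"
print(matches_cache(source, build(source)))   # prints "True": bit-identical
print(matches_cache(source, b"\x7fELF evil")) # prints "False": tampered artifact
```

This only works at all because the build is deterministic; with an irreproducible build, step 4 fails even for honest artifacts.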


You have a hash that n trusted parties agree on. This is enabled by reproducible builds.


Stupid question as I never worked on something like this before: why isn't reproducibility the default behavior?

I mean if 2 copies of a piece of software were compiled from the same source, what stops them from being identical each and every time?

I know there are so many moving parts, but I still can't understand how discrepancies can manifest themselves.


There are many specific causes, time stamps probably being the most common issue. You can see a list of common issues here:

https://reproducible-builds.org/docs/

The main overall issue is that developers don't test to ensure they reproduce. Once it's part of the release tests it tends to stay reproducible.
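As a toy example of the timestamp problem and its standard fix: the SOURCE_DATE_EPOCH convention from reproducible-builds.org pins any embedded time to a value derived from the sources rather than the wall clock. (The `build` function here is made up; only the environment variable is real.)

```python
import hashlib, os, time

def build(source: str) -> bytes:
    # Toy build that embeds a timestamp, as many real toolchains do.
    ts = os.environ.get("SOURCE_DATE_EPOCH", str(time.time()))
    return f"{source}\nbuilt-at: {ts}".encode()

src = "print('hello')"

os.environ.pop("SOURCE_DATE_EPOCH", None)
a, b = build(src), build(src)                   # may differ: the clock moved on

os.environ["SOURCE_DATE_EPOCH"] = "1698537600"  # e.g. the commit date
c, d = build(src), build(src)
print(hashlib.sha256(c).digest() == hashlib.sha256(d).digest())  # prints "True"
```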


I agree, although I wouldn't describe the overall issue as developers not testing to ensure reproducibility. The reason most builds aren't reproducible is that build reproducibility isn't a goal for most projects.

It would be great if 100% of builds were reproducible, but I don't believe developers should be testing for reproducibility unless it's a defined goal.

As generalized reproducible build tooling (guix, nix, etc.) becomes more mainstream, I imagine we'll see more reproducible builds as adoption grows and reproducibility is no longer something developers have to "check for", but simply rely upon from their tooling.


It's also because the cost of making things reproducible is still too high.

We have the tooling, but it still takes a bit of effort from the developer's side to integrate those into their CI pipeline.

Eventually we will get to a place where this will be the default. It will be integrated into day-to-day tooling like `cargo release`, `npm publish`, ...




Loads of things. Obvious ones where the decision is explicitly taken to be non-reproducible include timestamps and authorship information. There are also other places where reproducibility is implicitly broken by default: e.g. many runtimes don't define the order of entries in a hashmap, and then the compiler iterates over a hashmap to build the binary.
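A tiny Python illustration of that hashmap pitfall: the iteration order of a set depends on hashing (and can vary across runs and platforms with hash randomization), so anything emitted in that order is nondeterministic unless you sort first.

```python
symbols = {"main", "init", "helper"}   # hash-ordered container

unstable = ",".join(symbols)           # order may vary between runs/platforms
stable   = ",".join(sorted(symbols))   # always the same

print(stable)  # prints "helper,init,main"
```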


I can see why devs would want "This Software was built on 10/10/2007 by bob7 from git hash aaffaaff" to appear on the splash screen of software.

How do you get similar behaviour while having a reproducible build?

Can you, for example, have the final binary contain a reproducible part, and another section of the elf file for deliberately non-reproducible info?


if you have a reproducible build, then the notion of "software was built on date by user" is kind of useless information, no? Because it does not matter - if you can verify that a specific git hash of a codebase results in a particular binary through reproducible builds, a malicious adversary could have built it yesterday and given it to me and I can be almost certain (barring hash collisions...) it's identical to what a known trusted team member would have built.

Having information about which git hash was used, as well as the time it was published, is part of the source distribution, so an output can contain references to these inputs and still be deterministic w.r.t. those inputs.

If you REALLY want to know when/who built something, you could add in an auxiliary source file which contains that information, which is required to build. Which is essentially what compilers which leverage current time do anyway, it's just implicit.
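As a sketch of that last idea: the metadata lives in a (hypothetical) checked-in source file, regenerated and committed at release time, so the build itself never consults the clock or the environment. The names and values here come straight from the examples in this thread.

```python
# build_info.py -- hypothetical, committed alongside the sources at release
# time, so the banner is a pure function of the checked-in inputs.
BUILD_INFO = {"commit": "aaffaaff", "commit_date": "2007-10-10", "built_by": "bob7"}

def splash_banner(info: dict) -> str:
    return f"Built from {info['commit']} ({info['commit_date']}) by {info['built_by']}"

print(splash_banner(BUILD_INFO))
# prints "Built from aaffaaff (2007-10-10) by bob7"
```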


The usecase is: user wants an easy way to know, from the GUI of some running software, exactly what build/version/git commit/branch/date they're running - perhaps to file a bug report for example.

The actual build date doesn't matter if the software is reproducible - but its a proxy for 'how out of date is this software'.


If you actually had reproducible builds, the build date would not tell you anything about how out of date the software is—you would only need the date of the source code the binary was built from. By definition, the binary you'd get from building a version of the source today would be identical to the version you'd get building it the day that version of the source was finished.


In that case, you can report the Git SHA and still be reproducible.


You can still include the git commit id and its date on the build.


Your source would also have to have a reference to which exact version of which compiler to use, which versions of which external headers to use, etc. and now you're inventing Nix.

Conceivably there could be a standard for a sidecar file to specify how something was built (e.g. nixpkgs commit hash, or all of the parameters that went into the build). Or content address the inputs, i.e. invent Nix again.

So we could solve this problem by having everyone standardize on using Nix.


Such standards do exist: https://slsa.dev/spec/v1.0/provenance


Yeah, "who built this" information belongs in a signing certificate that accompanies the build artefact, not in the artefact itself. The Git hash can certainly appear in the binary (it's a reproducible part of the build input), and the date can instead be e.g. the commit date, which is probably more relevant to a user anyway.


Much as I like Git, I'm not sure I like the idea of the artefacts depending on the git commit and therefore on the entire git history. I rather feel the artefacts should only depend on the actual source and not on a particular version control system used for storing the source.


You're welcome to include full sources, or not-tied-to-git directions to acquire them, with your release binaries.

Regardless, whether or not you do that is a discussion of distribution format, not binary reproducibility. Your distribution can contain as much (or as little) additional material as you like along with your release binaries.


Absolutely. I'm just stating my personal preference, that's all.


You can still include the git hash or a git tag/release version info, since the reproducer has the same git repo anyway.

But including timestamp of build would necessitate “spoofing” the timestamp by the reproducer to be the same as the original.


Parallelism. Some build actions are order-dependent, so the scheduling on the CPU can result in slightly different binaries, all of which are correct.


Why does this matter though? Why does order of compilation result in a different binary?


Just some random, made-up example: say you are compiling an OOP language that has interfaces and implementations of them. You discover reachable implementations through static analysis, which is multi-threaded. You might discover implementations A, B, C in any order — but they will get their methods placed in the jump table based on this order. This will trivially result in semantically equivalent, but not binary-equivalent, executables.

Of course there would have been better designs for this toy example, but binary reproducibility is/was usually not of the highest priority historically in most compiler infrastructures, and in some cases it might be a relatively big performance regression to fix, or simply just a too big refactor.


Because order of completion of the parallel tasks is not guaranteed, if all tasks write to the same file you might get a different result each time.
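A sketch of that in Python: "compiling" units in parallel and concatenating them in completion order is scheduling-dependent, while concatenating in a fixed source order is not. (`compile_unit` is a stand-in, of course.)

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed

def compile_unit(name: str) -> bytes:
    """Stand-in for compiling one translation unit to an object file."""
    return hashlib.sha256(name.encode()).digest()

units = ["foo.c", "bar.c", "baz.c"]
with ThreadPoolExecutor() as pool:
    futures = {pool.submit(compile_unit, u): u for u in units}
    finish_order = [futures[f] for f in as_completed(futures)]  # scheduling-dependent

racy   = b"".join(compile_unit(n) for n in finish_order)  # may differ run to run
stable = b"".join(compile_unit(n) for n in units)         # fixed source order
print(len(stable))  # prints "96": three 32-byte "object files"
```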


> There might be actions that are not order-independent, and the state of the CPU might result in slightly different binaries, but all are correct.

Well, no: that's really what reproducible packages show: there's only one correct binary.

And it's the one that's 100% reproducible.

I'd even say that that's the whole point: there's only one correct binary.

I'll die on the hill that if different binaries are "all correct", then none are: for me they're all useless if they're not reproducible.

And it looks like people working on entire .iso being fully bit-for-bit reproducible are willing to die on that hill too.


"Correct" does not mean "reproducible" just because you think lowly of irreproducible builds.

A binary consisting of foo.o and bar.o is correct whether foo.o was linked before bar.o or vice versa, provided that both foo.o and bar.o were compiled correctly.


See my reply to the sibling post — binary reproducibility is not the end goal. It is an important property, and I do agree that most compiler toolchains should strive for that, but e.g. it might not be a priority for, say, a JIT compiler.


Here is a very recent post from the Go team on things they had to do to make the Go toolchain fully reproducible.

https://go.dev/blog/rebuild


Sometimes it's randomized algorithms, sometimes it's performance (e.g. it might be faster not to sort something), sometimes it's time or environment-dependent metadata, sometimes it's thread interleaving, etc.


a very common one is pointer values being different from run to run and across different operating systems. Any code that intentionally or accidentally relies on pointer values will be non-deterministic


Would be nice if you could explain how/why this happens, given that normally, pointers aren't persisted.


Languages such as Standard ML and others (Scheme? Lisp? Not sure...) have implementations that can save the current state of the heap into a binary.

This is used in theorem provers, for example, so that you don't have to verify proofs of theorems over and over again (which can be very slow).

Instead, you verify them once, save the state of the heap to disk (as a binary ELF, for instance) and then you can run the binary to continue exactly where you left off (i.e. with all the interesting theorems already in memory, in a proved state).

This is what the HOL4 theorem prover's main `hol` script does, i.e. it runs HOL4 by loading such a memory state from disk, with the core theories and theorems already loaded.

Presumably, to make this reproducible you'd need to make sure that all the memory objects are saved to disk in a deterministic order somehow (e.g. not in memory address order, as it can change from run to run, especially when using multiple threads).

Edit: Presumably you'd also need to make sure that you persist the heap when all threads are idle and in a known state (e.g. with all timers stopped), to avoid random stack states and extraneous temporary allocations from being persisted, which would also affect the resulting binary.


Thanks, yeah. So I guess the concrete example I would cite here is that the most natural (and most efficient?) way of persisting std::map<ptr, ....> would introduce pointer ordering into the output.


Just like the most natural (and most efficient?) way of persisting any std::unordered_map<...> can result in a completely randomly-ordered output, due to a DoS mitigation that some commonly-used language runtimes have.


I think they meant if you cast a pointer to an integer, do some math on that and then store that. Then you will get a stored result that will likely differ from run to run.


That sounds like runtime differences, not a difference between two binaries.


The difference in binaries must be caused by some runtime difference of a compiler.


That's right, look at this thread for example:

https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20...

The Global Value Numbering pass in LLVM was iterating over `DenseMap<BasicBlock*, ...>`, so the iteration order was dependent on the value of BasicBlock pointers. This could lead to the same source files and compiler producing different binaries.


That’s runtime behavior


A surprising amount of compiler and program behavior depends on how pointer values compare.

These comparisons don't have to go the same way for everything to be correct.


I don't develop enough to give a particularly good answer, but one example I've heard of involves timestamps

Imagine the program uses the current date or time as a value. When compiled at different moments, the bits change.

Same applies to anything where the build environment or timing influences the output binary


Laziness and carelessness of compiler developers.


As others have mentioned, there’s sorting issues (are directory entries created in the same order for a project that compiled everything in a directory?), timestamps (archive files and many other formats embed timestamps), and things that you really want to be random (tmpdir on Linux [at least in the past] would create directories of varying length).

I’ve successfully built tools to compare Java JARs that required getting around two of those and other test tools that required the third. I’m sure there are more.


Sorry for being dense, but I thought one of the main reasons for NixOS's existence is reproducibility. I thought they had these kinds of things solved already.

I have only ~2 hours of experience with NixOS. I wanted to try Hyprland, and I thought it would be easier on NixOS since Hyprland needs a bit of setup, and maybe it's easier to use someone else's config on NixOS than on some other distro. Finding a config was hard too: I found like 3 in some random GitHub gists (thought there would be more...) and none of them worked, so at that point I gave up.


> Sorry for being dense, but I thought one of the main reasons for NixOS's existence is reproducibility. I thought they had these kinds of things solved already.

NixOS has the advantage that everything is built in its own sandbox with only its explicitly declared (and hashed) dependencies available, unlike in mainstream distros where it's the full system environment, so in many cases you already get the same binary every time. But this doesn't immediately lead to reproducibility, because the build process might be nondeterministic for various packages.


> unlike in mainstream distros

Debian has been building in a clean sandbox with only required, tracked dependencies for decades.

It's also building the large majority of packages reproducibly, including the binaries and whole installation packages (not just the sources, like NixOS).


> not just the sources like nixos

Not sure what you mean by that, the Nix packages that are reproducible have reproducible binaries.

In the NixOS world there isn't really a concept of a "binary/installation package" like in Debian or elsewhere. Everything can be rebuilt from source on any machine, but because everything is hashed, if the official binary caches have already built something with the same inputs, they can just give you the outputs directly. So it's more like memoization than a .deb or something that you install.

Nix is a functional language that builds recipes (derivations) to build stuff, with all the inputs and outputs hashed. If the derivation you want to build has already been built by a cache you trust, the system will just fetch it instead of building locally.

What the Nix reproducibility project checks is that the same derivation produces the same output regardless of what machine it's built on.
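The memoization shape, sketched in Python (a toy stand-in; Nix's real store hashing and derivation format are much more involved, and the `src` hash here is a placeholder):

```python
import hashlib, json

cache = {}  # stand-in for a trusted binary cache / substitute server

def drv_hash(drv: dict) -> str:
    """Hash every input of the recipe; identical inputs give an identical key."""
    return hashlib.sha256(json.dumps(drv, sort_keys=True).encode()).hexdigest()

def realise(drv: dict) -> bytes:
    key = drv_hash(drv)
    if key not in cache:                       # miss: "build" locally
        cache[key] = b"output-of-" + key.encode()
    return cache[key]                          # hit: reuse the output

drv = {"name": "hello", "src": "sha256-placeholder", "deps": ["gcc", "glibc"]}
print(realise(drv) == realise(drv))  # prints "True": second call is a cache hit
```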


> In the NixOS world there isn't really a concept of a "binary/installation package" like in Debian or elsewhere. Everything can be rebuilt from source on any machine

That's not actually the case. A derivation is just an abstract concept that combines a hash with a "thing". Here is an example [1] of a pre-compiled mono binary that gets downloaded and installed (after patching paths).

[1] - https://github.com/NixOS/nixpkgs/blob/0cbe9f69c234a7700596e9...


> unlike in mainstream distros where it's the full system environment

Usually packages are built in an environment which has only a minimal base system plus the package's explicit dependencies. They don't have random unnecessary packages installed.


This is a really good comment, I have no idea why it’s going grey.

Upvote from me FWIW.


There are two senses of reproducible.

The sense you're thinking of is that you can easily rebuild a binary package and it will use the same dependency versions, build options, etc. There should be no chance of a compiler error that didn't happen the first time (the old "but it worked on my laptop" syndrome).

The sense used here is that every build output is byte-for-byte binary identical. It doesn't depend on the machine name, the time it was compiled or anything like that (or, in a parallel build, the order in which files finish compiling). That is much harder.


> The sense you're thinking of is that you can easily rebuild a binary package and it will use the same dependency versions, build options, etc. There should be no chance of a compiler error that didn't happen the first time (the old "but it worked on my laptop" syndrome).

And that's just Nixpkgs, the packages themselves, which also work outside NixOS. NixOS has reproducibility of the entire system complete with configuration.


Yeah, Nix is a tough tool to learn. It's probably never the right tool to pick for "I just want something that works right now" if you're unfamiliar with it.

> I thought one of the main reason for nixos's existence is reproducibilty

NixOS uses "reproducible" to mean "with the same Nix code, you get the same program behaviour". This is more/less what people hope Dockerfiles provide.

This is the level of reproducibility you want when you say "it works on my machine" or "it worked last time I tried it".

Whereas "reproducible build" aims for bit-for-bit equality of artifacts built on different machines. -- With this, there's a layer of security in that you can verify that code has been built from a particular set of sources.

> Finding a config was hard too

What search query were you using? Searching "nixos configuration" on https://github.com/search?q=nixos%20configuration&type=repos...

Or searching for hyprland specifically, there seem to be many using that https://github.com/search?q=wayland.windowManager.hyprland&t...


> NixOS uses "reproducible" to mean "with the same Nix code, you get the same program behaviour".

Note that "Nix code" also includes the hashes of all non-Nix sources. One way to think of it is that Nix has reliable build cache invalidation.

> This is more/less what people hope Dockerfiles provide.

Indeed, but importantly they do not provide input-reproducibility (while Nix does) because, at least, there are no hashes for remote data.


PSA: you can build OCI images with Nix, and then they'll be a pure function of their inputs, like we've all wished was the case with Dockerfiles.


(And Nix derivations compose, whereas Dockerfiles entirely do not.)


I don't remember; some of them needed some other tools installed (like flakes, whatever that is). I looked for configs that didn't look like they'd need a few more hours of learning and setting up other tools before they'd work.

I just wanted to take a quick look at Hyprland; I imagined I'd just use an existing config, and I never thought it would need hours of research. Later I installed an Arch VM and managed to install Hyprland with some basic components in less than an hour, from the first guide I found.

Looks like I misunderstood what Nix was made for. I just want a system I can more or less set up with a simple config file.

I saw this OS; I haven't had time to try it yet, but I thought this is how Nix works. https://blendos.co/

For example, you just define GNOME like this; the Nix configs I found looked similar, they just didn't work.

> gnome:
>   enabled: true
>   style: light
>   gtk-theme: 'adw-gtk3'
>   icon-theme: 'Adwaita'
>   titlebar:
>     button-placement: 'right'
>     double-click-action: 'toggle-maximize'
>     middle-click-action: 'minimize'
>     right-click-action: 'menu'


I am on a similar journey.

I built https://github.com/mikadosoftware/workstation (hey, nearly 500 stars!) as an attempt at defining a reproducible laptop build.

I don't think Docker is the right level, so my next project when I have free time (!) is to do a box build that might then compile to Docker.

I think there is a sensible case for being able to define both developer workstations and servers via Nix.


Except it's Docker, and like virtually all Dockerfiles, it immediately runs "apt-get update", tossing reproducibility out the window.


yes, hence the (planned) move to flake.nix and then compile to docker.


Music to my ears! Godspeed, good luck!


> I just wanted to take a quick look at Hyprland; I imagined I'd just use an existing config, and I never thought it would need hours of research.

It shouldn't.

You'd want a simple flake to start with that has home-manager (for a higher chance of finding declarative best-practice configs and modules), and to add small things to that.

I imagine you tried grabbing someone's complex config, modifying it, and ran into issues?


Flakes will hopefully be that soon but I wouldn't recommend starting with flakes when learning Nix in 2023. They're experimental and you still need to learn most of flake-less Nix (except channels and NIX_PATH) anyways.

When I started learning/using NixOS about two years ago I found it useful to start out with just Nixpkgs (i.e. what you get out of the box) and only add libraries when I felt they would help me. My first configs were ugly as hell and full of bad practice, but the cool thing about Nix is that it gives you a lot of safety nets to enable experimentation and refactoring.


> Flakes will hopefully be that soon but I wouldn't recommend starting with flakes when learning Nix in 2023. They're experimental and you still need to learn most of flake-less Nix (except channels and NIX_PATH) anyways.

I've used Nix for a decade and wouldn't recommend the confusing and horrible user experience of Nix without flakes.

Additionally, if you are using github for code examples, you'll have far more success using flakes.

Many experienced people a new user would get help from, including myself, have long since washed their hands of pre-flakes issues and arcana like channel issues.


> Flakes will hopefully be that soon but I wouldn't recommend starting with flakes when learning Nix in 2023.

That Flakes provide a consistent entrypoint (and a consistent schema for such) into a codebase would have deferred a significant amount of confusion I had when getting started with Nix.

> They're experimental

The functionality as-is hasn't been changed. The 'experimental' flag itself hasn't been a practical problem.

However, flakes still have some rough edges & design problems to them, and there's some disagreement in the community over how flakes were rolled out.

I'd say for an end user, the benefits far outweigh the costs.

> ... and you still need to learn most of flake-less Nix (except channels and NIX_PATH) anyways.

I think the phrase "flake-less Nix" paints the wrong idea. I'd instead put it: most of what you need to learn about Nix is unrelated to whether the Nix evaluation started from a Flake or not.


Check out https://github.com/donovanglover/nix-config . Flake based config with hyprland and cool stuff.

> at that point I gave up.

NixOS is not for the weak or time constrained, currently. Hopefully it will be one day. Still if you push through, you reap the benefits.


Another good option: https://github.com/Misterio77/nix-starter-configs

I started with this one, the minimal version, then moved on to something more like the standard version, and now I'm moving on to something based on his much more complicated and flexible build in a different repo. I had been flailing, then this repo made it click.


Same here, that repo is fantastic.


His documentation has gotten better too! I actually just rebuilt my entire config based on his updated "standard" version. I want to use his "non-starter" config too, bc it seems remarkably powerful, but ... I need more time and brain to do that.

And, just to be clear, I rebuilt my entire config in less than an hour, copying and pasting where necessary, then it took about 20 minutes (!!) to download and setup all my software. Only one error, so I consider that a significant My NixOS Journey accomplishment! Granted, because it's NixOS, I have NO IDEA AT ALL what the error was, lol.


> Finding a config was hard too, found like 3 on some random github gists, thought there would be more..

That sounds odd, did you use github code search?

Find relevant home manager options:

https://mipmip.github.io/home-manager-option-search/?query=h...

Then search those on github:

https://github.com/search?utf8=%E2%9C%93&q=lang%3Anix+hyprla...

Note some option searches imply more casual or advanced users.


Nix is reproducible in the environment sense, meaning you can get the exact same setup every time, but not in the bit-for-bit sense, meaning that the compiled binaries would be identical.


For those wondering: it should be remembered that the reproducibility of Nix / NixOS / Nixpkgs has mostly meant reproducibility of the sources: if the sources change, one is warned, but this says nothing about the reproducibility of the binaries (which can change at each build). Binary reproducibility of Nix / NixOS / Nixpkgs is indeed not really tested, at least not systematically.

Guix, Archlinux, Debian do the binary reproducibility better than Nix / NixOS / Nixpkgs.

Sources:

- https://r13y.com/ ( Nix* )

- https://tests.reproducible-builds.org/debian/reproducible.ht... ( Debian )

- https://tests.reproducible-builds.org/archlinux/archlinux.ht... ( Archlinux )

- https://data.guix.gnu.org/repository/1/branch/master/latest-... (Guix, might be a bit slow to load, here is some cached copy https://archive.is/lTuPk )


To emphasize chpatrick's point below, there are two definitions of "reproducibility" in this context:

* Input reproducibility, meaning "perfect cache invalidation for inputs". Nix and Guix do this perfectly by design (which sometimes leads to too many rebuilds). This is not on the radar for Debian and Arch Linux, which handle the rebuild problem ("which packages should I rebuild if a particular source file is updated?") on an ad-hoc basis by triggering manual rebuilds.

* Output reproducibility, meaning "the build process is deterministic and will always produce the same binary". This is the topic of the OP. Nix builds packages in a sandbox, which helps but is not a silver bullet. Nix is in the same boat as Debian and Arch Linux here; indeed, distros frequently upstream patches to increase reproducibility and benefit all the other distros. In this context, https://reproducible.nixos.org is the analogue of the other links you posted, and I agree Nix reports aren't as detailed (which does not mean binary reproducibility is worse on Nix).

Your comment can be misinterpreted as saying "Nix does not do binary reproducibility very well, just input reproducibility", which is false. That's the whole point of the milestone being celebrated here!
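Output reproducibility of an individual package can also be probed locally; a sketch (the `hello` attribute is just an example, and this assumes a working Nix install):

```shell
# --check rebuilds the derivation and fails if the result differs
# bit-for-bit from what is already in the local store.
nix-build '<nixpkgs>' -A hello
nix-build '<nixpkgs>' -A hello --check

# Equivalent with the newer flake-enabled CLI:
nix build nixpkgs#hello
nix build nixpkgs#hello --rebuild
```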


> Your comment can be misinterpreted as saying "Nix does not do binary reproducibility very well, just input reproducibility", which is false.

It's only "false" as nobody has actually tried to rebuild the entire package repository of nixpkgs, which to my knowledge is an open problem nobody has really worked on.

The current result is "only" ~800 packages and the set has regular regressions.


I am probably misunderstanding your point, BUT I have actually depended on Nix for "reproducible docker images" for a confidential compute use case, so that all parties can independently verify the workload image hash. It rarely (actually only once) failed to produce bit-identical images; every other time it successfully produced bit-identical images on very different machine setups. Granted this is not an ISO but docker images, but I would say Nix does produce reproducible builds for many real world complex uses.

Ref: [1] https://gitlab.com/prateem/turning-polyglot-solutions-into-t... [2] https://discourse.nixos.org/t/docker-image-produced-by-docke...
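For reference, a sketch of this kind of setup (the package choice and fixed timestamp are illustrative assumptions, not the poster's actual config; `dockerTools.buildImage` is the nixpkgs helper assumed here):

```nix
# default.nix -- build with `nix-build`, load with `docker load < result`
{ pkgs ? import <nixpkgs> {} }:
pkgs.dockerTools.buildImage {
  name = "demo";
  created = "1970-01-01T00:00:01Z";  # fixed timestamp so layers hash identically
  copyToRoot = pkgs.buildEnv {
    name = "image-root";
    paths = [ pkgs.hello ];
  };
}
```

Because every layer is built from pinned store paths with a fixed creation time, independent parties can compare the resulting image hashes directly.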


I'm very sure you are actually just rebuilding the container images themselves, not the package tree you are depending on. Building reproducible ISOs, or container images, with a package repository as a base isn't particularly hard these days.


I see what you mean. Thanks for clarifying. Even so, Nix is no worse placed than those other distributions for bit reproducibility. Correct?


It's unclear at the moment because of the limited testing (minimal ISO and a Gnome ISO) vs Arch/Debian/Guix rebuilding entire package repositories.


I think you might want to read the article.

It's about bit-by-bit reproducibility of not just the binaries but also how they get packed into an ISO (i.e. r13y.com is outdated; the missing <1% was, as far as I remember, an _upstream_ Python regression, as reproducibility of the binaries themselves (ignoring the packaging into an ISO) was already there a few years ago).

Now when it comes to packages beyond the core ISO, things become complicated to compare due to the subtle but, in this regard, significant differences in how they handle packages; e.g. a bunch of packages you would find in the AUR on Arch are normal packages in Nix, and most of the -bin upstream packages are simply not needed with Nix.

In general Nix makes it easier to create reproducible builds, but (independent of Nix) this doesn't mean it's always possible, and it often needs patching, which often but not always is done. If you combine this with the default package repository of Nix being much larger (>80k) than e.g. Arch's (<15k non-AUR), comparing percentages isn't very useful.

Though one very common misconception is that the hash in the Nix store path is based on the build output; it's instead based on all sources (whether binary or not) used for building the binary in an isolated environment.

This means it has not quite the security benefit some people might think it has, but in turn this is necessary, as it means Nix can use software which is not reproducibly buildable in a way that still produces reasonably reproducible deployments (as in: not necessarily all bits the same, but all functionality, compiler configs, dependency versions, users, configurations, etc. being the same).
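The input-addressed vs. output-addressed distinction can be made concrete with a toy sketch (this is NOT Nix's real hashing scheme, just an illustration of the idea):

```python
import hashlib

def input_addressed_path(recipe: str, inputs: list[str]) -> str:
    # Path depends only on the build recipe and its inputs, so it stays
    # stable even if the build *output* changes bit-for-bit.
    h = hashlib.sha256()
    h.update(recipe.encode())
    for dep in sorted(inputs):  # canonical order over dependencies
        h.update(dep.encode())
    return f"/nix/store/{h.hexdigest()[:32]}-example"

def content_addressed_path(output: bytes) -> str:
    # By contrast, a content-addressed path is a function of the output alone.
    return f"/nix/store/{hashlib.sha256(output).hexdigest()[:32]}-example"

# Same recipe and inputs -> same store path, regardless of listing order:
assert input_addressed_path("gcc -O2 main.c", ["glibc", "gcc"]) == \
       input_addressed_path("gcc -O2 main.c", ["gcc", "glibc"])

# Different outputs (e.g. embedded timestamps) -> different content addresses:
assert content_addressed_path(b"build-1") != content_addressed_path(b"build-2")
```

With input addressing, a non-deterministic build still gets a stable path; with content addressing, any output difference is immediately visible in the hash.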


> but it is not a question of the reproducibility of the binaries (which can change at each build). This binary reproducibility of Nix / NixOS / Nixpkgs is indeed not really tested, at least not systematically.

Isn't that exactly what your first source and OP are about? They check that the binaries are the same when built from the same sources on different machines. The point is exactly that the binaries don't change with every build.

> How are these tested?

> Each build is run twice, at different times, on different hardware running different kernels.


Yeah, that represents maybe 1% of the packages in nixpkgs (only the installation iso).


Sure but the goal is the same, binary reproducibility, and it is systematic. It's just less far along than Debian.

Also I'm pretty sure a big percent of nixpkgs is already reproducible, we just don't know for sure.

They say the next step might be the GNOME-based ISO, which would be a big achievement because it's basically a full-featured system.


> Guix, Archlinux, Debian do the binary reproducibility better than Nix / NixOS / Nixpkgs.

Huh, didn't know that Arch Linux tests reproducibility. It's apparently 85.6% reproducible: https://reproducible.archlinux.org

I wonder how much work would be needed for NixOS, considering it has more than 80k packages in the official repository.


I think that's also a bit of an unfair comparison given the number of AUR packages you usually use on Arch. With nixpkgs there isn't a distinction between official and community packages.


Sure there is, the NUR has a few thousand community packages that are not ready for release

The nixpkgs are all official packages, it's just really easy to become a maintainer (you make a pull request adding the package you want to maintain)


I'm just saying that X% of arch official packages being reproducible isn't a complete statistic when many day to day things are in AUR, most of which are in nixpkgs not NUR.


AUR is unsupported, and the fact that nixpkgs decides to support everything is for them to decide.

Reaching for reproducible builds support in Arch is a more attainable goal than for nixpkgs. Properly maintaining 80k packages without regressions is going to be a lot more work in the long term.


That is not true at all, with respect to the aims or the reality of nixpkgs. The original post here is talking about reproducing the (binary) minimal iso, which contains a bunch of binary packages.


It is true. The original post writes about reproducing the minimal iso, which contains probably around 1% of the packages in nixpkgs. The remaining packages are not tested regarding binary reproducibility, or, at least, not in a systematic manner, which means regressions may happen regularly (which is exactly what happened with the .iso, see the previous announcement from 2021: https://discourse.nixos.org/t/nixos-unstable-s-iso-minimal-x... .)


While I would love testing reproducibility more systematically, it would not really have helped for the Python 3.10 regression: in this case we knew full well that it would break reproducibility even before we merged the change, but the performance advantage it unlocked seemed too big to ignore. Such trade-offs are luckily rare - I'm happy that with Python 3.11 we can now have both :)


r13y.com is outdated vs. https://reproducible.nixos.org/


Doesn’t the content-addressed derivation experimental feature address this issue? Instead of store hashes being input-addressed as you mention, the derivation outputs are used to calculate the store hash, which ensures binary reproducibility.


Ish. This is covered in section 6.4.1 of Eelco's thesis (https://edolstra.github.io/pubs/phd-thesis.pdf). It all becomes much simpler if evaluating a build many times can only ever result in one output, but the Nix content-addressed model does permit multiple outputs. In such cases, the system just has to choose a canonical output and use that one, rewriting hashes as necessary to canonicalise inputs which are non-canonical.


Not really:

With input-addressing you look things up in the store based on the input hash. You can determine the input hash yourself, but you have to trust the store to provide a response that corresponds to the sources. With Reproducible Builds you can have a third party confirm that output matches that input.

With content-addressing, you look things up in the store based on the output hash. You no longer need to trust the store here: you can check for yourself that the response matches the hash. However, you now have to trust whoever told you that output hash corresponds to the input you're interested in. With Reproducible Builds you can now have a third party confirm that output hash matches that input.

I have not worked with content-addressed nix in depth yet, but my understanding is that this stores the mapping between the inputs and their output hashes in 'realizations' which are also placed in the store. Reproducible Builds will still be useful to validate this mapping is not tampered with.


I find it funny(ironic) that the OpenBSD project is trying hard to go the other way, every single install has unique and randomized address offsets.

While I understand that these two goals, reproducible builds and unique installs, are orthogonal to each other, both can be had at the same time, the duality of the situation still makes me laugh.


If the address offsets can be randomized with a provided seed, then demonstrating reproducibility is still possible.

Alternatively, randomizing the offsets when starting the program is another way to keep reproducibility and even increase security; the offsets would change at every run.


OpenBSD does randomised linking at boot time. Packages themselves can still be reproducible. All the randomisation is done locally after the packages are downloaded and their checksums validated.


Now if only they would have maintainers sign packages like almost every other linux distribution has done since the 90s, so we have any idea if the code everyone is building is the same code submitted and reviewed by known individuals.

Until signing is standardized, it is hard to imagine using nix in any production use case that protects anything of value.


My impression of nix package maintainers is that they are providing a useful interface so that software can be composed easily. Most of them are not making any kind of assertion about the contents of the package. Expecting to use their signatures for anything meaningful strikes me as a bit like expecting product support from a delivery driver.

You don't need to trust it wasn't packaged maliciously, nix does reproducible builds so you can just look at the derivation and build it yourself if you don't feel like relying on the binary cache.

As for whether the underlying contents are malicious, that's between you and the developer. If other distributions have led you to believe otherwise, then I think they have misled you.

The only exception I can think of is Tails, and they don't exactly have the breadth that Nix does.


"Expecting to use their signatures for anything meaningful strikes me as a bit like expecting product support from a delivery driver."

And yet most of the packages from most major linux distributions are signed. If you are going to spend hours maintaining a package, it takes only an extra half a second to tap a yubikey to prevent someone from impersonating you.

Package maintainers from say Arch and Debian go through a vetting process, multiple people sign their keys, and it is a responsibility. Yes, it is volunteer, but there are also volunteer firefighters. Some volunteer jobs are important to keep others safe, and they should be done with care.

If Arch, Debian, Fedora, Ubuntu can all sign packages, then this excuse does not really hold for Nix.

"You don't need to trust it wasn't packaged maliciously, nix does reproducible builds so you can just look at the derivation and build it yourself if you don't feel like relying on the binary cache."

Reproducible builds and package definition signing solve totally different problems. Assume you trust that a given developer has been maintaining a package non-maliciously; then you see they made a new update, so you and other people trust it and build it. You get the same result, so you trust the binaries too. However, you still end up with malware. How? Simple. The developer's GitHub account was compromised due to a SIM swap on their email account while they were on vacation, and someone pushed a fake commit as that person.

Or maybe a malicious Github employee is bribed to serve manipulated git history only to the reproducible build servers but to no one else, so it goes undetected for years.

Supply chain attacks like this are becoming very common, and there is a lot of motivation to do it to Linux distributions which power systems worth billions of dollars regularly.

It is also so easy to close these risks. Just tap the damn yubikey or nitrokey when it blinks. It is wildly irresponsible not to mandate this, and we should question the motivations of anyone not willing to do something so basic to protect users.


Nix doesn't have maintainers sign anything because it isn't necessary. The Nix binary cache is built and signed, but that's done by builders only the NixOS foundation controls.

Individual maintainers just commit Nix code to build the packages... Do you mean you want their Git commits to be signed?

Edit: I guess that is what you mean. That is distinct from package (binary) signing. How do you know that a distro's repos were built from the signed commits? NixOS is actually better suited to prove that a particular commit produced the package.


Two people being able to build the same source and get the same result, does not mean the source that was built was the same source the developer originally contributed. Signing git commits would be a great start, easier than what other distros do to solve the same problem. With reproducible builds in place, the biggest remaining risk is someone tampering with git history.

Also packages being built by a central party is actually a problem. What stops someone with ssh access to the build systems from tampering with the results?
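The commit-signing half of this is already cheap to do with stock git; a sketch (the key ID and commit message are placeholders):

```shell
# One-time setup: tell git which key to sign with
git config --global user.signingkey ABCD1234
git config --global commit.gpgsign true

# Sign, then verify
git commit -S -m "pkgs/foo: 1.2 -> 1.3"
git verify-commit HEAD          # exits non-zero if the signature is bad or missing
git log --show-signature -1     # human-readable signature status
```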


Nix has two types of derivations (builds). Input addressed, and content addressed.

An input addressed derivation’s hash is based on the hash of all its input derivations and its own derivation. Therefore trusting the association between the cached binary and the derivation requires trusting the builder and a signature. All non-derivation inputs, like source code, must be content-addressed.

A content addressed derivation can then be produced easily by rewriting all the derivations with `nix make-content-addressed`. This doesn’t require trust / signatures as every stage of the build is now content-addressed. The final hash could be confirmed through social consensus of multiple distrusting parties.

There’s nothing in theory stopping you from starting with a content addressed derivation other than it being a pain in the ass as you’d have to know the output hash before you built it, or TOFU (trust on first use) it which is then just the same as using the `nix make-content-addressed` approach.

I’m not sure why you think commit signatures are required. Git is content addressed, you can’t tamper with the derivations without changing the hashes and nixpkgs is primarily developed through GitHub. If someone has access to your SSH keys or GitHub account password it stands to reason they’d have access to your GPG keys too.
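A sketch of that rewriting step (in recent Nix the command lives under `nix store`; older references call it `nix make-content-addressed`):

```shell
# Build an input-addressed package, then rewrite its closure so every
# store path is derived from its own contents.
nix build nixpkgs#hello
nix store make-content-addressed ./result
```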


I totally grant that the nix system is good at making sure that a given git commit produces a given result. They do a fantastic job at everything from miles 2-10. They just skip the first mile of integrity, which is also one of the easiest to tamper with.

9/10 popular package developers I audit have SMS as a backup account recovery method on their email accounts. Easy to see when I try to reset their email passwords and it says "sms sent to ....". One sim swap and I have the account of a popular package maintainer.

Or, even easier, developers often use custom email domains. Right now there are -thousands- of custom email domains used by package maintainers in NPM for instance, that are expired. Buy domain, and you buy access to contribute commits to an unpopular unnoticed package that a popular package depends on.

Supply chain attacks are my core area of research, and they are easy to do. They are hard to defend against unless you have reproducible builds already. It is crazy NixOS has done so much good work for supply chain integrity and refuses to do the first mile almost every other popular linux distro already makes at least some attempt at.

As for PGP key theft, most people that use PGP today keep the keys on personal HSMs, like a yubikey or nitrokey, such that the private key never comes in contact with the memory of an internet connected computer. Stealing one in most cases would require keylogging the pin, and physically stealing the key.

If you got malware on someones machine, you could still manipulate them at that point in time, though this still will require timing your attack so that they are online and tricking them to tap their key, which dramatically increases attack complexity.

Next level once people sign commits though, is signing code reviews. Then you have removed all single points of failure from your software supply chain.


You're talking sense, but this is due diligence for a developer, not for an operating system or a package manager.

You're free to map your package definitions to the commits they contain and verify any signatures that you find there, but that process will have nothing to do with whether other people have signed the instructions that your machine follows to fetch and build the contents of that commit.


Sure, I could write a lot of tooling to try to do this sort of basic verification manually in Nix every single time I pin a new package, but at that point why am I using Nix over distros that have native support for maintainer-level supply chain integrity?

Compare to the Arch model where all official packages must be signed with keys belonging to a reasonably well vetted web of trust.

Developers outside the web of trust that want to contribute yolo unsigned packages to Arch still can, but those must go into AUR where users must opt-in to and manually review each individual untrusted/unsigned package.

https://wiki.archlinux.org/title/Pacman/Package_signing
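For context, the policy behind that link boils down to a couple of lines in /etc/pacman.conf (shown with Arch's default values):

```
SigLevel = Required DatabaseOptional
LocalFileSigLevel = Optional
```

`Required` means pacman refuses to install any official package whose signature doesn't chain back to the trusted keyring.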

Nix decided to have the yolo AUR model by default, with no method to elect to use only signed packages, because signing and web of trust are not even supported at all, even optionally.

This is wildly irresponsible given how many people use Nix today.

Nix is two steps forward in deterministic, immutable, and unprivileged package management, and one giant leap backwards in supply chain integrity.

This is why we cannot have nice things.


Nix is two steps forward in these ways because they chose to focus on making things composable and repeatable instead of being curators of quality and trustworthiness.

One magical thing about Nix is that there's a very small divide between managing to install software in the first place and creating an artifact that others can use for the same purpose.

Because of this, practically every NixOS user has their configuration in source control--we're basically each building our own Linux distro with only packages that we trust. Some of us probably sign those commits too.

The ecosystem is useful because it encourages this kind of participation. Having a list of privileged maintainers would interrupt this.

It's unsurprising that a group of people who have worked quite hard to make this possible would be uninterested in creating a scheme whereby they are now responsible for determining which of their users' creations are legitimate. Nix attracts users who are interested in that sort of thing, so let it be a userspace problem.

If you want to curate a list of trustworthy packages and work with their developers to set up a chain of trust that starts in a yubikey and extends to a signature in a flake output, then I'll help because that sounds like useful work, but I wish you would stop criticizing a brick for not already being a house.


My criticism is not so much on NixOS itself. If it wants to be a fun, composable, and easy to work on with minimal participation friction, fair enough. The yolo approach to contribution integrity has clearly resulted in fast development of ideas that might have been unable to grow in other distros.

I suppose where my, perhaps misdirected, anxiety comes from is that I run a security consulting company and see NixOS being used as-is in high risk applications, to compile binaries responsible for protecting peoples property or safety, and on the workstations of production engineering teams. Places where it has no business being because supply chain integrity is a non goal.

Maybe you are right though, and the answer is not trying to add supply chain security practices onto a community that does not want them, but to create a security focused fork of that distro that can inherit all that great community work and be a drop-in replacement for NixOS in environments where supply chain security is of critical importance.

I tried and failed to get buy-in for even -optional- expression integrity support back in 2018 and gave up on nix after that. https://github.com/NixOS/rfcs/pull/34

I did prototype a git multisig solution in the years following that though: https://git.distrust.co/public/supsig https://git.distrust.co/public/git-sig

It is already being used in production by some as nothing better exists atm.

A security focused Nix fork that imports, reviews, and git-sig (or similar) signs commits from NixOS and has signature verification built into the package manager, is probably the only way forward, and I would be willing to ally with others interested in this.


"A fork of nix" sounds awful drastic. Seems like you just need a curated list of packages which conform to some standard re:

- getting a dev sig when they get the code

- checking that sig

- adding a packager sig

And then you need to somehow inject a check before the user comes in contact with the outputs:

> are these signers in my trusted list?

Your users can then enable dev-sig-mode and point their sig checker at the list of keys. Hopefully that's less than whatever a fork of the whole OS entails.

Rather than framing it as a move towards signatures for all of NixOS, I'd frame it as: you have a community of users who are willing to maintain a list of trusted keys and work with developers to standardize signature hand-off, and you want to add experimental features to serve the little pocket of Nixdom that you're carving out for those users.

Avoid anything that smells like you're expecting NixOS act as an authority over which packages are trustworthy and for its devs do the political work of maintaining that list on behalf of the users. Many of us have landed at NixOS because we want less of that top-down nanny business, but we probably like the idea that users would themselves configure such a list.

I can maybe help come up with a standard flow for the signature hand-off, but the harder thing will be intercepting the myriad ways that a user might come in contact with derivation outputs and putting a sig check in their path. You may need another ally that knows the guts better than I do, not just the packaging side of things.

Hopefully we don't have to tamper with nix-env and nix-shell and `nix build` and `nix run` and `nix develop` all separately. Hopefully there's some common place where inbound bits can be checked regardless of how the user has asked for them.


I went to Nix liking everything about it except the fact there was no way to even -optionally- add signatures to NixOS core component contributions and package expressions, let alone verifying either of those as an end user.

I was informed the way to get changes into Nix was go ask the Nix Gods in the form of an RFC. I did that, and there was some interest, but ultimately it descended into bike shedding about optimal signing schemes and the idea was ultimately rejected by said Nix Gods. I felt very nannied to be honest, and felt I had to give up on nix.

I have had no viable alternatives but to use Arch and Debian for high security build systems across the industries I service and write all sorts of kludgy tools to hash pin packages to get deterministic results since that is what I can prove the integrity of back to the authors.

If you feel you have the social capital with nix leadership to try another RFC for an optional signature verifier hooked into nix user tools for both nix core and expressions, with optional signing from either authors or community members, that would be welcome progress and would certainly make me give the path of directly working with the nix community (vs forking) a second look.

My ideal would be to point nix at a repo of public keys, and nix would not download or execute anything whose supply chain did not start with those keys. This would be incredible for say docker build containers that only need small subsets of packages like compilers for a start... and then eventually a full bootable set of packages supported.

I would happily review any such proposals are capable of meeting threat models requiring high supply chain security.

Contact info on https://lance.dev


I certainly don't have that social capital right now, but I do intend to be hacking around in these areas in the future. If I get something useful working then I will probably be in a position to build that capital.

Because now that we have reframed it as something that puts users in explicit control of their trust graph and not something that would make NixOS like all the other distros, it's a feature that I'd like to use.

----

I'm working on a data annotation layer which I intend to apply as a filesystem and use nix as my test case. Edits are conceived of as annotations, so there's this sense that every file can be built by following a path of edits from the empty file. This makes for slow reads, but near-instant copies (you're just adding a new pointer at the same position in the edit graph as the file you're copying). It's a strange design choice, but it would solve some problems when using nix flakes with large repositories (everything gets copied into the nix store and it can take a while).

Signatures are exactly the sort of thing I was imagining living in an annotation. Ideally, the annotation adheres to patterns in the data and not the named file, so if the package applied a patch which invalidates the signature then the signature doubles as a link to the original, which can be diffed with the patched version and the user can be shown why it fails, not just that it fails. It's a long shot but that's the dream.

With a bit of luck I'll find a more elegant way to handle this without playing whack-a-mole with each user facing utility that builds a derivation. Maybe some kind of dashboard which shows you which files the system is rendering and whether they have associated signatures (or other metadata, known malicious, etc). The challenge, in the signature case, will be knowing which files are ok to be unsigned and which need to fail on read if not signed. Certainly we can't require a separate signature for every file we render.

It might be a long time coming though, this work proceeds on weekends and holidays and it's pretty far from a useful state. I'm still fiddling with tuning the rolling hash based fragmentation algorithm such that files are constructed out of right-sized fragments which end up being reused if the files are similar.


There are content-addressed derivations that don't require specifying the hash up front either.


> The developers github account was compromised due to a sim swap on their email account while they were on vacation, and someone pushed a fake commit as that person.

...which is why it's irresponsible to sign packages unless you have a strong enough relationship with the developer (or are that developer) such that you would notice the malicious release. Signing packages without that level of trust creates a false sense of security in the user.

The Nix approach is to solve the packaging problems in a transparent and verifiable way so that the users can more quickly move from scrutinizing the package to scrutinizing the underlying software--which is where the scrutiny is most helpful anyway.

> we should question the motivations of anyone not willing to do something so basic to protect users.

There are a lot of valid criticisms you could reach for about Nix or its community, but lazy about security just isn't one of them. Our strategy involves different trade-offs than is typical, but that doesn't make them negligent or malicious.


If a Github account was compromised, the attacker would still be unable to sign with the key that user has historically used. If all that was done was pin keys of developers and sound alarm bells if the key changes or changes revert to being unsigned, then this is still of significant value as a first step.
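The "pin and sound alarm bells" scheme is simple enough to sketch (a hypothetical helper illustrating the idea, not an existing Nix feature):

```python
# Hypothetical sketch of trust-on-first-use key pinning: remember the
# first signing key seen per package and raise an alert if the key
# changes or the package reverts to being unsigned.
def check_key(pins, package, fingerprint):
    pinned = pins.get(package)
    if pinned is None:
        if fingerprint is not None:
            pins[package] = fingerprint  # first sighting: pin it
            return "pinned"
        return "unsigned"
    if fingerprint is None:
        return "ALERT: previously signed package is now unsigned"
    if fingerprint != pinned:
        return "ALERT: signing key changed"
    return "ok"
```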

I agree most of the value of signing happens if we establish a web of trust with signed keys of known developers with a history of behaving non maliciously -and- also have signed code review by people in the same trust network to limit risk of coercion, or malware stealing yubikey taps on an endpoint.

Also, saying they are lazy about security is unfair. They just invested in areas of security most distros ignored, but sadly at the expense of the security most distros get right. The regression feels irresponsible, but as I said in my other post maybe we need to separate concerns between the AUR-style rapid development expression repository and a new separate one select packages graduate to that have web of trust with well known maintainer/reviewer keys.

I could get behind that.


What distros do this? As far as I can tell Debian doesn’t anymore (the build system signs builds), Fedora doesn’t, and Arch…I can’t tell but I don’t think it does. But maybe I’m wrong.

The NixOS build system signs all build outputs with its key and the signature is verified upon download. If you’re paranoid, Nix at least makes it easy to just build everything from source.


Yes packages are reproduced and signed. I get that and this is fantastic. That is not the big gap I am talking about which is making sure the expressions themselves even use the correct sources and are not tampered with. Making sure the central signing key is not compromised is pretty critical too, but that can be mitigated with independent reproduction servers that also sign, requiring a minimum of 3 before a new package is trusted.
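The mitigation could be as simple as a threshold check across attestations (a sketch of the idea, not an existing feature):

```python
from collections import Counter

# Sketch of the quorum idea: only trust a build output once at least
# `threshold` independent reproduction servers attest (by signing) to
# the same output hash, so a single compromised signing key is not
# enough to push a malicious build.
def quorum_reached(attestations, threshold=3):
    # attestations: {builder_name: output_hash_it_signed}
    counts = Counter(attestations.values())
    return any(n >= threshold for n in counts.values())
```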

As for Debian, maintainers must still all be part of the web of trust with an application process and a registered key in the web of trust to contribute packages. https://wiki.debian.org/DebianMaintainer#step_5_:_Keyring_up...

Arch also still has each maintainer sign the packages they maintain, and there is a similar approval process to earn enough trust in your keys to become a maintainer, so there is a decentralized web of trust. Then you add reproducible builds on top of that to almost complete the story.

Signed reviews would remove the remaining SPOFs in debian and arch but sadly no one does this. Nix however does not even allow -optional- key pinning or verify signatures of any authors that choose to sign expressions or their commits, so NixOS is way behind Arch and Debian in this respect.


Ah so you’re talking about the package sources (e.g. Nix files, PKGBUILD, etc.). I guess that’s closer to commit signing, I think? Well Nixpkgs maintainership is much more bazaar than cathedral, for better or worse, and that stymies some of that. However most Nixpkgs maintainers do not have commit access at all. Nixpkgs is at least moving the right direction (I think) by limiting the power of committers to submit changes without review. Should Nixpkgs committers sign their commits? Probably, yea.


Very impressive milestone, congrats to those who made this possible!

> [...] actually rebuilding the ISO still introduced differences. This was due to some remaining problems in the hydra cache and the way the ISO was created.

Can anyone shed some light on the fix for "how the ISO was created"? I attempted making a reproducible ISO a while back but could not make the file system create extents in a deterministic fashion.


For NixOS, it's in the 'how did we reproduce' section of the article: the last step of that process produces the iso in the ./result/iso directory.

It sounds like what you're looking for is the commands that that build invoked, but I'm not sure what step you're looking for. For example, the xorriso invocations are at https://github.com/NixOS/nixpkgs/blob/master/nixos/lib/make-...


Don't you have to fake the system time to do this? The time often ends up inside the binaries one way or another.


Indeed, timestamps are probably the most common source of nondeterminism. So common that a de-facto standard variable to fake a timestamp has been implemented in many compilers:

https://reproducible-builds.org/docs/source-date-epoch/
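A minimal sketch (not the spec's reference code) of how a build tool might honor that variable: use SOURCE_DATE_EPOCH in place of the wall clock when embedding a "build date" string.

```python
import os
import time

# Sketch of SOURCE_DATE_EPOCH handling: prefer the environment variable
# over the wall clock, and format in UTC so the output is identical on
# every machine.
def build_timestamp():
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    t = int(epoch) if epoch is not None else time.time()
    return time.strftime("%b %d %Y %H:%M:%S", time.gmtime(t))

os.environ["SOURCE_DATE_EPOCH"] = "0"
print(build_timestamp())  # prints "Jan 01 1970 00:00:00"
```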


Could you name an example of how (and for what reason) this might happen?


Typically part of a "version string":

    $ python3
    Python 3.10.7 (main, Jan  1 1970, 00:00:01) [GCC 11.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
Perhaps a relic from when software had to be manually updated?


On NixOS, I think the release time or commit time is used:

    $ python3
    Python 3.10.11 (main, Apr  4 2023, 22:10:32) [GCC 12.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
That is more useful than the build time.


How is that possible? Is nixpkgs an input to the Python derivation? Or do packagers "hard code" a value every time they modify the Python build code? Automated tooling that sets it after pull requests? Something else? :-)


GCC respects SOURCE_DATE_EPOCH, and Nixpkgs has specific support for setting that environment variable: https://github.com/NixOS/nixpkgs/blob/92fdbd284c262f3e478033... (although I haven't proved that this is actually how it works for cpython's build).

Irrelevant spelunking details follow:

That string is output by cpython to contain the contents of the __DATE__ C macro (https://github.com/python/cpython/blob/fa35b9e89b2e207fc8bae... which calls to https://github.com/python/cpython/blob/fa35b9e89b2e207fc8bae... which uses the __DATE__ macro at https://github.com/python/cpython/blob/fa35b9e89b2e207fc8bae... ).

Cpython is defined in nixpkgs at https://github.com/NixOS/nixpkgs/blob/92fdbd284c262f3e478033... which I imagine (but haven't proved) uses GCC.


Thank you! Setting SOURCE_DATE_EPOCH to the most recent file timestamp found in the source input is a clever hack.


The source for the cpython build is the release tarball (https://github.com/NixOS/nixpkgs/blob/master/pkgs/developmen...).

In that case, NixOS sets SOURCE_DATE_EPOCH (which I suspect will be picked up by the python build) to the latest timestamp found in that archive (https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-supp...)
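The essence of that setup-hook logic fits in a few lines; this is an illustrative sketch, not the actual Nixpkgs implementation:

```python
import tarfile

# Illustrative sketch (not the actual Nixpkgs shell hook): derive the
# value for SOURCE_DATE_EPOCH as the newest member timestamp found in
# the source tarball, so the "build date" is pinned to the release
# rather than to whenever the build happens to run.
def source_date_epoch(tarball_path):
    with tarfile.open(tarball_path) as tar:
        return max(member.mtime for member in tar.getmembers())
```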


2023-04-04T22:10:32 is the timestamp of Python-3.10.11/Misc/NEWS from https://www.python.org/ftp/python/3.10.11/Python-3.10.11.tar...


GCC embeds timestamps in o/gcno/gcda files to check they match.

It's mostly annoying as gcov will actively prevent you from using gcda files from a different but equivalent binary than what generated the gcno.


You would either not include the timestamp at all, or set it to 0, so the build date is 1970 everywhere.


Wouldn't this help solve the problem Ken Thompson wrote about in 'reflections on trusting trust?' If you can fully bootstrap a system from source code then it's harder to have things like back-doored compilers.


It indeed helps, but it is not a full 'solution': you could still in theory envision elaborate backdoors in the 'environment' in which the ISO is built. If you really want to 'solve' the problem described there, you could look into Diverse Double Compiling (https://dwheeler.com/trusting-trust/) or bootstrapping the entire environment (https://bootstrappable.org/) - see also the 'Aren’t there bootstrap problems with the above approach?' section of the post.

Reproducing the build already goes a long way in making such attacks increasingly unlikely, though.


I've lived in the Red Hat ecosystem for work recently. How does this compare to something like... Fedora Silverblue? Ansible? Fedora Silverblue + Ansible?


The closest equivalent to the nixos ISO builder and reproducibility related to it in the fedora ecosystem is osbuild / imagebuilder - https://www.osbuild.org/guides/introduction.html

Imagebuilder claims reproducibility, but as far as I know it mostly installs rpm packages as prebuilt binaries, not from source, so it's not really proper reproducibility unless all the input packages are also reproducible.

If the descriptions of building packages from source, building distro images, and reproducibility in the linked thread didn't make sense to you, you're probably not really the target audience anyway.


Nix is a declarative OS, where you describe what the OS should look like, instead of Ansible where you give the OS steps to follow. Silverblue and Nix are orthogonal aside from being Linux distributions--Silverblue is attempting to change how software is delivered using only containers on an immutable host.

If you're interested in an Ansible alternative that uses Jsonnet and state tracking to somewhat mimic Nix, check out Etcha: https://etcha.dev


> Nix is a declarative OS

I think precision is important.

"Nix" refers to the package manager (and the language the package manager uses).

Whereas it's "NixOS" that's the OS which makes use of Nix to manage the system configuration.


Thank you. This is important. Too bad our website doesn't make it clear at all.


Ansible makes mutable changes to the OS, task by task.

Nix is immutable. Each change produces an entirely new system generation, and only after the build succeeds are all packages "symlinked" into the current system.

Fedora Silverblue is based on ostree [1]. It works similarly to git, but on your root tree. However, it requires you to reboot the whole system for the changes to take effect. Since Nix is just symlinked packages, you don't need to reboot the system.
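The symlink switch can be done atomically, which is what makes rebootless activation safe. Roughly (a simplified sketch, not NixOS's actual activation script):

```python
import os

# Simplified sketch of NixOS-style activation: build the new generation
# out of band, then atomically repoint the `current` symlink via
# rename, so readers always see either the old or the new system tree,
# never a half-switched mix.
def switch_generation(current_link, new_generation):
    tmp = current_link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(new_generation, tmp)
    os.replace(tmp, current_link)  # rename(2): atomic on POSIX
```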

More detailed explanation here [2].

[1]: https://github.com/ostreedev/ostree

[2]: https://dataswamp.org/~solene/2023-07-12-intro-to-immutable-...


This is a great explanation of the technical differences between the available options. What are the practical differences - is one option better from a maintenance and usability standpoint for creating systems that are reproducible?


I love that there are people out there who care about things like this.



