Hacker News
Reproducible Arch Linux (vdwaa.nl)
243 points by nwah1 11 months ago | 48 comments



How does this relate to what NixOS is doing? I've been casually following it. I know they build everything from source and then build results get hashed. The hash is not just the output, but also contains all input of the build. The assumption is that once a package and its dependencies are built, it will end up producing the same output.

Is the difference then that reproducible Arch or, say, Debian would ensure that the source, when compiled, would produce the same binary output on _any_ machine of the same type? As opposed to say maybe NixOS would just make the output different if it runs on a machine with a hard drive mounted on a different directory or the cpu has more cores?


Nix allows you to perfectly reproduce the on-disk inputs going into a build, and also completely isolate the build by heavily sandboxing it. But build processes, being a large and complicated function, may not give you bit-for-bit identical output even if all their on-disk inputs are bit-for-bit identical. So it's a good start, but not enough.

Distributions like Debian and Arch need to put in a fair bit of effort to reach the same level as Nix. But, in a sense, reaching that level is the "easy" part - it's an interesting technical challenge.

The hard part of the reproducible builds effort, the part which Nix doesn't give you, and the part which developers from Debian, Nix, and other distros are still actively working on, is ensuring that bit-for-bit identical on-disk inputs, along with sandboxing, will give you bit-for-bit identical outputs. That is a task which requires cleaning up the build process of the entire open source world.
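One classic way identical on-disk inputs still produce different outputs is directory order: Python documents `os.listdir` as returning entries in arbitrary order, and the order can differ between filesystems. Here is a minimal sketch, assuming a hypothetical build step `bundle` that concatenates object files in whatever order a directory scan hands it; the file names and bytes are made up for illustration:

```python
import hashlib


def bundle(files: dict[str, bytes], listing: list[str]) -> bytes:
    """Hypothetical build step: concatenate files in the order given.

    `listing` stands in for whatever a directory scan returned, which is
    not guaranteed to be stable across filesystems or machines.
    """
    out = bytearray()
    for name in listing:
        data = files[name]
        # Length-prefix each entry so the bundle is unambiguous.
        out += name.encode() + b"\0" + len(data).to_bytes(8, "big") + data
    return bytes(out)


def digest(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


# The same two files, as two machines' directory scans might list them.
files = {"a.o": b"\x01\x02", "b.o": b"\x03\x04"}
scan_on_machine_1 = ["a.o", "b.o"]
scan_on_machine_2 = ["b.o", "a.o"]

# Naive: bit-for-bit identical inputs, different outputs.
print(digest(bundle(files, scan_on_machine_1)) == digest(bundle(files, scan_on_machine_2)))  # False

# Fix: impose an order that does not depend on the filesystem.
print(digest(bundle(files, sorted(scan_on_machine_1))) == digest(bundle(files, sorted(scan_on_machine_2))))  # True
```

Sandboxing cannot catch this by itself: the files are identical, only the order a tool visits them in changes. That is why fixes like this one have to land in each project's build process.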


We are all still living on 30 year old package management systems and we need a modern approach.

I still have hope for flatpak. I want a system where the folder is the application and deleting the folder is the way you remove applications. The devil though is in the details of how or what is shared.

Sadly flatpak and snap don't seem to have much community support. macOS (OS X) seems to be the closest to this idea of the folder being the application. Trying Nix it is just impractical, but I liked the idea.


I still don't understand the fascination with flatpak, snap, etc. In my experience the package managers for Debian, Fedora, and Arch Linux work just fine. I really would prefer not to have 20 different minor versions of the same packages installed because the maintainers were lazy and used fat packages. Fat packages may be defensible for games (e.g., to support Steam), since my understanding is that games have the most quirks of all programs. Nix and Guix do intrigue me though as a potential replacement of the various Linux package managers.


> Nix and Guix do intrigue me though as a potential replacement of the various Linux package managers.

I am looking at those as well. One appeal is that they allow precise tracking of dependencies and seem to support Mac as well, so they could make it easier to set things up on developers' machines.


> I still don't understand the fascination with flatpak, snap, etc. In my experience the package managers for Debian, Fedora, and Arch Linux work just fine.

It isn't fine for developers or for the Linux ecosystem.

Sure, it works for users, who get to install the end product of what has been a complex development and packaging process.

The problem is exactly that it is hard to distribute applications in Linux.

> Flatpak is the next-generation technology for building and installing desktop applications. It has the power to revolutionize the Linux desktop ecosystem. https://flatpak.org/


Application developers aren't supposed to be distributing applications. Obviously this is unrealistic, and fat packages are a pretty good solution, but they're best positioned to be a supplement to, rather than a replacement for, distro package managers.

> The problem is exactly that it is hard to distribute applications in Linux.

For who? Distributions generally seem to like their own packaging processes. You can distribute a package on the AUR in a few lines of bash and it's filled to the brim with software because of it.


> For who? Distributions generally seem to like their own packaging processes.

For me, and for the lack of software on Linux. I am talking about making it possible for Linux to finally get some of the killer apps we currently can't get our hands on.

Flatpak is the replacement for applications and not OS specific packages. You would still use a package manager for OS related packages and for flatpak/snap and then flatpak/snap installs and uninstalls applications. Much better system for everyone.

The complexity and diversity of deployment is the number one issue for applications on desktop Linux. This is why we don't have Adobe applications or many others. I use several commercial programs, and to them deb = Linux; I use openSUSE, so I constantly have to jump through hoops until they start supporting rpm, which they normally do once they get enough traction on Linux.


> Flatpak is the replacement for applications and not OS specific packages. You would still use a package manager for OS related packages and for flatpak/snap and then flatpak/snap installs and uninstalls applications. Much better system for everyone.

Now this is the key difference I feel has been left out of the messaging surrounding flatpak. It's not meant for libraries and OS packages, just end-user applications? That makes more sense now.


Sadly, this is totally lost. Flatpak actually works with repos. If people tried them, they would actually know what this is about.


Please forgive my ignorance, because my experience with packaging is limited to PyPI. What makes building packages so difficult? Don't you essentially just include your binaries and related files, specify dependencies, maybe include pre- or post-processing scripts, and then finally build the package for the respective packaging system? I'm sure there are subtle differences in configuration and organization between the various systems, but that sounds more like an initial setup pain than something overly complex.


> PyPI. What makes building packages so difficult?

Well, the Achilles' heel for Python is also deployment. The reason we have so few applications is that each system has differences: RPM vs. DEB, other distro differences, or (forget you just read this) systemd vs. non-systemd. With a flatpak you have a runtime, and the runtime provides the hooks you develop against.

Linux has a great tool for building packages for different targets, and it is hardly used: the Open Build Service. You build the package there and it rebuilds it for other distros and systems. It could even target Windows and macOS if they extended it. Doing these things is complex. https://build.opensuse.org/

With a flatpak or snap you just build the one package and it runs everywhere.


> you just build the one package and it runs everywhere

If I had a Euro for every time I heard that, or some variation on that, I would have a lot of Euros.


As with every packaging approach we have so far, no.

Package managers have nuances, and so do flatpak and snap installations.

Also, add Windows and macOS into the mix, and down the drain it goes. It's basically what Docker was trying to get out of the way. The funny thing is that Go and Rust might be more of a solution to the whole packaging dilemma than any packaging solution.


https://chocolatey.org/ is amazing on Windows; there are several things you can add to it, and it makes for a great application distribution system.


But it is not the system default package manager, which is what I miss.


Containers are orthogonal to what the parent poster was talking about. Reproducible builds are a build system problem, not a package distribution or package interoperability problem.

Even with containers, being able to reproducibly rebuild the containerized application from its sources if desired is still a challenge.


> Even with containers, being able to reproducibly rebuild the containerized application from its sources if desired is still a challenge.

1. Flatpak runs on a runtime, so the application would then be exactly the same on all systems, just like a Java program or a program in any other language that runs on a runtime.

2. Flatpak and snap are not for servers, where containers make more sense; servers and desktops are different problems. So they are more like a sandbox, like what we find for tabs in Chrome.

> Is Flatpak a container technology?

"It can be, but it doesn't have to be. Since a desktop application would require quite extensive changes in order to be usable when run inside a container you will likely see Flatpak mostly deployed as a convenient library bundling technology early on, with the sandboxing or containerization being phased in over time for most applications. In general though we try to avoid using the term container when speaking about Flatpak as it tends to cause comparisons with Docker and rkt, comparisons which quickly stop making technical sense due to the very different problem spaces these technologies try to address. And thus we prefer using the term sandboxing." https://flatpak.org/faq.html


Java bytecode and the other technologies you mentioned solve the problem of distributing a program so that it can run unmodified on various platforms.

Reproducible builds are not about software distribution, they are about compilation. The Arch Linux maintainers would like to be able to verify that the Java bytecode they are sending to their users was indeed built from the Java source files it was expected to be built from. With reproducible builds, anyone can double-check that the Java bytecode being distributed is the correct one by recompiling it on their own machine. Without reproducible builds, you need to trust that the person who compiled the file didn't do anything malicious (either intentionally or unintentionally).
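The "double-check by recompiling" step boils down to comparing a cryptographic digest of the distributed artifact with one of your own rebuild. A minimal sketch, assuming SHA-256 as the digest and using throwaway temporary files in place of real packages; `sha256_file` and `verify_rebuild` are hypothetical helper names, not any distribution's actual tooling:

```python
import hashlib
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 so large artifacts need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_rebuild(distributed_path: str, rebuilt_path: str) -> bool:
    """True only if the local rebuild is bit-for-bit identical to what was shipped."""
    return sha256_file(distributed_path) == sha256_file(rebuilt_path)


# Usage with throwaway files standing in for the shipped and rebuilt artifacts.
with tempfile.TemporaryDirectory() as d:
    shipped, rebuilt, tampered = (
        os.path.join(d, n) for n in ("shipped.bin", "rebuilt.bin", "tampered.bin")
    )
    for path, data in ((shipped, b"same bytes"), (rebuilt, b"same bytes"), (tampered, b"other bytes")):
        with open(path, "wb") as f:
            f.write(data)
    same = verify_rebuild(shipped, rebuilt)        # True
    different = verify_rebuild(shipped, tampered)  # False
```

Note the check is only meaningful if the build is deterministic: without that, an honest rebuild and a tampered binary both show up as a mismatch.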


> I want a system where the folder is the application and deleting the folder is the way you remove applications.

Why though? There is no difference in complexity between `package-manager uninstall package-name` and `rm -rf /apps/package-name`. If you implement the latter, it will probably be aliased to the former.

The biggest problem in any case is not the application's assets (which all package managers manage competently), but its state and configuration files. It took Windows about a decade of carrot and stick to get (most) applications to put their files in a few defined locations instead of all over the place.


> There is no difference in complexity between `package-manager uninstall package-name` and `rm -rf /apps/package-name`

The issue is reproducibility and developer complexity. On complexity, you are confusing user complexity with developer complexity. Developer complexity is still high and is the big barrier to Linux development.

Also, there's the crazy apt-get update vs. upgrade fiasco that Chocolatey has gotten caught up in: they even replaced `choco update` with `choco upgrade`.


"We are all still living on 30 year old package management systems and we need a modern approach.... Trying Nix it is just impractical, but I liked the idea."

There's a causal connection between those two things....


> I want a system where the folder is the application and deleting the folder is the way you remove applications.

ROX has been doing that for like 15 years. http://rox.sourceforge.net/desktop/AppDirs.html


This has absolutely nothing to do with my comment or with reproducible builds. Flatpak does absolutely nothing to help reproducible builds. That's just a statement of fact: It works at a different level of the stack, and solves a different problem.


The main difference between the Arch/Debian approach and that of Nix and Guix is that Nix and Guix "normalize" the build environment using containers, whereas Arch, Debian, and others aim to achieve bit-reproducibility in spite of differences in the build environment.

Thus some of the issues that apply to Arch or Debian do not apply to Guix and Nix, but many apply to both (timestamps are an obvious example.)

For Guix we are currently at ~80% reproducible packages: https://gnu.org/software/guix/news/reproducible-builds-a-sta... . That's not an entirely fair comparison because, for example, Python packages in Guix include .pyc files whereas on Debian they don't (not sure about Arch).
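To illustrate why .pyc files complicate things: by default CPython records the source file's modification time (and size) in the 16-byte .pyc header, so identical source compiled at a different mtime yields different bytes. PEP 552 (Python 3.7+) added hash-based .pycs, selectable through `py_compile`'s `invalidation_mode`. The sketch below compares only the header, which is where the timestamp or source hash lives; the file name `mod.py` and the two mtimes are arbitrary choices:

```python
import os
import py_compile
import tempfile

Mode = py_compile.PycInvalidationMode


def pyc_header(src: str, mode: Mode) -> bytes:
    """Compile `src` and return the 16-byte .pyc header (magic, flags, 8 bytes)."""
    cfile = src + "c"
    py_compile.compile(src, cfile=cfile, doraise=True, invalidation_mode=mode)
    with open(cfile, "rb") as f:
        return f.read(16)


def headers_at_two_mtimes(mode: Mode) -> tuple[bytes, bytes]:
    """Compile identical source twice, changing only the file's mtime in between."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "mod.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        os.utime(src, (1_000_000_000, 1_000_000_000))
        first = pyc_header(src, mode)
        os.utime(src, (1_500_000_000, 1_500_000_000))
        second = pyc_header(src, mode)
    return first, second


ts_a, ts_b = headers_at_two_mtimes(Mode.TIMESTAMP)
hash_a, hash_b = headers_at_two_mtimes(Mode.CHECKED_HASH)
print(ts_a == ts_b)      # False: the source mtime is baked into the header
print(hash_a == hash_b)  # True: the header carries a hash of the source instead
```

Per the `py_compile` documentation, the default mode switches to `CHECKED_HASH` when `SOURCE_DATE_EPOCH` is set, which is one way distributions can ship .pyc files without leaking timestamps.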


You're close. For NixOS, I distinguish between "determinism" and "reproducible":

  - Deterministic: the same input always gives the same output, no exceptions.

  - Reproducible: The steps to *create* some build result are always the same, but the results might not be bit-for-bit identical.
(Please note that for most people they use "Reproducible" to refer to what I call "Determinism", but this is only in the context of NixOS, so just go with me.)

NixOS is always reproducible. Any build can be exactly reproduced by another person, and the build will run the exact same steps. But it is not always deterministic.

This seems strange but it really isn't. For example, although I may always take the same steps to compile some software (let's say "./configure; make; make install"), that does not guarantee determinism -- for example, perhaps the 'make' stage will run `date` and then embed the date in the build output (e.g. `gcc -DBUILD_DATE="$(date)"` or something). That breaks determinism.
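The usual fix for this particular case is the `SOURCE_DATE_EPOCH` convention from reproducible-builds.org: the build environment exports a fixed Unix timestamp (typically the time of the last source change), and tools use it instead of reading the clock. A minimal sketch, where `build` and `build_date` are hypothetical stand-ins for the `-DBUILD_DATE` trick above:

```python
import os
import time
from datetime import datetime, timezone
from typing import Mapping, Optional


def build_date(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the date to embed, honoring SOURCE_DATE_EPOCH if it is set."""
    env = os.environ if env is None else env
    epoch = env.get("SOURCE_DATE_EPOCH")
    # Fall back to the wall clock only when no fixed epoch was provided.
    ts = int(epoch) if epoch is not None else int(time.time())
    # Format in UTC so the build machine's time zone does not leak in either.
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


def build(source: bytes, env: Optional[Mapping[str, str]] = None) -> bytes:
    """Hypothetical build step that stamps a date into its output."""
    return source + b"\nBUILD_DATE=" + build_date(env).encode()


pinned = {"SOURCE_DATE_EPOCH": "1500000000"}
print(build(b"hello", pinned) == build(b"hello", pinned))  # True, today or in 5 years
print(build_date(pinned))  # 2017-07-14 02:40:00 UTC
```

Without the variable set, two runs a second apart can differ; with it set, the output depends only on the inputs, which is exactly the gap between "reproducible" and "deterministic" described here.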

However, Nix is always reproducible. If you run 'nix-build' today, tomorrow, or 5 years from now -- if you have the same description of the package, the build always proceeds the same way. In practice, this means "You need to have the same git revision of the nixpkgs source code". But if you have that -- nothing should ever change.

Of course you can monkey-wrench this and make up various weird counterexamples ("My build system only compiles stuff when rand() % 4 == 10, hah!"), but that's the general idea. NixOS will always take the exact same steps -- down to the letter -- in order to build your software. But that doesn't guarantee it's deterministic.

Most other systems are, in a sense, also reproducible, because you could just "run all the commands the same way, every time" -- but not in the highly automated way Nix is. In general a single hash identifies everything, to the point that I can literally recreate exact copies of a machine with a single command. I can reformat my laptop and have it back to the exact same software in 10 minutes, etc.

It is also more nuanced than that: on something like Arch Linux, you and I may be running two Arch machines. They may feel identical. We might even both run './configure; make; make install' and the software will work similarly. But if I haven't run 'pacman -Syu' in a week, and you ran it yesterday -- that isn't the same thing! Our environments are not reproductions of each other; they are only vaguely similar. Maybe you got a bugfixed glibc, for example, that I did not. If that were the case in NixOS, it would cause a rebuild (because one of the inputs -- the steps you must take to create something, including installing its prerequisites -- has changed).

In this case, "reproducible" does not mean "this piece of software is obtained by running this exact set of steps", that's too narrow. It means "this result is obtained by running the exact set of steps, for every single thing this result has ever depended on, all the way up the chain".

> As opposed to say maybe NixOS would just make the output different if it runs on a machine with a hard drive mounted on a different directory or the cpu has more cores?

No, this won't happen. CPU cores and the "location" of the source code are not included in the hash[1], they are not part of the input "source" to a build result.

[1] What actually happens is that the source code is copied into a location inside the "Nix store", and it is built from there. Same source code? Same destination. So in other words the first thing Nix really does is "create a build result from the input source code", i.e. a directory containing a copy of all the code, and that is given as the input to another step that actually builds the results, using the source as input. You could think of "building software" as two packages: one package, which is built by unzipping the source code and keeping the results, and another that is created by building the previously unzipped code.

So it uses the same mechanism as it always does: source code is just an input to something we want to "build", in this case, the executables. It's no different from "zlib" the library being an input to "libpng" the library (because libpng needs zlib). It is just an "input" in an abstract sense.
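The "hash everything up the chain" idea can be sketched as a recursive hash over a package's inputs. This is a minimal illustration under stated assumptions, not Nix's actual hashing scheme or store-path format: `Package`, `input_hash`, and `store_path` are hypothetical names, and the build scripts and sources are placeholders. It shows that source is just another input identified by its content, that a bugfixed glibc changes the hash of everything built on top of it, and that machine details such as core count or mount point never enter the hash, so the path is known before anything is built:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Simplified, hypothetical stand-in for a Nix package description."""
    name: str
    build_script: str  # the exact steps, e.g. "./configure; make; make install"
    source: bytes      # the source tree, flattened to bytes for this sketch
    deps: tuple = ()   # other Package objects this one is built against


def input_hash(pkg: Package) -> str:
    """Hash everything that goes INTO a build, recursively; never the output."""
    h = hashlib.sha256()
    h.update(pkg.name.encode() + b"\0")
    h.update(pkg.build_script.encode() + b"\0")
    # Source is just another input, identified by its content.
    h.update(hashlib.sha256(pkg.source).digest())
    # Fold in each dependency's hash so a change anywhere up the chain propagates.
    for dep in sorted(pkg.deps, key=lambda d: d.name):
        h.update(input_hash(dep).encode())
    # Not hashed: CPU count, mount points, the current time.
    return h.hexdigest()


def store_path(pkg: Package) -> str:
    """Computable before building, since it depends only on inputs (simplified format)."""
    return f"/nix/store/{input_hash(pkg)[:32]}-{pkg.name}"


zlib = Package("zlib", "./configure; make; make install", b"zlib source")
glibc = Package("glibc", "make", b"glibc source")
glibc_fixed = Package("glibc", "make", b"glibc source + bugfix")
libpng = Package("libpng", "./configure; make; make install", b"libpng source", (zlib, glibc))
libpng_rebuilt = Package("libpng", "./configure; make; make install", b"libpng source", (zlib, glibc_fixed))

print(store_path(libpng) != store_path(libpng_rebuilt))  # True: new glibc means a new libpng
```

Because the recursion reaches every dependency, two machines agree on a package's path only if they agree on its entire input tree, which is the "same git revision of nixpkgs" requirement in concrete form.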


Thanks for explaining. Great answer. Along with catern's sibling comment it cleared the confusion for me.


What do you think of NixOS vs GuixSD?


> Please note that for most people they use "Reproducible" to refer to what I call "Determinism"

Which I would argue makes far more sense.

Actually, I think a better definition would be that determinism is bit-for-bit identical output under controlled inputs. A reproducible build is the act of controlling those inputs to feed into a deterministic build.

I'm curious why you chose the definition you did, as proving a build is reproducible sounds impossibly hard.

Of course I've always looked at it from the standpoint of "reproduce this binary from source, thus proving your source code (modulo compiler trust of course)".


> The hash is not just the output, but also contains all input of the build.

The hash doesn’t contain the output at all, it only contains the inputs. As such you know the hash of all packages before you start building anything. It would be difficult to do anything different as the package may contain an absolute path to itself.


Ah you're right, of course.


Tails is a smaller, specialized distribution that recently reached this milestone: https://tails.boum.org/news/reproducible_Tails/index.en.html

Given the goal of Tails to provide a private computing environment that leaves no traces on the host system after shutdown, it's especially valuable to be able to confirm that the source truly produces the same binary image that the site offers for download.

(If anyone from Tails is reading, please update libccid to 1.4.27!)


In the future, Arch also wants to make its ISO reproducible :-). But first the packages have to be reproducible.


For those like myself who have never heard of the term:

https://reproducible-builds.org


Headline is misleading – according to the article, 76% of the 17% of packages that are being tested are reproducible.


The blog post is missing the link to the status page. The awesome folks over at Debian, mainly Holger Levsen, are helping Arch Linux achieve this with their infrastructure. The progress and status can be seen here: https://tests.reproducible-builds.org/archlinux/archlinux.ht...

Currently at 76% reproducible, out of the 70% of packages that have been built.


For comparison, Debian stretch on amd64 is at 94.1% of 24821 packages. Glad to see the knowledge is being shared.

stats: https://tests.reproducible-builds.org/debian/reproducible.ht...


We work together with a few Debian developers involved with reproducible builds in the reproducible-builds project, so yes a lot of work is shared which is awesome :)


If it's a random sample, that seems to be a reasonable extrapolation (of course standard caveats apply).


http://a.co/a8yEzzt A Manga Guide to Statistics. No, I'm not being snarky; it's a great read and I highly recommend it. Sampling, why it works, and the philosophy behind statistics are all a bit unintuitive, so a light read is perfect for helping you develop that intuition.


I have a degree in mathematics, I'm fine on the statistics front. It's very likely to NOT be a random sample - you would only be testing the packages which have had effort put in to make them reproducible, for reproducibility. I would expect the non-sampled packages to be 0% reproducible, since the reason they were not tested is that nobody is trying to make them reproducible (yet).
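The sampling point can be made concrete with the figures quoted in this thread (76% reproducible among tested packages, with the tested share given as 17% in one comment and 70% in another). Unless the tested set is a random sample, the overall share is only pinned down to a range. A small sketch; `overall_reproducible_bounds` is a made-up name for illustration:

```python
def overall_reproducible_bounds(tested_share: float, repro_among_tested: float) -> tuple[float, float]:
    """Bounds on the share of ALL packages that are reproducible.

    Only the tested packages are known; the untested rest could be
    anywhere between 0% and 100% reproducible.
    """
    confirmed = tested_share * repro_among_tested
    lower = confirmed                       # every untested package fails
    upper = confirmed + (1 - tested_share)  # every untested package passes
    return lower, upper


for tested in (0.17, 0.70):  # the two "tested" figures quoted in this thread
    lo, hi = overall_reproducible_bounds(tested, 0.76)
    print(f"tested {tested:.0%}: between {lo:.1%} and {hi:.1%} of all packages")
# tested 17%: between 12.9% and 95.9% of all packages
# tested 70%: between 53.2% and 83.2% of all packages
```

Quoting 76% for the whole distribution assumes the tested packages are representative; the scenario above, where untested packages are 0% reproducible, corresponds to the lower bound.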


>A Manga Guide

No thanks.


Well, I didn't mean to claim it's 100% reproducible in the title. But we have finally started working towards reproducible builds. We still need a way for users to easily reproduce builds and publish the results somewhere, and to test with more variations; luckily Debian has put a lot of effort into this!


The more the merrier. I'll be sticking to NixOS in the meantime.


So far I have been experimenting with using nix package management on top of other OSes, how much extra do you feel like running NixOS adds? I've certainly run into missing packages a fair bit, so far nothing beats the coverage of the AUR.


I'm mostly using it as a base, as a hypervisor of sorts for docker and libvirt. Anything else I need I can access with those. State of the system declaratively described in a few files, what's not to like?

From limited experience I had with AUR - for not so rigorously maintained packages it's not much better than building from the source yourself.


My main issue so far has been that the spec language isn't as simple as I hoped it would be and there are many packages missing or not up to date.

The ability to try out a new package and safely downgrade has been exactly as it says on the tin.

So far I have primarily used it as an add on to other OSes and put it in the PATH before everything else. That way if something is missing in the nix packages I'll fall back to the other OS.

And yes the AUR has no standard of quality, but the coverage is greater than anything I have ever seen. Figuring out exactly what build system and assumptions a package might have made is something I only do once the AUR package has failed.


I have not heard of this concept for binary distributions. It's a great idea.

I've been living in Gentoo world for over a decade, so of course, every Gentoo system is going to be a little (or a lot) different. :-P



