I'm not a fan of the "Reproducible tarballs" section, because it's explicitly about pre-processing the source code with autotools, instead of distributing a pure, unaltered git snapshot (which `git archive` can already generate in a deterministic way).
The section following then mentions signing the pre-processed source code, which I think is the wrong approach. This creates a difficult situation because of how strongly some people encourage signed source code, yet I think autotools is part of the build process and should run on the build server (and be double-checked by reproducible builds). If people pre-process the .orig.tar.xz they upload to Debian, this pre-processing won't be covered by reproducible builds because it happens outside the documented build.
The patch for "reproducible tarballs" is quite involved[0] and has rookie mistakes like "pin a specific container image using `@sha256:...` syntax, but then invoke `apt-get update` and `apt-get install` to install whatever Debian ships at that time".
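To make the pinning mistake concrete, here is a rough Python sketch that flags `apt-get install` arguments lacking an explicit `=version` pin. The function name, the example Dockerfile text, and the all-zero digest are hypothetical placeholders, not taken from the curl patch, and the parsing is deliberately naive.

```python
import re


def find_unpinned_installs(dockerfile_text: str) -> list:
    """Return apt-get install package arguments that have no =version pin."""
    unpinned = []
    # Join backslash-continued lines so each RUN instruction is one line.
    text = re.sub(r"\\\s*\n", " ", dockerfile_text)
    for line in text.splitlines():
        # Naively split chained shell commands.
        for command in re.split(r"&&|;", line):
            tokens = command.split()
            if "apt-get" in tokens and "install" in tokens:
                start = tokens.index("install") + 1
                for tok in tokens[start:]:
                    if tok.startswith("-"):
                        continue  # option such as -y
                    if "=" not in tok:
                        unpinned.append(tok)
    return unpinned


# Hypothetical example: the base image is pinned by digest (placeholder
# digest), but two of the three packages float with the archive.
EXAMPLE = """\
FROM debian@sha256:0000000000000000000000000000000000000000000000000000000000000000
RUN apt-get update && apt-get install -y autoconf automake=1:1.16.5-1.3 \\
    libtool
"""

print(find_unpinned_installs(EXAMPLE))
```

Even a version pin only helps while the archive still serves that version, so this check is a hint at the problem rather than a fix for it.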
6. Patching the hell out of a project without pushing fixes upstream.
7. Inability or failure to source upstream from multiple independent sources, compare them, and verify chain/web of trust using cryptographic signatures.
8. Not following reproducible build guidelines.
9. Not using build caching like sccache.
10. Not building from reproducible sources periodically on the client-side to verify binaries are identical to those built by others.
11. Dependency hell of traditional (non-nix) packages importing zillions of packages.
Maybe you could get involved by pointing out the mistake and proposing the alternative? I imagine that downstream can't easily switch to another distribution method without notice.
The GitHub implementation of git archive does its best to be deterministic. Some reproducible build systems, such as Bazel, rely heavily on that.
GitHub had a bug early last year[0] that broke that determinism and it caused a huge uproar. So through a mixture of promises to individual projects and just so many projects relying on it, GitHub's git archive has been ossified into being deterministic (unless they want to lose a lot of goodwill among developers).
In my experience, yes. Provided it is done with a known git binary etc.
Best to containerize workflows like this with hash-locked deps so they can be easily verified by others even far in the future with any OCI compatible toolchain.
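For anyone wondering what "deterministic" means for an archive in practice, here is a minimal Python sketch using the standard `tarfile` module. It is not how `git archive` is implemented; it only illustrates the general idea of fixing everything that normally varies between runs (file order, timestamps, ownership). The function name and file contents are made up for the example.

```python
import hashlib
import io
import tarfile


def deterministic_tar(files: dict, mtime: int = 0) -> bytes:
    """Build an uncompressed tar whose bytes depend only on names and contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime  # fixed timestamp
            info.mode = 0o644  # fixed permissions
            info.uid = info.gid = 0  # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Same content supplied in a different order yields identical bytes.
a = deterministic_tar({"b.txt": b"two", "a.txt": b"one"})
b = deterministic_tar({"a.txt": b"one", "b.txt": b"two"})
print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())
```

Compression is left out on purpose, since compressor versions and settings are another source of variation between machines.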
> We do not provide checksums for the tarballs simply because providing checksums next to the downloads adds almost no extra verification. If someone can tamper with the tarballs, they can probably update the webpage with a fake checksum as well.
In the past, I have occasionally downloaded a tarball from one mirror, and verified against a checksum from a different mirror (or from the official website). Back when release announcements were primarily made on mailing lists, using a mailing list archive to get a copy of the "real" checksum was also a possibility.
I definitely remember being advised to get the checksum from a different source than the tarball, a number of different times.
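The check itself is tiny. A sketch in Python, assuming you obtained the expected SHA-256 hex string through a different channel than the tarball (the function names and the throwaway demo file are mine, for illustration only):

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare the local hash with a checksum fetched from another source."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)


# Demo with a throwaway file standing in for a downloaded tarball.
payload = b"pretend this is a release tarball"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

print(matches_published(demo_path, hashlib.sha256(payload).hexdigest()))
print(matches_published(demo_path, "0" * 64))
os.remove(demo_path)
```

The value of the check comes entirely from where `published_hex` was obtained; a checksum copied from the same page as the download proves very little.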
There are many ways to induce download of the wrong file, not just tampering with the origin website. An old-school MITM remains a real hazard and in some organisations/nations it is directly enabled by the local infrastructure.
Ultimately this is why we also sign the checksum.
(Now you have to protect your signature verification supply chain. It never ends)
I don't understand his attitude towards "anonymous maintainers". Right now ALL contributions to curl are pseudonymous, including his own. There is just no such organization as "Curl". Want to see non-anonymous contributors - go to Google/Intel/etc, they ask for IDs when they hire employees.
> A (to me) surprisingly large amount of contributions are done by people who do not state a full real name
Again, strange attitude, given that he personally had legal issues with US in the past, the reasons for which were never disclosed[1].
Typical good developers are not Rambo. When law enforcers come to them and force them at gunpoint to make a commit or add a new maintainer, nobody should expect active resistance. Minor reminder: curl is not just some HTTP library; they maintain their own CA list[2]. They don't need any intricate hidden lines for a backdoor; the CA list is a backdoor on its own.
> Right now ALL contributions to curl are pseudonymous, including his own.
How so? He has a rather extensive autobiography page[1] with his full name (Daniel Stenberg), plus over a dozen photos of himself[2] (and videos), and his git commits are under his full name and email address (hidden on github.com web interface just like on all repos, however). Perhaps you mistook his handle of "badger" for a pseudonym?
> he personally had legal issues with US in the past, the reasons for which were never disclosed
You thought his blog posts about it[3] were insufficiently descriptive? Or you blame him somehow for the fact that the US government never explained itself?
Of course I know the name "Daniel Stenberg". But there is never any guarantee that the next release published under his account was actually made by a person named "Daniel Stenberg". A single person outside of an organization is a huge weak point.
Let me explain by comparing curl with XZ (and it was not me who started this comparison). XZ was initially released in 2009 by Lasse Collin. But how would you know that "Lasse Collin" actually exists? Is it a single person or a group? Maybe some papers? (Unlikely; for papers see Igor Pavlov's dissertation on LZMA[1].) Do we know his country? No, all we know is that Lasse was in some "hotel with limited internet connection" at the time of the incident (hello to Russian and Chinese prisons). Who would benefit from having an autobiography page, which nowadays can be made up in 5 seconds in ChatGPT?
> Or you blame him somehow for the fact that the US government never explained itself?
Not for that. In his "Administrative Purgatory" post he makes some assumptions about the reasons for the visa denial. But the whole picture he paints there is that he is just a dutiful developer of a popular open-source library. Also he owns a domain. And that was enough for the US to equate him with criminals. Then why didn't he take the lesson from this that, in order to remain free (this applies to traveling as well), it's acceptable to keep hobbies involving information security, cryptocurrency, export-restricted technologies like HPC, AI and quantum computing, and other legally complex stuff under a separate profile?
Pseudonymity in open-source (as in any other charity) is long overdue to be accepted as the norm. Let's stop building security theater on autobiographies from other people's personal pages.
> Who would benefit from having an autobiography page, which nowadays can be made up in 5 seconds in ChatGPT?
So now you're saying that the fact that (scare-quotes) Daniel Stenberg has a public autobiography page and photos and gives video-recorded talks in public makes you less convinced he's a real human individual?
OK, sure, you can never be sure that your neighbor or coworker or Linus Torvalds is not a CIA or FSB agent. But you are basing this on nothing.
Every package is bootstrapped all the way up from a heavily reproduced 256 byte assembly seed (Stage0/live-bootstrap) and built by two or more maintainers with confirmed matching hashes, and with signatures from well known keys.
100% of commits in our repos are also signed, and every PR merge also comes with a signed merge commit by the reviewer.
If the curl team wants a similar level of supply chain security for their own official binaries or Dockerfile we would suggest cloning our Containerfile and hash-locking all dependencies to the latest stagex release (please by all means reproduce, verify, and sign that too!).
This should be easy for curl maintainers to build and get identical hashes for their own release binaries to the ones we build and sign.
Stagex could also be used to produce source tarballs with generated files from similarly deterministic/multi-signed versions of autotools etc.
I am not convinced there is a good case for having any auto-generated files in the source archives though. Force distros to bring their own autotools, etc., imo.
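The "two or more maintainers with matching hashes" rule mentioned above boils down to something like the following Python sketch. It is only an illustration of the idea, not stagex's actual tooling; the function name, builder names, and digests are hypothetical, and signature verification of each report is assumed to happen elsewhere.

```python
from typing import Optional


def agreed_digest(reports: dict, quorum: int = 2) -> Optional[str]:
    """Return the artifact digest only if enough builders report the same one.

    `reports` maps a builder identity to the digest that builder obtained.
    Any disagreement at all is treated as a failure, not settled by majority.
    """
    digests = set(reports.values())
    if len(reports) >= quorum and len(digests) == 1:
        return digests.pop()
    return None


# Hypothetical builders and digests.
print(agreed_digest({"alice": "abc123", "bob": "abc123"}))
print(agreed_digest({"alice": "abc123", "bob": "def456"}))
```

Refusing on any mismatch, rather than taking the most common digest, is a deliberate choice here: a single differing build is exactly the signal you want a human to look at.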
All of this is nice but none of it stops the style of hack that happened to xz besides the fact that it's very unlikely Dan is going to be bullied into handing over maintainership to someone else.
Every other element on the list can be attacked if the maintainer themselves is the malicious party carrying out the attack and it is being performed with the level of sophistication in the xz attack.
So ya, I'm not saying it's security theater in all contexts, but if the context is the attack vector used in the xz attack, it's security theater.
> unlikely Dan is going to be bullied into handing over maintainership to someone else.
Most everyone has a love for money, their own health, or people they wish to protect. This weakness will be exploited by those that understand a tampered version of curl in the right cryptocurrency company CI job can net billions of dollars.
We absolutely must never trust any single person in a widely relied upon software supply chain for their own safety and ours.
Any trust in any single human or machine is a very real security vulnerability.
Even without jumping to coercion, people let personal email domain names expire, they get phished, their workstation is compromised by a new steam game or IDE plugin with an RCE, etc.
I have performed and been read in on a number of attacks like this in the wild as part of pentests. It is often brutally easy.
We have to move away from prerun autotools scripts in tarballs entirely, but honestly I don't expect curl to be at the forefront of that.
It's not clear (to me) that Lasse knew that Jia Tan was pseudonymous. I agree with him that it's going to be harder to target curl though, as it's a larger project (maintainer-wise) and he's personally releasing all of the tarballs.
> A (to me) surprisingly large amount of contributions are done by people who do not state a full real name.
Why would someone state their full real name on the internet to contribute to curl? If someone wants to boost their CV, they can just write "curl, Linux kernel, GCC contributor" and provide the link to my GitHub profile upon request. Yes, someone in HR will (gasp) learn that orthoxerox is actually called Boris Kozlov in meatspace, but there's no need to broadcast this information.
Some projects want to know your real identity, and/or have you sign some sort of CLA or copyright transfer under it before accepting your submissions. I think it's dumb, but it happens.
I went as far as building code in PRs and reusing those blobs when the code gets merged into main. I find it super weird that in many projects the code gets rebuilt and packaged on merge to main, and again when released, all while a lot of the underlying processes aren't locked.
Each run of `apt install xyz` can give a different result, and so can a Python requirements.txt (which is why we use Poetry).
C build systems are so cursed. Autotools is obviously horrible, but projects like curl cling to it, because somehow every other C build system sucks too.
It's frankly amazing how deeply fragmented and backwards-looking C is that this continues to be a problem. In the last 30 years, a countless number of C build systems have been created, and yet an ugly pile of obscure macros outlives them all.
Some packages are more central than others. For instance, the basic container image for a popular Linux distribution has only around 150 packages (and yes, it includes curl, which counts as two packages because the library and the command are packaged separately). A text-only "server" install of that distribution probably adds another hundred or so (for things like kernel, firmware, bootloader, ssh daemon, and so on). Even a full desktop install of that distribution, with lots of extra junk installed on top, doesn't get anywhere near 8000 packages.
Until you install a single Node application, or consider all the applications running on the package developer's system, or the systems of the developers who provide the packages installed on the developers' systems.
I imagine the total number of trusted packages would be substantially higher.
[0]: https://github.com/curl/curl/pull/13250/files