Hacker News

Thanks to autoconf, we're now used to build scripts looking like gibberish. A perfect place to hide a backdoor.



This is my main take-away from this. We must stop using upstream configure and other "binary" scripts. Delete them all and run "autoreconf -fi" to recreate them. (Debian already does something like this I think.)
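A minimal sketch of that workflow (assumes GNU autotools are installed; the exact set of generated files varies per project, and `demo-pkg` here is just a stand-in for an unpacked source tree):

```shell
# Skip quietly if autotools aren't available on this machine.
command -v autoreconf >/dev/null 2>&1 || { echo "autotools not installed; skipping"; exit 0; }
set -e
mkdir -p demo-pkg
# Minimal stand-in for an unpacked source tree (a real one has far more).
cat > demo-pkg/configure.ac <<'EOF'
AC_INIT([demo], [1.0])
AC_OUTPUT
EOF
# Delete whatever upstream pre-generated, then rebuild from the .ac inputs.
rm -rf demo-pkg/configure demo-pkg/aclocal.m4 demo-pkg/autom4te.cache
(cd demo-pkg && autoreconf -fi)
test -x demo-pkg/configure && echo "configure regenerated locally"
```

The point being: the `configure` you end up running came from your own autoconf, not from whatever was shipped in the tarball.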


> We must stop using upstream configure and other "binary" scripts. Delete them all and run "autoreconf -fi" to recreate them.

I would go further than that: all files which are in a distributed tarball, but not on the corresponding git repository, should be treated as suspect.

Distributing these generated autotools files is a relic of times when it could not be expected that the target machine would have all the necessary development environment pieces. Nowadays, we should be able to assume that whoever wants to compile the code can also run autoconf/automake/etc to generate the build scripts from their sources.

And other than the autotools output, and perhaps a couple of other tarball build artifacts (like cargo simplifying the Cargo.toml file), there should be no difference between what is distributed and what is on the repository. I recall reading about some project to find the corresponding commit for all Rust crates and compare it with the published crate, though I can't find it right now; I don't know whether there's something similar being done for other ecosystems.
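One way to sketch that comparison (all names here are hypothetical; assumes git and tar are available) is to export the tagged tree with `git archive` and diff it against the unpacked tarball:

```shell
command -v git >/dev/null 2>&1 || { echo "git not installed; skipping"; exit 0; }
set -e
# Build a toy "upstream" repo with one tagged release.
git init -q upstream
cd upstream
git config user.email demo@example.com
git config user.name demo
echo 'int main(void){return 0;}' > main.c
git add main.c
git commit -qm 'v1.0'
git tag v1.0
# Export exactly what the tag contains...
git archive --prefix=release/ v1.0 | (cd .. && tar xf -)
cd ..
# ...and fake a "release tarball" tree with one smuggled-in extra file.
cp -r release tarball
echo 'malicious payload' > tarball/build-to-host.m4
diff -r release tarball || echo "MISMATCH: tarball differs from the git tag"
```

Anything present in the tarball but absent from the tagged tree shows up immediately in the diff output.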


One small problem with this is that autoconf is not backwards-compatible. There are projects out there that need older autoconf than distributions ship with.


The test code generated by older autoconf is not going to work correctly with newer GCC releases due to the deprecation of implicit int and implicit function declarations (see https://fedoraproject.org/wiki/Changes/PortingToModernC), so these projects already have to be updated to work with newer autoconf.


Typing `./configure` won't work, but something like `./configure CFLAGS="-Wno-error=implicit-function-declaration"` (or whatever flag) might (IIRC it is possible to pass flags to the compiler invocations used for checking the existence of features) without needing to recreate it.

Also, chances are you can shove that flag into some old `configure.in` and have it work with an old autoconf for years before having to update it :-P.


There are, and they need to be fixed.



Easily solved with Docker.

Yes, it sucks to add yet another wrapper, but that's what you get for choosing non-backwards-compatible tools in the first place, combined with projects that don't keep up with supporting later versions.


Ah yes, let's replace "binary" blobs in release archives with even bigger blobs of mystery goo.


Why do we distribute tarballs at all? A git hash should be all that's needed...


> Why do we distribute tarballs at all? A git hash should be all that's needed...

A git hash means nothing without the repository it came from, so you'd need to distribute both. A tarball is a self-contained artifact. If I store a tarball on a CD-ROM and look at it twenty years later, it will still have the same complete code; if I store a git hash on a CD-ROM, without storing a copy of the repository together with it, twenty years later there's a good chance that the repository is no longer available.

We could distribute the git hash together with a shallow copy of the repository (we don't actually need the history as long as the commit with its trees and blobs is there), but that's just reinventing a tarball with more steps.

(Setting aside that currently git hashes use SHA-1, which is not considered strong enough.)
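(For what it's worth, newer git can already create repositories with SHA-256 object IDs; a sketch, assuming git ≥ 2.29:)

```shell
# Guard: --object-format needs a reasonably recent git; skip quietly otherwise.
git init -q --object-format=sha256 sha256-demo 2>/dev/null \
  || { echo "git too old for sha256 repos; skipping"; exit 0; }
# Report which hash function this repository's object IDs use.
git -C sha256-demo rev-parse --show-object-format
```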


Except it isn't reinventing the tarball, because the git hash forces verification that every single file in the repo matches the one in the release.

And git even has support for "compressed git repo in a file", "shallow git repo in a file", or even "diffs from the last release, compressed in a file". They're called "git bundles".

They're literally perfect for software distribution and archiving.


People don't know how to use git hashes, and it's not been "done". Whereas downloading tarballs and verifying hashes of the tarball has been "good enough" because the real thing it's been detecting is communication faults, not supply chain attacks.

People also like version numbers like 2.5.1 but that's not a hash, and you can only indirectly make it a hash.
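The indirection is cheap, though: an annotated (or signed) tag maps the human-friendly version to an exact commit hash. A sketch, assuming git, with a made-up version number:

```shell
command -v git >/dev/null 2>&1 || { echo "git not installed; skipping"; exit 0; }
set -e
git init -q version-demo
git -C version-demo config user.email demo@example.com
git -C version-demo config user.name demo
echo code > version-demo/f.c
git -C version-demo add f.c
git -C version-demo commit -qm 'release'
git -C version-demo tag -a v2.5.1 -m 'release 2.5.1'  # the human-friendly name...
git -C version-demo rev-list -n 1 v2.5.1              # ...resolves to one exact commit hash
```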


> I would go further than that: all files which are in a distributed tarball, but not on the corresponding git repository, should be treated as suspect.

This, plus an automated diff check of the tarball against the repo, flagging any mismatch.


The backdoor is in an .m4 file that gets parsed by autoconf to generate the configure script. Running autoconf yourself won't save you.


That's not entirely true. autoreconf will regenerate m4/build-to-host.m4 but only if you delete it first.
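That is, something like the following sketch (a toy tree, not the real xz layout): `autoreconf -fi` regenerates the files it owns but leaves extra shipped macro files alone, so the suspect file has to be deleted explicitly first.

```shell
command -v autoreconf >/dev/null 2>&1 || { echo "autotools not installed; skipping"; exit 0; }
set -e
mkdir -p pkg-demo/m4
cat > pkg-demo/configure.ac <<'EOF'
AC_INIT([demo2], [1.0])
AC_OUTPUT
EOF
# Stand-in for a macro file shipped in the tarball; autoreconf -fi would
# not remove or replace this on its own.
echo '# shipped macro (potentially tampered)' > pkg-demo/m4/build-to-host.m4
rm -f pkg-demo/m4/build-to-host.m4
(cd pkg-demo && autoreconf -fi)
test -x pkg-demo/configure && echo "regenerated without the shipped macro"
```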


It seems like this was the solution for archlinux, pull directly from the github tag and run autogen: https://gitlab.archlinux.org/archlinux/packaging/packages/xz...


it's shocking how many packages on distros are just one random tarball from the internet with lipstick


Oh come on, please, let's put autotools out to pasture. I've lost so much of my life fighting autotools crap compared to "just use meson".


As long as we also exterminate its even more evil brother: libtool.


I don't think it would help much. I work on machine learning frameworks. A lot of them (and math libraries) rely on just-in-time compilation. None of us has the time or expertise to inspect JIT-ed assembly code. Not to mention that much of the code deliberately reads/writes out of bounds, which is not an issue if you always add some extra bytes at the end of each buffer, but which makes most memory-sanitizer tools useless. When you run their unit tests, you run the JIT code, and then a lot of things could happen.

Maybe we should ask all packaging systems to split their builds into two stages, compile and test, to ensure that testing code cannot impact the binaries that are going to be published. I would rather read and analyze the generated code than the code that generates it.


I always run autoreconf -ifv first.


In this case it wouldn't be sufficient. You had to also delete m4/build-to-host.m4 for autoreconf to recreate it.


Thanks. At least the distro I use (Hyperbola) is LTS-bound, so it's not affected.


Maybe it's time to dramatically simplify autoconf?

How long do we need to (pretend to) keep compatibility with pre-ANSI C compilers, broken shells on exotic retro-unixes, and running scripts that check how many bits are in a byte?


Not just autoconf. Build systems in general are a bad abstraction, which leads to lots and lots of code to try to make them do what you want. It's a sad reality of the mismatch between a procedural task (compile files X, Y, and Z into binary A) and what we want (compile some random subset of files X, Y, and Z, doing an arbitrary number of other tasks first, into binary B).

For fun, you can read the responses to my musing that maybe build systems aren't needed: https://news.ycombinator.com/item?id=35474996 (People can't imagine programming without a build system - it's sad)


Autoconf is m4 macros and Bourne shell. Most mainstream programming languages have a packaging system that lets you invoke a shell script. This attack is a reminder to keep your shell scripts clean. Don't treat them as an afterthought.


I'm wondering: is there no way to add an automated flagging system that diff-checks the tarball contents against the repo's files and warns if there's a mismatch? This would be on, e.g., GitHub's end, so there'd be this sort of automated integrity test and subsequent warning. Just a thought, since tainted tarballs like these might altogether be (and become) a threat vector, regardless of the repo.


Maybe the US Government needs to put its line in the sand and mandate the end of autotools. :D


It looks like they are trying to get rid of C, so maybe we're in luck!



