Abusing SHA-1 collisions for Chromium updates

Scaevolus · on June 7, 2019

If I'm understanding this glorious hack correctly, this allows Nix to determine if Chrome has an update available:

1) Nix network access is only allowed for things where you guarantee the hashes of what's output. This is generally used to do things like "download http://example.com/release-1.2.tar.xz, it will have SHA1 93f3025c7802a1a11e4f16186089b583ef1095b8"

2) There are known pairs of strings that have equivalent SHA1 hashes.

3) To determine if a fetch would succeed in a "pure" way, write a network-accessing function that will return a "true" or "false" string with identical hashes, then you can use that (supposedly deterministic) string to return a (nondeterministic!) True or False to the caller.

4) This is used to run a command like `curl -s -L -f -I https://commondatastorage.googleapis.com/chromium-browser-of... `. If the command succeeds, we know we can use this version of the browser to update.

I don't know how it reads the channel version data without running into the same determinism issues.

Here's the latest update to the hack, moving from MD5 to SHA1: https://github.com/NixOS/nixpkgs/commit/ed8f3b5fa3cebfc3662a...

DumbBoy · on June 7, 2019

Even after reading this 10 times I still don't understand what this hack is about.

I understand these things independently: 1) SHA1 collision weakness 2) Nix checking package SHA1 when updating packages 3) Chromium returning different SHA1 for each download

Seriously, what is this?

mbrock · on June 7, 2019

Someone made a Nix thing that's supposed to check whether a new version of Chrome is available, and if so, generate an updated package.

While doing so, they found themselves needing some kind of "tryFetch" function that would return true or false depending on whether a certain URL is reachable.

But there's no such function in Nix. Why not? Because you're not really supposed to do stuff like that in the Nix paradigm which is all about determinism.

So... they really wanted to do it anyway, so they invented a clever hack, probably too clever.

What can you do in Nix? Well, you can make a package that downloads a certain URL and uses the downloaded result, provided that all the observable outputs of that package are deterministic. So after the package's build script runs, Nix verifies that the result matches a hash specified in the package definition. If the hash doesn't match, the package fails to evaluate.
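As a rough illustration of that verification step (this is not Nix's actual implementation), here is a Python sketch. The names `fixed_output_build` and `HashMismatch` are made up for the example, and the "download" is a hard-coded byte string rather than a real fetch.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when a build output does not match its pinned hash."""


def fixed_output_build(build, pinned_sha1):
    """Run `build` (a callable returning bytes) and accept its output
    only if the SHA-1 digest equals the pinned value."""
    output = build()
    actual = hashlib.sha1(output).hexdigest()
    if actual != pinned_sha1:
        raise HashMismatch(f"expected {pinned_sha1}, got {actual}")
    return output


# Hypothetical stand-in for a download; a real fetcher would hit the network.
release = b"contents of release-1.2.tar.xz"
pin = hashlib.sha1(release).hexdigest()

# The build is accepted because the output matches the pin.
assert fixed_output_build(lambda: release, pin) == release
```

Any output with a different digest raises `HashMismatch`, which is the property the hack has to work around.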

So this hacker decided to make such a package that tries to download the Chrome update and results in the boolean information about whether the update was available. But the result needs to have the same hash in both cases. That's where a hash collision comes in handy.

So this hacky build script uses a couple of well-known PDF files that both have the same SHA1 hash. If the update exists, it gives PDF 1, otherwise it gives PDF 2.

The update script then depends on that hack package. It "installs" that package, and then checks whether it actually contains PDF 1 or PDF 2, and now it knows whether the Chrome update was available or not.
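The whole trick can be sketched in Python under some loud assumptions: instead of the real SHA-1 colliding PDFs, the example uses a toy hash (SHA-256 truncated to 16 bits) so that a colliding pair can be found by brute force, and the network check is replaced by a plain boolean argument. `weak_hash`, `probe_package`, and `update_available` are hypothetical names, not anything from the actual update script.

```python
import hashlib
from itertools import count


def weak_hash(data: bytes) -> str:
    """Toy stand-in for a broken hash: SHA-256 truncated to 16 bits."""
    return hashlib.sha256(data).hexdigest()[:4]


def find_collision():
    """Birthday search for two distinct byte strings with equal weak_hash."""
    seen = {}
    for i in count():
        candidate = b"blob-%d" % i
        digest = weak_hash(candidate)
        if digest in seen:
            return seen[digest], candidate, digest
        seen[digest] = candidate


# Plays the role of the two PDFs and the hash pinned in the package definition.
BLOB_YES, BLOB_NO, PINNED = find_collision()


def probe_package(url_is_reachable: bool) -> bytes:
    """The 'hack package': its output depends on the network, yet both
    possible outputs satisfy the same pinned hash."""
    output = BLOB_YES if url_is_reachable else BLOB_NO
    assert weak_hash(output) == PINNED  # the fixed-output check still passes
    return output


def update_available(url_is_reachable: bool) -> bool:
    """The update script: inspect which blob came back."""
    return probe_package(url_is_reachable) == BLOB_YES
```

The hash check sees the same digest either way, so the one bit of information travels in the content of the output rather than in its hash.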

tiborsaas · on June 7, 2019

> Why not? Because you're not really supposed to do stuff like that in the Nix paradigm which is all about determinism.

That's the part I always feel when looking at FP languages. They sound good on paper, examples are very tempting, but when reality kicks in to this pure, predictable, perfect world, it turns into a massive pain.

derefr · on June 7, 2019

The key point is that the Nix scripting environment (think “a Ports manifest”) is an intentionally restrictive language intended to have deterministic results for every operation. It’s not intended to be a Turing-complete programming language; that’s the whole point.

What the author of the script has done here, you’re supposed to do by writing code in some other language that generates a Nix manifest (or just by hand-rolling a Nix manifest.) And yet, the author here managed to get Nix to non-deterministically generate Nix.

Considered from the point of view of Nix’s goals, this is more an “exploit” than a truly-needed feature.

mbrock · on June 7, 2019

Total nitpick but Nix is indeed Turing complete since it contains the untyped lambda calculus; you can write arbitrary recursive functions.

mbrock · on June 7, 2019

Nix's use of determinism actually has an important purpose: it's not just some arbitrary annoying restriction, it's what makes the whole system work properly. This script is a kind of funny meme that should probably be deleted, and anyway it isn't crucial to the NixOS system at all; it's just a minor convenience, and the hack was probably just fun to make.

Generally speaking these sandbox determinism requirements in Nixpkgs/NixOS are not annoying, they are a crucial feature: you know what you get when you install something. But yeah, it is a constraint, and sometimes when you try to package some weird program where the Makefile does some arbitrary network operations, you might find it annoying -- and you can locally disable the sandbox -- but the whole Nix philosophy is that build scripts should be reproducible, so then you just have to fix it.

captaincrowbar · on June 7, 2019

What's the point of this? If you can't download something without knowing its hash in advance, then you can never download a new version of Chrome or anything else, so why do you care whether one is available?

strangecasts · on June 7, 2019

Nix tries to achieve deterministic, reproducible builds, but Chrome's update process is non-deterministic because of (3). This hack lets the update check appear deterministic.

zerga23 · on June 7, 2019

Can you explain how? I don't understand how a hash collision of two PDF files would help with this. Surely if it wanted to download the file at 'https://commondatastorage.googleapis.com/chromium-browser-of..., it would need the actual hash of that specific file?

edit: I just read mbrock's comment above which explained it perfectly. Didn't realize it was testing the url to see if there was an actual update available, I thought it was doing the actual update.

DumbBoy · on June 7, 2019

So it basically returns "true" (with hash H) from network-accessing function if "curl" succeeds, and "false" (with hash H) if "curl" fails?

tomsmeding · on June 7, 2019

The cheap way to return more complex data than a boolean is to return more booleans -- i.e. split the version number up in bits and return them one by one. Not saying you should do this though :p
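A sketch of that idea in Python, assuming a hypothetical yes/no oracle (`make_oracle`) that stands in for one collision-based probe per bit; nothing here is taken from the real script.

```python
def make_oracle(secret_version: int):
    """Hypothetical stand-in for one boolean probe per bit:
    it answers only yes/no questions about the hidden number."""
    def bit_is_set(position: int) -> bool:
        return ((secret_version >> position) & 1) == 1
    return bit_is_set


def read_number(bit_is_set, width: int = 16) -> int:
    """Rebuild an integer from `width` separate boolean answers."""
    value = 0
    for position in range(width):
        if bit_is_set(position):
            value |= 1 << position
    return value


# Sixteen yes/no answers are enough to recover any value below 2**16.
assert read_number(make_oracle(3770)) == 3770
```

Each extra bit costs one more probe, so this scales linearly with the size of the value being smuggled out.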

WorldMaker · on June 7, 2019

This is funny because it is more the opposite case: they wanted the simplicity of returning a Boolean but the system doesn't allow that in this particular part of the pipeline. So they built a much more complex data structure to implement that simple Boolean.

Kenji · on June 7, 2019

Allowing broken hashes makes the entire system insecure. If you use broken hashes for security, you might as well not use hashes at all. The entire point of a hash is that collisions are pretty much impossible to generate.

Dylan16807 · on June 7, 2019

It depends on what you're trying to prevent. Only the provider of the hashed file can set up this type of collision. If you trust them, it's still secure. Third parties can't collide with an innocent file.

smilliken · on June 8, 2019

Even if you trust the package author is not malicious, you may not trust that they are infallible. They could use this trick to sneak in a bug fix which has an unintended consequence.

One of the great things about Nix is that it keeps packages honest about versioning; you can't sneak in an updated package with the same version number like is possible with other package systems that don't pin to hashes.

Dylan16807 · on June 9, 2019

> They could use this trick to sneak in a bug fix which has an unintended consequence.

They would have to have set up the hash collision beforehand. It's no longer an 'innocent file'. It's not something you can decide to do at a later point.

dillonmckay · on June 7, 2019

What assumptions about DNS and certificate pinning are we making?

tialaramex · on June 7, 2019

There aren't any assumptions about DNS or certificate pinning here at all? That you think it's relevant strongly suggests you haven't the faintest idea what's going on.

Second pre-image attacks aren't possible (a theoretical pre-image attack exists for MD5 with difficulty just marginally better than brute force, this is a further good reason to stop using MD5 but isn't an immediate problem). So only the person who made file X1 could have produced file X2 with the same MD5() or SHA1(), since they could have produced both with a deliberate collision, whereas anybody else would be obliged to create a second pre-image.

dillonmckay · on June 7, 2019

My apologies.

grhmc · on June 7, 2019

Something I think is a bit misunderstood here is what this is for, and why it was made.

The person who wrote this (aszlig) is truly a hacker, contorting tools to do interesting things -- and indeed, you see his cleverness at work here.

This is an update script, which is run by hand, by a maintainer, to update the Nix expression for Chromium.

This script could have been written in Python, Bash, Perl, or PowerPoint for all it matters. Or even Brainfuck using aszlig's own brainfuck interpreter written in bash[0]. This script could be deleted today and make no difference to using Nix. It is not part of any running system.

This doesn't mean Nix is insecure or broken, and no abuse like this is used anywhere in the Nix package set.

[0] https://github.com/aszlig/shellfuck/blob/master/bf.sh

tomsmeding · on June 7, 2019

Summary by _max_bo_ in the twitter thread (https://twitter.com/stdlib/status/1136629930060636162) linked elsewhere:

> chrome wants things different. nix wants things to be the same, always. nixpkg programmer invokes horrifying incantation from the depths of hell to trick nix into thinking two different things are the same.

mbo · on June 7, 2019

Hey that's actually me :) I think there are better explanations elsewhere in that thread (and this one as well). It was a vastly simplified description for a non-technical friend of mine that I replied to.

craigds · on June 7, 2019

Why would Chrome return non-deterministic bytes for a versioned release?

vbezhenar · on June 7, 2019

So they could return special build for people targeted by NSA and nobody would be surprised that their download is different.

idle_zealot · on June 7, 2019

A/B testing?

ktta · on June 7, 2019

Relevant thread: https://twitter.com/stdlib/status/1136629930060636162

obituary_latte · on June 7, 2019

O/T but anyone else have problems with mobile.twitter on iOS? It never seems to work for me - just times out. Both WiFi and cellular.

Edit: this is really strange. Tried opening in a new tab and as it was stuck loading (progress bar halted at about 1/4), I opened “view source” app I have installed. It showed empty html/head/body tags and upon closing, twitter loaded up immediately.

EvilTerran · on June 7, 2019

I have a similar problem on my firefox-on-android: when I load a mobile.twitter link, it usually pinwheels for a bit, then throws up an error - either "you're rate limited" or just "something went wrong" (or, occasionally, it just pinwheels forever); then, regardless of which way it failed, loading the page a second time almost always works - but only if I do a proper reload from the browser UI, the in-page "try again" button doesn't help.

I figure something's timing out on the first attempt, but various bits get downloaded & cached before it fails, meaning the second try runs fast enough to not hit the same timeout.

ikeboy · on June 7, 2019

It's a referer header issue. Caused by the mitigation in https://blog.twitter.com/engineering/en_us/topics/insights/2...

ikeboy · on June 7, 2019

Typically need to reload. Twitter needs to recognize the referer header which doesn't work properly on some platforms, if you reload from Twitter the header is fine.

See https://blog.twitter.com/engineering/en_us/topics/insights/2...

jen729w · on June 7, 2019

I often get the old generic “an error occurred” nonsense. I’m not signed in. Tin-foil-hat-me wonders if I’d see it as much if I were...

userbinator · on June 7, 2019

Although collisions have been found for both SHA1 and MD5 (and the latter are much easier to generate), as far as I know it's still extremely difficult to generate a file with a given hash ("preimage attack")... I wonder if that property might make it useful for some things, since generating collisions even for MD5 is still much more time-consuming than verifying them, and generating a colliding pair means that finding another block with the same hash is still near-impossible.

tialaramex · on June 7, 2019

> as far as I know it's still extremely difficult to generate a file with a given hash

I think this might give a false impression, "still extremely difficult" here means that you'd need to do on average far more than 2^120 hash operations to do this. If you had a billion computers that could do this one billion times per second, and you did that for a billion seconds (ie several decades) you're still doing a billion times too few operations to have any significant chance of finding your pre-image.
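The arithmetic is easy to check. The Python snippet below just multiplies out the figures quoted above; the variable names are illustrative.

```python
# A billion computers, a billion hashes per second each, for a billion seconds.
attempts = 10**9 * 10**9 * 10**9      # 10^27 hash operations in total
needed = 2**120                       # roughly 1.33e36

shortfall = needed / attempts         # roughly 1.33e9: about a billion times too few
years = 10**9 / (365.25 * 24 * 3600)  # roughly 31.7 years

print(f"shortfall factor: {shortfall:.3g}")
print(f"a billion seconds is about {years:.1f} years")
```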

However, because all Merkle–Damgård hashes (MD4, MD5, SHA-1, SHA-256, SHA-512) are subject to length extension, you don't need to generate new collisions; you can just length-extend an existing one at leisure.

That is - take the | operator to be a string concatenation. At a high level if I know A, B such that MD5(A) = MD5(B) then I can trivially choose x and have MD5(A|x) = MD5(B|x) without doing any of the collision work again.
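A small Python sketch of why this works, using a toy Merkle–Damgård hash with a 16-bit state so a collision can be brute-forced in a moment. It assumes what holds for the published MD5 and SHA-1 collisions: A and B have the same block-aligned length and collide in the internal chaining state. `compress`, `toy_md`, and the 4-byte block size are all invented for the example.

```python
import hashlib
from itertools import count

BLOCK = 4            # toy block size in bytes
IV = b"\x00\x00"     # toy initial state


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function with a deliberately tiny 16-bit state."""
    return hashlib.sha256(state + block).digest()[:2]


def run_blocks(data: bytes, state: bytes = IV) -> bytes:
    """Fold block-aligned data through compress(), one block at a time."""
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state


def pad(message: bytes) -> bytes:
    """0x80, zeros up to a block boundary, then the message length."""
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    return padded + len(message).to_bytes(BLOCK, "big")


def toy_md(message: bytes) -> bytes:
    """The digest is simply the final internal state, as in MD5 and SHA-1."""
    return run_blocks(pad(message))


def find_state_collision():
    """Birthday search for two distinct one-block messages whose
    chaining state is identical after that block."""
    seen = {}
    for i in count():
        block = i.to_bytes(BLOCK, "big")
        state = run_blocks(block)
        if state in seen:
            return seen[state], block
        seen[state] = block


A, B = find_state_collision()
x = b"any suffix at all"

assert A != B and toy_md(A) == toy_md(B)
assert toy_md(A + x) == toy_md(B + x)  # no new collision search needed
```

Once the two states agree, every later block (including the padding, since the lengths match) is processed identically, so the digests stay equal for any shared suffix.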

"Sponge" constructions like SHA3 don't do this. If somebody finds a way to make A, B such that SHA3(A) = SHA3(B) then almost certainly SHA3(A|x) is not equal to SHA3(B|x) for arbitrary values of x because the output from SHA3() is a summary ("squeezing out" the sponge) not the internal state itself.

rurban · on June 7, 2019

This is what I would call a hack. Finally a real hack in the wild after many years!

ga-vu · on June 7, 2019

This doesn't actually look like a SHA-1 collision attack, tbh