
Abusing SHA-1 collisions for Chromium updates - lelf
https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/networking/browsers/chromium/update.nix#L96
======
Scaevolus
If I'm understanding this glorious hack correctly, this allows Nix to
determine if Chrome has an update available:

1) Nix network access is only allowed for things where you guarantee the
hashes of what's output. This is generally used to do things like "download
[http://example.com/release-1.2.tar.xz](http://example.com/release-1.2.tar.xz),
it will have SHA1 93f3025c7802a1a11e4f16186089b583ef1095b8"

2) There are known pairs of strings that have identical SHA1 hashes.

3) To determine whether a fetch would succeed in a "pure" way, write a
network-accessing function that returns one of two strings with identical
hashes, one standing for "true" and one for "false". The caller can then use
that (supposedly deterministic) string to get a (nondeterministic!) True or
False.

4) This is used to run a command like `curl -s -L -f -I
https://commondatastorage.googleapis.com/chromium-browser-official/chromium-74.0.3729.169.tar.xz`.
If the command succeeds, we know we can use this version of the browser to
update.
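
For readers who don't speak curl flags, the check in step 4 amounts to "send a
HEAD request and report whether it succeeded". A rough Python sketch of the
same idea follows. The name `url_exists` and its `opener` parameter are made up
for illustration (the parameter only exists so the function can be exercised
without network access); this is not what the Nix script itself runs.

```python
import urllib.error
import urllib.request


def url_exists(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request to `url` ends in a 2xx response.

    Hypothetical helper, loosely mirroring `curl -s -L -f -I URL`:
    only headers are requested, and HTTP errors count as failure.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.URLError:
        # HTTPError (4xx/5xx) is a subclass of URLError, so this covers both
        # "server said no" and "could not connect at all".
        return False
```

The answer depends on the state of the network at the moment of the call,
which is exactly the kind of result Nix normally refuses to let a build
observe.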

I don't know how it reads the channel version data without running into the
same determinism issues.

Here's the latest update to the hack, moving from MD5 to SHA1:
[https://github.com/NixOS/nixpkgs/commit/ed8f3b5fa3cebfc3662a...](https://github.com/NixOS/nixpkgs/commit/ed8f3b5fa3cebfc3662ad5fff098567616220cf8)

~~~
DumbBoy
Even after reading this 10 times I still don't understand what this hack is
about.

I understand these things independently: 1) SHA1 collision weakness 2) Nix
checking package SHA1 when updating packages 3) Chromium returning different
SHA1 for each download

Seriously, what is this?

~~~
mbrock
Someone made a Nix thing that's supposed to check whether a new version of
Chrome is available, and if so, generate an updated package.

While doing so, they found themselves needing some kind of "tryFetch" function
that would return true or false depending on whether a certain URL is
reachable.

But there's no such function in Nix. Why not? Because you're not really
supposed to do stuff like that in the Nix paradigm which is all about
determinism.

So... they really wanted to do it anyway, so they invented a clever hack,
probably too clever.

What can you do in Nix? Well, you can make a package that downloads a certain
URL and uses the downloaded result, provided that all the observable outputs
of that package are deterministic. So after the package's build script runs,
Nix verifies that the result matches a hash specified in the package
definition. If the hash doesn't match, the package fails to build.

So this hacker decided to make such a package that tries to download the
Chrome update and results in the boolean information about whether the update
was available. But the result needs to have the same hash in both cases.
That's where a hash collision comes in handy.

So this hacky build script uses a couple of well-known PDF files that both
have the same SHA1 hash. If the update exists, it gives PDF 1, otherwise it
gives PDF 2.

The update script then depends on that hack package. It "installs" that
package, and then checks whether it actually contains PDF 1 or PDF 2, and now
it knows whether the Chrome update was available or not.
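
The trick can be simulated in a few lines of Python. Everything here is a toy
made up for illustration: `weak_hash` is a deliberately terrible checksum
standing in for SHA1 (so that a collision can be written down by hand), the two
witness strings stand in for the two PDFs, and `fixed_output_build` stands in
for Nix's hash check. None of these names come from the actual script.

```python
def weak_hash(data: bytes) -> int:
    """Toy stand-in for SHA1: byte sum, so reordered bytes collide."""
    return sum(data) % 65536


# Two different outputs with the same toy hash (stand-ins for the two PDFs).
WITNESS_AVAILABLE = b"yes-no"
WITNESS_MISSING = b"no-yes"
PINNED_HASH = weak_hash(WITNESS_AVAILABLE)


class HashMismatch(Exception):
    """Raised when a build output does not match the declared hash."""


def fixed_output_build(builder, expected_hash):
    """Run `builder` and accept its output only if the hash matches."""
    output = builder()
    if weak_hash(output) != expected_hash:
        raise HashMismatch("output does not match the declared hash")
    return output


def try_fetch(url_is_reachable):
    """Smuggle one bit out of a hash-checked build via colliding outputs."""
    def builder():
        # The build may look at the network; it picks a witness accordingly.
        return WITNESS_AVAILABLE if url_is_reachable() else WITNESS_MISSING

    output = fixed_output_build(builder, PINNED_HASH)
    # The hash check passed either way; the content tells us what happened.
    return output == WITNESS_AVAILABLE
```

Calling `try_fetch(lambda: True)` and `try_fetch(lambda: False)` both pass the
hash check but return different booleans, which is the whole point: the check
only looks at the hash, so it cannot tell the two witnesses apart.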

~~~
kowdermeister
> Why not? Because you're not really supposed to do stuff like that in the Nix
> paradigm which is all about determinism.

That's what I always feel when looking at FP languages. They sound good on
paper and the examples are very tempting, but when reality intrudes on this
pure, predictable, perfect world, it turns into a massive pain.

~~~
derefr
The key point is that the Nix scripting environment (think “a Ports manifest”)
is an intentionally restrictive language intended to have deterministic
results for every operation. It’s not intended to be a Turing-complete
programming language; that’s the whole point.

What the author of the script has done here, you’re _supposed_ to do by
writing code in some other language that generates a Nix manifest (or just by
hand-rolling a Nix manifest.) And yet, the author here managed to get Nix to
non-deterministically generate Nix.

Considered from the point of view of Nix’s goals, this is more an “exploit”
than a truly-needed feature.

~~~
mbrock
Total nitpick but Nix is indeed Turing complete since it contains the untyped
lambda calculus; you can write arbitrary recursive functions.

------
grhmc
Something I think is a bit misunderstood here is what this is for, and why it
was made.

The person who wrote this (aszlig) is truly a hacker, contorting tools to do
interesting things -- and indeed, you see his cleverness at work here.

This is an update script, which is run by hand, by a maintainer, to update the
Nix expression for Chromium.

This script could have been written in Python, Bash, Perl, or PowerPoint for
all it matters. Or even Brainfuck using aszlig's own brainfuck interpreter
written in bash[0]. This script could be deleted today and make no difference
to using Nix. It is not part of any running system.

This doesn't mean Nix is insecure or broken, and no abuse like this is used
anywhere in the Nix package set.

[0]
[https://github.com/aszlig/shellfuck/blob/master/bf.sh](https://github.com/aszlig/shellfuck/blob/master/bf.sh)

------
tomsmeding
Summary by _max_bo_ in the twitter thread
([https://twitter.com/stdlib/status/1136629930060636162](https://twitter.com/stdlib/status/1136629930060636162))
linked elsewhere:

> chrome wants things different. nix wants things to be the same, always.
> nixpkg programmer invokes horrifying incantation from the depths of hell to
> trick nix into thinking two different things are the same.

~~~
mbo
Hey that's actually me :) I think there are better explanations elsewhere in
that thread (and this one as well). It was a vastly simplified description for
a non-technical friend of mine that I replied to.

------
craigds
Why would Chrome return non-deterministic bytes for a versioned release?

~~~
vbezhenar
So they could return a special build for people targeted by the NSA and nobody
would be surprised that their download is different.

------
ktta
Relevant thread:
[https://twitter.com/stdlib/status/1136629930060636162](https://twitter.com/stdlib/status/1136629930060636162)

~~~
obituary_latte
O/T but anyone else have problems with mobile.twitter on iOS? It never seems
to work for me - just times out. Both WiFi and cellular.

Edit: this is really strange. I tried opening it in a new tab, and while it was
stuck loading (progress bar halted at about 1/4), I opened a “view source” app
I have installed. It showed empty html/head/body tags, and upon closing it,
twitter loaded up immediately.

~~~
EvilTerran
I have a similar problem on my firefox-on-android: when I load a
mobile.twitter link, it usually pinwheels for a bit, then throws up an error -
either "you're rate limited" or just "something went wrong" (or, occasionally,
it just pinwheels forever); then, regardless of which way it failed, loading
the page a second time almost always works - but only if I do a proper reload
from the browser UI, the in-page "try again" button doesn't help.

I figure something's timing out on the first attempt, but various bits get
downloaded & cached before it fails, meaning the second try runs fast enough
to not hit the same timeout.

~~~
ikeboy
It's a referer header issue. Caused by the mitigation in
[https://blog.twitter.com/engineering/en_us/topics/insights/2...](https://blog.twitter.com/engineering/en_us/topics/insights/2018/twitter_silhouette.html)

------
userbinator
Although collisions have been found for both SHA1 and MD5 (and the latter are
much easier to generate), as far as I know it's still extremely difficult to
generate a file with a given hash ("preimage attack")... I wonder if that
property might make it useful for some things, since generating collisions
even for MD5 is still much more time-consuming than verifying them, and even
once you have a colliding pair, finding another block with the same hash is
still near-impossible.

~~~
tialaramex
> as far as I know it's still extremely difficult to generate a file with a
> given hash

I think this might give a false impression: "still extremely difficult" here
means that you'd need to do on average far more than 2^120 hash operations to
do this. If you had a billion computers that could each do this one billion
times per second, and you ran them for a billion seconds (i.e. about three
decades), you'd still be doing a billion times too few operations to have any
significant chance of finding your pre-image.
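
The arithmetic is easy to check. A quick sketch, taking "a billion" as 10^9 and
2^120 as the work factor quoted above (the variable names are just for this
example):

```python
import math

# computers * hashes per second per computer * seconds
attempts = 10**9 * 10**9 * 10**9
needed = 2**120

shortfall = needed / attempts            # how many times too few attempts
years = 10**9 / (365.25 * 24 * 3600)     # a billion seconds, in years

print(f"attempts  ~ 2^{math.log2(attempts):.1f}")
print(f"shortfall ~ {shortfall:.2e}x")
print(f"duration  ~ {years:.1f} years")
```

The shortfall comes out at a bit over a billion, and a billion seconds is a
little under 32 years.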

However, because all Merkle–Damgård hashes (MD4, MD5, SHA-1, SHA-256, SHA-512)
are subject to length extension, you don't need to generate new collisions;
you can just length-extend an existing one at leisure.

That is - take the | operator to be a string concatenation. At a high level if
I know A, B such that MD5(A) = MD5(B) then I can trivially choose x and have
MD5(A|x) = MD5(B|x) without doing any of the collision work again.

"Sponge" constructions like SHA3 don't do this. If somebody finds a way to
make A, B such that SHA3(A) = SHA3(B) then almost certainly SHA3(A|x) is not
equal to SHA3(B|x) for arbitrary values of x because the output from SHA3() is
a summary ("squeezing out" the sponge) not the internal state itself.
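
The MD5(A|x) = MD5(B|x) property can be seen with a toy Merkle–Damgård hash.
This is a sketch under loud assumptions: `toy_md_hash` is invented for this
example, with a 16-bit state so that a collision can be brute-forced in
milliseconds, and its compression function just borrows truncated SHA-256. It
also assumes A and B have the same length and collide in the internal state
after a whole block, which is how the known MD5 and SHA-1 collisions behave.

```python
import hashlib
import itertools

BLOCK_SIZE = 4  # bytes per message block (toy value)
STATE_SIZE = 2  # bytes of internal state (toy value, makes collisions cheap)
IV = b"\x00" * STATE_SIZE


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-256 of state + block."""
    return hashlib.sha256(state + block).digest()[:STATE_SIZE]


def pad(message: bytes) -> bytes:
    """Append 0x80, zero fill to a block boundary, then a length block."""
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK_SIZE)
    return padded + len(message).to_bytes(BLOCK_SIZE, "big")


def toy_md_hash(message: bytes) -> bytes:
    """Iterate the compression function; the final state is the digest."""
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[i:i + BLOCK_SIZE])
    return state


def find_one_block_collision():
    """Brute-force two distinct blocks that reach the same internal state."""
    seen = {}
    for counter in itertools.count():
        block = counter.to_bytes(BLOCK_SIZE, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block


a, b = find_one_block_collision()
suffix = b"any suffix at all"
print(a != b)
print(toy_md_hash(a) == toy_md_hash(b))
print(toy_md_hash(a + suffix) == toy_md_hash(b + suffix))
```

Once the two messages have driven the hash into the same internal state,
everything appended afterwards (including the padding, since the lengths
match) is processed identically, so the collision survives any common suffix.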

------
rurban
This is what I would call a hack. Finally a real hack in the wild after many
years!

------
ga-vu
This doesn't actually look like a SHA-1 collision attack, tbh

