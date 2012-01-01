Hacker News
Linus' reply on Git and SHA-1 collision (marc.info)
Extremely relevant discussion on Stack Overflow from 2012 on how git would handle a SHA-1 collision: someone called Ruben changed the hash function to be just 4 bits padded with zeroes and checked what git actually does on collisions: http://stackoverflow.com/a/34599081/308851
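
To get a feel for why a 4-bit hash collides almost immediately, here is a minimal Python sketch. It does not reproduce Ruben's patch to git; it only truncates SHA-1 to its top 4 bits and hashes numbered strings until two of them share a value. The helper names `tiny_hash` and `first_collision` and the `object N` inputs are made up for illustration. With only 16 possible values, the pigeonhole principle guarantees a collision within the first 17 inputs.

```python
import hashlib

def tiny_hash(data: bytes, bits: int = 4) -> int:
    """Keep only the top `bits` bits of SHA-1 (illustrative, not git's code)."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)

def first_collision(bits: int = 4):
    """Hash b'object 0', b'object 1', ... until two inputs share a hash."""
    seen = {}  # truncated hash -> first input that produced it
    i = 0
    while True:
        data = b"object %d" % i
        h = tiny_hash(data, bits)
        if h in seen:
            return seen[h], data, h
        seen[h] = data
        i += 1

a, b, h = first_collision()
print(a, b, h)
```

What git then does with two different objects that claim the same name is exactly what the linked answer tests; this sketch only shows how quickly such pairs turn up.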

Linus has toned it down a lot since a decade ago.

> You are _literally_ arguing for the equivalent of "what if a meteorite hit my plane while it was in flight - maybe I should add three inches of high-tension armored steel around the plane, so that my passengers would be protected".

> That's not engineering. That's five-year-olds discussing building their imaginary forts ("I want gun-turrets and a mechanical horse one mile high, and my command center is 5 miles under-ground and totally encased in 5 meters of lead").

> If we want to have any kind of confidence that the hash is really unbreakable, we should make it not just longer than 160 bits, we should make sure that it's two or more hashes, and that they are based on totally different principles.

> And we should all digitally sign every single object too, and we should use 4096-bit PGP keys and unguessable passphrases that are at least 20 words in length. And we should then build a bunker 5 miles underground, encased in lead, so that somebody cannot flip a few bits with a ray-gun, and make us believe that the sha1's match when they don't. Oh, and we need to all wear aluminum propeller beanies to make sure that they don't use that ray-gun to make us do the modification _ourselves_.

> So please stop with the theoretical sha1 attacks. It is simply NOT TRUE that you can generate an object that looks halfway sane and still gets you the sha1 you want. Even the "breakage" doesn't actually do that. And if it ever _does_ become true, it will quite possibly be thanks to some technology that breaks other hashes too.

> I worry about accidental hashes, and in 160 bits of good hashing, that just isn't an issue.

http://www.gelato.unsw.edu.au/archives/git/0504/0885.html
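
For a rough sense of the "accidental" case in that last quote, the usual birthday estimate counts the pairs of objects and assumes each pair collides with probability 2^-160. The Python sketch below does that arithmetic exactly. It assumes an ideal 160-bit hash, which SHA-1 only approximates, and the object count of 10^12 is an arbitrary figure chosen for illustration, not a number from the thread.

```python
from fractions import Fraction

def birthday_bound(n_objects: int, bits: int = 160) -> Fraction:
    """Upper bound on the chance of any collision among n random hashes.

    There are n(n-1)/2 pairs, and each pair collides with probability 2^-bits.
    """
    pairs = n_objects * (n_objects - 1) // 2
    return Fraction(pairs, 2 ** bits)

# Hypothetical repository size: a trillion objects.
p = birthday_bound(10 ** 12)
print(float(p))
```

Even at that size the bound is a few times 10^-25, which is the sense in which accidental collisions "just aren't an issue". Deliberate collisions are a separate question that this estimate says nothing about.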

Do you know if git objects' size header was designed to deal with a possible collision or does it serve another purpose as well?

Just some context: git calculates an object's name from its content in the following way. Say we have a blob that represents a file whose content is 'Here be dragons!' plus a trailing newline (17 bytes); then the object name would be:

  # header "blob <size>\0" followed by the raw content; single quotes keep the shell from touching "!"
  printf 'blob 17\0Here be dragons!\n' | openssl sha1
  # => a54eff8e0fa05c40cca0ab3851be5aa8058f20ea

So the object gets stored in '.git/objects/a5/4eff8e0fa05c40cca0ab3851be5aa8058f20ea'
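
The same naming scheme can be sketched in Python. This is a minimal illustration using only `hashlib`, not git's own code: the function names `git_blob_id` and `loose_object_path` are made up here, and the sketch covers blobs only.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 over the header b'blob <size>\\0' followed by the raw content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def loose_object_path(object_id: str) -> str:
    """First two hex digits name the directory, the remaining 38 the file."""
    return ".git/objects/%s/%s" % (object_id[:2], object_id[2:])

oid = git_blob_id(b"Here be dragons!\n")  # 17 bytes of content
print(oid)
print(loose_object_path(oid))
```

Because the size is part of the hashed header, two byte strings that collide under plain SHA-1 do not automatically collide as git blobs.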

The PDFs released as proof are the same size, so if size and checksum are the same, git could certainly be fooled at checkout time.

So I could imagine that in a large source file, it would be possible to have some malicious code plus some data in comment blocks to make the hash match. That said, the PDFs are 422k, and I think it's a much more difficult attack on the more typical, smaller source files that one would check out and build from git. Maybe Xcode .nibs and that sort of tool output could become relatively easy attack vectors, though.

The size is useful when you want to know the length of an object without unpacking it. The size is the uncompressed length, not the size on disk.

For example, if you want to stream the blob over HTTP you can use it to set the Content-Length. Otherwise, you have to use chunked transfer.
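
A minimal Python sketch of that idea for loose objects, which are stored as a zlib stream of `<type> <size>\0<content>`. Only the first few bytes are inflated to read the header. The helper names and the 64-byte header limit are assumptions for illustration, and packed objects use a different encoding that this sketch does not cover.

```python
import zlib

def make_loose_object(obj_type: bytes, data: bytes) -> bytes:
    """Build the compressed bytes of a loose object (for the demo below)."""
    header = obj_type + b" " + str(len(data)).encode() + b"\0"
    return zlib.compress(header + data)

def read_header(compressed: bytes, max_header: int = 64):
    """Inflate at most `max_header` bytes and parse '<type> <size>'."""
    inflater = zlib.decompressobj()
    head = inflater.decompress(compressed, max_header)  # output is capped
    header, _, _ = head.partition(b"\0")
    obj_type, size = header.split(b" ", 1)
    return obj_type.decode(), int(size)

stored = make_loose_object(b"blob", b"x" * 100000)
obj_type, size = read_header(stored)
print(obj_type, size, len(stored))
```

The parsed size is the uncompressed length, so it is the value a server would put in Content-Length when streaming the inflated content, while `len(stored)` is the much smaller on-disk size.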

Actually AFAIK this specific attack doesn't let the two files have different lengths, so the size header is irrelevant anyway.

I prefer the entire thread: http://marc.info/?t=148786884600001&r=1&w=2

The ratio of new information relevant to this problem to new information overall is low.

If a high signal/noise ratio is still the purpose of information, then the thread is less interesting than Linus's mail:

- if we add size it will make forgery harder
- yes SHA1 should be replaced

What Linus is missing is people rewriting history. This will not be a concern for git, but it certainly will be for any cryptocurrency relying on SHA1 in the near future. (Hint: this transaction belonged to me.)

I'm not aware of any cryptocurrencies that rely on SHA1.

Can you name one that does?

I wonder what the effect would be if there were one.

Bitcoin is a popular one. 2 sha-1 rounds.

No, it uses SHA-256

What does that have to do with git?

Normally I roll my eyes whenever I see that there is some sort of reply to anything from Linus, but this time I agree with him.

