Microsoft is Abandoning SHA-1 Hashes for Updates (columbia.edu)
121 points by sohkamyung 28 days ago | 58 comments



The Ars link is a little more detailed; specifically, it points out that SHA-1 will be replaced with SHA-2, something the current article fails to mention. I was left very confused about what Microsoft would actually be doing.



Agreed. This article led me to believe that there won't be any hashing/signing/protection at all for these updates, until I read the Ars article.


> However, given an existing file and hence its hash, it is not possible, as far as anyone knows, to generate a second file with that same hash. This attack, called a "pre-image attack", is far more serious. (There's a third type of attack, a "second pre-image attack", which I won't go into.)

I think the attack just described is the second pre-image attack. The second file is the second pre-image. A pre-image attack would be to start with only the hash, no file, and create a file with that hash.


>However, given an existing file and hence its hash, it is not possible, as far as anyone knows, to generate a second file with that same hash.

Interesting... I thought Google did exactly this in 2017. Although, they controlled the PDF prefix.

https://security.googleblog.com/2017/02/announcing-first-sha...


Yeah, this is the difference between a collision and a preimage attack. A collision is, given a hash function, come up with two documents with the same hash, but you can control both documents. A preimage attack is, given a hash function and one document (or one hash), find another document with the same hash. That's much harder - you don't have the ability to force the algorithm's internal state into something convenient.
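The cost gap between the two attacks can be felt with a toy experiment. This is a minimal sketch, not any real attack: it uses a deliberately weakened 16-bit hash (truncated SHA-256) so that both searches finish instantly, and it shows the birthday-style collision search succeeding far sooner than the second-preimage search against a fixed document.

```python
import hashlib
from itertools import count

def toy_hash(data):
    """Deliberately weak 16-bit hash (truncated SHA-256) so brute force is feasible."""
    return hashlib.sha256(data).digest()[:2]

def find_collision():
    """Birthday search: ANY two distinct inputs with the same hash (~2^8 tries expected)."""
    seen = {}
    for i in count():
        m = str(i).encode()
        h = toy_hash(m)
        if h in seen:
            return seen[h], m, i
        seen[h] = m

def find_second_preimage(target):
    """Second preimage: match the hash of one FIXED document (~2^15 tries expected)."""
    goal = toy_hash(target)
    for i in count():
        m = str(i).encode()
        if m != target and toy_hash(m) == goal:
            return m, i

a, b, tries_c = find_collision()
m, tries_p = find_second_preimage(b"existing document")
print(f"collision after {tries_c} tries, second preimage after {tries_p} tries")
```

The collision search gets to pick both documents, so it only needs roughly the square root of the work the second-preimage search does; that square-root advantage is exactly why collisions for a real hash fall first.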

A preimage attack is what you worry about if you think the person giving you the signature might be untrustworthy, e.g., they're currently promising one thing but they might swap it out with something else later. A collision attack is what you worry about even if you don't think the person giving you the signature is untrustworthy and you're expecting someone else to swap out the document before it gets to you. So there are a good handful of scenarios where you don't care about preimage attacks but you do care about collisions - e.g., if Microsoft is signing Windows Update files, presumably they don't need to conduct a preimage attack to send you a malicious update, they can just directly sign a malicious update. But you don't want other people sending you malicious updates that appear to be signed by Microsoft.


> A preimage attack is what you worry about if you think the person giving you the signature might be untrustworthy, e.g., they're currently promising one thing but they might swap it out with something else later. A collision attack is what you worry about even if you don't think the person giving you the signature is untrustworthy and you're expecting someone else to swap out the document before it gets to you

You mean vice versa, right?

Untrustworthy person: could make a collision before giving you the input and mess you around.

A trustworthy person doesn't do that (because they are trustworthy). But someone might find a second pre-image attack and exploit it later on.


Yes, sorry. Please swap "preimage attack" and "collision" in that entire paragraph: I meant to say there were cases where you don't care about a collision (as demonstrated for SHA-1) but you do care about a preimage (not yet demonstrated, but certainly a risk).


Thanks for the clarification. I thought Columbia's definition of a pre-image attack was incorrect, due to the vagueness of the verbiage "existing file". I.e., I considered the chosen document created by Google as an "existing file".


Is your second paragraph reversed? It seems like a collision is where you'd worry about someone having prepared two documents that they could swap.


Yes, you're right, sorry, I completely swapped the two terms in that entire paragraph.


Nope, as it says right at the top, that's just a Collision.

In metaphor terms, that CWI Amsterdam + Google announcement is about them making two documents that are different but which you can't tell apart (using SHA1). They could fool you into thinking you had one when it was really the other. But, for a Bad Guy this is only useful under very specific circumstances.

If a shady guy offers you what seems to be... his own self-published novel "Interdimensional Hat Monkeys 4" you don't care whether it's the real thing or a "fake", who cares?

Whereas Second Pre-Image lets you find a convincing forgery for any document at all.

Now the shady guy offers you a suitcase full of what appear to be genuine $20 bills. It really matters if those are fake! A suitcase full of real ones is worth a lot of money, whereas a suitcase of fakes is a recipe for jail time.

Online, the main application of a collision is obtaining bogus certificates. You make two documents A and B: A is an ordinary-seeming true statement which you can get certified, while B is something outrageous nobody would certify, but you have chosen them to have the same SHA-1. If the certificate signatures use SHA-1, you can get a CA to sign document A, then just attach that signature to document B.

So that's why SHA-1 has not been allowed for new certificates since 2016. Old ones don't matter because bad guys can't travel back in time and craft old documents to try this trick on.

We did have some other countermeasures to make this trick impractical for real bad guys in the Web PKI. Most important, we required CAs to choose random serial numbers (that's why the serial number on your certificate is just random and doesn't gradually increase in newer certificates). This means a bad guy applying for a certificate can't guess what the serial number will be in advance, which makes it much harder to come up with the pair of documents.
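The reason the random serial helps: a collision pair only matches for one exact byte string, so the attacker has to predict everything the CA will sign. A minimal sketch, using a toy stand-in for the certificate's to-be-signed bytes (not real DER, and `tbs_bytes` is a hypothetical helper):

```python
import hashlib
import secrets

def tbs_bytes(serial, subject, public_key):
    """Toy stand-in for a certificate's to-be-signed portion (not real DER)."""
    return b"|".join([serial, subject, public_key])

# The attacker's colliding pair only works for one exact byte string, so they
# must predict everything the CA will sign, including the serial number.
guessed_serial = b"\x00\x01"
predicted = hashlib.sha1(
    tbs_bytes(guessed_serial, b"benign.example", b"<collision gunk>")).hexdigest()

# A CA that inserts unpredictable random bytes breaks that prediction: the
# bytes actually signed hash to something entirely different, so the
# precomputed collision with the "outrageous" document B no longer applies.
actual_serial = secrets.token_bytes(20)
signed = hashlib.sha1(
    tbs_bytes(actual_serial, b"benign.example", b"<collision gunk>")).hexdigest()

print("precomputed collision still applies:", predicted == signed)
```

Changing even one byte inside the signed region reroutes the whole hash computation, which is why unpredictable serials are such a cheap and effective hedge.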


To further clarify:

> You make two documents A and B, A is an ordinary seeming true statement which you can get certified, while B is something outrageous nobody would certify,

This does imply that the attacker only has limited choice with respect to A and B. They can’t just be random bytes: A has to be not only a syntactically valid X.509 certificate, but one that a legitimate CA will generate (based on the attacker’s inputs in a Certificate Signing Request) and sign, while B has to be a syntactically valid certificate which is useful to the attacker to have a signature for. However, in practice, collision attacks tend to let the attacker arbitrarily pick parts of both messages, while other parts are chosen by the algorithm and do look like a large amount of random binary gunk. In this case, most of the fields of an X.509 certificate are not adequately controllable and/or not large enough to stick the gunk into, but there’s one exception: the public key field, which just happens to be a large blob of binary data that’s chosen by the attacker and expected to look random. So that’s what was used, in the famous proof-of-concept MD5 rogue-CA certificate forgery from 2008 that resulted in the random serial number requirement being added. More details:

https://blog.cloudflare.com/why-its-harder-to-forge-a-sha-1-...


Correct. You're right, the article is mistaken.


It's just good housekeeping at this point. Attacks never get worse, and there have been structural problems with SHA-1 for a long time. Plus, it does make sense for a company like Microsoft with a large user base to protect against malicious insiders. While I was there, employees in the Patch Tuesday program had been approached about backdooring updates. I don't think creating a colliding update would necessarily be the vector for such a thing (and removing SHA-1 doesn't necessarily protect against it), but overall it's just a good idea.


> While I was there, employees in the Patch Tuesday program had been approached about backdooring updates.

What was the general policy about working with law enforcement to prosecute such attempts, to the extent they were traceable?


Microsoft had a large amount of policy, and it could be that some policy applied, but if there was, it didn't permeate the culture and daily routine in the trenches (from my personal and limited experience).

Both backdoor requests I became privy to while there (2.5 years, circa 2011-2014) came from a certain US TLA, where law enforcement would have been an awkward step. It seems reasonable to me that there were more requests, including from other organizations, but I was never made aware of those.

There were some foreign spies caught in MSRC in the same timeframe. More of an extradite than a prosecute type situation, though.


[flagged]



If you're working at a tech company, especially one the size of Microsoft, the only possible answer to that is "tell your boss, and cc whatever privacy/legal/... head officer too".

If Jonny Law goes after you for that, the company will cover you, because while they might be OK taking the company-level decision to cooperate, they absolutely cannot have random rogue employees doing that. Since the threat of law is what may make those employees comply, they need to make it as unthreatening as possible, thus full legal support.


I would modify this to omit your "boss". If you think you are being approached to commit IP theft or a crime use a voice phone to call the chief legal officer's office and ask to immediately speak to a corporate attorney. Relate the incident to them and follow their instructions. Speak to no one else first.

The reason to omit all others is that at this point you don't know several things: is there an already existing investigation that you are now part of? Is your boss or anyone you work with implicated or suspected in it? And so on.

The only other thing you may wish to consider is whether you want to discuss it with your personal attorney first. If you think you may have some legal exposure (it is very difficult to know, as a lay person, whether you do or not) you may want expert advice before informing the company. In any case proceed quickly; do not delay in informing the company unless so advised by your attorney.


I think that the more likely reason is that many corporate security standards (think PCI compliance etc etc) are starting to blanket ban SHA-1, and it was cheaper to switch away from it than continue having to carve out policy exceptions with their large customers who are on support contracts for Win7


If there is no Win7 support for SHA-2, why can't they just add support for it?



Because they don't want people using 7 anymore I guess?


Historically, NIST had provided transition guidance to Federal agencies circa March 2006 to begin migrating to SHA-2[1], officially deprecated SHA-1 for digital signature generation use between 2011 and 2013[2], and outright disallowed it with exception for the same circa November 2015[3].

FWIW, the memo from the horse's mouth[4]:

> To protect your security, Windows operating system updates are dual-signed using both the SHA-1 and SHA-2 hash algorithms to authenticate that updates come directly from Microsoft and were not tampered with during delivery. Due to weaknesses in the SHA-1 algorithm and to align to industry standards Microsoft will only sign Windows updates using the more secure SHA-2 algorithm exclusively.

I'd imagine orchestrating this push is a real cluster. It would be interesting to hear a perspective from the inside.

[1] https://csrc.nist.gov/projects/hash-functions/nist-policy-on...

[2] https://doi.org/10.6028/NIST.SP.800-131A

[3] https://doi.org/10.6028/NIST.SP.800-131Ar1

[4] https://support.microsoft.com/en-us/help/4472027/2019-sha-2-...


> updates are dual-signed using both the SHA-1 and SHA-2 hash algorithms

> Due to weaknesses ... and ... industry standards Microsoft will only sign Windows updates using the more secure SHA-2 algorithm exclusively.

This seems like a half step to me. I assume they used both as a transitional process: use both until all consumers of the signed content support the better one, then drop the older standard. I wonder why they've not done the same again and replaced SHA-1 with a more recent algorithm, so they can easily deprecate SHA-2 if/when needed at a later date?

All the code and processes already support multiple hashes/signatures, so that wouldn't be a problem. Or is the next accepted standard not yet fully decided? Or do they think SHA-2 will last long enough that SHA-3 (assuming that is the generally accepted next step) will be supplanted at least once before SHA-2 needs deprecating, so there is no point implementing it now?


> SHA-3 (assuming that is the generally accepted next step)

That's just the thing, SHA-3 isn't the generally accepted successor to SHA-2. The "go to" hash function is still SHA-2, or BLAKE2.


For those who are paranoid, but can't move away from SHA-1 for whatever reason, consider using SHA-1DC. It's compatible with SHA-1, but will barf on the known collision attack against SHA-1: https://github.com/cr-marcstevens/sha1collisiondetection

It's what Git uses by default. Of course there's no guarantee that new SHA-1 attacks won't be discovered, but it's better than nothing.


To be clear, what this is doing is _markedly more sophisticated_ than just comparing to a fixed hash we know is bad and rejecting it, and so "new SHA-1 attacks" would likely actually still trip this.

The idea in all these MD-family collisions (SHA-1 and SHA-2 are both members of that family) is the same: using one or two input blocks you trap the hash function in an awkward place so that fewer state bits than normal matter, and then, skewered in this way, you calculate one final block that will collide it. SHA-1DC watches for that situation, which its developers call a "disturbance vector", and so it will detect all attacks based on the same approach and "fix" the hash, at a small CPU premium on every hash computed.

We know for MD5 that independent actors (presumably nation states, given the costs) carried out a similar but different attack that we only found out about after the published MD5 collision. Yet it trips the same detection (for MD5 in that case) because it's based on the same mathematical approach, even though none of the actual hash values involved were the same. A defence of this sort would have worked even against an adversary with nation-state resources (in this case probably Israel or the US).


All true, thanks for the elaboration. The only minor thing I'll add is that while its default mode of operation is to munge its internal state when it detects such colliding data, returning a hash different from what stock SHA-1 would return, it can also be made to just return an error. That's the mode Git uses it in:

https://github.com/git/git/blob/v2.21.0-rc2/sha1dc_git.c#L10...


Likely it's just a compliance thing. Remove SHA-1 and you can stop explaining to auditors why it's there.


I might not be understanding this right... But can't they dual sign with both SHA1 and SHA2 with no loss of functionality?

New clients would use (and require) the SHA2 hash. Old clients could still use the SHA1 hash, with the risk of a faked update, but that's probably still better than no updates at all.
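The tiered check described above is easy to sketch. This is a hypothetical illustration of the hash-verification half only (real Windows updates verify signatures over these hashes, which hashlib alone doesn't model); `verify_update` and its parameters are names invented here:

```python
import hashlib

def verify_update(payload, sha1_hex=None, sha256_hex=None, supports_sha2=True):
    """Sketch of a dual-hash check: new clients require the SHA-2 hash,
    while legacy clients fall back to SHA-1 (a weaker guarantee, but
    arguably better than receiving no updates at all)."""
    if supports_sha2:
        return (sha256_hex is not None
                and hashlib.sha256(payload).hexdigest() == sha256_hex)
    # Legacy path: only the SHA-1 hash is available to this client.
    return (sha1_hex is not None
            and hashlib.sha1(payload).hexdigest() == sha1_hex)

update = b"cumulative update payload"
h1 = hashlib.sha1(update).hexdigest()
h2 = hashlib.sha256(update).hexdigest()

ok_new = verify_update(update, h1, h2, supports_sha2=True)   # SHA-2 path
ok_old = verify_update(update, h1, None, supports_sha2=False)  # SHA-1 fallback
bad = verify_update(b"tampered", h1, h2, supports_sha2=True)
print(ok_new, ok_old, bad)  # True True False
```

The catch, as the thread goes on to discuss, is that the system is only as strong as the weakest hash a given client will accept, which is why Microsoft is dropping the SHA-1 path rather than keeping it around indefinitely.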


https://news.ycombinator.com/item?id=19206527

Sounds like they were already dual signing, and are now phasing out SHA1. You're only as secure as the weakest signature you accept.


> But can't they dual sign with both SHA1 and SHA2 with no loss of functionality?

Furthermore, wouldn't trying to collide both SHA2 and SHA1 be even closer to impossible than just trying to collide SHA2 (which is already regarded as practically impossible)? Perhaps newer clients should verify both hashes?


No. Amateurs bring this up _all the time_. Depending on exactly what's going on, it may be no harder to attack _both_ than just the hardest of the two component hashes, plus you just wasted a bunch of effort and maybe introduced extra security bugs with your extra complexity. So don't do this.

The compatibility argument makes sense, but the "maybe it's safer" argument doesn't have any traction at all.

And the politics always ends up being "We don't want to endorse this unsafe thing, let's just remove it".


> Depending on exactly what's going on, it may be no harder to attack _both_ than just the hardest of the two component hashes

We have seen published collisions for MD5 (which are incidentally nearly trivial to generate these days), and I'm aware of the one for SHA-1 (which took much longer to generate) --- but have there been any dual-collisions found? According to you, they wouldn't be that hard to generate, but I haven't been able to find any examples of dual collisions, even for trivially-collidable-by-themselves combinations like MD5+MD4.


Note: This article was written by Steve Bellovin, a leading researcher in security and networking.


I hope they put effort into making sure the user experience for those who don't install the update in time is reasonable.

When I dig dad's old laptop out of the attic and boot it up, I don't want to be faced with the inability to install new updates. At least show an error message that points to a help article saying how to resolve the issue.


Isn't Blake2 the best hash function? If yes, why won't everybody just switch to it?


Probably because it isn't officially sanctioned, which apparently was the reason the more common libcs never added support for bcrypt directly.

-

"The security departments of some customers look at recommendations they get and evaluate the deployed systems based on this. In a few places this led to the problem that the NIST warns people indiscriminately about the use of MD5..." (Note, this was from 2007 when md5 would be in roughly the same state that sha1 is now in.)

http://www.akkadia.org/drepper/sha-crypt.html https://access.redhat.com/articles/1519843#ok-ok-but-why-not...


SHA-2 is more widely supported and the de-facto standard these days. I'm not surprised that an entity like Microsoft ends up being conservative with its choices of algorithms. I guess they could've gone for SHA-3 but I suppose even that is a bit too "bleeding edge" for MS.


Microsoft dual signs at the moment with SHA1 and 2. SHA1 is used by older versions of Windows and SHA2 by newer versions. This flips a switch so older versions of Windows use SHA2 too.


Because the different hash functions have different trade-offs.

For example, not all hash functions are cryptographic hash functions, and non-cryptographic ones can be much faster.


Blake2 is a cryptographic hash, though, and for updates it's also fairly fast on most systems, comparable with SHA2, so it shouldn't be a problem.
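For what it's worth, BLAKE2 has been in Python's standard hashlib since 3.6, so trying both side by side is trivial. A minimal sketch:

```python
import hashlib

data = b"update payload " * 1024

# Both live in the standard hashlib module; blake2b additionally lets you
# pick the digest size (here 32 bytes, to match SHA-256's 256-bit output).
sha2 = hashlib.sha256(data).hexdigest()
b2 = hashlib.blake2b(data, digest_size=32).hexdigest()

print(len(sha2), len(b2))  # 64 64 -- same-length digests, different functions
```

blake2b also takes a `key=` argument for keyed hashing (a MAC without HMAC), which is one of the conveniences its fans point to; but as the parent says, ecosystem and standards support, not quality, is what keeps SHA-2 the default.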


Also, aren’t SHAx hardware-accelerated on x86 CPUs?


Only SHA-1 and SHA-256 (the SHA extensions don't cover SHA-512), and only on recent Intel CPUs or AMD Ryzen CPUs.


Why not replace it by MD-6?


MD-6 never gained popularity and was eliminated early in the SHA-3 competition since it lacked proofs that it could resist differential attacks.

Microsoft are quite correct to use SHA-2 (conservative choice) but SHA-3 would also be fine. Bringing up MD-6 at all seems arbitrary.


I think it would be quite the day when we could take the hashes of Microsoft's ISO images for Windows and compare them with the source code, to check whether there are government backdoors in the code.


> I think it would be quite the day to take the hashes of Microsoft's ISO images for Windows and compare them with the source code

Wat? How would you translate an ISO hash into anything meaningful to compare to the actual source code (which you don't have access to, but that's beside the point)?

Does Microsoft enforce reproducible builds[0]? Even if they did, you couldn't reproduce it (no access to source). When you hash a large binary like an ISO, you cannot deconstruct what, eh, sub-binaries it is made of, and you cannot magically decompile those sub-binaries into source code.

0. https://wiki.debian.org/ReproducibleBuilds


Microsoft has enforced reproducible builds since 2017.


Do you have a source for that (no pun intended)? Even if they did, it's impossible for anyone outside of Microsoft to verify because, well, it's proprietary. So for those of us who trust Microsoft as far as we can throw them, it doesn't really do much.



Yes, you can take the source code, recompile the ISO, and check that the result matches the hash. If the compiled ISO's hash doesn't match, then it isn't the same code.


No, you can literally do none of that with Microsoft software.


That's not how backdoors work.

Backdoors are subtle bugs introduced into the existing codebase. They are compiled into the build.

They aren't added to the binary after the compilation by a malicious third party.


I'm not sure the term "backdoor" is that well defined.

In Ken Thompson's "Reflections On Trusting Trust" Turing award lecture from 1984, he talked about "trojan horses" being inserted into the compiler in such a way that any code run through the compiler results in a program that has behaviour which is unable to be determined from examining the source code.

They are "compiled into the build" by a malicious _compiler_, not from malicious bugs introduced into the source code.

It's quite obvious that such behaviours could easily be things which deserve the name "backdoor".

(It's only short, and _well_ worth the time needed to read it if you haven't come across it before: https://www.archive.ece.cmu.edu/~ganger/712.fall02/papers/p7... )


You can't even do that with their open-source VS Code, since they add branding/telemetry/licensing to the actual binaries they distribute. So there are forks that strip it out:

https://github.com/VSCodium/vscodium



