Hacker News new | past | comments | ask | show | jobs | submit login
Breaking SHA256: length extension attacks in practice (kerkour.com)
75 points by randomint64 on May 24, 2023 | hide | past | favorite | 37 comments



This is a nice writeup on length-extension attacks, although I wish it had placed the takeaway further up: `H(s || m)` is almost always the wrong signature construction, even if the hashing function itself doesn't enable length extension attacks. You almost always want an HMAC.


> even if the hashing function itself doesn't enable length extension attacks

My impression was that H(s||m) was reasonable with e.g. SHA-3, assuming you have a fixed key size. The only downside I can think of is that you have to remember not to do it with variable length keys or with SHA-2. Am I missing something?


It's not unreasonable, although in the case of SHA-3 (Keccak), it's recommended to use KMAC. It pads the key and adds a domain separation prefix. Unlike HMAC, there's no double hashing.

https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.S...


I’ve seen recommendations to use HopMAC (roughly H(key, customization=H(message))), to thwart differential power analysis, reducing the need for protection to the shorter outer call: https://www.reddit.com/r/crypto/comments/llaqim/comment/go1w...


Kangaroo12(K12) is a reduced-round KECCAK(aka SHA3) hash. Their HopMAC recommendation is only for K12 (and presumably Marsupilami14), not general purpose like HMAC (it's also quite similar to HMAC[0]). There's also KMAC[1] which is based on full-round KECCAK, avoiding the double hash.

  HopMAC(K,M)=H(K || H(M))
  HMAC(K,M)=H(K || H(K || M))
  
[0]: https://datatracker.ietf.org/doc/draft-irtf-cfrg-kangarootwe... [1]: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.S...


Even if it's not required, the HMAC construction seem to generally make it much harder to exploit any weaknesses in the hash algorithm. For example SHA-1 is pretty broken by now, but HMAC-SHA1 has proven pretty resilient. (obviously don't use HMAC-SHA1, but it has hung on pretty well considering the state of SHA1)


Are we saying anything other than that preimage resistance on SHA1 has held up? Because that's true of MD5 too. It seems probable that H(k,m), for truncated H, will hold up?

The big reason I can think of not to use H(k, m) (or hand-rolled SHA3 KMAC) is that it will quietly fail open if you switch to a normal SHA2 hash.


Neither mention truncated hash


Ya, SHA384 is the way to go. By discarding some bits, the internal state isn’t public and length extension becomes impossible.


This is my understanding as well but I have seen claims that it is still very possible to do length attacks on SHA384.


Any reference for that? Based on the wiki it has 128 bits of security against length extension. https://en.m.wikipedia.org/wiki/SHA-2


What is || in this scenario? I only know it as logical OR.


concatenate strings.


TIL. I can think of many simple ways to mitigate such attacks. Of course, the real solution is, never try to roll your own crypto. Use a well established library whenever possible.


The reason this sort of thing comes up a lot is that it doesn't feel like rolling your own crypto. Developers will be quick to point out that their language of choice has a well established sha 256 library, developed by experts, who did all the crypto rolling. And then they came up with some workflow using it, not expecting this sort of interaction involving the way they use it.


I agree.

I think (and to be clear this is just addressed at the world as a lament, not directed at you) the issue is that people have an incorrect concept of what the phrase "rolling your own cryptography" even means; when I give talks on security, I always note that while there are tricky issues with some primitives for some use cases involving stuff like "does your code leak information via timing, power usage, caches, etc." that by-and-large the issue isn't about implementing a well-established low-level primitive--or, I will claim (maybe to my peril! ;P), even a high-level protocol--yourself instead of using an existing implementation: it is about coming up with your own design, whether it be your own checksum / hash function / signature algorithm... or your own protocol / scheme for using these primitives to accomplish a goal, as the stuff you will do wrong is not knowing all the corner cases in how to wield the pieces as these low-level cryptographic primitives are not and pretty much can't be leak-proof abstractions: they are little bits of math that often have to be used exactly correctly and even then only still provide some level of protection / risk mitigation against an adversary... when developers waltz in and assume the low-level hash function is in some sense perfect and provides some unbreakable abstract functionality, you are going to think something is trivial that is in fact very very hard.


You can just drop the end of the hash. That is essentially what sha-224, sha-384 , and sha-512/256 are.

Or you can use a hash where the internal state is larger than the output, which is the case of SHA3.

Note that this just protects against length extension, these are not MACs.


SHA3 KMAC is almost just this (a keyed hash) --- with a length appended, and some domain separation. You can make a MAC out of a truncated keyed SHA2 hash (but don't).


Nice.

Side note. The problem with JWTs is bigger than this. I "steal" web tokens all the time to get access to stuff outside the context the tokens were handed out in. If someone comes up with something really clever in this area there is money to be made.


What do you mean "outside the context the tokens were handed out in"?

And isn't this the same as stealing a cookie or something?


It is the same problem. My main issue with how it works is that typically tokens (and cookies) are valid until they expires. You browse some service and then it can be used by anybody for hours after that. And the methods for securing JWTs are either more tokens with shorter expiration time, using sessions or token black lists. Which kinda kills the beauty about having signed tokens.


> Signature = (message || secret) may be safe in certain circumstances, but you should also AVOID IT!

What about `H(s || m || s)`?


With appropriate padding to ensure each s is in its own separate input block, that is called "envelope" or "sandwich" MAC, and its security can be reduced to the compression function's security using mostly the same techniques as HMAC.

[1] https://doi.org/10.1007/978-3-540-73458-1_26

[2] https://eprint.iacr.org/2013/248

[3] https://eprint.iacr.org/2021/097


Bad against other cases such as constant length m (like user db id) that with combination of other weaknesses may be bruteforcable.

Like if your db is mongo and the 12 byte ID != 12 byte of secure randomness:

https://www.mongodb.com/docs/manual/reference/method/ObjectI...

I think the OP point still stands, just use HMAC instead of inventing crypto schemes.


Even with HMAC you have to be careful! A contrived example:

    HMAC(secret, username || email)
Imagine you have `bob` and `bob@example.com` as inputs. This HMAC can trivially be reused (or learned) by registering `bobb` and `ob@example.com`.

If you ever have multiple fields being concatenated, use fixed-width length prefixes before each field:

   HMAC(secret, 0x0003 || bytes("bob") || 0x000e || bytes("bob@example.com"))


Does this really work? Any arbitrary fixed-length prefix?

I always just serialize the inputs with JSON or something. Any weirdness an attacker might try would just be escaped so they'll never be able to match the original input.


> I think the OP point still stands, just use HMAC instead of inventing crypto schemes.

Completely valid point, I was asking purely out of curiosity since this isn't my area of expertise but I find these kinds of vulnerabilities intriguing.


You are now starting the process of independently re-inventing HMAC.


Is there a reason that length is not prepended to avoid length extension attacks?


Because presumably you don't always know the length of the data when you start with hashing. Like if you hash a file you're reading, it could get appended to while you're reading it. Or imagine hashing a video stream as you are recording it.

The solution newer algorithms seem to have settled on is to instead get fancier in the finalization step, for example by having internal state that doesn't make it into the final hash but would be required to compute the longer hash (just like using SHA512/256 to avoid length extension attacks).


The length may not be known ahead of time. Hashing algorithms generally work on streaming data. For example, suppose you wanted to hash compressed data while it's being compressed, you can't reliably predict the size of the compressed data.


The proper answer is to use something like SHA-3, which relies on an entirely different, simpler yet more powerful design. The underlying problem here is, conceptually, that together with the hash you release the entire state of the "hashing machine". SHA-3, to simplify a bit, truncates the output bitstring, and therefore does not release the entire state of the hashing machine.

Length extension attacks are in every crypto 101 class, everyone who designs crypto systems knows about them and acts accordingly. For example by using Galois field arithmetic to generate authentication codes, not hash functions (bonus: it is faster).


But then you down the rabbithole of UTF vs ascii byte length.

The OP advice is reasonable... just use HMAC. As far as I know it always available if hash is.


In this context it would definitely be the byte length


My doubt too. I understand what's happening but not why. :(


Title is missing "(with Go)".

Not sure why it was removed, that makes the whole thing much more interesting to me.


/s ?




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: