This is a nice writeup on length-extension attacks, although I wish it had placed the takeaway further up: `H(s || m)` is almost always the wrong signature construction, even if the hashing function itself doesn't enable length extension attacks. You almost always want an HMAC.
> even if the hashing function itself doesn't enable length extension attacks
My impression was that H(s||m) was reasonable with e.g. SHA-3, assuming you have a fixed key size. The only downside I can think of is that you have to remember not to do it with variable length keys or with SHA-2. Am I missing something?
It's not unreasonable, although in the case of SHA-3 (Keccak), it's recommended to use KMAC. It pads the key and adds a domain separation prefix. Unlike HMAC, there's no double hashing.
Kangaroo12(K12) is a reduced-round KECCAK(aka SHA3) hash. Their HopMAC recommendation is only for K12 (and presumably Marsupilami14), not general purpose like HMAC (it's also quite similar to HMAC[0]). There's also KMAC[1] which is based on full-round KECCAK, avoiding the double hash.
Even if it's not required, the HMAC construction seem to generally make it much harder to exploit any weaknesses in the hash algorithm. For example SHA-1 is pretty broken by now, but HMAC-SHA1 has proven pretty resilient. (obviously don't use HMAC-SHA1, but it has hung on pretty well considering the state of SHA1)
Are we saying anything other than that preimage resistance on SHA1 has held up? Because that's true of MD5 too. It seems probable that H(k,m), for truncated H, will hold up?
The big reason I can think of not to use H(k, m) (or hand-rolled SHA3 KMAC) is that it will quietly fail open if you switch to a normal SHA2 hash.
TIL. I can think of many simple ways to mitigate such attacks. Of course, the real solution is, never try to roll your own crypto. Use a well established library whenever possible.
The reason this sort of thing comes up a lot is that it doesn't feel like rolling your own crypto. Developers will be quick to point out that their language of choice has a well established sha 256 library, developed by experts, who did all the crypto rolling. And then they came up with some workflow using it, not expecting this sort of interaction involving the way they use it.
I think (and to be clear this is just addressed at the world as a lament, not directed at you) the issue is that people have an incorrect concept of what the phrase "rolling your own cryptography" even means; when I give talks on security, I always note that while there are tricky issues with some primitives for some use cases involving stuff like "does your code leak information via timing, power usage, caches, etc." that by-and-large the issue isn't about implementing a well-established low-level primitive--or, I will claim (maybe to my peril! ;P), even a high-level protocol--yourself instead of using an existing implementation: it is about coming up with your own design, whether it be your own checksum / hash function / signature algorithm... or your own protocol / scheme for using these primitives to accomplish a goal, as the stuff you will do wrong is not knowing all the corner cases in how to wield the pieces as these low-level cryptographic primitives are not and pretty much can't be leak-proof abstractions: they are little bits of math that often have to be used exactly correctly and even then only still provide some level of protection / risk mitigation against an adversary... when developers waltz in and assume the low-level hash function is in some sense perfect and provides some unbreakable abstract functionality, you are going to think something is trivial that is in fact very very hard.
SHA3 KMAC is almost just this (a keyed hash) --- with a length appended, and some domain separation. You can make a MAC out of a truncated keyed SHA2 hash (but don't).
Side note. The problem with JWTs is bigger than this. I "steal" web tokens all the time to get access to stuff outside the context the tokens were handed out in. If someone comes up with something really clever in this area there is money to be made.
It is the same problem. My main issue with how it works is that typically tokens (and cookies) are valid until they expires. You browse some service and then it can be used by anybody for hours after that. And the methods for securing JWTs are either more tokens with shorter expiration time, using sessions or token black lists. Which kinda kills the beauty about having signed tokens.
With appropriate padding to ensure each s is in its own separate input block, that is called "envelope" or "sandwich" MAC, and its security can be reduced to the compression function's security using mostly the same techniques as HMAC.
Does this really work? Any arbitrary fixed-length prefix?
I always just serialize the inputs with JSON or something. Any weirdness an attacker might try would just be escaped so they'll never be able to match the original input.
> I think the OP point still stands, just use HMAC instead of inventing crypto schemes.
Completely valid point, I was asking purely out of curiosity since this isn't my area of expertise but I find these kinds of vulnerabilities intriguing.
Because presumably you don't always know the length of the data when you start with hashing. Like if you hash a file you're reading, it could get appended to while you're reading it. Or imagine hashing a video stream as you are recording it.
The solution newer algorithms seem to have settled on is to instead get fancier in the finalization step, for example by having internal state that doesn't make it into the final hash but would be required to compute the longer hash (just like using SHA512/256 to avoid length extension attacks).
The length may not be known ahead of time. Hashing algorithms generally work on streaming data. For example, suppose you wanted to hash compressed data while it's being compressed, you can't reliably predict the size of the compressed data.
The proper answer is to use something like SHA-3, which relies on an entirely different, simpler yet more powerful design. The underlying problem here is, conceptually, that together with the hash you release the entire state of the "hashing machine". SHA-3, to simplify a bit, truncates the output bitstring, and therefore does not release the entire state of the hashing machine.
Length extension attacks are in every crypto 101 class, everyone who designs crypto systems knows about them and acts accordingly. For example by using Galois field arithmetic to generate authentication codes, not hash functions (bonus: it is faster).