SHA-3 Buffer Overflow (mouha.be)
416 points by cbzbc on Oct 20, 2022 | 172 comments



The vulnerability impacts 'the "official" SHA-3 implementation'. How widely used is it for SHA-3 hashing compared to something like OpenSSL?


SHA-3 is rarely used in general. Most applications still use SHA-2 hashes (such as SHA256).


An obvious issue is that many protocols do two-sided negotiations of crypto algorithms during a handshake - so one of "most applications" which uses SHA-2 hashes by default could be vulnerable to a network connection saying "I don't support SHA-2, let's use SHA-3 pls?" and then exploiting the buffer overflow.

In a similar manner, apps that verify signatures/certificates often let the certificate specify which algorithm is used, so again effectively an attacker may be able to force the app to use SHA3 as long as it uses a library that has some support for it.


> two-sided negotiations of crypto algorithms

This is a common problem, and massively increases attack surface, since if a vulnerability can be found in any algorithm, it can be exploited. Downgrade attacks are a variant of the same.

Let me present an alternative:

Each protocol has one specific encryption algorithm and set of parameters. However, that set isn't fixed, but defined based on the current date and time.

For example, HTTPS could be defined as "SHA1 from 1996 till 2006", "SHA256 from 2005 till 2018", "SHA3 from 2017 till 2032", etc.

The dates can be written 10+ years into the future, so that software deployed today is still usable that many years into the future. As soon as the last date period expires, the software is not usable until it has been upgraded (unless both ends of the communication link choose to fake the date).
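For concreteness, here is a minimal sketch of the lookup in Go (the cutover dates and the particular algorithms are invented for illustration, not a concrete proposal):

    package datedcrypto

    import (
        "crypto/sha256"
        "crypto/sha512"
        "errors"
        "hash"
        "time"
    )

    // pickHash returns the one hash function this hypothetical protocol
    // allows at a given date; past the last defined period it refuses to run.
    func pickHash(now time.Time) (hash.Hash, error) {
        switch {
        case now.Before(time.Date(2018, 1, 1, 0, 0, 0, 0, time.UTC)):
            return sha256.New(), nil
        case now.Before(time.Date(2032, 1, 1, 0, 0, 0, 0, time.UTC)):
            return sha512.New512_256(), nil
        default:
            return nil, errors.New("protocol definition expired; upgrade required")
        }
    }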

There is precedent for software 'expiring' - that happens whenever the root certificate of a trust store reaches an expiry date.

The downside is sometimes an algorithm might be broken before its date comes, and other times an algorithm may still be secure at that date. But that still seems better than our current mess of downgrade attacks.


As you say, it could be that an algorithm gets broken before its "expiry date" or remains secure well past that (look e.g. at AES). There is a similar but better alternative, usually called "opinionated cryptography". A protocol specifies a single cryptographic primitive for each type needed (hash function, block cipher etc) so no negotiation is needed. If one of the primitives gets broken, a new version of the protocol is released that uses a different primitive.


It seems to me that it may be better in many cases to support exactly two configurations (or some other small fixed number). This way there is no need to wait until a new version is released and dropping the vulnerable one won't cause any downtime. Ideally both codepaths would be regularly exercised (maybe clients should pick randomly when both choices are available). Then as soon as a vulnerability is discovered (maybe even before it is publicly revealed) all clients and servers could immediately drop support for the one that was discovered vulnerable.

Then the release of a new configuration can be done slowly with reasonable caution to get back to the desired redundancy level.


And then you still have a roll-out period during which those two protocols are available, so the two parties still need to negotiate which version they want. It's not clear to me what the advantage over the current situation is?


>the two parties still need to negotiate which version they want

Not necessarily. If the client contacts the server using version X, the server will reply with version X.


What if the server doesn't support version X? Then the client will try again with version X-1. This is a negotiation, it is just an inefficient one (client might have to try X-1, X-2, X-3 in turn if more versions are still co-existing; and contrarily, if client doesn't support any version the server does, you will not get a detailed error about the version mismatch, because they are entirely different protocols).


>Then the client will try again with version X-1

Who says that? Either it is specified in the protocol, and then you're right that this is still a negotiation, or it is not specified in the protocol, so a client that does that is implementing some out-of-spec behavior. It's true that this has already happened on the Internet for compatibility's sake, but keep in mind that there is still an advantage: an adversary can't easily meddle with the connection setup phase to make protocol version X look like protocol version Y.


If you don't have this downgrade, then clients can't talk with servers that haven't been upgraded yet! This means either that:

* you expect all servers to upgrade immediately, or at least faster than clients, or

* you delay upgrading clients you know to be insecure until servers had time to upgrade, or

* you are ok with breaking a significant portion of the network for every protocol update.

Either way, your proposal makes no sense for the internet.

Additionally, this is already possible with the current negotiation scheme. You can have clients refuse old algorithms (and they do). Your proposal doesn't improve anything there or anywhere else.


Negotiation makes the protocol more complicated, so more room for bugs. Additionally, sometimes (or maybe all times?) you need to explicitly configure a client or server to not accept broken algorithms during the negotiation, so you would need to keep track for yourself for which algorithms are safe to use and which are not. Anyway it would be interesting to look at how the Wireguard folks would handle this, as Wireguard is a protocol with opinionated crypto.


That sounds a lot like protocol negotiation, just implicit vs explicit and limited to 2 algos.


Rare != bad && rare != unimportant.

The point of NIST standardizing on SHA-3 is to gradually replace SHA-2 due to the rise of computing power and the likelihood it will become as weak as SHA-1 is now in the near future. Unfortunately, like American credit cards vs. European chip & pin, it's going to take forever to adopt.


No. The "rise in computing power" doesn't jeopardize SHA2. There are important design differences between SHA1 and SHA2 (here's where in my younger days I'd pretend that I could rattle off the implications of nonlinear message expansion off the top of my head). SHA2 is secure; don't take my word for it through, you can find one of the Blake2 designers saying SHA2 is unlikely ever to be broken, or Marc Stevens on a Twitter thread talking briefly about why his attacks on SHA1 don't apply at all to SHA2.


I agree, SHA-2 is secure as far as we know. But since it's based on Merkle-Damgard, it permits length-extension attacks - i.e. given H(x), one can derive H(pad(x) || y) without knowing x.

So we need to be careful not to use it in settings where that would be problematic. Or we can use it with workarounds, like double hashing (SHA-256d) or truncating its output.

SHA-3 is sponge based, so its output is always truncated, preventing length-extension attacks. So I think SHA-3 is a better default, though it's fine to use SHA-2 if you know what you're doing.


Truncated SHA512 hashes, such as SHA512/256, defeat length extension attacks by omitting part of the hash state from the output. They're also significantly faster than classic SHA256 for large inputs.
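For anyone who wants to try it, the truncated variant is already in Go's standard library; a minimal sketch (the function name is mine):

    package hashsketch

    import (
        "crypto/sha512"
        "fmt"
    )

    // SHA-512/256 runs SHA-512's compression function but publishes only 32
    // of the 64 state bytes, so the digest doesn't reveal the internal state
    // needed to mount a length-extension attack.
    func printDigest(msg []byte) {
        sum := sha512.Sum512_256(msg)
        fmt.Printf("%x\n", sum[:]) // 32-byte digest
    }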


Blake2/3 also doesn't suffer from length extension attacks, but SHA-256 is what everyone uses unfortunately.


We use Blake 3. So far so good…


I'm always leery when primitives mention their speed as a selling point because I'm thinking about the memory and CPU/GPU/ASIC costs required for adversary X years from now. Sure, one can hash or encrypt using N repeated rounds to up the cost but still: speed isn't everything.


Outside of password hashing applications, which message digests like the SHA and Blake families aren't ideal for anyway, hash functions don't really derive their strength from being slow. If a hash is seriously broken -- in the kind of way that MD4 is, for example -- attacks against it may require so few operations that the speed of the hash doesn't matter at all. But, so long as the hash function remains unbroken, any attack against it requires so many operations as to make it completely infeasible, regardless of how fast it is.
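To put rough numbers on that, a quick back-of-the-envelope in Go (the attacker hash rate below is an invented, absurdly generous figure):

    package main

    import (
        "fmt"
        "math"
    )

    // Even granting an attacker 2^80 hash evaluations per second (far beyond
    // any real hardware), a generic 2^128 birthday attack on a 256-bit hash
    // still takes millions of years, no matter how fast one evaluation is.
    func main() {
        perSecond := math.Pow(2, 80)
        seconds := math.Pow(2, 128) / perSecond
        years := seconds / (3600 * 24 * 365)
        fmt.Printf("%.1e years\n", years) // prints about 8.9e+06 years
    }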


Like the sibling says, speed is not really related to hash security. Either the hash is good at any speed or it’s broken. Speed is important if you’re hashing often. Systems that hash often, in my experience, tend to have a stronger security posture than systems that use bearer tokens. So… fast strong hashing is good for applications because it enables good crypto to be applied more thoroughly. And that’s why we use Blake3, fast strong hash/kdf/xof that we can use anywhere (from powerful servers to wasm) without much of a second thought.


SMH. You're conflating being "broken" by mathematical attack with having enough computing power to brute-force it (GPUs or quantum). Rise in computing power always jeopardizes the baseline brute-force cost of every algorithm, which is why standards shift over time, otherwise 3DES would still be recommended for new applications instead of AES.


"near future"? I doubt that. Eventually, but likely not in my lifetime.

https://www.imperialviolet.org/2017/05/31/skipsha3.html


I've mostly seen SHA-3 used in more modern applications, it's used a lot in cryptocurrencies (with BLAKE2)


I’d love to use SHA-3, but I’m waiting until I see SHA-3 in standard libraries for my most-used languages.


The version in the Golang stdlib defaults to a pure-go implementation... unless you're compiling for amd64, in which case you get an assembler variant apparently directly derived from the XKCP package (https://github.com/golang/crypto/blob/master/sha3/keccakf_am...).

Slightly concerning news for the (mostly-Golang-based) Ethereum ecosystem, which relies on SHA3-256 for pretty much everything...


Easy enough to test, right?

    package main

    import (
      "fmt"
      "golang.org/x/crypto/sha3"
    )

    func main() {
      h := sha3.New224()
      buf := make([]byte, 4294967295)
      h.Write(buf)
      sum := h.Sum(nil)
      fmt.Printf("%x\n", sum)
    }
Doesn't crash on my amd64 dev machine.

Later

I could have just looked at the code, too: the assembly you've linked to is just the Keccak permutation, not the entire Go hash; the buffer management is done in Go, not in assembly.


This is one of those cases where I'm actually more concerned that it doesn't crash. It's like seeing clearly-syntactically-invalid code that somehow compiles anyway — you wonder what semantics the compiler could have possibly ascribed to it.

Presumably this isn't not-crashing just because the developers of the Golang stdlib somehow found+fixed this bug back in 2015 when this assembler file was baked. The error is in that assembler code, I'm sure. It's presumably getting masked by something. (Something that may be benign, or might be subtly corrupting the runtime.)

For a benign case: maybe the Golang runtime isn't allocating you precisely as many bytes as you're asking for, but rather a little bit more? Perhaps rounding up to a multiple of a page for more-than-page-sized allocations?

Not having access to an amd64 machine at the moment, I'll have to ask you: does increasing the size by one, as in the article, cause an infinite loop?


No, it causes the string

    c5bcc3bc73b5ef45e91d2d7c70b64f196fac08eee4e4acf6e6571ebe
No matter what Go is allocating under the hood, it has to be feeding exactly that many bytes to the SHA3 algorithm.


> it has to be feeding exactly that many bytes to the SHA3 algorithm

Well, yes, but this is supposed to be a buffer overflow — the algorithm itself is reading and/or writing past the end of the buffer it's been handed. My hypothesis was that Golang is allocating in such a way that reading/writing a single byte past the received slice bounds won't result in a protection fault in the way you'd expect if the allocation were exact.


The buffer overflow is in the C version of the algorithm (and likely related to loop condition checks idiomatic to C's for-loops). The Go version is a fresh implementation, not a wrapping of the C version. I'm no Go programmer, but if I'm not mistaken, the Go implementation just eats little slices of the input buffer until no more buffer is left, leaving all the overflow-danger to the Go array implementation:

https://github.com/golang/crypto/blob/642fcc37f5043eadb2509c...

Possibly not as fast as C, but easier to reason about, I'd say.
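For the curious, the pattern is roughly the sketch below (my own simplification, not the actual x/crypto/sha3 code, and the names are made up): Write only ever copies rate-sized pieces into a fixed-size block, so every access goes through Go's bounds checks.

    package spongesketch

    // sketchState stands in for the real sponge state.
    type sketchState struct {
        rate int    // bytes absorbed per permutation, e.g. 144 for SHA3-224
        buf  []byte // partially filled input block
    }

    func (d *sketchState) permute() { /* Keccak-f[1600] over the state; omitted */ }

    // Write consumes p one rate-sized slice at a time.
    func (d *sketchState) Write(p []byte) (int, error) {
        written := len(p)
        for len(p) > 0 {
            n := d.rate - len(d.buf) // room left in the current block
            if n > len(p) {
                n = len(p)
            }
            d.buf = append(d.buf, p[:n]...)
            p = p[n:]
            if len(d.buf) == d.rate {
                d.permute() // absorb one full block
                d.buf = d.buf[:0]
            }
        }
        return written, nil
    }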


Just for what it's worth, we're talking about the amd64 assembly version of the SHA3 code in the x/crypto/sha3 library; I didn't look carefully at how it's called, so if it's used incrementally the way this Go code shows, then yeah, it's fine regardless.

A second later

Oh, wait, yeah, this is just the Keccak permutation in assembly, not the entire hash. That was dumb of me. Yeah, this code looks ok?


Indeed, the two assembly variants are just the bit fiddling on the 1600-bit state (keccakF1600).

The outer loop of the Write method (function?) is the same for all architectures.


Are you sure this is the same code that triggers the vulnerability? Is the double `update` in the post itself unimportant (i.e., first updating with a 1-byte string, then the 4,294,967,295-byte string)? As I don't think this does that.


Let's try! Gimme a sec. And: to be clear: I make no representation that this code proves anything, and people should of course take a hard look at the SHA3 amd64 assembly code. I'm just having a message board discussion.

One sec and I'll get you an answer to your thing about the extra write.


With the extra write:

    c5bcc3bc73b5ef45e91d2d7c70b64f196fac08eee4e4acf6e6571ebe
With the extra write and an extra byte on the big write (to try to trigger the loop condition):

    ec66be1ebccf055f839fccf2d12e641dcbbda4f5c71a3bdee6509495


Off-by-one error. Didn't do the 1 byte update.


See downthread, but I don't think it matters. I didn't look at the code before writing the test case; the code here is just the SHA3 transform, not the buffer management code around it, which AFAIK is just Go code. I don't see an opportunity for the overflow here?


I read the code carefully, both yesterday and today, and it's correct.

Read it yourself here:

https://cs.opensource.google/go/x/crypto/+/refs/tags/v0.1.0:...

There's obviously no problem.


All the Go code is safe. No bugs.


If you're familiar with SHA-256 and this is your first encounter with SHA-3:

The main differences between the older SHA-256 of the SHA-2 family of FIPS 180, and the newer SHA3-256 of the SHA-3 family of FIPS 202, are:

* Resistance to length extension attacks: SHA3-256 is not susceptible, while SHA-256 is.

* Performance. The SHA-2 functions—particularly SHA-512, SHA-512/224, and SHA-512/256—generally have higher performance than the SHA-3 functions. This is partly due to paranoia and political considerations in the SHA-3 design process.

Further reading: https://crypto.stackexchange.com/questions/68307/what-is-the...


Note that the truncated SHA-2 variants (SHA-224, SHA-384, SHA-512/224, SHA-512/256) are not susceptible to length-extension attacks [1].

With the added benefit of the better performance of SHA-512 (on 64-bit systems) [2], there's no good reason to use SHA-256 rather than SHA-512/256 for new cryptographic designs.

[1] https://en.wikipedia.org/wiki/Length_extension_attack

[2] https://crypto.stackexchange.com/questions/26336/sha-512-fas...


There is a good reason to use SHA-256 and not SHA-512.

Many modern CPUs, e.g. all AMD Zen, most Intel Atom, Intel Ice Lake and newer, most 64-bit ARM, have hardware implementations of SHA-256, which are much faster than software computing SHA-512.

Only some more recent 64-bit ARM CPUs also have hardware for SHA-512 and SHA-3.

Whenever the speed matters, SHA-256 is the best choice, unless you choose different hash algorithms based on the detected CPU.

On old 64-bit CPUs without SHA hardware, e.g. Intel Skylake, SHA-512 is faster than SHA-256 (Intel introduced SHA in Atom CPUs many years before also introducing it in Core CPUs, so that Atom could compete in Geekbench scores with the ARM CPUs, which already had SHA).


Allegedly, SHA-256 can often be faster than MD5 https://security.stackexchange.com/a/95697 Not to say the algo is faster, it's not, but the CPU has specialized hardware for SHA-256.


Makes me wonder if there is a microcode back door to these hardware hashers such as one which steers the output after the microprocessor is fed carefully crafted input to cause it to run in a vulnerable state.


Interesting, in general I've noticed SHA-256 hashing is relatively CPU intensive and slow.. is the 2024 Xeon CPU in my server too old to include the hardware implementation?


I think that you might have mistyped the Xeon model number.

Nevertheless, most existing Xeon CPUs are too old to have SHA hardware.

In Intel server CPUs SHA was introduced many years later than in AMD server CPUs or Intel desktop CPUs, i.e. only in "the 3rd generation Xeon Scalable" based on the Ice Lake Server cores, in Q2 of 2021 (Xeon model numbers 83xx, 63xx and 53xx).


Is your software using SHA instructions?


I imagine the sha256sum program would use the best available technique.

Reasonable assumption?


The last time I checked what sha256sum (from coreutils) does was one year ago.

At that time, on x86 CPUs (I have not tested on ARM) neither sha256sum nor sha224sum nor sha1sum used the hardware instructions (and I have compiled them from sources, with the appropriate flags).

Because I have a Zen CPU (but that would also be true on Atom CPUs or Core CPUs since Ice Lake), I have to use "openssl dgst -r -sha256" instead of sha256sum and "openssl dgst -r -sha1" instead of sha1sum.

The openssl implementation uses the hardware instructions where available and in that case it is several times faster.

(The "-r" option flag in "openssl dgst" selects an output format that is more similar to the sha???sum tools, i.e. hash value followed by file name, but openssl still outputs an extra character, which must be accounted in scripts.)


I also see similarly slow performance when using the Go built-in Sha256 hashing library.

Thanks for the info! Fortunately for my purposes of integrity checking, slow is okay.


I also use them for file integrity checking, but I frequently check large files, e.g. 50 GB files, and in that case the difference in time until finish between "sha256sum" and "openssl dgst -r -sha256" is extremely noticeable on a Zen CPU.

On the other hand, on older Skylake/Kaby Lake/Coffee Lake etc. CPUs, both programs have the same speed.


*2014, yes it was a typo


SHA-3 was also designed with hardware performance in mind IIRC, as in: if it becomes supported in hardware it will be much faster than SHA-2 hardware support.


Some of the more recent 64-bit ARM CPUs have added hardware support for SHA-3 (SHA3 is optional in architectures Armv8.2-A and later, i.e. in Cortex-A55, Cortex-A75 and later, i.e. starting with the smartphones from 2018).


My understanding was that sha-3 should be faster than sha-2, in a general sense, but sha-2 has hardware acceleration. Is that incorrect?


I think SHA-3 is almost always slower in software. In theory SHA-3 could be hardware accelerated of course, but on current AMD and Intel systems it's not, whereas SHA-256 is.


AArch64 supports accelerated SHA-3, available on production systems since 2019 with the Apple A13, judging by https://github.com/llvm/llvm-project/blob/c35ed40f4f1bd8afd7...

Power10 also supports accelerated SHA-3: https://www.redbooks.ibm.com/redpapers/pdfs/redp5649.pdf (p150)

Accelerated SHA-3 on x86_64 is probably an inevitability; the question is more when than if.



For those not familiar with QuickAssist, it is a cryptography and compression accelerator connected through PCIe, which is available in some models of Intel server CPUs or server chipsets and in a few models of Intel server Ethernet NICs.

Moreover, the laptop variants of Alder Lake (H-series, P-series and U-series) are said in the Intel Ark web site to include a subset of QuickAssist, but I have not been able to find any public Intel document explaining which functions are supported by laptop Alder Lake CPUs or any report of someone testing these functions on laptop Alder Lake CPUs.


Most Android smartphones support SHA3 starting with the 2018 models (i.e. Cortex-A55, Cortex-A75 or newer).


i thought sha-2 included the message length in the head padding to prevent length extension attacks?

just read the link you provided: it's sha-224 and above.

i wonder if a similar issue exists in the sha-224+ padding code.


I think you've badly misunderstood what's going on here. SHA-2 includes SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256. Six functions. Of these SHA-256 and SHA-512 are subject to the length extension attack (and SHA-224 only has 32-bit worth of safety which is clearly inadequate). Since you probably only needed a 256-bit hash, SHA-512/256 is a choice available today which produces a 256-bit hash without the possibility of length extension.

Length extension works because the hash is the entire internal hash function state. This is not true for SHA-224, SHA-384, SHA-512/224 or SHA-512/256 since for these four functions not all of the internal state is exposed.

No, the length is not included at the start of the data to be hashed; that would mean committing to the entire input before we hash it, which is a pretty unsatisfactory API. The length is appended (with the padding).
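A short sketch of that padding rule in Go (the function name is mine; the rule itself is SHA-256's, from FIPS 180-4):

    package padsketch

    import "encoding/binary"

    // sha256Padding returns the bytes appended to a msgLen-byte message
    // before hashing: a 0x80 byte, zero padding, and then the message length
    // in bits as a 64-bit big-endian integer at the very end, so the padded
    // total is a multiple of the 64-byte block size.
    func sha256Padding(msgLen uint64) []byte {
        pad := []byte{0x80}
        for (msgLen+uint64(len(pad))+8)%64 != 0 {
            pad = append(pad, 0x00)
        }
        var length [8]byte
        binary.BigEndian.PutUint64(length[:], msgLen*8)
        return append(pad, length[:]...)
    }

Because the length and padding sit at the end, someone who knows H(x) and len(x) can compute pad(x) and keep hashing from the published state, which is exactly the length-extension property discussed above.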


> I think you've badly misunderstood what's going on here.

that sounds right. been almost a decade since i've dug into sha-2.

i do remember the lengths in sha-256 and sha-512 though and i remember not being a huge fan of the chunking/padding code.

come to think of it, there's another reason for the message length in the padding. i forget what it is now.


> The vulnerable code was released in January 2011, so it took well over a decade for this vulnerability to be found

Ouch


Took well over a decade for this vulnerability to be disclosed.


Between this and "beacown" I came to a realization. I work on codebases worth way less than these, and we run lots of different static analyzers, two compilers, lots of unit tests with coverage checks, and asan/ubsan. The Linux kernel and SHA are worth tons of money. Thus, someone wrote all the pipelines and unit tests for that software too. The problem is, the people who paid for all that work aren't the people that rely on that software's security. It's paid for by people that rely on its insecurity. So we don't know about this stuff until they, or the people they sold it to, use these exploits.

Edit: in case the solution I propose isn't obvious, organizations like Red Hat and Google need to pay for these pipelines and unit tests. Getting good unit test coverage is expensive, and ubsan and asan don't work without that coverage. So since no one has mentioned it yet, you could rewrite this stuff in Rust as a poor man's substitute. It will catch some of the same things, but ultimately there is no substitute for test coverage with sanitizers. Well, maybe formal analysis?


> So since no one has mentioned it yet, you could rewrite this stuff in Rust as a poor man's substitute. It will catch some of the same things, but ultimately there is no substitute for test coverage with sanitizers.

That's backwards. You'll catch more cases with a Rust-style type system that naturally checks everything, than with sanitisers that can only check the paths that get executed in tests.


Rust isn't perfect. UB is a bug in rust, but it occasionally has bugs. Ideally you'd do rust and asan with good unit tests. And yes, if you only picked one, it should be rust, but don't just pick one. And just because you are using rust is no excuse to skip static analysis like prusti, coverage with gcov, llvm, or tarpaulin, and certainly not unit tests.


There is unlikely ever to be a perfect solution, but we can keep getting better by constraining the behavior that breaks things. Throw all of the compile-time, profiling, checked-build, and binary-level tools at projects for defense-in-depth and checklists, no matter the platform or the application. Fuzzing, valgrind, gperftools, dtrace, {[amt],ub}san, etc., and formal methods like seL4 if you can afford the investment. :>


Does valgrind add anything to asan given the same unit tests? I haven't used it in years because it is so slow.


Address sanitizer is not precise and can miss certain classes of bugs that Valgrind would catch.


Yeap. Although Google killed off afl, also exercise code with fuzzing to catch what the unit and integration tests didn't.


> And just because you are using rust is no excuse to skip static analysis like prusti, coverage with gcov, llvm, or tarpaulin, and certainly not unit tests.

It absolutely is, and this kind of absolutism is what holds back the adoption of things like rust.

In most real world software development, the target defect rate is not zero. The value proposition of something like rust for most businesses isn't that it lets you lower your defect rate; it's that it lets you maintain your existing (acceptable) defect rate at a much lower cost, by letting you drop your static analysis and most of your tests (reducing maintenance burdens) and still have a better bottom-line defect rate.


So your assumption is that most to all defects are language related and not programmer error? I will bet the farm your defect rate will remain the same without any actual validation of what people write. Plenty of errors are due to language quirks, but lots of times I see missing statements, fixed values which should be variable, input checking failures due to unknown input, incorrect assumptions, and lots of non-language-related bugs which will not go away.


You're not accounting for the fact that Rust ships with a mandatory static analyzer called rustc. Btw Google recently gave a talk about switching from Java to Kotlin, which significantly reduced bugs, despite less mature tooling. So...


Rust does validate what people write: it won't let you make a dangling pointer, whereas C will happily accept it if you hide it from the warning heuristics.


> So your assumption is that most to all defects are language related

What does that even mean?

> I will bet the farm your defect rate will remain the same without any actual validation of what people write.

In what sense is a typechecker (or the rust borrow checker) not "actual validation", but a static analyser is?

> Plenty of errors due to language quirks but lots of times I see missing statements, fixed values which should be variable, input checking failures due to unknown input, incorrect assumptions and lots of non language gelated bugs

Most of those sound like type errors to me. Errors where the program isn't doing what the programmer thought it should can generally be avoided by using more precise types. (The more insidious case is where the program is doing exactly what the programmer thought it should, but they've misunderstood the specification or not thought through the implications - but no kind of testing can catch that kind of bug).


I'll give you that the static analyzer is much less important with Rust, but that is the easy part anyway. I stand firm that tests are always important, and you don't know how well you tested without coverage. I'm not even sure I'd say you need less coverage with Rust; not all errors are resource-safety related.


Coverage numbers are misleading (to the point that I find they did more harm than good). There are definitely cases where you can't (or aren't clever enough to) express what makes your business logic valid in the type system and need to write a test, but IME they're the exception rather than the rule (both because you generally can encode things in types, and because the majority of code ends up being "plumbing" with no real business logic anyway); I like to follow something like https://spin.atomicobject.com/2014/12/09/typed-language-tdd-... .


I like that article! It doesn't say to do away with coverage though, it simply talks about using good typing to make coverage easier to get with fewer tests. Like you said, sometimes you can't get the type system to enforce correctness, and sometimes you may think you have done so when you haven't, you always need tests to see.


I'm not sure what you mean by "coverage" - the article is advocating deleting (or not writing) tests if you can move the corresponding check into the type system, and the end result is that you will naturally end up with codepaths that aren't covered by tests (because those codepaths essentially "can't go wrong"). That makes the things that I've normally heard called "coverage" pretty useless.


It looks pretty difficult to exploit. Untrusted 4GB input with non-chunked update.


The purpose of running SHA is to determine whether the input can be trusted though. But it is true that relatively few files circulate at or above that size.


And even if they are, you probably aren't going to do a digest on the whole thing all at once.


IDK, I see a lot of "non-production" code that likes to read whole files into an in-memory buffer rather than bothering with streaming. With modern memory sizes it is completely possible that you could read a 4GiB file into memory without having issues.
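The streaming version is barely more code, for what it's worth; a sketch in Go (the helper name is mine), where the hash only ever sees small chunks and memory use stays constant regardless of file size:

    package streamsketch

    import (
        "crypto/sha256"
        "fmt"
        "io"
        "os"
    )

    // hashFile hashes a file of any size in constant memory: io.Copy feeds
    // the hash roughly 32 KiB at a time instead of one giant buffer.
    func hashFile(path string) (string, error) {
        f, err := os.Open(path)
        if err != nil {
            return "", err
        }
        defer f.Close()
        h := sha256.New()
        if _, err := io.Copy(h, f); err != nil {
            return "", err
        }
        return fmt.Sprintf("%x", h.Sum(nil)), nil
    }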


Thinking about it, it's strange that hashlib doesn't have a function that accepts a file object as input, to make the efficient method the default (a pit of success). Naive implementations like hash(f.read()) are all too common to see. Such an API would also allow the entirety of the tight loop to be done outside of slow interpreted code.

Guessing the rationale is that if you read a file, you also want to use the content for something else, not only compute its hash.


4+ GB inputs are often fed to hashing functions. Verifying the integrity of file downloads before running them for example.


But that'll probably run in chunks. Not gonna load 4GB into memory.

Edit: Turns out there was another comment like this, and a response was that the file could be mmap'd.


You're optimistic, which is commendable, but unfortunately a naive implementation would probably just read it all. I probably would, because the happy path doesn't deal with 4 GB inputs.


I guess if your test machine always has a lot of contiguous* free memory or you never test it with a large enough file. Otherwise, it's impossible to ignore. One large download, and it wouldn't just be slow, it'd not work at all.

* Which is usually much harder than just free mem. I recall macOS always being able to give a huge chunk of memory, I guess by rearranging on the fly, but it's very slow if you ask for a lot. And Linux just says no.


It doesn't need to be contiguous on any modern operating system I'm aware of. You just need 4gb of free contiguous address space, which virtually every 64-bit application will have.


Oh, then I ought to try my experiment again sometime. Last time was 8 years ago on Ubuntu Server, and maybe I'm missing a detail, like unknowingly running a 32-bit OS. But yeah the behavior you describe is more what I'd expect.


> And Linux just says no.

Linux might say yes, even if the memory isn't available. See notes of malloc:

"This means that when malloc() returns non-NULL there is no guarantee that the memory really is available." [0]

[0] https://man7.org/linux/man-pages/man3/malloc.3.html


This won't trigger the overflow. It will only be triggered if the file is broken into chunks.


The vulnerability only triggers when you break the file into chunks and make one chunk larger than 4GB.


Could easily happen if you compute the combined hash of a set of files, with one chunk per file (mmap'ed, for example).


Didn't they find a vulnerability like this in the official MD5 implementation when they tried to port it to SPARK/Ada and the proofs didn't work? I wasn't able to just find it with web search, but here's a release about something similar happening with another one of the SHA3 candidates (Skein), before Keccak was chosen:

https://www.adacore.com/press/spark-skein

See also: https://www.adacore.com/papers/sparkskein


LibreSSL-based implementations seem unaffected. I can calculate that hash using hashlib without a segfault.


Does this impact most distros? I'd imagine Python and PHP's native crypto bindings would be replaced with OpenSSL which should have assembly variants of SHA-3.


I use pyenv and whatever it installed for 3.10.4 months ago seems to be fine with the example code snippet run in the console. It took a few seconds, but didn't crash.


Oh shit. I better check my native Ruby extension that I believe uses the reference C code.


Phew. It works.

     gem specific_install https://github.com/steakknife/digest-sha3-ruby 
    git version 2.31.1
    http installing from https://github.com/steakknife/digest-sha3-ruby
    Cloning into '/var/folders/x6/87j2gpl54x79m0nvns2lsrpc0000gn/T/d20221020-79527-r1vite'...
    remote: Enumerating objects: 191, done.
    remote: Counting objects: 100% (3/3), done.
    remote: Compressing objects: 100% (3/3), done.
    remote: Total 191 (delta 0), reused 0 (delta 0), pack-reused 188
    Receiving objects: 100% (191/191), 6.10 MiB | 11.79 MiB/s, done.
    Resolving deltas: 100% (65/65), done.
      Successfully built RubyGem
      Name: digest-sha3
      Version: 2.0.0
      File: digest-sha3-2.0.0.gem
    Building native extensions. This could take a while...
    Successfully installed
     pry -rdigest/sha3
    [1] pry(main)> h = Digest::SHA3.new(224)
    => #<Digest::SHA3: 6b4e03423667dbb73b6e15454f0eb1abd4597f9a1b078e3f5b5a6bc7>
    [2] pry(main)> h.update("\x00")
    => #<Digest::SHA3: bdd5167212d2dc69665f5a8875ab87f23d5ce7849132f56371a19096>
    [3] pry(main)> h.update("\x00"*4294967295)
    => #<Digest::SHA3: c5bcc3bc73b5ef45e91d2d7c70b64f196fac08eee4e4acf6e6571ebe>
    [4] pry(main)> h.hexdigest
    => "c5bcc3bc73b5ef45e91d2d7c70b64f196fac08eee4e4acf6e6571ebe"
    [5] pry(main)>


SHA-3 in Ruby:

    $ gem install sha3
    $ irb
      > require 'sha3'
      > s = SHA3::Digest::SHA224.new
      > s.update("\x00")
      > s.update("\x00" * 4294967295)

    [ Segmentation fault... ]
Tested with Ruby 3.1.2

Gem's code (including C native extension): https://github.com/johanns/sha3


Just over half a million total downloads. So some use but not particularly popular.


Released a fix earlier today (Oct 23rd) (v1.0.5)


Can someone ELI5 the severity of this over the whole internet? What breaks/what not


The full advisory is linked at https://github.com/XKCP/XKCP/security/advisories/GHSA-6w4m-2.... In order for an application to be vulnerable, it must:

1. Break the file into multiple chunks and pass them to SHA-3 individually.

2. Make one of those chunks larger than 4 GB in size. (This requires using 4 GB of memory.)

This is kind of an unlikely thing for an application to do. If you're breaking the file into chunks, usually you'd use smaller chunks. (And if your server is limited to less than 4 GB of RAM, you might not be able to allocate enough memory to trigger the vulnerability in any case.) But it could be exploitable in some really weird or poorly written application somewhere.


Nitpick: it requires using 4 GB of address space, not necessarily physical memory - the app might be eg memory mapping the file.

(mmap() can be reasonably used for streaming usecases like this provided you use madvise() hints on whether you want the data kept resident after use, probably non unix platforms have similar apis)
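A hedged sketch of that pattern in Go on Linux, using golang.org/x/sys/unix (names and error handling simplified): the whole file becomes one slice, so the hash gets a single multi-gigabyte Write without the process ever holding that much RAM.

    package mmapsketch

    import (
        "crypto/sha256"
        "os"

        "golang.org/x/sys/unix"
    )

    // hashMapped memory-maps a file and hashes it in one call; the kernel
    // pages data in and out as the hash walks the mapping.
    func hashMapped(path string) ([32]byte, error) {
        f, err := os.Open(path)
        if err != nil {
            return [32]byte{}, err
        }
        defer f.Close()
        fi, err := f.Stat()
        if err != nil {
            return [32]byte{}, err
        }
        data, err := unix.Mmap(int(f.Fd()), 0, int(fi.Size()), unix.PROT_READ, unix.MAP_PRIVATE)
        if err != nil {
            return [32]byte{}, err
        }
        defer unix.Munmap(data)
        _ = unix.Madvise(data, unix.MADV_SEQUENTIAL) // hint: one front-to-back pass
        return sha256.Sum256(data), nil
    }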


Right, mmap() lets you get a big buffer without actually using that much physical memory. Good catch!

I still think it's unlikely for an app to be vulnerable to this but having less than 4 GB of RAM isn't an absolute defense. Having a 32-bit address space probably is, though, if you're running on old or embedded hardware.


SHA3 is used relatively infrequently compared to SHA-1/2 and MD5. SHA3 also has a few variants, and I don't know for sure if this vulnerability is present in all variants.

The vulnerability is only present in the reference implementation. So it's unlikely that other implementations (such as Rust or Go) are vulnerable.

I tested with the latest release and master branches of PHP, both of which segfaulted for the code samples mentioned in the article. The article says Python is also vulnerable. This is because both languages apparently use the reference implementation.

Ethereum uses SHA3 quite extensively, but I doubt it's vulnerable post migration from PoW, let alone it's unlikely that they use the vulnerable implementation/variant.


Most people limit the amount of input that gets passed to whatever is doing the hashing (SHA-3 in this case) to some amount of bytes that are way less than this issue requires.

Most web servers default to a 50MB limit or something like that, I think; not many things accept by default the size required for this exploit. Most cases would involve services that accept large untrusted files from users.


If you don’t use that library then nothing. If you do and you hash my crafted binary I probably could crash it.


FYI, in the healthcare space, to meet ONC Cures Update regulations for bulk patient export, it's a requirement to use SHA-384 (RS384 or ES384). I would think most implementations will be safe due to the 4GB payload criterion, but if anyone is doing anything funky they'll need to look at this closely.


SHA-384 is a hash function from the SHA-2 family, and it has no relation to SHA-3.


It may be a dumb question, but is there any realistic use case for using this vulnerability to reveal SHA-3 hashed secrets? Or is it just that an attacker can crash systems with suitable input?


From the article:

I’ve shown how this vulnerability in XKCP can be used to violate the cryptographic properties of the hash function to create preimages, second preimages, and collisions. Moreover, I’ve also shown how a specially constructed file can result in arbitrary code execution, and the vulnerability can also impact signature verification algorithms such as Ed448 that require the use of SHA-3. The details of these attacks will be made public at a later date.


What does generating preimages mean here? The preimage is the set of all messages that hash to some set of hashes. Is the author saying that the hash function is reversible in reasonable time?


It's a buffer overflow. They can get a vulnerable implementation to execute arbitrary code, so they can get it to return any value they want. The actual SHA-3 algorithm wouldn't return the same value as the vulnerable implementation, of course.


I didn't see anybody mention arbitrary code execution


It's in the post:

    a specially constructed file can result in arbitrary code execution



Does Tor use SHA-3?


“I’ve also shown how a specially constructed file can result in arbitrary code execution, ”

Ouch, that's not looking good for a reference implementation of a piece of software deployed on billions and billions of machines, written by experts and reviewed (and modified) by even bigger egg heads.


Just another nail in the long overdue C/C++ coffin. If not even these experts can get it right, under the best development conditions, public review and highest stakes, on a relatively small component with the most strict specifications, then no one can.


What's to replace C, Rust?


> Just another nail in the long overdue C/C++ coffin

C, f**ing C. For the sake of god. Not C++.

Stop putting both in the same basket; it just shows you do not know what you are talking about.

https://github.com/XKCP/XKCP/commit/fdc6fef075f4e81d6b1bc383...

Something like that in modern C++ would have been done using std::span<>, and that prevents this kind of out-of-bounds access and the party time that comes with it.


WG21 (the C++ Standards Committee) feels that since idiomatic C++ was unsafe (e.g. array operations don't have bounds checks) it is consistent for new C++ features to also be unsafe by default, even if the whole point of those features in other languages is to provide an idiomatic safe way to do things.

So for example in Rust you have a native slice type reflecting a dynamically sized view into a contiguous sequence, and of course it's safe -- five[20] will either refuse to compile if the compiler can see this slice is too short, or it will panic at runtime. In C++ as others have mentioned actually std::span just isn't safe when used ergonomically. five[20] in C++ is Undefined Behaviour. std::optional has a safe interface for a Maybe / Option type, but is presented with a more ergonomic unsafe behaviour and that's what people use.

This is not an old-fashioned thing the C++ Committee grew out of, std::expected is in C++ 23, that's a Result type and it likewise offers an ergonomic unsafe API. The safe APIs were seen as a nice-to-have which could miss the train because who needs safety anyway? Just don't make any mistakes.


Tragically, what I would call idiomatic C++ with compiler-provided frameworks pre-ISO C++98 made use of bounds checking in their collection classes, and then the C++ Committee went completely the other way.

Naturally we now have ways to enable them on all major compilers, but many still don't.

This is one point I kind of agree with you.


The crucial trick in Rust is not that the index operations are bounds checked, that's easy, as you observed plenty of C++ toolchains can do that.

The crucial trick is providing the unchecked operations (with improved performance) as unergonomic alternatives. *(five.get_unchecked_mut(20)) = k; // looks horrible. Nobody wants to write that, so when they don't need it they won't write it. The fact calling get_unchecked_mut requires unsafe is part of how Rust could achieve its goals, but the choice to not make this ergonomic is why it actually delivers in practice.

What CppFront wants here, and P2687 proposes, and lots of other C++ work has suggested, is roughly:

  [[suppress(bounds_check)]] { five[20] = k; }
Thus imitating what they think Rust does here, rather than what it actually did, and in the process completely missing the point.


Given that in some domains there is no way around C or C++, unless one wants to be part of building the ecosystem, I had big hopes for the clang and VC++ static analysis work for those kinds of scenarios.

I can't speak for clang, but as far as VC++ is concerned, it is mostly useless still, unless one wants to annotate everything with those kinds of annotations + SAL, and even then it is only halfway there.

Which is not really inspiring.


Mmmhm. Go look in the docs for std::span and see how long it takes to find "The behavior is undefined if idx is out of range" or similar. For example: https://en.cppreference.com/w/cpp/container/span/operator_at


> Mmmhm. Go look in the docs for std::span and see how long it takes to find "The behaviour is undefined if idx is out of range" or similar.

It is regrettable and being fixed, but:

- You generally have your own implementation with more advanced bounds checks for safety-critical code. We have our own where I work right now. Google has its own in Abseil.

- span allows the use of range-for loops, which combined with subspan() makes it possible to avoid pointer arithmetic altogether.


When you are already in a deep hole, it's a good idea to stop digging.

How much time did it take you to understand all this? What chance does a beginner have to understand these subtle differences and learn the particular way your codebase uses std::span and not footgun themselves? How much productivity is lost to the sheer terror of such footguns - not actual bugs, but time and effort lost that did not move the product forward?

And most importantly, what is the point of accepting that huge cost instead of writing in a language where unsafe data access is impossible?


> And most importantly, what is the point of accepting that huge cost instead of writing in a language where unsafe data access is impossible?

Because the world is a place where billions of lines of code have been written. And these will never be rewritten, but they can be modernized.

That's the reality, and that's why your web browser and most of your OS are still written in unsafe languages right now. And will still be for the next 15 years at least.


Surely "nails in the coffin of C/C++" does not imply an all out effort to rewrite all this existing and working software from scratch.

The beast will die in the same sense Fortran or Cobol are dead: an outdated language that has no noticeable strengths in the modern world, used by a bunch of (difficult to hire) old timers to fix legacy systems.


> Surely "nails in the coffin of C/C++" does not imply an all out effort to rewrite all this existing and working software from scratch.

Or you use the evolutionary approach: remove the unsafe part from C/C++ bit by bit. And do code migration (Carbon style) when you can do it. This is what some C and C++ committee members try to do and fortunately they are there.

Because that is way more likely to happen in a reasonable time frame (the next 15 years) than rewriting Chrome, Linux, OpenSSL, Office and the other tens of billions of lines of proprietary code we have around in Rust.


> remove the unsafe part from C/C++ bit by bit.

You cannot do that without breaking backward compatibility. If you maintain compatibility, what results is a sprawling meta-language with a large learning burden that has the theoretical ability to emulate a safe language, and which in practice will be used to mix both safe and unsafe idioms depending on the experience and discipline of the programmer, in an unholy mess that no static analyzer can prove as correct and no programmer can say they really fully understand.

Maybe I'm mistaken, I've last programmed in C++ a good number of years ago, but I have not encountered a (C++ programming paradigm, static linter) doublet that can prove data integrity and thread safety to an extent even close to the Rust compiler. And it would be useless for existing codebases anyhow.


As much as I like C++, std::span<> would only have helped if the build was built with bounds checking turned on.

ISO in their wisdom has turned the safety defaults around in regards to classical C++ frameworks.

So secure C++ code either has to make use of Microsoft's gsl::span<>, or again turn on bounds checking in release builds; you cannot trust everyone to call .at().


C++ offers safer constructs but they have their own sharp edges and you have to actually use them to get their benefits. In security critical software these can still be enough to cause significant problems.


Code like that in C++ would probably be over-optimized and would skip boundary checks, exposing it to the same problem.


At least it's not SHA-2, which is probably more widely used. Or am I mixing them up?


You are correct, SHA-2 is much more prevalent.


Interesting they both say "Official Sha3" and "by its designers", which as I remember it isn't that accurate. Keccak was chosen and then NIST added what is affectionately known as the 'mystery padding' before certification. What we know as official is not the way the designers submitted the proposal.

This isn't an attempt at a scary accusation, but as a pedant, this got me.

For those wondering, here is an explanation by a commenter:

    The padding change is the only difference, this allows future tree hashing modes as well as the current SHAKE outputs to generate different digests given the same security parameters and message inputs. Up to 4 additional bits are added, which keeps the full padding inside a byte boundary, making implementations with octet only input able to switch to SHA-3 from Keccak with change to only a single line of code.
https://crypto.stackexchange.com/questions/10645/are-nists-c...

https://cdt.org/insights/what-the-heck-is-going-on-with-nist...

Keccak team's response: https://keccak.team/2013/yes_this_is_keccak.html
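For byte-oriented implementations, the whole difference comes down to the domain-separation suffix that gets merged into the first padding byte; roughly (values as used by implementations such as Go's x/crypto/sha3 and XKCP):

    package padbytes

    // Suffix bits appended before the pad10*1 padding, expressed as the
    // first padding byte a byte-oriented implementation XORs in:
    const (
        domKeccak = 0x01 // original Keccak submission: pad10*1 only
        domSHA3   = 0x06 // FIPS 202 SHA-3: message || 01, then pad10*1
        domSHAKE  = 0x1F // FIPS 202 SHAKE XOFs: message || 1111, then pad10*1
    )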


Who actually wrote the faulty code? I had the impression that the padding change was written by the keccak team at the request of NIST.


The Keccak team wrote XKCP, for whatever that's worth.


Keccak for xkcp and NIST for the padding, as far as I can remember. As I recall the padding was added first, and the explanation came later. There was great distrust at first during this time as a result, hence the name 'mystery padding' we used.


I'm not sure what your point is; the XKCP is the official SHA-3 implementation.


> I'm not sure what's your point

You're just a baby. There's time.


[flagged]


You're responding to a post that links to the Keccak designers saying that the padding change isn't nefarious. You have to be able to do better than "NIST bad" in comments on threads like these.


From Daniel J. Bernstein:

2022.08.05: NSA, NIST, and post-quantum cryptography: Announcing my second lawsuit against the U.S. government. : https://blog.cr.yp.to/20220805-nsa.html

https://twitter.com/hashbreaker/status/1555625577989541888


... is a FOIA lawsuit about the recent PQ contest they refereed that has literally nothing to do with SHA3.


Appeal to Authority is not a very good argument either.


It is in this case. Absent any argument other than "I don't trust NIST", the designers of Keccak can refute your comment by simply saying "no, this is fine", and that's what they did.


That's a bit of a stretch. I think the person who initially designed something signing off on a modification of that something is pretty compelling.


no bounty and still politely reports it. Good guys need more praise.


it's a public standard, who would pay such a bounty?


A couple companies sponsor “the internet bug bounty”: https://www.hackerone.com/internet-bug-bounty

IMHO, sha3 should be part of that bounty.


I'm sorry, we lost millions of dollars due to an exploit.

Don't you have bug bounties to find and fix these things?

But this was in a public standard that we were using, we don't cover those.

Do I look like I care?


Are you telling me this? Or are you going to pay a bounty?


Someone who values security with real money


Bro what? This is a big deal right?

Like NSO Group hacking everyone for the next 5 years deal?


> The vulnerable code was released in January 2011, so it took well over a decade for this vulnerability to be found.

It would have already been a big deal for the past five years, if anything.

Furthermore you have to induce a buffer overflow first. I'm no security researcher, but to my mind that means you're either (a) trusting idiots to write sensitive code without an IT Sec team reviewing it, or (b) you've got a malicious actor on the inside, who can slip it past review. I'm sure there are plenty of other scenarios, but I'm saying it doesn't feel likely by gut.


Absolutely not. Buffer overruns aren’t something that only idiots introduce.


So if we ported this library to rust sha3 seems peachy? No doubt an interesting port but it now seems inevitable.


There are various crates with pure Rust implementations of SHA3 or of the Keccak functions, I would expect that they just don't have the bug, although it's possible they instead panic under the circumstances invoked in this exploit.

(safe) Rust writes actual bounds checks, so if you accidentally overflow a buffer in some case you never tested that compiles, it would just panic if the case you got wrong occurs in real life.

A more specialised language like WUFFS doesn't write bounds checks, it just constrains all the buffer access variables, so when you write code that could overflow that doesn't compile, preventing this problem.

There is a price for this, you can have code which you know, intellectually, never hits the case where say k = 5, but maybe the proof would be sixty pages of difficult maths, WUFFS just won't compile that code until you add handling for k = 5, too bad, safe Rust insists on behaving as though k might be 5 (e.g. inserting runtime checks), unsafe Rust would allow you to YOLO, with a lot of hoop jumping, C++ doesn't care. Of course if your sixty page proof is wrong then these outcomes feel very different...


It seems that some memory safety bug in C/C++ makes HN frontpage every other day.


Rust code already exclusively uses rust ports of the library. There are a couple competing ones [1][2], but I couldn't even find rust bindings for the official C library. The ecosystem has a bit of a fetish for pure rust dependencies where possible, and this vulnerability seems to vindicate that stance.

I don't think any of those are published as libraries usable by other languages, but doing that would be less than a hundred lines for defining a C interface.

1: https://lib.rs/crates/sha3

2: https://lib.rs/crates/tiny-keccak


I maintain some Rust bindings to the official implementation here: https://crates.io/crates/kangarootwelve_xkcp. I've published an update with today's patch (https://github.com/XKCP/K12/commit/27a84ecb811200990add07a04...). That said, I don't know whether those overflows were exploitable with K12.


Not the same, but there are multiple implementations. Here's the top one with 13 M downloads. https://crates.io/crates/sha3

If someone used ffi, linked to it, or attempted to reproduce the behavior verbatim in unsafe code, obviously it would have the same problems as the native code.


what do you mean? the sha3 crate is 8 years old



