It's a good question. We probably could have designed something based on AES-GCM...

vlovich123 · 2024-05-07T19:03:35

I meant that you could use the AES branch when running on HW-accelerated AES systems and chacha8 otherwise. Given that the security properties of AES are better understood than those of chacha8, any issues with chacha8 would have more limited scope. And since this is a cryptographic RNG, the specific implementation doesn't actually matter. The math variant would probably need to use the chacha8 variant, since it can have reproducibility requirements for a given seed. It's arguable, though, whether that reproducibility needs to hold between totally different machines, since the implementation of math/rand isn't actually defined to have that property, and you're already changing this in 1.22, which indicates it's mutable.
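To make the reproducibility point concrete, here is a minimal sketch using the public math/rand/v2 API from Go 1.22 (rand.NewChaCha8 and rand.New). The helper name firstN and the seed bytes are made up for illustration. It only shows that the same seed yields the same sequence within one program; it says nothing about guarantees across Go versions or machines.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// firstN is a hypothetical helper: it returns the first n values from a
// ChaCha8 generator created from the given 32-byte seed.
func firstN(seed [32]byte, n int) []uint64 {
	r := rand.New(rand.NewChaCha8(seed))
	out := make([]uint64, n)
	for i := range out {
		out[i] = r.Uint64()
	}
	return out
}

func main() {
	seed := [32]byte{1, 2, 3} // arbitrary example seed; remaining bytes are zero

	a := firstN(seed, 4)
	b := firstN(seed, 4)

	fmt.Println(a)
	fmt.Println(b) // same values as the line above, because the seed is the same
}
```

Swapping in an AES-based generator on some machines would break exactly this property for seeded use, which is why the seeded case would likely have to stay on one algorithm.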

I'm kind of surprised that it's slower on AMD Ryzen - it looks like only the Pro series have an actual co-processor. Weird decision on AMD's part to implement AES-NI without HW acceleration on some CPUs instead of just not implementing the AES-NI instruction set. That being said, AES-CBC would be even better for this purpose since the authentication guarantees aren't needed.

On my Intel machine, it's 5.7 GiB/s for AES-GCM. I don't know how you benchmarked the chacha8 version so I can't run the equivalent on my machine.

rsc · 2024-05-07T21:58:54

For benchmarking ChaCha8, I ran:

    go test -bench=Block internal/chacha8rand

For benchmarking AES-GCM, I edited src/crypto/cipher/benchmark_test.go:51 to add 256 to the length list, and then I ran:

    go test -bench=GCM/-128-256 crypto/cipher
    GODEBUG=cpu.aes=off go test -bench=GCM/-128-256 crypto/cipher

You're right that we could use AES where available in the places where reproducibility doesn't matter, although that's a second implementation to debug and maintain. ChaCha8 seems fine.

rsc · 2024-05-07T22:01:16

> I'm kind of surprised that it's slower on AMD Ryzen - it looks like only the Pro series have an actual co-processor. Weird decision on AMD's part to implement AES-NI without HW acceleration on some CPUs instead of just not implementing the AES-NI instruction set.

I meant that AES-GCM is 20% slower than ChaCha8 on that system, not that HW-accelerated AES-GCM is 20% slower than a software implementation. On the contrary, the HW-accelerated AES-GCM is 20X faster than software on that system.