It's a good question. We probably could have designed something based on AES-GCM instead, but it would have had more limited impact.

ChaCha8 is still very fast even without direct hardware acceleration. The 32-bit benchmarks at the end of the post are running with no assembly at all and still running within 2X of the 64-bit SSE2-based assembly. AES-GCM with hardware is pretty fast, but AES-GCM without hardware is quite slow.

Just now I tried benchmarking ChaCha8 in 256-byte chunks compared to AES-GCM in 256-byte chunks. With HW acceleration, AES-GCM is maybe 10% faster on my Apple M3 but 20% slower on my AMD Ryzen. Same ballpark as ChaCha8 though.

On the other hand, if I disable AES hardware acceleration, that same benchmark drops by about 20X. So using AES would not have been a good idea for systems without AES hardware.

Overall, not much win to AES in the best case, and quite a loss in the worst case.




I meant that you could use an AES branch on systems with hardware-accelerated AES and ChaCha8 otherwise. Given that the security properties of AES are better understood than those of ChaCha8, any issues with ChaCha8 would then have more limited scope. And since this is a cryptographic RNG, the specific implementation doesn't actually matter. The math variant would probably need to stick with ChaCha8, since it can have reproducibility requirements for a given seed. It's arguable, though, whether that reproducibility needs to hold across totally different machines: math/rand isn't actually defined to have that property, and you're already changing the generator in 1.22, which indicates it's mutable.

I'm kind of surprised that it's slower on AMD Ryzen - it looks like only the Pro series have an actual co-processor. Weird decision on AMD's part to implement AES-NI without HW acceleration on some CPUs instead of just not implementing the AES-NI instruction set. That being said, AES-CBC would be even better for this purpose since the authentication guarantees aren't needed.

On my Intel machine, it's 5.7 GiB/s for AES-GCM. I don't know how you benchmarked the chacha8 version so I can't run the equivalent on my machine.


For benchmarking ChaCha8, I ran:

    go test -bench=Block internal/chacha8rand

For benchmarking AES-GCM, I edited src/crypto/cipher/benchmark_test.go:51 to add 256 to the length list, and then I ran:

    go test -bench=GCM/-128-256 crypto/cipher
    GODEBUG=cpu.aes=off go test -bench=GCM/-128-256 crypto/cipher

You're right that we could use AES where available in the places where reproducibility doesn't matter, although that's a second implementation to debug and maintain. ChaCha8 seems fine.
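
For anyone who wants a rough AES-GCM number without editing the Go tree, something like the following standalone program should give a comparable figure. It is only a sketch, not the benchmark used above: `sealThroughput` is a hypothetical helper, the timing is a plain wall-clock loop rather than the `testing` harness, and the all-zero key and reused nonce are acceptable only because nothing here is real traffic.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
	"time"
)

// sealThroughput seals iters chunks of chunk bytes with AES-128-GCM and
// returns the measured throughput in MiB/s. (Hypothetical helper.)
func sealThroughput(chunk, iters int) (float64, error) {
	key := make([]byte, 16) // all-zero key: timing only, never for real use
	block, err := aes.NewCipher(key)
	if err != nil {
		return 0, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return 0, err
	}

	// The nonce is reused on every call, which is unsafe for real
	// encryption but does not affect the timing.
	nonce := make([]byte, aead.NonceSize())
	src := make([]byte, chunk)
	dst := make([]byte, 0, chunk+aead.Overhead())

	start := time.Now()
	for i := 0; i < iters; i++ {
		dst = aead.Seal(dst[:0], nonce, src, nil)
	}
	elapsed := time.Since(start)

	mib := float64(chunk) * float64(iters) / (1 << 20)
	return mib / elapsed.Seconds(), nil
}

func main() {
	rate, err := sealThroughput(256, 200_000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("AES-128-GCM, 256-byte chunks: %.0f MiB/s\n", rate)
}
```

Running it once normally and once with `GODEBUG=cpu.aes=off` set, as in the commands above, should show the hardware versus software gap, assuming the setting is honored for ordinary binaries on your Go version.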


> I'm kind of surprised that it's slower on AMD Ryzen - it looks like only the Pro series have an actual co-processor. Weird decision on AMD's part to implement AES-NI without HW acceleration on some CPUs instead of just not implementing the AES-NI instruction set.

I meant that AES-GCM is 20% slower than ChaCha8 on that system, not that HW-accelerated AES-GCM is 20% slower than a software implementation. On the contrary, the HW-accelerated AES-GCM is 20X faster than software on that system.



