Huh! I was expecting adding u128 integers to be slower because of the cast, but it looks like LLVM is (correctly) realising that the upcast + downcast has no effect and replacing it with a single u64 add in release mode.
It also will happily vectorize and all the rest:
https://rust.godbolt.org/z/hn888ezj4
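For anyone who wants to poke at this themselves, here's a minimal sketch of the pattern I mean. The function names are made up for illustration and aren't taken from the linked code. Widening to u128, adding, and truncating back to u64 gives the same result as a wrapping u64 add, which is why the compiler is free to collapse one into the other:

```rust
/// Adds two u64 values by widening to u128, adding, and truncating back.
/// The u128 sum cannot overflow, and `as u64` keeps only the low 64 bits,
/// so the result equals a wrapping u64 addition.
pub fn add_via_u128(a: u64, b: u64) -> u64 {
    (a as u128 + b as u128) as u64
}

/// The plain u64 equivalent, for comparing the generated code.
pub fn add_direct(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

/// A loop over the cast-based add, to see whether it vectorizes.
pub fn sum_via_u128(xs: &[u64]) -> u64 {
    xs.iter().fold(0u64, |acc, &x| add_via_u128(acc, x))
}
```

Paste it into Compiler Explorer with `-C opt-level=3` and compare the output for `add_via_u128` and `add_direct`. I'd expect them to match on x86-64 based on the above, but I haven't confirmed it for other targets.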
I want to do some additional testing to check whether it also optimizes correctly for wasm and in 32-bit contexts, but generally I'm shocked that it works so well. Thanks!