That's actually just a bug: the x86 asm constraints were needlessly forcing the compiler to use rdx as the input register for the multiply instruction.
But after staring at this for a little while, I realized simply reversing the order of the fields in the dword struct will make the result from the multiply instruction already match the way 128-bit structs are returned in registers under the x86_64 calling convention! This is quite a bit better: https://godbolt.org/z/MbfG63vej
Yes I also noticed the bug in the constraints and came to the same conclusion that changing them yields the same code for both implementations. But I figured that the fact that a function with a single asm instruction has a bug kind of supports my point. :)
As for the codegen on ARM, I don't think your asm implementation is correct there either. As far as I can tell the UMULL instruction only cares about the least significant 32 bits in the source registers, regardless if you're on a 64-bit CPU: https://developer.arm.com/documentation/ddi0602/2024-03/Base...
Very much disagree with your point about constraints. A bug existing in 10+ year old code that has never really been tested and run by four people doesn't support any point. In real life one actually checks these things, it's not that complex :)
Case in point: counting the f's in your constants took me longer than finding that constraint bug did. You would argue ULLONG_MAX would fix that, I suppose.
You're right, that's the 32-bit arm instruction, doh. In my defense, this code was written before 64-bit ARM existed!
With that fixed, it's the same: https://godbolt.org/z/M571P371K
But after staring at this for a little while, I realized simply reversing the order of the fields in the dword struct will make the result from the multiply instruction already match the way 128-bit structs are returned in registers under the x86_64 calling convention! This is quite a bit better: https://godbolt.org/z/MbfG63vej
Also worth noting the asm makes nicer code on arm64: https://godbolt.org/z/7eas6s9vK
https://github.com/jcalvinowens/toy-rsa/commit/59ef9ea905dbd... https://github.com/jcalvinowens/toy-rsa/commit/ceaa8a4dd0834...