That's actually just a bug: the x86 asm constraints were needlessly forcing the ...

winternewt · 2024-05-05T12:25:12 1714911912

Yes, I also noticed the bug in the constraints and came to the same conclusion: changing them yields the same code for both implementations. But I figured the fact that a function consisting of a single asm instruction has a bug kind of supports my point. :)

As for the codegen on ARM, I don't think your asm implementation is correct there either. As far as I can tell, the UMULL instruction only looks at the least significant 32 bits of the source registers, regardless of whether you're on a 64-bit CPU: https://developer.arm.com/documentation/ddi0602/2024-03/Base...
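To make the difference concrete, here is a rough sketch in plain C rather than asm. The names `umull_model` and `mul_64x64_128` are made up for illustration and are not from the code under discussion, and `unsigned __int128` is a GCC/Clang extension, so this assumes one of those compilers on a 64-bit target. The first function models what I understand UMULL to compute; the second is the full 64x64 -> 128-bit product that I assume the asm was meant to produce.

```c
#include <stdint.h>

/* Models UMULL as described above: only the low 32 bits of each
 * source operand take part, and the result is a 64-bit product.
 * (Hypothetical helper name, for illustration only.) */
static uint64_t umull_model(uint64_t a, uint64_t b)
{
    return (uint64_t)(uint32_t)a * (uint64_t)(uint32_t)b;
}

/* Full 64x64 -> 128-bit unsigned multiply, returned as two halves.
 * Relies on unsigned __int128, a GCC/Clang extension. */
static void mul_64x64_128(uint64_t a, uint64_t b,
                          uint64_t *hi, uint64_t *lo)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    *lo = (uint64_t)p;          /* low 64 bits of the product  */
    *hi = (uint64_t)(p >> 64);  /* high 64 bits of the product */
}
```

With an input like 0x100000002, the model only sees the low word (2), so any operand that doesn't fit in 32 bits gives a different answer from the full multiply. On AArch64 I'd expect the compiler to build the wide version from a MUL/UMULH pair, but check the generated code rather than taking my word for it.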

jcalvinowens · 2024-05-05T16:49:30 1714927770

Very much disagree with your point about constraints. A bug existing in 10+ year old code that has never really been tested and has been run by four people doesn't support any point. In real life one actually checks these things; it's not that complex :)

Case in point: counting the f's in your constants took me longer than finding that constraint bug did. You would argue ULLONG_MAX would fix that, I suppose.

You're right, that's the 32-bit ARM instruction, doh. In my defense, this code was written before 64-bit ARM existed!