Hacker News new | past | comments | ask | show | jobs | submit login

> Integer Overflow is undefined behavior in C

Signed overflow is undefined behavior, unsigned overflow is defined in both C/C++.

Apart from that, I agree with you. It has to do with the fact that OP is using 128-bit variables on a 64-bit architecture.

Come to think of it, it's actually more mesmerizing that x86 is not slowed down by a 128-bit variable. The ARM architecture is behaving as is to be expected, Intel is actually the odd one out.

Someone mentioned cryptography, I can imagine that because of it, Intel has a few instructions to optimize integer arithmetic on wider integers, and that is probably the reason of the anomaly, which is actually Intel and not ARM.

As mentioned upthread, the mermerizing instruction in question is "MUL", which debuted in 1978 on the 8086 and, except for register width, behaves identically today.

I'm no expert, but shouldn't x86 then produce two 128-bit register entries if it multiplies two 128-bit integers, so totaling actually four registry entries on a 64-bit architecture? If this were the case, Intel would slow down just as much as ARM on a double-than-archictecture-width-integer multiplication, but it doesn't. That's what I find mesmerizing. I'm guessing that Intel simply discards the earlier double registry logic once it goes beyond architecture width, which would explain the speed up.

I.e. 64b * 64b = 2x64b registry entries, according to MUL should be 128b * 128b = 2x64b * 2x64b = 4x64b, but Intel discards this in favor for 128b * 128b = 2x64b * 2x64b = 2x64b.

x86 can't multiply two 128 bit numbers at a time. But it can multiply two 64 bit numbers without losing the high 64 bits of the 128 bit product, which makes the 128 bit multiplication much faster to implement.

> x86 can't multiply two 128 bit numbers at a time.

What's happening here then? Are these not two 128-bit integers? One's a 64-bit recasted to 128-bit, the other a 128-bit constant. Code would be doing faulty math, if it just decides to drop any bits. Coincidence, maybe, that the upper half of the recasted is in this case 0x0, but the code must work for 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF as well, and probably does too.

  __uint128_t tmp;
  tmp = (__uint128_t) wyhash64_x * 0xa3b195354a39b70d;

32-bit systems do have long long in the standard to do 64-bit arithmetic, yet they have exactly the same issue on ARM.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact