Signed overflow is undefined behavior; unsigned overflow is defined. This holds in both C and C++.
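A minimal sketch of the difference (overflow_demo is just an illustrative name, not from the thread):

    #include <limits.h>
    #include <stdint.h>

    void overflow_demo(void) {
        uint32_t u = UINT32_MAX;
        u = u + 1;      // defined: unsigned arithmetic wraps modulo 2^32, u is now 0

        int s = INT_MAX;
        s = s + 1;      // undefined behavior: signed overflow, the compiler
                        // may assume this never happens
        (void)u; (void)s;
    }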
Apart from that, I agree with you. It has to do with the fact that OP is using 128-bit variables on a 64-bit architecture.
Come to think of it, it's actually more mesmerizing that x86 is not slowed down by a 128-bit variable. The ARM architecture is behaving as expected; Intel is actually the odd one out.
Someone mentioned cryptography. I can imagine that, because of it, Intel has a few instructions to speed up arithmetic on wider integers, and that is probably the reason for the anomaly, which is really on Intel's side, not ARM's.
I.e. MUL gives 64b * 64b = a 128-bit result spread over two 64-bit registers; by that logic 128b * 128b = 2x64b * 2x64b should produce 4x64b (256 bits), but the generated code discards the upper half and keeps only 2x64b = 128 bits.
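As a rough C sketch of what the compiler has to emit for that (mul128_lo is just my name for it; __uint128_t as in GCC/Clang): split the operands into 64-bit halves and compute only the partial products that can reach the low 128 bits.

    #include <stdint.h>

    typedef unsigned __int128 u128;

    // Low 128 bits of a 128x128-bit product, built from 64-bit multiplies.
    u128 mul128_lo(u128 a, u128 b) {
        uint64_t a_lo = (uint64_t)a, a_hi = (uint64_t)(a >> 64);
        uint64_t b_lo = (uint64_t)b, b_hi = (uint64_t)(b >> 64);

        u128 lo_lo = (u128)a_lo * b_lo;              // one full 64x64 -> 128 MUL
        uint64_t cross = a_lo * b_hi + a_hi * b_lo;  // only the low 64 bits survive the shift
        // a_hi * b_hi is never computed: it would only affect bits 128..255.
        return lo_lo + ((u128)cross << 64);
    }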
What's happening here then? Are these not two 128-bit integers? One's a 64-bit value recast to 128-bit, the other a 64-bit constant promoted to 128-bit. The code would be doing faulty math if it just decided to drop bits. Maybe it's a coincidence that the upper half of the recast value is 0x0 in this case, but the code must work for 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF as well, and probably does too.
tmp = (__uint128_t) wyhash64_x * 0xa3b195354a39b70d;
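On the "probably does too" part, a quick self-contained check (the constant is the one from the quoted line; the 0x8f5c... value is just a made-up stand-in for wyhash64_x): __uint128_t is unsigned, so the multiplication is defined mod 2^128 anyway, and keeping only the low 128 bits gives the same answer whether the upper half of the cast operand is zero or all ones.

    #include <assert.h>
    #include <stdint.h>

    typedef unsigned __int128 u128;

    int main(void) {
        const u128 c = 0xa3b195354a39b70dULL;    // constant from the quoted line
        u128 cases[2];
        cases[0] = (u128)0x8f5c28f5c28f5c28ULL;                       // upper half 0x0
        cases[1] = ((u128)UINT64_MAX << 64) | 0x8f5c28f5c28f5c28ULL;  // upper half all ones

        for (int i = 0; i < 2; i++) {
            u128 a = cases[i];
            // Schoolbook product, low 128 bits only (same split as the sketch above;
            // the upper half of c is zero, so only one cross term exists).
            u128 lo = (u128)(uint64_t)a * (uint64_t)c;
            uint64_t cross = (uint64_t)(a >> 64) * (uint64_t)c;
            u128 truncated = lo + ((u128)cross << 64);
            assert(truncated == a * c);   // identical: both are mod 2^128
        }
        return 0;
    }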