Hacker News new | past | comments | ask | show | jobs | submit login

This is a bit misleading. Quotient and modulo with constant right-hand side is readily optimised into faster operations, like multiplication and bit shifts. "All" optimising compilers do this and don't emit a true divide instruction in such cases. See for instance: https://godbolt.org/z/vxnTPvY6M

Quotient and modulo with a variable RHS is expensive, but this isn't necessary for the algorithm showed in the article. Although I agree that binary wins over decimal anyway. We are using binary computers after all, for the last 50 years or so.




> Quotient and modulo with a variable RHS is expensive

For dividing ‘big’ bignums by ‘small’ divisors, I would use/tweak libdivide (https://libdivide.com/), and spend the time to optimize the division.

For dividing ‘big’ bignums by ‘big’ divisors, that may not be worth it.

(Determining the correct values of the various ‘big’ and ‘small’ values left as an exercise for the reader. Those will depend on the hardware at hand)


The oldish version of gcc used by highload.fun seems to stop performing this optimization after the 5th or so constant divisor within a small function body. Always good to double check :(


The reason a division by 1000000 can be efficiently turned into a multiplication is because you're targeting a system that has a 32x32->64 multiplication instruction, which has plenty of room to spare. But if you have that, you would be better off using e.g. 2^32 as your modulo.


I agree that 2^32 is better, but the reason is not that division is too expensive because OPs algorithm after standard optimisation does not need any division.

Clearly choosing a base type that is fast on the machine, is key. But I feel even a 64/32 divide with constant operand will be optimised into something reasonably fast on most systems. If you have a counterexample it would be interesting to hear.

A system without fast divide likely doesn't have any floats either. So all arithmetic would be fixed point and 32x32->64 multiplication would be a pretty common operation.


I just meant that the compiler in your example is computing x / 1000000 as follows:

   *q = (x * 1125899907) >> 50;
This works out OK because there's a 32x32->64 instruction, arm64's umull in this case. If it didn't have one, and it had a 32/32->32 division instruction, it might have emitted that instead of hand-rolling the same scheme using 32x32->32 multiplications.

Note that it's technically an arithmetic approximation that just happens to work for all possible values of x, since obviously 1<<50 is only divisible by 1<<(0 to 50).


I see. However, 32x32->64 multiply has been common on 32-bit processors since the mid 80's. It would be interesting to know if somebody is still building new products on (32-bit) processors that lack it.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: