The only time I was able to beat compiler optimization was in an algebraic task where remainders M mod P were needed; the number M was fairly often between P and 2P, sometimes between 2P and 3P, but rarely higher (due to the intrinsic character of the algorithm used).
Of course, the compiler could not have known that in advance, but I did, so I tried subtracting P from M once or twice and checking whether it was already under P.
That was like 40 per cent faster than a single division of 32-bit numbers. But it was 2005, so YMMV.
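A minimal sketch of that trick (my naming; the original was 32-bit code from 2005). The point is that the common cases are handled with one or two compare-and-subtracts, and the hardware division only runs in the rare case:

```c
#include <stdint.h>

/* Reduce m modulo p, assuming m is usually in [0, 3p).
   Only the rare m >= 3p case pays for a real division. */
uint32_t mod_mostly_small(uint32_t m, uint32_t p)
{
    if (m >= p) m -= p;   /* handles [p, 2p)  */
    if (m >= p) m -= p;   /* handles [2p, 3p) */
    if (m >= p) m %= p;   /* rare fallback: hardware division */
    return m;
}
```

Whether this wins depends entirely on the input distribution, which is exactly the information the compiler didn't have.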
I have been thinking about "Sufficiently Smart Compilers" recently, and it seems like in most cases an SSC would be able to guess the distribution of the numbers being divided.
If your program is doing calculations on data read from a CSV, for example, and you happen to know facts about that data that allow you to write a better algorithm (for that data), then I don't see how a compiler could compete.
That aside, I am reminded of: https://ridiculousfish.com/blog/posts/old-age-and-treachery....
"grep sells out its worst case (lots of partial matches) to make the best case (few partial matches) go faster"
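That tradeoff can be illustrated with a simplified substring search (this is not grep's actual code, just the shape of the idea): jump between occurrences of the needle's first byte with memchr, then verify with memcmp. Few partial matches means long, fast memchr skips; lots of partial matches means constantly verifying and failing.

```c
#include <stddef.h>
#include <string.h>

/* Simplified illustration of the grep tradeoff (not grep's code):
   memchr skips to candidate positions, memcmp verifies them. */
const char *find_sub(const char *hay, size_t n,
                     const char *needle, size_t m)
{
    if (m == 0) return hay;
    if (m > n)  return NULL;
    const char *p = hay, *end = hay + n - m + 1;
    while (p < end && (p = memchr(p, needle[0], end - p)) != NULL) {
        if (memcmp(p, needle, m) == 0)
            return p;            /* full match verified */
        p++;                     /* partial match: retry past it */
    }
    return NULL;
}
```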
My original comment was based on the idea that an SSC would be able to deduce M's distribution from the "intrinsic character of the algorithm used". In that case the optimization would be based on the code, not the data.
The “intrinsic character of the algorithm used”, I think, would mean the programmer has to tell the compiler how to do this faster, which I think the OP did when writing something along the lines of:

    if n < divisor:        n
    elif n < 2 * divisor:  n - divisor
    else:                  n % divisor
I'm still wondering what the "out by 2 case" is for algorithm D...
Most notably, for 'word-size' remainders the compiler most certainly does not evaluate the division and then work out the remainder: on hardware, the quotient and remainder come out of the same instruction in the simple and most common case. It's not far off that for the multi-word-number stuff like GMP uses (with a single-word divisor); even "algorithm D" keeps track of a remainder as it goes (as any long division algorithm must).
The find is interesting, though. It's pleasant to see these kinds of problems continue to see improvements in the available solutions. I was completely expecting yet another "find some inverse" trick.
> most notably for 'word-size' remainders the compiler most certainly does not evaluate the division then work out the remainder
I guess you're referring to this part of the article:
> How do compilers compute the remainder? They first compute the quotient n/d and then they multiply it by the divisor, and subtract all of that from the original value (using the identity n % d = n - (n/d) * d).
but the article is correct: it is talking about constant divisors and the multiplication-by-inverse optimization here. For example, num % 5 on x86-64 with gcc 11.2 at -Ofast:
    movsx rax, edi
    mov edx, edi
    imul rax, rax, 1717986919
    sar edx, 31
    sar rax, 33
    sub eax, edx
    lea edx, [rax+rax*4]
    mov eax, edi
    sub eax, edx
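Reading that assembly back into C (a sketch of what the compiler is doing, not anyone's library code): the magic constant 1717986919 is ceil(2^33 / 5), so multiplying and shifting right by 33 approximates division by 5, and the (n >> 31) term corrects the rounding for negative n. Then the remainder follows from the identity in the article:

```c
#include <stdint.h>

/* What gcc's output above computes: n % 5 with no division.
   1717986919 == ceil(2^33 / 5). */
int32_t mod5(int32_t n)
{
    int32_t q = (int32_t)(((int64_t)n * 1717986919) >> 33) - (n >> 31);
    return n - q * 5;   /* n % d == n - (n/d) * d */
}
```

The 64-bit intermediate never overflows, since |n| * 1717986919 stays well under 2^63.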
Or you can use a JIT compiler ...
I mean, the extreme case is obviously a scripting language, where lowering to multiplication may well be possible.
That said, you are correct that a JIT is vastly more heavyweight than most cases would need - but mostly due to the cost of making an indirect call rather than the cost of the JIT itself.
This is annoyingly non-trivial, and exactly the sort of thing that gets culled for the sake of JIT performance. The problem isn't generating code properly targeted at the hardware, or unrolling away the cycles wasted on late binding.