Hacker News new | past | comments | ask | show | jobs | submit login

> there is no modulo operation.. So what could we do to solve the FizzBuzz task?

You don’t need modulo instruction when the divisor is known at compile-time.

Good compilers routinely implement integer division and modulo as a sequence of faster instructions like multiplication and bit shifts. Here’s examples: https://godbolt.org/z/4noTxP

Works for SIMD just fine. Take a look at what clang auto-vectorizer did: https://godbolt.org/z/q4bcfj You see what happened to the code? It turned into a bunch of AVX2 instructions, and the loop is gone.




If anyone is interested in knowing how that trick works: if you take a 32x32-bit multiplication problem "a x b", you have a 64-bit result. If you drop the 32-lowest bits, you're performing the operation: "floor(a x b / 4294967296)").

Turns out that you can represent a great many division problems in terms of a * (b / 4294967296).

For example, a / 2 coincides with a * 2147483648/4294967296. a / 3 coincides with a * 1431655765 / 4294967296.

------

That's the basis of the trick. The devil is in the details however. You'll find that floor(a x b / 4294967296) is not exactly equivalent to integer division: some values are "off" slightly. A few adds, or subtracts can correct for this error.

Some values are "perfect" and don't need correction. The compiler will build off of these perfect values and combine them with adds and bitshifts.

-----

Before compilers were good, embedded engineers were basically expected to know how to do this. Most embedded CPUs had no division operator (including ARMv7 and older, but also 8051 and other venerable microprocessors).

The "imperfect" division was probably most used back then. Compilers however have to implement bit-perfect results (exactly equivalent to a division operator). Embedded engineers can often make the decision that "imperfect division" was good enough.


Good comment, however for a / 2 and other powers of two modern compilers are instead optimizing division into shifts and modulo into bitwise and. Both are very fast instructions, faster than integer multiplication.

Even JIT compilers do these optimizations: https://sharplab.io/#gist:ce7f7677cbb9f43efe74aeeadb75deb1




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: