SSE is a special case that makes it easier to be deterministic, but you still need to be careful: the approximate reciprocal instructions (RCPSS, RSQRTSS and their packed forms) are only specified to about 12 bits of precision, so different CPUs can implement them differently and produce different bit patterns for their results.
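If you want to check for that kind of divergence, compare results by bit pattern rather than with `==`, since `==` hides some differences and invents others. A minimal Python sketch; `bits` and `same_bits` are hypothetical helper names, not from any library:

```python
import struct

def bits(x: float) -> int:
    """Return the raw IEEE 754 binary64 bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def same_bits(a: float, b: float) -> bool:
    """True only if a and b are bit-for-bit identical."""
    return bits(a) == bits(b)

# == and bit identity disagree in the edge cases:
print(0.0 == -0.0, same_bits(0.0, -0.0))   # True False
nan = float("nan")
print(nan == nan, same_bits(nan, nan))     # False True
```

For a determinism test you would log `bits(result)` on each machine and diff the logs.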
My statement was about non-SSE scalar floating point operations. Using different optimizations ('Debug'/-O0 versus 'Release'/-O3 mode) will possibly produce different results, unless you are very careful. Using a different compiler (GCC versus Clang versus Visual Studio) will likely produce different results.
If you want results to be reproducible across different CPU architectures (x86, arm64 etc) one option is using fixed-point arithmetic.
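To illustrate why fixed-point helps: once values are scaled integers, every operation is exact integer arithmetic, so the result is the same on any CPU. A minimal sketch assuming a Q16.16 format (16 fractional bits); the format and function names are illustrative choices, not a standard API:

```python
FRAC_BITS = 16            # assumed Q16.16 format: 16 fractional bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0

def to_fixed(x: float) -> int:
    # Convert once at the boundary; everything after this is integer-only.
    return round(x * ONE)

def to_float(a: int) -> float:
    return a / ONE

def fx_mul(a: int, b: int) -> int:
    # The raw product has 32 fractional bits; shift back down to 16.
    # Python's >> floors, so negative results round toward -infinity, always.
    return (a * b) >> FRAC_BITS

def fx_div(a: int, b: int) -> int:
    # Pre-scale the numerator so the quotient keeps 16 fractional bits.
    return (a << FRAC_BITS) // b

print(to_float(fx_mul(to_fixed(1.5), to_fixed(2.25))))   # 3.375
print(fx_div(to_fixed(1.0), to_fixed(3.0)))              # 21845 (about 0.33333)
```

Python ints never overflow, so a real implementation in C, C++ or C# would use fixed-width integers (e.g. a 64-bit intermediate for the multiply) and would have to decide how to handle overflow explicitly.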
I meant SSE registers but (typically) scalar instructions. That is, a 64-bit op is always executed with 64-bit arguments, with no risk of extended precision sneaking in. The .NET JIT has that guarantee on x64.
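Here is how "extended precision sneaking in" changes a result. Python floats are already 64-bit, so this sketch uses binary32 as the declared type and binary64 as the wider intermediate, standing in for the 64-bit vs 80-bit x87 situation; the `f32` helper is hypothetical:

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

a = 1.0
tiny = 2.0 ** -24   # half an ulp of 1.0 in binary32

# Every operation rounded to the declared 32-bit type (like strict scalar SSE).
strict = f32(f32(a + tiny) + tiny)

# Intermediates kept in wider precision and rounded only once at the end
# (like a value that stays in an x87 register and is never spilled).
wide = f32(a + tiny + tiny)

print(strict, wide)   # 1.0 1.0000001192092896
```

In the strict version each addition is an exact tie that rounds to even (back to 1.0); in the wide version the two small terms accumulate before the single rounding.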
> Using different optimizations ('Debug'/-O0 versus 'Release'/-O3 mode) will possibly produce different results, unless you are very careful.
Yes. But I think .NET is much more predictable in that case; I don’t observe differences from optimization either. Having a spec, a decent memory model and no undefined behavior makes the compiler worse at optimizing things but better at consistency, I guess. If the language spec strictly defines what math operators do and prevents reordering and similar transformations, then there isn’t much outside transcendental functions that can go wrong. On 32-bit .NET this did go wrong, because whether a value stayed in an 80-bit x87 register or was spilled to a 64-bit memory location seemed to depend on the phase of the moon. Those were bad times.
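Why "prevents reordering" matters: IEEE 754 addition is not associative, so an optimizer that regroups a sum can change the bits of the result. A small Python illustration (Python itself evaluates strictly left to right; the second grouping stands in for what a reassociating compiler might emit, e.g. GCC/Clang under `-ffast-math`):

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # the evaluation order the source code specifies
right = a + (b + c)   # a regrouped version an optimizer might produce

print(left, right, left == right)   # 0.6000000000000001 0.6 False
```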
> If you want results to be reproducible across different CPU architectures (x86, arm64 etc) one option is using fixed-point arithmetic.
Luckily I never had that need. .NET (C#), x86-64, Windows. On that target, things look extremely stable now.
Step outside into 32-bit, non-Windows, C/C++ or even non-x86, and all bets are off, obviously.