Though I don't actually do any professional work with floats beyond whatever distributed SQL correctness issues I get with sum() of doubles in the right/wrong order.
The errors I try to correct are in visual/game code I wrote for fun & push through SSE.
More recently I have started looking at bfloat16, which doesn't seem to have a lot of work done in this area.
Anyway, floating point numerical stability is a different issue: some expressions produce results with less precision than other, algebraically equivalent expressions, even though they should be the same thing if we were dealing with real numbers (or at least rational numbers). The issue here is that FP has limited precision, which makes it lose algebraic properties like associativity; and this limited precision sometimes leads to catastrophic cancellation and other problems that sharply reduce the precision (this doesn't happen if you use something with arbitrary precision, like rational numbers).
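To make that concrete, here's a minimal C++ sketch (my own example, not from anyone in this thread) showing both effects with doubles: the grouping of a sum changes the result, and subtracting nearly equal values wipes out most of the significant digits.

```cpp
#include <cstdio>

int main() {
    // Associativity: two algebraically identical groupings give different results,
    // because 1e16 +/- 1 rounds back to 1e16 at double precision.
    double a = 1e16, b = -1e16, c = 1.0;
    std::printf("(a + b) + c = %.17g\n", (a + b) + c);  // prints 1
    std::printf("a + (b + c) = %.17g\n", a + (b + c));  // prints 0

    // Catastrophic cancellation: (1 + eps) - 1 should be eps, but only about
    // one significant digit of the result is correct.
    double eps = 1e-15;
    std::printf("(1 + eps) - 1 = %.17g (exact eps = %.17g)\n", (1.0 + eps) - 1.0, eps);
    return 0;
}
```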
Floating point determinism, on the other hand, is about always getting the same result when running the same code (even if you happen to run it on a different machine; that's sometimes challenging because some architectures may not support features like subnormal numbers). It's okay if different code gives different results as far as determinism is concerned.
... what floating point determinism?
Sorry, couldn't help myself. Even in the cases where it's deterministic, the implementation differences across hardware AND software make such a prospect downright hilarious to me.
The determinism that ensures a game replay can use only initial conditions to recreate a whole sequence. There is tons of code relying on fp determinism.
This works when running the game on the same machine and same software. Once you use a different CPU or different compiler, floating point determinism (and such replay) can easily fail.
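To make the replay idea concrete, here's a hypothetical sketch (the World/Input names are mine, not from any engine): the seed/initial conditions plus the recorded inputs are enough to reconstruct the whole run, but only if every floating point operation in step() behaves identically across the machines involved, which is exactly the fragile part.

```cpp
#include <cstdint>
#include <vector>

struct Input { std::int8_t move = 0; bool jump = false; };  // recorded per frame

struct World {
    double x = 0.0, vx = 0.0;
    void step(const Input& in) {             // fixed timestep: same FP ops every run
        vx += in.move * 0.5;
        if (in.jump) vx *= 0.9;
        x += vx * (1.0 / 60.0);
    }
};

// Recreate the whole sequence from initial conditions plus recorded inputs.
World replay(const std::vector<Input>& recorded) {
    World w;
    for (const Input& in : recorded) w.step(in);
    return w;
}

int main() {
    std::vector<Input> recorded = {{1, false}, {1, true}, {0, false}};
    World end = replay(recorded);             // bit-identical only if FP is deterministic
    (void)end;
    return 0;
}
```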
FP hardware is in this sorry state where different chips have varying capabilities, but we need to dispel the notion that floating point can't be made deterministic across architectures. If it runs on a computer, of course it can be made deterministic. Right now it requires some care, but I hope in the future it will be easier.
You can absolutely have determinism across different architectures, if you stick with architectures that support IEEE 754-2008. Also, if a compiler breaks determinism, that is a serious bug (as long as you don't enable -ffast-math, of course - otherwise you're asking for breakage).
Unfortunately this means no SIMD, and you need to bring your own transcendental functions (sin, cos, etc.) - see the sketch below. You also need to be wary of other forms of nondeterminism, like threads.
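As a hedged illustration of the "bring your own transcendentals" point (this is just a sketch, not a production-grade sin): if you evaluate a fixed polynomial with plain IEEE 754 operations, every conforming platform runs the exact same sequence of additions and multiplications, which is not something you can count on from each platform's libm. You also have to keep the compiler from fusing the operations into FMAs (e.g. -ffp-contract=off on GCC/Clang), or the bits can still differ.

```cpp
#include <cstdio>

// Truncated Taylor series for sin(x), adequate only for small |x|; a real
// replacement needs careful, deterministic argument range reduction too.
double my_sin(double x) {
    const double x2 = x * x;
    return x * (1.0 + x2 * (-1.0 / 6.0 + x2 * (1.0 / 120.0 + x2 * (-1.0 / 5040.0))));
}

int main() {
    // Same bit pattern on every IEEE 754 platform, as long as contraction is off.
    std::printf("my_sin(0.5) = %.17g\n", my_sin(0.5));
    return 0;
}
```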
There's a physics engine meant for games and robotics called Rapier [0] that has a feature called enhanced-determinism which enables cross-platform determinism (by disabling threads and SIMD). If you spot any form of nondeterminism, it's absolutely a bug (either in the compiler or in the library itself), just like it would be a bug if it used only integers (which, by the way, is how nphysics, the predecessor of Rapier, implemented determinism: it used fixed-point math with integers [1]).
No, it works within the same architecture if care is taken. So playing back a game replay on any x64 machine is OK. When recompiling, care must be taken of course, but a compiler doesn't reorder instructions randomly to produce different results.
Care must also be taken not to risk using the x87 80-bit registers differently, but these days this is much easier - just don’t use x87 at all and use SSE instead.
I write software that does finite element analysis and similar things and results are reproducible across machines and across versions of the software.
SSE is a special case that makes it easier to be deterministic, but you still need to be careful, since the reciprocal instructions can be implemented differently and could produce different bit patterns for their results.
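A concrete example of that, assuming x86 with SSE (intrinsic names are the standard ones from <xmmintrin.h>): _mm_rcp_ps is only specified as an approximation with a maximum relative error, so its exact bits are allowed to differ between CPUs, whereas _mm_div_ps is correctly rounded per IEEE 754 and gives the same bits everywhere.

```cpp
#include <cstdio>
#include <xmmintrin.h>

int main() {
    __m128 x = _mm_set1_ps(3.0f);

    float approx[4], exact[4];
    _mm_storeu_ps(approx, _mm_rcp_ps(x));                     // ~12-bit approximation, CPU-dependent
    _mm_storeu_ps(exact, _mm_div_ps(_mm_set1_ps(1.0f), x));   // correctly rounded, reproducible

    std::printf("rcpps: %.9g   divps: %.9g\n", approx[0], exact[0]);
    return 0;
}
```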
My statement was about non-SSE scalar floating point operations. Using different optimizations ('Debug'/-O0 versus 'Release'/-O3 mode) will possibly produce different results, unless you are very careful. Using a different compiler (gcc versus clang versus visual studio) will likely produce different results.
If you want results to be reproducible across different CPU architectures (x86, arm64 etc) one option is using fixed-point arithmetic.
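A minimal fixed-point sketch of that option (Q16.16 here; the format and the helper names are arbitrary choices of mine): everything below is plain 32/64-bit integer arithmetic, so the results are bit-identical on any architecture.

```cpp
#include <cstdint>
#include <cstdio>

using fix = std::int32_t;                  // Q16.16: 16 integer bits, 16 fractional bits

constexpr fix from_double(double v) { return static_cast<fix>(v * 65536.0); }
constexpr double to_double(fix v)   { return v / 65536.0; }

constexpr fix fmul(fix a, fix b) {         // widen to 64 bits, then shift back down
    return static_cast<fix>((static_cast<std::int64_t>(a) * b) >> 16);
}

int main() {
    fix a = from_double(1.5), b = from_double(2.25);
    std::printf("1.5 * 2.25 = %g\n", to_double(fmul(a, b)));  // 3.375 on every platform
    return 0;
}
```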
I meant SSE registers but (typically) scalar instructions. That is, a 64 bit op is always executed with 64-bit arguments and no risk of extended precision sneaking in. The .NET Jit has that guarantee on x64.
> Using different optimizations ('Debug'/-O0 versus 'Release'/-O3 mode) will possibly produce different results, unless you are very careful.
Yes. But I think .NET is much more predictable in that case, as I don't observe differences from optimization either. Having a spec, a decent memory model and no undefined behavior makes the compiler worse at optimizing things but better at consistency, I guess. If the language spec strictly defines what math operators do and prevents reordering and similar, then there isn't much outside transcendental functions that can go wrong. On 32-bit .NET this did go wrong, because whether something was in an 80-bit x87 register or spilled to a 64-bit memory value seemed to depend on the moon phase. Those were bad times.
> If you want results to be reproducible across different CPU architectures (x86, arm64 etc) one option is using fixed-point arithmetic.
Luckily I never had that need. .NET (C#), x86-64, Windows. In that target, things look extremely stable now.
Step outside into 32-bit, non-Windows, C/C++, or even non-x86, and all bets are off, obviously.
No, it does not. There are plenty of subtle implementation differences between x86 CPUs of the same manufacturer - usually across different generations - and it's enough to mess up meteorology.
Can you give any examples of where two different CPUs have different or non IEEE compliant behavior (disregarding things like actual bugs or atypical settings)?
Let’s say for regular x64 code and SSE FP.
I’m surprised you say it’s common as I haven’t yet bumped into the problem and I have a large suite of complex FP tests (engineering calculations) that have reproduced to the last digit over 15 years and dozens of different test machines of all flavors.
Results calculated in 2004 on P4s still work out to the same values after millions of calculations on every CPU since then and dozens of versions of the software.
Which instructions are you using? I'd trust basic arithmetic, but I wouldn't trust functions like sine (AFAIK the standard fully specifies the former, but not the latter).
There’s lots of trig. But as far as I’m aware those have also been identical in SSE.
Edit: SSE doesn't actually do transcendentals at all, so it's possible I'm saved by stable, software-defined transcendentals in .NET (effectively, then, the Windows x64 implementations).
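One easy way to check that across machines (just a sketch; the large argument is only there to stress range reduction) is to print the raw bit pattern of the result, since two libm implementations can agree to 15 printed digits and still differ by an ulp.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    double r = std::sin(1e6);                 // large argument stresses range reduction
    std::uint64_t bits;
    std::memcpy(&bits, &r, sizeof bits);      // exact bit pattern of the double
    std::printf("sin(1e6) = %.17g  bits = %016llx\n",
                r, static_cast<unsigned long long>(bits));
    return 0;
}
```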