
Re: What is acceptable for -ffast-math? (2001) - willvarfar
https://gcc.gnu.org/ml/gcc/2001-07/msg02150.html
======
ajtulloch
-ffast-math (well, really -funsafe-math-optimizations) is also essential for FP vectorization on ARM CPUs. Compare the generated code of a real-world ARM function with/without -funsafe-math-optimizations - [https://godbolt.org/g/P4C9qm](https://godbolt.org/g/P4C9qm) (after, ~70 instructions) vs [https://godbolt.org/g/blAe2q](https://godbolt.org/g/blAe2q) (before, ~270 instructions). As GCC's documentation states ([https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html](https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html)):
    
    
      If the selected floating-point hardware includes the NEON
      extension (e.g. -mfpu=‘neon’), note that floating-point
      operations are not generated by GCC's auto-vectorization
      pass unless -funsafe-math-optimizations is also specified.
      This is because NEON hardware does not fully implement the 
      IEEE 754 standard for floating-point arithmetic (in 
      particular denormal values are treated as zero), so the use
      of NEON instructions may lead to a loss of precision.

------
ecma
Could use (2001) in the title.

This is a pretty interesting email chain and a fairly good case study for
those of us who aim for zealous adherence to standards and rules vs. providing
a practical and useful output for users. I know I tend to go zealous by
default and often have to remind myself that I'm not just writing software to
satisfy my own academic ego.

~~~
exDM69
Yes, there should be 2001 in the title.

Compilers and CPUs have made huge strides since then. I'm pretty sure that
-ffast-math makes a much bigger difference now.

------
danieljh
Here's an interesting anecdote for -ffast-math: the Kahan summation algorithm
[1] can be used to reduce numerical errors in sums.

Now have a look at the codegen [2] without and with -ffast-math.

With -ffast-math the compiler rewrites the Kahan summation algorithm into a
simple sum, completely changing the code's semantics (since we told the
compiler it could!). Keep this in mind.

[1] [https://en.wikipedia.org/wiki/Kahan_summation_algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm)

[2] [https://godbolt.org/g/uWZgW2](https://godbolt.org/g/uWZgW2)

~~~
lfowles
I've also been bitten pretty hard by not realizing -ffast-math turns on
-ffinite-math-only... which optimizes out NaN and inf checks :\

~~~
gok
Me too. Worse, at least with the most recent clang and GCC I've tested, it
doesn't warn when you do things that -ffinite-math-only makes tautological. So
"if (x != x)" happily compiles to nothing, without warning, when x is a float.
(You do get a warning when using integers.)

~~~
lfowles
Same for actual calls: I suspected x != x could disappear silently, but didn't
think isfinite(x) would be optimized out.

------
pvdebbe
> I used -ffast-math myself, when I worked on the quake3 port to Linux (it's
> been five years, how time flies).

What's going on here? A Quake 3 port to Linux in 1996? Can this sentence be
read otherwise? Did he possibly mean Quake 1?

~~~
coldtea
> _Quake 3 port to linux in 1996? Can this sentence be read otherwise? Did he
> possibly mean Quake 1?_

The mail is from July 2001. Q3 was released in early 1999, that's already 2
years and 3 months.

Now, Q2 was released in 1997. Q3 could have started in development immediately
afterwards, so that pushes it to 4 years.

Not that much of a stretch to call them 'five years', especially casually
speaking. And if you're a little older (over 30) it's even easier to have a
fuzzier feeling for time (for a 20 year old, 5 years is 25% of their life --
for a 30 year old just 16%, and even whole decades start to have less defined
edges, especially combined work/family routine).

Or it could be a typo.

~~~
cma
I'm pretty sure it was a typo. Quake 3 didn't have a software renderer, so I
wouldn't expect a doubling of framerate from that optimization (unless they
were artificially lowering the resolution to make things CPU-bound), but
Quake 2 did, so it could.

~~~
coldtea
Makes sense.

------
Qantourisc
Maybe instead of -ffast-math we need a new type, approx_float, to indicate
that we (the programmers) allow this variable to take some shortcuts during
computation, while we promise not to do things like divide by zero.

Edit: we would also have to agree on a maximum deviation/error for this type.

~~~
lmm
If you can figure out a sensible way to formally specify what you want, that
would be a good idea. But that gets very complex very quickly. E.g. if you
allow the compiler to compute any algebraically equivalent expression in place
of the expression you wrote, it can just add FLOAT_MAX and then subtract
FLOAT_MAX (which mathematically should make no difference, right?), and then
if it knows the input was reasonably small then it can optimize the result to
0.

------
gpderetta
"Big companies that have billion-dollar fabs spend time optimizing their chips
that take several years to design for _games_. Not for IEEE traditional
Fortran-kind math."

Then again, GPUs now strictly follow IEEE, and certainly not for games' sake.
That didn't use to be the case.

In fact they even support denormal numbers at full speed (not even Intel does
that).

~~~
golergka
What GPUs are you talking about? Most mobile chipsets are so bug-ridden I
would really doubt they follow any standard like that.

~~~
gpderetta
NVidia and I think AMD. Mobile is a different story.

~~~
sspiff
How so? Mobile chips are based on the same architecture, aren't they?

They often lag behind or are cut down (fewer pipelines, lower clocks) for
lower power consumption, but they aren't entirely different designs as far as
I can tell.

~~~
camperman
Mobile chips are whatever architecture the manufacturer has implemented
underneath. There's a bunch of them - Mali, Broadcom, IMX et al - and they're
famous for ignoring the industry standards. Just how badly is something you
find out for yourself when trying to write GL applications on embedded
devices.

------
keldaris
In my experience, -ffast-math has become vastly more useful over the last
10-15 years. I suppose this was a somewhat reasonable discussion to have in
2001, when most performance critical code still heavily relied on hand-
optimized assembly, but nowadays -ffast-math is often the only reasonably
convenient way to get compilers to autovectorize code properly, use FMA
instructions, etc. I have production code that literally runs 4x faster purely
by adding -ffast-math due to autovectorization (and subsequent ILP and other
improvements).

------
zitterbewegung
A lot of this echoes Daniel J. Bernstein's email about how CPUs are designed
for video games. [https://moderncrypto.org/mail-
archive/noise/2016/000699.html](https://moderncrypto.org/mail-
archive/noise/2016/000699.html)

------
taneq
The point about memory bandwidth and general algorithmic factors outweighing
raw FPU performance holds true for pretty much all software even today. (And
if you genuinely need raw FPU performance that badly, you're already running
that FPU crunching on a GPU.)

~~~
gpderetta
Nonsense, there is a reason that many floating-point kernels are still written
in hand-optimized assembler.

And not all FPU loads are amenable to GPU computation.

------
kahrkunne
Reading Linus talk about game development had me confused for a bit there. In
my mind Linus is the kernel guy.

------
tomlock
What does IEEE mean in this context?

~~~
jdoliner
IEEE standard floating point semantics:

[https://en.wikipedia.org/wiki/IEEE_floating_point](https://en.wikipedia.org/wiki/IEEE_floating_point)

------
gravypod
Can we compile specific source files with fast math and others without?

~~~
jagger27
Of course.
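
Fast-math is a per-translation-unit flag, so it can be applied file by file. A hypothetical Makefile sketch (file names assumed):

```make
# Compile only the hot numeric kernel with -ffast-math; everything else
# keeps strict IEEE semantics (so e.g. NaN checks in validate.c still work).
CFLAGS = -O2

kernel.o: kernel.c
	$(CC) $(CFLAGS) -ffast-math -c kernel.c -o kernel.o

validate.o: validate.c
	$(CC) $(CFLAGS) -c validate.c -o validate.o

app: kernel.o validate.o
	$(CC) kernel.o validate.o -o app
```

One caveat worth checking for your toolchain: mixing fast-math and strict objects is fine at the object level, but results can still flow across the boundary, so keep any NaN/inf validation in the strictly-compiled files.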

------
frozenport
Okay, who won?

