
Improving the fast inverse square root (2010) - cpdt
http://rrrola.wz.cz/inv_sqrt.html
======
skrebbel
Pretty much off topic, but Řrřola, the author of this blog post, also makes
mind-blowing 256-byte demos.

E.g. Puls from 2009:
https://www.pouet.net/prod.php?which=53816
(check the YouTube link if you don't have an MS-DOS machine handy)

I understand little about extreme sizecoding, but I suspect there's a
similarly obsessive, mathy story behind it as in this blog post: reusing the
same bytes as both code and content in a way that actually works and looks
great.

~~~
Waterluvian
I heard some Nintendo games do that with sprites or sounds that can take on
a random-ish look. Very, very cool.

~~~
andybest
Yars' Revenge on the Atari 2600 used the game code as random input to generate
the graphics for the 'safe zone'

------
simonbyrne
It is worth noting that with AVX-512, Intel has introduced a native inverse
sqrt approximation (VRSQRT14).

~~~
mmozeiko
An inverse sqrt approximation has been available since SSE1 via the rsqrtss
and rsqrtps instructions.

~~~
slavik81
Which is nice because SSE1 and SSE2 are mandatory parts of x86_64. If you're
building a 64-bit desktop application, you can use rsqrtss without any checks
or fallbacks.

Unfortunately, it doesn't tend to get used automatically in languages like C.
The result of rsqrtss is slightly different from 1/sqrtf(x) computed as two
separate operations, so the compiler cannot substitute it as an optimization.

If the rules for floating point optimization are loosened by passing
-ffast-math to GCC, the compiler will use it. That being said, -ffast-math is
a shotgun that affects a lot of things. If you need signed zeros, Infs, NaNs
or denormals, that flag may break your program.
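
For anyone who wants the instruction without turning on -ffast-math, here is a
minimal sketch that calls it through the SSE intrinsics in xmmintrin.h. The
wrapper name rsqrt_estimate is made up for illustration, and the non-x86
fallback is my own assumption so the snippet still builds elsewhere.

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  /* SSE1 intrinsics, always available on x86_64 */

/* Hypothetical wrapper: hardware estimate of 1/sqrt(x) via rsqrtss. */
float rsqrt_estimate(float x)
{
    /* Load x into the low lane, run rsqrtss, read the low lane back. */
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}
#else
#include <math.h>

/* Assumed fallback for non-x86 targets: the exact two-operation form. */
float rsqrt_estimate(float x)
{
    return 1.0f / sqrtf(x);
}
#endif
```

Calling rsqrt_estimate(4.0f) should return something close to, but not
necessarily equal to, 0.5f, which is exactly why the compiler won't make the
substitution on its own.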

~~~
edynoid
I find it quite fortunate that they don't use it automatically. Introducing a
1e-3 relative error is quite a deal breaker for some. Not for games, sure, but
for science that is mostly unacceptable.

~~~
meta_AU
From memory, GCC does one Newton-Raphson iteration on the approximate result,
so the error is much lower (closer to 1e-9, from memory again). It doesn't use
the approximation directly in fast-math mode.
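
As a rough illustration of what one such refinement step looks like (a sketch
of the textbook Newton-Raphson update for 1/sqrt(x), not GCC's actual
generated code; the function name rsqrt_refine is hypothetical):

```c
/* One Newton-Raphson step for f(y) = 1/(y*y) - x, whose positive root is
 * y = 1/sqrt(x). Given an estimate y, the update is
 *     y_next = y * (1.5 - 0.5 * x * y * y).
 * If y has relative error e, y_next has relative error of about 1.5 * e * e. */
float rsqrt_refine(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}
```

For example, rsqrt_refine(4.0f, 0.5005f) starts from an estimate that is off
by 1e-3 in relative terms and should land within about 1e-5 of 0.5. In single
precision the result cannot get much better than float rounding allows.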

------
narkee
How is it that inverse seems to be used as "multiplicative inverse" in this
context? It seems like a really ambiguous term, because it could also be
interpreted as either:

inverse of the square root (which is just the squaring operation), or

the inverse of some other binary operator, like addition or anything else...

~~~
StefanKarpinski
I think you’ve hit the nail on the head:

> it could also be interpreted as ... [the] inverse of the square root _(which
> is just the squaring operation)_

Since the other obvious interpretation is not very useful and has a clearer
name—i.e. “the square”—the term “inverse square root” has only one useful
meaning, which is therefore how it’s interpreted. (I don’t follow the second
option about binary operators.) Mathematical terminology and notation in
general are full of ambiguities which are resolved by extensive contextual
knowledge. As noted by a sibling comment, calling it the reciprocal square
root would be clearer.
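
To spell out the two readings side by side (notation mine, taking
f(x) = sqrt(x) on non-negative x):

```latex
f(x) = \sqrt{x}, \qquad
f^{-1}(x) = x^{2} \quad (x \ge 0), \qquad
\frac{1}{f(x)} = \frac{1}{\sqrt{x}} = x^{-1/2}
```

The first is the functional inverse, the second is the reciprocal that the
article computes.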

------
whyever
It seems like this does not work for denormal floats.
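
A small sketch to reproduce this. It assumes the classic constant 0x5f3759df
with one Newton-Raphson step rather than the tuned constants from the article,
and the function name fast_rsqrt is mine. For a denormal input the exponent
field is zero, so the integer bit pattern no longer tracks the logarithm of
the value and the initial guess ends up far off.

```c
#include <stdint.h>
#include <string.h>

/* Classic bit-trick estimate of 1/sqrt(x) plus one Newton-Raphson step.
 * Assumption: the well-known constant 0x5f3759df, not the article's. */
float fast_rsqrt(float x)
{
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);        /* reinterpret float bits as integer */
    i = 0x5f3759dfu - (i >> 1);      /* initial guess from the bit pattern */
    memcpy(&y, &i, sizeof y);        /* reinterpret back as float */

    return y * (1.5f - 0.5f * x * y * y);  /* one refinement step */
}
```

With a normal input such as 4.0f the result is within a fraction of a percent
of 0.5. With a denormal such as 1e-40f the exact answer is 1e20, but the
function returns a value several times smaller, provided denormals are not
being flushed to zero.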

------
oranlooney
That's great and all, but nobody needs a 32-bit anything in 2018. This
undergraduate paper provides a magic number and associated error bound for
64-bit doubles:

https://cs.uwaterloo.ca/~m32rober/rsqrt.pdf

~~~
dnautics
Even scientific calculation would be fine with 32-bit floats, but the average
floating point error due to representation grows as O(N) (iirc) over N
multiplications, so you have to use 64-bit for many scientific applications to
get satisfactory results after a million or a trillion multiplications.
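
A toy sketch of that effect, under assumptions I picked for illustration: the
factor 1.000001 and the function names are hypothetical, and a million
multiplications is just a convenient count. The factor is not exactly
representable, so the small representation error in the float constant gets
compounded on every multiplication.

```c
/* Multiply 1.0 by 1.000001 n times in single precision. */
float compound_float(long n)
{
    float p = 1.0f;
    for (long k = 0; k < n; ++k)
        p *= 1.000001f;   /* constant is rounded to the nearest float */
    return p;
}

/* The same running product in double precision. */
double compound_double(long n)
{
    double p = 1.0;
    for (long k = 0; k < n; ++k)
        p *= 1.000001;
    return p;
}
```

With n = 1000000 the double version stays very close to the mathematically
expected value of about 2.71828, while the float version drifts away from it
by a few percent.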

~~~
llukas
Not really -
https://en.wikipedia.org/wiki/Numerical_stability

If your algorithm is not stable then even 64-bit won't help you.

Compare Euler vs Verlet -
https://en.wikipedia.org/wiki/Verlet_integration
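
A minimal sketch of that comparison, under assumptions of my own choosing: a
unit harmonic oscillator (x'' = -x) started at x = 1, v = 0, the velocity
Verlet variant, and hypothetical function names. Both integrators run in
32-bit floats and return the final energy 0.5 * (x*x + v*v), which is exactly
0.5 for the true solution.

```c
/* Explicit Euler: both updates use the old state. */
float euler_energy(int steps, float dt)
{
    float x = 1.0f, v = 0.0f;
    for (int k = 0; k < steps; ++k) {
        float x_next = x + v * dt;
        float v_next = v - x * dt;   /* acceleration a = -x */
        x = x_next;
        v = v_next;
    }
    return 0.5f * (x * x + v * v);
}

/* Velocity Verlet: position update, then velocity from averaged acceleration. */
float verlet_energy(int steps, float dt)
{
    float x = 1.0f, v = 0.0f;
    for (int k = 0; k < steps; ++k) {
        float a = -x;
        x += v * dt + 0.5f * a * dt * dt;
        float a_next = -x;
        v += 0.5f * (a + a_next) * dt;
    }
    return 0.5f * (x * x + v * v);
}
```

With 1000 steps at dt = 0.1f, the Euler energy grows by orders of magnitude
while the Verlet energy stays within about a percent of 0.5, even though both
use the same single-precision arithmetic.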

~~~
dnautics
You're making a different argument.

~~~
llukas
Which problem that has a stable algorithm would require 64-bit then?

