
Floating-Point Formats and Deep Learning - _eigenfoo
https://eigenfoo.xyz/floating-point-deep-learning/
======
alecmg
Hoped to see opinions on Unum [1]

Correct me if I'm wrong, but most machine learning does happen around 1.0.
Unums should give more precision for the same number of bits, or the same
precision with fewer bits, around 0 and 1. And some other interesting
features.

But would require new hardware and software.

[1]
[https://en.wikipedia.org/wiki/Unum_(number_format)](https://en.wikipedia.org/wiki/Unum_\(number_format\))
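For what it's worth, the "precision around 1.0" framing can be made concrete even for plain IEEE floats: the spacing between adjacent representable values (the ulp) grows with magnitude, while relative precision stays roughly constant. A quick sketch using only Python's standard library:

```python
import math

# math.ulp(x) is the gap between x and the next representable
# float64 above it, i.e. how fine the float grid is near x.
# IEEE floats keep *relative* precision roughly constant, so the
# absolute spacing grows with magnitude.
for x in [1e-4, 0.1, 1.0, 10.0, 1e4]:
    print(f"ulp({x:g}) = {math.ulp(x):.3e}")
```

Near 1.0 the spacing is 2^-52; tapered formats like unums/posits spend extra fraction bits in exactly that region at the cost of precision at the extremes.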

~~~
bsder
Please don't propagate Unums.

They have all the failure modes of interval arithmetic on top of their own
brand of failure modes.

There are _loads_ of engineers with stiff numerical problems that would
welcome a better representation than standard floating point if it worked.
Somehow, Gustafson never manages to demonstrate unums on any of these
problems.

Until Gustafson grabs some real numerical code, implements it with unums and
demonstrates how much better they are, he is not worth paying attention to.

~~~
kortex
Type I and II unums are likely dead ends. Type III (posits) look promising.
Facebook AI has developed their own flavor of "posit" with the terribly
undescriptive name "(8, 1, alpha, beta, gamma) log". Much less catchy than
"posit" or "unum", but it has actually been demonstrated to have numerical
benefits.

[https://engineering.fb.com/ai-research/floating-point-math/](https://engineering.fb.com/ai-research/floating-point-math/)
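For the curious, the tapered-precision idea behind posits is easy to sketch: a run-length-encoded "regime" field trades dynamic range against fraction bits, so values near 1.0 get the most precision. Here's a rough 8-bit decoder (my own illustration of Type III posit decoding, not a reference implementation and not Facebook's log format):

```python
def decode_posit(bits, n=8, es=1):
    """Decode an n-bit posit (Type III unum) to a Python float.

    The regime is a run of identical bits after the sign; longer runs
    mean more extreme magnitudes and fewer fraction bits left over.
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")            # NaR ("Not a Real")
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask          # two's complement for negatives
    rest = (bits << 1) & mask          # drop the sign bit
    first = rest >> (n - 1)            # leading regime bit
    run, x = 0, rest
    while run < n - 1 and (x >> (n - 1)) == first:
        run += 1
        x = (x << 1) & mask
    k = run - 1 if first else -run     # regime value
    used = 1 + run + 1                 # sign + regime run + terminator
    rem = n - used                     # bits left for exponent + fraction
    tail = bits & ((1 << max(rem, 0)) - 1) if rem > 0 else 0
    e_bits = min(es, max(rem, 0))
    # Truncated exponent bits are padded with zeros.
    e = (tail >> (rem - e_bits)) << (es - e_bits) if rem > 0 else 0
    f_bits = max(rem - es, 0)
    f = tail & ((1 << f_bits) - 1) if f_bits > 0 else 0
    frac = 1 + f / (1 << f_bits) if f_bits > 0 else 1.0
    value = (2 ** (k * (1 << es) + e)) * frac
    return -value if sign else value
```

With es=1, `decode_posit(0x40)` gives 1.0 with four fraction bits to spare, while the extremes (`0x7F` → 4096, `0x01` → 1/4096) get none — that's the taper.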

~~~
bsder
The key quote from that paper is as follows:

"Against 32-bit IEEE 754 single-precision FMA, ELMA will not be effective,
though, as the Kulisch accumulator is massive (increasing adder/shifter sizes
and flip-flop power), and the log-to-linear lookup table is prohibitive."

So, these things may be effective if you can dial your precision down. Okay.
We use domain-specific number representations all the time. Fixed-point binary
for DSP. Decimal representations for currency.

The Facebook takeaway is that transistors are now so cheap that we can do
<16-bit floating-point arithmetic with mostly ROM lookup tables if we make
some restrictions.
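To make the lookup-table point concrete: in a log number system (the family the ELMA work builds on), multiplication is just addition of logs, while addition needs a nonlinear correction that hardware typically reads out of a small ROM. A toy sketch, with full-precision math standing in for the tables (illustration only, not ELMA itself):

```python
import math

# Toy log-number-system (LNS) arithmetic: store x as log2(x).
def lns(x):                  # encode (positive values only, for simplicity)
    return math.log2(x)

def lns_mul(a, b):           # multiply = add in the log domain
    return a + b

def lns_add(a, b):
    # Addition needs log2(2^a + 2^b). The correction term below is a
    # function of (lo - hi) alone, which is exactly what a hardware
    # ROM lookup table would store.
    hi, lo = max(a, b), min(a, b)
    return hi + math.log2(1 + 2 ** (lo - hi))

x, y = lns(3.0), lns(5.0)
print(2 ** lns_mul(x, y))    # ~15.0
print(2 ** lns_add(x, y))    # ~8.0
```

The restriction bsder quotes is the catch: the correction table (and the Kulisch accumulator) stays small only at low precision, which is why this works for <16-bit inference but not as a 32-bit FMA replacement.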

That is, however, _NOT_ what Gustafson is proposing. He proposes these for
general usage as a panacea to the complications of floating point.

Numerics isn't some secret cabal. A new floating point system that allowed the
folks doing partial differential equation solvers, computational fluid
dynamics, or discrete time simulation to gain even 25% in time or to open up a
new simulation field because of extra stability would get a serious look.

And while William Kahan (who drove a lot of IEEE-754) generally comes off as
an insufferable jerk, he knows his stuff, and he wasn't alone. The numerics
folks at IBM and DEC (and others) were mostly converging to the same things
with differences at the margins (signed zero, denorms, NaNs, etc.)--mostly
because some of the things that IEEE-754 demanded were a huge pain to
implement in hardware of the day.

As for politics, IEEE-754 was basically a reaction to allowing DEC or IBM to
create a de facto definition of the floating point standard.

------
sdenton4
There's also the whole world of fixed-point inference, which isn't discussed
here but is quite important. All of the hardware supports fast integer
operations, with fewer platform-specific caveats, so you can get better
guarantees of consistent behavior in deployments.
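The usual trick, sketched loosely below (a generic affine-quantization scheme, not any particular framework's API): map floats to int8 with a scale and zero point, run the heavy ops on integer units, and dequantize at the end.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Affine (asymmetric) quantization of a float tensor to int8."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (qmax - qmin) or 1.0   # guard all-equal tensors
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, s, z = quantize(x)
print(dequantize(q, s, z))   # close to x, within one quantization step
```

Since the scale and zero point are plain numbers and the rest is integer math, the same model quantized this way behaves identically across backends — which is the consistency argument above.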

------
odomojuli
> Floating point? In MY deep learning?

It's more likely than you think.

Maybe not the most appropriate place for an "X? In MY y?" meme, despite its
relatively innocuous presentation.

It's kind of gross, so I'll refrain from linking it.

------
loopz
The moment floating-point precision errors become significant in your model,
know that you're dealing with algorithmic BS.

~~~
tgv
Didn't you know our neurons have 384-bit resolution?

~~~
nine_k
Source, if you don't mind?

