
John Gustafson’s crusade to replace floating point with something better - galaxyLogic
https://www.nextplatform.com/2019/07/08/new-approach-could-sink-floating-point-computation/
======
xvilka
Here is the official site[1] of the project. There is a request[2] to add it
to Scryer Prolog[3] (an ISO Prolog implementation in Rust), and there are
implementations in Rust[4] itself and in the Julia[5] language.

[1] [https://posithub.org/index](https://posithub.org/index)

[2] [https://github.com/mthom/scryer-prolog/issues/6](https://github.com/mthom/scryer-prolog/issues/6)

[3] [https://github.com/mthom/scryer-prolog](https://github.com/mthom/scryer-prolog)

[4] [https://gitlab.com/burrbull/softposit-rs](https://gitlab.com/burrbull/softposit-rs)

[5]
[https://juliacomputing.com/blog/2016/03/29/unums.html](https://juliacomputing.com/blog/2016/03/29/unums.html)

~~~
QuickToBan
That's great, but if the hardware doesn't support it, wouldn't the
implementation be slow?

~~~
mkettn
As far as I read
[https://posithub.org/docs/Posits4.pdf](https://posithub.org/docs/Posits4.pdf)
there are FPGA implementations. Also, on systems where floating-point
arithmetic isn't supported in hardware (e.g. the Arduino Uno) you rely on soft
FP anyway.

~~~
klingonopera
A benchmark analysis of accuracy and speed on AVR platforms would indeed be
very interesting, and it is an actual, immediately possible implementation
scenario...

------
enriquto
Posits and other floating-point variants are seriously cool, and Gustafson's
work is amazing.

Sadly, the guy has a very annoying writing style that makes him sound like a
crackpot. The advantages of posits would shine much more if they were not
mixed with ridiculous language (posits _are_ floating point numbers, thus they
cannot _replace_ them) and outlandish claims (IEEE floating point is
deterministic, contrary to what many of Gustafson's sentences suggest). It
does not help either that Gustafson's work only seems to attract the attention
of semi-illiterate journalists who do not really understand what they are
talking about.

~~~
coldtea
> _IEEE floating point is deterministic, to the apparent contradiction of many
> sentences written by Gustafson_

What about this then, which somebody comments below:

"On old x86 systems, the x87 registers were 80-bits and the bottom bits were
undefined. You could theoretically have a 1-bit difference on some results
depending on what the bottom 26 bits (that were undefined, because you had
64-bit floats most of the time, even though the machine did things 80-bits at
a time)."

And below:

"I spent a long time once debugging an issue arising from this. Code was
basically:

    
    
      double x = y;
      [...]
      if (x>y) fail;
    

Nothing in between the assignment and the check changed the value of x or y,
yet the check triggered the fail condition. Turned out that one of the values
stayed in the 80-bit x87 register and one was written to RAM as a 64-bit
value, then loaded back into an x87 register for the check, resulting in the
inequality. Running in a debugger wrote both values out of the registers
before the check, making the problem unreproducible."

This behavior might not be IEEE defined, but even as IEEE undefined (and
implementation dependent), it still qualifies those IEEE compliant
implementations as non-deterministic.

Or how about:

"parallel systems, most likely GPUs, where it is common to have slightly
different results even on the same operation, on the same system. This is due
to the fact that several pipelines will do calculations in parallel and
depending on which pipeline ends first (which can depend on many factors like
current temperature) the sums can happen in a different order, leading to
different rounding results."

due to fp non-associativity?

~~~
yorwba
Those are not problems a new number format can fix, though. The issues with
x87 precision are due to compilers freely converting between different
representations without this being indicated in the program.

Posits also can't achieve associativity in all conditions; you need to use
"quire" accumulators with higher precision. However, if the decision to use
quires were left to the compiler, you'd end up with the exact same problem of
unclear precision as for x87. To avoid that, those accumulators will need to
be annotated in the code, and straightforward programs that do not use them
will continue to exhibit non-associativity.
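
To make the non-associativity concrete, here is a minimal Python sketch;
math.fsum plays the role of an exact accumulator in software, loosely
analogous to what a quire provides in hardware (the analogy is mine, not a
posit implementation):

    import math
    
    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)    # 1.0
    print(a + (b + c))    # 0.0 -- same operands, different grouping
    
    # An exact accumulator makes the grouping irrelevant:
    print(math.fsum([a, b, c]))  # 1.0
    print(math.fsum([c, b, a]))  # 1.0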

~~~
Lerc
The observation that all of the +-*/ operations in a basic block can be
reduced to a single rounding with the quire would get you a reasonable part of
the way with the compiler alone.

Overall I'm not sold on the quire thing. It seems like a great big accumulator
is a solution that is fairly independent from the floating point format.

~~~
vanderZwan
> _It seems like a great big accumulator is a solution that is fairly
> independent from the floating point format._

It could still be a good idea to mandate a standard version in the spec for
consistent behavior.

------
dang
A thread from 2015:
[https://news.ycombinator.com/item?id=9943589](https://news.ycombinator.com/item?id=9943589)

2016:
[https://news.ycombinator.com/item?id=11573172](https://news.ycombinator.com/item?id=11573172)

2017:
[https://news.ycombinator.com/item?id=15617633](https://news.ycombinator.com/item?id=15617633)

[https://news.ycombinator.com/item?id=14669913](https://news.ycombinator.com/item?id=14669913)

Many other articles (but not many comments):

[https://hn.algolia.com/?sort=byDate&dateRange=all&type=story...](https://hn.algolia.com/?sort=byDate&dateRange=all&type=story&storyText=false&prefix=true&page=0&query=unum)

[https://hn.algolia.com/?sort=byDate&dateRange=all&type=story...](https://hn.algolia.com/?sort=byDate&dateRange=all&type=story&storyText=false&page=0&query=gustafson)

~~~
rrss
The proposed format has changed several times since 2015, so not all those
threads are discussing the same thing as this one.

Posits are Type III Unums, IIRC; the original proposal used Type I Unums.

Previous proposals were variable length formats, which is pretty much a
nonstarter for a variety of reasons. This proposal (posits) is a fixed length
format with variable length fields.

~~~
xenadu02
There have been several papers and blog posts written about the more recent
posits proposals.

[https://marc-b-reynolds.github.io/math/2019/02/06/Posit1.htm...](https://marc-b-reynolds.github.io/math/2019/02/06/Posit1.html)

[https://hal.inria.fr/hal-01959581v2](https://hal.inria.fr/hal-01959581v2)

[https://hal.inria.fr/hal-02131982](https://hal.inria.fr/hal-02131982)

The bottom line is they are a tradeoff; they certainly aren't purely better
than IEEE floats. Plus they seem to have a non-trivial hardware cost (one of
the linked papers shows the adder is twice as large as the equivalent IEEE
adder and has double the latency!)

------
antonyme
Very few developers truly understand floating point representation. Most think
of it as base-10, and put in horrific kludges and workarounds when they
discover it doesn't work as they (wrongly) expected. I shudder to think how
many e-commerce sites use `float` for financial transactions!

So as far as I'm concerned, whatever performance cost these alternate methods
may have, it would be well worth it to avoid the pitfalls of IEEE floats.
Intel chips have had BCD support in machine code; I'm surprised nobody has
made a decent fixed point lib that is widely used already.

~~~
rrss
Replacing all the IEEE 754 hardware with posits won't fix this, though.

If you don't care about performance, then the actual solution has no
dependency on hardware:

1. Replace the default format for numbers with a decimal point in suitably
high-level languages with an infinite-precision format.

2. Teach people using other languages about floating point and how they may
want to use integers instead.

The end. No multi-generation hardware transition required.

IMO, IEEE 754 is an exceptionally good format. It has real problems, but they
aren't widely known to people unfamiliar with floats (e.g. 1.0 + 2.0 != 3.0
isn't one of them).

~~~
dmd
I'm pretty sure that 1.0 + 2.0 == 3.0 in IEEE 754. :) Now, 0.1 ...

~~~
piadodjanho
They are not the same thing, but they are close enough most of the time.

The issue with floating point arises when comparing very close numbers.

------
jchw
Obviously, for any 32-bit value there are only 2^32 possible values. So
logically, a better number format is all about the distribution of values and
reducing redundancy.

It sounds like the basic idea here is that rather than a fixed number of
significant figures, you get more precision in the middle. Meanwhile, you can
also get larger exponents. Is that right?

~~~
jacobolus
The ideas are (1) gradual instead of hard overflow [IEEE floats do gradual
underflow via denormals], (2) crowding representable numbers more densely
around 1 while letting very large or very small numbers get less and less
precise, vs. floating point which (except for denormals) is scale free within
its range, (3) making a format which can be extended by just adding bits
without new definitions, so that you can get more or less precise numbers as
needed by just adding zeros or truncating, and (4) a bit more uniform logic
for basic arithmetic with fewer edge cases.

It's a bit sad that many of the creator’s claims are exaggerated and his
examples cherry-picked though.

~~~
FabHK
> floating point which (except for denormals) is scale free within its range

Presumably that's what you mean by scale free, but let's spell it out:

It's denser around 0, then has uniform density between, say, 1/8 and 1/4, half
that density between 1/4 and 1/2, half that between 1/2 and 1, half that
between 1 and 2, etc. etc.

If x = 1.0e22, you can add a million and it's still the same.
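
A quick Python check of that claim (a toy demonstration, nothing
posit-specific):

    x = 1.0e22
    print(x + 1e6 == x)   # True
    # The spacing between adjacent doubles near 1e22 is 2**21 = 2097152,
    # so anything below half of that simply rounds away.
    print(x + 2e6 == x)   # False: 2e6 exceeds half the spacing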

~~~
hinkley
I recall in some of these earlier conversations, people pointing out that
there are game engines where they invert some of their calculations to prevent
artifacts caused by loss of significant figures with certain numbers.

------
gwbas1c
> Better yet, he claims the new format is a “drop-in replacement” for standard
> floats, with no changes needed to an application’s source code.

Claims like that are best taken with a grain of salt:

> It also does away with rounding errors, overflow and underflow exceptions,
> subnormal (denormalized) numbers, and the plethora of not-a-number (NaN)
> values. Additionally, posits avoids the weirdness of 0 and -0 as two
> distinct values.

Ok, so posits will probably work fine as a drop-in replacement when my
application makes simple use of floats. But, assuming my application does
non-trivial math, it's probably aware of the above edge cases. Thus, dropping
in posits might have lots of weird side effects in places where I had to work
around the weird side effects of floats.

~~~
vanderZwan
> _But, assuming my application is doing non-trivial math, it's probably
> aware of the above edge cases. Thus, dropping in posits might have lots of
> weird side effects where I had to work around weird side effects of floats._

Yeah, but posits appear to have _fewer weird edge-cases_. Plus the people
trying out posits are constantly trying to find methods to work around the
limitations too.

During this year's conference on posit maths, Florent de Dinechin gave a
really nice talk bringing up all the current issues with posits and the ways
floating-point maths has found workarounds over the years, as a kind of
challenge for posits to catch up[0][1]. The community took it really well, as
far as I can see, and Gustafson in particular seemed delighted because he
genuinely wants everyone to start using better numerical methods.

[0]
[https://www.youtube.com/channel/UCOstJ2IVC4Y8mbgN0IsowKw](https://www.youtube.com/channel/UCOstJ2IVC4Y8mbgN0IsowKw)

[1]
[https://www.youtube.com/watch?v=tcX2nRCdZvs](https://www.youtube.com/watch?v=tcX2nRCdZvs)

------
Aardwolf
Choosing to have -infinity and various NaNs can be done independently from
choosing to add regime bits; one could just as well design this system to
include them by adding more special values.

I didn't find a resource specifying how many exponent bits (the 'es' value) to
actually use.

I don't see anything in this system, compared to regular floats, that would
ensure consistent results across machines/compilers/... It's floats plus
unary-encoded regime bits to get a variable-length exponent, so everything
that can make floats inconsistent can still happen here: non-associativity,
different exponent sizes or precisions of intermediate values, different
rounding modes, ...

~~~
Lerc
I just read the paper. Standard es bit counts are covered in section 7.2.

Long story short: es = log2(nbits) - 3
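
Which, if I'm reading the draft right, gives the following es for the standard
widths (a trivial check, in Python):

    from math import log2
    
    for nbits in (8, 16, 32, 64):
        print(nbits, int(log2(nbits)) - 3)   # -> 8:0, 16:1, 32:2, 64:3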

------
ambrop7
> It also does away with rounding errors

What? Surely it does not have infinite precision with a finite number of bits,
and also doesn't seem to be a rational number representation.

~~~
DougBTX
Hm, perhaps the answer is a subtle definition of "rounding error".

A rounding error is when you have a number, let's say 0.75, and due to
rounding it is recorded as 1.00. The "rounding error" is 0.25.

An alternative to rounding to 1.00 would be to have a mechanism which says
"the value is between 0.50 and 1.50". This way, there is no actual rounding,
as it doesn't commit to a rounded value, so there is technically no "rounding
error".

A neat advantage of recording an interval rather than rounding is that the
"error" is preserved in the data through arithmetic, so if there is some
following code that runs x * 100, a rounding mechanism would say "the value is
100" whereas an interval mechanism would say "the value is between 50 and
150". Then, if the user only looks at the output, it will be clear that there
is a wide error range and something needs to be fixed, rather than the output
indicating a precise answer when really it suffers from significant rounding
errors.
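
A toy sketch of that mechanism in Python (plain interval arithmetic, written
from scratch to illustrate the idea above; the class and names are mine, and
this is not how posits themselves behave, as the reply below points out):

    class Interval:
        """A closed interval [lo, hi] that carries uncertainty through math."""
    
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
    
        def __mul__(self, k):  # scale by a non-negative constant, for brevity
            return Interval(self.lo * k, self.hi * k)
    
        def __repr__(self):
            return f"[{self.lo}, {self.hi}]"
    
    x = Interval(0.50, 1.50)   # "the value is between 0.50 and 1.50"
    print(x * 100)             # [50.0, 150.0] -- the error range stays visible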

~~~
phkahler
I like posits, but they do not encode intervals. His older ideas on that are
crap and that's part of the problem getting posits accepted IMO.

------
tosh
found this from Kahan in 2016
[https://people.eecs.berkeley.edu/~wkahan/UnumSORN.pdf](https://people.eecs.berkeley.edu/~wkahan/UnumSORN.pdf)

[https://people.eecs.berkeley.edu/~wkahan/EndErErs.pdf](https://people.eecs.berkeley.edu/~wkahan/EndErErs.pdf)

[https://people.eecs.berkeley.edu/~wkahan/SORNers.pdf](https://people.eecs.berkeley.edu/~wkahan/SORNers.pdf)

~~~
FabHK
Thanks, came to say the same thing - wondered what William "Father of IEEE
754" Kahan had to say about this.

I haven't had time to digest it all, but he is certainly critical of
Gustafson's proposals. Not sure though the articles linked above cover the
latest and greatest Unum III. At any rate, I'd pay close attention to Kahan's
critique.

------
savant_penguin
"(even the same computation on the same system can produce different results
for floats)"

wait...what? Is this real?

~~~
dragontamer
Yes and no.

1. Yes -- On old x86 systems, the x87 registers were 80 bits and the bottom
bits were undefined. You could theoretically have a 1-bit difference on some
results depending on what the bottom 26 bits were (undefined, because you had
64-bit floats most of the time, even though the machine did things 80 bits at
a time).

2. No -- Modern x86 systems use SSE registers, where doubles are computed at
their native 64-bit width. There are a whole slew of configuration options,
but if Windows / Linux does its job, you should have the same rounding errors
across your program (unless you manually change the rounding options, but
that's your own fault if you go that route).

3. Kinda yes -- Floating point operations are NON-associative. (A+B)+C does
NOT equal A+(B+C). The best example is Python:

    >>> (1+1)+2.0**53
    9007199254740994.0
    >>> 1+(1+2.0**53)
    9007199254740992.0

2.0**53 is the first value at which adding 1.0 starts to get "rounded off" in
double precision. So (1+2.0**53) == 2.0**53, because the 1 was rounded away.

Due to #3, even if you did everything correctly (ie: set the rounding flags so
they were consistent), if your arithmetic happened in slightly different
orders (very common in network-games, where different players may have their
updates in slightly different orders), you end up with a 1-bit error between
clients... which completely borks your simulation.

Because #3 is nonintuitive, many people think that you get different results
when using floats. But it's simply due to non-associativity.

~~~
DubiousPusher
> Modern x86 systems use SSE registers

Are you sure of this? It's my understanding that SSE registers require the
use of a special API and standard floating point operations do not use them.

But the last time I worked with them was writing a SIMD vector library 7 years
ago.

~~~
titzer
> It's my understanding that SSE registers require the use of a special API
> and standard floating point operations do not use them.

No, SSE registers are used by most compilers that do floating point these
days. They support all the usual IEEE float math operations and a host of bit
twiddling operations as well as vector operations. The vector operations do
still require using compiler intrinsics in C++, although some
autovectorization does occur in gcc, icc, and llvm.

~~~
DubiousPusher
Ah yes. This is what I was remembering. For vector operations you have to use
special intrinsics.

------
mrtnmcc
How does the implementation of an FPU for this compare? I thought the existing
IEEE 754 floating point standard was focused on reduced complexity of addition
and multiplication hardware. This seems more complicated.

~~~
dnautics
Having implemented it, with unoptimized Verilog (ok, I wrote a Verilog
generator to generate it): it requires about 30% fewer LUTs on an FPGA
relative to the Berkeley hardfloat implementation.

~~~
wwwigham
Is the variable length regime handling that much easier to deal with (space-
wise) than the NaN and subnormal handling needed in IEEE floats? I'd think
that the regime scheme would effectively be equivalent to creating a multitude
of different-width subnormal routes. Is it really the NaN handling that kills
IEEE float performance?

~~~
dnautics
It's basically a barrel shifter; for addition you're going to need it anyways.
Multiplication is a bit nastier, but most of multiplier gates are the adder
gates anyways. I made a useful insight that negative numbers are basically the
same as positives, with a "minus two" invisible bit.

Here is a sample 8-bit multiplier. All code was generated using a verilog DSL
I wrote in Julia for the specific purpose. All verilog is tested by
transpiling to c using verilator and mounting the shared object into a Julia
runtime with a Julia implementation.

[https://github.com/interplanetary-robot/mullinengine/blob/ma...](https://github.com/interplanetary-robot/mullinengine/blob/master/multiplier/posit_mult.v)

------
theamk
This may be good for numerical simulation, but I doubt this will ever replace
common ieee754 float in general purpose programs. Two problems I can see right
away:

- Sometimes one needs precision for numbers not close to 1. A UNIX timestamp
is a good example: for the current date, it provides ~1 µs resolution in a
64-bit IEEE 754 double. I could not calculate what the resolution would be for
posits, but I suspect much worse.

- The lack of positive and negative infinities will break naive min/max
calculations which start the accumulator at +/-inf.

Now, you might say that those things should not be done, and that software
that uses those patterns is defective. But even then, there is a lot of
software like this, and so posits will never become a default "float" type.

~~~
_kst_
UNIX timestamps are normally stored as 32-bit or 64-bit signed integers, not
as floating-point. If you want better than 1-second precision, then the type
"struct timespec" (specified by POSIX) gives you nanosecond precision. Fixed-
point types can also be used in languages that support them.

~~~
jjtheblunt
I thought unsigned in 64 bits, holding the number of nanoseconds since the
Unix 0 time (which might be 1970, but i forget).

~~~
masklinn
UNIX time is _seconds_ since epoch (hence year 2038, that's the limit for a
signed 32b time_t).

gettimeofday() and clock_gettime() provide higher-resolution timestamps
(respectively µs and ns), using structs instead of plain numbers.

Some APIs return floating-point UNIX time in order to provide sub-second
accuracy (the decimal part is the fractional second). Python's time.time()
does that for instance.
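
For example, in Python (time_ns needs 3.7+):

    import time
    
    t = time.time()        # float seconds since the epoch, ~1.56e9 in 2019
    # Near 2**30 seconds a double's spacing is 2**-22 s (~0.24 microseconds),
    # so a float UNIX timestamp cannot resolve nanoseconds:
    print(t + 1e-9 == t)   # True
    print(time.time_ns())  # integer nanoseconds, no rounding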

------
ambicapter
The paper for this was published in 2017[0]

[0]
[https://dl.acm.org/citation.cfm?id=3148220](https://dl.acm.org/citation.cfm?id=3148220)

~~~
agumonkey
It's odd. Someone told me about that on IRC at the time. People felt strongly
about the topic. Quackery, mythomania... nobody wanted to hear about anything
but IEEE standards.

~~~
bsder
> Quackery, mythomania... nobody wanted to hear about anything but IEEE
> standards.

Mostly because there is no good evidence that what is being proposed is
better.

This isn't the olden days when it was difficult to demonstrate on a large
enough CPU and dataset. Today we have cloud computing. If you create something
better, you can demonstrate it by putting it into a numerics application and
blow everybody away.

The CFD people are _always_ looking for better solutions. The numerical
simulation people are _always_ constrained.

Until you do that, people have a right to blow you off.

~~~
ricardobeat
This paper on UK weather simulation, cited in the article, seems like pretty
good evidence:
[https://posithub.org/conga/2019/docs/13/1100-MilanKlower.pdf](https://posithub.org/conga/2019/docs/13/1100-MilanKlower.pdf)

~~~
bsder
That only really compares posits to Float16 in a domain where the paper admits
roundoff error mostly isn't really an issue even at single precision.

I'd be much more interested to see how these handle stiff systems with
multiple time constants. That's a domain where everything has problems and
improvement is going to actually move some things from completely infeasible
to actually simulatable.

That's a _much_ stronger use case. And you don't have to worry about
implementation efficiency, since the improvement ratio is "infinity" -- you
went from not being able to do it at all to actually being able to do it.

------
vanderZwan
The curious skeptics here might want to check out the videos from CoNGA'19[0].
There are talks about posits being used and tried out in the wild with
impressive results. For example, according to Milan Klöwer, 16-bit posits
could be accurate enough to replace 64-bit floats in certain climate modelling
problems[1].

EDIT: just realized that the example was mentioned in the article, with a link
to the slides. Still, the YT talk may add some context

[0]
[https://www.youtube.com/channel/UCOstJ2IVC4Y8mbgN0IsowKw/vid...](https://www.youtube.com/channel/UCOstJ2IVC4Y8mbgN0IsowKw/videos?disable_polymer=1)

[1]
[https://www.youtube.com/watch?v=XazIx0cMVyg](https://www.youtube.com/watch?v=XazIx0cMVyg)

~~~
piadodjanho
The innovative idea in posits is using a Golomb-Rice prefix code for the
exponent. The Golomb-Rice prefix lets you encode exponents closer to zero
using less space.

For example, posit16 with nbits=16 and es=1 encodes the exponent like:

    01.0 = 0
    01.1 = 1
    001.0 = 2
    001.1 = 3
    0001.0 = 4

The format has a normal exponent field with es bits that encodes the exponent
in binary. When it overflows, it encodes the carry in unary format, where the
run length of the same bit encodes a number: 0001 would be 3, 001 would be 2,
etc. Of course, the posit format is a bit more complicated than that (it
supports negative exponents, for example), but the idea is pretty much this.
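
Here is a rough Python sketch of posit decoding as I understand it from the
draft standard (sign, regime run, es exponent bits, fraction); treat it as
illustrative, not a reference implementation:

    def decode_posit(p, nbits=16, es=1):
        """Decode the unsigned bit pattern p as a posit value (a sketch)."""
        if p == 0:
            return 0.0
        if p == 1 << (nbits - 1):
            return float("nan")            # NaR, the single non-real pattern
        sign = -1.0 if p >> (nbits - 1) else 1.0
        if sign < 0:
            p = (1 << nbits) - p           # posits negate by two's complement
        bits = p & ((1 << (nbits - 1)) - 1)
        # Regime: run length of identical bits right after the sign bit.
        first = (bits >> (nbits - 2)) & 1
        i = nbits - 2
        run = 0
        while i >= 0 and ((bits >> i) & 1) == first:
            run += 1
            i -= 1
        k = run - 1 if first else -run
        i -= 1                             # skip the terminating opposite bit
        # Exponent: up to es bits; truncated (missing) bits count as zero.
        e = 0
        for _ in range(es):
            e <<= 1
            if i >= 0:
                e |= (bits >> i) & 1
                i -= 1
        nfrac = i + 1                      # whatever is left is the fraction
        frac = (bits & ((1 << nfrac) - 1)) / (1 << nfrac) if nfrac > 0 else 0.0
        return sign * (1.0 + frac) * 2.0 ** (k * (1 << es) + e)
    
    print(decode_posit(0x4000))   # 1.0
    print(decode_posit(0x5000))   # 2.0
    print(decode_posit(0x6000))   # 4.0 (regime run of two ones -> k = 1)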

Because of this encoding, if you are working with numbers whose exponents have
small magnitude, posits will have a LOT more precision than your
floating-point format.

But the claim that posit16 can have as much precision as binary64 from
IEEE 754 is misleading. Posit16 can have _up to_ 16-1-2-1 = 12 bits of
precision, while binary64 always has _53 bits_ of precision.

They likely compared binary64 with posit16 using the accumulator (aka quire).
I'm not sure how the quire would map to a real-world FPU; it uses a lot of
space.

~~~
nestorD
> They likely compared binary64 with posit16 using the accumulator (aka
> quire).

Which I always find deceptive, because nothing stops us from using a quire
with classical floating-point arithmetic.

It is in fact a comparison between a summation and a compensated summation: it
is more precise because the algorithm is different, not because of posits or
floats.

They tell you that the quire is even faster than a traditional sum while being
more precise, but reading the associated reference reveals that this only
holds with specific hardware, which could just as well be used to provide a
quire for floats.

(And, arguably, I would love to know that every programmer is aware of a solid
implementation of compensated/exact summation/dot-product and uses it when
appropriate.)
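
For reference, here is the textbook compensated (Kahan) summation in Python;
this is the kind of float-side technique I mean, nothing posit-specific:

    def kahan_sum(xs):
        """Compensated summation: same float type, plus a carry term."""
        total = 0.0
        c = 0.0                  # compensation for lost low-order bits
        for x in xs:
            y = x - c
            t = total + y
            c = (t - total) - y  # what rounding just discarded
            total = t
        return total
    
    xs = [1e16] + [1.0] * 4000
    print(sum(xs) - 1e16)        # 0.0    -- every +1.0 rounds away
    print(kahan_sum(xs) - 1e16)  # 4000.0 -- the compensation recovers them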

~~~
piadodjanho
> They tell you that the quire is even faster than a traditional sum while
> being more precise but reading the associated reference reveals that it only
> hold with specific hardware which could also be used to use the quire with
> floats.

Precisely. Once a posit is unpacked it is indistinguishable from a
floating-point number. It is not fair to let posits use a massive accumulator
while the IEEE 754 floats work with a tiny one. Like I said before, the
precision of a number represented in binary64 is greater than posit8's. This
comparison ignores the biggest advantage of posits: an efficient data format.

> (and, arguably, I would love to know that every programmer is aware of a
> solid implementation of compensated/exact summation/dot-product and uses it
> when appropriate)

I like the idea of making the accumulator type (quire) accessible to the
programmer. I think this brings awareness of the underlying hardware
implementation to the average programmer.

------
Mikhail_K
> It also does away with rounding errors, overflow and underflow exceptions,
> subnormal (denormalized) numbers, and the plethora of not-a-number (NaN)
> values.

All of those are important and useful features. Presenting their absence as
some kind of advantage shows that Gustafson has no clue.

~~~
pfortuny
The article is too sloppy on details: these can be found in the specification
and are not what the journalist writes.

------
mooman219
One annoying part about floats is the abuse of the NaN space. Some runtimes
like SpiderMonkey, JSC, and LuaJIT abuse the NaN space to store pointers in
doubles. This practice is often called nan-boxing, but has a few variants. A
more efficient use of bits like with Posits will break this.
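
The trick works because a 64-bit double has plenty of spare NaN payload bits
while pointers only need 48. A toy Python sketch of the idea (helper names are
mine; real engines do this in C with tag checks, and CPython happens to
round-trip the payload through struct):

    import struct
    
    QNAN = 0x7FF8_0000_0000_0000   # exponent all ones + quiet bit: always NaN
    
    def box(payload):
        """Stash a 48-bit integer (e.g. a pointer) in a NaN's payload bits."""
        assert payload < (1 << 48)
        return struct.unpack("<d", struct.pack("<Q", QNAN | payload))[0]
    
    def unbox(d):
        bits = struct.unpack("<Q", struct.pack("<d", d))[0]
        return bits & ((1 << 48) - 1)
    
    p = box(0xDEADBEEF)
    print(p)               # nan -- looks like any other NaN to float code
    print(hex(unbox(p)))   # 0xdeadbeef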

As for the claim about an FPU taking up less space and power for posits than
floats, Facebook made that claim: [https://code.fb.com/ai-research/floating-point-math/](https://code.fb.com/ai-research/floating-point-math/)

~~~
enriquto
> One annoying part about floats is the abuse of the NaN space.

I would say that this is certainly my favorite feature of IEEE floats :)

------
yholio
Do these techniques make it easier or harder to implement hardware floating
point units?

Storage and transmission speed have progressed exponentially while execution
units have become the bottleneck. A floating-point format that is 20% denser
or more accurate but that requires 2x the number of gate delays to implement
is a major step backwards, except maybe in highly specialized applications.

~~~
yaantc
From [1]: "The standard 32 bits posit adder is found to be twice as large as
the corresponding floating-point adder. Posit multiplication requires about 7
times more LUTs and a few more DSPs for a latency which is 2x worse than the
IEEE-754 32 bit multiplier." It's for an FPGA implementation.

This being said, I object to your premise: transmission and memory storage are
getting comparatively more costly vs computation. "Computer Architecture, A
Quantitative Approach", 6th edition, figure 1.13, gives some values for the
TSMC 45nm process, so hardly the leading edge, and it only gets worse with
finer nodes:

- 32-bit integer multiplication: 3.2 pJ
- 32-bit float multiplication: 3.7 pJ
- 32-bit read from a small 8 kB SRAM: 5 pJ
- 32-bit read from external DRAM: 640 pJ

We have a lot of transistors nowadays, and as long as the computation can be
pipelined, halving the memory accesses could be a win (TBC).

Also from the same Inria team as [1], and also covering hardware cost but
more than that, "Posits: the good, the bad and the ugly":
[https://hal.inria.fr/hal-01959581v3/document](https://hal.inria.fr/hal-01959581v3/document)

[1] "Hardware cost evaluation of the posit number system",
[https://hal.inria.fr/hal-02131982/document](https://hal.inria.fr/hal-02131982/document)

~~~
dnautics
I'm not sure the Inria team used the correct optimization for posit addition.
You do yourself a disservice by treating a posit like a float and bifurcating
the cross (negative/positive) case from the negative/negative and
positive/positive branches; since posits are two's complement, the posit adder
should be smaller, not bigger.

------
melinoe
This has always been interesting to me. It's been a while since I last read
about it, though, and I remember reading some criticisms or concerns that
posits (or unums? what's the difference?) would end up being slower for some
reason. I don't remember the arguments, or where I saw them; I think the idea
was that there were some edge cases that were common enough in practice that
overall it would slow things down. It would be nice to see a balanced
discussion of the ideas (pros and cons).

------
Causality1
Performance per transistor has barely budged in thirty years. Posits seem like
a very good development in that direction.

------
dilawar
Is there any FPGA implementation of his ideas?

One could then solve some ODEs to check how these implementations perform.

Of course the speed will be far slower, but one can still compare instruction
counts.

------
mcv
This sounds really good. I find floating points completely unusable for any
situation where accuracy is important. It's fine when being vaguely in the
right ballpark is good enough, but I don't want to have to deal with 2 + 4.1 =
6.1000000001, or x/1000000 + y/1000000 != (x+y)/1000000.

~~~
nestorD
Well, those will not be solved by posits; it is still an approximate
floating-point format. What it does is redistribute the precision to what
Gustafson considers a better default, and drop edge cases in order to get more
bits of precision.

edit: One of several examples found in the "Posits: the good, the bad and the
ugly" paper linked in the thread: 10.0 * 2.0 = 16.0 in posit8

~~~
mcv
Really? My impression from the article was that this was exactly one of the
things posits were supposed to fix, at least for numbers with small exponents.

I'm not sure how 10.0 * 2.0 = 16.0. I'm not sure what posit8 means, but it can
only be correct if it switches halfway from base-8 representation to base 10,
which is a bit weird, but at least the calculation would be correct.
(Otherwise it would be so incorrect as to be unusable for anything.)

~~~
nestorD
I am serious (and recommend reading the paper I quoted to get a good
understanding of the trade-off offered by posits).

Overall, posits are a new trade-off that will give you better precision
(nothing exact, it is still an approximation) when you manage to keep all your
numbers in a small range. Once you get out of that range, precision drops
significantly (whereas the precision of classical floating points drops
gradually).

Posit8 is equivalent to an 8-bit floating point (minifloat), making it an easy
target for pathological cases, but the example still illustrates the fact
that, contrary to floating-point arithmetic, multiplication by a power of two
is not always exact with posits (one of several good properties we take for
granted and would lose when switching to posits).

~~~
mcv
If it's just a variation of the same problems behind floats, then I'm not that
interested. Well, I guess it depends on how small the range is.

------
stochastimus
This is awesome. Doubling performance for free sounds good to me.

~~~
Ballas
Well, we still need to replace all the floating point hardware in everything.
So not exactly free...

------
Katydid
Gustafson commented on the article this morning.

------
jlgustafson
At the risk of a "flame war" where there are no winners, I would like to
comment on some of the statements here before they get stale. If we avoid ad
hominem attacks and stick to the math, the claims, and counterexamples, this
can be a useful scientific discussion, and I very much welcome all the
criticism of my ideas.

The irreproducibility of IEEE 754 float calculations is well documented... on
Wikipedia, by William Kahan, and in an excellent paper by David Monniaux
titled "The pitfalls of verifying floating-point computations". It is amazing
that this is tolerated, but IEEE 754 has done a great deal to lower the
expectations of computer users regarding mathematically correct behavior.

The posit approach is not merely a format but also the Draft Standard. Whereas
floats can arbitrarily use "guard bits" to covertly do calculations with
greater accuracy, the posit standard rules that out. Whereas the float
standard recommends that math functions like log(x), cos(x) etc. be correctly
rounded, the draft posit standard mandates that they be correctly rounded (or
else they have to use a function name that clarifies that they are not the
correctly-rounded function). By the draft posit standard, you cannot do
anything not specified in the source code (like noticing that a multiply and
an add could be fused into a multiply-add with deferred rounding, so calling
fused multiply-add without telling anyone). The source code completely defines
what the result will be, bitwise, or it is not posit-compliant. It cannot
depend on internal processor flags, optimization levels, or special hardware
with guard bits to improve accuracy; this is what corrupted the IEEE 754
Standard and made it an irreproducible environment to this day.

The claim that posits are a "drop-in" replacement for floating point needs a
lot of clarification, and this is unfortunately left out of much of the
coverage of the idea. Clearly, if an algorithm assigns a hexadecimal value to
encode a real value, that will need work to port from IEEE floats to posits.
The math libraries need to be rewritten, as well as scanf and printf in C and
their equivalents for other languages. However, a number of researchers have
found that they can substitute a posit representation for a float
representation of the same size, and they get more accurate results with the
same number of bits. I call that "plug-and-play" replacement; yes, there are a
multitude of side effects that might need to be managed, but it's nothing like
the jarring change, say, of moving from serial execution to parallel
execution. It's really pretty easy, and it's easy to build tools that catch
the 'gotcha' cases.

Some here have suggested the use of rational number representation, or said
that there are redundant binary representations of the same numerical value.
Unlike floats, posits do not have redundancy. I suspect someone is confused by
the Morris approach to adjusting the tradeoff between fraction bits and
exponent bits, which produces many redundant ways to express the same
mathematical value.

Perfect additive associativity is available, as an option, with the quire, if
needed. Multiplicative associativity is available, as an option, by calling
fused multiply-multiply in the draft posit standard. Because quire operations
appear to be both faster (free of renormalization and rounding) and more
accurate (exact until converted back to posit form), I am puzzled regarding
why anyone would want to do things more slowly and with less accuracy.

Kulisch blazed the way with his exact dot product; unfortunately, any exact
dot product based on IEEE floats will have an accumulator with far too many
bits (like 4,224 for IEEE double precision) and an accumulator that is just a
bit larger than a power-of-two size. The "quire" of posits is always a
power-of-two width, much more hardware-friendly. It's 128 bits for 16-bit
posits, and 512 bits for 32-bit posits, the width of a cache line on x86, or
of an AVX-512 register.
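
The pattern, for the record, is that the quire width is nbits**2/2
(consistent with the figures above):

    for nbits in (8, 16, 32, 64):
        print(nbits, nbits * nbits // 2)   # -> 8:32, 16:128, 32:512, 64:2048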

"A little knowledge is a dangerous thing." In evaluating posit arithmetic,
please use more than what you see in a ycombinator blog. You might discover
that there are several decades of careful decision-making behind the design of
posit arithmetic. And unlike Kahan, I subject my ideas to critical review by
the community and learn from their input. The 1985 IEEE format is grossly
overdue for a change.

~~~
milankl
I want to add a few comments, as most of the discussion here concerned the
hardware implementation and only a few pointed to possible applications. I
work on weather and climate simulations, but my opinions should apply in
general to CFD or PDE-type problems.

Yes, having redundant bit patterns is not great when designing a number
format; however, even for Float16 (half-precision), making use of the 3% of
bit patterns that are NaNs is wise, but not going to be a game changer. Some
others discussed pros/cons for negative zero and also negative infinity: in my
view you want to have a bit pattern that tells you that the answer you get is
not real, but whether it's +/- Inf or some NaN is pretty much irrelevant.
Using these bit patterns for something else sounds like a very reasonable
approach to me. Furthermore, I've never come across a good reason for -0 in
our applications.

When it comes to weather and climate models in HPC, I see the following
potential for posits: similar to how BFloat16 is supported on TPUs, I could
see Posit16 being supported by some specialised hardware like GPUs, FPGAs,
etc. I'm saying that because for us it's not important to have a whole
operating system running in posits (although I probably wouldn't mind) but to
have them for some performance-critical algorithms. Unfortunately, weather and
climate models are far more complex than some dot products, and we usually
have to deal with a whole zoo of algorithms, causing weather and climate
models to easily cover several million lines of code. Now let's say we know
our model spends 20% of the time in algorithm A, which only requires a certain
(low) precision to be stable and to yield reasonable results; then it would
indeed be a big game changer if we could run this algorithm in, say, 16 bits.
In trading precision for speed we would probably want to push things to the
edge, i.e. if we can just about do it in 16 bits, then we should. Now there
are several 16-bit formats: Float16, BFloat16, Posit16, Posit16_2 (with 2 exp
bits), and technically also Int16. Let's forget about the technical details of
these formats and focus on where they actually considerably differ: what is
the dynamic range, and where on the real axis do I get how much precision to
represent numbers. Yes, from a computer science perspective the technical
details also matter, but from our perspective most of it is pretty irrelevant
and what actually matters are these two things: dynamic range and where the
precision sits. Because these two really determine whether your algorithm is
gonna crash, and whether you can use it operationally on your desktop computer
or in a big fat $$$ supercomputer.

For PDE-type problems (that includes CFD and also weather and climate models)
I came, within the last year of my research, to the following preliminary
conclusions regarding dynamic range and precision with respect to the
above-mentioned formats:

- Int16: Let's forget about it.

- Float16: The precision is okay, but rarely needed towards the edges of the
dynamic range. Floatmin might work; however, floatmax at 65504.0 is easily a
killer. Might work with a no-overflow rounding mode and smart rewriting of
algorithms to avoid large numbers.

- BFloat16: For our applications, having only 7 significant bits is not
enough; I didn't come across a single sophisticated algorithm that works with
BFloat16.

- Posit16 (with 1 exp bit): Great, puts a lot of precision where it's needed
but also allows for a reasonable dynamic range.

- Posit16 (with 2 exp bits): Probably even better; the sacrifice of a bit of
precision in the middle is fine, and the wide dynamic range gives it the
potential to also work with algorithms that are hard to squeeze into a smaller
dynamic range.

In short, posits actually fit the numbers our algorithms produce much better.
And this can indeed be the game changer: if a GPU supports posit arithmetic
and we can run algorithm A on it in 16 bits: wonderful, contract sold! But if
we couldn't with BFloat16 or Float16, then there is no future for 16 bits in
our field.

I explain more about this in this paper: dx.doi.org/10.1145/3316279.3316281

And there are two talks which tell a similar story:
[https://www.youtube.com/watch?v=XazIx0cMVyg](https://www.youtube.com/watch?v=XazIx0cMVyg)
[https://www.youtube.com/watch?v=wp7AYMWlPLw](https://www.youtube.com/watch?v=wp7AYMWlPLw)

or simply drop me an email if you have questions (I'm unlikely to respond
here); you can find my address on my website: milank.de

------
QuickToBan
Posits seem great, but LLNL seems to favor ZFP[1], not posits.[2] Maybe new
chips should then implement both posits and ZFP.

[1] [https://github.com/LLNL/zfp](https://github.com/LLNL/zfp)

[2]
[https://helper.ipam.ucla.edu/publications/bdcws2/bdcws2_1504...](https://helper.ipam.ucla.edu/publications/bdcws2/bdcws2_15049.pdf)

~~~
garmaine
Unrelated, as far as I can tell. Posits are an alternative floating-point
format, not an array compression algorithm. The "compression" benefit comes
from being able to use posit floats instead of IEEE 754 doubles, because of
the better precision.

~~~
sitkack
Posits also take less space for the same total accuracy; they use less space
on disk and in RAM, and need less memory bandwidth.

~~~
garmaine
It occurs to me that maybe you are referring to unums? The "unum" format
discussed in "The End of Error" is variable-sized. However, they have since
dropped that design and switched to fixed-size number formats (16-, 32-, or
64-bit) with an adjustable division between exponent and fraction bits: the
"posit."

So "posit" numbers get you the ability to trade off precision and range, but
they're not any more highly compressed than regular old IEEE floats. Unless,
as Gustafson argues, you didn't need a double in the first place and the added
features of posits let you switch to a float.

------
towlinson
Let's change the world.

