
What Every Computer Scientist Should Know About Floating-Point Arithmetic - hamidr
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
======
haberman
Personally I think one of the hardest things to wrap your head around is how
it's power-of-two fractions that can be represented exactly, not power-of-10
fractions. In other words, 0.5 can be represented exactly as IEEE floating
point, but 0.1 cannot.

This is not inherent to floating point; a floating point representation that
was based on decimal digits instead of binary ones would not have this issue.

It's not too hard to understand limited precision and rounding; we deal with
both all the time. The part that's hard to understand is that you have to
perform rounding simply to put a number like "0.1" into a floating-point
value.

Specifically, 0.1 as an IEEE single-precision float is represented as:

    
    
      val = 0x3dcccccd
      sign = 0
      significand = 4ccccd (1.6000000)
      exponent = 0x7b (-4)
    

So 0.1 is internally represented as 1.6000000 * 2^-4, while 0.5 has the much
more understandable representation 1 * 2^-1.

This is made even more confusing by the fact that rounding performed on output
will make it _appear_ that 0.1 has been exactly represented in many cases.

So take a data type that can't exactly represent some numbers that look very
simple in decimal, but that appears to represent them exactly when you print
them out, but then compares not equal when you try to actually compare them.
No wonder people get confused!
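The illusion can be poked at directly; a quick Python sketch (Python floats are IEEE doubles, and `struct` is used here only to get the single-precision bit pattern quoted above):

```python
import struct

# Round 0.1 to single precision and inspect the raw bits,
# matching the 0x3dcccccd breakdown above.
bits = struct.unpack('<I', struct.pack('<f', 0.1))[0]
print(hex(bits))            # 0x3dcccccd

# Rounded output hides the error...
print(0.1)                  # 0.1
# ...but the stored double is not exactly 0.1:
print(f"{0.1:.20f}")        # 0.10000000000000000555
print(0.1 + 0.2 == 0.3)     # False
```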

~~~
Geee
It's easier to explain that decimal 0.1 in binary is
0.000110011001100110011001100110011001100110011001100110011001100110011001100...
It's impossible to represent exactly with any combination of
significand*2^exponent in a finite-memory binary system, just as 1/3 can't be
written down exactly in the decimal system.
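That expansion is just base-2 long division; a small Python sketch that generates it:

```python
# Long division of 1/10 in base 2: double the remainder and emit the
# integer part as the next bit. The remainders cycle (2, 4, 8, 6, ...),
# so the expansion repeats forever.
num, den = 1, 10
bits = []
for _ in range(20):
    num *= 2
    bits.append(num // den)
    num %= den
print("0." + "".join(map(str, bits)))   # 0.00011001100110011001
```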

~~~
haberman
Sure, but it's not obvious what numbers can be represented exactly and which
can't, and since the numbers get rounded on output to _look_ exact, you can
easily be misled into thinking that 0.1 _can_ be represented.

~~~
nova
Any rational number can be represented as a fraction p/q, where p and q are
integers with no common factors (reduced fraction).

If that fraction is representable as an exact "decimal" number in base b that
means that p/q = m/b^n = m * b^(-n), where m and n are integers too. For
example, 3/4 = 75/10^2 or 0.75 in decimal, 3/2^2 or 0.11 in binary.

That means p * b^n = m * q. We said p and q have no common factors, so all of
q's factors go into b^n. That means that q divides b^n, or in other words:

p/q is representable as an exact "decimal" number in base b if and only if q
divides some power of b.

For example, 0.1 is 1/10. But there is no power of 2 which is divisible by 10,
so 0.1 is not exactly representable in binary.

As another example, 1/3=0.33333.... because there is no power of 10 divisible
by 3.
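That criterion is easy to check mechanically; a small Python sketch (the function name is mine):

```python
from math import gcd

def exact_in_base(p, q, b):
    """True iff p/q has a terminating expansion in base b,
    i.e. q (after reducing the fraction) divides some power of b."""
    q //= gcd(p, q)          # reduce p/q to lowest terms
    g = gcd(q, b)
    while g > 1:             # strip every prime factor q shares with b
        while q % g == 0:
            q //= g
        g = gcd(q, b)
    return q == 1            # leftover factors mean a repeating expansion

print(exact_in_base(3, 4, 10))   # True:  0.75
print(exact_in_base(1, 10, 2))   # False: 0.000110011...
print(exact_in_base(1, 3, 10))   # False: 0.333...
print(exact_in_base(1, 2, 2))    # True:  0.1 in binary
```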

------
WalterBright
The D programming language takes the (so far as I know) unique approach that
the implementation is allowed to evaluate expressions in the highest precision
it wants to - the different floating point data types only represent storage
formats.

The idea is that algorithms should be designed with a minimum required
precision, but no maximum, and should not fail if higher precision is used.

I think this rule serves very well.

~~~
Avshalom
That seems a bit terrible. Now it's D so it's not like it's a real problem
but: if I went from one "standards compliant" implementation to another and
suddenly my numerical code started being shit because it turns out my first
compiler was silently more precise... there would be much wailing and gnashing
of teeth.

~~~
WalterBright
I have yet to see any algorithm, outside of a test suite, that gave worse
results when precision was increased.

~~~
Someone
That 'other standards compliant compiler' need not be the one with higher
precision.

Also, it might have higher precision in most cases but just not in that case
where your code needs it most.

Real-world examples are x86 compilers that use the 80-bit floating point
registers for intermediates, but flush them to 64-bit results (sounds stupid,
but it is not; flushing typically is done to the bytes that hold the result
variable). There, the register allocation algorithm used by the compiler can
determine which operation sequences get done in 80 bits and which in 64.

gcc: <http://gcc.gnu.org/wiki/FloatingPointMath> (the 'Note on x86 and m68080
floating-point math' section),
[http://gcc.gnu.org/onlinedocs/gcc-3.4.4/gcc/Disappointments....](http://gcc.gnu.org/onlinedocs/gcc-3.4.4/gcc/Disappointments.html)
( _"On 68000 and x86 systems [...] you can find that a floating point value
which is not a NaN is not equal to itself"_ )

Microsoft's compilers: <http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx>

------
jordanb
This article seems to appear every so often on HN and Reddit. It used to be at
sun.com. I'm glad Oracle didn't 404 it.

The title isn't accurate though. The article is mostly about using IEEE 754 to
manage computational error, which is a pretty esoteric subject.

Most programmers spend precious little time using floating point and don't
need to worry about computational error. Even among those who use it a great
deal -- those doing numerical analysis -- computational error doesn't seem to
be something they concern themselves with. My guess is that in most instances
it's swamped by experimental error.

My thoughts on this article are:

1: It's certainly good that it's available online. It'd be better with a
different title.

2: It's good that modern CPUs support IEEE 754 for the rare instance that
people use it.

3: It's worth reading the first third or so of it, just to get an idea of what
computational error is.

4: The most important thing the "typical" programmer needs to know is that
dollar values shouldn't be represented in floating point. Ideally they
shouldn't be represented as integers either because the integer rounding
truncates towards zero, which is not how an accountant would round.
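The difference between integer truncation and an accountant's rounding can be sketched with Python's standard `decimal` module (the half-cent amount is a made-up example):

```python
from decimal import Decimal, ROUND_HALF_EVEN

# A half-cent amount: $2.675, held as 2675 tenths of a cent.
tenths = 2675
print(tenths // 10)        # 267 -- integer division truncates toward zero

# Banker's rounding (round half to even), a common accountant's rule:
d = Decimal("2.675")
print(d.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN))   # 2.68
```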

~~~
fleitz
It's perfectly possible to get accountant's rounding using integers. If the
accuracy required is one cent, then the system must merely multiply by 10
before calculating and then use Dutch rounding to the nearest 10 before
dividing by 10. I'm pretty sure you could do it with quarter-cent accuracy
rather than tenths of a cent.

You just shouldn't calculate with 1 cent accuracy if the rounding methodology
is important to your business, but truth be told most businesses have a lot
more to optimize than the rounding methodology used.
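A minimal sketch of that scaled-integer scheme, assuming positive amounts kept in tenths of a cent and round-half-to-even at the cent boundary (one common accountant's rule; the helper name is mine):

```python
def to_cents(tenths):
    """Round a positive amount in tenths of a cent to whole cents,
    half to even (banker's rounding)."""
    q, r = divmod(tenths, 10)
    # Round up on more than a half, or on an exact half when the
    # cent amount is odd (so halves land on even cents).
    if r > 5 or (r == 5 and q % 2 == 1):
        q += 1
    return q

print(to_cents(2674))   # 267: 267.4 cents rounds down
print(to_cents(2675))   # 268: the half rounds to the even cent
print(to_cents(2685))   # 268: this half also rounds to the even cent
```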

~~~
daniel-cussen
>> but truth be told most businesses have a lot more to optimize than the
rounding methodology used.

Perfectly true. I can see exceptions being managing other people's money--
banks, accountants, and the IRS had damn well better get this right.

------
experiment0
I think the latex typeset pdf is much nicer to read.

<http://docs.oracle.com/cd/E19957-01/800-7895/800-7895.pdf>

------
taylodl
The big thing I remember from my Numerical Computing course in college was
floating point arithmetic is not commutative. Totally. Rocked. My world. You
have to work hard to retain your significant digits throughout the
computation.

~~~
jacobparker
Floating point addition and multiplication are commutative, but are not
necessarily associative.
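The classic demonstration, in Python (whose floats are IEEE doubles):

```python
a, b, c = 0.1, 0.2, 0.3

# Commutative: swapping the operands of a single + never matters.
print(a + b == b + a)    # True

# Not associative: grouping changes which rounding errors occur.
print((a + b) + c)       # 0.6000000000000001
print(a + (b + c))       # 0.6
```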

~~~
taylodl
You are correct. I should have been more clear. Our expectation for addition
to be associative is deeply ingrained and it's shocking to discover that's not
the case when performing floating point computations.

------
danbruc
[http://www.hnsearch.com/search#request/submissions&q=Wha...](http://www.hnsearch.com/search#request/submissions&q=What+Every+Computer+Scientist+Should+Know+About+Floating-Point+Arithmetic&sortby=create_ts+desc)

------
jpollock
After spending a week ripping out someone's poorly thought out floating point
code, the lesson I learned is that you just shouldn't use floating point
unless you really need to.

For example, if you're working with money, you don't need floating point.

~~~
smegel
Not that big a deal. Most floating point numbers are imprecise rounded values
to the degree of precision that is being used. An imprecise binary
representation of an already imprecise rounded value at the edge of precision
really doesn't matter - and if it does the problem is that you probably needed
more precision in the first place.

------
jacques_chester
What Every Web Programmer Should Know About Floating Point:

0. Do not calculate monetary amounts in FP. Businesses don't react well to
being told about acceptable imprecision.

1. Javascript is floating point.

2. Do not calculate money amounts in Javascript.

There are decimal libraries for Javascript (eg [1][2]). Use those if you need
to calculate monetary amounts.

[1] <https://github.com/whatgoodisaroad/Big-js>

[2] <https://github.com/dtrebbien/BigDecimal.js>

~~~
whichdan
I'm assuming that storing $129.99 as 12999 and dividing by 100 when necessary
would be preferable?

~~~
jacques_chester
The problem is that every javascript number is floating point.

Even if you think you have integers ... sometimes you don't.
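Python floats are the same IEEE double format as JavaScript numbers, so the effect is easy to reproduce there; integers are only exact up to 2^53:

```python
# A JavaScript number is an IEEE double; so is a Python float.
# A double holds every integer exactly only up to 2**53.
print(2.0**53)                   # 9007199254740992.0
print(2.0**53 + 1 == 2.0**53)    # True -- the +1 silently disappears
print(2.0**53 + 2 == 2.0**53)    # False -- even integers survive longer
```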

------
mjcohenw
A number of years ago, the project I was working on had problems in Ada because
the conversion of floating point numbers from character string to internal
floating point by the Ada compiler did not always produce the same values as
runtime conversion. I had fun reconciling the two.

------
brazzy
What Every Programmer Should Know About Floating-Point Arithmetic:

<http://floating-point-gui.d>

~~~
brazzy
Should be <http://floating-point-gui.de> - sorry for that

------
DannoHung
What every Computer Scientist should be asking about Floating-point
arithmetic: Why aren't there types for fixed-point/rational arithmetic exposed
by your language and/or standard library?

~~~
rwmj
Would also love 128-bit ints to be commonly available. And I even have a use
for them: representing the size in bytes of some extremely large storage
configurations (sparse ones -- as yet, a fully allocated > 2^64 byte store
would cost several billion dollars).

