
7 Bits Are Not Enough for 2-Digit Accuracy - pascal_cuoq
http://www.exploringbinary.com/7-bits-are-not-enough-for-2-digit-accuracy/
======
ChuckMcM
This is something I don't think folks really appreciate until they run into it
in their own code. Back in the early days of microcomputers when everyone was
rolling their own floating point code, some packages would very slowly diverge
as things went back and forth between decimal and binary. One example in
particular was a BASIC program that printed out a series of numbers for an
accounting program to paper tape, then later read them back in when additional
data was added. That round trip introduced variations of a couple of pennies
on every pass through the system.
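The drift is easy to reproduce with a toy model of a 7-bit binary significand. The helper below is my own sketch (not code from the article, and 9.7 is my own example value): round a 2-digit decimal to 7 significant binary digits, print it back at 2 digits, and it can come back changed.

```python
import math

def to_7bit_float(x):
    # Round x to 7 significant binary digits -- a toy model of a binary
    # float with a 7-bit significand (hypothetical helper, for illustration).
    m, e = math.frexp(x)                        # x == m * 2**e, 0.5 <= |m| < 1
    return math.ldexp(round(m * 128) / 128, e)

b = to_7bit_float(9.7)   # nearest 7-bit float to the 2-digit decimal 9.7
print(b)                 # 9.75
print(f"{b:.2g}")        # '9.8' -- the round trip came back changed
```

The nearest 7-bit float to 9.7 is 9.75, which sits exactly halfway between 9.7 and 9.8; round-half-even then sends it back out as 9.8.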

------
amalcon
IEEE 754 floating-point numbers should only be used where you can tolerate
rounding errors. Even if you have enough precision to survive base
conversions, actually doing math will frequently introduce rounding errors.

You can either be very careful to avoid the many cases where this can happen,
choose a different number format (fixed-point, decimal floating point,
numerator-denominator, arbitrary precision mantissa binary floating point,
which one depends on your problem domain), or use units that let you work in
integers (equivalent to fixed-point, but easier to explain).
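As a concrete illustration, Python's standard library happens to ship two of the alternatives mentioned above:

```python
from decimal import Decimal
from fractions import Fraction

# Two of the alternative formats above, as shipped in Python's stdlib:
assert 0.1 + 0.2 != 0.3                                      # binary floating point drifts
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")     # decimal floating point
assert Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)  # numerator/denominator
```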

------
trjordan
"7 Bits Are Not Enough for 2-Digit Accuracy If You're Using a Naive Encoding."

Information theory says you can fit 100 values into 7 bits.

If you're not going to use the right encoding, I can also come up with
encodings for which 100 bits isn't enough to fit 100 values.

~~~
vidarh
The article is _specifically_ about round-tripping between base-10 and base-2
floating point because the precision of floating point is a very real issue if
you're doing conversions. This is e.g. one of the reasons why decimal fixed
point is preferable for many types of operations.

Fair enough that the title leaves it ambiguous, but the ambiguity is resolved
in the first sentence of the post, and further clarified in a lot of detail
over the next paragraph and a half.

~~~
trjordan
Sure. I'm mad at the headline.

The whole setup is a problem that shouldn't exist, because it uses 7-bit
floats over an inappropriate range. When are you going to run into 7-bit
floats as the default? I suppose the same thing could happen at larger sizes
if you're not thinking about it, but it's less surprising to say
"4862057400958023983457609283094859 can't be represented exactly in floats".

------
logicallee
It's a pretty click-baity title; it should be hedged to 'native floating-
point' or something.

In the second section the author _clearly_ shoots down the title, yet chooses
to repeat that it 'isn't enough', again as a subheading:

>Seven Bits Looks Like Enough But It Isn’t

when that section clearly demonstrates that it _would_ be enough.

This statement really should be hedged. You can just straight-up do binary-
coded decimal and map every pair of decimal digits into 7 bits and back again
(bin 0000000 for decimal 00 through bin 1100011 for decimal 99; see reference
[1] for a chart), end of story, and never suffer any degradation.

I realize this isn't the scheme that is _chosen_ , but that is a choice. Both
the title and the quoted sub-heading imply otherwise; i.e. that it isn't
enough bits. As shown above clearly it is.
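A minimal sketch of that pair-wise scheme (function names are my own):

```python
def encode_pairs(digits):
    # Pack each pair of decimal digits (00-99) into one 7-bit value;
    # 2**7 = 128 >= 100, so the mapping is lossless by construction.
    assert digits.isdigit() and len(digits) % 2 == 0
    return [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]

def decode_pairs(codes):
    return "".join(f"{c:02d}" for c in codes)

codes = encode_pairs("314159")            # [31, 41, 59]
assert all(0 <= c < 128 for c in codes)   # each value fits in 7 bits
assert decode_pairs(codes) == "314159"    # round trip is exact
```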

[[[

 _Aside: in this scheme you EVEN have a 28% larger space you can make use of
for checksumming or the like!

Naive example: when encoding any value that happens to contain digit pairs
smaller than 28, you can also encode parity information:

when encoding, if the two digits you're encoding happen to be in the range
00-27, you encode checksum value 1 by adding decimal 100 (arriving at bin
1100100 through 1111111, i.e. decimal 100 through 127), or checksum value 0 by
leaving them alone (i.e. keeping bin 0000000 through 0011011, i.e. decimal 00
through 27).

When decoding, if you encounter decimal 100 through 127, subtract 100 before
recording it, and also record a checksum of 1; if you encounter decimal 00
through 27, record it as-is and also record a checksum of 0.

This means that in 28% of cases you gain 1 bit of checksumming information
while perfectly encoding the two decimal digits. (If your stream doesn't
happen to contain any digit pairs 00 through 27 then you just don't gain any
checksumming information.)

You can use these checksum bits however you like! You aren't guaranteed to
have them, but you have a good chance of having them: a 28% chance of a parity
bit even with a single pair of digits!

I'm not an expert on checksums, and the above is just a naive example, but the
point is this: not ONLY is 7 bits enough to encode 2 decimal digits: it's even
enough to add a parity bit in 28% of input cases._

]]]
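A rough sketch of the aside's parity trick (my own naive rendering; function names are hypothetical):

```python
def encode_pair_with_parity(two_digits, parity_bit):
    # Pairs 00-27 have room to smuggle one extra bit: adding 100 keeps the
    # code within 7 bits (100-127). Pairs 28-99 carry no parity bit at all.
    v = int(two_digits)
    if v <= 27 and parity_bit:
        return v + 100
    return v

def decode_pair_with_parity(code):
    if code >= 100:
        return f"{code - 100:02d}", 1   # pair 00-27 carrying parity bit 1
    if code <= 27:
        return f"{code:02d}", 0         # pair 00-27 carrying parity bit 0
    return f"{code:02d}", None          # pair 28-99, no parity bit carried

assert decode_pair_with_parity(encode_pair_with_parity("05", 1)) == ("05", 1)
assert decode_pair_with_parity(encode_pair_with_parity("42", 1)) == ("42", None)
```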

[1] [http://ascii.cl/conversion.htm](http://ascii.cl/conversion.htm)

~~~
icodestuff
You're reading something into the title that isn't there. The author says "7
bits aren't enough for 2-digit accuracy" and you're reading it as "7 bits
aren't enough to represent 2-digit numbers." The post is about converting
base-10 floating-point numbers (with 2-digit accuracy) to base-2 floating-
point numbers and back, not about representing integers.

~~~
wongarsu
I don't think he's reading too much into the title. He showed equally well
that it's possible to encode 2 base-10 digits of accuracy in base-2 fixed-
point numbers and round-trip them back. The title nowhere mentions floating
point, which is just one arbitrary (but popular) format among a number of
possible ways to encode numbers with a chosen accuracy.

~~~
icodestuff
I find myself completely unable to care that "floating point" is not in the
title, particularly since it _is_ in the very first sentence.

~~~
logicallee
fair enough; maybe it's too far outside my area of expertise. OTOH I never
expect round-trip conversions to be perfect and couldn't care less whether
there are small errors, and would use a different representation if I did.

------
spb
Unless you exclusively deal with remainders divisible by powers of 2, any
digits to the right of the decimal point should be seen as an approximation.
Don't confuse the way it's displayed with the way it's represented - your code
should keep its numbers in terms of the smallest unit at which you need
integrity. (In other words, calculate your billing in floating _cents_ , not
dollars.)

And, if all your remainders are divisible by powers of ten, you should be using
[http://en.wikipedia.org/wiki/Decimal_floating_point](http://en.wikipedia.org/wiki/Decimal_floating_point)
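For instance (a toy sketch of the smallest-unit advice, my own illustration): cent amounts are integers, and binary floats represent integers exactly up to 2**53, while 0.01 dollars has no exact binary representation.

```python
# Sum a hundred 1-cent charges two ways.
total_dollars = 0.0
total_cents = 0.0
for _ in range(100):
    total_dollars += 0.01   # 0.01 is not exact in binary: error accumulates
    total_cents += 1.0      # 1.0 is exact: every intermediate sum is exact
print(total_dollars == 1.0)   # False
print(total_cents == 100.0)   # True
```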

~~~
marcosdumay
Why do you discriminate against the numbers on the left of the decimal point?
They are an approximation too.

They are exact in fixed point representations, but floating point has a single
bit as the integer part of the mantissa (and even that can accumulate error
under some sequences of operations).

~~~
Dylan16807
I think you're getting something mixed up. Until you get to numbers so big
that n+1 == n, the numbers on the left of the decimal point are represented
exactly. They have the exact same precision as fixed point.

Ignore how the mantissa looks by itself. Once the exponent gets involved,
everything that influences the left of the decimal point is an integer. That's
why you use the same base for the mantissa and the exponent. The exponent
losslessly shifts part of the mantissa into the integer range. Any particular
floating point number is exactly equivalent to a fixed point number.
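This is easy to check directly in double precision:

```python
# Every integer up to 2**53 is exactly representable in an IEEE-754 double,
# so the digits left of the decimal point lose nothing until n + 1 == n.
n = 2.0 ** 53
assert n - 1 != n   # adjacent integers are still distinguishable here
assert n + 1 == n   # the first point where n + 1 == n
```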

~~~
marcosdumay
My point is that differentiating the integral and fractional parts of a
floating point makes no sense. Some operations lose a constant amount of
precision, others lose more precision the more different the numbers are. None
of them act differently on the integral and fractional parts of the number.

Now, of course, there's a fixed point representation for each possible
floating point number. But operations on fixed point numbers act differently
on the integral and fractional parts of the numbers (in part because it's
impossible to have a pair of fixed point numbers of the same type, but so
different that n+1==n).

~~~
Dylan16807
>But operations on fixed point numbers act differently on the integral and
fractional parts of the numbers

Not really. Not in any way different from floating point. Can you explain this
point? The efficient way of doing fixed point is to store it as a single
scaled number, which is also how floating point stores the mantissa. You could
split it into two numbers, but you could also split a mantissa into multiple
numbers if you really wanted, it doesn't really change the math.

>(in part because it's impossible to have a pair of fixed point numbers of the
same type, but so different that n+1==n).

I'd disagree there. You can have fixed point denominated in 100s as easily as
you can have it denominated in .01s.

>Some operations lose a constant amount of precision, others lose more
precision the more different the numbers are. None of them act differently on
the integral and fractional parts of the number.

Precision is only 'lost' compared to other floating point operations. A long
double won't _ever_ have less precision than a 32.32 fixed point number,
unless you're using numbers that wouldn't fit in 32.32 in the first place.

In no way do you have "a single bit" of useful precision. Anything that can
corrupt the upper bits of a floating point number can corrupt the upper bits
of a fixed point number.

------
bsder
Sigh, you know David Goldberg wrote a really nice paper about this ...

... in 1991 ...

[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.6...](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.6768)
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22....](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.6768&rep=rep1&type=pdf)

It is a real shame nobody ever seems to read it.

~~~
RickRegan
I wanted to go back to the source, or at least as far as I could go (I didn't
look to see if any of Von Neumann's papers covered it :) ). I know D.
Goldberg's appendix covers this (pages 218-219), for the binary to decimal
direction, for single-precision floating-point. But I wanted a simple example
as well, and I. Goldberg's paper had that.

------
legulere
It's quite interesting how much of our data is decimal floating point (any
ASCII representation, as in CSS, HTML, JSON, JavaScript, or SVG, or whenever
we enter data somewhere), yet we convert it to and from binary floating point
all the time.
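For example, round-tripping a JSON number through Python (my own illustration):

```python
import json

# The text carries decimal digits, but parsing produces a binary double:
x = json.loads('{"price": 0.1}')["price"]   # nearest double to 0.1
print(x + x + x == 0.3)                     # False: binary arithmetic on decimal-looking data
```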

------
rurban
I never knew exactly how many digits could be preserved in a double. Now I
know: 15 or 17, depending on where you start.
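Both limits are easy to demonstrate (a quick check of my own, not from the article): 15 significant digits survive a decimal-to-double-to-decimal trip, and 17 digits suffice for double-to-decimal-to-double.

```python
# 15 significant decimal digits survive decimal -> double -> decimal:
s = "0.123456789012345"            # 15 significant digits
assert f"{float(s):.15g}" == s     # the decimal string comes back unchanged

# 17 digits are enough to recover any double exactly:
x = 0.1
assert float(f"{x:.17g}") == x     # the double comes back bit-for-bit
```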

~~~
jacquesm
It's highly dependent on the base of the digits!

------
coin
Should be precision and not accuracy

~~~
csours
I believe the point is accuracy after a conversion round trip from decimal to
binary floating point and back again.

A better term might be fidelity.

