
It also creates a lot of misconceptions about how accurate floating point math is.

An interesting aspect of decimal formatting is how frequently it masks representation error (i.e. the encoding of 0.1, etc.): because the error is symmetrical in the formatter/parser, it makes such non-representable fractions appear to be stored perfectly to unsuspecting users.

This can be quite deceptive. If more users were aware of just how many of the simple rational decimals they input are converted into imprecise representations, they probably wouldn't trust computers as much as they do. To confuse things more, when operating on periodic representations the result often matches the representation error of the equivalent accurate decimal value encoded directly (i.e. there were errors, but everything cancelled out through formatting) - when they occasionally do not (e.g. 0.1 + 0.2), it makes the problem appear all the more elusive.

I think this detail is often lost in explanations of 0.1 + 0.2, that is: representation error is extremely common; 0.1 + 0.2 is merely one of the cases where it both persists through the formatter AND you notice it, because the inputs were short decimals and it's so obvious that the output should be a non-periodic decimal.

TL;DR formatting floats to decimals makes us trust floating point math far more than we should - it's healthy to remember that the formatting process is necessarily imprecise and that you are merely looking at a proxy for the underlying value. Remember that next time you look at _seemingly_ non-periodic decimal output.
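
If you want to see exactly what the formatter is hiding, here's a minimal sketch (C++ here, but any language with printf-style formatting shows the same thing; the digits in the comments are what a typical IEEE 754 double produces):

    #include <cstdio>

    int main() {
        double x = 0.1;                    // not representable exactly in binary
        std::printf("%g\n", x);            // prints 0.1 -- formatter rounds, error hidden
        std::printf("%.20f\n", x);         // ~0.10000000000000000555 -- the stored value
        std::printf("%.20f\n", 0.1 + 0.2); // ~0.30000000000000004441 -- error that survived formatting
        return 0;
    }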




Or perhaps it makes us mistrust more than we should? How often are we working on problems where the difference between 0.1 and 0.100000000000000006 is of any practical importance?

When I format a float out to 5 decimal places, I'm sort of making a statement that anything beyond that doesn't matter to me.


Small differences can matter.

One that comes to mind for me is the old game Kohan: Immortal Sovereigns. There was a native Linux port for the game that could play online with the Windows machines. However, the pathfinding solution used floating point numbers to make the decisions. A very small difference in the way Linux and Windows rounded the numbers meant that an online game between a Linux and Windows host would de-synchronize after 15-30 minutes because a model that zigged on the Linux host would zag on the Windows host, and the game would detect the difference in state after a bit and kick one of the players out. There was no way to rejoin the game at this point either.

This bug was never fixed, and the company that made the port (Loki) went out of business.


> When I format a float out to 5 decimal places, I'm sort of making a statement that anything beyond that doesn't matter to me.

Yes, it's true that the vast majority of the time it doesn't actually matter. However, decimal formatting does such a good job of giving us the impression that these errors are merely edge cases, and calculators automatically formatting to 10 significant figures etc. further that illusion. If people are not aware that it's only an illusion (or just how far the rabbit hole goes), it can be dangerous when they go on to create or use things where that fact matters.


Fair enough.

One thing along similar lines that I've always hated to see is when the floating-point type in a programming language is called "real".


haha, yes it's a terrible name, such arrogance to suggest fitting infinite things into a very finite thing. In fact they couldn't even call it rational, because even after the precision limitation it's a subset of the rationals (which is where representation error comes from, due to the base)... Inigo Quilez came up with a really interesting way around this limitation where numbers are encoded with a numerator and denominator, which he called "floating bar". This essentially does not have representation error, but it will likely hit precision errors sooner, or at least in a different way (it's kinda difficult to compare directly).


Yeah, that is more what I'm on about. I can accept that a computer's `int` type is not an infinite set, even if it does cause problems at the boundaries. It feels like more of a little white lie to me.

Whereas, even between their minimum and maximum values, and even subject to their numeric precision, there are still so many rational numbers that an IEEE float can't represent. So it's not even a rational. Nor can it represent a single irrational number, thereby failing to capture one iota of what qualitatively distinguishes the real numbers... if Hacker News supported gifs, I'd be inserting one that features Mandy Patinkin right here.


Yeah, I think this is the single aspect that everyone finds unintuitive; everyone can understand it not having infinite precision or magnitude. It's probably a very reasonable design choice, if we could just know the reasoning behind it. I assume it's mostly about the practicality of performance and of implementing the operators that have to work with the encoding.


>Inigo Quilez came up with a really interesting way around this limitation where numbers are encoded with a numerator and denominator, which he called floating bar

Thanks for the read! More fun than the original article to my taste :)

Here's the link, since Googling "floating bar" nets you very different results:

http://www.iquilezles.org/www/articles/floatingbar/floatingb...


I just remember his name, writes lots of interesting stuff :)


I always wonder if he ever says "My name is Inigo Quilez. Prepare to learn!".

My favorite post by him is where he explains that you never need to have trigonometry calls in your 3D engine. Because every now and then you still see an "educational" article in the spirit of "learn trig to spin a cube in 3D!" :/


Yeah, I've noticed in physics programming in other people's code you can often come across the non-trig ways of doing things that avoid unit vectors and rooting etc. (which are both elegant and efficient)... However I've never personally come across an explicit explanation or "tutorial" of these targeted at any level of programmer's vector proficiency. Instead I've always discovered them in code and had to figure out how they work.

I guess the smart people writing a lot of this stuff just assume everyone will derive them as they go. That's why we need more Inigo Quilezes :D to lay it out for us mere mortals and encourage its use more widely.


Common Lisp has had a rational number type since the 1980s at least, presumably inherited from MACSYMA.


Then you should hate integer too.


"integer" as a type is less offensive though, as it only has one intuitive deficiency compared the mathematical definition (finite range). Where as "real" as a type has many deficiencies... it simply does not contain irrational numbers, and it does not contain all rational numbers in 3 respects: range, precision and unrepresentable due to base2, and for integers it can represent a non-contiguous range!


countable vs. uncountable deficiencies...


But I can build a computer that would accept any pair of integers you sent it, add them and return the result. Granted, they'd have to be streamed over a network, in little-endian form, but unbounded integer addition is truly doable. And the restriction is implicit: you can only send it finite integers, and given finite time it will return another finite answer.

You can't say the same about most real numbers, all the ones that we postulate must exist but that have infinite complexity. You can't ever construct a single one of them.


I'm not sure what your point is. It's impossible to ask the computer to store those numbers, so why does it matter whether they can be stored?

Or in other words: It can store all the non-computable numbers you give it. That should suffice.


The way I see it, 'int' as a truncation of 'integer' nicely symbolizes the way the type truncates integers to a fixed number of bits.

Of course any self-respecting modern language should also provide arbitrary-precision integers at least as an option.


It is not a matter of modernity, but a matter of use-case. You don't need arbitrary-length integers for a coffee machine or a fridge. You don't even need FP to handle e.g. temperatures; fixed-point is often more than enough. So if you are making some sort of "portable assembler" for IOT devices, you can safely stick with simple integers.


> I think that illusion can be a bit dangerous when we create things or use things based on that incorrect assumption.

I'd be curious to hear some of the problems programmers have run into from this conceptual discrepancy. We've got probably billions of running instances of web frameworks built atop double-precision IEEE 754 to choose from. Are there any obvious examples you know?


Operations that you think of as being associative are not. A simple example is adding small and large numbers together. If you add the small numbers together and then the large one (e.g. sum from smallest to largest), the small parts are better represented in the sum than if you sum from largest to smallest. This could happen if you have a series of small interest payments and are applying them to the starting principal.
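
A quick sketch of that in C++ (the values are contrived to make the effect obvious in single precision):

    #include <cstdio>

    int main() {
        // one large value plus many small ones
        float big = 1e8f;
        float small = 1.0f;

        // large first: each small addend is partially lost to rounding
        float a = big;
        for (int i = 0; i < 1000; ++i) a += small;

        // small parts first: they accumulate before meeting the large value
        float b = 0.0f;
        for (int i = 0; i < 1000; ++i) b += small;
        b += big;

        std::printf("%.1f vs %.1f\n", a, b);  // the two orders give different sums
        return 0;
    }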


I've worked with large datasets, aggregating millions of numbers, summing, dividing, averaging, etc... and I have tested the orders of operations, trying to force some accumulated error, and I've actually never been able to show any difference in the realm of 6-8 significant digits I looked at.


Here's a common problem that shows up in implementations of games:

    class Time {
        uint32 m_CycleCount;
        float m_CyclesPerSec;
        float m_Time;
        
    public:
        Time() {
            m_CyclesPerSec = CPU_GetCyclesPerSec();
            m_CycleCount = CPU_GetCurCycleCount();
            m_Time = 0.0f;
        }

        float GetTime() { return m_Time; }

        void Update() {
            // note that this is expected to wrap
            // during the lifetime of the game --
            // modular math works correctly in that case
            // as long as Update() is called at least once
            // every 2^32-1 cycles.
            uint32 curCycleCount = CPU_GetCurCycleCount();
            float dt = (curCycleCount - m_CycleCount) / m_CyclesPerSec;
            m_CycleCount = curCycleCount;
            m_Time += dt;
        }
    };

    void GAME_MainLoop() {
       Time t;
       while( !GAME_HasQuit() ) {
           t.Update();
           GAME_step( t.GetTime() );
       }
    }
The problem is that m_Time will become large relative to dt, the longer the game is running. Worse, as your CPU/GPU gets faster and the game's framerate rises, dt becomes smaller. So something that looks completely fine during development (where m_Time stays small and dt is large due to debug builds) turns into a literal time bomb as users play and upgrade their hardware.

At 300fps, time will literally stop advancing after the game has been running for around 8 hours, and in-game things that depend on framerate can become noticeably jittery well before then.
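
Not their code obviously, but the core of the failure is easy to reproduce in isolation (assuming ordinary single-precision float evaluation):

    #include <cstdio>

    int main() {
        const float dt = 1.0f / 300.0f;        // ~3.3ms per frame at 300fps

        float earlyTime = 60.0f;               // one minute into the game
        float lateTime  = 20.0f * 60 * 60;     // twenty hours into the game

        float earlyStep = earlyTime + dt;      // advances as expected
        float lateStep  = lateTime + dt;       // dt is below half a ULP of lateTime...

        std::printf("%.7f\n", earlyStep - earlyTime);  // roughly 0.0033 -- time still advances
        std::printf("%.7f\n", lateStep - lateTime);    // 0.0000000 -- time stands still
        return 0;
    }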


Though inside game logic you should probably default to double, which would easily avoid this problem.


If I'm going to use a 64 bit type for time I'd probably just use int64 microseconds, have over 250,000 years of uptime before overflowing, and not have to worry about the precision changing the longer the game is active.


So using fixed point. You could do that, but you can't make every single time-using variable fixed point without a lot of unnecessary work. Without sufficient care you end up with less precision than floating point. If you don't want to spend a ton of upfront time on carefully optimizing every variable just to avoid wasting 10 exponent bits, default to double.


> in the realm of 6-8 significant digits I looked at

That is far inside the precision of a 64-bit double. Whether error propagates up to that range of significance depends on the math, but I doubt the aggregation you are describing would cause it... provided nothing silly happens to subtotals, like intermediate rounding to a fixed precision (you'd be surprised).

Something like the compounding the parent was describing is far more prone to significant error propagation.


I've seen rounding errors on something as simple as adding the taxes calculated per line item vs calculating a tax on a subtotal. This was flagged as incorrect downstream where taxes were calculated the other way.

In a real-life transaction where pennies are not exchanged this could mean a difference of a nickel on a $20 purchase which isn't a meaningful difference but certainly not insignificant.


How much was the difference? Was there any rounding involved at any step? When dealing with money, I see rounding and integer math all the time. As another comment has mentioned, within 53 bits of mantissa the number range is so big, we are talking 16 digits. I'd be curious to see a real-world example where the float math is the source of error, as opposed to some other bug.


It doesn't take much imagination to construct an example. Knowing 0.1 isn't exactly represented, make a formula with it that should come out to exactly a half cent. Depending on fp precision it will be slightly above or below a half cent, and rounding will not work as expected. We found this in prod at a rate of hundreds to thousands of times per day; it only takes volume to surface unlikely errors.
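
Not the actual formula from prod of course, but a contrived stand-in with the same shape - a value that "should" be exactly a half cent lands just under the boundary, so rounding to cents goes the wrong way:

    #include <cstdio>

    int main() {
        double amount = 0.015;            // "exactly" one and a half cents... except it isn't
        std::printf("%.20f\n", amount);   // ~0.01499999999999999944 -- stored just below 0.015
        std::printf("%.2f\n", amount);    // typically prints 0.01, where decimal round-half-up would give 0.02
        return 0;
    }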


I thought I was clear that the result was rounded. The fp imprecision affects the correctness of the rounding, e.g. round half to nearest even.


It is the rounding that seems to be an issue here.


    x = 1e-20 + 1e20 - 1e20
    y = 1e20 - 1e20 + 1e-20
    assert x == y


That's like the first rule of floating point numbers though: don't do that


The person I replied to claims to have looked at large* data sets and never seen a discrepancy in 6-8 significant digits. I thought I'd show them a small data set with 3 samples that retains no significant digits.

* Never mind that "millions" isn't large by current standards...


But you are observing all the digits, not just 6-8. It's implicit in the semantics of the operation, and that's something everyone who works with floating point should know.


I think I see where you're hung up. If I change it to:

    x = 1e20 + 1 - 1e20
    y = 1e20 - 1e20 + 1
    assert x == y
There is only 1 digit, and it's wrong. You don't even need 6-8. I probably should've used this as my example in the first place.


You're making the same mistake but now it's less obvious because of the change of scale. When you compare floating point numbers, simple == is not usually what you want; you need to compare them with a tolerance. Choosing the tolerance can be difficult, but in general when working with small numbers you need a small value and with large numbers you need a large value. This dataset involves datapoint(s) at 1e20; at that magnitude, whatever you're measuring, the error in your measurements is going to be way more than 1, so a choice of tolerance ≤ 1 is a mistake.
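
For reference, the usual shape of such a comparison looks something like the sketch below (the names are mine; choosing relTol/absTol is the hard, application-specific part):

    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    // Approximate equality with an absolute floor (for values near zero)
    // and a relative tolerance (which scales with the magnitude of the inputs).
    bool nearlyEqual(double a, double b,
                     double relTol = 1e-9, double absTol = 1e-12) {
        double diff = std::fabs(a - b);
        double scale = std::max(std::fabs(a), std::fabs(b));
        return diff <= std::max(absTol, relTol * scale);
    }

    int main() {
        std::printf("%d\n", nearlyEqual(1e20 + 1, 1e20));  // 1: equal within relative tolerance
        std::printf("%d\n", nearlyEqual(0.1 + 0.2, 0.3));  // 1: the classic case passes too
        return 0;
    }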


Ugh, you're preaching to the choir. I wasn't trying to make a point about the equality operator, I was trying to make a point about x and y being completely different. I must be really bad at communicating with people.


That's the thing: x and y are not "completely different"; they are within the tolerance one would use to compare them correctly.


So umm, more than a "difference in the realm of 6-8 significant digits"?


That construction can turn a residual N digits out into a first-digit difference. It wouldn't matter without making comparisons with a tolerance way under the noise floor of the dataset. But yes, you have technically invented a situation that differs from that real-world anecdote in regard to that property, in an extremely literal interpretation.


And here I was, worried I might be tedious or pedantic, trying to argue that floating point is just not that simple. You've really outdone me in that regard.


JavaScript is precise for 2^53 range. It's unlikely that you're operating with numbers outside of that range if you're dealing with real life things, so for most practical purposes doubles are enough.


> JavaScript is precise for 2^53 range

What does this mean to you? It's very easy to get horrible rounding error with real-life sized things. For instance

    document.writeln(1.0 % 0.2);
The right answer is 0.0, and the most it can be wrong is 0.2. It's nearly as wrong as possible. These are real-life sized numbers.

btw: I think IEEE-754 is really great, but it's also important to understand your tools.


> document.writeln(1.0 % 0.2);

> The right answer is 0.0, and the most it can be wrong is 0.2. It's nearly as wrong as possible.

Just to clarify for others, you're implicitly contriving that to mean: you care about the error being positive. The numerical error in 0.1 % 0.2 is actually ordinarily tiny (on the order of 10^-17), but using modulo may create sensitivity to these tiny errors by introducing a discontinuity where it matters.


Call it a contrivance if you want, my point was just that you can get very surprising results even for "real life" sized numbers.

Not sure why you changed it from "1.0 % 0.2" to "0.1 % 0.2". The error on the one I showed was near 0.2, not 1e-17. Did I miss your point?


I mistakenly used 0.1 instead of 1.0, but the _numerical_ error is still on the order of 10^-17; the modulo further introduces a discontinuity that creates sensitivity to that tiny numerical error. Whether that is a problem depends on what you are doing with the result... 0.19999999999999996 is very close to 0 as far as modulo is concerned.

I'm not arguing against you, just clarifying the difference between propagation of error into significant numerical error through something like compounding, and being sensitive to very tiny errors by depending on discontinuities such as those introduced by modulo.


I'm talking about integer numbers. 2^53 = 9007199254740992. You can do any arithmetic operations with any integer number from -9007199254740992 to 9007199254740992 and results will be correct. E.g. 9007199254740991 + 1 = 9007199254740992. But outside of that range there will be errors, e.g. 9007199254740992 + 1 = 9007199254740992


You are describing only one of the types of numerical error that can occur, and it is not commonly a problem: it is an edge case that occurs at the significand limit, where the exponent alone must be used to approximate larger magnitudes, at which point integers become non-contiguous.

The types of errors being discussed by others are all in the realm of non-integer rationals, where limitations in either precision or representation introduce error that then compounds through operations no matter the order of magnitude... and btw, _real_ life tends to contain _real_ numbers, which commonly includes rationals in uses of IEEE 754.


I didn't see anything in your message above indicating you were restricting yourself to integers.


Actually this is a source of many issues where a 64-bit int, say a DB autoincrement id, can't be exactly represented in a JS number. Not an "in real life" value, but still a practical concern.


I spent a day debugging a problem this created. Without going into irrelevant domain details, we had a set of objects, each of which has numeric properties A and B. The formal specification says objects are categorized in a certain way iff the sum of A and B is less than or equal to 0.3.

The root problem, of course, was that 0.2 + 0.1 <= 0.3 isn't actually true in floating point arithmetic.

It wasn't immediately obvious where the problem was since there were a lot of moving parts in play, and it did not immediately occur to us to doubt the seemingly simple arithmetic.
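
The distilled version of that bug, plus the kind of tolerance-based fix that usually gets applied (the epsilon here is just an illustrative choice; it has to come from the domain):

    #include <cstdio>

    int main() {
        double a = 0.1, b = 0.2, limit = 0.3;

        // what the spec literally says -- fails, since a + b is ~0.30000000000000004
        bool naive = (a + b) <= limit;

        // comparison with a tolerance chosen for the domain (far below any category boundary)
        const double eps = 1e-9;
        bool tolerant = (a + b) <= limit + eps;

        std::printf("naive: %d, tolerant: %d\n", naive, tolerant);  // naive: 0, tolerant: 1
        return 0;
    }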


I can't show you, but my job involves writing a lot of numerical grading code (as in code that grades calculated student answers in a number of different ways). I've had the pleasure of seeing many other systems' pretty horrible attempts at this, both from the outside and in; in both cases numerical errors rooted in floating point math abound. To give an easy example, a common numerical property required for building formatters and graders of various aspects (scientific notation, significant figures etc.) is the base-10 floored order of magnitude. The most common way of obtaining this is numerically, using logarithms, but this has a number of unavoidable edge cases where it fails due to floating point error - resulting in a myriad of grading errors and formatting that is incorrect by one significant figure.
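
To make the log-based approach concrete, here's a sketch of the usual pattern (not our actual grading code): naive floor(log10(x)) can land on the wrong side of a power of ten once floating point error is involved, so you end up guarding it against pow:

    #include <cmath>
    #include <cstdio>

    // Base-10 floored order of magnitude of a positive value.
    // log10 of a value that "should" be a power of ten can come out a hair
    // high or low, so verify the candidate exponent against pow and adjust.
    int orderOfMagnitude(double x) {
        int e = static_cast<int>(std::floor(std::log10(x)));
        if (std::pow(10.0, e) > x)      e -= 1;  // log10 rounded up past x
        if (std::pow(10.0, e + 1) <= x) e += 1;  // log10 rounded down past x
        return e;
    }

    int main() {
        std::printf("%d %d\n", orderOfMagnitude(999.9), orderOfMagnitude(1000.0));  // 2 3
        return 0;
    }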

These are an easy target to find issues that _matter_ because users are essentially fuzzing the inputs, so they are bound to find an error if it exists, and they will also care when they do!

When these oversights become a problem is very context sensitive. I suppose mine is quite biased.


Web frameworks usually don't need floats at all. Developers working on other things encounter these problems all the time.

https://randomascii.wordpress.com/category/floating-point/

https://stackoverflow.com/questions/46028336/inconsistent-be...


You'd be surprised. Decades ago I had to convert mortgage software that was using 32-bit floating point to fixed point arithmetic because they discovered that the cumulative errors from computing payments over 30+ year mortgages could lead to non-trivial inconsistencies.

Don't knock the impact of cumulative loss of precision.


It can in mathematical modelling


What is the relative error in 0.1? What is the relative error in 0.2? And what is the relative error in the sum? That's how one thinks seriously about floats.


That's a really succinct way of putting it. I would add a less catchy companion to highlight the effect of the formatter:

What is the relative error in 0.1? What is the relative error in 0.3? And what is the relative error in the sum? And what's the relative error in 0.4?

(i.e. if they are the same, it will be obscured by the formatter).
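
Concretely, with ordinary IEEE 754 doubles: 0.1, 0.3 and 0.4 all carry representation error, but 0.1 + 0.3 happens to round to exactly the same double as 0.4, so the formatter shows a clean 0.4 and everything looks exact:

    #include <cstdio>

    int main() {
        std::printf("%.20f\n", 0.1);        // ~0.10000000000000000555
        std::printf("%.20f\n", 0.3);        // ~0.29999999999999998890
        std::printf("%.20f\n", 0.1 + 0.3);  // ~0.40000000000000002220
        std::printf("%.20f\n", 0.4);        // ~0.40000000000000002220 -- same double, errors cancelled
        std::printf("%d\n", 0.1 + 0.3 == 0.4);  // 1 (unlike 0.1 + 0.2 == 0.3)
        return 0;
    }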


The most precise measurement I know of is the frequency of a laser-cooled Rb-87 standard, 6,834,682,610.904333 Hz, which is handled with full precision by 64-bit floating point.

If you think this isn't precise enough for you, maybe you don't really understand your precision needs.


Or you aren’t doing something physical. For example there are tons of things in math that can use as much precision as you want. For a toy example, looking at rates of convergence or divergence in extremely small regions of the Mandelbrot set. There are techniques that cut down the requirement for precision for that problem, but they are necessary because the default level of precision is insufficient.


> if more users were aware of just how many of the simple rational decimals they input were converted into imprecise representations they probably wouldn't trust computers as much as they do

I disagree. People overestimate the accuracy of decimal encoding so much more than they ever overestimate floating point. Then when they see 0.1 + 0.2 they tend to learn entirely the wrong lesson, and start underestimating the accuracy of floating point alone.

> TL;DR formatting floats to decimals makes us trust floating point math far more than we should

The only reason a decimal encoding can cause too much trust is because we trust decimal too much. Decimal fails in exactly the same ways.


I wonder about the terminology of calling these "errors". That implies that there's a mistake, when really floats are among the most accurate ways to represent arbitrary real numbers in a finite number of bits.

Another way to phrase it is that a single float value represents a range of real numbers, rather than a single real number. So 0.1 stored as a double precision float really represents the range of real numbers from roughly 0.09999999999999999862 to 0.10000000000000001249.
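
You can poke at that range directly - std::nextafter gives the neighbouring doubles, and the halfway points to those neighbours are roughly the bounds quoted above:

    #include <cmath>
    #include <cstdio>

    int main() {
        double x = 0.1;
        std::printf("below: %.20f\n", std::nextafter(x, 0.0));  // nearest double under the stored 0.1
        std::printf("value: %.20f\n", x);
        std::printf("above: %.20f\n", std::nextafter(x, 1.0));  // nearest double over it
        // every real between the midpoints to those neighbours (the ~0.0999...862
        // and ~0.1000...1249 bounds mentioned above) ends up stored as this same double
        return 0;
    }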


> Another way to phrase it is that a single float value represents a range of real numbers, rather than a single real number. So 0.1 stored as a double precision float really represents the range of real numbers from roughly 0.09999999999999999862 to 0.10000000000000001249.

But the range is different for every number; it's not a constant error like a precision epsilon. I think "representation error" is a reasonable name, as it's used when describing the error in converting a representation between base 10 and base 2.

> I wonder about the terminology of calling these "errors". That implies that there's a mistake, when really floats are among the most accurate ways to represent arbitrary real numbers in a finite number of bits.

If you only care about the irrational portion of the real numbers, maybe, but for rationals this is definitely not true: you could use a format based on fractions which, unlike IEEE 754, would contain no representation error compared to base-10 decimals - in fact it would even allow you to represent rationals that base-10 decimals cannot, such as 1/3. Inigo Quilez came up with one such format, "floating bar" [1]. There are advantages and disadvantages to such a format (e.g. performance). But in terms of numerical error I doubt IEEE 754 is best, and for representation error it is definitely not (and I say that as someone who likes the IEEE 754 design O_o).

[1] http://www.iquilezles.org/www/articles/floatingbar/floatingb...
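
To give a flavour of what a fraction-based format buys you, here's a toy sketch (nothing like Quilez's actual packed encoding, just the idea): decimal inputs like 0.1 become exact, and the cost moves into overflow/precision limits of the numerator and denominator instead:

    #include <cstdio>
    #include <cstdint>
    #include <numeric>   // std::gcd (C++17)

    // Toy rational: numerator/denominator, kept in lowest terms.
    struct Rational {
        int64_t num, den;
        Rational(int64_t n, int64_t d) {
            int64_t g = std::gcd(n, d);
            num = n / g; den = d / g;
        }
        Rational operator+(const Rational& o) const {
            // can overflow for large terms -- that's where this format pays its price
            return Rational(num * o.den + o.num * den, den * o.den);
        }
        bool operator==(const Rational& o) const { return num == o.num && den == o.den; }
    };

    int main() {
        Rational a(1, 10), b(2, 10);                    // 0.1 and 0.2, stored exactly
        std::printf("%d\n", a + b == Rational(3, 10));  // 1 -- no representation error
        return 0;
    }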


Hm, yes, perhaps "error" is still a good word for it.

I'm not sure I understand what you mean by "holes", the idea is that all the real numbers between roughly 0.09999999999999999862 and 0.10000000000000001249 are represented by the same double precision floating point value as for 0.1.

Thinking about it in terms of ranges helps to understand why 0.1 + 0.2 goes "wrong", as the middle of the range of real numbers represented by a double precision floating point value 0.1 is slightly higher than 0.1.


Sorry, I edited that away as I realised it was not strictly relevant to what you meant... I was highlighting how there are values in your range that _can_ be represented exactly, e.g. 0.125 - like I said, not really relevant.


What you should be saying is not "can be represented exactly" but "can be represented identically in decimal"

There is nothing inherently less accurate about binary floating-point representation than decimal. Some binary numbers can be identical to their decimal counterparts, while others aren't. This is OK, as we should know what amount of precision is required in our answer and round to that.


> What you should be saying is not "can be represented exactly" but "can be represented identically in decimal"

No, exactly: if the decimal is non-periodic, such as 0.1, it is exact - it exactly represents 1/10 - but base 2 cannot represent 1/10 exactly.

> There is nothing inherently less accurate about binary floating-point representation than decimal.

Yes there is, and this is the part that most people do not intuit; this was the entire point I was making about the deceptiveness of formatting masking representation error of the fractional part in my original comment... we are not talking merely about precision, but about the representation error, which is dependent on the base:

          base10  base3  base2
    1/10  yes     no     no
    1/3   no      yes    no
    1/2   yes     no     yes
For the first decimal place in base 10 (i.e. numerators 1 through 9 over a denominator of 10) you will find that only one of the possible fractional parts (5/10) can be represented exactly in IEEE 754 binary. IEEE 754 actually specifies a decimal format, not that it's ever used, but if you were to implement it you would see these discrepancies between binary and decimal at any precision, by noticing a periodic significand in the binary encoding where the decimal representation is non-periodic.

This is not a deficiency of IEEE 754 per se, but of the entire concept of a decimal point in any base, which makes it impossible to finitely represent all rational numbers, kind of making them pseudo-irrational numbers as a side effect of the representation... the solution is to use fractions, of course.
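
Easy to see by dumping the tenths at full precision - only .5 (the one with a power-of-two denominator) comes back clean:

    #include <cstdio>

    int main() {
        // of 0.1 .. 0.9, only 0.5 is stored exactly as a double
        for (int i = 1; i <= 9; ++i)
            std::printf("0.%d -> %.20f\n", i, i / 10.0);
        return 0;
    }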


At least for me, that's the most interesting part of it. (0.125 isn't in the range I mentioned above though)


Ok this is getting a bit meta recursive, I made a mistake in the explanation of my mistake that I attempted to edit away before you replied to it. Anyway, I was talking about the range above your range, yo dawg :D...

How, if you take a limited-precision range of rationals in base 10, e.g. 0.10000 to 0.20000, and convert them to base 2, there are "holes" of non-representable numbers in the range. These holes are of different sizes (one of which is the range you are talking about), so I summarized it as that.



