Floating Point in the Browser, Part 1: Impossible Expectations (randomascii.wordpress.com)
64 points by nikbackm on Sept 29, 2020 | 64 comments



I've mentioned it before, but JSON Numbers are arbitrary-precision floating-point. They can't be naively stored as JavaScript numbers in all cases, since JS numbers are IEEE754 double-precision floating point. Likewise, you can't store arbitrary JS number values in JSON Numbers, since it doesn't handle IEEE754 exception values (qNaN, sNaN, ±INF).

So if you're trying to publish data from a GPS chip that reports fix failure by setting the latitude and longitude to a NaN (possibly with a failure code in the exception value) you can handle that in JS but only through some intermediate representation in JSON.
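A quick sketch of both failure modes in plain JS (the GPS payload here is hypothetical):

    // NaN and Infinity have no JSON representation; JSON.stringify silently
    // serializes them as null.
    JSON.stringify({ lat: NaN, lon: Infinity });   // '{"lat":null,"lon":null}'

    // In the other direction, a JSON number wider than a double is silently rounded.
    JSON.parse('{"id": 9007199254740993}').id;     // 9007199254740992 -- 2^53 + 1 is lost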


Honestly I think it's the biggest misfeature of JSON. Arbitrary precision numbers should have been handled differently & I've repeatedly seen JSON parsers incorrectly handle those. We're now in a situation where you can't serialize numbers you encounter daily into JSON and you can't trust that numbers you correctly serialized into JSON will be properly deserialized by the peer.

I spent a lot of time writing the number handling code in PBNJSON for WebOS back in the day (IIRC there were 0 C/C++ parsers that offered correct number parsing at the time). It also makes sure to properly report overflow/underflow/failure to convert whenever you hit situations that can't be represented (e.g. reading a 32-bit number out of JSON holding a 64-bit int, reading a 64-bit integer from doubles that are out of range, etc). I'm sure I missed some corner cases though, as it's insanely difficult to get every single one right (especially with floating point).


This is true in theory but not in practice. Most implementations don’t want to pay the cost of converting everything to bignums (or the convenience, and possibly speed, cost of leaving the numbers as strings). Generally the JSON parsing is separate from the rest of the deserialisation, so there’s no great way for the parser to know, when it sees a number, whether it should convert to float or int or do something clever.

As soon as you have something in your pipeline that converts to floats, it is poisoned. Lots of implementations you may use do this conversion though (e.g. basically anything JavaScript, jq), so it is pretty easy to get float poisoning.

The actual solution to representing big nums in practice is to wrap them in double quotes. It only costs two bytes (and if you care about those bytes then you should probably not be using JSON).
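Something like this, sketched with BigInt (the field name is made up):

    // Producer: JSON.stringify throws on a raw BigInt, so send it as a string.
    const id = 9007199254740993n;                        // too wide for a double
    const wire = JSON.stringify({ id: id.toString() });  // '{"id":"9007199254740993"}'

    // Consumer: has to know this particular field is a stringified integer.
    const back = BigInt(JSON.parse(wire).id);            // 9007199254740993n, exact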


> It only costs two bytes

And having to have special code to handle getting it as a string instead of a number.


But that would be the case anyway because we’re talking about ways to handle numbers that don’t fit in floats. If your JSON has to be put through a black-box pipeline where you don’t know if it will be reencoded at some point by a lossy parser then the only safe thing to do is pass as a string.


> I've mentioned it before, but JSON Numbers are arbitrary-precision floating-point. They can't be naively stored as JavaScript numbers in all cases, since JS numbers are IEEE754 double-precision floating point.

Well, RFC 8259 says "This specification allows implementations to set limits on the range and precision of numbers accepted." and goes on to explain interoperability with IEEE doubles.


Which implementation though? If you're only talking to yourself, you don't really need to (de)serialize to/from JSON. In many (most) use cases the JSON serializer will be different from the deserializer, and the data structure will live in a different memory model and/or runtime... And I don't think there's a standard way to tag a JSON document hinting that 32 bits is enough, or that a 128-bit float is coming.


Well, that's the point, isn't it? The RFC basically says "while the JSON grammar allows for arbitrary-precision numbers, don't expect anything larger or more precise than a double, unless you know the recipient uses a more capable implementation". (In other words, JSON numbers are _not_ arbitrary-precision in practice.)


Anyone who's ever worked on programming language implementation will recognise that you will for all time get a steady stream of occasional bug reports that your floating point numbers aren't arbitrarily precise.

Some language implementation bug trackers have a warning not to submit bugs for it front and centre because they get it so often.


In JavaScript I assume it’s worse because most people would assume the existence of some sort of integer type but the number silently “becomes a floating point” instead, which can be confusing.


I don't use JavaScript, but JavaScript using binary floating point always makes me shake my head in disbelief. You've got a language designed from the start to deal with user input that can't exactly represent the decimal numbers and fractions users type in.


JSON Numbers are arbitrary precision floating point. JavaScript numbers aren't necessarily numeric, they can be NaN or INF. That disconnect isn't a bug, but can cause problems if you naively convert from one to the other without handling the edge cases.


The problem is we’re now at the point where we have mountains of code that doesn’t handle the edge cases, and very little can be done beyond being super conservative about what we send to an unknown JSON parser.


What’s also interesting is how the “minimum representation needed to roundtrip a number” is actually done. There is recent (2010) research into doing this efficiently using an algorithm called Grisu: https://dl.acm.org/doi/10.1145/1809028.1806623. Many programming language implementations use it now for this purpose.
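You can see the "minimum representation" requirement directly in JS, whose Number-to-string conversion is specified to emit just enough digits to roundtrip (engines have used Grisu and its successors for this):

    const x = 0.1 + 0.2;
    x.toString();               // "0.30000000000000004" -- just enough digits
    Number(x.toString()) === x; // true: the printed form parses back to the same bits
    (0.1).toString();           // "0.1", not the full exact decimal of the stored double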


Actually, there's even more recent research into doing this really efficiently, using an algorithm called Ryū.

[0] https://github.com/ulfjack/ryu

[1] https://www.youtube.com/watch?v=kw-U6smcLzk

[2] https://www.youtube.com/watch?v=4P_kbF0EbZM


The f18 Fortran compiler has a novel algorithm (calculating in base 10^16) for conversions that is fast and doesn't require big look-up tables.

https://github.com/llvm/llvm-project/tree/master/flang/lib/D...


The author has many blog posts on floating point math, and they're all very interesting. If you're into the topic, you can check the tagged entries:

https://randomascii.wordpress.com/category/floating-point/

In general the whole blog is super interesting and has been submitted to HN many times:

https://news.ycombinator.com/from?site=randomascii.wordpress...


TIL JavaScript doesn't have an integer type and uses double under the hood to store integers. Yet another footgun to add to the massive stack.


JavaScript also has newish BigInt support: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe.... It even has syntax: 42n.


Given that it's a dynamic scripting language, I think double was the best choice they could have made. It gives serious number crunching performance, an integer range up to 2^53, and, of course, double precision floating point math. The only better choice they could have made would have been to make it static/strongly typed with various numeric types.
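A quick illustration of that 2^53 boundary:

    Number.MAX_SAFE_INTEGER;        // 9007199254740991, i.e. 2^53 - 1
    2 ** 53 === 2 ** 53 + 1;        // true -- beyond this, odd integers can't be stored
    Number.isSafeInteger(2 ** 53);  // false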


Not quite true. Engines do use an integer data representation if the number is an integer; it just silently changes to a float when necessary. So with `let x = 2;` the type would be 'number' but the underlying data would just be an integer.


That’s an implementation detail.

JavaScript, the language, doesn’t (AFAIK; I don’t keep track of this rapidly moving target) have integers (but does have integer arrays, nowadays. See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Type...)

If the type were integer,

  let x = 2
  let y = x / 8
would give y = 0, but its value is actually 0.25.


That's perfectly reasonable; there's nothing to say that `/` has type `int -> int -> int` even in a strongly-typed language. I wouldn't bat an eyelid to discover that `/` had type `int -> int -> float` in any language. The problem is when you discover that `+` does not have type `int -> int -> int` (e.g. when you try and increment a very large integer and floating point stops being able to represent it).


> I wouldn't bat an eyelid to discover that `/` had type `int -> int -> float`

I'd bat both eyelids, that's the kind of surprising design decision that would turn me away from a language. To my knowledge, no major programming language does this, and there's wisdom in the principle of least surprise.

edit: I am of course mistaken, Python does exactly this. Corrected by chrisseaton again!


> I'd bat both eyelids, that's the kind of surprising design decision that would turn me away from a language. To my knowledge, no major programming language does this, and there's wisdom in the principle of least surprise.

Lol prepare to bat your eyelids.

    % python
    Python 3.8.5 (default, Jul 21 2020, 10:48:26) 
    [Clang 11.0.3 (clang-1103.0.32.62)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 1 / 2
    0.5


I'll bat my eyelids, but only after I kick myself ;-P

If you're up for a game of No True Scotsman, can you name me a statically typed language that does the same?


Julia? There's an integer division operator, but the default is... More complex:

https://docs.julialang.org/en/v1/manual/integers-and-floatin...


Have I gone mad? (EDIT: it appears I have.)

    GHCi, version 8.6.5: http://www.haskell.org/ghc/ :? for help
    Prelude> 1 / 2
    0.5

This is because 1 and 2 are polymorphic and get instantiated to floats here. Witchcraft.


I think you're still right though. I know Haskell has an unusual approach to numbers, but it looks like the resultant type is indeed different from the type of the operands. I got this from typeOf (/):

    Note: there are several potential instances:
    instance RealFloat a => Fractional (Complex a)
    -- Defined in ‘Data.Complex’
    instance HasResolution a => Fractional (Data.Fixed.Fixed a)
    -- Defined in ‘Data.Fixed’
    instance Integral a => Fractional (Ratio a)
    -- Defined in ‘GHC.Real’
    ...plus two others


There are many instances for Fractional, but the resultant type of the (/) operator is always the same as the operands:

    GHCI> :i (/)
    class Num a => Fractional a where
      (/) :: a -> a -> a
      ...
            -- Defined in ‘GHC.Real’
    infixl 7 /
The key is that integer types aren't Fractional, so the Haskell 2010 defaulting rules[1] cause the literals to be interpreted as Double. With an appropriate default declaration GHC could have picked Ratio Integer, Complex Double, etc. instead, but without one Integer and Double are the only options and Integer wouldn't satisfy the constraints.

[1] https://www.haskell.org/onlinereport/haskell2010/haskellch4....



Interesting, thanks. I see Nim uses '/%' for integer division. [0] I can't say I like that decision.

[0] https://nim-lang.org/docs/manual.html#types-preminusdefined-...


Unsigned integer division. In practice I haven’t seen this operator used anywhere yet.


Visual Basic


I don't think so. According to this, in Visual Basic, the '/' operator isn't defined for integer types.

https://docs.microsoft.com/en-us/dotnet/visual-basic/languag...


Integers are right there in the table on the page you're linking to.

If you don't believe me, perhaps you'll believe output from the machine.

https://tio.run/##K0vSTc4vSv3/3zc/pTQnVcEjNScnn4szuDRJwTcxM0...

* edited for guideline compliance


Looks like you're right. In my defense, the page I linked states that "The / operator is defined only for the Decimal, Single, and Double data types." I suppose it accepts integer types by means of 'widening'?

> Did you read it?

The HN guidelines ask that you don't do that.


I've edited my comment for guideline compliance.

I don't think it settles anything, but a trivial function that divides two integers and returns the result compiles to this:

  IL_0000:  ldarg.1     
  IL_0001:  conv.r8     
  IL_0002:  ldarg.2     
  IL_0003:  conv.r8     
  IL_0004:  div         
  IL_0005:  ret        

conv.r8 is an explicit conversion to double precision float in the target language, MSIL.


I was already convinced by your earlier tio.run example, but I'm still a bit baffled by the page saying "The / operator is defined only for the Decimal, Single, and Double data types."


Python 2 did it the other way, wasn't it?


Clojure as well, though it's a rational, not a float: (/ 1 2) ;=> 1/2


Interesting, thanks


Double represents consecutive integers up to 2^53 - that's even larger than Java's int so it's more than fine for almost all purposes.


I can't help but compare JS to Python though. For every footgun in JS, the same footgun does not exist in Python, including this one.


I agree Python is the better language, but they have quite different origin stories. JavaScript was thrown together in very short order, and has gradually evolved since then. It was a good enough first-to-market scripting language for the browser, got standardised, and has been the client-side scripting language of the web ever since. It's evolved somewhat since its early days but is backward compatible.

If I understand correctly, Python had a more considered initial design, and had a clean-break overhaul in Python3, allowing it to evolve without accumulating as much cruft and inconsistency. (Python is famous for its philosophy of there should be one—and preferably only one—obvious way to do it.) People use Python by choice: unlike JavaScript where browsers support no other scripting languages, Python won out over its rivals in fair contest.

All things considered, JavaScript isn't that bad. Used correctly it can be elegant and powerful. It has some nasty dark corners, but it generally provides all you need to avoid them, if you know your way around. With modern JIT engines it can even be pretty fast. Its curious use of doubles is just the kind of thing modern JIT engines are able to be smart about, compiling down to use integer operations where possible.


Interesting - I view it the opposite way. In JavaScript, you need to know the details of exactly one numeric type: the regular, standard IEEE 754 float that's used by Java, Python, C++ (generally), and every other programming language under the sun. That's it! There is only one way to do it ;)

These days, if you need huge integers, you can use BigInt, but if you want to use it in an expression that also contains floating point values, then you need to explicitly convert it. Explicit is better than implicit.
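A small sketch of that explicit-conversion rule:

    const big = 123456789012345678901234567890n;

    // big * 2;                    // TypeError: Cannot mix BigInt and other types
    big * 2n;                      // exact: 246913578024691357802469135780n

    Number(big) * 2;               // explicit, and lossy: only ~16 significant digits survive
    big + BigInt(Math.round(1.5)); // fine once the float is explicitly made integral
                                   // (BigInt(1.5) itself would throw a RangeError)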


I just find it counter intuitive, that's all. The expectation is that high level scripting languages have BigInt enabled by default (since that's what Ruby/Python do). Not so with JS.

The expectation is that sorting an array of integers in place:

    [1,3,2,4,5,7,6,8,9,10].sort() 
produces an array with the numbers in ascending order (since that's what Ruby/Python do). Not so with JS (try it out).
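For reference, the default comparison treats the elements as strings, so the call above actually returns:

    [1,3,2,4,5,7,6,8,9,10].sort();   // [1, 10, 2, 3, 4, 5, 6, 7, 8, 9]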

I agree there is a perfectly logical reason for every quirky behavior, but I've been surprised (in a bad way) so many times by JS's counter-intuitive behavior that I now have a mental collection of gotchas where JS deviates from expectation.


Consider this:

JavaScript actually has a formal specification. Ruby and Python have no formal specification. I can tell you (with reference to the specification) exactly what a JavaScript program does. JavaScript may seem wacky on the surface, but actually it's all worked out. Nobody can tell you what a Ruby or Python program does.

And you think JavaScript is the footgun?


Well yes, in theory, if all developers read JS's formal specification, JS would be better. But in practice developers just wing it without reading anything, and only go read things when they get an unexpected result. And so we get the following outcome: Python behaves as expected in more cases than JS does.

"Theory is great in theory, but in practice, practice is better"


Array.sort converts elements to strings and compares them by UTF-16 code units, or some such weirdness, by default. If you are sorting anything other than strings, you will want to pass in a comparison function.

[1,3,2,4,5,7,6,8,9,10].sort((a,b) => a - b) gives you the behavior one would expect.

I suppose you already know that, though. Agreed there are lots of quirks.


I mean, if you define "whatever Ruby and Python do" to be intuitive, then, sure, every other language is going to have "a long list of weird footguns".

JS has many quirks, but I like that the numeric representation is extremely simple and explicit (even compared to Python).


That's a problem with the specification of sort, not with javascript using double by default. But yeah, that sort example is seriously bonkers.


Except it also has bitwise operators which is definitely weird.


And those bitwise operators are defined to convert the number into a 32-bit signed integer as part of the process. Which is why foo | 0 is a way of converting a number to an integer (as long as it's not too large ...).
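A sketch of the truncation and wrap-around this implies:

    3.7 | 0;            // 3  -- the common "to integer" trick
    -3.7 | 0;           // -3 (truncates toward zero, unlike Math.floor)
    (2 ** 31) | 0;      // -2147483648 -- wraps into signed 32-bit range
    (2 ** 32 + 5) | 0;  // 5  -- the high bits are silently discarded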


Python just has its own footguns, like mutable default arguments or how lambdas close over variables in comprehensions.


While I think everyone can agree this is just yet another terrible design decision in JavaScript, at least there are libraries to give you this, and it looks like BigInt will eventually be native to JS:

https://github.com/peterolson/BigInteger.js/


> While I think everyone can agree this is just yet another terrible design decision in JavaScript

I disagree. Double was the best choice they could have made for a dynamically typed language.


I really, really wish these things were errors. Data loss should require an explicit opt-in. It's easy to point to the spec and say that people should obviously know it back to front, but it's extremely surprising that JSON decoding is a lossy operation, and you have to reach not for an option flag but for a whole 3rd-party library to avoid that behavior.

JSON's arbitrary-precision decimals are the silliest footgun. When every implementation implicitly defines its own valid range by way of what native types it maps numbers to, it makes for a really awkward way to pass messages between two black-box systems.


I feel like a lot of discussion is on floating point accuracy and artifacts and most of it misses the point completely.

It doesn't help that there are IEEE standards for FP calculations which are also very misunderstood. The point of these standards is to provide reproducibility and repeatability, so that different software running on different machines or built with different compilers provides the same results.

The goal of floating point was never to provide exact representation or calculation, and in fact that is not possible in general with floating point. The goal was to make it easy and relatively fast to perform real-world calculations.

You understand your calculator gives approximate results except in very special cases, so why would you bicker over some digits that are very far to the right of your result?

It is actually much worse with calculators, because they tend to employ very different calculation engines and very different precision. You can identify a particular model or model family of calculator by running a couple of calculations and looking at exactly what kind of artifacts are present in the results.

Yet nobody will tell you calculators are useless as I frequently hear about floating point.

I work with Java, and there seems to be a movement recently where developers use arbitrary-precision arithmetic for any sort of calculation on anything, without any actual idea of when it is needed. Supposedly floating point is broken and cannot be trusted. Sigh...


You bicker over the digits far to the right because there tend to be ways for those inaccuracies to propagate up into your significant figures. That doesn't happen when typing a few arithmetic operations into a handheld calculator. It does in a computational environment that might be looping over and summing a million or a billion elements and applying more operations to them. It became too easy and fast to use floats for real-world calculations, overstressing and exposing their inadequacies.

Of course, the right solution is to use the right tool for the job: algorithms that either use a better structure than a 64-bit float, or are smartly designed enough to avoid their inadequacies. But how many developers have both enough insight and enough development time to do that? Everything is a compromise; fixed-point decimal representation is one that works for many applications.
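A minimal sketch of that error propagation over a long sum, plus the fixed-point workaround mentioned above:

    // Plain doubles: the per-addition rounding error accumulates.
    let naive = 0;
    for (let i = 0; i < 1e6; i++) naive += 0.1;
    naive;            // something like 100000.00000133288 -- off by about a millionth

    // Same total kept as integer "tenths": exact until you exceed 2^53.
    let tenths = 0;
    for (let i = 0; i < 1e6; i++) tenths += 1;
    tenths / 10;      // 100000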


There's a huge difference between mathematical calculations having round-off error and computations that are not mathematical calculations exhibiting something akin to round-off error anyway.

It's like giving someone a $10k check and seeing them get $9,999.99, and then getting irritated when people try to investigate what happened because it's only a penny to you. It's kind of missing the point that the error logically shouldn't be there in this situation.


I avoid FP as much as possible precisely because of these nondeterministic (at least in the practical sense) results. I use integers almost exclusively and if I need fixed point I have view code that renders things correctly (such as for currency). When Erlang didn't have a power function that didn't use floating-point (and thus resulted in rounding errors), I wrote my own: https://github.com/pmarreck/elixir-snippets/blob/master/math...
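Not the author's Elixir code, but a minimal sketch of the same idea in JS, using BigInt so nothing ever passes through a float (exponentiation by squaring):

    // Integer power by repeated squaring; exact because BigInt never rounds.
    function ipow(base, exp) {
      let result = 1n, b = BigInt(base), e = BigInt(exp);
      if (e < 0n) throw new RangeError("negative exponents need rationals");
      while (e > 0n) {
        if (e & 1n) result *= b;   // fold in the current bit's contribution
        b *= b;                    // square for the next bit
        e >>= 1n;
      }
      return result;
    }

    ipow(10, 23);   // 100000000000000000000000n; Math.pow(10, 23) has already
                    // been rounded away from the exact integer

(Modern JS also allows ** on BigInt, so 10n ** 23n does the same thing; the loop just shows the classic construction.)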


In this case you're just trading for overflow.


And now you know why the Twitter API has all IDs as both numbers and strings.
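Roughly what that looks like (made-up ID, payload heavily abbreviated): the numeric id is already damaged by the time JSON.parse returns, while id_str survives.

    const body = '{"id": 1310629970728923137, "id_str": "1310629970728923137"}';
    const tweet = JSON.parse(body);
    tweet.id;              // 1310629970728923100 -- the low digits are rounded away
    BigInt(tweet.id_str);  // 1310629970728923137n -- exact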



