Floating Point in the Browser, Part 1: Impossible Expectations (randomascii.wordpress.com)
64 points by nikbackm on Sept 29, 2020 | 64 comments



I've mentioned it before, but JSON Numbers are arbitrary-precision floating-point. They can't be naively stored as JavaScript numbers in all cases, since JS numbers are IEEE754 double-precision floating point. Likewise, you can't store arbitrary JS number values in JSON Numbers, since it doesn't handle IEEE754 exception values (qNaN, sNaN, ±INF).

So if you're trying to publish data from a GPS chip that reports fix failure by setting the latitude and longitude to a NaN (possibly with a failure code in the exception value) you can handle that in JS but only through some intermediate representation in JSON.
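A quick sketch of both failure modes in plain JS (the GPS payload here is hypothetical):

    // NaN and Infinity have no JSON representation; JSON.stringify silently
    // serializes them as null.
    JSON.stringify({ lat: NaN, lon: Infinity });   // '{"lat":null,"lon":null}'

    // In the other direction, a JSON number wider than a double is silently rounded.
    JSON.parse('{"id": 9007199254740993}').id;     // 9007199254740992 -- 2^53 + 1 is lost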


Honestly I think it's the biggest misfeature of JSON. Arbitrary precision numbers should have been handled differently & I've repeatedly seen JSON parsers incorrectly handle those. We're now in a situation where you can't serialize numbers you encounter daily into JSON and you can't trust that numbers you correctly serialized into JSON will be properly deserialized by the peer.

I spent a lot of time writing the number handling code in PBNJSON for WebOS back in the day (IIRC there were 0 C/C++ parsers that offered correct number parsing at the time). It also makes sure to properly report overflow/underflow/failure to convert whenever you hit situations that can't be represented (e.g. reading a 32-bit number out of JSON holding a 64-bit int, reading a 64-bit integer from doubles that are out of range, etc). I'm sure I missed some corner cases though, as it's insanely difficult to get every single one right (especially with floating point).


This is true in theory but not in practice. Most implementations don’t want to pay the cost of converting everything to bignums (or the convenience, and possibly speed, cost of leaving the numbers as strings). Generally the JSON parsing is separate from the rest of the deserialisation, so there’s no great way for the parser to know, when it sees a number, whether it should convert to float or int or do something clever.

As soon as you have something in your pipeline that converts to floats, it is poisoned. Lots of implementations you may use do this conversion though (e.g. basically anything JavaScript, jq), so it is pretty easy to get float poisoning.

The actual solution to representing big nums in practice is to wrap them in double quotes. It only costs two bytes (and if you care about those bytes then you should probably not be using JSON).
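Something like this, sketched with BigInt (the field name is made up):

    // Producer: JSON.stringify throws on a raw BigInt, so send it as a string.
    const id = 9007199254740993n;                        // too wide for a double
    const wire = JSON.stringify({ id: id.toString() });  // '{"id":"9007199254740993"}'

    // Consumer: has to know this particular field is a stringified integer.
    const back = BigInt(JSON.parse(wire).id);            // 9007199254740993n, exact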


> It only costs two bytes

And having to have special code to handle getting it as a string instead of a number.


But that would be the case anyway because we’re talking about ways to handle numbers that don’t fit in floats. If your JSON has to be put through a black-box pipeline where you don’t know if it will be reencoded at some point by a lossy parser then the only safe thing to do is pass as a string.


> I've mentioned it before, but JSON Numbers are arbitrary-precision floating-point. They can't be naively stored as JavaScript numbers in all cases, since JS numbers are IEEE754 double-precision floating point.

Well, RFC 8259 says "This specification allows implementations to set limits on the range and precision of numbers accepted." and goes on to explain interoperability with IEEE doubles.


Which implementation though? If you're only talking to yourself, you don't really need to (de)serialize to/from JSON. In many (most) use cases the JSON serializer will be different from the deserializer, and the data structure will live in a different memory model and/or runtime... And I don't think there's a standard way to tag a JSON document hinting that 32 bits is enough, or that a 128-bit float is coming.


Well, that's the point, isn't it? The RFC basically says "while the JSON grammar allows for arbitrary-precision numbers, don't expect anything larger or more precise than a double, unless you know the recipient uses a more capable implementation". (In other words, JSON numbers are _not_ arbitrary-precision in practice.)


Anyone who's ever worked on programming language implementation will recognise that you will for all time get a steady stream of occasional bug reports that your floating point numbers aren't arbitrarily precise.

Some language implementation bug trackers have a warning not to submit bugs for it front and centre because they get it so often.


In JavaScript I assume it’s worse because most people would assume the existence of some sort of integer type but the number silently “becomes a floating point” instead, which can be confusing.


I don't use JavaScript, but JavaScript using binary floating point always makes me shake my head in disbelief. You've got a language designed from the start to deal with user input that can't exactly represent the decimal numbers and fractions users type in.


JSON Numbers are arbitrary precision floating point. JavaScript numbers aren't necessarily numeric, they can be NaN or INF. That disconnect isn't a bug, but can cause problems if you naively convert from one to the other without handling the edge cases.


The problem is we’re now at the point where we have mountains of code that doesn’t handle the edge cases, and very little can be done beyond being super conservative about what we send to an unknown JSON parser.


What’s also interesting is how the “minimum representation needed to roundtrip a number” is actually done. There is recent (2010) research into doing this efficiently using an algorithm called Grisu: https://dl.acm.org/doi/10.1145/1809028.1806623. Many programming language implementations use it now for this purpose.
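You can see the "minimum representation" requirement directly in JS, whose Number-to-string conversion is specified to emit just enough digits to roundtrip (engines have used Grisu and its successors for this):

    const x = 0.1 + 0.2;
    x.toString();               // "0.30000000000000004" -- just enough digits
    Number(x.toString()) === x; // true: the printed form parses back to the same bits
    (0.1).toString();           // "0.1", not the full exact decimal of the stored double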


Actually, there's even more recent research into doing this really efficiently, using an algorithm called Ryū.

[0] https://github.com/ulfjack/ryu

[1] https://www.youtube.com/watch?v=kw-U6smcLzk

[2] https://www.youtube.com/watch?v=4P_kbF0EbZM


The f18 Fortran compiler has a novel algorithm (calculating in base 10^16) for conversions that is fast and doesn't require big look-up tables.

https://github.com/llvm/llvm-project/tree/master/flang/lib/D...


The author has many blog posts on floating point math, and they're all very interesting. If you're into the topic, you can check the tagged entries:

https://randomascii.wordpress.com/category/floating-point/

In general the whole blog is super interesting and has been submitted to HN many times:

https://news.ycombinator.com/from?site=randomascii.wordpress...


TIL JavaScript doesn't have an integer type and uses double under the hood to store integers. Yet another footgun to add to the massive stack.


JavaScript also has newish BigInt support: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe.... It even has syntax: 42n.


Given that it's a dynamic scripting language, I think double was the best choice they could have made. It gives serious number crunching performance, an integer range up to 2^53, and, of course, double precision floating point math. The only better choice they could have made would have been to make it static/strongly typed with various numeric types.
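A quick illustration of that 2^53 boundary:

    Number.MAX_SAFE_INTEGER;        // 9007199254740991, i.e. 2^53 - 1
    2 ** 53 === 2 ** 53 + 1;        // true -- beyond this, odd integers can't be stored
    Number.isSafeInteger(2 ** 53);  // false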


Not quite true. Engines do use an integer data representation if the number is an integer; it just silently changes to a float when necessary. So with `let x = 2;` the type would be 'number' but the underlying data would just be an integer.


That’s an implementation detail.

JavaScript, the language, doesn’t (AFAIK; I don’t keep track of this rapidly moving target) have integers (but does have integer arrays, nowadays. See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Type...)

If the type were integer,

  let x = 2
  let y = x / 8
would give y = 0, but its value is actually 0.25.


That's perfectly reasonable; there's nothing to say that `/` has type `int -> int -> int` even in a strongly-typed language. I wouldn't bat an eyelid to discover that `/` had type `int -> int -> float` in any language. The problem is when you discover that `+` does not have type `int -> int -> int` (e.g. when you try and increment a very large integer and floating point stops being able to represent it).


> I wouldn't bat an eyelid to discover that `/` had type `int -> int -> float`

I'd bat both eyelids, that's the kind of surprising design decision that would turn me away from a language. To my knowledge, no major programming language does this, and there's wisdom in the principle of least surprise.

edit: I am of course mistaken, Python does exactly this. Corrected by chrisseaton again!


> I'd bat both eyelids, that's the kind of surprising design decision that would turn me away from a language. To my knowledge, no major programming language does this, and there's wisdom in the principle of least surprise.

Lol prepare to bat your eyelids.

    % python
    Python 3.8.5 (default, Jul 21 2020, 10:48:26) 
    [Clang 11.0.3 (clang-1103.0.32.62)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 1 / 2
    0.5


I'll bat my eyelids, but only after I kick myself ;-P

If you're up for a game of No True Scotsman, can you name me a statically typed language that does the same?


Julia? There's an integer division operator, but the default is... More complex:

https://docs.julialang.org/en/v1/manual/integers-and-floatin...


Have I gone mad? (EDIT: it appears I have.)

    GHCi, version 8.6.5: http://www.haskell.org/ghc/ :? for help
    Prelude> 1 / 2
    0.5

This is because 1 and 2 are polymorphic and get instantiated to floats here. Witchcraft.


I think you're still right though. I know Haskell has an unusual approach to numbers, but it looks like the resultant type is indeed different from the type of the operands. I got this from typeOf (/):

    Note: there are several potential instances:
    instance RealFloat a => Fractional (Complex a)
    -- Defined in ‘Data.Complex’
    instance HasResolution a => Fractional (Data.Fixed.Fixed a)
    -- Defined in ‘Data.Fixed’
    instance Integral a => Fractional (Ratio a)
    -- Defined in ‘GHC.Real’
    ...plus two others


There are many instances for Fractional, but the resultant type of the (/) operator is always the same as the operands:

    GHCI> :i (/)
    class Num a => Fractional a where
      (/) :: a -> a -> a
      ...
            -- Defined in ‘GHC.Real’
    infixl 7 /
The key is that integer types aren't Fractional, so the Haskell 2010 defaulting rules[1] cause the literals to be interpreted as Double. With an appropriate default declaration GHC could have picked Ratio Integer, Complex Double, etc. instead, but without one Integer and Double are the only options and Integer wouldn't satisfy the constraints.

[1] https://www.haskell.org/onlinereport/haskell2010/haskellch4....



Interesting, thanks. I see Nim uses '/%' for integer division. [0] I can't say I like that decision.

[0] https://nim-lang.org/docs/manual.html#types-preminusdefined-...


Unsigned integer division. In practice I haven’t seen this operator used anywhere yet.


Visual Basic


I don't think so. According to this, in Visual Basic, the '/' operator isn't defined for integer types.

https://docs.microsoft.com/en-us/dotnet/visual-basic/languag...


Integers are right there in the table on the page you're linking to.

If you don't believe me, perhaps you'll believe output from the machine.

https://tio.run/##K0vSTc4vSv3/3zc/pTQnVcEjNScnn4szuDRJwTcxM0...

* edited for guideline compliance


Looks like you're right. In my defense, the page I linked states that "The / operator is defined only for the Decimal, Single, and Double data types." I suppose it accepts integer types by means of 'widening'?

> Did you read it?

The HN guidelines ask that you don't do that.


I've edited my comment for guideline compliance.

I don't think it settles anything, but a trivial function that divides two integers and returns the result compiles to this:

  IL_0000:  ldarg.1     
  IL_0001:  conv.r8     
  IL_0002:  ldarg.2     
  IL_0003:  conv.r8     
  IL_0004:  div         
  IL_0005:  ret        

conv.r8 is an explicit conversion to double precision float in the target language, MSIL.


I was already convinced by your earlier tio.run example, but I'm still a bit baffled by the page saying "The / operator is defined only for the Decimal, Single, and Double data types."


Python 2 did it the other way, wasn't it?


Clojure as well, though it's a rational, not a float: (/ 1 2) ;=> 1/2


Interesting, thanks


Double represents consecutive integers up to 2^53 - that's even larger than Java's int so it's more than fine for almost all purposes.


I can't help but compare JS to Python though. For every footgun in JS, the same footgun does not exist in Python, including this one.


I agree Python is the better language, but they have quite different origin stories. JavaScript was thrown together in very short order, and has gradually evolved since then. It was a good enough first-to-market scripting language for the browser, got standardised, and has been the client-side scripting language of the web ever since. It's evolved somewhat since its early days but is backward compatible.

If I understand correctly, Python had a more considered initial design, and had a clean-break overhaul in Python3, allowing it to evolve without accumulating as much cruft and inconsistency. (Python is famous for its philosophy of there should be one—and preferably only one—obvious way to do it.) People use Python by choice: unlike JavaScript where browsers support no other scripting languages, Python won out over its rivals in fair contest.

All things considered, JavaScript isn't that bad. Used correctly it can be elegant and powerful. It has some nasty dark corners, but it generally provides all you need to avoid them, if you know your way around. With modern JIT engines it can even be pretty fast. Its curious use of doubles is just the kind of thing modern JIT engines are able to be smart about, compiling down to use integer operations where possible.


Interesting - I view it the opposite way. In JavaScript, you need to know the details of exactly one numeric type: the regular, standard IEEE 754 float that's used by Java, Python, C++ (generally), and every other programming language under the sun. That's it! There is only one way to do it ;)

These days, if you need huge integers, you can use BigInt, but if you want to use it in an expression that also contains floating point values, then you need to explicitly convert it. Explicit is better than implicit.
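A small sketch of that explicit-conversion rule:

    const big = 123456789012345678901234567890n;

    // big * 2;                    // TypeError: Cannot mix BigInt and other types
    big * 2n;                      // exact: 246913578024691357802469135780n

    Number(big) * 2;               // explicit, and lossy: only ~16 significant digits survive
    big + BigInt(Math.round(1.5)); // fine once the float is explicitly made integral
                                   // (BigInt(1.5) itself would throw a RangeError)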


I just find it counter intuitive, that's all. The expectation is that high level scripting languages have BigInt enabled by default (since that's what Ruby/Python do). Not so with JS.

The expectation is that sorting an array of integers in place:

    [1,3,2,4,5,7,6,8,9,10].sort() 
produces an array with the numbers in ascending order (since that's what Ruby/Python do). Not so with JS (try it out).
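For reference, the default comparison treats the elements as strings, so the call above actually returns:

    [1,3,2,4,5,7,6,8,9,10].sort();   // [1, 10, 2, 3, 4, 5, 6, 7, 8, 9]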

I agree there is a perfectly logical reason for every quirky behavior, but I've been surprised (in a bad way) so many times by JS's counter-intuitive behavior that I now have a mental collection of gotchas where JS deviates from expectation.


Consider this:

JavaScript actually has a formal specification. Ruby and Python have no formal specification. I can tell you (with reference to the specification) exactly what a JavaScript program does. JavaScript may seem wacky on the surface, but actually it's all worked out. Nobody can tell you what a Ruby or Python program does.

And you think JavaScript is the footgun?


Well yes, in theory, if all developers read JS's formal specification, JS would be better. But in practice developers just wing it without reading anything, and only go read things when they get an unexpected result. And so we get the following outcome: Python behaves as expected in more cases than JS does.

"Theory is great in theory, but in practice, practice is better"


Array.sort converts elements to strings and compares them by UTF-16 code units, or some such weirdness, by default. If you are sorting anything other than strings, you will want to pass in a comparison function.

[1,3,2,4,5,7,6,8,9,10].sort((a,b) => a - b) gives you the behavior one would expect.

I suppose you already know that, though. Agreed there are lots of quirks.


I mean, if you define "whatever Ruby and Python do" to be intuitive, then, sure, every other language is going to have "a long list of weird footguns".

JS has many quirks, but I like that the numeric representation is extremely simple and explicit (even compared to Python).


That's a problem with the specification of sort, not with javascript using double by default. But yeah, that sort example is seriously bonkers.


Except it also has bitwise operators which is definitely weird.


And those bitwise operators are defined to convert the number into a 32-bit signed integer as part of the process. Which is why foo | 0 is a way of converting a number to an integer (as long as it's not too large ...).
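A sketch of the truncation and wrap-around this implies:

    3.7 | 0;            // 3  -- the common "to integer" trick
    -3.7 | 0;           // -3 (truncates toward zero, unlike Math.floor)
    (2 ** 31) | 0;      // -2147483648 -- wraps into signed 32-bit range
    (2 ** 32 + 5) | 0;  // 5  -- the high bits are silently discarded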


Python just has its own footguns, like mutable default arguments or how lambdas close over variables in comprehensions.


While I think everyone can agree this is just yet another terrible design decision in JavaScript, at least there are libraries to give you this, and it looks like BigInt will eventually be native to JS:

https://github.com/peterolson/BigInteger.js/


> While I think everyone can agree this is just yet another terrible design decision in JavaScript

I disagree. Double was the best choice they could have made for a dynamically typed language.


I really, really wish these things were errors. Data loss should require an explicit opt-in. It's easy to point to the spec and say that people should obviously know it back to front, but it's extremely surprising that JSON decoding is a lossy operation, and you have to reach not for an option flag but for a whole 3rd-party library to avoid that behavior.

JSON's arbitrary-precision decimals are the silliest footgun. When every implementation implicitly defines its own valid range by way of what native types it maps numbers to, it makes for a really awkward way to pass messages between two black-box systems.


I feel like a lot of discussion is on floating point accuracy and artifacts and most of it misses the point completely.

It doesn't help that there are IEEE standards for FP calculations which are also very misunderstood. The point of these standards is to provide reproducibility and repeatability, so that different software running on different machines or built with different compilers provides the same results.

The goal of floating point was never to provide exact representation or calculation, and in fact that is not possible in general with floating point. The goal was to make it easy and relatively fast to perform real-world calculations.

You understand your calculator gives approximate results except in very special cases, so why would you bicker over some digits that are very far to the right of your result?

It is actually much worse with calculators, because they tend to employ very different calculation engines and very different precision. You can identify a particular model or model family of calculator by running a couple of calculations and looking at exactly what kind of artifacts are present in the results.

Yet nobody will tell you calculators are useless as I frequently hear about floating point.

I work with Java, and there seems to be a movement recently where developers use arbitrary-precision arithmetic for any sort of calculation on anything, without any actual idea of when it is needed. Supposedly floating point is broken and cannot be trusted. Sigh...


You bicker over the digits far to the right because there tend to be ways for those inaccuracies to propagate up into your significant figures. That doesn't happen when typing a few arithmetic operations into a handheld calculator. It does in a computational environment that might be looping over and summing a million or a billion elements and applying more operations to them. It became too easy and fast to use floats for real-world calculations, overstressing and exposing their inadequacies.

Of course, the right solution is to use the right tool for the job: algorithms that either use a better structure than a 64-bit float, or are smartly designed enough to avoid their inadequacies. But how many developers have both enough insight and enough development time to do that? Everything is a compromise; fixed-point decimal representation is one that works for many applications.
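A minimal sketch of that error propagation over a long sum, plus the fixed-point workaround mentioned above:

    // Plain doubles: the per-addition rounding error accumulates.
    let naive = 0;
    for (let i = 0; i < 1e6; i++) naive += 0.1;
    naive;            // something like 100000.00000133288 -- off by about a millionth

    // Same total kept as integer "tenths": exact until you exceed 2^53.
    let tenths = 0;
    for (let i = 0; i < 1e6; i++) tenths += 1;
    tenths / 10;      // 100000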


There's a huge difference between mathematical calculations having round-off error and computations that are not mathematical calculations exhibiting something akin to round-off error anyway.

It's like giving someone a $10k check and seeing them get $9,999.99, and then getting irritated when people try to investigate what happened because it's only a penny to you. It's kind of missing the point that the error logically shouldn't be there in this situation.


I avoid FP as much as possible precisely because of these nondeterministic (at least in the practical sense) results. I use integers almost exclusively and if I need fixed point I have view code that renders things correctly (such as for currency). When Erlang didn't have a power function that didn't use floating-point (and thus resulted in rounding errors), I wrote my own: https://github.com/pmarreck/elixir-snippets/blob/master/math...
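Not the author's Elixir code, but a minimal sketch of the same idea in JS, using BigInt so nothing ever passes through a float (exponentiation by squaring):

    // Integer power by repeated squaring; exact because BigInt never rounds.
    function ipow(base, exp) {
      let result = 1n, b = BigInt(base), e = BigInt(exp);
      if (e < 0n) throw new RangeError("negative exponents need rationals");
      while (e > 0n) {
        if (e & 1n) result *= b;   // fold in the current bit's contribution
        b *= b;                    // square for the next bit
        e >>= 1n;
      }
      return result;
    }

    ipow(10, 23);   // 100000000000000000000000n; Math.pow(10, 23) has already
                    // been rounded away from the exact integer

(Modern JS also allows ** on BigInt, so 10n ** 23n does the same thing; the loop just shows the classic construction.)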


In this case you're just trading for overflow.


And now you know why the Twitter API has all IDs as both numbers and strings.
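Roughly what that looks like (made-up ID, payload heavily abbreviated): the numeric id is already damaged by the time JSON.parse returns, while id_str survives.

    const body = '{"id": 1310629970728923137, "id_str": "1310629970728923137"}';
    const tweet = JSON.parse(body);
    tweet.id;              // 1310629970728923100 -- the low digits are rounded away
    BigInt(tweet.id_str);  // 1310629970728923137n -- exact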



