
Ryū: Fast Float-To-String Conversion - ngaut
https://pldi18.sigplan.org/event/pldi-2018-papers-ry-fast-float-to-string-conversion
======
kazinator
4:27 > _" Every binary floating point number has an exact decimal equivalent,
but not every decimal number has an exact binary equivalent. And the reason
for that, you know, is that powers of two and powers of ten are only
compatible in one direction, not in the other."_

This is because 1/2 is representable as a fraction with 10 in the denominator:
5/10. Every digit after the binary point contributes a power of 1/2, and thus
a power of 5/10, and dividing by ten in decimal never produces repeating
digits.

Here is where the above is slightly misleading, though. The number of decimal
digits needed to capture the exact decimal value of a binary floating point
number with an n-bit mantissa _may be far in excess_ of the decimal precision
that is actually contained in an n-bit mantissa.

In the case of the IEEE 64 bit double we need 17 decimal digits to capture a
printed decimal representation which will reproduce the original double. That
representation isn't the exact decimal number; much more than 17 digits may be
required to get the exact decimal value. The exact decimal value is, I think,
rarely of interest. In an ordinary application of floating point numbers, we
don't print double values to, say, 30 digits of precision; anything after 17
is "junk".

In the other direction, only 15 decimal digits of precision are guaranteed to
be preserved by the representation, so in a 17 digit print, digits 16 and 17
are also "dodgy"; they serve only to record the object exactly.
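
A quick sketch of those three cases in Python, assuming its floats are IEEE
754 doubles (true on mainstream platforms). The helper names are made up for
illustration:

```python
from decimal import Decimal

def exact_decimal(x: float) -> str:
    """Exact decimal expansion of a double; Decimal(float) converts without rounding."""
    return str(Decimal(x))

def seventeen_digits(x: float) -> str:
    """17 significant digits: always enough to reproduce the original double."""
    return format(x, ".17g")

x = 0.1
print(exact_decimal(x))     # 0.1000000000000000055511151231257827021181583404541015625
print(seventeen_digits(x))  # 0.10000000000000001
print(float(seventeen_digits(x)) == x)  # True

# Other direction: a decimal string with at most 15 significant digits
# survives a trip through a double and back.
s = "0.123456789012345"
print(format(float(s), ".15g") == s)  # True
```

The exact value of 0.1 runs to 55 significant digits, while 17 are enough to
get the same double back.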

~~~
obl
You're right, but I don't think it qualifies as misleading since the whole
algorithm is exactly about finding the minimal expansion for a specific
number.

The author even goes through an explicit example.
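
To make "minimal expansion" concrete, here is a brute-force sketch in Python.
This is not Ryū and is nowhere near as fast; it only illustrates the goal by
trying more and more digits until the string parses back to the same double.
The function name is made up:

```python
def shortest_roundtrip(x: float) -> str:
    """Try 1..17 significant digits; return the first string that parses back to x."""
    for precision in range(1, 18):
        candidate = format(x, f".{precision}g")
        if float(candidate) == x:
            return candidate
    # Only reachable for NaN, which never compares equal to itself.
    raise ValueError("no round-tripping representation found")

print(shortest_roundtrip(0.1))        # 0.1
print(shortest_roundtrip(0.1 + 0.2))  # 0.30000000000000004
```

The loop stops at 17 because 17 significant digits always suffice for a
double.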

------
jloughry
The paper is open-access at ACM:

[https://dl.acm.org/citation.cfm?id=3192369](https://dl.acm.org/citation.cfm?id=3192369)

That's really appreciated; thanks.

~~~
ulfjack
You're welcome.

------
bratao
The code is here:
[https://github.com/ulfjack/ryu](https://github.com/ulfjack/ryu)

~~~
KenoFischer
Jacob Quinn has a translation to Julia here:
[https://github.com/quinnj/Ryu.jl](https://github.com/quinnj/Ryu.jl)

~~~
qop
I really enjoy seeing Julia devs so invested and up-to-date with projects and
experiments like this, because isn't this the bread and butter of technical
computing? The nitty gritty of making all these bits and bytes do work! So
cool.

Is Ryu.jl significantly faster than what's in Base? Is there already an open
issue somewhere?

~~~
KenoFischer
Yes, I think preliminary tests showed that it was faster than Grisu (which is
what we're using in Base currently). It also seems significantly simpler
(Grisu is generally fairly simple but has a custom BigInt implementation to
handle some corner cases, which makes things complicated). Not a good time
right now to swap the float print algorithm, but a good project for post-1.0.

I don't see an issue about it at the moment. I guess we just expect Jacob to
PR it at some point. IIRC he also wrote the Grisu implementation that's in
Base.

~~~
qop
To hijack this thread in a different direction for a moment, is this type of
change one that would be allowed to occur in 1.x? How is it determined what
changes are suitable for 1.x and what will wait for 2.0, once 1.0 is released?

I'm very interested in how detailed and coordinated the Julia team has become
regarding the longer term future of Julia. Admirable effort all around.

~~~
KenoFischer
I'd say probably yes. Changes are suitable for 1.x if they are not generally
breaking for user code. This change probably wouldn't be considered breaking,
but we'd have to look at it at the time.

~~~
ulfjack
FYI: Grisu2 and Ryu should produce bit-identical outputs in all cases.

------
legulere
With (de)serialization being such a common task on both servers and clients
nowadays, I wonder whether special CPU instructions would make sense.

~~~
namibj
At that point, just base-64 encode your IEEE floats. Anywhere you need that
speed, you are better off handling it without strings at all.

~~~
cryptonector
But that's not necessarily interoperable. E.g., some JSON parsers might
produce IEEE754 doubles, but others might use arbitrary precision
representations -- what then? One answer is to make JSON support only IEEE754
doubles, but it's a bit late for that, so at most one can recommend that
implementors stick to IEEE754.

If you are starting from scratch, and want to support only platforms that
support IEEE754 in hardware, and so on, then yes, just serialize the raw bits.
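
A minimal sketch of the raw-bits approach in Python, assuming both ends agree
on little-endian IEEE 754 doubles. The function names are hypothetical:

```python
import base64
import struct

def encode_double(x: float) -> str:
    """Pack as 8 little-endian IEEE 754 bytes, then base-64 encode them."""
    return base64.b64encode(struct.pack("<d", x)).decode("ascii")

def decode_double(text: str) -> float:
    """Inverse of encode_double."""
    return struct.unpack("<d", base64.b64decode(text))[0]

encoded = encode_double(0.1)
print(len(encoded))                   # 12
print(decode_double(encoded) == 0.1)  # True
```

Every double costs 12 characters this way, and no digit generation or
parsing is involved.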

~~~
pjc50
You've just given me the idea that interop should be done with "hexponential"
notation: 0x123ABC*0x10^0xDEF can be an exact representation of a floating
point number.

~~~
stephencanon
That’s basically hexadecimal floating point strings, which have been in C
since C99 via the %a format specifier (they use a base-two exponent written in
decimal instead of a base-16 exponent in hex, but that doesn’t really change
the complexity of parsing or formatting).
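
For anyone who wants to see the format without writing C, Python exposes the
same style of string through `float.hex()` and `float.fromhex()`. A small
sketch (the wrapper names are made up):

```python
def to_hex_string(x: float) -> str:
    """Hex mantissa, base-two exponent written in decimal."""
    return x.hex()

def from_hex_string(text: str) -> float:
    """Parse a hexadecimal floating point string back to a double."""
    return float.fromhex(text)

print(to_hex_string(0.1))           # 0x1.999999999999ap-4
print(from_hex_string("0x1.8p+1"))  # 3.0
print(from_hex_string(to_hex_string(0.1)) == 0.1)  # True
```

The round trip is exact because each hex digit maps directly onto four bits
of the mantissa.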

------
twic
This is in a track called "PLDI Research Papers - Floats and Maps". Is that
because they had one paper on maps and two on floats, and decided to put them
together, or is there some deep connection between the two?

~~~
ulfjack
As far as I could tell, there wasn't any deep connection between the talks.

------
tgtweak
Other than printing out the numbers, what are the practical applications?

~~~
Veedrac
Javascript was mentioned in the talk; because of its weak typing and plenty of
APIs that take strings, floating point printing is pretty common.

~~~
amelius
Often, numbers in JS happen to be just integers. Would the implementation
optimize for that case?

~~~
Veedrac
Javascript JITs specialize for integers already. I don't know if Ryū does
specifically though.

~~~
ulfjack
It doesn't optimize for ints right now. I thought about that, but haven't
tried it yet.
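
Purely as an illustration of what such a fast path could look like (a
hypothetical sketch in Python, not what Ryū or any JavaScript engine actually
does): integral doubles below 2^53 in magnitude are exactly representable, so
they can skip the float algorithm and go through integer printing.

```python
def format_number(x: float) -> str:
    """Hypothetical formatter with an integer fast path."""
    # Integral values below 2**53 in magnitude print exactly as integers.
    # is_integer() is False for inf and NaN, so those take the general path.
    if x.is_integer() and abs(x) < 2.0**53:
        return str(int(x))
    # General path: Python's repr stands in for a shortest round-trip printer.
    return repr(x)

print(format_number(3.0))  # 3
print(format_number(0.5))  # 0.5
```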

------
ur-whale
To save some time:
[https://github.com/ulfjack/ryu](https://github.com/ulfjack/ryu)

------
kstenerud
Does anyone know of a good decimal float to string converter, preferably in C?

~~~
nwellnhof
C code for Ryu is available under the Apache2 and Boost licenses:
[https://github.com/ulfjack/ryu/tree/master/ryu](https://github.com/ulfjack/ryu/tree/master/ryu)

Also have a look at Grisu3, Dragon4, dtoa, and the printf implementations of
glibc and musl.

~~~
stephencanon
The two fastest options that I'm currently aware of, by a pretty wide margin,
are Ryu and Swift's implementation, SwiftDtoa
([https://github.com/apple/swift/blob/master/stdlib/public/run...](https://github.com/apple/swift/blob/master/stdlib/public/runtime/SwiftDtoa.cpp),
Apache2 + runtime exception). It is a single file of C; it only lives in a C++
file to satisfy a build system quirk in Swift on Linux.

Ryu is somewhat faster for doubles, though SwiftDtoa has support for 80-bit
and float out of the box, which is nice, and we have some further perf
improvements planned. Both are considerably faster than any of the
alternatives (some perf data in the original Swift PR:
[https://github.com/apple/swift/pull/15474](https://github.com/apple/swift/pull/15474)),
and both pass our fairly intensive test suite.

musl's implementation is also interesting for reasons of simplicity.

