
Fast function to parse strings into double (binary64) floating-point values - vitaut
https://github.com/lemire/fast_double_parser
======
scottlamb
It has about 15 KB of data: 10KB in power_of_ten_components and 5KB in
mantissa_128. [1] That's ~23% of the per-core L1 data cache on (eg) the
recently announced APU chip. [2, 3] So before getting too excited about the
microbenchmark numbers, I'd be careful to ensure the additional cache pressure
doesn't slow down my program as a whole.

edit: also, ~2.5KB of that is due to padding in power_of_ten_components. I
wonder if it'd be better to split the uint64_t field into two uint32_ts to
avoid this.

[1]
[https://github.com/lemire/fast_double_parser/blob/master/inc...](https://github.com/lemire/fast_double_parser/blob/master/include/fast_double_parser.h)
[2]
[https://news.ycombinator.com/item?id=22440894](https://news.ycombinator.com/item?id=22440894)
[3]
[https://en.wikichip.org/wiki/amd/ryzen_embedded/r1606g](https://en.wikichip.org/wiki/amd/ryzen_embedded/r1606g)
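
As an illustration of the padding point in the edit above (a minimal sketch; the
field names are assumptions, not copied from the header, see [1] for the real layout):

    #include <cstdint>
    #include <cstdio>

    // Roughly the layout described above: a 64-bit mantissa paired with a
    // 32-bit exponent. alignof(uint64_t) == 8 forces 4 bytes of padding,
    // so each entry occupies 16 bytes instead of 12.
    struct components {
      uint64_t mantissa;
      int32_t exp;
    };

    // Splitting the 64-bit field into two 32-bit halves drops the alignment
    // requirement to 4, so each entry packs into 12 bytes with no padding.
    struct components_split {
      uint32_t mantissa_hi;
      uint32_t mantissa_lo;
      int32_t exp;
    };

    int main() {
      printf("%zu %zu\n", sizeof(components), sizeof(components_split));  // 16 12
    }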

~~~
userbinator
Also, I know it's not always possible, but if your application is spending so
much time parsing strings into numbers that this could make a difference,
seriously consider whether the source of those strings could be changed to not
emit strings at all; unless it's something like user input, chances are the
values already exist as doubles that can be used as-is.

~~~
thedance
HN is collectively schizophrenic on this topic. On the one hand, the comments
say that nobody really needs anything fancier than JSON. On the other hand,
other comments say that if JSON parsing shows up in your profiles, you should
switch off JSON!

~~~
userbinator
HN (and a lot of other places, really) has a large population of web or
otherwise "web-heritage" developers, as well as those whose ventures into
computing started beside/before the web. I suspect the JSON promotion comes
mostly from the former group, while the latter, of which I am a member,
laments it roughly as much as we did the "XML craze" of the late 90s and early
2000s.

~~~
xxs
Indeed, hand-rolled fixed-point network marshaling, preferably with delta
encoding for time-series-like data (or data that oscillates), is a couple of
orders of magnitude faster, and also significantly "smaller", than a
double-string-double counterpart. Getting ~1 byte per data point for time
series is quite achievable.
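
A minimal sketch of that kind of scheme (the scale factor and helper names here
are made up for illustration): scale to fixed point, delta-encode, then
zigzag + varint so small oscillations cost about one byte per sample.

    #include <cstdint>
    #include <vector>

    // Map signed deltas to unsigned so small magnitudes stay small.
    static uint64_t zigzag(int64_t v) {
      return (uint64_t(v) << 1) ^ uint64_t(v >> 63);
    }

    // LEB128-style varint: 7 bits per byte, high bit marks continuation.
    static void put_varint(std::vector<uint8_t>& out, uint64_t v) {
      while (v >= 0x80) { out.push_back(uint8_t(v) | 0x80); v >>= 7; }
      out.push_back(uint8_t(v));
    }

    // Encode readings as centi-units (two decimal places of fixed point).
    std::vector<uint8_t> encode(const std::vector<double>& series) {
      std::vector<uint8_t> out;
      int64_t prev = 0;
      for (double x : series) {
        int64_t fixed = int64_t(x * 100.0 + (x >= 0 ? 0.5 : -0.5));  // round
        put_varint(out, zigzag(fixed - prev));  // delta vs previous sample
        prev = fixed;
      }
      return out;  // slowly moving series -> ~1 byte per data point
    }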

~~~
vbezhenar
How do you encode floating point delta into one byte?

~~~
thedance
You can't, that's why they said fixed point. Time series data should probably
never use floats, for the exact reason that delta-encoding is not effective.

~~~
scottlamb
"Can't" is too strong a word. Depending on what your numbers represent, I'd
think you could do one or more of:

* have a shorter exponent and/or mantissa to make each number shorter and the overall series more repetitive. (maybe even skip the mantissa if you only care about order of magnitude! call it the opposite of fixed-point, or just storing the logarithm of the value, whatever terminology works for you...)

* skip the sign bit for a non-negative series (e.g., deltas of a non-decreasing series),

* use a variable-width bitstream encoding like exp-Golomb to represent the exponent and/or mantissa,

* use run-length encoding if it's repetitive (after delta-encoding),

* etc.
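
For the exp-Golomb idea, a minimal sketch of the order-0 code (returning a
string of '0'/'1' bits for clarity; a real encoder would pack bits):

    #include <cstdint>
    #include <string>

    // Order-0 exponential-Golomb: 0 -> "1", 1 -> "010", 2 -> "011",
    // 3 -> "00100", ... Small values get short codes.
    std::string exp_golomb(uint32_t n) {
      uint64_t m = uint64_t(n) + 1;
      int bits = 0;
      for (uint64_t t = m; t > 1; t >>= 1) ++bits;  // floor(log2(m))
      std::string out(bits, '0');                   // 'bits' leading zeros
      for (int i = bits; i >= 0; --i)               // m in binary, bits+1 wide
        out += ((m >> i) & 1) ? '1' : '0';
      return out;
    }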

One byte per value (or less!) is surely possible in some cases. But I've never
done it myself and agree fixed-point sounds much more pleasant. One of my
projects represents deltas of durations in units of 90,000ths of a second,
varint-encoded. I much prefer that to dealing with float seconds.

~~~
thedance
There's also the XOR compression scheme described by Facebook:

[https://www.vldb.org/pvldb/vol8/p1816-teller.pdf](https://www.vldb.org/pvldb/vol8/p1816-teller.pdf)
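
The core idea there, sketched very roughly (not the paper's full bit-packed
format; __builtin_clzll/ctzll are GCC/Clang builtins):

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Nearby doubles XOR to a word with long runs of leading/trailing zero
    // bits; the real encoder stores just the run lengths plus the meaningful
    // middle bits, so repeated or slowly changing values cost a bit or two.
    static uint64_t bits_of(double d) {
      uint64_t u;
      std::memcpy(&u, &d, sizeof u);
      return u;
    }

    int main() {
      uint64_t x = bits_of(102.37) ^ bits_of(102.41);
      printf("leading zeros: %d, trailing zeros: %d\n",
             __builtin_clzll(x), __builtin_ctzll(x));
    }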

------
plus
> That is, given the string "1.0e10", it should return a 64-bit floating-point
> value equal to 10.

Err, surely it should be equal to 10,000,000,000. Or more probably, they meant
to write "1.0e1".

~~~
vardump
Oh, you don't use base 10,000,000,000 yet?

------
eesmith
More context at Daniel Lemire's blog post:
[https://lemire.me/blog/2020/03/10/fast-float-parsing-in-prac...](https://lemire.me/blog/2020/03/10/fast-float-parsing-in-practice/).
It's about twice as fast as abseil or from_chars and nearly 10x faster than
strtod.

------
londons_explore
I was expecting this to include fancy bit twiddling and SIMD assembly for
parsing 8 decimal digits at once...

But in reality, the core of the algorithm is still a while loop that looks
at each digit one at a time and multiplies by 10 each time...
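
Roughly the shape of that core loop (simplified for illustration, not the
library's actual code; overflow handling omitted):

    #include <cstdint>

    // Accumulate decimal digits into a 64-bit integer, one digit per step.
    uint64_t parse_digits(const char* p, const char** end) {
      uint64_t value = 0;
      while (*p >= '0' && *p <= '9') {
        value = value * 10 + uint64_t(*p - '0');
        ++p;
      }
      *end = p;
      return value;  // caller combines this with the parsed exponent
    }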

------
zone411
Shouldn't std::from_chars be included in the benchmarks?

~~~
deadmutex
Yeah, looking forward to it. Especially with the latest version of MSVC, based
on this post from the author:
[https://www.reddit.com/r/cpp/comments/92bkxp/how_to_efficien...](https://www.reddit.com/r/cpp/comments/92bkxp/how_to_efficiently_convert_a_string_to_an_int_in_c/e35b3r1/)
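
For reference, a from_chars entry in the benchmark would look roughly like this
(C++17 <charconv>; the floating-point overloads shipped first in MSVC):

    #include <charconv>
    #include <cstdio>

    int main() {
      const char buf[] = "1.0e10";
      double value = 0;
      auto [ptr, ec] = std::from_chars(buf, buf + sizeof(buf) - 1, value);
      if (ec == std::errc()) {
        printf("parsed %f, consumed %td chars\n", value, ptr - buf);
      }
    }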

------
voldacar
Is there a site out there that collects the absolute fastest ways of doing
common low-level operations like this? For something as common as converting
strings of digits to IEEE 754 floats/doubles, you would think we would already
have the absolute fastest sequence of assembly instructions to do this. It's
disconcerting to think that the functions in the standard C/C++ library may
not even be close to optimal.

Very cool btw

~~~
thedance
I can't think of a single thing in the C or C++ standard library that is state
of the art. Often they are actually quite terrible. Some of them just can't be
fixed because of ABI compatibility issues. That's why, just as an example,
C++'s std::unordered_map is an embarrassment compared to all recent hash
tables.

~~~
voldacar
That's so true, and honestly it kind of gives me a pit in my stomach just
thinking about all the C++ programs in particular that use the STL and end up
being so much slower than they could be. Hopefully Rust and Zig won't end up
making this kind of mistake.

~~~
pjscott
The most egregious offenders here IMO are C++'s map and unordered_map. The
spec constrains the possible implementations, and it's hard to implement them
as anything better than a red-black tree or a badly-unoptimized hash table,
respectively. And really the unordered one should be the default with the nice
short name.

It looks like Rust is doing well on this. HashMap (the one everybody uses) is
essentially a clone of Google's heavily-optimized C++ hash table:

[https://abseil.io/blog/20180927-swisstables](https://abseil.io/blog/20180927-swisstables)

... and if you need the collection to be sorted, their BTreeMap has much lower
overhead than the C++ std::map, plus a big prominent note in its documentation
saying that HashMap is usually the one you want.

I'm impressed. (Haven't looked at Zig yet.)

~~~
RossBencina
> anything better than a red-black tree

Is there an equivalent data structure that has better worst-case time
complexity than red-black tree?

~~~
Dylan16807
There are plenty of algorithms that are also O(log n) that are much faster in
practice, because of how caches work.

~~~
RossBencina
Could you name some please? I'm genuinely interested in learning of data
structures with this property.

~~~
Dylan16807
The simplest one is a B tree. It's like a self-balancing binary tree, except
that you put more than two children in each node.

Same big-O, but faster by a big linear multiplier.
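
A sketch of what a node looks like (a fan-out of 16 is an arbitrary choice for
illustration):

    #include <array>

    // A B-tree node holds many sorted keys, so one cache line (or a few)
    // covers a whole node instead of a single binary-tree key. Each level
    // replaces log2(16) = 4 levels of an equivalent binary tree.
    template <typename K, typename V, int kFanout = 16>
    struct BTreeNode {
      int num_keys = 0;
      std::array<K, kFanout - 1> keys;             // kept sorted
      std::array<V, kFanout - 1> values;
      std::array<BTreeNode*, kFanout> children{};  // all null in leaf nodes
    };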

------
zozbot234
Great hack, but in a broader sense this is silly. If parsing/pretty-printing
floats from/to ASCII strings is a bottleneck you should be using hexfloats, as
supported in the latest C/C++ standards and elsewhere. As a bonus, they will
reliably evaluate to the same floating point number, eliminating an extra
source of ambiguity and possible overhead.
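
For example, "%a" prints the exact bits of a double and strtod parses the
hexfloat back with no decimal rounding involved:

    #include <cstdio>
    #include <cstdlib>

    int main() {
      double x = 0.1;
      char buf[64];
      snprintf(buf, sizeof buf, "%a", x);  // e.g. "0x1.999999999999ap-4"
      double y = strtod(buf, nullptr);     // exact round-trip
      printf("%s round-trips: %s\n", buf, x == y ? "yes" : "no");
    }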

~~~
chadaustin
It can be a big chunk of the cost of parsing JSON.

~~~
gameswithgo
stop using json is the message

~~~
blacklion
I dream about the day when all crypto exchanges get this message. Not today :-(

------
alkonaut
What's a scenario where you have a huge flow of floating-point data as text?
I assume it's usually JSON, but in which cases does that happen, and why do
you need to work around it (parse faster) rather than fix the underlying
problem of having a huge stream of numbers as _text_?

~~~
blacklion
If you can control both ends, yes. But sometimes you need to consume data
produced by entities you can't control. Examples? Cryptocurrency exchanges.
All of them use ugly, often terrible, JSON-based protocols, and some of them
have a non-trivial amount of trading, enough for JSON parsers to show up as
hot spots.

------
roel_v
Would having a mode that restricts the types of numbers it accepts (in my
particular case: no scientific notation) speed it up significantly?

------
abductee_hg
bad. you should not do that at all.

~~~
yellowapple
If you're writing a parser for any language which is expected to deal with
floating point numbers, being able to parse those numbers is generally
necessary.

