
Parsing floats in C++: benchmarking strtod vs. from_chars - ibobev
https://lemire.me/blog/2020/09/10/parsing-floats-in-c-benchmarking-strtod-vs-from_chars/
======
jeffbee
FWIW there is also absl::from_chars, with some included benchmarks and results
at the bottom of the benchmark code.

[https://github.com/abseil/abseil-cpp/blob/master/absl/string...](https://github.com/abseil/abseil-cpp/blob/master/absl/strings/charconv_benchmark.cc)

ETA: I adapted the benchmark in the blog to use absl::from_chars and it's more
than twice as fast as strtod.

------
osolo
For those interested in more information about how the performance was
achieved, Stephan Lavavej gave a talk about this very subject during CppCon
2019 [1].

[1]
[https://www.youtube.com/watch?v=4P_kbF0EbZM](https://www.youtube.com/watch?v=4P_kbF0EbZM)

------
SAI_Peregrinus
The first example is incomplete: it forgets to clear and then check errno
and/or compare the returned value against ±HUGE_VAL! That'd probably slow down
the strtod version a bit more.
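
A minimal sketch of what that checking could look like. The helper name `parse_double_strtod` is made up for illustration and is not from the blog post; it also assumes the implementation sets errno to ERANGE on underflow, which the C standard leaves implementation-defined.

```cpp
#include <cerrno>
#include <cstdlib>
#include <optional>

// Hypothetical helper: wrap strtod and reject input that did not convert
// or that fell outside the range of double.
std::optional<double> parse_double_strtod(const char* text) {
    char* end = nullptr;
    errno = 0;  // clear any stale error before the call
    const double value = std::strtod(text, &end);
    if (end == text) {
        return std::nullopt;  // no characters were consumed
    }
    if (errno == ERANGE) {
        // Overflow (result is +-HUGE_VAL) or, on many implementations,
        // underflow (result is zero or subnormal).
        return std::nullopt;
    }
    return value;
}
```

The extra store to errno and the two branches are the overhead being talked about here; whether it is measurable would need to be benchmarked.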

~~~
jwilk
strtod() could also return 0 on underflow.

But to be fair, the C++17 example doesn't check for underflow or overflow
either.
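
For comparison, a sketch of the same kind of check on the C++17 side. The helper name `parse_double_from_chars` is hypothetical, and this assumes a standard library that ships the floating-point overloads of std::from_chars (not every toolchain does).

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: wrap std::from_chars and treat any reported error
// (no valid number, or a value outside the range of double) as failure.
std::optional<double> parse_double_from_chars(std::string_view text) {
    double value = 0.0;
    const char* first = text.data();
    const char* last = first + text.size();
    const std::from_chars_result result = std::from_chars(first, last, value);
    if (result.ec != std::errc{}) {
        // invalid_argument: no number could be parsed;
        // result_out_of_range: the parsed value does not fit in a double.
        return std::nullopt;
    }
    return value;
}
```

Unlike the strtod case, the error comes back in the return value, so there is no global errno to reset first.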

------
thangalin
Performance in the opposite direction is also useful; the Ryū algorithm[0] is
a fast float-to-string conversion that I've used recently to great effect[1].
Namely, to convert path information from glyphs into vector graphics for a
minimal math-oriented Java-based TeX implementation[2]. The result is code
that converts 1,000 simple formulas as strings into individual SVG documents
in under 600 milliseconds on newish hardware.

[0]:
[https://github.com/ulfjack/ryu/blob/master/README.md](https://github.com/ulfjack/ryu/blob/master/README.md)

[1]:
[https://news.ycombinator.com/item?id=17633182](https://news.ycombinator.com/item?id=17633182)

[2]:
[https://github.com/DaveJarvis/JMathTeX/blob/master/README.md](https://github.com/DaveJarvis/JMathTeX/blob/master/README.md)
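
To illustrate the float-to-string direction in C++ terms, here is a small sketch using std::to_chars, which produces the shortest string that round-trips. This is the standard library call, not the Ryū library's own API, and it assumes a toolchain that provides the floating-point overloads; the helper name `format_double_shortest` is made up.

```cpp
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: format a double as its shortest round-trip string.
std::string format_double_shortest(double value) {
    char buffer[64];  // comfortably larger than any shortest double output
    const std::to_chars_result result =
        std::to_chars(buffer, buffer + sizeof buffer, value);
    if (result.ec != std::errc{}) {
        return {};  // buffer too small; not expected with 64 bytes
    }
    return std::string(buffer, result.ptr);
}
```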

------
SomeoneFromCA
Locale should not be a big deal in most code, since the majority of
applications use the "C" locale anyway.

------
wolf550e
I guess someone has an optimized function for this which is faster than the
one that comes with the compiler.

~~~
alfalfasprout
I'm trying to think about which use cases would really be bottlenecked by
parsing floats and they basically fall into two categories:

1) You're writing a JSON/some other human-readable serialization library

2) You need to interact with some API that you can't change that sends floats
over the wire as strings (a subset of which will fall into the case of (1)).

For something like currencies you'd need a custom parsing engine anyway, since
you'll typically represent monetary values as fixed-width integer multiples of
the smallest unit (e.g., cents or basis points), except for trading engines
where proper accounting isn't required. In fact, a lookup table may even be
faster if the valid values are bounded.
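
As a rough sketch of the integer-multiples idea, assuming a plain "123.45" style input with at most two fractional digits and cents as the smallest unit. The function name `parse_cents` and the 15-digit limit are choices made for this example only.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: parse a decimal amount into whole cents, never
// going through a floating-point type.
std::optional<std::int64_t> parse_cents(std::string_view text) {
    bool negative = false;
    if (!text.empty() && text.front() == '-') {
        negative = true;
        text.remove_prefix(1);
    }
    std::int64_t units = 0;
    std::size_t i = 0;
    for (; i < text.size() && text[i] >= '0' && text[i] <= '9'; ++i) {
        if (i == 15) return std::nullopt;  // keep well clear of overflow
        units = units * 10 + (text[i] - '0');
    }
    if (i == 0) return std::nullopt;  // no integer digits at all
    std::int64_t cents = 0;
    if (i < text.size()) {
        if (text[i] != '.') return std::nullopt;  // unexpected character
        ++i;
        std::size_t frac_digits = 0;
        for (; i < text.size() && text[i] >= '0' && text[i] <= '9'; ++i) {
            if (frac_digits == 2) return std::nullopt;  // finer than cents
            cents = cents * 10 + (text[i] - '0');
            ++frac_digits;
        }
        if (i != text.size() || frac_digits == 0) return std::nullopt;
        if (frac_digits == 1) cents *= 10;  // "0.5" means 50 cents
    }
    const std::int64_t total = units * 100 + cents;
    return negative ? -total : total;
}
```

Rejecting extra fractional digits instead of rounding is one possible policy; a real system would pick whatever its accounting rules require.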

My guess is most of the things in (2) are going to be reading telemetry from
embedded devices whose firmware simply won't be changed anymore and which send
values as text.

~~~
dundarious
"Except for trading engines" is a pretty big exception :) A lot of exchanges
still use FIX, which is a very simple text format.

~~~
alfalfasprout
Except you don't send IEEE floats over FIX. The precision is typically
bounded. Also, that's more for placing orders. NASDAQ, for instance, uses
ITCH, which is a binary stream.

~~~
dundarious
You're right that you'd be leaving performance on the floor if you used the
new function in the hot loop, but it would still be quite handy for recorded
FIX messages for compliance, auditing, risk tools, maybe for hedging, and
maybe for execution tasks outside the hot loop -- that's a huge surface area.

As for exchanges, CME _only_ had FIX for latency-sensitive order entry until
the iLink 3 migration began last year. CME is huge and there are others like
it, even further behind. There is plenty of liquidity you're missing if you
ignore non-binary protocols.

