
Python rounds float values by converting them to string and then back - bishala
https://github.com/python/cpython/blob/master/Objects/floatobject.c#L965-L972
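A quick Python demonstration of what this correct rounding buys (a minimal sketch; the 2.675 case is the classic example from the Python docs):

```python
from decimal import Decimal

# The literal 2.675 has no exact binary representation; the
# stored double is slightly *below* 2.675.
assert Decimal(2.675) < Decimal("2.675")

# round() sees the true stored value, so this apparent "tie"
# rounds down rather than up.
assert round(2.675, 2) == 2.67
```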
======
coldtea
Seems to be one of the best ways to go about it.

From the comment in protobuf source (which does the same thing as Python),
mentioned in the Twitter thread:

(...) An arguably better strategy would be to use the algorithm described in
"How to Print Floating-Point Numbers Accurately" by Steele & White, e.g. as
implemented by David M. Gay's dtoa(). It turns out, however, that the
following implementation is about as fast as DMG's code. Furthermore, DMG's
code locks mutexes, which means it will not scale well on multi-core machines.
DMG's code is slightly more accurate (in that it will never use more digits
than necessary), but this is probably irrelevant for most users.

Rob Pike and Ken Thompson also have an implementation of dtoa() in
third_party/fmt/fltfmt.cc. Their implementation is similar to this one in that
it makes guesses and then uses strtod() to check them. (...)

[https://github.com/protocolbuffers/protobuf/blob/ed4321d1cb3...](https://github.com/protocolbuffers/protobuf/blob/ed4321d1cb33199984118d801956822842771e7e/src/google/protobuf/stubs/strutil.cc#L1174-L1213)

~~~
ChrisLomont
>Seems to be one of the best ways to go about it.

The C/C++ standards do not require formatting to round correctly or even be
portable. I recently had an issue where a developer used this method to round
floats for display, and there were differences on PC and on Mac. It literally
rounded something like 18.25 to 18.2 on one platform and 18.3 on the other.
This led to all sorts of other bugs as some parts of the program used text to
transmit data, which ended up in weird states.

The culprit was this terrible method. If you want anything approaching
consistency or predictability, do not use formatting to round floating-point
numbers. Pick a numerically stable method, which will also be much faster if
done correctly.
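For what it's worth, the 18.25 case is a genuine tie, since 18.25 is exactly representable in binary; here is a sketch in Python (whose own formatter breaks ties to even) of why two platforms can disagree on it:

```python
from fractions import Fraction

# 18.25 is exactly representable in binary (73/4), so formatting
# it to one decimal place is a genuine halfway case.
assert Fraction(18.25) == Fraction(73, 4)

# CPython's own float formatter breaks ties to even ("banker's
# rounding"), printing 18.2; a C library rounding ties away from
# zero would print 18.3 for the same bits.
assert f"{18.25:.1f}" == "18.2"
```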

Coincidentally, C/C++ do not require any of their formatting and parsing
routines to round-trip floating-point values correctly (except the newly added
hex-formatted floats, which are a direct binary representation, and some newly
added functions allowing an obscure trick I do not recall at the moment... )

~~~
eesmith
> The C/C++ standards do not require formatting to round correctly or even be
> portable.

The linked-to method uses PyOS_snprintf(). Its documentation at
[https://docs.python.org/3/c-api/conversion.html](https://docs.python.org/3/c-api/conversion.html)
says:

"""PyOS_snprintf() and PyOS_vsnprintf() wrap the Standard C library functions
snprintf() and vsnprintf(). Their purpose is to guarantee consistent behavior
in corner cases, which the Standard C functions do not."""

~~~
ImNotTheNSA
The response was to the comment saying that the method of format strings was
one of “the best ways to go about it”.

It’s obvious that PyOS_snprintf is not a standard library function

~~~
eesmith
Sure, PyOS_snprintf isn't a standard library function, but it's a thin wrapper
around snprintf, which is. Python/mysnprintf.c describes itself as:

    
    
       snprintf() wrappers.  If the platform has vsnprintf, we use it, else we
       emulate it in a half-hearted way.  Even if the platform has it, we wrap
       it because platforms differ in what vsnprintf does in case the buffer
       is too small:
    

It mentions that one corner case is what happens when the buffer is too small.
Not rounding issues.

The "the best ways to go about it" comment links to the protobuf code, which
also uses snprintf.

The (what I think is the) relevant C99 spec at
[http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf](http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf)
says:

> For e, E, f, F, g, and G conversions, if the number of significant decimal
> digits is at most DECIMAL_DIG, then the result should be correctly rounded.
> If the number of significant decimal digits is more than DECIMAL_DIG but the
> source value is exactly representable with DECIMAL_DIG digits, then the
> result should be an exact representation with trailing zeros. Otherwise, the
> source value is bounded by two adjacent decimal strings L<U, both having
> DECIMAL_DIG significant digits; the value of the resultant decimal string D
> should satisfy L ≤ D ≤ U, with the extra stipulation that the error should
> have a correct sign for the current rounding direction.

So either 1) "The C/C++ standards do not require formatting to round correctly
or even be portable.", in which case Python and protobuf are doing it wrong
and somehow this issue was never detected, or 2) The C/C++ standards _do_
require correct rounding, but the case described by ChrisLomont didn't quite
meet the spec requirements to get precision and rounding modes to match across
platforms. Or 3), I don't know what I'm talking about.

~~~
ChrisLomont
The problem is that "correctly rounded" is implementation-defined. You cannot
do it portably, and you cannot query it portably. As such, different
platforms, compilers, etc. do it differently. Thus using formatting for
rounding is inconsistent.

Here's [1] where you can query the current floating-point environment in C:
"Specifics about this type depend on the library implementation".

Here's [2] where you can set some rounding modes in C++: "Additional rounding
modes may be supported by an implementation.". Note that banker's rounding,
which is used to make many scientific calculations more stable (it lowers
errors and drift in accumulated calculations), is not there by default. Many
platforms do use it by default, but it's not in the standard.

You can chase down this rabbit hole. I (and several others) did during the
issue on the last project, and got to the point where it was well known in
numerics circles that this is not a well-defined process in C/C++. If it were,
printing and parsing would round-trip, and they did not before a recent C++
addition; even now round-tripping is _only_ guaranteed in a special case.

[1]
[http://www.cplusplus.com/reference/cfenv/fenv_t/](http://www.cplusplus.com/reference/cfenv/fenv_t/)

[2]
[https://en.cppreference.com/w/cpp/numeric/fenv/FE_round](https://en.cppreference.com/w/cpp/numeric/fenv/FE_round)

~~~
eesmith
Thank you for the clarification!

------
fs111
Apple's libc used to shell out to perl in a function:
[https://github.com/Apple-FOSS-Mirror/Libc/blob/2ca2ae7464771...](https://github.com/Apple-FOSS-Mirror/Libc/blob/2ca2ae74647714acfc18674c3114b1a5d3325d7d/gen/wordexp.c#L192)

~~~
stefan_
I thought this is what the Unix philosophy is supposed to be all about.

(Realistically, calling wordexp should just abort the program. Now I actually
want to make a hacked up musl that aborts in all the various "libc functions
no one should ever use" and see how far I get into a Ubuntu boot..)

~~~
f00zz
Would be pretty awesome if Perl called wordexp(3) somewhere along this code
path

~~~
microtherion
I seem to recall that perl used to shell out to /bin/sh for some related
task...

~~~
microtherion
Yep, still there in the latest perl5: Perl_start_glob
[https://github.com/Perl/perl5/blob/blead/doio.c](https://github.com/Perl/perl5/blob/blead/doio.c)

It's somewhat messier than I remember, because it uses csh as the first choice
and falls back to sh.

------
Noe2097
Well, the problem is precisely that rounding, as it is generally conceived, is
expressed in base 10, since we generally conceive of numbers, including
floating-point ones, in base 10. Yet at the lowest level the representation of
numbers is in base 2, including floating-point ones. It would be imaginable,
more correct, and more efficient to perform rounding (or flooring or ceiling,
for that matter) in base 2, but it would be that much more difficult to
comprehend when dealing with non-integers in code. Rounding in base 10 needs
some form of conversion anyway; going via the string is one way that is, at
least, readable (pun intended).
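The base mismatch is easy to see directly in Python:

```python
from fractions import Fraction

# The double nearest to 0.1 is a ratio with a power-of-two
# denominator, not 1/10.
assert Fraction(0.1) == Fraction(3602879701896397, 2**55)
assert Fraction(0.1) != Fraction(1, 10)

# The exact base-2 representation is visible via hex():
assert (0.1).hex() == "0x1.999999999999ap-4"
```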

------
bhouston
In my experience there are few things slower than float-to-string and
string-to-float conversion. And it seems so unnecessary.

I always implemented round to a specific digit based on the built-in
roundss/roundsd functions which are native x86-64 assembler instructions (i.e.
[https://www.felixcloutier.com/x86/roundsd](https://www.felixcloutier.com/x86/roundsd)).

I do not understand why this would not be preferable to the string method.

    
    
        float round( float x, int digits, int base ) {
            float factor = pow( base, digits );
            return roundss( x * factor ) / factor;
        }
    

I guess this has the effect of not working for numbers near the edge of its
range.

One could check this and fall back to the string method. Or alternatively use
higher precision doubles internally:

    
    
        float round( float x, int digits, int base ) {
            double factor = pow( base, digits );
            return (float)( roundsd( x * factor ) / factor );
        }
    

But then what do you do if you have a double rounded and want to maintain all
precision? I think there is likely some way to do that by somehow unpacking
the double into a manual mantissa and exponent each of which are doubles and
doing this manually - or maybe using some type of float128 library
([https://www.boost.org/doc/libs/1_63_0/libs/multiprecision/do...](https://www.boost.org/doc/libs/1_63_0/libs/multiprecision/doc/html/boost_multiprecision/tut/floats/float128.html))...

But changing this implementation now could cause slight differences, and if
someone was rounding and then hashing, this type of change could be horrible
if not behind some type of opt-in.

~~~
StephanTLavavej
float to string is incredibly fast now - look at Ulf Adams’ Ryu and Ryu Printf
algorithms, which I’ve used to implement C++17 <charconv> in Visual Studio
2019 (16.2 has everything implemented except general precision; the upcoming
16.4 release adds that).

I don’t know of truly fast algorithms for string to float, although I improved
upon our CRT’s performance by 40%.

~~~
ChrisLomont
Formatting is much faster than before, but still terribly slow compared to
simply rounding numbers using math and floor and ceiling appropriately.

Ryu is more than 100x slower than something like

    
    
        rval = floor(100*val+0.5)/100.0
    

(which is not quite right due to numerical issues, but close, and illustrates
the idea).
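One concrete case where the fast formula and correct rounding disagree: 0.15 is stored slightly below 0.15, but multiplying by 10 rounds the product up to exactly 1.5.

```python
import math

# 0.15 is stored as 0.1499999999999999944..., yet scaling by 10
# rounds the product to exactly 1.5, so the formula rounds up.
assert 0.15 * 10 == 1.5
naive = math.floor(0.15 * 10 + 0.5) / 10
assert naive == 0.2

# Python's round() sees the true stored value, which is below
# the halfway point, and rounds down.
assert round(0.15, 1) == 0.1
```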

Formatting, to get a rounded float, is terribly slow.

------
bishala
Related thread on Twitter
[https://twitter.com/whitequark/status/1164395585056604160](https://twitter.com/whitequark/status/1164395585056604160)

------
shellac
OpenJDK BigDecimal::doubleValue() goes via a string in certain situations
[https://github.com/openjdk/jdk/blob/master/src/java.base/sha...](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/math/BigDecimal.java#L3667)

~~~
SloopJon
I just ran into a similar booby trap the other day: whereas
BigDecimal::BigDecimal(double) does the full decimal expansion,
BigDecimal::valueOf(double) goes through Double::toString(double), which is
generally a lot fewer digits.

------
latchkey
[https://0.30000000000000004.com/](https://0.30000000000000004.com/)

~~~
ilovepeppapig
The worst way to explain something is to begin with "It's actually pretty
simple."

~~~
The_rationalist
Not _always_! Let's say someone begins to explain a topic to me and uses, say,
two words that I didn't know before, e.g. supervenience and cogency.

Humans are generally afraid of new words (especially weird-sounding ones) and
will often assume that the subject is complex and might intimidate them.

But unknown words can have extremely simple meanings, or be synonyms of
already known words.

By asserting "It's actually pretty simple", you give them confidence that
there's no reason to be afraid of the topic or of the words.

~~~
Zancarius
There's much truth in this.

Whenever I've helped older people with technology they've never used before (a
new tablet or similar), if I started off with any suggestion that it's less
than simple, they'll almost certainly frame the problem scope in their mind as
difficult, and give up, because they're already exhibiting some animosity
toward learning a new thing.

If instead you phrase it as "this is really easy, let me show you how...", you
short-circuit this process by framing their expectations differently, and that
little bit of extra confidence ("this is easy") can help them through the
learning process.

I've found you can't simply show them, either. It's almost better to say "It's
easy" and _then_ go through the process, because it's absolutely necessary to
establish expectations first. They're already afraid of it (it's new), so
doing _something_ to get their guard down can go a long way toward helping
them explore on their own. I tried this experiment with my mother, and some
weeks later she'd have a problem and discover the solution herself
specifically _because_ she was convinced it was easy to do. This can (and
will) backfire if you're not careful about how you do it, but I've had far
more success using this tool than other techniques individually (e.g. writing
down instructions).

This doesn't broadly apply to areas outside education and support (or even to
all areas in education), but for simple things that people may express an
irrational fear over, it works and it works well. A good teacher will use this
technique successfully with their students, so if you're teaching someone, use
it!

------
zelly
This is what we are promised will make trucks drive themselves and usher in
the 4th industrial revolution.

~~~
StreamBright
I am not sure people understand what kind of hellhole IT in general is.

~~~
Scarblac
There is an XKCD for that:
[https://www.xkcd.com/2030/](https://www.xkcd.com/2030/)

------
analog31
My quick impression is that the choice of a rounding algorithm is relative to
the purpose that it serves. For instance, floor(x + 0.5) is good enough in
many applications.

In some cases, rounding is performed for the primary purpose of displaying a
number as a string, in which case it can't be any less complicated than the
string conversion function itself.

~~~
raphlinus
Fun fact: floor(x + 0.5) rounds 0.49999997 to 1.0 (this is 32 bit floats, the
same principle applies to 64). Most libraries have slower than ideal round
conversion because of historical dross; modern chips have a very fast SIMD
round instruction but its behavior doesn't exactly match libc round. See
[https://github.com/rust-lang/rust/issues/55107](https://github.com/rust-lang/rust/issues/55107)
for a deeper discussion.

~~~
dwheeler
I just tried this on Python3 on a 64-bit x86 system:

    
    
        import math
        x = 0.49999999999999994
        print(x-0.5)
        print(math.floor(x+0.5))
    

I got these printouts:

    
    
        -5.551115123125783e-17
        1
    

So yes, something less than 1/2, with 1/2 added to it, has a floor of 1 in
floating point math.

Yet another reminder that floating point calculations are _approximations_,
and not exact.

~~~
masklinn
> Yet another reminder that floating point calculations are approximations,
> and not exact.

floating-point _calculations_ are absolutely exact. What's not is conversions
between decimals and floating-point.

~~~
murkle
> floating-point calculations are absolutely exact

Are you sure about that? What about 1.00000000000001^2 (using eg 64 bit
double)?

~~~
lifthrasiir
Maybe the parent wanted to say that the result is detetministic and not far
too off from the true value (in fact, as accurate as possible given the
format). In the other words FP is exact but abides by a slight different
calculation rule involving rounding.

~~~
dwheeler
Floating point is usually deterministic, but from a math point of view it is
inexact (in most cases it only approximates the exact answer). In many cases
that is fine, which is why it is used, but it is important to remember that.

------
jancsika
A bit on topic...

Is there a phrase for the ratio between the frequency of an apparent archetype
of a bug/feature and the real-world occurrences of said bug/feature? If not
then perhaps the "Fudderson-Hypeman ratio" in honor of its namesakes.

For example, I'm sure every C programmer on here has their favored way to
quickly demo what bugs may come from C's null-delimited strings. But even
though C programmers are quick to cite that deficiency, I'd bet there's a
greater occurrence of C string bugs in the wild. Thus we get a relatively low
Fudderson-Hypeman ratio.

On the other hand: "0.1 + 0.2 != 0.3"? I'm just thinking back through the
mailing list and issue tracker for a realtime DSP environment that uses
single-precision floats exclusively as the numeric data type. My first
approximation is that there are significantly more didactic quotes of that
example than reports of problems due to the class of bugs that archetype
represents.

Does anyone have some real-world data to trump my rank speculation? (Keep in
mind that simply replying with more didactic examples will raise the
Fudderson-Hypeman ratio.)

~~~
pvg
What's the 'class' of the second thing? Numerics/fp bugs of all stripes are
super common. Just often less crashy or noticeable.

------
d--b
Note that there is a fallback version that doesn't use strings. This is
definitely something that's been thought through.

------
ericfrederich
I was looking once at Python and Redis and how numbers get stored. I remember
Python would in the end send Redis some strings. I dove pretty deep and found
that Python floats when turned into a string and then back are exactly the
same float.

I remember even writing a program that tested every possible floating point
number (must have only been 32 bit). I think I used ctypes and interpreted
every binary combination of 32 bits as a float, turned it into a string, then
back and checked equality. A lot of them were NaN.
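A minimal sketch of that kind of check, sampled here rather than run over all 2^32 bit patterns (the `roundtrips` helper is my own, not from the original program):

```python
import math
import struct

def roundtrips(bits: int) -> bool:
    """Reinterpret a 32-bit pattern as a float32, format it, parse it
    back, and check the bits survive (NaNs are skipped: NaN != NaN)."""
    (x,) = struct.unpack("<f", struct.pack("<I", bits))
    if math.isnan(x):
        return True
    y = float(repr(x))
    # Re-narrow to float32 before comparing bit patterns.
    return struct.pack("<f", x) == struct.pack("<f", y)

# Spot-check a few patterns instead of all 2**32:
# zero, 1.0, the largest finite float32, the smallest subnormal.
assert all(roundtrips(b) for b in (0x00000000, 0x3F800000, 0x7F7FFFFF, 0x00000001))
```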

~~~
dagw
_A lot of them were NaN._

I seem to recall that ~0.5% of the IEEE 32 bit float space is NaN.

~~~
ekimekim
A NaN is any value where the 8-bit (for 32-bit floats) exponent is all 1s and
the 23-bit mantissa is nonzero; an all-1s exponent with a zero mantissa gives
+/-inf. So a quick approximation is that 1/256 ~= 0.39% of the space has an
all-1s exponent.

That means there are 24 bits (the sign plus the mantissa) that we can change
while still being either a NaN or an inf. But two of those values are infs, so
we need to remove them. Divide that by the entire range and we have (2^24 - 2)
/ 2^32 = 16777214/4294967296, or about 0.39062%.
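The exact count can be computed, and spot-checked, directly:

```python
import math
import struct

# All-1s exponent patterns: 2 sign values x 2**23 mantissa values.
all_ones_exponent = 2 * 2**23
nans = all_ones_exponent - 2          # remove +inf and -inf
assert nans == 16777214
assert abs(nans / 2**32 - 0.0039062) < 1e-6   # about 0.39% of the space

# Spot-check: the canonical quiet-NaN bit pattern really is a NaN.
(quiet_nan,) = struct.unpack("<f", struct.pack("<I", 0x7FC00000))
assert math.isnan(quiet_nan)
```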

------
deckar01
`blob/master` isn't a suitable permalink. Use the first few letters of the
commit hash so the line numbers and code are still relevant when this file
inevitably gets modified.

------
ChrisSD
Maybe I'm missing something but what's wrong with rounding floats this way?

~~~
ben509
Rounding a number is, in the common case, multiplying it by some base,
truncating to an integer, and dividing by the base. You do have to handle
extremely high exponents, but even the logic for that is not complex.

Example of implementing it the sane way:
[https://github.com/numpy/numpy/blob/75ea05fc0af60c685e6c071d...](https://github.com/numpy/numpy/blob/75ea05fc0af60c685e6c071dae1f04c489a3ce93/numpy/core/src/multiarray/calculation.c#L667)

Every step of the string-based function is complex and expensive; printing a
float as a decimal in particular is _very_ complex. And round is routinely
used in tight loops.
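For comparison, a minimal Python sketch of the multiply/truncate/divide approach; it is fast but inherits the tie-breaking and magnitude problems discussed elsewhere in the thread:

```python
import math

def fast_round(x: float, places: int) -> float:
    """Round half away from zero by scaling and truncating.
    Not correctly rounded: the scaling itself can round, and very
    large |x| would need the high-exponent handling mentioned above."""
    factor = 10.0 ** places
    return math.floor(abs(x) * factor + 0.5) / factor * math.copysign(1.0, x)

assert fast_round(2.344, 2) == 2.34
assert fast_round(-2.346, 2) == -2.35
# Disagrees with correct rounding on values like 0.15:
assert fast_round(0.15, 1) == 0.2 and round(0.15, 1) == 0.1
```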

~~~
Armisael16
How does truncating a positive number ever round up?

~~~
NohatCoder
Add 0.5 before rounding towards negative infinity, and you'll get standard
rounding.

------
dahart
Not entirely unlike how one of the better ways to deep-copy a JSON object in
Javascript is json.parse(json.stringify(obj))
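The same trick exists in Python; `copy.deepcopy` is the proper tool, but the serialization round-trip is a common shortcut for plain JSON-like data:

```python
import json

original = {"a": [1, 2, {"b": 3.5}]}
clone = json.loads(json.dumps(original))

assert clone == original
clone["a"][2]["b"] = 99                # mutating the clone...
assert original["a"][2]["b"] == 3.5    # ...leaves the original untouched
```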

------
kstenerud
This is where decimal floating point really shines. Since the exponential
portion is base 10, it's trivially easy to round the mantissa.
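Python's `decimal` module illustrates the point: with a base-10 representation, rounding is just a matter of dropping digits (note the string literal, which keeps the value exact):

```python
from decimal import Decimal, ROUND_HALF_EVEN

d = Decimal("2.675")                  # exact, unlike the binary float 2.675
q = d.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
assert q == Decimal("2.68")           # a true tie, broken to even

# The binary float, by contrast, sits slightly below 2.675:
assert round(2.675, 2) == 2.67
```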

The only silly part of ieee754 2008 is the fact that they specified two
representations (DPD, championed by IBM, and BID, championed by Intel) with no
way to tell them apart.

------
science404
Misleading title is misleading...

 _CPython_ rounds float values by converting them to string and then back

~~~
duckerude
PyPy does it too:
[https://bitbucket.org/pypy/pypy/src/2fc0a29748362f2a4b99ab57...](https://bitbucket.org/pypy/pypy/src/2fc0a29748362f2a4b99ab57aa551fc906878a6b/rpython/rlib/rfloat.py#lines-113)

Jython instead uses BigDecimal::doubleValue:
[https://github.com/jythontools/jython/blob/b9ff520f4f6523120...](https://github.com/jythontools/jython/blob/b9ff520f4f65231209d5200c22724516a72e75f2/src/org/python/core/util/ExtraMath.java#L39)

But as another comment noted, BigDecimal::doubleValue can pull a similar
trick:
[https://news.ycombinator.com/item?id=20818586](https://news.ycombinator.com/item?id=20818586)

------
Jenz
I dunno, how efficient is this?

~~~
ben509
Compared to base=10^places, multiply, truncate, divide? Horribly inefficient.

~~~
dagw
Sure, but their approach is on the other hand more correct. All numerical code
reaches a point where you have to balance performance vs. correctness, and
here cpython has chosen correctness over speed.

~~~
ben509
They're creating a sequence of digits and then truncating. If you want to
replicate that precisely, you could use an accumulator and a loop to do the
same thing. At least then you could break early.

------
acoye
Another pragmatic aspect of Python as I see it.

------
seamyb88
Am I the only one grimacing at the lack of curlies around if/else scope? Just
good practice!

~~~
bhd_movie
In case you didn't know, Python doesn't use curly braces to block scope.
They're pretty much only used for dictionaries and sets.

~~~
seamyb88
In case you didn't know, we're talking about c code.

