
Herbie: Automatically Improving Floating Point Accuracy - lelf
https://herbie.uwplse.org/
======
vii
They made a related project - [http://herbgrind.ucsd.edu/](http://herbgrind.ucsd.edu/) - a tool that can
automatically find floating-point issues in real programs. You can also
annotate the region you're interested in with Herbgrind on/off controls.

Unfortunately, once you know you have a floating-point problem, tracking it
down and fixing it may be easy; if you don't realise you have one, you won't
use these tools and the program will just do the wrong thing :)

~~~
pavpanchekha
Very true. Floating point bugs are hard to find, because there's often no way
to compute a more accurate answer. Herbgrind was our attempt at helping track
down a _root cause_ once you knew floating point was the problem---that might
be the case if, for example, you changed from double to single precision and
suddenly the result was way off. But Herbgrind is pretty high overhead; you'd
want some other tool to tell you that floating point was a problem to begin
with.
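
(A minimal sketch of that double-to-single failure mode, not from the thread; the data and names are made up. The textbook one-pass variance formula survives in float64 here but collapses in float32, because E[x^2] and E[x]^2 agree in their leading digits and float32 only carries about seven of them.)

    import numpy as np

    rng = np.random.default_rng(0)
    x = 10_000.0 + rng.standard_normal(100_000)   # true variance is about 1

    def naive_var(a):
        # E[a^2] - E[a]^2: catastrophic cancellation when the mean dwarfs the spread
        return (a * a).mean() - a.mean() ** 2

    print(naive_var(x))                     # float64: close to 1.0
    print(naive_var(x.astype(np.float32)))  # float32: wildly off, can even come out negative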

------
contravariant
It doesn't seem to be doing too well on log(1+x). It does correctly identify
that the Taylor expansion is more accurate for small x (although in my opinion
it switches back to log(1+x) a bit too soon for positive x); however, it
doesn't seem to take into account the singularity at x=-1 and keeps using the
Taylor series for negative values all the way down to x=-1, leading to
horrendous accuracy.
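
For anyone following along at home, a quick illustration of both halves of that (a hypothetical sketch, not Herbie output): the naive form loses digits for tiny x, and the truncated series is hopeless near x = -1.

    import math

    x = 1e-10
    print(math.log(1 + x))   # 1 + x rounds first, costing roughly 8 significant digits
    print(x - x*x/2)         # truncated Taylor series: fine for tiny x
    print(math.log1p(x))     # library routine that sidesteps the rounding

    x = -0.9
    print(x - x*x/2)         # -1.305: the series is way off this close to -1
    print(math.log1p(x))     # log(0.1) is about -2.3026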

~~~
pavpanchekha
That's right. Herbie's error measure is average error over the whole range of
inputs; one way to think about that is that it's really measuring the area of
inputs that have reasonable error. There aren't many floating-point values
near -1, so Herbie doesn't consider the error there that important. To be
precise, Herbie tries to avoid overfitting by "charging" itself 1 bit of
accuracy for every branch; adding a branch for, say, -1 <= x <= -0.8 isn't
worth it. If you set a precondition, say -1 <= x <= -0.8, Herbie will instead
focus on that range.

Also note that the Herbie web demo has some options set that make it fast (to
handle load) at the cost of lower accuracy. For example, if you download and
install it yourself, you can turn on support for the special numeric functions
(like log1p) or increase the number of search iterations done.

~~~
contravariant
I see. I guess it ends up doing something a bit unexpected because the
distribution it's sampling from is quite different from the numbers people
often deal with (also, unlike the example on the main page, it can apparently
fail to find a function that is more accurate for all inputs).

Also, this is a bit of a special case because it's easy to show that x - x^2/2
is within 10^-15 of the true value provided abs(x) is within 10^-6 or so, so
it's easy to figure out how well Herbie is doing.
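
(That bound is easy to check against a high-precision reference; here's a small sketch using Python's decimal module. The truncation error of the series is roughly |x|^3 / 3, so it sits far below 10^-15 once abs(x) is around 10^-6.)

    from decimal import Decimal, getcontext

    getcontext().prec = 50          # plenty of guard digits for the reference value

    for x in (Decimal("1e-6"), Decimal("-1e-6"), Decimal("2e-7")):
        exact = (1 + x).ln()
        approx = x - x * x / 2
        print(x, abs(approx - exact))   # on the order of 1e-19 or smaller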

~~~
pavpanchekha
That's right, the implicit distribution is uniform _over floating-point
values_, so for example about a quarter of samples are between -1e-150 and
1e-150.
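
A back-of-the-envelope check of that figure, treating "uniform over floating-point values" as uniform over 64-bit patterns with NaN/inf skipped (my simplification, not necessarily Herbie's exact sampler):

    import random, struct

    random.seed(0)
    hits = finite = 0
    for _ in range(200_000):
        bits = random.getrandbits(64)
        v = struct.unpack("<d", bits.to_bytes(8, "little"))[0]
        if v != v or v in (float("inf"), float("-inf")):
            continue                     # skip NaN and infinities
        finite += 1
        hits += abs(v) < 1e-150
    print(hits / finite)                 # comes out roughly 0.25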

------
jonnycomputer
The natural application of this is in a code-inspection plugin for an IDE.
Pretty cool tool!

~~~
pavpanchekha
There used to be GHC and Rust plugins. I don't know much about building IDE
plugins, but I agree it'd be a great use of Herbie.

------
ImaCake
This seems like a great tool for certain statistical applications. I am
learning a bit about the controversy surrounding extremely small p-values in
Genome-Wide Association Studies (think a million-_n_ ANOVA) and machine
precision. I guess tools like this would be very useful to make sure your
statistical tests can more accurately calculate p-values approaching the
limits of machine precision.

~~~
contravariant
If your p-values are approaching machine precision then you don't need better
floating point precision you need to be using the log-likelihood (admittedly
for gaussian samples this is the variance which requires some careful
handling).

Also if your p-values are approaching machine precision, then maybe p-values
aren't the most meaningful metric.
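
For what it's worth, a sketch of the log-space route using SciPy (assuming a plain z-test for illustration): the tail probability itself underflows double precision long before its logarithm becomes awkward.

    from scipy import stats

    z = 40.0                        # an absurdly extreme test statistic
    print(1 - stats.norm.cdf(z))    # 0.0: cdf(z) already rounds to 1
    print(stats.norm.sf(z))         # 0.0: the true p (~1e-349) underflows doubles
    print(stats.norm.logsf(z))      # about -805 in natural log, perfectly representable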

------
b34r
I wouldn’t feel remotely comfortable using this on an existing project unless
it had an iron-clad test suite.

~~~
pavpanchekha
I totally agree, and in fact in big scientific code this is an ongoing
reproducibility issue---different platforms (like GPUs) can have subtly
different floating-point behavior, and how do you know the results hold up?

That said, your floating-point code probably didn't come with a numerical
analysis to begin with, so how do you know it's better than the replacement?
(Well... Herbie does dumb stuff sometimes, so do use with care.)

------
mrnuclear
I think it would be interesting to re-run experiments / data analyses (ones
sensitive to FP correctness) to see what holds up. I imagine that would be a
pain though; reproducibility isn’t usually easy.

------
bionhoward
We could have a field day applying this to all the PyTorch and TensorFlow
Probability distributions.

