An interesting tool to help preserve floating point precision is Herbie (https://herbie.uwplse.org/), which takes an expression as input and outputs an optimized version of it that will retain precision.
For those curious, this tool uses arbitrary precision floating-point arithmetic (up to 3000 bits) to sample the input space and measure the error of different implementations of the function, then some clever numerical program synthesis to generate candidates and search for the most accurate implementation.
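To make that concrete, here's a toy version of that measurement loop (not Herbie itself, just a sketch using mpmath as the high-precision reference and the classic sqrt(x+1) - sqrt(x) cancellation example):

```python
import math, random
from mpmath import mp, mpf, sqrt as mpsqrt

# Sample some inputs, evaluate two mathematically equivalent forms in double
# precision, and score them against a high-precision reference -- the same
# basic idea Herbie uses to compare candidate rewrites.
mp.dps = 100                                      # high-precision "ground truth"

def naive(x):     return math.sqrt(x + 1) - math.sqrt(x)          # cancels badly
def rewritten(x): return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))  # equivalent, stable
def reference(x): return mpsqrt(mpf(x) + 1) - mpsqrt(mpf(x))

random.seed(0)
for x in (random.uniform(1e10, 1e15) for _ in range(3)):
    ref = reference(x)
    err_naive = float(abs(naive(x) - ref))
    err_rewr  = float(abs(rewritten(x) - ref))
    print(f"x={x:.3e}  naive error={err_naive:.1e}  rewritten error={err_rewr:.1e}")
```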
It is: there is an entire field called numerical analysis about this. But instead of spending a year learning numerical analysis, throwing the expression at a tool that spits out something pretty decent after 15 minutes is probably a better ROI.
This tool is a really cool mix of hard math and soft reality. You need some pretty advanced machinery to symbolically manipulate expressions like this, and then it takes the results and throws them at the wall to see what sticks best.
There are a few edge cases in geospatial computational geometry that require greater than quad precision to correctly compute a double-precision result. This fact is sometimes used as a litmus test for implementation correctness: many double-precision implementations don't have a conditional escape for these edge cases that switches to arbitrary-precision algorithms, and that's easy to check for.
“Robustness” of results is a serious problem for any kind of computational geometry, because computational geometry algorithms generally only work when various assumed invariants (based on ideal arithmetic) actually hold, but those can easily be violated by arithmetic with rounding. When invariants are violated, the result can be an infinite loop, a program crash, or a comically incorrect result.
https://www.cs.cmu.edu/~quake/robust.html is some famous work from the mid 1990s showing how to achieve provably correct results in a computationally efficient way (on a CPU).
Conversely, if you use double-precision floating point arithmetic near “null island”, the lat/lon coordinate system becomes more precise than the Planck length.
Hence why floating-point is generally a bad basis for coordinate systems. Minecraft is another example, where precision loss results in strange effects far away from the origin.
There is a flipside to this: e.g. glibc will switch to an arbitrary-precision internal implementation for some operations that would otherwise lose a little precision, and as a result those operations run 1000x slower on some inputs.
Not a good thing when you're attempting to do realtime signal processing.
I once wrote a "solar system simulator" in Blender to generate images of solar eclipses from specific locations on Earth. I modelled everything in metres, literally placing the Sun at the origin, the Earth at 1 AU, and the Moon near it. It turns out Blender uses single precision, which at the Sun-Earth distance isn't precise enough to place the Moon correctly relative to the Earth, so all my renders were wrong. I divided all the sizes by 10 and it worked perfectly.
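For a rough sense of scale (my numbers, assuming ~1.5e11 m for 1 AU):

```python
import numpy as np

# At ~1 AU from the origin, adjacent single-precision values are about 16 km
# apart, so positions in the Earth-Moon system can't be placed accurately at
# that offset; doubles bring the spacing down to a fraction of a millimetre.
au = np.float32(1.496e11)                   # 1 AU in metres
print(np.spacing(au))                       # 16384.0
print(np.spacing(np.float64(1.496e11)))     # ~3.05e-05
```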
As a counterargument: in my field of MD (molecular dynamics), it was eventually shown that most force field terms can be computed in single precision, with the exception of the virial term (https://en.wikipedia.org/wiki/Virial_stress), which has to be computed in double precision.
Simple but clever! I once spent a month debugging precision issues in custom Python FEA code. Only after reading your post 10 years later do I realize I could have sidestepped the whole issue with a unit change. Doh XD!
There was an article here maybe about a month ago about testing Maya vs CAD vs Blender or something. Had to do with a litmus test where you'd create a screw and inspect how the threads terminated at the tip.
The first thing they do is use fairly low-order (low-teens) polynomials and accept some loss of precision in the model representation, based on real-world needs. This means some systems have scale preferences. Others may limit overall model size. Still others use a tolerance based on model size: dynamic tolerancing.
Basically, these systems need to know whether points are coincident, lines are collinear, curves are "cocurvular" (there is a better way to say that, but I coined that word with a friend in the 90's and it makes sense, so...), and so on.
Typically, there is a tolerance because there is always error due to computation limits, and historically due to data size limits as well.
In the 80's and 90's, solid models were done in megabytes of RAM, including the OS. Solidworks ran on Windows 98, for example. A typical setup would be some 32 to 64 MB of RAM. Larger models were not possible, nor were extreme levels of detail, basically high surface count. Surface count is a good metric for how "heavy" a model is in terms of system resources, and its scale is a good metric for what tolerance is needed.
There were times where it made sense to draw a model at 10X or even 100X scale because the actual model size was close to the system's lower tolerance limit. This can still be true today due to legacy geometry kernel compatibility.
Early on, geometry kernels had a lot of limitations. Over time, these have been addressed and it's all down to a bazillion edge and corner cases. Millions and millions of man hours are in those code bodies. It's kind of like semiconductors and process nodes. Catching up is a seriously huge and expensive effort. It's generally not done and the kernels are licensed instead.
Parasolid, ACIS, and OpenCascade are examples of kernels that can be licensed; others are in-house, such as the PTC and CATIA kernels.
How these kernels handle geometry basically governs the CAD system and its limits.
But I digress.
Getting back to CAD.
Imprecise representations were necessary, again due to the CPU and data sizes possible early on. People were making manufacturable models on Pentium 90 machines with megabytes of RAM, and today we are doing it on multi-core machines that approach 5 GHz and have gigabytes of RAM.
The geometry kernels are not multi-threaded, though a CAD system can employ them in multi-threaded fashion depending on how they manage model history. They can also process parts concurrently, again all depending on what is linked to and dependent on what else.
Geometry kernels that cannot execute concurrently / in parallel are one reason why single-threaded CPU performance remains important, and they are a fundamental CAD-system limit on model complexity. This is why, for example, very large systems, say airplanes, rockets, cars, are designed in-place rather than as fully constraint-driven assemblies.
Flat models are the easiest to perform concurrent processing on. Hierarchical models are the worst, and it's all down to the dependency trees and model history being largely sequential, forward create from one or a few roots.
There is another difference too, and this is one way systems like Maya and Blender differ from, say, NX, CATIA, or Solidworks. It's all in the precision of the data representation. Making pictures requires a lot less than running a mill along a surface does, for example. Entertainment-centric software typically uses a lighter data representation and/or considerably looser tolerances. These models, imported into a higher-precision system, will basically end up non-manifold and will often require a lot of rework and/or decision making to become manufacturable.
On that note, a manifold model is one where each edge is shared by two and only two surfaces. This rule evaluates down to a real world representation for the most part.
A surface with a free edge means the model has one or more surfaces that do not contribute to a closed volume. That surface is an infinitely thin representation that has two sides. Not possible in the physical world. Yes, we have 2D materials, but they do have a thickness at the atomic level, and of course have sides because of that.
An edge that defines more than two surface boundaries implies an infinitely sharp edge or point of contact between volumes and/or other surfaces. Again, at the atomic level there is always a radius, and there are no sharp edges smaller than the atoms matter is composed of.
Neither of these cases are manufacturable due to the properties of matter. There must be radii present, or more surfaces to define a manifold model.
In addition, there are some simple differences to account for, and these are often ignored in the modeling phase of design because they will end up present in the manufacturing stage and, if not critical, are not important to represent properly in the model.
One is sharp edges. Think of a cube. The truth is a manufactured cube would have some radii present on all edges, and vertices are theoretical constructions that do not exist in the real model at all. Curved edges blend into one another, leaving the vertex in space near the model at the theoretical intersection point of the theoretical edges.
Often this is ignored because it's simply not all that important. In some cases, say injection molding, it can't be ignored and all the radii need to be present on the model to allow for machining to happen properly, unless somehow the model tool can be machined with a ball end mill or something that will yield the radii as an artifact of manufacturing.
That's the big one.
Another one we often ignore is reference constructs, like partitions that subdivide a volume. These may be used to delineate material boundaries, or regions of a model that get special consideration or are needed for inspections, or similar kinds of tasks. A partition itself is manifold, but it entirely or partially shares volume with another manifold construct.
We often ignore these too, though sometimes an operator has to sort through model data in various ways to find and isolate the primary model or apply boolean operations to the partitions to yield a useful manifold model.
The numerical precision comes into play big here. All those surfaces are defined. Their edges are defined again with curves that must lie "on" the surface, and mathematically they just don't, so some deviation is allowable. That's the tolerance compensation strategy I wrote about earlier in this comment.
Say we are machining something small, something that would fit into a cubic centimeter or inch. The system tolerance would need to be 0.00001" for an inch representation. The machining itself would operate to 0.0001".
On some systems, locating a model like this far away from the origin would see a loss in precision not present near the origin. Many CAD systems employ a coordinate system hierarchy so that all models can be created with local part origins near the model, ideally positioned so features can be mirrored and/or patterned about the primary datums present at the origin: axes, planes, and the point of origin itself.
This allows for very large, but also very detailed, models, such as an airplane. It also allows for concurrent design, as each subsystem can be assigned a coordinate system, and those teams can build relative to it, confident that as long as they remain within their design volume their work will fit nicely into the parent models, and all of that into the full model.
On the other extreme, say we are modeling the solar system. On systems with dynamic tolerancing, the precision might be on the order of feet or meters. On others, it's not possible to do at scale, and the model would need a scale factor to fit into the model area allowed by the overall system precision.
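A quick illustration of that precision-vs-distance tradeoff (just double-precision spacing at different coordinate magnitudes; the units are whatever your model units are):

```python
import numpy as np

# The absolute gap between adjacent doubles grows with distance from the
# origin, so a fixed modelling tolerance effectively limits how far from the
# origin (or how large) a model can be.
for coord in [1.0, 1_000.0, 1_000_000.0, 1.5e11]:
    print(f"coordinate {coord:>12.4g}: spacing {np.spacing(coord):.3e}")
```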
Finally, going back to a difference between say, MAYA and NX, there is the amount of data given to define a surface. An entertainment model might be light, and some data may not even be present as it can be inferred or is implied in some way.
Take a two-dimensional surface that has some curvature. In one dimension there is 10X the data given. The surface will look fine on the screen and when rendered, but when a surface like this is imported into a system with manufacturing-grade precision, some operations will not be possible, or must be made with a stated, much looser tolerance than otherwise used. And doing that may well make other operations, such as milling toolpaths, imprecise and/or erroneous, as in digging into material stock rather than making a smooth or appropriately sized cut.
The solution for that, BTW is to section the surface in various ways. The section curves created along the dimension that does not have much data will be imprecise. Guesses essentially. That's OK, because the data really isn't there to begin with.
But, doing that does lock in the guess. Basically, one commits to that data to continue on and manufacture something to a spec that is manufacturable, despite the original authority data being incomplete.
Now a new surface can be created with more data in both dimensions. Replace the old one with the new one, and now all operations can happen, intersections, trims, section curves, and the like will function properly and at standard tolerance.
The same thing can be done with erroneous surfaces containing too much data, or data noisy enough to drive the surface complexity into folds and other problem states. The basic solution is the same: section it and obtain curves.
Select the curves that make the most sense, toss the outliers, create new surfaces, replace, trim, stitch, and the resulting manifold model will generally be more robust for ongoing operations, be they modeling, manufacturing, simulation, whatever.
Early on, these techniques were manual. And they still are, and can be, depending on how severe the problem cases are. That said, many CAD systems have automated functions to deal with problem surfaces and they are effective a high percentage of the time.
I remember a class where we were shown a method for exact computer arithmetic with integers, rationals (e.g., 1/2), and algebraic numbers (e.g., sqrt(2)). The only thing it couldn't deal with was transcendental numbers (e.g., pi).
I think it worked by representing the numbers by integer matrices, all operations were then matrix operations. Unfortunately, the matrices were gaining dimension when new algebraic numbers got involved, so it probably wasn't useful for anything :-).
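Here's my reconstruction of that kind of representation (a sketch; it may not match exactly what the class showed): encode an algebraic number as the companion matrix of its minimal polynomial, so arithmetic becomes exact integer matrix arithmetic.

```python
import numpy as np

# sqrt(2) is a root of x^2 - 2; its companion matrix "is" sqrt(2) in the
# sense that it satisfies the same polynomial, using only integers.
sqrt2 = np.array([[0, 2],
                  [1, 0]])
assert np.array_equal(sqrt2 @ sqrt2, 2 * np.eye(2, dtype=int))

# Combining two different algebraic numbers uses Kronecker products, so the
# dimension multiplies -- which is exactly the growth the parent describes.
sqrt3 = np.array([[0, 3],
                  [1, 0]])
sqrt6 = np.kron(sqrt2, sqrt3)                  # 4x4 matrix representing sqrt(6)
assert np.array_equal(sqrt6 @ sqrt6, 6 * np.eye(4, dtype=int))
print(sqrt6.shape)                             # (4, 4)
```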
Anyway, it blew me away at the time; it's one of the few things that stuck with me from school.
Yeah, there are a few number systems that can do this kind of thing, ranging from rational implementations where the numerator and denominator are stored, all the way to the computable reals (https://en.wikipedia.org/wiki/Computable_number), which can handle even transcendentals exactly! Fun fact: the built-in calculator on Android actually uses the computable reals (https://dl.acm.org/doi/abs/10.1145/3385412.3386037), so you can do any calculation on it, then ask for any number of digits (keep scrolling the result), and it will always be correct. The only downside is that they're a little too slow to use in a lot of performance-sensitive numerical applications.
If anyone ever runs out of precision using IIR filters: convert to second-order section (SOS). I stumbled upon this trick a few years ago and it's a game changer. No need for FIR.
I found the same. I once computed a high-order (something like 100) low-pass Butterworth using SOS sections as a test and did not notice any instability in the filtered signal. The key was to directly derive the SOS coefficients using bilinear transformation.
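For anyone who wants to try it, a minimal sketch with SciPy (assumed sample rate and cutoff; the order is lower than 100 but already enough that the plain (b, a) form usually misbehaves):

```python
import numpy as np
from scipy import signal

fs, order, cutoff = 48_000.0, 20, 1_000.0

# Design directly as second-order sections (a cascade of biquads).
sos = signal.butter(order, cutoff, btype='low', fs=fs, output='sos')
x = np.random.default_rng(0).standard_normal(48_000)
y = signal.sosfilt(sos, x)
assert np.all(np.isfinite(y))                  # SOS cascade stays well behaved

# The same design collapsed to one (b, a) polynomial pair is typically
# unstable at this order because of coefficient rounding.
b, a = signal.butter(order, cutoff, btype='low', fs=fs, output='ba')
y_ba = signal.lfilter(b, a, x)                 # often blows up to inf/nan
print(np.isfinite(y_ba).all())
```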
Yeah, they also allude to "better ways to do it" both in the main text and the last paragraph:
> The next post will discuss better ways to implement higher precision in numerical calculations.
There are lots of automatic row / column swap techniques, particularly for sparse matrices that ensure you avoid catastrophic cancellation. You wouldn't do this automatically, because most matrices are reasonably conditioned (and figuring out the optimal swaps is NP complete and so on), but there are lots of algorithms to do well enough when you need to.
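The classic textbook example of why the swaps matter, in a few lines (my example, not from the article):

```python
import numpy as np

A = np.array([[1e-20, 1.0],
              [1.0,   1.0]])
b = np.array([1.0, 2.0])        # true solution is approximately x = [1, 1]

# Naive Gaussian elimination using the tiny 1e-20 pivot:
m = A[1, 0] / A[0, 0]           # huge multiplier, 1e20
u22 = A[1, 1] - m * A[0, 1]     # catastrophic cancellation: the 1.0 is lost
x2 = (b[1] - m * b[0]) / u22
x1 = (b[0] - A[0, 1] * x2) / A[0, 0]
print(x1, x2)                   # 0.0 1.0 -- x1 is badly wrong

# Swapping the rows first (partial pivoting) fixes it; np.linalg.solve
# pivots internally and gets ~[1, 1].
print(np.linalg.solve(A, b))
```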
With current posits, they're a clever way to give more bits to precision near 1 and more bits to exponent near 0 and infinity, but if you're not overflowing or underflowing then it's a marginal improvement at best. Then Unum glues intervals on top, right? But you could do intervals with normal floats too. And you still won't get the correct answer, you'll just know the error exploded. Is the giant accumulator feature able to help here?
For the other versions I tend to get completely lost.
Unfortunately, while posits (and unums) are more accurate for some computations, they are still designed around using small numbers of bits to represent numbers, so they have error in computations just like floating point. The argument for posits is mostly that they produce less error on the inputs you care about and more on the ones you don't, but it really depends on your computation; you really have to know what you're doing to use them safely.
>> Numerically stable algorithms exist for such problems, but these occasionally fail leaving us to brute force the calculation with higher precision to minimize floating point rounding errors.
Do algorithm authors really implement two code paths?
One that uses a cost function iterative algorithm to solve the matrix (this is what everybody does), and a second algorithm using the brute force approach the author showed (known not to work).
>> Do algorithm authors really implement two code paths?
I do sometimes. Sometimes I test the result, and if it is not correct, then I use a slower, more numerically stable algorithm. I'm not sure if I have ever written code that repeats the same code with quad precision, but of course that would often work. Sometimes you can compute a correction from the "residual".
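A minimal sketch of the residual-correction idea (iterative refinement, which is how I read that last sentence; the Hilbert matrix here is just a convenient ill-conditioned example):

```python
import numpy as np
from scipy.linalg import hilbert, lu_factor, lu_solve

n = 10
A = hilbert(n)                      # notoriously ill-conditioned
b = np.ones(n)

lu, piv = lu_factor(A)              # factor once, reuse for the refinement solves
x = lu_solve((lu, piv), b)
for _ in range(3):
    r = b - A @ x                   # residual of the current approximation
    x += lu_solve((lu, piv), r)     # solve for a correction and apply it

print(np.linalg.norm(b - A @ x))    # residual ends up down near rounding level
```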
>>One that uses a cost function iterative algorithm to solve the matrix (this is what everybody does), and a second algorithm using the brute force approach the author showed (known not to work).
We do often use Jacobi or Gauss-Seidel, or something similar (esp for computing estimates of A^(-1)b ). In those cases we never revert back to QR. We do sometimes use Jacobi or GS to improve a result obtained from an ill conditioned QR, but rarely.
Many years ago, I would have the code raise an exception if the result is bad so that I would get a call. When I got such a call, I would think about how to make the algorithm more stable.
First thought: the curve in that first plot reflects the pseudospectrum? (https://en.wikipedia.org/wiki/Pseudospectrum) Spectra of nonnormal matrices can be very hard to calculate accurately.
There are algorithms taught in numerical analysis class for re-ordering calculations to maximize precision. Computing in the order you wrote things on paper, or the way you were initially taught, may not be the best.
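The simplest example of that is summation order (my illustration, not from the article):

```python
import math

big = 1e16
tiny = [1.0] * 10_000

naive = big
for t in tiny:
    naive += t                    # each 1.0 is below the rounding step at 1e16

reordered = sum(tiny) + big       # accumulate the small terms first
exact = math.fsum([big] + tiny)   # compensated summation, correctly rounded

print(naive - big)                # 0.0: every 1.0 was rounded away
print(reordered - big)            # 10000.0
print(exact - big)                # 10000.0
```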
I'll admit to being lost in some of the math here, but I wonder, perhaps naively, if a language like Julia, which can do fractions as fractions, would handle it better.
It's really just a floating point precision issue. Even if you represented the input matrix as a rational elementary type the output eigenvalues would be a subset of the reals and calculated as being floating point.
You could get around it if you had a special matrix type where you knew you were definitely going to get rational eigenvalues and you could dispatch on that type.
Thankfully someone figured out a while back that quad precision should be in C++; unfortunately they didn't figure out that the next few dozen precisions should be too. The arbitrary-precision stuff is an alternative in theory, and it would be in practice if any of the hundreds of half-completed projects to convert the unintelligible, unmaintainable, unclear C to the (in this specific case much better) C++ interfaces were complete.
This shows that you can do exact real arithmetic without worrying about round-off errors. But it's expensive.
If you want something more practical (but dodgy), you can use arbitrary-precision arithmetic instead of exact real arithmetic. To that end, there's the Decimal module in the Python standard lib. There's also the BC programming language which is part of POSIX. If you want to do matrix arithmetic, you can use mpmath for Python like they did in this blog post, or Eigen3 for C++ with an arbitrary-precision type. This is dodgy of course because the arithmetic remains inexact, albeit with smaller rounding errors than double-precision floats.
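For example, with the standard-library Decimal module you just dial up the precision for the whole context:

```python
from decimal import Decimal, getcontext

getcontext().prec = 50                  # 50 significant decimal digits

a = Decimal(1) / Decimal(3)
print(a * 3)                            # 0.99999...9 (50 nines): still inexact
print(0.1 + 0.2)                        # 0.30000000000000004 with binary doubles
print(Decimal('0.1') + Decimal('0.2'))  # 0.3 exactly, since the inputs are decimal
```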
If you're playing around with subnormal numbers [0], most modern processors automatically switch to a microcode-based calculation mode to preserve them. However, these paths are very expensive, so they're not desired unless explicitly needed and are generally used as an escape hatch for edge cases.
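A rough way to see that slow path from Python (results vary a lot by CPU, and some environments flush subnormals to zero, in which case you won't see a difference):

```python
import timeit
import numpy as np

normal    = np.full(1_000_000, 1e-300)   # normal doubles
subnormal = np.full(1_000_000, 1e-310)   # below ~2.2e-308, so subnormal

t_norm = timeit.timeit(lambda: normal * 0.5, number=50)
t_sub  = timeit.timeit(lambda: subnormal * 0.5, number=50)
print(f"normal: {t_norm:.3f}s  subnormal: {t_sub:.3f}s")
```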
i'd be curious what matlab would do with finding the eigs for op's matrix. i'd bet a coffee that it (one of the eigs functions) prints a warning. entirely possible it may even switch search methods if precision loss is detected.
People have got to stop suggesting this. It doesn't make any sense and it's usually a waste of time.
First of all, the example in the article is computing eigenvalues. Let me ask you: how do you expect to calculate eigenvalues using rational numbers? As a refresher, if it's been years since you took a look at linear algebra, the eigenvalues of a matrix are the roots of the matrix's characteristic polynomial, and we remember from high-school algebra that the roots of a polynomial are often not rational numbers, even if the coefficients are rational. Therefore, using rational arithmetic here is a complete non-starter. It's just not even possible, right from the very start.
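A two-by-two example makes the point:

```python
import numpy as np
from fractions import Fraction

# Exactly rational (integer, even) entries, characteristic polynomial x^2 - 2,
# so the eigenvalues are +/- sqrt(2): no rational number can represent them.
A = [[Fraction(0), Fraction(2)],
     [Fraction(1), Fraction(0)]]

print(np.sort(np.linalg.eigvals(np.array(A, dtype=float))))  # ~[-1.4142, 1.4142]
```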
However, let's suppose that the particular problem that we are interested in solving does admit a rational solution. What then? Well, our real-world experience tells us that we often need an excessive amount computational resources and memory for exact, rational answers to linear algebra problems even of modest size.
As you perform more computations on rational numbers, the memory required continues to grow.
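You can watch the growth directly with Python's Fraction type (a toy iteration I picked just to show the effect):

```python
from fractions import Fraction

x = Fraction(1, 3)
for step in range(1, 13):
    x = x * (1 - x)                  # exact rational arithmetic, value stays in (0, 1)
    if step % 3 == 0:
        print(step, x.denominator.bit_length(), "bits in the denominator")
# The denominator roughly doubles in size (in bits) every iteration.
```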
Floating point numbers are a particular way to represent (and modestly compress) rational numbers. I'm against rational number libraries because they use non-power-of-2 denominators and are, thus, fundamentally less efficient with respect to the actual cost of calculation.
> Rational arithmetic satisfies (a+b)+c=a+(b+c); floating point arithmetic does not. Therefore floats are not a subset of rationals.
This reasoning is not sound. The conclusion is only correct by accident.
The reason floats are not a subset of the rationals is because floats distinguish positive and negative zero, and because floats represent non-finite values (NaN and infinity).
The fact that floating-point operations are not associative is a consequence of the fact that floating-point numbers are not a "subring" of the rational numbers, not a consequence of the fact that the floating-point numbers are not a subset of the rationals. You could easily take a subset of the floating-point numbers that is also a subset of the rationals; it would still not be associative (assuming it contains at least two different elements).
Rational arithmetic on a computer does not satisfy this property, either, for the exact same reason floating point does not: there is a finite amount of memory. Floating point embraces this edge-case in a rigorous way that none of the rational arithmetic libraries I've used have ever even attempted.
But it is worth pointing out that it is actually possible to maintain "perfect" accuracy when doing arithmetic even when sqrts (or other irrational numbers) are involved.
The idea is to store numbers as an AST consisting of an expression. Then you can perform any arithmetic on them by growing the AST. After each operation you simplify as much as you can. Once you have your final result you can evaluate the AST up to any precision that you need (and sometimes your final AST will already be a rational number because sqrts cancelled out)
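SymPy is basically an off-the-shelf version of that idea (expression tree plus simplification, with evaluation to arbitrary precision only at the end); a small sketch:

```python
import sympy as sp

x = sp.sqrt(2) + sp.sqrt(3)

y = sp.expand(x**2)           # 5 + 2*sqrt(6), still exact
z = sp.sqrt(2) * sp.sqrt(2)   # the sqrts cancel back to the rational number 2
print(y, z)

print(sp.N(y, 50))            # 9.8989794855663561963945681494... to 50 digits
```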
I believe you're basically referring to the computable reals; for the curious, here's the wikipedia page (kind of abstract): https://en.wikipedia.org/wiki/Computable_number and here's the paper where they implement them programmatically, including for the android calculator application (I believe this is how the android calculator works now): https://dl.acm.org/doi/abs/10.1145/3385412.3386037
I still like to occasionally use rational number arithmetic because I "know" that the answer is correct. It is very true that it can be horribly slow, but if it does finish, then you do have a correct answer. I have only rarely used it in production code. In those rare cases, I think that I would have checked to make sure that the running time was acceptable.
In this case, we know it would be incorrect (in general) because eigenvalues are often irrational.
In other cases, there are often better ways to figure out that the answer is correct. Figuring out that the answer is correct using rational arithmetic is often just too inefficient, compared to doing a bit of numerical analysis or figuring out a way to validate the answer you got. That is, unless the problem is very small.
Often you can bound the running time --- certainly if there are no loops and you only use + - * and / operations. But, you are often screwed if there is a sin, cos, sqrt or other function that gives irrational results. You can sometimes get around sqrt function or other root functions by using an extension field, but I think I could count on one hand how many times I've done that.
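For the curious, "using an extension field" can be as simple as carrying a pair of rationals (a, b) meaning a + b*sqrt(2); here's a toy class of my own, just to illustrate:

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class QSqrt2:
    a: Fraction   # rational part
    b: Fraction   # coefficient of sqrt(2)

    def __add__(self, o):
        return QSqrt2(self.a + o.a, self.b + o.b)

    def __mul__(self, o):
        # (a + b*sqrt(2)) * (c + d*sqrt(2)) = (ac + 2bd) + (ad + bc)*sqrt(2)
        return QSqrt2(self.a * o.a + 2 * self.b * o.b,
                      self.a * o.b + self.b * o.a)

sqrt2 = QSqrt2(Fraction(0), Fraction(1))
print(sqrt2 * sqrt2)                             # exactly 2: QSqrt2(a=2, b=0)
print(QSqrt2(Fraction(1), Fraction(1)) * sqrt2)  # (1 + sqrt(2))*sqrt(2) = 2 + sqrt(2)
```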
This is actually fewer bits than they're using in the article, but that's not clear because they refer to their numbers by how many "[decimal] digits" they have, not how many bits. The hardware doubles they're starting with are 64 bits wide, representing about 15.95 decimal digits, but then they move into 128- and 256-bit representations. 80-bit math, such as that available on the old x87 floating-point co-processors, has a 64-bit significand, about 19 decimal digits.
I found that very unintuitive. Is it typical in numerical analysis to talk about the number of decimal digits represented by a binary floating point number? I don't see what is gained by changing base.
It's a bad idea, or okay as an informal thing! It just uses the fact that log10(2^53) ≈ 15.95 to say that 53 bits carry 15.95 decimal digits of 'information.'
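The same back-of-the-envelope conversion for the formats mentioned in this thread:

```python
import math

# Significand widths include the implicit leading bit where applicable.
for name, mantissa_bits in [("float32", 24), ("float64", 53),
                            ("x87 80-bit", 64), ("binary128", 113)]:
    print(f"{name:>11}: {mantissa_bits * math.log10(2):5.2f} decimal digits")
```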
Or 128 bits. FORTRAN has a REAL*16 type (quadruple-precision floating point). I encountered this at university in a physics department that studied non-linear systems, also known as chaos theory. Rounding errors can quickly become a problem in these systems.
Fortran (note the post-'77 spelling) doesn't. It has KINDs. REAL*n is a non-standard extension, and there's definitely no requirement to implement 128-bit floating point. I don't know what hardware other than POWER has hardware 128-bit FP (both IEEE and IBM format, with a somewhat painful recent change of the ppc64le ABI default in GCC etc.).
GCC also has libquadmath, which provides 128-bit floats (__float128) implemented in software. It's much slower, but you get extra precision just by changing a type and linking the library.
Works great in XaoS, which is where I like my extra precision, but a deep fractal zoom can consume as much precision as you can muster. Imagine a 64kbit-precision Mandelbrot zoom!
The main use of libquadmath seems to be to support gfortran. GCC/glibc also support _Float128 and <func>f128, which are nominally part of C, but only GCC implements them as far as I know.
Nobody uses the x87 instructions any more; they've been superseded by MMX/SSE/SSE2, which can't do 80-bit operations. And it's not clear whether 80 bits would be enough of an improvement over 64 to fix the problem.
MMX isn't a successor to x87, though; it only happens to share its registers, because that allowed them to work without changing the OS. It does integer math only. SSE was the first Intel x86 vector instruction set to do single-precision floats (AMD had 3DNow!, but it was little used), but didn't really include much integer support for the newer, wider vectors (that came back only with SSE2).
D (dlang.org) has a "real" type alongside float and double which represents the widest float available on the target (so 80bits if x87 is available). It does see some use.
(But don't use it without a valid reason or your program is going to be slow because no SIMD)
> a "real" type alongside float and double which represents the widest float available on the target
Initial question: what would that be on a system without hardware floating point?
Luckily, https://dlang.org/spec/type.html says (not 100% unambiguously) it isn’t necessarily “the widest float available on the target“, but either that or, if that doesn’t have both the range and precision of double, double.
So, if you decide to use it, you’re guaranteed not to give up range or precision, may gain some, but also may give up performance.
Part of the problem is that 80 bit operations were never designed to be a visible data type, you were expected to round to a 64 bit type at the end of a series of operations. There was never a good way of indicating to your compiler where and when those transitions would happen.
Not so strange, as they were never intended to be used for compilers to begin with (you'd write assembler code by hand). The basic idea with having 80-bit internal precision is that you could load your 64-bit values from RAM, compute on them with so much extra precision you could basically ignore precision issues within your expressions, and then store the result back correctly rounded.
The example they give on their site is: