My research concerns the formation of the first stars in the universe. In order to set up the initial conditions of these objects correctly, we have to resolve fluctuations in the background density of the Universe on the scale of kiloparsecs. However, we are also mostly concerned about how the fluid behaves on much smaller scales, at the tens to hundreds to thousands of AU scale, all the way down to the radius of the sun or even lower.
In one of my "stunt" runs, I ran up to 42 levels of refinement inside a box; this means that the dynamic range between the coarsest cells and the finest cells was 2^42. For the most part, however, we only use about 30-33 levels of refinement. In order to attain this resolution, we utilize extended precision positioning. (Future versions of the code will utilize integer positioning with relative offsets for Lagrangian elements like particles.) Inserting this functionality into the code took quite a bit of debugging and testing, in particular because we had to account for differences in how Intel, PGI and GNU passed extended precision values back and forth between Fortran and C.
I have uploaded a very simple 2D zoomin movie of one of these simulations, which only had 30 levels of refinement, to Vimeo. It's busy converting as I write this, but it will be found here when it has completed: http://vimeo.com/23176123
I was thinking of another instance where this is noticeable: In a certain moving-mesh hydrodynamics code, the author was forced to use integers instead of doubles to perform intersection tests, because the "uneven" precision of the floating points can give the wrong answer to which side of a face a point is on, which leads to errors in the Voronoi tessellation.
How do you deal with the discretization errors that come up because of this range? Doesn't that overshadow the errors you get from precision issues? I am just curious - not trying to challenge you.
I'm kind of a masochistic in that I thoroughly enjoy debugging non-scientific codes (I do indie game dev in my off time), but debugging scientific code makes me want to spoon my eyeballs out one at a time. Debugging normal code is like a puzzle, with interesting hints as to what the problem is. Debugging scientific code is like trying to crack a hash table by hand.
...In one of my "stunt" runs, I ran up to 42 levels of refinement inside a box; this means that the dynamic range between the coarsest cells and the finest cells was 2^42.
Astrophysicist? Simulations? 42? Is this Slartibartfast?
He got interested in decimal math early on but not because 'floats' such by binary floats really suck. That is a number system where .1 can't be accurately represented. How often do you use .1 in your calculations? He has a couple of examples of 34 digit precision which is representable in a 128 bit quantity. (3.7 bits per digit). Straight BCD would of course be 32 digits minus the digits which represent the exponent.
I wrote some CAD routines once that used decimal arithmetic rather than binary and was amazed at how much better it worked in terms of edges lining up and when you approximate non-square surfaces you get much better tesselation.
And of course for history buffs the IBM 1620 used decimal as a native representation for numbers. Now that we've got lots of transistors to play with, perhaps its a good time to revisit the way we represent numbers.
"How often do you use 1/3 in your calculations. Base 10 is a number system where 1/3 can't be accurately represented."
If you need exactness, use rationals. Don't just use a number system with more prime divisors. If you don't need exactness, use base 2.
That's precisely what I'm working on. What did you use?
P.S. Use 0.125, not 0.100
(Well, technically, 1/10 is not a repeating number in any base, but you know what I mean.)
If you want to use them for an application where you're doing frequent multiplications or divisions, I strongly suggest log transforming your domain and replacing your multiplies with adds and divisions with subtracts. This goes a long way to prevent numeric underflow when performing Viterbi parses without resorting to some godawful slow non-native numeric representation.
Haskell implements this as LogFloat (http://hackage.haskell.org/packages/archive/logfloat/0.8.2/d...), and I've cloned it for C++ as LogReal (http://www.bhickey.net/2011/03/logreal/ and http://github.com/bhickey/logreal)
When precision in coordinates becomes an issue, you make a transformation and use relative coordinates anyways.
Transforming to relative coordinates doesn't solve all the problems; you do your local calculation with the improved precision, but presumably at some point you then throw that back away again when you translate back to the global coords.
This isn't true for modern CPUs, especially if you use SIMD vector operations.
Celestia, for example uses 128 (64.64) fixed-point for positions.
The nasty case in games is when you're subtracting positions to get a relative result. It's okay to store all positions as fixed point, and then return a float for the difference operator.
In general, you're not simulating the whole large volume with the same fidelity. You might be simulating a certain volume around the camera.
And using relative coordinates everywhere might not be practical either. I don't understand why you are defending using floating point for everything so hard? There are cases in which it is not a good choice.
So let's say you have your 128-bit integer coordinates and now you want to do something useful with them. Like interpret them as a 3D vector and multiply by a typical transformation matrix.
Now you have all these intermediate terms inside the matrix gaining and losing precision. Rotation elements are going to have a range of -1 to 1. Translation elements have a similar range as your input coordinate system. Scaling elements have an almost arbitrary range.
You can't use a general purpose matrix library any more. You have to track down every little multiplication and addition and analyze it for precision loss. Many will need to be converted to least fixed point numbers.
You can do all mathematical operations with fixed point just as well.
Some things are much much better, some things are much much worse.
For something like rotation, you're multiplying by -1.0 to 1.0. You might use 64.64 fixed point numbers for this, but then you've wasted half the bits of precision you paid for.
Floating-point certainly has a lot of potential for surprising behavior. But it usually allows you to get reasonable accuracy from a naive combination of basic operations.
With fixed point, you have to decide how many fraction bits you're going to have. If you have more than zero fractions bits, multiplies and divdes end up looking like this (assuming 16 fraction bits):
result = (a * b + 1 << 15) >> 16
result = (a / b) << 16; // rounds to zero is a is close to b
result = ((a << 4) / b) << 20; // not much range or precision on the result
Of course, if your variables can't all be represented with the same number of fraction bits, you need to keep track of that and adjust your shifts for each calculation. You also need to do shifts when you add two numbers with different numbers of fraction bits. This stuff is all taken care of for you by floating point numbers.
With fixed point, if you mess up a calculation and it overflows, you get completely whacky results (wrap around) which will cause massive bugs. With floating point, you lose a bit or two of accuracy.
There are definitely places where fixed-point is needed. If you're on a microcontroller without an FPU, you'll have to learn your way around floating point. If you absolutely can't deal with the loss of precision you get with big floating point numbers, you'll have to use fixed point and make sure you understand the range of parameters that can go into each calculation and prevent overflow.
If floats work for you, just use them.
It's funny, I was just thinking about this a couple of weeks ago for a simulation project.
Google Code search reveals that Celestia uses a 128b (64.64) position class to solve the problem.
For example, remember to think about NaN and that x != x, if x is NaN. And did you know, that there are actually two kinds of NaN? Or did you know, that 1.0/0.0 == +infinity?
You assert that it's "easier to insert a bug with floats", but your first example (NaN) is one which programmers shouldn't be worrying about. The only case you'd want to check for NaN is in debug code (an assert).
But it isn't correct. 1/0 is undefined.
1/x approaches infinity as x approaches 0 from the positive side. 1/x approaches negative infinity as x approaches 0 from the negative side. If we were to define 1/0 it has to be both positive and negative infinity at the same time.
From a mathematical point of view, handling it as "+infinity" has a 50/50 chance of being "correct". As floating-point math is inherently an approximation, this is a reasonable tradeoff.
But from a pragmatic point of view, you don't care about the result of "x * foo" if x is +infinity or -infinity. The result isn't usable.
Yes, by raising an error . Or whistling innocently and pretending it's not my code while leaving the room.
handling it as "+infinity" has a 50/50 chance of being "correct"
It has zero chance of being correct because infinity can never be the result of any arithmetic computation. Infinity is not a number. Infinity only appears as the result of evaluating limits, an operation which is not arithmetic.
The spec allows the programmer to decide if the various degenerate cases in arithmetic will raise a signal, or result in a special value which propagates through the rest of the calcuations (INF, NaN, etc.).
Certainly there are times to prefer one over the other, but the facilities that were standardized back in the 80's are generally more than adequate for most uses.
Whether or not your platform C compiler lets you access these settings in a convenient and portable way is another question however...:-)
But, about your second paragraph, I disagree.
It turns out NaNs are quite useful to hold un-determined values, unknown values, error values, etc. The NaN will propagate through intermediate operations and "contaminate" the result in the appropriate way.
If you use this property of NaNs, you can exploit IEEE conformance to do a lot of hard work for free.
This is used in image processing all the time. You have an image; it is projected onto a grid; some values were not sampled and are unknown. Fill them with NaNs. Then run whatever operations you wish on the image (scaling, convolution, morphological operations). The NaNs will propagate through to the result image and indicate unknown values in exactly the correct way, without having to do extra work.
For example, a convolution will widen a single NaN to take in the whole footprint of the convolution kernel. A scaling will just pass the NaN through to one pixel in the output.
There is no analog of this property of the NaN for integers.
// calculate length.
float len = sqrtf( x*x + y*y + z*z );
// verify length is not zero.
assert( len != 0.0f );
if ( len == 0.0f )
len = 1.0f;
// divide by length.
len = (1.0f / len);
x *= len;
y *= len;
z *= len;
assert( len != 0.0f );
That said, I've experienced your standpoint --- A debug mode which was unusably slow. However, that results from programmers who rely too much on the optimizer. STL, for example, can perform very slowly in debug mode in certain circumstances.
- Why is that assert there?
- Why aren't that "len = (1.0f / len)" and the "= len" statements in an else clause? (are there CPUs where always doing the division and those 3 multiplications is faster? Will compilers optimize that, anyways? Are there IEEE subtleties that make x= 1.0f not a no-op?)
- Wouldn't it, on some platforms, be better to use double for intermediate results?
"any time you do a subtract (or an add, implicitly), consider what happens when the two numbers are very close together, e.g. 1.0000011 - 1.0. The result of this is going to be roughly 0.0000011 of course, but the "roughly" is going to be pretty rough. In general you only get about six-and-a-bit decimal digits of precision from floats (2^23 is 8388608), so the problem is that 1.0000011 isn't very precise - it could be anywhere between 1.0000012 or 1.0000010. So the result of the subtraction is anywhere between 1.210^-6 and 1.010^-6. That's not very impressive precision, having the second digit be wrong! So you need to refactor your algorithms to fix this.
The most obvious place this happens in games is when you're storing world coodinates in standard float32s, and two objects get a decent way from the origin. The first thing you do in rendering is to subtract the camera's position from each object's position, and then send that all the way down the rendering pipeline. The rest all works fine, because everything is relative to the camera, it's that first subtraction that is the problem. For example, getting only six decimal digits of precision, if you're 10km from the origin (London is easily over 20km across), you'll only get about 1cm accuracy. Which doesn't sound that bad in a static screenshot, but as soon as things start moving, you can easily see this horrible jerkiness and quantisation."
I am not aware of any physical problem that actually requires 128 bit floats. I can make up some, but those would be either chaotic (in which case precision does not help), or I would be telling you about a solution that looks like it can be improved. There are of course some problems for which we have no physical insight, then high precision provides a nice safety buffer.
Precision does help to some degree in chaotic circumstances. For example, when forecasting the weather, you know that it's chaotic. But the chaos takes a while to show itself.
Fixed point is a special purpose encoding that is useful in many places (computer graphics and constrained simulations) when your range and precision are known and fixed. "Math" doesn't follow these rules.
Floats are good for math.
100! = 9332621544394415268169923885626670049071596826438162146859296389521759\
This is 2^(524.765) > 2^32 or 2^64 or 2^80 or 2^128
Note that "floating point" is commonly used to refer to 32-bit or 64-bit or 80-bit representations.
Floats are for games (with caveats), and possibly engineering, while purer math needs arbitrary-precision representations that are too slow for games and many simulations.
EDIT: I just looked at the paper again and realize that I was wrong. Von Neumann actually wanted to keep things simple and thus assumed all numbers to be normalized to be between -1 and 1...
Doubles are awesome and often used in science. 128 bit numbers aren't quite so awesome.
This is the classic trade of computer efficiency against human effort. The typical scientific program is run once, so it leaves as much work as possible to the machine. The efficient tradeoff might differ for games and such applications as AutoCAD or Splice.
ie. with our galaxy, wouldn't you have to include the thickness to get any co-ord:
(100 000 light years) * (12000 ly) / (2^128) = 3.336e^-17
I'm curious how many bits do we need to address a point in an universe with reasonable accuracy.
You might want to see a doctor about that.
For example, to point at objects reliably you need to factor their movement relative to gravity vector in.
I can totally imagine using just one (long) scalar, too.
(9.3e10 light years) / (0.001 meter) = 8.79 × 10^29
log2(8.79 × 10^29) = 99.471
So you need 100 bits to represent each coordinate at that precision.
1. The coordinates expand with the universe.
2. The coordinates are static and the universe just expands. Note that this isn't awful, planets and stars are always in motion. When planning inter-galactic missions you're always going to have to know where's the starting point and to calculate where the ending point will be at the relevant time.
So to say where Earth is you're going to write down a 4D vector, specifying at what time this was measured.
Btw, the article didn't mention 3D space did it?
Furthermore, we don't use 128-bit representations because they are slow for numerical computation, because we need more trips through the smaller-sized bus.
Lack of precision of floating point is something that comes up often in game development, now that levels are larger.
If you have a level that is 1-2km across, and you store all your positions in floats, then you will notice that animations for instance are quantized.
Near the world origin, everything is fine, but far from the world origin individual vertices in the character and props will become quantized to the smallest representable delta.
This results in atrocious geometric fizzing and popping.
Once you've subtracted out the offset of your spaceship though, everything you view from there is easier done with float or double.
Typically, it's performed as:
// start in local space. Deform by bone matrices.
float3 pos = 0;
for ( int i = 0; i < 4; i++ )
pos += vtx.boneWeight[i] * ( vtx.pos * boneMats[ vtx.boneIdx[i] ] );
// transform from local space to post-projection space.
return modelViewProjMatrix * float4( pos.xyz, 1.0 );
You may have done this instead:
// transform from local space into world space.
float3 pos = modelMatrix * vtx.pos;
// deform by bone matrices.
for ( int i = 0; i < 4; i++ )
pos += vtx.boneWeight[i] * ( vtx.pos * boneMats[ vtx.boneIdx[i] ] );
// transform from world space into post-projection space.
return viewProjMatrix * float4( pos.xyz, 1.0 );
The problem is subtracting two very large numbers. This happens when you calculate the camera-relative offset of a vert.
From the Tom Forsyth article I linked in another reply:
> The most obvious place this happens in games is when you're storing world coodinates in standard float32s, and two objects get a decent way from the origin. The first thing you do in rendering is to subtract the camera's position from each object's position, and then send that all the way down the rendering pipeline. The rest all works fine, because everything is relative to the camera, it's that first subtraction that is the problem. For example, getting only six decimal digits of precision, if you're 10km from the origin (London is easily over 20km across), you'll only get about 1cm accuracy. Which doesn't sound that bad in a static screenshot, but as soon as things start moving, you can easily see this horrible jerkiness and quantisation.
>By the way, some rendering code doesn't even do that subtraction. Canonically, you bake that subtraction into the offset of your "camera matrix" in your rendering code and let the GPU do it. This is madness - GPUs have even less precision than full IEEE754 floating-point, and you're doing a whole bunch of vector math before you finally do the subtraction. So your precision problems are going to be a lot worse. Best all round if you just subtract camera position from object position on the CPU before it gets anywhere near a GPU.
Did you fix the problem? I don't know how I'd solve it.
Unfortunately in chopping them up, they were not re-centered. So, even though one of these new smaller levels might not be large, if the whole thing was far from the origin, madness ensued.
El-cheapo solution: pre-process the level geometry to centre it again.
Not elegant, but expedient.
People using floating point often have a fundamental lack of understanding how floating point works. Floating point numbers are a leaky abstraction that exist only for performance optimization. Using floats as a default is an example of premature optimization.
In that sense, saying that floating point suck shows a lack of knowledge about them, since the observed behavior by the OP is exactly what they have been designed for. If you want the same precision independently of the amplitude, then indeed fixed point may be what you want. But you have to keep in mind that it will give you a lot of other issues as well.
In a nutshell. If you want coordinates in galaxy use fixed point (or in game level). If you want to calculate values of hyperbolic trigonometric functions or whatever use floats.