Why floats suck - accurate galaxy-wide coordinates in 128 bits (facepunch.com)
205 points by peterhajas on May 2, 2011 | 83 comments



I am an astrophysicist, and I deal with this issue directly. In my simulations, I do deeply nested Adaptive Mesh Refinement calculations. These simulations run inside an Eulerian grid, where each cell contains a fluid element. These cells are collected into grid patches. When a cell inside a given grid exceeds some criterion, we refine it by inserting 8 cells; the next level of cells are then collected and minimum bounding boxes are created that cover the flagged cells, and we compute now at an additional 'level' of refinement.

My research concerns the formation of the first stars in the universe. In order to set up the initial conditions of these objects correctly, we have to resolve fluctuations in the background density of the Universe on the scale of kiloparsecs. However, we are also mostly concerned about how the fluid behaves on much smaller scales, at the tens to hundreds to thousands of AU scale, all the way down to the radius of the sun or even lower.

In one of my "stunt" runs, I ran up to 42 levels of refinement inside a box; this means that the dynamic range between the coarsest cells and the finest cells was 2^42. For the most part, however, we only use about 30-33 levels of refinement. In order to attain this resolution, we utilize extended precision positioning. (Future versions of the code will utilize integer positioning with relative offsets for Lagrangian elements like particles.) Inserting this functionality into the code took quite a bit of debugging and testing, in particular because we had to account for differences in how Intel, PGI and GNU passed extended precision values back and forth between Fortran and C.

I have uploaded a very simple 2D zoomin movie of one of these simulations, which only had 30 levels of refinement, to Vimeo. It's busy converting as I write this, but it will be found here when it has completed: http://vimeo.com/23176123


Hey Matt! ;-)

I was thinking of another instance where this is noticeable: In a certain moving-mesh hydrodynamics code, the author was forced to use integers instead of doubles to perform intersection tests, because the "uneven" precision of the floating points can give the wrong answer to which side of a face a point is on, which leads to errors in the Voronoi tessellation.


the dynamic range between the coarsest cells and the finest cells was 2^42

How do you deal with the discretization errors that come up because of this range? Doesn't that overshadow the errors you get from precision issues? I am just curious - not trying to challenge you.


I chat with PhD students at the Swinburne University Centre for Astrophysics and Supercomputing from time to time, and they have many similar stories to tell. These are sexy enough problems for a programmer that I've considered abandoning industry and joining the ranks of the professional student.


As someone in the opposite position to you (academic coder, considering abandoning academia for industry), I offer up my take: the painful bits of scientific coding are infinitely more painful and numerous than the painful bits of non-scientific coding.

I'm kind of a masochist in that I thoroughly enjoy debugging non-scientific codes (I do indie game dev in my off time), but debugging scientific code makes me want to spoon my eyeballs out one at a time. Debugging normal code is like a puzzle, with interesting hints as to what the problem is. Debugging scientific code is like trying to crack a hash by hand.


What did you use to create that visualization? I'm running a Monte Carlo simulation of > 1 billion photons propagating through water, and I'd like to visualize it better.


I am an astrophysicist, and I deal with this issue directly. In my simulations, I do deeply nested Adaptive Mesh Refinement calculations. ...

...In one of my "stunt" runs, I ran up to 42 levels of refinement inside a box; this means that the dynamic range between the coarsest cells and the finest cells was 2^42.

Astrophysicist? Simulations? 42? Is this Slartibartfast?


This is an interesting article and worth a read. Then head over to Mike Cowlishaw's rant on decimal [1]. I met Mike at a conference and he's a brilliant guy; prior to seeing his decimal stuff I was using REXX on an Amiga (what Tcl might have been), a language he invented while at IBM.

He got interested in decimal math early on, not because 'floats' suck but because binary floats really suck. Binary is a number system where .1 can't be accurately represented. How often do you use .1 in your calculations? He has a couple of examples of 34-digit precision which is representable in a 128-bit quantity (3.7 bits per digit). Straight BCD would of course be 32 digits, minus the digits which represent the exponent.
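A quick way to see the difference, using Python's stdlib (its decimal module follows Cowlishaw's General Decimal Arithmetic specification):

```python
from decimal import Decimal

# Ten copies of 0.1 summed in binary floating point miss 1.0,
# because 0.1 has no finite binary representation.
binary_sum = sum([0.1] * 10)
print(binary_sum)                        # 0.9999999999999999

# The same sum in decimal arithmetic is exact.
decimal_sum = sum([Decimal("0.1")] * 10)
print(decimal_sum == 1)                  # True
```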

I wrote some CAD routines once that used decimal arithmetic rather than binary and was amazed at how much better it worked in terms of edges lining up and when you approximate non-square surfaces you get much better tesselation.

And of course for history buffs the IBM 1620 used decimal as a native representation for numbers. Now that we've got lots of transistors to play with, perhaps its a good time to revisit the way we represent numbers.

[1] http://speleotrove.com/decimal/


That seems downright stupid.

"How often do you use 1/3 in your calculations. Base 10 is a number system where 1/3 can't be accurately represented."

If you need exactness, use rationals. Don't just use a number system with more prime divisors. If you don't need exactness, use base 2.
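To make the point concrete: decimal has exactly the same problem with 1/3 that binary has with 1/10, while rationals are exact. A sketch with Python's stdlib:

```python
from decimal import Decimal
from fractions import Fraction

# Decimal can't represent 1/3 exactly: 3 doesn't divide any power of 10.
print(Decimal(1) / Decimal(3) * 3)     # 0.9999999999999999999999999999

# Rationals are exact for any ratio of integers.
print(Fraction(1, 3) * 3 == 1)         # True
```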


"I wrote some CAD routines once that used decimal arithmetic rather than binary and was amazed at how much better it worked in terms of edges lining up and when you approximate non-square surfaces you get much better tesselation."

That's precisely what I'm working on. What did you use?

P.S. Use 0.125, not 0.100


I used the Java decNumber class (linked off Mike's web site) and the Java 2D API. Once the 3D API came out it seemed like the visualization code at least should use that. Converting between decimals and floats was unnecessarily painful at the time.


I'd like hardware that represents fractional parts in binary-coded base-15 -- no rounding on 1/3 as well as 1/10!


1/10 would be a repeating number in base-15, 0.1777777...

(Well, technically, 1/10 is not a repeating number in any base, but you know what I mean.)
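For the curious, here's a small digit-extraction sketch that confirms both claims (the base_digits helper is mine, not from the thread):

```python
from fractions import Fraction

def base_digits(frac, base, n):
    """First n fractional digits of frac written in the given base."""
    digits = []
    frac = Fraction(frac)
    for _ in range(n):
        frac *= base
        d = int(frac)        # integer part is the next digit
        digits.append(d)
        frac -= d
    return digits

print(base_digits(Fraction(1, 10), 15, 6))  # [1, 7, 7, 7, 7, 7]: repeats
print(base_digits(Fraction(1, 3), 15, 4))   # [5, 0, 0, 0]: terminates
```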


Oops, you're right! I was misremembering something I had figured out once. It's base-30 that does multiples of 2, 10, and 3 (and 5 etc.). (And 30 is pretty close to a power of 2 so that not much space gets wasted if a BCD-like method is used where the underlying representation is base-2.)


Floats are great, you just need to understand when to use them!

If you want to use them for an application where you're doing frequent multiplications or divisions, I strongly suggest log transforming your domain and replacing your multiplies with adds and divisions with subtracts. This goes a long way to prevent numeric underflow when performing Viterbi parses without resorting to some godawful slow non-native numeric representation.

Haskell implements this as LogFloat (http://hackage.haskell.org/packages/archive/logfloat/0.8.2/d...), and I've cloned it for C++ as LogReal (http://www.bhickey.net/2011/03/logreal/ and http://github.com/bhickey/logreal)
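A minimal sketch of the trick, in Python rather than Haskell or C++ (the 1e-5 probabilities are made up for illustration):

```python
import math

# 100 transition/emission probabilities, as might appear in a Viterbi parse.
probs = [1e-5] * 100

# The direct product underflows: 1e-500 is far below the smallest double.
direct = 1.0
for p in probs:
    direct *= p
print(direct)                          # 0.0

# Summing logs keeps the whole dynamic range; multiplies became adds.
log_prob = sum(math.log(p) for p in probs)
print(log_prob)                        # about -1151.3
```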


Well his point is that floats suck for coordinates, because they have a high precision at the origin, then become less and less precise further off. If you want to simulate a galaxy (his example) you want the same precision everywhere, and a uniform non-floating point representation is better.


If you want to simulate a galaxy, the smallest scale that you can resolve is much much larger than size*2^(-128). Since you need floating point arithmetic for many other computations, increasing the precision in your coordinates will cost a lot of extra computing without any gain in accuracy.

When precision in coordinates becomes an issue, you make a transformation and use relative coordinates anyways.


100000ly / 2^64 = 51 meters and change. And your simulation is probably faster with appropriate int math than any float math will be.
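The arithmetic checks out (metres-per-light-year constant assumed):

```python
# 100,000 light years divided into 2^64 integer steps.
LY_M = 9.4607e15                       # metres per light year
step = 100_000 * LY_M / 2**64
print(step)                            # ~51.3 metres per step
```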

Transforming to relative coordinates doesn't solve all the problems; you do your local calculation with the improved precision, but presumably at some point you throw that precision away again when you translate back to the global coords.


"And your simulation is probably faster with appropriate int math than any float math will be."

This isn't true for modern CPUs, especially if you use SIMD vector operations.


> When precision in coordinates becomes an issue, you make a transformation and use relative coordinates anyways.

This.

Celestia, for example, uses 128-bit (64.64) fixed-point for positions.

The nasty case in games is when you're subtracting positions to get a relative result. It's okay to store all positions as fixed point, and then return a float for the difference operator.

In general, you're not simulating the whole large volume with the same fidelity. You might be simulating a certain volume around the camera.


My apologies, I misunderstood the meaning of "simulation". I was thinking of a calculation where you compute the interactions between objects and evolve their positions and momenta accordingly. Celestia does not do this of course, it uses some time series where the coefficients are computed in advance. For these purposes using fixed point and/or integer arithmetic instead of floating points could be entirely appropriate, I do not know enough to form a preference.


Why would you need floating point arithmetic for anything? You can do all mathematical operations with fixed point just as well.

And using relative coordinates everywhere might not be practical either. I don't understand why you are defending using floating point for everything so hard. There are cases in which it is not a good choice.


Why would you need floating point arithmetic for anything?

So let's say you have your 128-bit integer coordinates and now you want to do something useful with them. Like interpret them as a 3D vector and multiply by a typical transformation matrix.

Now you have all these intermediate terms inside the matrix gaining and losing precision. Rotation elements are going to have a range of -1 to 1. Translation elements have a similar range as your input coordinate system. Scaling elements have an almost arbitrary range.

You can't use a general purpose matrix library any more. You have to track down every little multiplication and addition and analyze it for precision loss. Many will need to be converted to wider fixed point numbers.

You can do all mathematical operations with fixed point just as well.

Some things are much much better, some things are much much worse.

For something like rotation, you're multiplying by -1.0 to 1.0. You might use 64.64 fixed point numbers for this, but then you've wasted half the bits of precision you paid for.

Floating-point certainly has a lot of potential for surprising behavior. But it usually allows you to get reasonable accuracy from a naive combination of basic operations.


Understanding the precision of floats is a small pain, but fixed point math is much more painful.

With fixed point, you have to decide how many fraction bits you're going to have. If you have more than zero fraction bits, multiplies and divides end up looking like this (assuming 16 fraction bits):

  result = (a * b + (1 << 15)) >> 16;
  result = (a / b) << 16;          // rounds to zero if a is close to b
  result = ((a << 4) / b) << 12;   // not much range or precision on the result

Of course, if your variables can't all be represented with the same number of fraction bits, you need to keep track of that and adjust your shifts for each calculation. You also need to do shifts when you add two numbers with different numbers of fraction bits. This stuff is all taken care of for you by floating point numbers.

With fixed point, if you mess up a calculation and it overflows, you get completely whacky results (wrap around) which will cause massive bugs. With floating point, you lose a bit or two of accuracy.
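A toy Q16.16 sketch of both points (the helpers are illustrative, not from any real library):

```python
# Minimal Q16.16 fixed point; values are assumed to live in 32-bit words
# on real hardware, which is where the wraparound below comes from.
FRAC = 16

def to_fix(x):
    return int(round(x * (1 << FRAC)))

def to_float(f):
    return f / (1 << FRAC)

def mul(a, b):
    # Full-width product, add half an ulp to round, shift back to Q16.16.
    return (a * b + (1 << (FRAC - 1))) >> FRAC

def mul32(a, b):
    # Same multiply, but the result truncated to 32 bits as a careless
    # C implementation would do: overflow silently wraps.
    return ((a * b) >> FRAC) & 0xFFFFFFFF

print(to_float(mul(to_fix(1.5), to_fix(2.25))))       # 3.375
print(to_float(mul32(to_fix(300.0), to_fix(300.0))))  # 24464.0, not 90000.0
```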

There are definitely places where fixed-point is needed. If you're on a microcontroller without an FPU, you'll have to learn your way around floating point. If you absolutely can't deal with the loss of precision you get with big floating point numbers, you'll have to use fixed point and make sure you understand the range of parameters that can go into each calculation and prevent overflow.

If floats work for you, just use them.


Better description of how this issue affects video games, here:

http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BA%...

It's funny, I was just thinking about this a couple of weeks ago for a simulation project.

Google Code search reveals that Celestia uses a 128b (64.64) position class to solve the problem.


As a programmer it's easier to insert a bug with floats than with integers, because IEEE 754 has various subtle stumbling blocks.

For example, remember to think about NaN and that x != x if x is NaN. And did you know that there are actually two kinds of NaN (quiet and signaling)? Or that 1.0/0.0 == +infinity?
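A couple of these quirks in Python (note that Python itself raises on 1.0/0.0 rather than returning IEEE infinity, so the infinity is constructed directly here):

```python
import math

x = float("nan")
print(x == x)                # False: NaN is unequal even to itself
print(math.isnan(x))         # True: the only reliable test

# Infinity propagates through arithmetic instead of raising.
inf = float("inf")
print(inf + 1 == inf)        # True
print(inf - inf)             # nan
```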


You say "1.0/0.0 == +infinity" like it should be surprising, but it makes perfect intuitive sense to me.

You assert that it's "easier to insert a bug with floats", but your first example (NaN) is one which programmers shouldn't be worrying about. The only case you'd want to check for NaN is in debug code (an assert).


> You say "1.0/0.0 == +infinity" like it should be surprising, but it makes perfect intuitive sense to me.

But it isn't correct. 1/0 is undefined.

1/x approaches infinity as x approaches 0 from the positive side. 1/x approaches negative infinity as x approaches 0 from the negative side. If we were to define 1/0 it has to be both positive and negative infinity at the same time.


The goal for floats isn't to resolve equations. It's to perform arithmetic. (1.0 / 0.0) has to be handled somehow.

From a mathematical point of view, handling it as "+infinity" has a 50/50 chance of being "correct". As floating-point math is inherently an approximation, this is a reasonable tradeoff.

But from a pragmatic point of view, you don't care about the result of "x * foo" if x is +infinity or -infinity. The result isn't usable.


1.0/0.0 has to be handled somehow.

Yes, by raising an error [1]. Or whistling innocently and pretending it's not my code while leaving the room.

handling it as "+infinity" has a 50/50 chance of being "correct"

It has zero chance of being correct because infinity can never be the result of any arithmetic computation. Infinity is not a number. Infinity only appears as the result of evaluating limits, an operation which is not arithmetic.

[1] http://en.wikipedia.org/wiki/SIGFPE


http://en.wikipedia.org/wiki/IEEE_754-1985

The spec allows the programmer to decide if the various degenerate cases in arithmetic will raise a signal, or result in a special value which propagates through the rest of the calculations (INF, NaN, etc.).

Certainly there are times to prefer one over the other, but the facilities that were standardized back in the 80's are generally more than adequate for most uses.

Whether or not your platform C compiler lets you access these settings in a convenient and portable way is another question however...:-)


As long as everybody decides consciously whether to propagate garbage or raise flags, the world is safe.


My policy is to prefer to signal or throw an exception. It's not something that can be accidentally ignored by an unconscious programmer.


This is why IEEE floating point has -0.0, which represents sequences approaching 0 (underflow) from the negative side; 0.0 represents underflow from the positive side.
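A quick demonstration that the sign of an underflowed zero really does survive:

```python
import math

# Underflow from the negative side yields -0.0, preserving the direction.
neg = -1e-300 * 1e-300             # magnitude 1e-600: far below any subnormal
print(neg == 0.0)                  # True: -0.0 compares equal to +0.0
print(math.copysign(1.0, neg))     # -1.0: but the sign is still there
```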


I agree that the parent comment is overly worried by issues that are well-known if you program with IEEE arithmetic.

But, about your second paragraph, I disagree.

It turns out NaNs are quite useful to hold un-determined values, unknown values, error values, etc. The NaN will propagate through intermediate operations and "contaminate" the result in the appropriate way.

If you use this property of NaNs, you can exploit IEEE conformance to do a lot of hard work for free.

This is used in image processing all the time. You have an image; it is projected onto a grid; some values were not sampled and are unknown. Fill them with NaNs. Then run whatever operations you wish on the image (scaling, convolution, morphological operations). The NaNs will propagate through to the result image and indicate unknown values in exactly the correct way, without having to do extra work.

For example, a convolution will widen a single NaN to take in the whole footprint of the convolution kernel. A scaling will just pass the NaN through to one pixel in the output.

There is no analog of this property of the NaN for integers.
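A toy 1-D version of the image-processing idea, with a 3-tap box blur standing in for the convolution kernel:

```python
import math

# A 1-D "image" with one unsampled pixel marked as NaN.
img = [1.0, 2.0, 3.0, float("nan"), 5.0, 6.0, 7.0]

# Any output whose footprint touches the NaN becomes NaN automatically,
# widening the unknown region by the kernel width, with no masking logic.
def blur(xs):
    return [(xs[i - 1] + xs[i] + xs[i + 1]) / 3 for i in range(1, len(xs) - 1)]

out = blur(img)
print(out)     # [2.0, nan, nan, nan, 6.0]
```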


Shouldn't a programmer care about 0.0/0.0 ?


Yes.

  //----------------------------------------------
  void
  MVec3::normalize()
  {
    // calculate length.
    float len = sqrtf( x*x + y*y + z*z );
    
    // verify length is not zero.
    assert( len != 0.0f );
    if ( len == 0.0f )
      len = 1.0f;

    // divide by length.
    len = (1.0f / len);
    x *= len;
    y *= len;
    z *= len;
  }


Thanks. However, normalization is a rather simple case and the solution you propose works as expected. I posit that most code performing floating point division doesn't check all corner cases correctly; those that do are codes written by people who worry about NaN and the correct type of Inf.


I've had fairly heated arguments about this in the past and I'm curious to know if anyone else holds the same opinion: I contend that a normalize function is a really bad place to put an assert. Chances are the normalize will be called frequently and the assert check, in debug mode, will have an impact on performance (this goes for assert checks inside other frequently called functions too). The cumulative effect is a debug build that becomes unusably slow relative to the optimized build without the asserts, which rather defeats the purpose of having it in the first place: if you can't use it to find problems, then what's the point? (I'm remembering this being the case more often than not when working on games for the Xbox 360 or PS3).


I've been bitten by not having asserts (and, conversely, saved by having them) often enough that I don't hold to the philosophy of "don't use asserts out of performance considerations".

  assert( len != 0.0f );
If that alone is enough to slow down your program by any significant margin, then you're doing something wrong.

That said, I've experienced your standpoint: a debug mode which was unusably slow. However, that results from programmers who rely too much on the optimizer. STL, for example, can perform very slowly in debug mode in certain circumstances.


It was recollection on my part; not backed by empirical evidence. For this exact case, though: wouldn't you better handle this with floating point exceptions?


Just curious (and only marginally, if at all, on topic):

- Why is that assert there?

- Why aren't that "len = (1.0f / len)" and the "= len" statements in an else clause? (are there CPUs where always doing the division and those 3 multiplications is faster? Will compilers optimize that, anyways? Are there IEEE subtleties that make x= 1.0f not a no-op?)

- Wouldn't it, on some platforms, be better to use double for intermediate results?


There's a better reason, per TomF's blog [1] linked below by JabavuAdams:

"any time you do a subtract (or an add, implicitly), consider what happens when the two numbers are very close together, e.g. 1.0000011 - 1.0. The result of this is going to be roughly 0.0000011 of course, but the "roughly" is going to be pretty rough. In general you only get about six-and-a-bit decimal digits of precision from floats (2^23 is 8388608), so the problem is that 1.0000011 isn't very precise - it could be anywhere between 1.0000012 or 1.0000010. So the result of the subtraction is anywhere between 1.210^-6 and 1.010^-6. That's not very impressive precision, having the second digit be wrong! So you need to refactor your algorithms to fix this.

The most obvious place this happens in games is when you're storing world coodinates in standard float32s, and two objects get a decent way from the origin. The first thing you do in rendering is to subtract the camera's position from each object's position, and then send that all the way down the rendering pipeline. The rest all works fine, because everything is relative to the camera, it's that first subtraction that is the problem. For example, getting only six decimal digits of precision, if you're 10km from the origin (London is easily over 20km across), you'll only get about 1cm accuracy. Which doesn't sound that bad in a static screenshot, but as soon as things start moving, you can easily see this horrible jerkiness and quantisation."

[1] http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BA%...
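You can watch the precision die at 10 km with a quick float32 round-trip through struct:

```python
import struct

def f32(x):
    # Round a Python double to the nearest 32-bit float and back.
    return struct.unpack("f", struct.pack("f", x))[0]

pos = f32(10_000.0)                  # an object 10 km from the origin
print(f32(pos + 0.0004) - pos)       # 0.0: a 0.4 mm move vanishes entirely
print(f32(pos + 0.001) - pos)        # 0.0009765625: the smallest step here
```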


To make various computations, you generally need powers of a given quantity. It seems to me that when you take large powers for intermediate results, the precision in fixed point arithmetic will rapidly drop, no? Furthermore you need to put together various quantities with different scaling factors. IMHO, that would make things very error prone (or you can again sacrifice precision).

I am not aware of any physical problem that actually requires 128 bit floats. I can make up some, but those would be either chaotic (in which case precision does not help), or I would be telling you about a solution that looks like it can be improved. There are of course some problems for which we have no physical insight, then high precision provides a nice safety buffer.


I agree in general.

Precision does help to some degree in chaotic circumstances. For example, when forecasting the weather, you know that it's chaotic. But the chaos takes a while to show itself.


Floats are for math. Math includes computing things like 100! (100 factorial), the volume of the observable universe (~3.3 × 10^80), or any other enormous but finite value, most of which are far greater than 2^128.

Fixed point is a special purpose encoding that is useful in many places (computer graphics and constrained simulations) when your range and precision are known and fixed. "Math" doesn't follow these rules.

Floats are good for math.


You have it exactly backwards. 100 factorial cannot be represented exactly by a float (or double, or long double):

  100! = 9332621544394415268169923885626670049071596826438162146859296389521759\
         9993229915608941463976156518286253697920827223758251185210916864000000\
         000000000000000000

This is about 2^524.765, which is far greater than 2^32, 2^64, 2^80, or 2^128.

Note that "floating point" is commonly used to refer to 32-bit or 64-bit or 80-bit representations.

Floats are for games (with caveats), and possibly engineering, while purer math needs arbitrary-precision representations that are too slow for games and many simulations.
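One nuance worth checking here: 100! is only ~9.3e157, well under the ~1.8e308 double maximum, so a double can hold an approximation of it. What no hardware float can do is represent it exactly:

```python
import math

f = math.factorial(100)
print(f.bit_length())        # 525: needs 525 bits to store exactly
print(float(f))              # ~9.33e157: a double holds an approximation
print(int(float(f)) == f)    # False: the exact value is long gone
```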


Floats are primarily designed for engineering, and not just floats. Read the original von Neumann paper. Basically, our dominant computer architecture was originally designed to optimally support floating point for numerical calculations.

EDIT: I just looked at the paper again and realize that I was wrong. Von Neumann actually wanted to keep things simple and thus assumed all numbers to be normalized to be between -1 and 1...


And I thought arbitrary precision integers and rational numbers were for mathematics.


Author has probably never tried to optimize an algorithm on SSE2 or higher.

Doubles are awesome and often used in science. 128 bit numbers aren't quite so awesome.

http://download.oracle.com/docs/cd/E19963-01/html/821-1608/e...


In the 40s, floating point numbers were the hot new thing in programming. John von Neumann argued against them, because anyone smart enough to use a computer could track the exponents in their head. Or so I've heard.

This is the classic trade of computer efficiency against human effort. The typical scientific program is run once, so it leaves as much work as possible to the machine. The efficient tradeoff might differ for games and such applications as AutoCAD or SPICE.


Great post. A question which doesn't have much to do with the main point - don't his calculations result in a plane with 2d co-ordinates rather than three dimensional?

ie. with our galaxy, wouldn't you have to include the thickness to get any co-ord:

(100 000 light years) * (12000 ly) / (2^128) = 3.336 × 10^-17


How can he fit three-dimensional coordinates into a single 128-bit number? Seeing how he divides diameter, I assume he only took care of one axis.

I'm curious how many bits we need to address a point in a universe with reasonable accuracy.


We'd need three axes to define a coördinate in space, four to include time. Given his estimation of the known Universe as 9.3e10 light years across, giving each axis 2^96 would yield an accuracy of about 1.1 kilometers, plenty enough for space navigation. Multiply that by the number of axes.


yay! someone else who uses diæresis marks:)


You should get a subscription to the New Yorker.


> diæresis

You might want to see a doctor about that.


1.1 kilometers is not even close to being enough precision. That's way larger than your typical missile or a space ship. You can't deal with that kind of error and expect collision detection to work correctly.


It will never be the case, fortunately, that humanity will have to navigate a missile or spaceship across the observable universe (or the galaxy) with fixed point or otherwise.


The interesting thing is that, as you get further and further from Earth or the Sun or the centre of the Milky Way, the level of precision which is even useful becomes less and less. What is one light year left or right for a quasar 13,000,000,000 light years away? Floats, then, behave entirely appropriately in this situation.


He was indeed only specifying a single axis, just as you wouldn't use a single float to specify three axes.


The number of scalars to specify a point in space varies from one to twelve and probably more - depending on the terrain.

For example, to point at objects reliably you need to factor their movement relative to gravity vector in.

I can totally imagine using just one (long) scalar, too.


You specify the accuracy you need. Let's say, 1 millimeter.

(9.3e10 light years) / (0.001 meter) = 8.79 × 10^29

log2(8.79 × 10^29) = 99.471

So you need 100 bits to represent each coordinate at that precision.
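The same arithmetic in code (metres-per-light-year constant assumed):

```python
import math

# Bits per axis for 1 mm accuracy across a 9.3e10 light-year universe.
LY_M = 9.4607e15                   # metres per light year
cells = 9.3e10 * LY_M / 0.001      # number of 1 mm cells along one axis
print(math.log2(cells))            # ~99.5, so 100 bits per axis
```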


Interesting, but actually it's no longer the case that fixed point math is faster than floating point. Certainly GPUs are a lot faster at floats than ints.


Tell that to Android developers who are writing simulation apps that need to work on ARMv6 phones, like the LG Optimus series. No FPU there.


Sorry to be pedantic but since the universe is constantly expanding, aren't such coordinates already inaccurate in only a short while after they are created (unless they are highly localized which likely defeats their purpose in the first place)?


I see 2 possible solutions:

1. The coordinates expand with the universe.

2. The coordinates are static and the universe just expands. Note that this isn't awful, planets and stars are always in motion. When planning inter-galactic missions you're always going to have to know where's the starting point and to calculate where the ending point will be at the relevant time.

So to say where Earth is you're going to write down a 4D vector, specifying at what time this was measured.

Btw, the article didn't mention 3D space did it?


This displays a fundamental lack of understanding of how floating point works. In particular, we are concerned about the accuracy of the floating point representation, which is characterized by the machine precision. Typically, single precision gives about 7 decimal digits of accuracy, while double precision gives about 16.

Furthermore, we don't use 128-bit representations because they are slow for numerical computation: the values need more trips across a narrower bus.

See http://en.wikipedia.org/wiki/Floating_point


No, this is a real problem.

Lack of precision of floating point is something that comes up often in game development, now that levels are larger.

If you have a level that is 1-2km across, and you store all your positions in floats, then you will notice that animations for instance are quantized.

Near the world origin, everything is fine, but far from the world origin individual vertices in the character and props will become quantized to the smallest representable delta. This results in atrocious geometric fizzing and popping.


Yeah. I've also tried a galactic simulation game and ran into the limitations of floats. 32 bit floats only give you a usable resolution of 1 in 100000 for global coords. Even doubles are too small. This becomes an issue if you want to zoom from local ship view out to the local planetary body. You end up having to do goofy things like make a scaled billboard out of the nearby planets and place them far enough away that nobody notices.


I think we can conclude that galactic simulation games are probably a special case for which 128 bit integer coordinates make sense.

Once you've subtracted out the offset of your spaceship though, everything you view from there is easier done with float or double.


Of course, in particular since in most cases, one must communicate with the GPU using 32 bit floats. (I'm going to guess that this is changing, though.)


Yes, the recent generation of Nvidia GPUs has added hardware support for doubles. This was done to be able to sell hardware for use in financial analysis rather than any use in graphics.


The Panda3D Python library uses double-precision floats, due to its being a Python library. Currently, these are converted to 32-bit floats which are then sent to the GPU.


Ah, OK, I come from a different background. But this issue of different precisions at different magnitudes is also well-known, and is made clear when studying the definition of floating point. Others have mentioned the "obvious" solution of using relative coordinates, but apparently that has some issues also.


I think you're referring to bone animation. With bone animation, there are 4 "weights" per vertex. Each weight refers to a specific matrix in a "bone matrix palette" (array of matrices).

Typically, it's performed as:

  // start in local space.  Deform by bone matrices.
  float3 pos = 0;
  for ( int i = 0; i < 4; i++ )
    pos += vtx.boneWeight[i] * ( vtx.pos * boneMats[ vtx.boneIdx[i] ] );
  
  // transform from local space to post-projection space.
  return modelViewProjMatrix * float4( pos.xyz, 1.0 );
In other words, a model is deformed (animated) first, then transformed into worldspace.

------------------------------------------------------

You may have done this instead:

  // transform from local space into world space.
  float3 pos = modelMatrix * vtx.pos;

  // deform by bone matrices.
  for ( int i = 0; i < 4; i++ )
    pos += vtx.boneWeight[i] * ( vtx.pos * boneMats[ vtx.boneIdx[i] ] );

  // transform from world space into post-projection space.
  return viewProjMatrix * float4( pos.xyz, 1.0 );
This will exhibit the quantization errors you mentioned, and is correctable by re-ordering the operations.


Thanks for the detailed answer, but no, it's still a problem. I mentioned animation, but actually it would happen without animation.

The problem is subtracting two very large numbers. This happens when you calculate the camera-relative offset of a vert.

From the Tom Forsyth article I linked in another reply:

> The most obvious place this happens in games is when you're storing world coodinates in standard float32s, and two objects get a decent way from the origin. The first thing you do in rendering is to subtract the camera's position from each object's position, and then send that all the way down the rendering pipeline. The rest all works fine, because everything is relative to the camera, it's that first subtraction that is the problem. For example, getting only six decimal digits of precision, if you're 10km from the origin (London is easily over 20km across), you'll only get about 1cm accuracy. Which doesn't sound that bad in a static screenshot, but as soon as things start moving, you can easily see this horrible jerkiness and quantisation.

>By the way, some rendering code doesn't even do that subtraction. Canonically, you bake that subtraction into the offset of your "camera matrix" in your rendering code and let the GPU do it. This is madness - GPUs have even less precision than full IEEE754 floating-point, and you're doing a whole bunch of vector math before you finally do the subtraction. So your precision problems are going to be a lot worse. Best all round if you just subtract camera position from object position on the CPU before it gets anywhere near a GPU.


Thank you for the equally detailed response. I apologize.

Did you fix the problem? I don't know how I'd solve it.


In our case, we were doing a port of a PC game to PS2. We were chopping up the PC levels to try and fit them into memory (~16MB).

Unfortunately in chopping them up, they were not re-centered. So, even though one of these new smaller levels might not be large, if the whole thing was far from the origin, madness ensued.

El-cheapo solution: pre-process the level geometry to centre it again.

Not elegant, but expedient.


This article understands how floating point works.

People using floating point often have a fundamental lack of understanding how floating point works. Floating point numbers are a leaky abstraction that exist only for performance optimization. Using floats as a default is an example of premature optimization.


Floating point does not exist for performance optimization - floats were much slower than fixed point at first (though not anymore on general-purpose CPUs). The point of floating point is to have the same accuracy (number of meaningful digits) independently of amplitude, but that means their precision is indeed not uniform (near 1, a 64-bit float has a precision of ~1e-16; near 100000, ~1e-11; and so on).

In that sense, saying that floating point sucks shows a lack of knowledge about it, since the behavior the OP observed is exactly what floating point was designed for. If you want the same precision independently of amplitude, then fixed point may indeed be what you want. But keep in mind that it will give you a lot of other issues as well.


Also, people writing floating point off because it does not fit situations where you have to evenly divide something (e.g. a galaxy into picometer-sized voxels) show a fundamental lack of understanding of how floating point works. One of the biggest reasons floats are used is that error propagation behaves much better with floats in many numerical algorithms.

In a nutshell: if you want coordinates in a galaxy (or in a game level), use fixed point. If you want to calculate values of hyperbolic trigonometric functions or whatever, use floats.


The default should be rationals or even algebraic numbers?



