If you’re wondering why there’s no demo graphic, it’s because this technique is meant to produce the same image as before, just more robustly (fewer precision bugs), not anything visually different.
One thing that can go wrong in 3D graphics is z-fighting, where a scene shows rendering artifacts because two objects touch or intersect each other, and triangles that are nearly coplanar flicker when their z values (distance from the camera) differ by less than the depth buffer can resolve. Basically the rendering has errors due to floating point limitations.
The post is pointing out that the float32 format represents many, many more values near 0 than near 1. Over 99% of all values representable in [0,1] are in [0, 0.5]. And when you normalize the distances, it’s common to map the farthest distance you want to 1 and the nearest to 0. But this severely restricts the number of distinct depth values available for non-close objects, enough that z-fighting becomes possible. If you switch the mapping so that z=0 represents the farthest distance, then you get many, many more distinct depth values for most of your rendered view distance, because of the way float32 represents [0,1].
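To make the two conventions concrete, here’s a minimal sketch (the 1/z-style mapping and the near/far constants are illustrative, not the article’s exact code):

```c
#include <stdio.h>

/* Illustrative 1/z-style depth mapping, near -> 0 and far -> 1
 * (constants and names are made up for the example). */
static float depth_conventional(float z, float near_d, float far_d) {
    return (far_d / (far_d - near_d)) * (1.0f - near_d / z);
}

/* Reverse-Z: the same curve flipped, so far -> 0 and near -> 1. */
static float depth_reversed(float z, float near_d, float far_d) {
    return 1.0f - depth_conventional(z, near_d, far_d);
}

int main(void) {
    const float near_d = 0.1f, far_d = 1000.0f;
    for (float z = 1.0f; z <= 1000.0f; z *= 10.0f) {
        printf("z = %7.1f   conventional = %.7f   reversed = %.7g\n",
               z, depth_conventional(z, near_d, far_d),
               depth_reversed(z, near_d, far_d));
    }
    return 0;
}
```

With the conventional mapping, everything beyond a few units is crammed into the last sliver below 1.0, where float32 has relatively few distinct values; reversed, those same distances map to small numbers near 0, where it has the most.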
I wasn't aware that a logarithmic depth buffer could be implemented in WebGL, since it lacks glClipControl(). It's cool that someone eventually found a way to do it (apparently by writing to gl_FragDepth).
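For the curious, the trick amounts to computing something like the following per fragment and writing it to gl_FragDepth. The exact formula varies between implementations; this is just one common variant, evaluated on the CPU with a made-up far-plane value to show the shape of the curve:

```c
#include <math.h>
#include <stdio.h>

/* One common logarithmic depth mapping (variants differ): 0 at the eye,
 * 1 at the far plane, with precision spread evenly across orders of
 * magnitude. In WebGL this value would be written to gl_FragDepth. */
static float log_depth(float view_dist, float far_d) {
    return log2f(view_dist + 1.0f) / log2f(far_d + 1.0f);
}

int main(void) {
    const float far_d = 1.0e6f;  /* an assumed, very distant far plane */
    for (float d = 1.0f; d <= far_d; d *= 100.0f)
        printf("distance %9.0f  ->  log depth %.4f\n", d, log_depth(d, far_d));
    return 0;
}
```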
If they do that, it disables the early-Z rejection optimization implemented in most GPUs. For some scenes the performance cost of that can be huge: when rendering opaque objects in front-to-back order, early Z rejection sometimes saves many millions of pixel shader invocations per frame.
And not to mention, floating point numbers are already roughly logarithmically distributed. A logarithmic distribution matters most when values span many orders of magnitude, so the piecewise-linear approximation of a logarithm that floats give you is good enough for depth buffers.
Indeed, logarithmic depth is pretty much useless on modern hardware, but it wasn’t always the case.
On Windows, support for DXGI_FORMAT_D32_FLOAT is required on feature level 10.0 and newer, but missing (not even optional) on feature level 9.3 and older. Before Windows Vista and Direct3D 10.0 GPUs, people used depth formats like D16_UNORM or D24_UNORM_S8_UINT, i.e. 16-24 bit integers. Logarithmic Z made a lot of sense with these integer depth formats.
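A rough sketch of why (a D24-style buffer is assumed, near/far and the log formula picked arbitrarily): with a 24-bit integer buffer the precision is uniform in the stored value, so a 1/z mapping leaves almost no distinct steps for distant geometry, while a logarithmic mapping spreads the 2^24 steps across orders of magnitude.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Compare how many distinct 24-bit depth codes a 10-unit slice of distant
 * geometry gets under a 1/z mapping versus a logarithmic one.
 * Near/far and the log formula are illustrative, not from any specific API. */
static uint32_t to_unorm24(float depth01) {
    return (uint32_t)(depth01 * 16777215.0f + 0.5f);  /* quantize to 2^24 - 1 */
}

int main(void) {
    const float near_d = 0.1f, far_d = 10000.0f;
    for (float z = 1000.0f; z <= 1010.0f; z += 5.0f) {
        float hyper = (far_d / (far_d - near_d)) * (1.0f - near_d / z);
        float logd  = log2f(z / near_d) / log2f(far_d / near_d);
        printf("z = %6.0f   1/z code = %8u   log code = %8u\n",
               z, to_unorm24(hyper), to_unorm24(logd));
    }
    return 0;
}
```

Over that 10-unit slice around z = 1000, the 1/z mapping moves through only a handful of the 2^24 codes, while the log mapping moves through thousands.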
I wonder whether it's possible to implement logarithmic depth in the vertex shader, as opposed to the pixel shader? After gl_Position is computed, adjust the vector to apply the logarithm, preserving `xy/w` to keep the 2D screen-space position.
To be clear, I have never tried that, and there could be issues with that approach, especially with large triangles. I'm not sure this is going to work, but it might.
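Roughly the idea, sketched on the CPU (untested as a real shader; in GLSL you'd do this to gl_Position at the end of the vertex shader, and the far-plane constant is made up):

```c
#include <math.h>
#include <stdio.h>

typedef struct { float x, y, z, w; } vec4;

/* After the projection, overwrite clip-space z with a logarithmic depth
 * scaled by w, so that the hardware's later divide by w yields the log
 * depth while x/w and y/w (the screen position) are untouched.
 * For a typical perspective matrix, w carries the view-space distance. */
static vec4 apply_log_depth(vec4 clip_pos, float far_d) {
    float ndc_z = log2f(clip_pos.w + 1.0f) / log2f(far_d + 1.0f); /* target z/w */
    clip_pos.z = ndc_z * clip_pos.w;
    return clip_pos;
}

int main(void) {
    vec4 p = { 20.0f, -5.0f, 99.5f, 100.0f };  /* made-up clip-space position */
    vec4 q = apply_log_depth(p, 1.0e6f);
    printf("screen x,y unchanged: %.3f, %.3f    new z/w: %.5f\n",
           q.x / q.w, q.y / q.w, q.z / q.w);
    return 0;
}
```

The catch, as mentioned, is that the hardware then interpolates that z linearly across the triangle, so the depth is only exactly logarithmic at the vertices, which is presumably where large triangles become a problem.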
I haven't really studied this either so I could be mistaken, but I think it's because OpenGL does the perspective divide between the vertex and fragment shaders, going from clip space to NDC space (which is in the [-1,1] interval and can only be changed by glClipControl()).
The value in gl_FragDepth, if written to, is final, but gl_Position is not: it will still go through the clip/NDC transformation. Since the trick to get the extra precision is to put the depth range into [0,1] instead of [-1,1], this would fail.
So my guess is, it probably wouldn't work on WebGL/OpenGLES without also setting gl_FragDepth, which is, as you mentioned, impractical performance-wise.
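In other words (assuming the default fixed-function transforms and no glClipControl()), whatever ends up in gl_Position.z still goes through the divide by w and then the glDepthRange viewport transform, which lands the dense-near-zero part of float32 right around 0.5:

```c
#include <stdio.h>

/* Follow an NDC z value through the default depth pipeline:
 * clip z/w lands in [-1,1], then window depth = 0.5*ndc + 0.5
 * (the default glDepthRange(0,1)). Values that were easily distinguishable
 * near 0.0, where float32 is dense, end up almost identical around 0.5. */
int main(void) {
    const float ndc[] = { -1.0f, 0.0f, 1e-7f, 2e-7f, 1.0f };
    for (int i = 0; i < 5; ++i) {
        float window_depth = 0.5f * ndc[i] + 0.5f;
        printf("ndc z = %+.7g   ->   window depth = %.9g\n", ndc[i], window_depth);
    }
    return 0;
}
```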
Thanks - this is much clearer than the original article, which somehow never actually defines "reverse Z" before going on and on about it.
Why is it that most things you render are far away rather than close? I guess I'd expect the opposite, at least depending on what you're rendering. And I'd also assume you'd want better visual fidelity for closer things, so I find this a little counter-intuitive.
Are there certain cases where you'd want "regular Z" rather than "reverse Z"? Or cases where you'd want to limit your Z to [0, 0.5] instead of [0, 1]?
"Close" here means "millimeters away". The logarithmic nature of the depth buffer means that even your player character is considered far enough to be already at ~0.9 in a "regular Z" case (depending on where you put your near plane, of course). Go stare at a few depth buffers and you'll get a good taste of it.
I guess… but what I'm getting at is that most of what you see on screen is dominated by closer things. The world is very big, but right now most of what I see is, relatively speaking, close to me. I don't care much about the precise positions of stuff in China right now, but for the stuff in my apartment in Canada it's very important to my sense of reality that the positioning is quite exact, even though there's a lot more stuff in China.
> The reason for this bunching up of values near 1.0 is down to the non linear perspective divide.
The function in question is nonlinear. So using "normal" z, values in the range [0, 0.5] are going to be very, very close to the camera. The vast majority of things aren't going to be that close to the camera. Most typical distances are going to be in the [0.5, 1] range.
Hence, reversing that gives you more precision where it matters.
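To put numbers on it (assuming a near plane at 0.1 and a far plane at 1000, and the usual 1/z-style mapping), you can invert the mapping and see which view distance each depth value corresponds to:

```c
#include <stdio.h>

/* Invert a conventional near->0, far->1 depth mapping:
 *   d = (f/(f-n)) * (1 - n/z)   =>   z = n / (1 - d*(f-n)/f)
 * to see how little of the scene the first half of [0,1] actually covers. */
int main(void) {
    const float n = 0.1f, f = 1000.0f;
    const float depths[] = { 0.5f, 0.9f, 0.99f, 0.999f, 0.9999f };
    for (int i = 0; i < 5; ++i) {
        float z = n / (1.0f - depths[i] * (f - n) / f);
        printf("depth %.4f corresponds to z = %8.2f\n", depths[i], z);
    }
    return 0;
}
```

With those constants, depth 0.5 is only about 0.2 units from the camera, and everything beyond roughly one unit has to share [0.9, 1], which is exactly the bunching being described.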
You still need precision even for things that are far away. You don't necessarily care that the building your character is looking at a mile away is exactly in the right spot, but imprecision causing Z-fighting will be very noticeable as the windows constantly flicker back and forth in front of the facade.
If you're standing outside holding a phone up to your face, you have 2 things close (hand/phone) and everything else (trees, buildings, cars, furniture, animals, etc) around you is not close.
The demo graphic you need is not of the rendered scene, but of what happens behind the scenes. Like the diagrams in the old Foley and Van Dam text: a visual of the perspective-transformed scene, but from a different angle, where you can see the Z coordinates.
Yes. The floats in [0,1) come in blocks: [1/2, 1), [1/4, 1/2), [1/8, 1/4), etc. There are 2^23 floats in each block and there are 127 blocks in total. The reason there are so many floats in [0, 0.5] is that only one block, [1/2, 1), is outside that range. If you exclude the denormals (which have a special block that stretches to 0, [0, 1/2^126)), you've still only excluded a single block.
Yes. Subnormals start below 1.0 * 2^(-126). Between [1.0 * 2^(-126), 0.5] there are 125x more distinct float values than there are in the range [0.5, 1].
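A quick back-of-the-envelope version of that block counting (subnormals lumped in as roughly one extra block):

```c
#include <stdio.h>

/* Each binade [2^(-k-1), 2^-k) holds exactly 2^23 float32 values.
 * Between 2^-126 and 0.5 there are 125 such blocks, [0.5, 1) is one block,
 * and the subnormals below 2^-126 add roughly one more block's worth. */
int main(void) {
    const double per_block   = 8388608.0;            /* 2^23 */
    const double half_to_one = 1.0 * per_block;      /* [0.5, 1) */
    const double zero_to_half = 126.0 * per_block;   /* [0, 0.5), incl. subnormals */
    printf("[0.5, 1):  %.0f values\n", half_to_one);
    printf("[0, 0.5):  %.0f values (%.0fx more)\n",
           zero_to_half, zero_to_half / half_to_one);
    printf("share of [0,1] values that are <= 0.5: %.1f%%\n",
           100.0 * zero_to_half / (zero_to_half + half_to_one));
    return 0;
}
```

Which is where the "over 99% of the values in [0,1] are below 0.5" figure upthread comes from.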
I couldn't quite connect all the dots as to why this is better. Something I do understand is summing numbers close to zero before numbers of larger magnitude away from zero produces a more accurate sum because floating-point addition isn't truly associative.
My best guess is that this reverse-Z convention keeps numbers at the same scale more often. I think what matters is keeping them at the same scale rather than near zero, because the relative precision is the same at every scale (single precision float: 24 bits of significand stored in 23 bits with an implied leading 1 bit). If the article is trying to say that numbers near zero have more relative precision because of the denormalized representation of FP numbers, it should call that out explicitly. Also, the advantage of similar scales applies to addition/subtraction; there should be no advantage for multiplication/division, AFAIK.
I'm not a graphics expert, but reading the article and the long, roundabout way it gets back to "we're just working with the 23-bit mantissa because we've clamped down our range" makes me wonder why you couldn't just do fixed-point math with the full 32-bit range instead.
The thing to understand is that the perspective transform in homogeneous coordinates doesn't just shrink the X and Y coordinates based on Z distance, but it also shrinks the Z coordinate too!
The farther some objects are from the viewer, the closer together they are bunched in Z space.
Basically, the points out at infinity where parallel lines appear to converge under perspective are translated to a finite Z distance. So all of infinity is squeezed into that range.
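A small numeric sketch of the Z part of that transform (one common near->0, far->1 convention, constants picked just for illustration): the divide by w, which carries the view distance, is what squeezes all of the far field, out to infinity, into a finite sliver.

```c
#include <stdio.h>

/* Z row of a perspective matrix in homogeneous coordinates:
 *   z_clip = a*z_view + b,   w_clip = z_view,   depth = z_clip / w_clip
 * Equal steps in view-space z give ever smaller steps in depth, and even
 * z_view -> infinity lands at the finite value a = f/(f-n). */
int main(void) {
    const float n = 1.0f, f = 100.0f;
    const float a = f / (f - n);        /* the m[2][2] entry */
    const float b = -f * n / (f - n);   /* the m[2][3] entry */
    for (float z = 1.0f; z <= 1.0e9f; z *= 10.0f) {
        float z_clip = a * z + b;
        float w_clip = z;               /* perspective: w carries the distance */
        printf("view z = %12.0f   ->   depth = %.7f\n", z, z_clip / w_clip);
    }
    return 0;
}
```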
Why is Z also transformed? In order to keep the transformation line-preserving: to keep straight lines straight under the perspective transformation.
If you have an arbitrarily rotated straight line in a perspective view, and you just divide the X and Y coordinates along that line by Z, the line will look just fine in the perspective view, because the viewer does not see the Z coordinate. But in fact in the perspective-transformed space (which is 3 dimensional), it will not be a straight line.
Now that is fine for wireframe or even certain polygon graphics, in which we can regard the perspective as a projection from 3D to 2D and basically drop the Z coordinate entirely after dividing X and Y by it.
An example of where it breaks down is when you're using a Z-buffer for the visibility calculations.
If you have two polygons that intersect, and you don't handle the Z coordinate in the perspective transform, they will not correctly intersect: the line at which they intersect, calculated by your Z buffer, will not be the right one, and that will be particularly obvious under movement, appearing as a weird artifact where that intersecting line shifts around.
You won't see this when you have polygons that only join at their vertices to form surfaces. The 2D transform is good enough: the endpoints of a shared edge are correctly transformed from 3D into 2D taking into account perspective, and we just render a straight line between them in 2D and all is good. In the intersecting case, there are interpolated points involved which are not represented as vertices and have not been individually transformed. That interpolation then goes wrong.
Another area where this is important, I think, is correctness of texture maps. If it is wrong, then a rotating polygon with a texture map will have creeping artifacts, where the pixels of the texture don't stick to the same spot on the polygon, except near the vertices.
Okay but now all your values are bunched up next to the far plane, between 990 and 1000? Using this kind of mapping seems inherently flawed no matter what range you pick.
Those graphs are inaccurate. They only show a few levels of exponent for the floating point.
There is a benefit, but it's overstated by those graphs.
The math is simple enough. When you calculate 1/990-ish, it comes out to about 0.001. Everything below about 0.001 is between 990 and the far plane.
Now, of the floating point values between 0 and 1, what percent of them are below 0.001? It's 91.4%. Only 9% of those values are being used.
And only 1/4 of floating point is between 0 and 1 to start with.
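If you want to sanity-check that, a brute-force count over the bit patterns works; non-negative floats are ordered the same as their bit patterns, so you can just walk from 0.0 to 1.0. (The exact percentage depends on the cutoff you pick and on whether you count subnormals and the endpoints.)

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Count how many float32 values in [0,1] fall below a cutoff around 1/990,
 * by walking every bit pattern from 0.0f up to 1.0f. */
int main(void) {
    float onef = 1.0f, cutoff = 1.0f / 990.0f;
    uint32_t one_bits;
    memcpy(&one_bits, &onef, sizeof one_bits);

    uint64_t below = 0, total = 0;
    for (uint32_t b = 0; b <= one_bits; ++b) {
        float v;
        memcpy(&v, &b, sizeof v);
        total++;
        if (v < cutoff) below++;
    }
    printf("%llu of %llu values below %g (%.1f%%)\n",
           (unsigned long long)below, (unsigned long long)total,
           (double)cutoff, 100.0 * (double)below / (double)total);
    return 0;
}
```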
Floating point is just not the right way to store these values. It makes sense to have denser values near 0, but the exponential nature of floating point overdoes it to an extreme degree.
Articles about graphics and rendering without a single image should be a misdemeanor. I know it's not explicitly about any particular graphic and is instead about performance, but I still think it's missing that extra pizzazz.
I disagree. Anyone with a basic understanding of how perspective projection matrices work will understand the article, especially with the code samples and equations. (Edit: and anybody without this understanding won’t be interested in the article anyway).
I thought it was great, and had a full-on “smack my forehead” moment as to why I had never realized something so simple and effective, after 20 years of programming 3D graphics!
Oh, my. Back in the day, when learning graphics from books, there were some with equations and matrices only. Imagine that! How did we learn at all in such a savage environment?!
TLDR - the floating point format has a great deal of precision near 0. Normally normalized depth ranges from -1 to 1, but if you change that to 0 to 1 (and flip it so the far plane maps to 0) you can utilize this extra precision.