Reverse Z in 3D graphics (and why it's so awesome) (tomhultonharrop.com)
90 points by fanf2 3 months ago | 46 comments



If you’re wondering why there’s no demo graphic, it’s because this technique is meant to produce correct results (in a bug-robust way), not visually different ones.

One thing that can go wrong in 3D graphics is z-fighting, where a scene shows rendering artifacts because two objects touch or intersect each other, and triangles that are very close together render badly when the differences in their z values (distance from the camera) hit machine tolerance. Basically, the rendering has errors due to floating-point limitations.

The post is pointing out that the float32 format represents many, many more values near 0 than near 1. Over 99% of all values represented in [0,1] are in [0,0.5]. And when you standardize the distances, it’s common to map the farthest distance you want to 1 and the nearest to 0. But this severely restricts your effective distances for non-close objects, enough that z-fighting becomes possible. If you switch the mapping so that z=0 represents the farthest distance, then you get many, many more effective distance values for most of your rendered view distance because of the way float32 represents [0,1].
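A minimal C++20 sketch to sanity-check that claim by comparing bit patterns (it counts zero and the subnormals as well):

    #include <bit>
    #include <cstdint>
    #include <cstdio>

    int main() {
        // Non-negative float32 values are ordered the same way as their bit
        // patterns, so counting representable values in a range is just a
        // subtraction of two patterns.
        const std::uint32_t bits_half = std::bit_cast<std::uint32_t>(0.5f); // 0x3F000000
        const std::uint32_t bits_one  = std::bit_cast<std::uint32_t>(1.0f); // 0x3F800000

        const std::uint64_t in_low  = std::uint64_t{bits_half} + 1;  // values in [0.0, 0.5]
        const std::uint64_t in_high = bits_one - bits_half;          // values in (0.5, 1.0]
        const std::uint64_t total   = std::uint64_t{bits_one} + 1;   // values in [0.0, 1.0]

        std::printf("[0, 0.5]: %llu values (%.2f%%)\n",
                    (unsigned long long)in_low, 100.0 * in_low / total);
        std::printf("(0.5, 1]: %llu values (%.2f%%)\n",
                    (unsigned long long)in_high, 100.0 * in_high / total);
    }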


There is also a technique called logarithmic depth buffer (which should be self-explanatory): https://threejs.org/examples/?q=dept#webgl_camera_logarithmi...


That's a pretty stunning visualization too


I wasn't aware that logarithmic depth buffer could be implemented in WebGL since it lacks glClipControl(). It's cool that someone found a way to do it eventually (apparently by writing to gl_FragDepth).


> apparently by writing to gl_FragDepth

If they do that, it disables the early-Z rejection optimization implemented in most GPUs. For some scenes, the performance cost of that can be huge. When rendering opaque objects in front-to-back order, early Z rejection sometimes saves many millions of pixel shader calls per frame.


And not to mention, floating-point numbers are already roughly logarithmically distributed. Logarithmic distributions matter most when values span widely differing orders of magnitude, so the piecewise-linear approximation of a logarithm that floats give you is good enough for depth buffers.


Indeed, logarithmic depth is pretty much useless on modern hardware, but it wasn’t always the case.

On Windows, support for DXGI_FORMAT_D32_FLOAT is required on feature level 10.0 and newer, but missing (not even optional) on feature level 9.3 and older. Before Windows Vista and Direct3D 10.0 GPUs, people used depth formats like D16_UNORM or D24_UNORM_S8_UINT, i.e. 16- or 24-bit integers. Logarithmic Z made a lot of sense with those integer depth formats.


Yeah, I agree, but I guess it's fine for a demo, which otherwise would not have been possible.


> which otherwise would not have been possible

I wonder whether it's possible to implement logarithmic depth in the vertex shader, as opposed to the pixel shader? After gl_Position is computed, adjust the vector to apply the logarithm, preserving `xy/w` to keep the 2D screen-space position.

To be clear, I have never tried that and there could be issues with that approach, especially with large triangles. I’m not sure this is going to work, but it might.
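Roughly the adjustment I have in mind, written as plain C++ math standing in for the tail end of a vertex shader (completely untested; `far_plane` and the curve constant `C` are placeholders):

    #include <cmath>

    struct Vec4 { float x, y, z, w; };

    // Sketch only: remap clip-space z so that NDC depth (z / w) follows a
    // logarithmic curve instead of the usual 1/z curve. x, y and w are left
    // untouched, so xy/w (the 2D screen position) is preserved.
    Vec4 logarithmic_depth(Vec4 clip, float far_plane, float C = 1.0f) {
        const float log_far = std::log2(C * far_plane + 1.0f);
        // w is the eye-space depth for a standard perspective projection.
        // Map it to [-1, 1] logarithmically, then pre-multiply by w so the
        // hardware's divide by w undoes the multiplication.
        clip.z = (2.0f * std::log2(C * clip.w + 1.0f) / log_far - 1.0f) * clip.w;
        return clip;
    }

The obvious catch is the one mentioned above: the log curve is only evaluated at the vertices, and interior pixels get a screen-space linear blend of those values, so big triangles close to the camera could still end up with noticeably wrong depth.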


I haven't really studied this either so I could be mistaken, but I think it's because OpenGL does the perspective divide between the vertex and fragment shaders, going from clip space to NDC space (which is in the [-1,1] interval and can only be changed by glClipControl()).

The value in gl_FragDepth, if written to, is final, but gl_Position is not: it still goes through the clip-to-NDC transformation. And since the trick to get the extra precision is to put the depth range into [0,1] instead of [-1,1], this would fail.

So my guess is, it probably wouldn't work on WebGL/OpenGLES without also setting gl_FragDepth, which is, as you mentioned, impractical performance-wise.


At least on my GPU this is extremely slow compared to disabling it.


I'm getting obvious Z-fighting issues on that.


Thanks - this is much clearer than the original article, which somehow never actually defines "reverse Z" before going on and on about it.

Why is it that most things you render are far away rather than close? I guess I'd expect the opposite, at least depending on what you're rendering. And I'd also assume you'd want better visual fidelity for closer things, so I find this a little counter-intuitive.

Are there certain cases where you'd want "regular Z" rather than "reverse Z"? Or cases where you'd want to limit your Z to [0, 0.5] instead of [0, 1]?


"Close" here means "millimeters away". The logarithmic nature of the depth buffer means that even your player character is considered far enough to be already at ~0.9 in a "regular Z" case (depending on where you put your near plane, of course). Go stare at a few depth buffers and you'll get a good taste of it.


> Why is it that most things you render are far away rather than close?

Because most things are far away rather than close...?


I guess… but in terms of what you see on screen, most of it is going to be dominated by closer things, is what I’m getting at. The world is very big, but right now most of what I see is, relatively speaking, close to me. I don’t care much about the precise positions of stuff in China right now, but for the stuff in my apartment in Canada it’s very important to my sense of reality that the positioning is quite exact, even though there’s a lot more stuff in China.


The reason for this is in the article:

> The reason for this bunching up of values near 1.0 is down to the non linear perspective divide.

The function in question is nonlinear. So using "normal" z, values in the range [0, 0.5] are going to be very, very close to the camera. The vast majority of things aren't going to be that close to the camera. Most typical distances are going to be in the [0.5, 1] range.

Hence, reversing that gives you more precision where it matters.
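To put numbers on that, a minimal C++ sketch (the 0.1/1000 near/far planes are just an illustration, and a [0,1] depth range is assumed) comparing the standard mapping (near to 0, far to 1) with the reversed one (near to 1, far to 0):

    #include <cstdio>

    int main() {
        const float n = 0.1f, f = 1000.0f; // illustrative near/far planes

        const float distances[] = {0.2f, 1.0f, 10.0f, 100.0f, 1000.0f};
        for (float z : distances) {
            // Standard mapping: near plane -> 0, far plane -> 1.
            const float standard = f * (z - n) / (z * (f - n));
            // Reversed mapping: near plane -> 1, far plane -> 0.
            const float reversed = n * (f - z) / (z * (f - n));
            std::printf("view z = %7.1f   standard = %.6f   reversed = %.6f\n",
                        z, standard, reversed);
        }
    }

Everything beyond a fraction of a unit from the camera already lands above 0.5 in the standard mapping, while the reversed mapping pushes those same distances down toward 0, where float32 has far more representable values.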


You still need precision even for things that are far away. You don't necessarily care that the building your character is looking at a mile away is exactly in the right spot, but imprecision causing Z-fighting will be very noticeable as the windows constantly flicker back and forth in front of the facade.


If you're standing outside holding a phone up to your face, you have 2 things close (hand/phone) and everything else (trees, buildings, cars, furniture, animals, etc) around you is not close.


Here's a demo graphic for reversed Z in WebGPU; requires a capable browser.

https://webgpu.github.io/webgpu-samples/?sample=reversedZ


Then you could show some example scenes that render wrong the "standard" way but are fixed by the reverse z idea.


The demo graphic you need is not of the rendered scene, but of what happens behind the scenes. Like the diagrams in the old Foley and Van Dam text: a visual of the perspective-transformed scene, but from a different angle, where you can see the Z coordinates.


> Over 99% of all values represented in [0,1] are in [0,0.5].

Is that true if you exclude denormals?


Yes. The floats in [0,1) come in blocks: [1/2, 1), [1/4, 1/2), [1/8, 1/4), etc. There are 2^23 floats in each block and there are 127 blocks. The reason there are so many floats in [0, 0.5] is that only one block, [1/2, 1), is outside that range. If you exclude the denormals (which have their own special block stretching down to 0, [0, 1/2^126)), you have still only excluded a single block.


Ahhh, that makes sense. That’s a much clearer explanation. Thanks!


Yes. Subnormals start at <1.0 * 2^(-126). Between [1.0 * 2^(-126), 0.5] there are 125x more distinct float values than there are in the range [0.5, 1].


I use reverse-z when I can, works great. Totally solves all z-fighting problems.


I couldn't quite connect all the dots as to why this is better. Something I do understand is summing numbers close to zero before numbers of larger magnitude away from zero produces a more accurate sum because floating-point addition isn't truly associative.

My best guess is that this reverse-z convention keeps numbers at the same scale more often. I think being at the same scale is what matters, rather than being near zero, because the relative precision is the same at every scale (single-precision float: 24 bits of significand stored in 23 bits with an implied leading 1 bit). If the article is trying to say that numbers near zero have more relative precision because of the denormalized representation of FP numbers, it should call that out explicitly. Also, the advantage of similar scales applies to addition/subtraction; there should be no advantage for multiplication/division AFAIK.



This article is very good


This is the key paragraph: "The reason for this bunching up of values near 1.0 is down to the non linear perspective divide..."


That makes sense: many values end up close to 1.0, so transforming the mapping so that those values land near 0.0 instead provides additional precision.


I'm not a graphics expert, but reading the article and the long roundabout way they got back to "we're just working with the 23 bit mantissa because we've clamped down our range" makes me wonder why you couldn't just do fixed point math with the full 32 bit range instead?


Modern GPUs aren't optimized for it. You can indeed do this in software, but this is the path hardware acceleration took.


It’s actually quite common to use a 24-bit fixed-point format for the depth buffer, leaving 8 bits for the stencil buffer.

GPUs do a lot of fixed point to float conversions for the different texture and vertex formats since memory bandwidth is more expensive than compute.


Plenty of GPUs do have special (lossless) depth compression; that's what the 24-bit depth targets are about.


The thing to understand is that the perspective transform in homogeneous coordinates doesn't just shrink the X and Y coordinates based on Z distance, but it also shrinks the Z coordinate too!

The farther some objects are from the viewer, the closer together they are bunched in Z space.

Basically, the points out at infinity where parallel lines appear to converge under perspective are translated to a finite Z distance. So all of infinity is squeezed into that range.

Why is Z also transformed? To preserve the straightness of lines under the perspective transformation.

If you have an arbitrarily rotated straight line in a perspective view, and you just divide the X and Y coordinates along that line by Z, the line will look just fine in the perspective view, because the viewer does not see the Z coordinate. But in fact in the perspective-transformed space (which is 3 dimensional), it will not be a straight line.

Now that is fine for wireframe or even certain polygon graphics, in which we can regard the perspective as a projection from 3D to 2D and basically drop the Z coordinate entirely after dividing X and Y by it.

An example of where it breaks down is when you're using a Z-buffer for the visibility calculations.

If you have two polygons that intersect, and you don't handle the Z coordinate in the perspective transform, they will not correctly intersect: the line at which they intersect, calculated by your Z buffer, will not be the right one, and that will be particularly obvious under movement, appearing as a weird artifact where that intersecting line shifts around.

You won't see this when you have polygons that only join at their vertices to form surfaces. The 2D transform is good enough: the endpoints of a shared edge are correctly transformed from 3D into 2D taking into account perspective, and we just render a straight line between them in 2D and all is good. In the intersecting case, there are interpolated points involved which are not represented as vertices and have not been individually transformed. That interpolation then goes wrong.

Another area where this is important, I think, is correctness of texture maps. If it is wrong, then a rotating polygon with a texture map will have creeping artifacts, where the pixels of the texture don't stick to the same spot on the polygon, except near the vertices.
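A tiny numeric illustration of that interpolation going wrong (the endpoint depths of 1 and 10 are arbitrary): linearly interpolating view-space Z across the screen gives the wrong depth at an interior point, while interpolating the perspective-transformed value, which is affine in 1/Z, gives the right one.

    #include <cstdio>

    int main() {
        // Two endpoints of an edge, at view-space depths 1 and 10.
        const float z0 = 1.0f, z1 = 10.0f;

        // Look at the pixel halfway between them in *screen* space.
        const float t = 0.5f;

        // Wrong: interpolate view-space z linearly across the screen.
        const float z_linear = z0 + t * (z1 - z0); // 5.5

        // Right: interpolate 1/z linearly (which is effectively what the
        // hardware does with the perspective-transformed depth), then invert.
        const float inv_z = (1.0f / z0) + t * (1.0f / z1 - 1.0f / z0);
        const float z_correct = 1.0f / inv_z; // ~1.82

        std::printf("screen-linear z = %.3f, perspective-correct z = %.3f\n",
                    z_linear, z_correct);
    }

The point halfway across the edge on screen is actually much closer to the near endpoint in 3D, which is exactly the kind of error that shows up as shifting intersection lines and creeping textures.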


Okay but now all your values are bunched up next to the far plane, between 990 and 1000? Using this kind of mapping seems inherently flawed no matter what range you pick.


No, because depth values are in screen space (1/z), not world space (like 990-1000). Here are some graphs that make the benefits obvious: https://developer.nvidia.com/blog/visualizing-depth-precisio...


Those graphs are inaccurate. They only show a few levels of exponent for the floating point.

There is a benefit, but it's overstated by those graphs.

The math is simple enough. When you calculate 1/990-ish, it comes out to about 0.001. Everything below about 0.001 is between 990 and the far plane.

Now, of the floating point values between 0 and 1, what percent of them are below 0.001? It's 91.4%. Only 9% of those values are being used.

And only 1/4 of floating point is between 0 and 1 to start with.

Floating point is just not the right way to store these values. It makes sense to have denser values near 0, but the exponential nature of floating point does that to an extremely overkill extent.


Articles about graphics and rendering without a single image should be a misdemeanor. I know it's not explicitly about any particular graphic and is instead about performance, but I still think it's missing that extra pizzazz.


I disagree. Anyone with a basic understanding of how perspective projection matrices work will understand the article, especially with the code samples and equations. (Edit: and anybody without this understanding won’t be interested in the article anyway).

I thought it was great, and had a full-on “smack my forehead” moment as to why I had never realized something so simple and effective, after 20 years of programming 3D graphics!


LOL. Me too. 35 years here. There's always something new to learn.

This, coupled with all the fascinating new ideas on how to optimize 3D math has made me determined to go rewrite my old x86 code.


Oh, my. Back in the day, when learning graphics from books, there were some with equations and matrices only. Imagine that! How did we learn at all in such a savage environment?!


This. You are completely correct. Just give me anything to show why this is awesome instead of just the math!


TLDR: the floating-point format has a great deal of precision near 0. Normally depth values range from -1 to 1, but if you change that to 0 to 1 (and flip it so the near plane maps to 1 and the far plane to 0), you can utilize this extra precision.



