Hacker News
John Carmack's Comment on Hardware Ray Tracing (arstechnica.com)
153 points by lispython 1555 days ago | 67 comments

Excellent response by Carmack.

"Ray Tracing" has always been an overhyped or misunderstood technology, at least in my experience. Because of the impressive lighting effects it's famous for producing, people view it as some "holy grail" of superior computer graphics technology that we just need to optimize a bit for use in games.

As Carmack described, those highly realistic ray traced renders come at a price: billions of ray calculations. You don't get all that pixel-perfect refraction, etc. for free (and probably never will - at least not with ray tracing.) He also explains that for most people, this (e.g. pixel perfect refraction) really doesn't matter, when rasterization techniques exist that achieve extremely high quality approximations at a fraction of the computational cost.

Conversely though, ray tracing and related concepts (ray casting) are not at all without value. Many modern video games today actually use a sort of hybrid rasterization / ray casting approach. Macrostructure objects are rasterized the normal way, while smaller high resolution detail patterns (like a brick wall, or stones in the ground) are "added" to surfaces via per-pixel displacement mapping, which is ray casting at its core. This is one of the few cases where you can take advantage of the "log2" efficiency Carmack mentioned -- in a highly specialized implementation without a huge time constant.

Pretty much agree. The only thing we might disagree on is the 'billions' number: I expect good ray tracing to trace billions of rays, but if transistors are cheap enough this becomes more interesting. The typical HDMI 1080p display is 2 megapixels (or megatexels). If you have a rendering engine with 2,073,600 cores, each of which is looking at a billion-ray 'view' based on where it sits in the scene, it's easier to set up the scene and light it. That presumes, of course, that you can build those cores like DRAM instead of like current processors. My point is that cheap transistors keep amazing me, and whenever I say "Oh, that will never happen" some fool goes and shows me I'm wrong.

I'm trying to imagine how complex a 'core' would be that computed the incident rays on a single pixel, then figuring out how big that is in an 18nm process technology, and then trying to see if I can fit 2M on a reasonably sized die. Sadly, my head exploded.

Throwing more ( or finer etched) silicon at an algorithm will not help the algorithm, compared to a superior algorithm. For almost all scenes you get better results with rasterization. The two exceptions I can think of are scenes where the geometry of light rays is not flat (for example, involving lenses or black holes), and scenes with a lot more optical complexity than you can reasonably use. (In a forest where each of the leaves is an object, you will never notice the advantage over a forest where the trees in the background are just textures.)

> Throwing more ( or finer etched) silicon at an algorithm will not help the algorithm, compared to a superior algorithm

Actually, it will when the complexity classes of the algorithms differ, which is true in this case: the time taken to ray trace a scene grows as log(n) in the size n of that scene, while the time taken to rasterize it grows linearly in n.

There exists a threshold of computing power/scene complexity above which raytracing beats rasterization in speed. However, like Carmack pointed out, the constant factors are massive, so this won't be reached in the near future, if ever.
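A rough sketch of that threshold, for anyone who wants to play with the numbers. The cost models and the constants (`c_raster`, `c_ray`) are made up for illustration and are not measurements of any real renderer; the only thing taken from the discussion is "linear versus log2, with a much bigger constant on the log2 side".

```python
import math

def raster_cost(n, c_raster=1.0):
    # Assumed model: rasterization cost is linear in the primitive count n.
    return c_raster * n

def raytrace_cost(n, c_ray=1000.0):
    # Assumed model: ray tracing cost is logarithmic in n, with a
    # (hypothetical) much larger constant factor.
    return c_ray * math.log2(n)

def crossover(c_raster=1.0, c_ray=1000.0, n_max=2**40):
    # Smallest power-of-two scene size at which the ray tracing model
    # becomes cheaper than the rasterization model, or None if that
    # never happens below n_max (e.g. "no dataset that fits in memory").
    n = 2
    while n <= n_max:
        if raytrace_cost(n, c_ray) < raster_cost(n, c_raster):
            return n
        n *= 2
    return None

if __name__ == "__main__":
    for c_ray in (1e3, 1e6, 1e9):
        print(c_ray, crossover(c_ray=c_ray))
```

Raising `c_ray` pushes the crossover out quickly, and capping `n_max` at whatever fits in memory is how the threshold can end up never being reached in practice.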

The reason the constant factors are so huge is that complexity of raytracing rises linearly with the count of pixels to be drawn, while in rasterization a lot of the work can be shared by neighbouring pixels.

Intriguingly, this means that if you can reduce the pixel counts, you can vastly improve the value of raytracing.

Notably, if you can do eye tracking and rendering in less than 15ms, you can reach the same visual quality as a full-screen, high-resolution render by rendering only the areas you are actually looking at in high resolution, and rendering the rest of the scene at a progressively lower resolution farther from the focus point. The cone of high-precision vision is surprisingly small, something like tens of pixels when looking at a screen from a normal viewing distance. If you did this, you should be able to cut the number of rays you need to send by at least two orders of magnitude, which would bring raytracing to real-time quality on modern hardware.
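A back-of-the-envelope sketch of the ray budget under that scheme. The falloff model is an assumption picked for illustration (full sampling density inside a `fovea_radius` of 32 pixels, then density dropping with the square of the distance from the gaze point); it is not taken from the eye-tracking research, and real systems will use a different curve.

```python
import math

def full_ray_count(width, height):
    # One primary ray per pixel.
    return width * height

def foveated_ray_count(width, height, gaze_x, gaze_y, fovea_radius=32.0):
    # Assumed falloff: 1 ray per pixel inside the fovea, and a density of
    # (fovea_radius / r)^2 rays per pixel at distance r outside it.
    total = 0.0
    for y in range(height):
        for x in range(width):
            r = math.hypot(x - gaze_x, y - gaze_y)
            density = 1.0 if r <= fovea_radius else (fovea_radius / r) ** 2
            total += density
    return total

if __name__ == "__main__":
    w, h = 1920, 1080
    full = full_ray_count(w, h)
    fov = foveated_ray_count(w, h, w / 2, h / 2)
    print("full:", full, "foveated:", round(fov), "reduction:", full / fov)
```

How large the reduction comes out depends entirely on the assumed fovea size and falloff, so treat the printed factor as a property of this toy model rather than of human vision.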

Naive question:

Does your last paragraph have any implications for VR? Could something like the Oculus Rift benefit from raytracing technology? Or have I misunderstood what you are saying?

I only skimmed the paper referenced below [1], but what I understand at the moment is that you essentially measure the area of the screen where high quality is important (using an eye tracker), and with this you can allocate your resources better. The downside is that the rendering looks rather strange to anyone who looks at it without proper alignment of the high-quality region. So VR goggles seem to be a good candidate for the technology if they include an eye tracker, because they guarantee a single viewer and provide a frame to attach the eye tracker to.

[1] http://research.microsoft.com/apps/pubs/default.aspx?id=1766... thanks mynameismiek

The idea is that raytracing can degrade gracefully, so there's a sliding scale between rendering time and rendering quality for every pixel. And yes, this means that, theoretically, the computer could naturally degrade pixels that you're not currently looking at.

I suspect this would look highly unnatural in practice, though, as even static scenes would flicker and change, especially with reflections and any surface or technique that uses random sampling methods (which is virtually every algorithm that is both fast and looks good).

> I suspect this would look highly unnatural in practice

In paired tests, subjects were unable to see a difference between the foveated image and the normally rendered one. The flicker can be removed by blurring the image to the point where flicker no longer exists. Because your brain is used to a very low sampling rate outside the fovea, it actually helps in hiding the artifacts, because they occur naturally in your vision.

Is raytracing still linear with pixel count if you're doing something like Metropolis light transport to get global illumination effects?

Yes. But the "constant factor" will be very large, even larger than for traditional ray tracing, for some global illumination algorithms, even when you choose to render only a small part of the scene. For example, photon mapping requires a pre-pass where you trace photons from the light source onto the scene, which means that reducing the pixel count doesn't decrease the time for the pre-pass. http://en.wikipedia.org/wiki/Photon_mapping

Yes, as I pointed out in the last sentence, perhaps I should have put a big-O somewhere into it. Do you have a link on the eye tracking/rendering combo? That technique sounds rather fascinating.

When CPUs have millions of cores and cost next to nothing to produce, the superior algorithm may be the one which scales linearly with parallel hardware. He mentions the problem of log2 vs linear with rasterization, but that's just moving the log2 problem to hardware. Ray casting may still win out due to the simplicity of throwing more hardware at it.

e.g. mirrors?

Yes, but it is perhaps more instructive to think of a curved mirror. With a raster engine, you need to calculate a texture for every triangle of the mirror (which is essentially the same as rendering the scene in the first place). With a ray tracing engine, on the other hand, you just need to reflect one ray off the mirror surface for every pixel the mirror occupies in the final rendering. So compared to the same scene with a window instead of the mirror, you just need to calculate one additional reflection instead of running the entire rendering pipeline again.
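The "one additional reflection" is the standard mirror formula r = d - 2(d·n)n. A minimal sketch follows; the helper names are mine, vectors are plain tuples, and the normal is assumed to be unit length. For a curved (here spherical) mirror the only extra work is computing the normal at the hit point.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def reflect(direction, normal):
    # Mirror reflection of an incoming direction about a unit normal:
    # r = d - 2 (d . n) n
    k = 2.0 * dot(direction, normal)
    return tuple(d - k * n for d, n in zip(direction, normal))

def sphere_normal(hit_point, center):
    # Outward unit normal of a spherical mirror at the hit point.
    return normalize(tuple(p - c for p, c in zip(hit_point, center)))

if __name__ == "__main__":
    # A ray heading down and to the right bounces off a floor facing up.
    print(reflect((1.0, -1.0, 0.0), (0.0, 1.0, 0.0)))
```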

I think John says not just one ray per pixel?

Yes, I was essentially using one ray per pixel as shorthand for a constant number of rays per pixel. (As in 'I did not think about that point.') One needs more than one to avoid artifacts like aliasing (see [1] for a very nice example), or, depending on how 'naive' or complex the algorithm is, for things like shadows or caustics. But for nice scenes (which do not create artifacts) with mirrors and lenses, one ray per pixel would work fine.

[1] https://en.wikipedia.org/wiki/Aliasing

I think people see ray tracing as much better motivated by physics (and intuition) than all the complicated matrices in raster engines, rather than thinking it is superior because of some demos.

An even more physics based rendering method is called radiosity, but it's also hugely computationally intensive.

Physically Based Rendering - often GPU accelerated - is the new thing -- check out Octane Renderer, Lux-Render, Indigo, the Cycles renderer in Blender.

These renderers start at the light source and stochastically generate and follow photons. Like a digital camera they suffer from noise :-)

"These renderers start at the light source and stochastically generate and follow photons."

Yes, that's similar to ray tracing, but going 'backwards'

Rasterizing is certainly faster, requiring more work in the 'intelligence' of scene assembly.

With raytracing (or physics rendering) you have lights and objects 'naturally interacting' with each other, so shadows, reflection, and radiosity come "automatically" (still hard to do).

With rasterizing you have triangles with different colors, and it's up to you to paint them accordingly.

Here's a commercial, realtime raytracing tool. Runs happily on a dual-core macbook pro. No custom card needed. Yes, it runs better on a $20,000 64-thread Tigerton, but it will run usably on a laptop.


Carmack is right about games (surprise!). I can't imagine "Imagination Technologies" is pushing the Caustic R2500 at games.

We use KeyShot at work and it's an amazing package; it's now the industry standard for rendering product design CAD. The demo video showing the Caustic card running is about equivalent to KeyShot running on a $3000 PC. They have a way to go until they convince me that they have anything better than what KeyShot is capable of.

Having no real knowledge of ray tracing (or rendering, for that matter), can someone tell me why we can't be moving towards a massive cluster of GPUs (on the order of 10,000) ray tracing a scene in parallel? I imagine each ray trace call is quite independent of any other. Sooner or later, CPUs (and GPUs) will have many cores available for this form of rendering. Is it because of the chicken/egg problem?

This is indeed ray tracing's huge advantage, of being highly amenable to parallelization. However, the same thing is more or less equivalently true for other rendering methods, such as rasterization. Modern GPUs have thousands of rendering sub-unit "cores". And with CUDA you can use those cores for almost anything, including ray tracing. But for a given scene and a given amount of finite computing power you're almost always going to be able to get superior results using a rasterization technique than ray tracing.

I think that rasterization is easier to parallelize than ray tracing, since raster engines essentially multiply each vertex by the same matrix, while for ray tracing you have to traverse an octree for each step of each ray. (Both are of course easy to parallelize; I just think that in general you will run into memory bandwidth problems faster with ray tracing algos.)

Yes, memory bandwidth is one of the biggest slowdowns in ray tracing. Rasterization accesses memory very linearly, which makes it easy to cache and easy to optimize in GPU hardware. Ray tracing accesses memory very randomly: two rays originating from pixels next to each other can easily diverge and hit objects far from each other in the scene. This means that ray tracing will spend most of its time waiting for memory.

Ray tracing enjoys a ridiculous amount of inherent parallelism. That's not much of an advantage though, since rasterisers also enjoy a ridiculous amount of inherent parallelism.

Large scale "render farms" that exploit both already exist and are in commercial use in the VFX industry.

I work in the VFX industry, and I agree. We often need to make a choice between a rasterisation-like method - which in VFX usually means micropolygon rendering like Pixar's REYES algorithm - or a raytracing-based method, like SideFX Mantra's Physically Based Rendering. It is also possible to hybridize the two, which usually means casting rays in order to shade micropolygons. Both approaches have their own advantages.

There are some tools that allow you to render a frame quickly by using many hosts to parallelise the task. This is usually only practical when the data that generates the frame is relatively small, in other words not gigabytes of fluid simulation voxels. It's also not realtime, by any stretch!

A renderfarm is usually not technologically any different to local rendering on a workstation. We generally don't parallelize across hosts within a single frame, but at least you can render lots of frames simultaneously!

Another thing that doesn't seem to have been mentioned on this post is that raytracing can be accelerated massively (where applicable) by using instancing. A single object can be used many times in a scene, only differing in its transformation. This allows the geometry to be stored in ram once and re-used, incoming rays that are incident on an instance's bounding box can simply be transformed into the local space of the instance. Of course this is of no help in an extremely complex scene full of unique objects, but in practice you can make great savings (and create very complex scenes that are cheap to raytrace) this way.
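A minimal sketch of that ray transform. To keep it short I am assuming the shared geometry is a unit sphere at the origin and that an instance is just a translation plus a uniform scale (a hypothetical `(offset, scale)` tuple); a real renderer would use full transformation matrices and a bounding-box test first.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hit_unit_sphere(origin, direction):
    # The shared geometry, stored once: a unit sphere at the local origin.
    # Solves |origin + t * direction|^2 = 1 for the nearest t > 0.
    a = dot(direction, direction)
    b = 2.0 * dot(origin, direction)
    c = dot(origin, origin) - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 1e-9:
            return t
    return None

def hit_instance(origin, direction, instance):
    # Move the ray into the instance's local space instead of copying
    # and transforming the geometry. The local direction is deliberately
    # not re-normalized, so the returned t is valid along the world ray.
    offset, scale = instance
    local_origin = tuple((o - p) / scale for o, p in zip(origin, offset))
    local_dir = tuple(d / scale for d in direction)
    return hit_unit_sphere(local_origin, local_dir)

if __name__ == "__main__":
    instances = [((0.0, 0.0, 10.0), 2.0), ((5.0, 0.0, 10.0), 1.0)]
    for inst in instances:
        print(inst, hit_instance((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), inst))
```

Every instance costs one small transform record, while the geometry itself lives in memory once.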

> Another thing that doesn't seem to have been mentioned on this post is that raytracing can be accelerated massively (where applicable) by using instancing. A single object can be used many times in a scene, only differing in its transformation. This allows the geometry to be stored in ram once and re-used, incoming rays that are incident on an instance's bounding box can simply be transformed into the local space of the instance. Of course this is of no help in an extremely complex scene full of unique objects, but in practice you can make great savings (and create very complex scenes that are cheap to raytrace) this way.

You're also forgetting about the flip-side. A classic benefit of scanline renderers was that a scene could be split into multiple parts, which (via a z-buffer) could then be combined without any further rendering. A raytracer, on the other hand, has to have access to the whole scene (if you consider reflections, which make culling objects essentially impossible) to calculate any given pixel sample.

I don't know whether this is still an issue, but for a while it was a barrier to raytracing scenes on a Pixar level.
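For anyone unfamiliar with the z-buffer trick mentioned above, here is a minimal sketch. It assumes each part of the scene was rendered separately into a flat colour buffer plus a matching depth buffer (the `(color, depth)` layer layout is my own choice for illustration); merging is then a per-pixel "nearest depth wins" with no further rendering.

```python
INF = float("inf")

def composite(layers):
    # layers: list of (color, depth) pairs, each a flat list with one
    # entry per pixel. Pixels a layer did not cover have depth INF.
    n = len(layers[0][0])
    out_color = [None] * n
    out_depth = [INF] * n
    for color, depth in layers:
        for i in range(n):
            if depth[i] < out_depth[i]:
                # This layer's surface is closer to the camera here.
                out_depth[i] = depth[i]
                out_color[i] = color[i]
    return out_color, out_depth

if __name__ == "__main__":
    part_a = (["red", "red", None], [1.0, 5.0, INF])
    part_b = (["blue", "blue", "blue"], [2.0, 3.0, 4.0])
    print(composite([part_a, part_b]))
```

This only works because each pixel depends on the nearest surface alone; once a pixel needs reflections of geometry from another part, the parts can no longer be rendered in isolation.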

Minor note: as far as I am aware (I do not write 3D engines for a living, but I have written games and small 3D libraries for my own use), instancing works fine with both (all?) rendering approaches in just the same way.

The benefits of instancing are felt more keenly in ray tracing because of the need to keep geometry in memory. If you're just dealing with direct lighting it's very easy to load and discard geometry as required as you rasterize the frame. Once you start casting secondary rays you end up having to keep more of the scene in memory in order to shade a particular surface, since the lighting contribution from other geo can't be localised.

AMD's vision of "heterogeneous computing" (I think they came up with a new name for this?) would essentially be the "true" fusion of a CPU and a GPU, in the sense that the chip consists of general-purpose execution units (current CPU cores) and special-purpose parallel processing units (current GPU SIMD/VLIW units) which operate concurrently, in such a way that both sequential and parallel code can be executed efficiently on a per-thread basis and as threads processed concurrently.

Something like this would really help with things like raytracing. But then again, as Carmack mentions, the huge disadvantage is the acceleration structures, which need to be discarded and rebuilt every frame for dynamic objects. It's like saying that raytracing is by its nature inferior (or at least very wasteful) from a performance point of view. It's a problem to which you can't simply say "Just throw more hardware at it!".

Rasterization also requires "acceleration structures," kept in memory. I could be wrong here, but I think the point John was trying to make was that there was again a constant factor handicapping ray-tracing. But constant factors are, well, constant, and in 2050 we may well have ray-tracing done in hardware delivering scenes that are indistinguishable from reality.

I wrote a ray tracer once, but it was primitive. So I'm not completely talking out of my butt, just mostly.

> Rasterization also requires "acceleration structures," kept in memory

Yeah but surely not for the geometry itself. In rasterization, we may use simple low-overhead acc. structures to efficiently traverse a relevant sub-set of the whole (but coarsely described) scene for some fancy culling, collision detection (OK that's not really rendering) ... but geometry (vertices, polygons, vertex attributes) does not need to be traversed like that and thus does not have to be stored as individual triangles in an octree or bounding-volume hierarchy or what not. Quite a difference in overheads here. In GPU terms, with rasterization you have geometry neatly stored in vertex buffers and an awesome Z-aware traversal method with vertex shaders. In a simplistic current-gen fragment-shader-based raytracer, each pixel traces a ray traversing through your acceleration structure which may be stored in a volume texture (ouch, so many texel fetches...) since vertex buffers are not sensibly accessible in a frag shader.

> scenes that are indistinguishable from reality

This would also require a high dynamic range output device. Looking directly into perfectly raytraced sunlight still won't glare and blind me like the real world does, but indeed, by 2050...

> This would also require a high dynamic range output device. Looking directly into perfectly raytraced sunlight still won't glare and blind me like the real world does, but indeed, by 2050...

You could try to look directly into a beamer instead of onto the screen, if glare is all you want. ( And by 2050 I would suspect that the scene is directly copied into the frontal lobe, bypassing the visual cortex.)

Rasterisation doesn't require any real acceleration structures. There are various caches, but nothing comparable to a kd-tree or BVH.

You would need every node to hold almost all of the geometry. Even if you achieve this, you would have to cut the screen into 10,000 blocks of pixels and then trace each one individually. Some of these small areas would finish much faster than the others, and it would be hard (or expensive) to move the idle nodes on to another job.

But that's from my simple understanding, possibly there would be other problems too...

Current rasterization methods also parallelize well. The problem is that ray tracing is just less efficient for most scenes.

I think that's what he's implying when he says

   I am 90% sure that the eventual path to integration of ray tracing hardware into consumer devices will be as minor tweaks to the existing GPU microarchitectures.

It's so cool that someone as high up in the games industry as Carmack actually knows his technical shit.

He might be high up now, but he started off as a programmer. He co-founded id software and (along with others) developed Wolfenstein 3D, Doom, etc. http://en.wikipedia.org/wiki/Id_Software

He still is a programmer. He's the lead programmer at id and writes code for Armadillo Aerospace.

yeah, he can probably do whatever he wants, the fact he's still programming means he obviously enjoys it.

Not only that but I believe he is still "in the trenches" and cuts industry leading code to this day.

If you follow him on Twitter, he posts almost exclusively highly technical content.

I was aware of this, but Mark Z. also started out programming, and today he's Mister CEO, too important for code.

He's known for knowing his shit. :)

another example ...

"I can send an IP packet to Europe faster than I can send a pixel to the screen. How f’d up is that?"


I think path tracing is one of the most promising methods for realtime ray tracing.

Brigade is one example: http://igad.nhtv.nl/~bikker/ Here an in-game example: http://www.youtube.com/watch?v=6_DrgiwLABk

And a nice blog with posts about Brigade, Octane and others: http://raytracey.blogspot.nl/

"For example, all surfaces that are shaded with interpolated normal will have an unnatural shadow discontinuity at the silhouette edges with single shadow ray traces."

For some reason I really want to understand what this sentence means. I don't know why it jumped out at me, maybe because it seems both accessible and arcane, I want some path to even just tour the arcane concerns of someone so deep into a (this) particular domain.

In English: if you're only using one ray for shading, then you'll have areas that are completely in shadow directly next to areas that are completely not in shadow; what we call hard shadows. Soft shadows are a better approximation of reality, because they show the shades of grey between "covered" and "uncovered", but they require more ray intersections (unless you're using sphere tracing!).

Sphere tracing is awesome. So many effects can be so easily and cheaply approximated, apart from soft shadows also subsurface scattering, bloom/glare etc. Downside, while great for geometry describable by distance fields, it's quite useless with polygonal assets...
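For readers who haven't met sphere tracing: you march along the ray, and at each step the distance field tells you how far you can safely advance without passing through a surface. A minimal sketch, assuming a unit-length ray direction and a single sphere as the distance field (the scene, step limit, and tolerances are arbitrary choices of mine):

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 5.0), radius=1.0):
    # Signed distance from point p to the surface of a sphere.
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, sdf, max_steps=128, eps=1e-4, max_dist=100.0):
    # direction is assumed to be unit length, so advancing t by the
    # distance-field value can never overshoot the nearest surface.
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t          # close enough to count as a hit
        t += dist
        if t > max_dist:
            return None       # ray left the scene
    return None               # ran out of steps

if __name__ == "__main__":
    print(sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sphere_sdf))
    print(sphere_trace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), sphere_sdf))
```

It also shows the downside mentioned above: everything hinges on having a cheap distance function, which polygonal assets don't give you.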

Typically when ray tracing you follow one (or more) rays from the eye through a pixel on the screen until it hits something. Then you have to decide what that point on the surface of thing it hits should look like. Usually this breaks down into a combination of how much ambient light you're simulating, the material properties of the surface, the results of any reflection/refraction rays you fire off, and the effect of light sources on it.

The simplest way to calculate the light sources is to follow another ray to each of your light sources, and if there's nothing in the way, you add the intensity of that light to the pixel.

The problem with this simple approach is that something is either blocking your path to the light source or it isn't, which creates absolutely sharp shadows because it's simulating the lights as if all the light's brightness is emanating from a single infinitesimal point.

In real life, lights tend not to be like that.

To get realistic looking shadows with ray tracing you have to send multiple rays to different parts of each of your lights, adding a portion of the light's brightness each time (you're essentially doing a Monte Carlo integration over the area of the light). The more you do, the better looking shadow edges you can get, but also the more effort you have to spend in calculating.
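A small sketch of the difference between the two approaches. The scene is invented for illustration: a single spherical occluder, and a square light lying in a horizontal plane (`half_size` is half its side length). With a point light a surface point gets visibility 0 or 1; with the area light it gets the fraction of sampled shadow rays that reach the light.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def segment_blocked(p, q, center, radius):
    # True if the straight segment from p to q passes through the sphere.
    d = tuple(b - a for a, b in zip(p, q))
    f = tuple(a - c for a, c in zip(p, center))
    a = dot(d, d)
    b = 2.0 * dot(f, d)
    c = dot(f, f) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return False
    root = math.sqrt(disc)
    t1 = (-b - root) / (2.0 * a)
    t2 = (-b + root) / (2.0 * a)
    return (0.0 < t1 < 1.0) or (0.0 < t2 < 1.0)

def point_light_visibility(p, light_pos, occluder):
    # One shadow ray: fully lit or fully shadowed (hard shadows).
    return 0.0 if segment_blocked(p, light_pos, *occluder) else 1.0

def area_light_visibility(p, light_center, half_size, occluder, samples=256, rng=None):
    # Monte Carlo estimate: fraction of random points on the square
    # light that are visible from p (soft shadows).
    rng = rng or random.Random(0)
    visible = 0
    for _ in range(samples):
        lx = light_center[0] + rng.uniform(-half_size, half_size)
        lz = light_center[2] + rng.uniform(-half_size, half_size)
        if not segment_blocked(p, (lx, light_center[1], lz), *occluder):
            visible += 1
    return visible / samples

if __name__ == "__main__":
    occluder = ((0.0, 5.0, 0.0), 1.0)   # (center, radius)
    light = (0.0, 10.0, 0.0)
    for x in (0.0, 1.0, 2.0, 10.0):
        p = (x, 0.0, 0.0)
        print(x, point_light_visibility(p, light, occluder),
              area_light_visibility(p, light, 2.0, occluder))
```

Raising `samples` smooths the shadow edge at the cost of proportionally more intersection tests, which is exactly the trade-off described above.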

Raytracing is rather fun because you can get really interesting results with relatively little code and it's very visual - you actually see your bugs. Some people do simple ray tracers as katas. I did a basic one as a way of learning Scala. If you're interested in having a look it's here (very simple of course - I do treat my light sources as points): https://github.com/kybernetikos/ScalaTrace/wiki/ScalaTrace

I think he is not wrong at all, but personally, I've been on both sides of the debate, and then I kind of formed the opinion that it's not worth debating about. The one thing I would strongly, strongly push for is general purpose hardware (we are kind of headed in that direction with OpenCL and CUDA, but industry-wide interest in these things is really low). Don't force me to use rasterization or raytracing (or even triangles), just give me a really powerful, flexible parallel processor...even if that means sacrificing a bit of speed. I think above all Carmack would appreciate such hardware...there was a kind of beauty in his old code you don't find any more -- you can do lots of tricks with GPUs, but overall it is a much more limited, less creative programming experience.

> just give me a really powerful, flexible parallel processor

Isn't that exactly what a GPU is? They aren't terribly fantastic at traditional computation, but then again we're talking parallelism here.

Yes, GPUs are becoming increasingly flexible, but they are still limited by the CPU-GPU bottleneck, GPU memory (which for the most part has been limited to 2 GB per card), really poor branching performance and limited ability to modify data in any other way than 1:1 correspondence. They are first and foremost designed to pump out polygons at high fill rates, which is fine unless you want to do some untraditional rendering or computation. :) Again, CUDA and OpenCL are paving the way for a new path, but I still think we should be using general purpose silicon underneath it instead of hardware designed to do things like MSAA, interpolation, triangle setup, etc.

Sure, but I still believe it's far closer to the ideal parallel hardware than this n-core-shared-memory business that is really hard to program well.

Typically limitations aid creativity, not hamper it.

Limitations force you to come up with creative solutions to problems that should have been easier in the first place. :) But sometimes limitations force you to focus on the important things (if you could render absolutely anything, you could spend forever building an engine instead of a game).

I think sometimes the creative solutions end up being better than the simple "obvious" solutions that were apparent before the limitations were put in place.

As anyone who ever wrote a raytracer would know (or find out soon): Primary rays cache. Secondary rays trash! Brazil guys wrote a decent raytracer (entirely based on Glassner's book) and are really competent devs in general (both of them). Current state of the art, performance oriented, offline renderer is Arnold by SolidAngle, with more smart guys behind it. I think we will see convergence of offline and online RT soon enough. There are already steps being made in that direction, in offline world, towards RT previews.

This reminds me of http://en.wikipedia.org/wiki/Euclideon ... Not trying to draw a comparison, I just find these "future of gaming" technologies interesting, vapor or not.

Within his comment is a great example / reminder to consider constant factors when doing your 'big-oh' analysis of different approaches to a problem.

""" Because ray tracing involves a log2 scale of the number of primitives, while rasterization is linear, it appears that highly complex scenes will render faster with ray tracing, but it turns out that the constant factors are so different that no dataset that fits in memory actually crosses the time order threshold. """

I wish I could sit down and talk with him for an hour or so.

I'm a Unix programmer but I suspect I could learn as much from him about software design as from weeks of talking with just about anyone else.

Check out his QuakeCon keynote. It's basically him talking for an hour about technical things that interest him.
