
The Nvidia Turing GPU Architecture Deep Dive: Prelude to GeForce RTX
https://www.anandtech.com/comments/13282/nvidia-turing-architecture-deep-dive
======
jaytaylor
So the new top-end Nvidia cards will have dedicated ray tracing cores.
However, real-time ray tracing is still so computationally expensive that
games can only implement a hybrid form of it whereby ray tracing is applied
for a certain effect or single object, and plain old rasterization is used for
everything else.

I applaud NV for stepping up and delivering something in a new direction. Just
think: how long has it been since something truly new was introduced in the
gfx world?

Reading through the full article, it was no small feat to dream up and build
these cards. A very complex project from both product and engineering
perspectives. Hardware-wise, the 2080 specs are quite insane, and these babies
are thirrrsty, drawing ~250 watts.

That said, the move definitely seems risky, as it increases complexity
significantly for game devs to code for this hybrid approach. What if the
market doesn't warm up to this, or ATI or someone else comes up with something
more novel?

I also wonder if it's just a few years premature. Not feeling compelled to
give up the good old GTX 970 yet. Wake me up when full ray tracing is ready :)

---

P.S. I couldn't help but snicker when looking at the table near the end
showing which games will support what new imaging modes.

Of course, PUBG doesn't support ray tracing. Hardly a surprise, considering
they can't patch without hours of downtime, and also frequently deliver
patches containing "fixes" which break more than they fix.

(FWIW I've stopped PUBG and moved on to Elite:Dangerous, aka space truckers,
thanks to it being recommended in an HN thread. Fun game, if you enjoy the
solitude and loneliness of endless space! ;)

~~~
jarfil
10 GigaRays/sec ≈ 80 rays per pixel at 1080p/60fps.

That should be enough to do full scene real-time raytracing with rays per
pixel to spare (usually 10 rays per path are more than enough).
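
A quick back-of-the-envelope check of that figure (a minimal sketch; it
assumes a full 1920x1080 frame and takes NVIDIA's headline 10 GigaRays/sec
number at face value, though real throughput varies by scene):

    #include <cstdio>

    int main() {
        const double gigarays_per_sec = 10e9;               // marketing figure
        const double pixels_per_frame = 1920.0 * 1080.0;    // 2,073,600
        const double frames_per_sec   = 60.0;
        const double rays_per_pixel =
            gigarays_per_sec / (pixels_per_frame * frames_per_sec);
        std::printf("rays per pixel: %.1f\n", rays_per_pixel);  // ~80.4
    }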

~~~
MBCook
I was under the impression from the initial coverage that they were actually
getting something like 1 to 2 rays per pixel per frame. Perhaps even less.

From there they were using some sort of smoothing and/or temporal anti-
aliasing to gather the data from multiple frames to get decent quality out of
it.

Or are you proposing how many rays they would NEED to be able to do real-time
full raytracing? By the time we have that ability I imagine everyone will just
want the 4K version anyway and we will be behind again.

Either way what nVidia has shown looks great. It’s too bad I’ll have to wait
years to be able to use it as a console gamer.

~~~
Covzire
Why does ray tracing require more than 1 ray per pixel anyway?

~~~
magicalhippo
Due to noise and aliasing. If multiple features (edges, materials etc) cover a
single pixel, you usually get aliasing. If you do random sampling (of
reflections, lights, whatever) you get noise.

For anti-aliasing you'll usually want at least on the order of 10 rays per
pixel for a nice result. If you do random sampling, you quickly need 100 to
1000 rays per pixel to get an acceptable noise level.
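
A toy illustration of that scaling (not a renderer; it just shows Monte Carlo
noise falling off as 1/sqrt(N), which is why going from "noisy" to "clean"
can mean 100x more rays per pixel):

    #include <cmath>
    #include <cstdio>
    #include <random>

    int main() {
        std::mt19937 rng(42);
        std::uniform_real_distribution<double> u(0.0, 1.0);
        // Estimate E[sqrt(U)] = 2/3 from N random samples, as a
        // stand-in for averaging N random rays through one pixel.
        for (int n = 10; n <= 1000; n *= 10) {
            double sum = 0.0, sum_sq = 0.0;
            for (int i = 0; i < n; ++i) {
                double s = std::sqrt(u(rng));
                sum += s;
                sum_sq += s * s;
            }
            double mean  = sum / n;
            double noise = std::sqrt((sum_sq / n - mean * mean) / n);
            std::printf("N=%4d  estimate=%.4f  noise=%.4f\n", n, mean, noise);
        }
    }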

------
CoolGuySteve
Why are RT cores so different from normal shader cores? What
instructions/memory fetch does a ray trace operation do that couldn't be
implemented as an added instruction set on the shader cores to navigate the
volume tree?

From the article, the best I can see is the following, but can't that be
solved with microcode or as an extra rendering pipeline stage?

> In comparison, traversing the BVH in shaders would require thousands of
> instruction slots per ray cast, all for testing against bounding box
> intersections in the BVH

I ask because having more (slightly larger) general-purpose cores seems better
for traditional rendering and raytracing than dedicating all that die space to
pure single-purpose RT cores.

~~~
exDM69
It is more like a texture unit than a shader core. Tree traversal is a pointer
chasing problem, where the CPU/shader core executes a few instructions, then
starts a memory load and then sits idle for tens or hundreds of clock cycles
waiting for memory. Cache prefetching can help but is usually not a good fit
for tree traversal where there is very little computation per node.

It is all about memory latency hiding and not really about computation.
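
A rough sketch of that access pattern (hypothetical node layout, nothing like
NVIDIA's actual hardware format): each loop iteration issues one dependent
load, then does only a few arithmetic ops before it needs the next load, so a
conventional core spends most of its time stalled on memory.

    #include <cstdio>
    #include <vector>

    struct Aabb { float lo[3], hi[3]; };

    struct BvhNode {
        Aabb box[2];    // bounds of the two children
        int  child[2];  // child node index, or -1 for none
    };

    // Standard slab test: a handful of subtracts, multiplies and
    // min/max operations per box. Very little work per node.
    static bool rayHitsAabb(const float org[3], const float inv_dir[3],
                            const Aabb& b) {
        float tmin = 0.0f, tmax = 1e30f;
        for (int i = 0; i < 3; ++i) {
            float t0 = (b.lo[i] - org[i]) * inv_dir[i];
            float t1 = (b.hi[i] - org[i]) * inv_dir[i];
            if (t0 > t1) { float t = t0; t0 = t1; t1 = t; }
            if (t0 > tmin) tmin = t0;
            if (t1 < tmax) tmax = t1;
        }
        return tmin <= tmax;
    }

    static int countVisitedNodes(const std::vector<BvhNode>& nodes,
                                 const float org[3], const float inv_dir[3]) {
        int stack[64];
        int sp = 0, visited = 0;
        stack[sp++] = 0;  // start at the root
        while (sp > 0) {
            // The dependent load: which node to fetch next isn't known
            // until this read completes, so the core waits here.
            const BvhNode& n = nodes[stack[--sp]];
            ++visited;
            for (int c = 0; c < 2; ++c)
                if (n.child[c] >= 0 && rayHitsAabb(org, inv_dir, n.box[c]))
                    stack[sp++] = n.child[c];  // chase the next pointer
        }
        return visited;
    }

    int main() {
        // Tiny three-node tree: a root with two leaf children.
        std::vector<BvhNode> nodes(3);  // zero-initialized
        nodes[0].box[0] = {{0, 0, 0}, {1, 1, 1}};
        nodes[0].box[1] = {{2, 0, 0}, {3, 1, 1}};
        nodes[0].child[0] = 1;
        nodes[0].child[1] = 2;
        nodes[1].child[0] = nodes[1].child[1] = -1;
        nodes[2].child[0] = nodes[2].child[1] = -1;

        // A ray along +z through the first child's box.
        const float org[3]     = {0.5f, 0.5f, -1.0f};
        const float inv_dir[3] = {1e30f, 1e30f, 1.0f};
        std::printf("nodes visited: %d\n",
                    countVisitedNodes(nodes, org, inv_dir));
    }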

~~~
kbwt
But GPU cores are already king at latency hiding. They can run hundreds of
threads doing pointer chasing, switching between them round-robin as the
memory reads complete.

~~~
exDM69
The switching isn't free. Waking up a thread to do just a few computation
cycles (a few ray-AABB intersections) and then going back to sleep while
waiting for the next node to be fetched from the memory is super inefficient.

If there was significant computation needed per node, this wouldn't be an
issue.

~~~
kbwt
> The switching isn't free.

It absolutely is, on current GPUs. Think of it like a larger-scale version of
SMT (Intel's hyperthreading). GPUs are able to do this because they execute
instructions in-order and do not need to track thousands of instructions per
thread.

~~~
david-gpu
It's more complex than that. Switching warps thrashes your caches. There is
definitely a cost associated with it.

~~~
kbwt
Well, yeah. If you are memory bandwidth-constrained it's a bad idea to go off-
chip.

But for ray-tracing, what does it really matter? We are already assuming that
you will wait a full memory fetch cycle to get the next node's child AABBs and
child indices. The warps will do their intersection test on the data they just
read and fire off the next read. Each thread's hot context should fit in under
a cache line, since it's basically just a single ray to keep track of.
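
For scale, a minimal per-ray record along the lines kbwt describes
(an illustrative layout, not any vendor's actual format) comes to half a
64-byte cache line; the traversal stack is the part that grows beyond it:

    #include <cstdint>

    struct RayState {
        float   origin[3];
        float   inv_dir[3];  // precomputed 1/direction for slab tests
        float   t_max;       // nearest hit found so far
        int32_t node;        // next BVH node to visit
    };

    // 32 bytes: half of a typical 64-byte cache line.
    static_assert(sizeof(RayState) == 32, "expected a 32-byte ray record");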

------
jaytaylor
On mobile all I see is the comments section.

Friendlier link:

https://www.anandtech.com/print/13282/nvidia-turing-architecture-deep-dive

~~~
mrep
Happened for me on desktop too, probably because the link is a comments link.
Mods, can you change the link?

------
marvin
Seems to me like ray-traced rendering provides a feasible path to foveated
rendering for VR, meaning much better performance for VR scenes at high
resolutions. This would be a big deal for VR developers, since they don't have
to do unlikely amounts of magic to implement this. If NVIDIA is able to drag
everyone along, they will get the hardware for this without making any huge
strategic moves on their own part.

------
cjhanks
Don't forget that NVIDIA is not only a gaming company. They are involved in a
lot of computational geometry fields: localization, reconstruction, machine
learning, etc.

There are a lot of use cases for ray tracing that are not games. So far NVIDIA
has done a great job of evolving their GPU architecture in a way that benefits
their entire diverse customer base.

~~~
dan678
The use case that seems obvious to me is computationally efficient ray-tracing
in robotics/autonomy simulation. I wonder if ISAAC sim will take advantage of
RTX. What do you think?

~~~
cjhanks
I am not familiar with ISAAC, but it appears that ray tracing would be
relevant in this case.

------
npunt
The thing that strikes me with the RTX announcement is a general point about
_how important identifying useful intermediate steps is to bringing about new
paradigms_.

Unless a technological breakthrough is just around the corner, or you have the
resources to push it forward (Space Race / Manhattan Project), it’s better to
spend your energy identifying useful intermediate steps that you can offer to
the market to fund & bridge yourself to the new paradigm. By having funding
all along the way, you can gain a significant advantage over those pursuing the
new elegance directly. [1]

A few examples:

STREAMING. People used to go to video stores to rent movies. As the internet
emerged we dreamed of a new, more elegant paradigm: streaming. No more driving
to a store, no more physical copy or late fees or damages, etc. But it was the
clever discovery of an intermediate step - to use the internet to rent DVDs
via mail - that created the brand and customer base that established the
market leader (Netflix). Once internet infrastructure caught up, the switch
was seamless. Meanwhile, there were many people who pursued streaming
directly, but failed because they didn’t take the intermediate step
(Broadcast.com).

ELECTRIC CARS. Traditional cars have _super_ complex drivetrains. As battery
tech improved, we dreamed of a new, more elegant paradigm of electric vehicles
that improved efficiency and eschewed most moving parts, transmissions,
exhaust systems, etc. But there existed a valuable, infrastructure-free
intermediate step to get there: hybrids. Ironically they were even more
complex, but they employed many new techs that helped move electric cars
forward. Toyota has hugely benefitted from being the discoverer of this
intermediate step. Obviously we now have Tesla leading the vanguard, but in
the context of global development, nobody can predict if an Elon will show up
in your generation.

AUGMENTED REALITY. Our current physical reality is awash in information -
street signs, road paint, branding, menus, maps, clocks, games, warnings,
nutrition labels, interfaces, etc. These are often completely irrelevant to us
at a given time, and certainly not personalized to our needs. We dream of the
day we can render overlays on our eyes to deliver the personalized versions of
these (as well as entirely new things), which would over time mean our
physical reality would get simpler, cleaner, and less wasteful. To deliver
this elegant solution requires a lot of breakthroughs in display technology
that are years if not decades away. Bundling SLAM tech into smartphones
(looking at you Apple) and pursuing incremental use cases is an intermediate
step that can grow the market until the point where the displays are ready, at
which point those who best pursue this are likely to be the market leader.

Ray tracing is now on the same course. It's been known for decades to be a far
more elegant paradigm for reasoning about and generating images (vis-a-vis
rasterization), but its compute requirements are so high that there's been a
chasm people haven't been able to cross to get to ray tracing. Nvidia has
now provided a bridge between these two worlds, by allowing raytracing of
parts of the rendering pipeline alongside rasterization. Subsequent
generations will slowly swallow the remaining parts that rasterization
performs today. Basically the RTX is the graphics card equivalent of a Prius,
growing into a full electric.

The addition of ray-tracing cores in the RTX line was a pleasant surprise to
me, not only because it speeds the development of ray-tracing hardware, but
because it showed intermediate steps existed that I didn’t know about before.
It showed me we weren’t stuck waiting indefinitely for a promise of an elegant
future that always seems a decade away. Pretty exciting.

[1] What I mean by paradigm is not just incrementalism or an evolution of one
product into another (like iPod -> iPhone), but of a wholly different way to
solve a problem that is more elegant / higher abstraction than previous ways,
but that requires breakthroughs in enabling technologies to get there. Rockets
-> Space Elevators (material science; elegance is in ease of transport),
Retail -> Online Shopping (internet; elegance is in personalization + stay-at-
home), Coal -> Solar (energy storage; elegance is in eco footprint, low entry
point & simpler tech), Driving -> Autonomous Driving (ML/sensors; elegance is
in time savings / one less thing to learn & simplification+density of road
infra). This is admittedly a fuzzy definition, and perhaps these examples are
not perfect.

~~~
TheForumTroll
Toyota is just as far ahead as Tesla. Electric cars aren't better than
hydrogen, just different. And Toyota can sell 1000 cars and have fewer
problems than one Tesla.

~~~
sieabahlpark
Not to mention the eco impact of building a Tesla, but that's off topic for
this.

------
rl3
The Turing architecture is also used in Quadro RTX cards, and those have a
ridiculous amount of VRAM.

Is there any professional/computational use for these RT cores beyond
raytracing?

One case that comes to mind is perhaps raytracing acoustics, and although
interesting, it's technically still raytracing.

As far as gaming is concerned, personally I'd love if the RT cores could
contribute—however inefficiently—to rendering workload in non-RTX games. It's
annoying that 50% of the die is allocated to hardware that requires feature-
specific implementations.

~~~
kbwt
> It's annoying that 50% of the die is allocated to hardware that requires
> feature-specific implementations.

That's the future. While we may be able to cram more transistors onto "7nm"
chips, only a tiny fraction of the chip area can be powered on because leakage
currents are no longer decreasing with transistor size [1]. Hence Apple's
Neural Engine and Nvidia's RTX. You have to waste the extra transistor count
on something specialized.

[1] https://semiengineering.com/is-dark-silicon-wasted-silicon/

~~~
rl3
The RT and Tensor cores are primarily intended to power raytracing and DLSS,
respectively. Both of those features will be used in conjunction with
traditional shading/compute units, so the entire die is utilized at once, at
least at a high level.

It would be really interesting to know the power consumption of a Turing card
maxing out just its shading units, versus full utilization with RTX/DLSS.

------
beerlord
We are at the end of the console cycle, which means games' hardware
requirements will plateau until the next gen is out. So from now until then, a
1070 is enough to handle _everything_ at 1080p/60fps. That probably translates
to a 2060 this generation, which is the card I'm really interested in.

Other than that, Nvidia cards are severely restricted due to their lack of
support for Adaptive Sync or HDMI 2.1 VRR.

~~~
zokier
Considering that the 1070 is already struggling with some 2016 games [1], I
find the claim that it would handle everything until the next console gen
comes out (2020 at the earliest?) very dubious.

[1] https://www.anandtech.com/bench/product/1941

~~~
beerlord
There is only one game on that list - GTAV - that performs below 60fps. And
for that game, it's trivial to lower a few settings and hit 60.

------
TwoQ
Will be interesting to see if this goes the way of PhysX or not.

