
Nvidia Unveils Quadro RTX, a Raytracing GPU - tntn
https://www.nasdaq.com/press-release/nvidia-unveils-quadro-rtx-worlds-first-raytracing-gpu-20180813-00977
======
swerner
Since there's some confusion in a few threads, here a little bit of
explanation. I spent the last ~18 months speeding up a production renderer for
an upcoming animated feature.

29 hours for a frame is high but not unusual. The tendency, referred to as
"Blinn's Law", is that scene complexity grows with computing power, eating up
any performance gains from newer hardware. Render time for an animated movie
today is still in the same order of magnitude as it was in 1995, even on
state-of-the-art hardware.

Now, how can they do that at 25fps all of a sudden? They're not. They're using
the same method, but not even remotely the same scene complexity or image
fidelity. Ray tracing, and its extension path tracing, scales very well. The
same algorithm that works for 500 polygons works for 500 million polygons, as
long as you can fit them in memory - it's just going to be slower (building
quality acceleration structures is often O(n log n)). If 10 rays per pixel
don't give you the quality you want, you can go up to 100 or 1000 rays per
pixel, with render time going up proportionally.

For film, we're rendering hundreds of primary rays per pixel (tens of
secondary rays for each primary ray), with hundreds of millions of polygons
per scene. I have no insight into the scene in Nvidia's demo, but it is
claimed to run at 5 rays per pixel. So even if the scenes were identical, it's
already 20-500 times faster just by doing 20-500 times less work.
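
As a back-of-envelope sketch (every number below is invented, and the log term
is only a crude stand-in for per-ray BVH traversal cost), the arithmetic looks
like this:

    import math

    # Toy cost model: ray count scales linearly with samples per pixel,
    # per-ray BVH traversal roughly logarithmically with triangle count.
    # All numbers below are invented for illustration, not measured.
    def relative_cost(pixels, primary_spp, secondary_per_primary, triangles):
        rays = pixels * primary_spp * (1 + secondary_per_primary)
        return rays * math.log2(triangles)

    film = relative_cost(pixels=2048 * 858, primary_spp=256,
                         secondary_per_primary=32, triangles=500_000_000)
    demo = relative_cost(pixels=1920 * 1080, primary_spp=5,
                         secondary_per_primary=2, triangles=5_000_000)
    print(f"film / demo cost ratio: {film / demo:.0f}x")  # ~620x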

Shading complexity is also a huge difference. Production rendering uses tens
of GBs up to the TB range just for textures, on top of procedural shading. Not
only does the computing part of that often dominate render times (instead of
the ray tracing part), but just paging texture data in and out of memory on
demand can become a bottleneck by itself.
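
The paging itself is conceptually simple. A minimal sketch of an LRU tile
cache (a made-up API, not any actual renderer's) looks like this:

    from collections import OrderedDict

    class TileCache:
        """Minimal LRU cache for texture tiles paged in on demand."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.tiles = OrderedDict()

        def get(self, key, load_tile):
            if key in self.tiles:
                self.tiles.move_to_end(key)       # mark as recently used
                return self.tiles[key]
            if len(self.tiles) >= self.capacity:
                self.tiles.popitem(last=False)    # evict least recently used
            self.tiles[key] = load_tile(key)      # page the tile in from disk
            return self.tiles[key]

    cache = TileCache(capacity=2)
    tile = cache.get(("color_map", 3, 7), lambda k: f"pixels of {k}")

Once the texture working set exceeds the cache capacity, almost every lookup
turns into a disk read, which is exactly when paging starts to dominate.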

That Nvidia demo is a simple scene with no I/O overhead, rendered using a
fraction of the rays used for film production, on 18 billion transistors of
specialised hardware. Of course it's several orders of magnitude faster, how
could it not be?

~~~
master-litty
How did you get into this industry? I'm fascinated by the nonchalant
discussion that goes into developing animated features; it's been a dream of
mine to break into the space.

I'm a senior engineer, made my own renderer and software "suite" to go with it
as a hobby project for the resume, but I'm not sure what else I can make to
get my foot in the door. Do you have any general advice?

~~~
swerner
15 years of experience in 3D application development outside of the film
industry, two years of contributions to Blender, and a lot of luck being in
the right place at the right time.

In my opinion, don't write your own renderer unless you can really invest a
lot of time into it. There are countless "spheres in a Cornell box" path
tracers out there; you won't excite anyone with the basics. Contribute to
existing software like Blender or Appleseed instead; that will give you a lot
more visibility and demonstrate that you can write robust code.

------
jdietrich
Anandtech have coverage of the SIGGRAPH keynote at the link below.

Headline specs for Turing:

* 754 mm^2 die area

* 18.6 billion transistors

* up to 16 TFLOPS

* up to 500 trillion tensor ops per second

It looks like an absolute beast, but don't expect to see anything remotely
like it at a consumer price point.

[https://www.anandtech.com/show/13215/nvidia-siggraph-2018-keynote-live-blog-4pm-pacific](https://www.anandtech.com/show/13215/nvidia-siggraph-2018-keynote-live-blog-4pm-pacific)

~~~
abkumar
Pricing [1] since the anandtech live blog doesn't list it.

* Quadro RTX 8000 with 48GB memory: $10,000 estimated street price

* Quadro RTX 6000 with 24GB memory: $6,300 ESP

* Quadro RTX 5000 with 16GB memory: $2,300 ESP

[1] [https://nvidianews.nvidia.com/news/nvidia-unveils-quadro-rtx-worlds-first-ray-tracing-gpu](https://nvidianews.nvidia.com/news/nvidia-unveils-quadro-rtx-worlds-first-ray-tracing-gpu)

~~~
samstave
> Quadro RTX 5000 with 16GB memory: $2,300 ESP

For reference, I purchased my first 3D card, with 32MB of RAM, from Evans &
Sutherland to run Softimage in ~1996 for $1,800.

~~~
bri3d
I found an old Computer Shopper from the late 1990s and had forgotten how
ridiculously expensive computer equipment was - the real sub-$1000 market
wasn't even a thing for PCs until 1997, and the range between a sub-$1000
computer and an expensive one was astounding even for day-to-day tasks.

~~~
stevesimmons
Prices back then were really high...

I won an AT&T Safari NSX/20 laptop [1] in 1992 in the ACM Programming
Competition. RRP was $5749 then, for a 386SX processor running at 20MHz, 4MB
RAM and a monochrome screen. $10,200 in today's money. It was actually a
beautifully made machine.

A year later, I switched to a Dell with 386DX and a 387 math coprocessor
because my PhD needed the number crunching. That cost twice as much (i.e.
around $20k in today's money), paid by the military lab sponsoring my
research.

In our current times of cheap compute, it is easy to forget how much top-end
computers cost 25-30 years ago.

[1]
[https://books.google.co.uk/books?id=AoKUhNoOys4C&lpg=PP142&o...](https://books.google.co.uk/books?id=AoKUhNoOys4C&lpg=PP142&ots=YOf0COMTSD&dq=at%26T%20safari%20386sx%20laptop%20%20price%201992&pg=PP148#v=onepage&q=at&T%20safari%20386sx%20laptop%20%20price%201992&f=true)

------
kayamon
It's not the world's first ray-tracing GPU. Imagination Technologies had one
ages ago.

[https://www.imgtec.com/legacy-gpu-cores/ray-tracing/](https://www.imgtec.com/legacy-gpu-cores/ray-tracing/)

~~~
ShroudedNight
I also remember ray-tracing being a primary use-case for the Radeon 4870 _ten
years ago_: [https://www.techpowerup.com/64104/radeon-hd4800-series-supports-a-100-ray-traced-pipeline](https://www.techpowerup.com/64104/radeon-hd4800-series-supports-a-100-ray-traced-pipeline)

Besides the increase in raw compute, what is materially different this time
around?

~~~
mattnewport
Hardware acceleration of BVH traversal and tight integration with the shader
cores to allow efficient scheduling of shader execution and new ray generation
when ray hits are detected.
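
For anyone wondering what "BVH traversal" involves, here's a toy CPU sketch of
the operation (the node layout and helpers are invented for illustration; real
GPU BVHs use wide, compressed nodes, and Turing also accelerates the triangle
test itself):

    from dataclasses import dataclass, field

    def sub(a, b):   return (a[0]-b[0], a[1]-b[1], a[2]-b[2])
    def dot(a, b):   return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
    def cross(a, b): return (a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2],
                             a[0]*b[1]-a[1]*b[0])

    @dataclass
    class Node:
        lo: tuple                                       # bounding box min
        hi: tuple                                       # bounding box max
        children: list = field(default_factory=list)    # inner nodes
        tris: list = field(default_factory=list)        # leaves: [(v0, v1, v2)]

    def hit_aabb(orig, inv_dir, lo, hi):
        """Slab test: the ray overlaps the box if it enters before it exits."""
        tmin, tmax = 0.0, float("inf")
        for k in range(3):
            t1 = (lo[k] - orig[k]) * inv_dir[k]
            t2 = (hi[k] - orig[k]) * inv_dir[k]
            tmin, tmax = max(tmin, min(t1, t2)), min(tmax, max(t1, t2))
        return tmin <= tmax

    def hit_triangle(orig, dir, tri, eps=1e-9):
        """Moller-Trumbore ray/triangle test; returns hit distance or None."""
        v0, v1, v2 = tri
        e1, e2 = sub(v1, v0), sub(v2, v0)
        p = cross(dir, e2)
        det = dot(e1, p)
        if abs(det) < eps:
            return None                         # ray parallel to triangle
        inv, tv = 1.0 / det, sub(orig, v0)
        u = dot(tv, p) * inv
        if not 0.0 <= u <= 1.0:
            return None
        q = cross(tv, e1)
        v = dot(dir, q) * inv
        if v < 0.0 or u + v > 1.0:
            return None
        t = dot(e2, q) * inv
        return t if t > eps else None

    def closest_hit(orig, dir, root):
        inv_dir = tuple(1.0 / d if d else 1e30 for d in dir)  # avoid div by 0
        best, stack = None, [root]
        while stack:
            node = stack.pop()
            if not hit_aabb(orig, inv_dir, node.lo, node.hi):
                continue                        # prune the whole subtree
            for tri in node.tris:
                t = hit_triangle(orig, dir, tri)
                if t is not None and (best is None or t < best):
                    best = t
            stack.extend(node.children)
        return best

    leaf = Node(lo=(-1, -1, 4.9), hi=(1, 1, 5.1),
                tris=[((-1, -1, 5), (1, -1, 5), (0, 1, 5))])
    root = Node(lo=(-1, -1, 4.9), hi=(1, 1, 5.1), children=[leaf])
    print(closest_hit((0, 0, 0), (0, 0, 1), root))   # -> 5.0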

~~~
ChickeNES
The former is discussed here: [https://devblogs.nvidia.com/thinking-parallel-part-ii-tree-traversal-gpu/](https://devblogs.nvidia.com/thinking-parallel-part-ii-tree-traversal-gpu/)

~~~
izym
That's a software implementation though. These new cards have actual hardware
specifically for that.

------
JohnBooty
It always irks me that Nvidia claims they invented the GPU. Graphics hardware,
including hardware T&L, existed long before Nvidia was around. From the press
release:

> NVIDIA's (NASDAQ:NVDA) invention of the GPU in 1999 sparked the growth of
> the PC gaming market, redefined modern computer graphics and revolutionized
> parallel computing.

Errrrr, okay.

From Wikipedia:
[https://en.wikipedia.org/wiki/GeForce_256](https://en.wikipedia.org/wiki/GeForce_256)

> GeForce 256 was marketed as "the world's first 'GPU', or Graphics Processing
> Unit", a term Nvidia defined at the time as "a single-chip processor with
> integrated transform, lighting, triangle setup/clipping, and rendering
> engines that is capable of processing a minimum of 10 million polygons per
> second."[2]

That's a bit like Honda claiming they invented the automobile because they
came up with their own definition of "automobile" that just happens to fit one
of their autos.

Certainly, the GeForce series did some things first. It was the first to cram
certain things onto a single chip and target the consumer market. But their
claim to have "invented the GPU" is just silly.

------
flipgimble
It's important to remember that you can always make it look like you are ahead
of the Moore's Law curve if you are willing to spend more on silicon and the
energy to power it.

This appears to be a GPU design for those applications where extreme (by
mobile and desktop standards) power consumption and cost can be justified.
Correspondingly, NVIDIA has been trying to move into the data center for deep
learning and rendering. They pushed out a lot of good research on how to do
the computationally expensive parts of rendering on the server, and leave the
client to do the last interactive part locally.

The market allows for this silicon giant because of the current Deep Learning
and VR hype. If there were another AI and VR winter, I'd expect NVIDIA's
marketing to refocus on power efficiency and architecture tweaks once again.

~~~
ethbro
_> The market allows for this silicon giant because of the current Deep
Learning and VR hype._

One of three things is true:

1) Nvidia has misstepped and no one will buy large, expensive chips

2) Companies are buying large, expensive chips because they're purchasing
based on hype rather than use

3) Companies are buying large, expensive chips because even at their prices,
their performance delivers greater business value

I'd argue that improvements in ML techniques have tipped us into (3). Intel
and Microsoft's fortunes weren't built on being faster: they were built on
allowing businesses to do _new_ categories of work on computers.

Amdahl is the new Moore.

~~~
jchw
You gotta trace the money, though. I don't think ML is all hype, but if you
only look one level deep you can reach some silly conclusions. For example,
just because Nvidia can sell these things does not mean the purchasers will
ultimately recoup their value, and even if they do, the clients of those
businesses might not recoup their costs. At the end of the day, a lot of
people are investing huge amounts of money into ML, and it is entirely
possible that this is partly driven by expectations that won't be met in the
end.

~~~
mattnewport
What they were emphasising today was replacing a $2 million CPU-based render
farm used for offline rendering with their new hardware at 25% of the cost and
10% of the power usage, for the same or better results. To the extent those
numbers are realistic (and it seems like they are), it's going to be a pretty
straightforward value proposition for companies that have render farms. This
was SIGGRAPH, so ML wasn't the main emphasis.

~~~
jchw
Yep, fair points. I was mostly replying regarding ML hype. Clearly, the value
for render farms is proven at this point.

------
nielsbot
So they're not ray tracing _everything_, right? If I understand correctly,
they're "filling in some of the blanks" using an ML algorithm? Does that
really work for general-purpose rendering?

~~~
swerner
It is never mentioned explicitly, but my hunch would be that primary
visibility is still done using rasterisation and that ray tracing is only used
for reflections, shadows and ambient occlusion.

This is how most film rendering was done ten years ago too.

~~~
bufferoverflow
They are properly raytracing soft shadows, no need for AO.

~~~
swerner
AO and soft shadows are orthogonal and are often used simultaneously: a few
direct light sources (point lights, spots, infinite) with soft shadows, plus
an environment light (dome or half dome) with AO.
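
A toy Monte Carlo sketch of the two terms (scene and sampling invented;
uniform rather than cosine-weighted hemisphere sampling, to keep it short):

    import math, random

    # One occluding sphere floating above the shading point. Everything
    # here is invented purely to illustrate the two visibility terms.
    SPHERES = [((0.0, 1.0, 0.0), 0.5)]   # (center, radius)

    def hits_sphere(orig, dir, center, radius):
        oc = [orig[i] - center[i] for i in range(3)]
        b = sum(oc[i] * dir[i] for i in range(3))
        c = sum(x * x for x in oc) - radius * radius
        disc = b * b - c
        return disc >= 0 and -b - math.sqrt(disc) > 1e-4

    def occluded(orig, dir):
        return any(hits_sphere(orig, dir, c, r) for c, r in SPHERES)

    def hemisphere_dir():
        """Uniform direction on the upper hemisphere (normal = +y)."""
        while True:
            v = [random.uniform(-1, 1) for _ in range(3)]
            n = math.sqrt(sum(x * x for x in v))
            if 0 < n <= 1 and v[1] > 0:
                return [x / n for x in v]

    def light_dir(p):
        """Direction from p to a random point on a disk light at (2, 2, 0)."""
        ang = random.uniform(0, 2 * math.pi)
        rad = 0.3 * math.sqrt(random.random())
        tgt = (2.0 + rad * math.cos(ang), 2.0, rad * math.sin(ang))
        d = [tgt[i] - p[i] for i in range(3)]
        n = math.sqrt(sum(x * x for x in d))
        return [x / n for x in d]

    p, samples = (0.0, 0.0, 0.0), 10_000
    ao     = sum(not occluded(p, hemisphere_dir()) for _ in range(samples)) / samples
    direct = sum(not occluded(p, light_dir(p))     for _ in range(samples)) / samples
    print(f"dome visibility (AO): {ao:.2f}, light visibility: {direct:.2f}")

The occluder overhead dims the dome term while the light off to the side stays
fully visible, which is exactly why the two are independent terms.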

------
asparagui
96GB of RAM with an NVLink connection! It will be interesting to see what
happens on the RTX 2080 front next week; I'm hoping we finally get more RAM!

~~~
abledon
May I ask... what on earth does one do with 96GB of GPU RAM? I mean, what type
of model or problem uses this? Pix2pix at 4K resolution?

~~~
TomVDB
I think it’s used to hold all the geometry of a really complex scene. (Think
Pixar movie.)

With ray tracing, if you can’t load the whole scene into memory, it’s simply
not going to work.

Disney recently released a scene of Moana for public use. It has 15 billion
primitives!

[http://www.cgchannel.com/2018/07/download-disneys-data-set-for-motunui-island-from-moana/](http://www.cgchannel.com/2018/07/download-disneys-data-set-for-motunui-island-from-moana/)

~~~
dahart
> Disney recently released a scene of Moana for public use. It has 15 billion
> primitives!

FWIW, that’s after instancing, which doesn’t take up much memory. The Moana
scene’s data files are mostly ptx textures; the geometry is only a fraction of
it. The subd geometry might expand to be larger in memory than the textures,
though, depending on whether it’s subdivided and how much.
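
Back-of-envelope, with invented numbers, for why instanced primitives are so
cheap:

    # Each instance stores a transform plus a reference to shared
    # geometry, not a copy. All numbers below are invented.
    TRI_BYTES = 3 * 3 * 4              # 3 vertices x 3 floats x 4 bytes each
    XFORM_BYTES = 16 * 4               # one 4x4 float transform per instance

    unique_tris = 100_000_000          # triangles stored once, then shared
    instances = 150_000                # references to the shared geometry
    tris_per_instance = 100_000        # -> 15 billion primitives when rendered

    stored_gb = (unique_tris * TRI_BYTES + instances * XFORM_BYTES) / 1e9
    rendered = instances * tris_per_instance
    print(f"{stored_gb:.1f} GB of geometry yields {rendered / 1e9:.0f}B primitives")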

~~~
TomVDB
Do you by any chance know how much RAM the CPUs in a render farm need?

That would help to put the 48GB in context...

Answering my own question: [http://pharr.org/matt/blog/2018/07/08/moana-island-pbrt-1.html](http://pharr.org/matt/blog/2018/07/08/moana-island-pbrt-1.html)

“...with memory use upward of 70 GB during rendering.”

The author is later able to reduce the amount of memory significantly by
rewriting small parts of his renderer.

~~~
dahart
Depends heavily on the production & team tools/workflow, and it’s been a few
years since I was in production, but I personally believe hundreds of GBs is
reasonably common.

The Moana scene data in PBRT form doesn’t quite contain all the data they had
in production, and, though I don’t know for sure, PBRT might be better about
texture caching than some production renderers too.

------
acd
How can you trademark "Turing", the surname of Alan Turing, the inventor of
the Turing machine?

Does this trademark violate the Lanham Act, 15 U.S. Code § 1052?

Lanham Act: "Consists of or comprises immoral, deceptive, or scandalous matter;
or matter which may disparage or falsely suggest a connection with persons,
living or dead, institutions, beliefs, or national symbols, or bring them into
contempt, or disrepute; or a geographical indication which, when used on or in
connection with wines or spirits, identifies a place other than the origin of
the goods and is first used on or in connection with wines or spirits by the
applicant on or after one year after the date on which the WTO Agreement (as
defined in section 3501(9) of title 19) enters into force with respect to the
United States."

"A mark that is primarily a surname does not qualify for placement on the
Principal Register under the Lanham Act unless the name has become well known
as a mark through advertising or long use—that is, until it acquires a
secondary meaning. Until then, surname marks can only be listed on the
Supplemental Register."

[https://www.law.cornell.edu/uscode/text/15/1052](https://www.law.cornell.edu/uscode/text/15/1052)

~~~
tntn
How is this any different than Tesla?

------
John_KZ
The fact that we'll have reasonably priced, real-time unbiased global scene
rendering within 5-10 years really blows my mind. I knew this was coming for a
long time, but seeing it actually happen is so odd.

After seeing the bright potential of an AI-powered future turn into a sad
totalitarian reality it's nice to see a "futuristic" technology materialize
with little potential for malevolent use.

~~~
gspetr
Little? How about ever more realistic deepfakes?

[https://en.wikipedia.org/wiki/Deepfake](https://en.wikipedia.org/wiki/Deepfake)

~~~
John_KZ
Fake video is happening regardless (and a lot better) with generative neural
network architectures. Fake video was already possible anyway; we just see its
usage ""democratized"", in a way.

Empowering everyone to make Hollywood-grade feature films with ~$10k in
hardware has a lot of negative side effects, but also a lot of positive ones.
The one real problem is that video is no longer hard evidence in court. That's
a huge loss, but it happened already.

------
agumonkey
I find this product anticlimactic. I'm gonna go back to my 8bit grayscale
console.

------
bhouston
Unreal and Unity are already used in TV production. This will continue. They
will add some ray tracing features in the future.

Real-time TV production is here and it will get more popular. The main reason
isn't RTX; it's that Unreal and Unity just look amazing using current
real-time techniques.

------
_ph_
If any product presentation should be accompanied by lots of high-quality demo
videos, it's this one.

------
blauditore
> revolutionizing the work of 50 million designers and artists by enabling
> them to render photorealistic scenes in real time

Is it just me or is this extremely flowery?

~~~
bduerst
Depends on whether or not they're rendering a view of a meadow.

------
miffe
FreeSync support yet?

~~~
lagadu
How is that relevant for professional hardware?

------
gok
500 trillion tensor ops/sec? So if that’s one chip... 11x faster than a Google
TPU (or ~3x faster than the four-chip modules).

~~~
p1esk
No, that's for INT4 ops

~~~
gok
Sure, I guess I meant: if the task can be run with 4-bit math...

~~~
p1esk
Google TPU does not have 4 bit ALUs.

~~~
gok
I don’t think that negates my point? If you have tensor ops that can get away
with 4-bit precision, this is a great chip for you.

~~~
p1esk
Yes, on INT4 tasks this chip will be faster than TPU. That's a fairly rare use
case.

~~~
TomVDB
Is that a limited use case because not many workloads map to INT4, or has this
avenue simply not been explored because there weren’t any INT4 processors?

My understanding is that during inference, precision is often not critical,
and that some workloads even work with 1 bit?

~~~
p1esk
NN quantization has been an area of active research over the last 3 years, but
it's not trivial when going to 4 bits or below. Usually, to achieve good
accuracy during inference, a model needs to be trained or finetuned at low
precision; a simple post-training conversion usually won't work (it does not
always work even at 8 bits). Models that are already efficient (e.g.
MobileNet) are harder to quantize than fat, overparameterized models such as
AlexNet or VGG. Increasing a model's size (number of neurons or filters)
helps, but it obviously offsets the gains in efficiency to some degree.
Recurrent architectures are also harder to quantize.
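
To make concrete what such a naive conversion looks like, here's a minimal
sketch of symmetric post-training quantization to INT4 (illustrative only, not
the method from [1]):

    import numpy as np

    def quantize_int4(w):
        """Map float weights onto the 16 signed INT4 levels [-8, 7]."""
        scale = np.abs(w).max() / 7.0                  # one scale per tensor
        q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)  # fake weights
    q, s = quantize_int4(w)
    err = np.abs(w - dequantize(q, s)).mean() / np.abs(w).mean()
    print(f"mean relative weight error at 4 bits: {err:.1%}")

Outliers stretch the per-tensor scale and waste quantization levels, which is
part of why per-channel scales and finetuning become necessary at 4 bits.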

See Table 6 in [1] to get an idea of the accuracy drop from quantization; it
seems like 4 bits would result in about 1% degradation, which is pretty good.
However, as you can tell from the methods they used to get there, it's not
easy.

[1] [https://arxiv.org/abs/1807.10029](https://arxiv.org/abs/1807.10029)

------
ww520
Can this be used in a VR setting? Is this the first step toward building a
viable holodeck?

~~~
shawn
_Is this the first step toward building a viable holodeck?_

No. Don't be fooled by the marketing. No one knows how to produce realistic-
looking 100% CGI. Raytracing is even worse in some respects than traditional
raster-based graphics, and better in others.

Here's a heuristic that has never failed: If someone claims to have found a
way to make photorealistic graphics, ignore them, because they're lying.

(The heuristic will hold until it doesn't, but at no point in history has it
failed yet. That doesn't stop thousands of people and companies from making
the claim.)

~~~
mattnewport
This hasn't been true for a while for certain applications under the right
circumstances. There are architectural renders, product renders and certain
movie shots that will fool 99% of people into thinking they're looking at a
photograph. Once motion is involved it gets much harder, but there are still
snippets of rendered video that could pass for real video.

It is true, however, that the problem is not fully solved in general. There
are still things that are not handled very well in still renders, and
animation (particularly human animation) breaks down in more situations.

~~~
hobofan
Watching the short "The Third & The Seventh"[0] was the last time I believed
any photo or video of architecture to be real, and that was already 8 years
ago!

[0]: [https://vimeo.com/7809605](https://vimeo.com/7809605)

------
aidenn0
I can't decide if the caption writer for the image was incompetent or is
putting massive spin on very poor frame rates:

"NVIDIA Turing architecture-based GPUs enable production-quality rendering and
_cinematic_ frame rates..."

Raytracing at 24fps?

~~~
mileycyrusXOXO
Well, it takes Pixar 29 hours to raytrace a single frame, so I would consider
this much faster than current methods.

~~~
prolikewh0a
With 29 hours per frame, it would take roughly 477 years of sequential compute
to render Toy Story 3's ~144k frames.

~~~
dagenix
I have no idea if the 29 hour figure is accurate. However, regardless of how
long it takes, I would assume that frames are rendered in parallel.
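
Back-of-envelope, with an invented farm size:

    # 29 h/frame is per-frame compute, not wall-clock time: a render farm
    # works on many frames at once. The farm size below is invented.
    frames, hours_per_frame, machines = 144_000, 29, 2_000
    days = frames * hours_per_frame / machines / 24
    print(f"~{days:.0f} days of wall-clock time")   # ~87 days, not centuries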

~~~
dtf
The original Coco frame times were 1000hrs/f, thanks to millions of light
sources (and coming in at 30GB to 120GB of scene data). But engineers got that
down to 50hrs/f over the course of production - light acceleration structures
that have now ended up in RenderMan 22. (These numbers are single core
benchmarks, so you then need to scale by some factor of numbers of cores you
have available). In general, I think it’s the huge scene data size that keeps
feature films mostly on the CPU.

[https://www.fxguide.com/featured/rendermans-visuals-for-coco/](https://www.fxguide.com/featured/rendermans-visuals-for-coco/)

~~~
RantyDave
Which is why Nvidia is selling GPUs with 48GB of RAM...

------
tomxor
Is this path tracing or ray tracing? Both seem to be mentioned. The examples
look too physically accurate for plain ray tracing, but I know "ray tracing"
is often used, incorrectly, as if it were a superset that includes path
tracing (which it's not).

~~~
modeless
It is not incorrect to use "ray tracing" as a generic term. Path tracing is a
kind of ray tracing. The actual process accelerated by the hardware is finding
the intersection of a ray with a set of triangles, which can be used for any
kind of ray tracing including path tracing.

~~~
tomxor
I disagree; it's quite a different algorithm. The main difference is that ray
tracing traces a path to each light in the scene at each intersection, which
requires far fewer steps to produce a visually pleasing image but is
physically inaccurate. Path tracing takes no shortcuts: it traces a new ray at
each intersection based only on the BRDF, until it either reaches the
recursion limit or hits a light source. That's why it's so accurate but
slow... It's not an extension of the ray tracing algorithm.

But as others point out I suppose this hardware works at a lower level than
either of these algorithms by essentially just accelerating ray intersections.
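
Stripped of all geometry (intersections stubbed out as coin flips so it runs
standalone; every probability and constant invented), the two recursion shapes
look like this:

    import random

    MAX_DEPTH = 4

    # Stub "scene": only the shape of the recursion is the point here.
    def hits_surface(_ray):    return random.random() < 0.5
    def shadow_ray_clear(_p):  return random.random() < 0.8

    def whitted(ray, depth=0):
        """Whitted: deterministic shadow ray to each light at every hit;
        recursion only along fixed mirror/refraction directions."""
        if depth == MAX_DEPTH or not hits_surface(ray):
            return 0.0
        direct = 1.0 if shadow_ray_clear(ray) else 0.0
        return direct + 0.5 * whitted(ray, depth + 1)   # mirror term only

    def path_traced(ray, depth=0):
        """Path tracing: one BRDF-sampled continuation per bounce; light
        only contributes when a path actually reaches an emitter."""
        if depth == MAX_DEPTH or not hits_surface(ray):
            return 0.0
        if random.random() < 0.1:                       # path found a light
            return 5.0
        return 0.5 * path_traced(ray, depth + 1)        # diffuse bounce

    n = 100_000
    print("whitted avg    :", sum(whitted(None) for _ in range(n)) / n)
    print("path traced avg:", sum(path_traced(None) for _ in range(n)) / n)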

~~~
modeless
You are mistaken about the term "ray tracing". It is a generic term. The
specific algorithm you describe is "Whitted ray tracing", a particular type of
ray tracing.

~~~
tomxor
That's what most people mean when they say ray tracing... but who calls path
tracing "ray tracing"? No one. There are plenty of articles comparing the two,
and they almost always refer to the former as just "ray tracing".

------
josteink
So... A GPU for (3D) graphics processing? That hardly sounds unique ;)

------
jrs95
Would be nice if they’d unveil the next generation of consumer GPUs instead of
sitting on them just to milk more profits out of the current generation.

~~~
athirnuaimi
Tech companies rarely sit on finished engineering work. If products are not
released, it's because they are not ready.

~~~
013a
It is finished, at least the R&D. The GTX 10xx cards are based on Pascal;
since then we've had Volta and now Turing. Both Volta and Turing have only
been used in workstation and server class cards.

There are a few things in play.

\- Nvidia has no competition in the consumer graphics card space; AMD is so
far behind that Nvidia can release a (completely amazing) architecture in 2016
and have it still be top of class.

\- Consumer graphics demands are still pretty much met with Pascal, though
this could be a chicken/egg problem. 4K monitors are pricey, and >60fps 4K is
only _now_ entering the market, and even then a 1080Ti can drive it alright.
At every other resolution, you get a 1070, 1070Ti, or a 1080 and you're set.
The demand isn't there on the high end; the only thing a new arch would help
with is delivering better performance at a cheaper price.

\- Every non-consumer application of GPGPUs is exploding. Rendering, AI,
Cloud: they all need silicon, and every wafer you "waste" on the low-margin
consumer segment is a wafer that could have gone to one of those markets. I
mean, it's not that simple, but that's the idea.

\- The one consumer use that is exploding (less so today, but in months past)
is crypto mining, which is highly volatile. Nvidia likely doesn't want to
encourage this use. You've got miners in China buying up thousands of consumer
cards, and whenever crypto crashes those cards enter the used market, driving
down demand for first-party cards.

\- Much of the rest of the consumer market is surprisingly dominated by AMD.
AMD has consoles and the Mac on lockdown. A lot of this is because Nvidia has
always been unwilling to play in this space, but let's say you're senior
leadership at Nvidia. You have two choices: play with Apple and ship silicon
in the Mac even though Apple itself clearly isn't committed to the platform,
or play with Amazon, Microsoft, etc. and ship silicon to the datacenter, which
Mac users will often end up using anyway (AI, cloud rendering, etc.). And hey,
if you want that local processing power, just use Windows. No-brainer.

My guess is that they'll keep Pascal alive for 2018 to clear out inventory,
and we'll see a line of Volta-based consumer cards in 2019. They want to
establish a completely solid moat in these high-growth markets before fishing
for pennies in the consumer markets with their older, established
architectures.

~~~
shaklee3
The only Volta asic so far is the gv100. They are not high yield parts, since
as another poster pointed out, is likely the largest commercially available
asic. They'll eventually make lower versions of that, but it hasn't been out
that long.

