Hacker News
Nvidia Unveils Quadro RTX, a Raytracing GPU (nasdaq.com)
316 points by tntn on Aug 13, 2018 | 189 comments

Since there's some confusion in a few threads, here's a little bit of explanation. I spent the last ~18 months speeding up a production renderer for an upcoming animated feature.

29 hours for a frame is high but not unusual. Per "Blinn's Law", scene assets tend to grow with computing power, eating up any performance gains from newer hardware. Render time for an animated movie today is still in the same order of magnitude as it was in 1995, on state-of-the-art hardware.

Now, how can they do that at 25fps all of a sudden? They're not. They're using the same method, but not even remotely the same scene complexity or image fidelity. Ray tracing, and its extension path tracing, scales very well. The same algorithm that works for 500 polygons works for 500 million polygons, as long as you can fit them in memory - it's just going to be slower (building quality acceleration structures is often O(n log n)). If 10 rays per pixel don't give you the quality you want, you can go up to 100 or 1000 rays per pixel, with render time going up proportionally.
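The linear cost of extra rays buys quality only slowly, which you can see in a toy Monte Carlo sketch (plain Python; the "radiance" samples are just random numbers, nothing renderer-specific is assumed, but the error behaviour matches real path tracing):

```python
import random
import statistics

def estimate_pixel(spp, seed=0):
    """Toy stand-in for a path-traced pixel: the average of spp random
    'radiance' samples. Real samples would come from tracing rays, but
    the statistics are the same: std error ~ 1/sqrt(spp)."""
    rng = random.Random(seed)
    return statistics.mean(rng.random() for _ in range(spp))

def pixel_noise(spp, trials=200):
    """Std deviation of the pixel estimate across many independent renders."""
    return statistics.stdev(estimate_pixel(spp, seed=t) for t in range(trials))

# Cost grows linearly with spp, but noise only halves when spp goes up 4x:
for spp in (10, 40, 160):
    print(f"{spp:4d} spp -> noise ~ {pixel_noise(spp):.4f}")
```

This is why "just add more rays" gets expensive so quickly: a 10x noise reduction costs 100x the render time.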

For film, we're rendering hundreds of primary rays per pixel (tens of secondary rays for each primary ray), with hundreds of millions of polygons per scene. I have no insight about the scene of Nvidia's demo, but it is claimed to run at 5 rays per pixel. So even if scenes were identical, it's already 20-500 times faster just by doing 20-500 times less work.
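As a back-of-envelope check (the specific scene numbers below are my illustrative guesses within the ranges mentioned above, not figures from Nvidia or any studio), the raw ray budgets alone differ by orders of magnitude:

```python
# Demo: 5 rays/pixel at assumed 1080p, 25 fps (per the claim above).
demo_rays_per_sec = 1920 * 1080 * 5 * 25
print(f"demo: {demo_rays_per_sec:.2e} rays/s")

# Film: hypothetical 2K frame (2048x858), 256 primary rays/pixel with
# 16 secondary rays each, at 29 hours per frame. These are made-up
# values inside the "hundreds of primary, tens of secondary" ranges.
film_rays = 2048 * 858 * 256 * 16
film_rays_per_sec = film_rays / (29 * 3600)
print(f"film: {film_rays_per_sec:.2e} rays/s")
```

Under these assumptions the demo actually pushes thousands of times more rays per second, which underlines the other half of the argument: each film ray carries vastly more shading and I/O work than a demo ray.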

Shading complexity is also a huge difference. Production rendering uses tens of GBs up to the TB range just for textures, on top of procedural shading. Not only does the computing part of that often dominate render times (instead of the ray tracing part), but just paging texture data in and out of memory on demand can become a bottleneck by itself.

That Nvidia demo is a simple scene with no I/O overhead, rendered using a fraction of the rays used for film production, on 18 billion transistors of specialised hardware. Of course it's several orders of magnitude faster, how could it not be?

The special issue of the ACM Transactions on Graphics has several articles describing the inner workings of some current production renderers: https://dl.acm.org/citation.cfm?id=3243123&picked=prox

Thanks for this heads up.

I remember years ago, us developers would spin up a slave program overnight so that the graphics guys could eke out a few frames of ray-traced images.

It's hilarious to me that Blinn's Law is alive and well (although the state of the web itself shows it). This was pre-3D-card (and up until the 3dfx / early Nvidia days), and it drove me nuts that every time we managed to get another fps, the graphics guys would add a few hundred polys to be rendered by the realtime engine.

To be fair, it’s completely different now.

Artists can previsualize almost everything that’s happening in the renderer, often in real-time or close to it. It’s just the final frames that go to production that take hours to render.

How did you get into this industry? I'm fascinated by the nonchalant discussion that goes into developing animated features; it's been a dream of mine to break into the space.

I'm a senior engineer, made my own renderer and software "suite" to go with it as a hobby project for the resume, but I'm not sure what else I can make to get my foot in the door. Do you have any general advice?

15 years of experience in 3D application development outside of the film industry, two years of contributions to Blender and a lot of luck of being at the right place at the right time.

In my opinion, don't write your own renderer unless you can really invest a lot of time into it. There are countless "spheres in a Cornell box" path tracers out there, you'll not excite anyone with the basics. Contribute to existing software like Blender or Appleseed instead, that will give you a lot more visibility and demonstrate that you can write robust code.

Not OP, but look at the credits after films and look for special effect companies to apply to. There are so many, if you are free to move and you can write a renderer, you should be able to get an interview somewhere.

Companies like that are constantly staffing up and laying people off with the film cycles so it shouldn’t be too hard to get in.

Once in you just have to keep your eyes peeled for a good job in an abusive industry.

Probably a stupid question: at what point is there a real perceptible difference for a common user watching a movie from 10ft away on their 40" TV?

I get that doing 100 rays per pixel gives you quality where an expert needs to look hard to find defects, given expensive calibrated displays in a well-lit, controlled environment. But a normal user isn't doing that. Their room lighting already washes out several artifacts. Their TV sets are not even properly calibrated, or even 4K. For those common people, what is good enough?

It's way more complicated than that.

In the old days, artists would change the scenes to match the technology. If human skin couldn't be rendered well, you just showed off plastic and metals. Or, they'd make a "cartoony style" and hope that the audience would get used to the plastic barbie-looking skin.

Or in the case of Toy Story: you just make a story that revolves around plastic toys. Plastic, remember, looks good in 90s CGI. As long as actual humans are rare, the movie will look "realistic".

But you can't just keep making movies about toys with plastic-like skin. And improving the algorithms to render realistic skin requires harder calculations, such as "subsurface scattering".

Today, artists want to show off more materials, and they also want all those materials to be more believable. Pixar is insane with their details: they spent two years on creating the "Hank the Octopus" in "Finding Dory". And it shows: good reflections off the skin highlight the wetness of the Octopus, Hank walks and moves very much like a real octopus, etc. etc.


In short: there are some materials which don't require much processor time. You probably can get away with a Toy Story-like plastic look and feel at 10 samples per pixel (talking 90s era here, where things were still relatively flat and primitive).

But if you want to create a CGI Grand Moff Tarkin in an otherwise live-action movie (Star Wars: Rogue One)... you'll need to do better than that. These scenes will naturally require hundreds or even 10,000 samples before the human eye is tricked.

So this is a toy example used for marketing. They are pulling the same shit with their automated driving "research".

Yes and definitely no. RT rendering (or close to it; really, RT rendering techniques) is making it into production. Not so much film yet, but TV definitely, and it's only a matter of time until the two worlds converge. ZAFARI is a recent example - https://www.unrealengine.com/en-US/blog/animated-children-s-...

Nvidia definitely knows how to massage perf figures for press releases. But I guess the tensor cores can help alleviate lower rays per pixel/more noise with machine learning based noise reduction - https://youtu.be/pp7HdI0-MIo

Do you have a link to a video of the demo? I'd love to check how much I notice a difference with movie scenes.


Real-time demo at 19m30s.

> That Nvidia demo is a simple scene with no I/O overhead, rendered using a fraction of the rays used for film production.

But if the consumer can't tell the difference, does it matter?

Many films - in this example because we are talking about films - use a variety of visual techniques to reduce the complexity. Horror films for example rarely show the antagonist. The creature is shown in the dark, for split seconds, until a final expensive moment, leaving your brain to fill the rest in.

Although YOU can't unsee it, neither the audience nor the studio should care that the stormtroopers are rendered on a simple set, with the snowstorm used solely to avoid the necessity of rendering more. The sound of the creature gave the effect of a complex monster lurking in the mist, and I don't need a ray-tracing demo to show that.

> But if the consumer can't tell the difference, does it matter?

The consumer gets used to it. The CG effects in Toy Story 1 look dated, and if you were to watch the Incredibles and the Incredibles 2 back to back, you would also notice immediately. Or compare the movie "Gravity" to "Babylon 5".

Plus I've not even mentioned hair, fur and volumetrics. A staple in pretty much any feature film FX, they were completely absent in the real time ray tracing demos. Hair and fur are commonly not ray traced as triangles but use custom intersection routines. Those are an extra layer of complexity on top. I don't know yet how Nvidia's own API for the RTX card looks like, but Microsoft DXR and Apple's Metal ray tracing both are restricted to triangle geometry only.

>The consumer gets used to it.

This. You could do 100x more polygons and ray tracing, but the consumer knows it is still CG.

I often wonder what will happen next, once the economy of scale runs out for shrinking transistors. We have 7nm, 5nm and 3nm, and maybe 450mm wafers leading to 2nm. We may only have 16x more transistor density improvements ahead, and it seems 16x performance still isn't enough for CG production.

Will OTOY's RNDR fix this by just throwing more compute at it?

I did watch Incredibles and Incredibles 2 nearly back to back, and did not notice any difference.

I do remember Zoolandia as striking in terms of effects though.

For reference here’s a comparison video: https://youtu.be/WzNIyoMmjQ4

Thank you! I haven't watched #2 yet (I think that in Switzerland they'll only start showing it in October), but watching the comparison I tend to like the old/#1 version more. Maybe because it looks more "plastic"/"synthetic" and therefore "cartoonish"? Not sure...

Wow. I haven't seen the sequel yet (unfortunately), but the difference is incredibly striking.

I read Zoolandia as Zoolander, and was really confused trying to figure out the special effects there!

blue steel took 10 days to render

Look at the hair in the scene where they're in the ocean in Incredibles 1, and... pretty much any comparable scene in 2. The hair in 1 looks almost like plastic; in the sequel (or, even better, Moana), the wet hair looks much more lifelike.

In DXR, you can specify a custom intersection shader, thus it's not only limited to triangles.

Thanks, I was wrong then.

Np. All this marketing show is quite elaborate and it's hard to get true answers. Thanks for that top level comment by the way, this whole thread is crazy with speculation and misunderstanding.

This effect gives an interesting perspective on pre-CGI movies. You see how moviemakers conveyed emotion and effect with a very limited toolset and built great movies without constantly bombarding the eyes.

I find a lot of movies today are falling into the Lucas trap: too much CGI (costs reign supreme, I guess). It's bad soil for movies...

Just compare "South Park" to "Final Fantasy: The Spirits Within". CGI can't replace a good script.

I dunno. I still get goose bumps thinking about and watching the protagonist's hair swing realistically in the opening scenes of the FF movie. It was an awesome accomplishment for its time.

FF:TSW was more than just "look, CGI" though. It was a failed movie, but there was a great deal of beauty in the landscapes, characters, etc. Epic was known for aesthetics mastery.

Toy Story 1 still had a budget of 30 million and a box office of 370 million, in 1995 dollars.

Let's skip past the discipline and let's focus on the consequences. What are the consequences here?

Why is Nvidia's new chip potentially failing product market fit?

Who mentioned failure?

I'm merely explaining why and how Nvidia can show 5 rays/pixel @ 25fps while we're at the same time budgeting several hours per frame for state-of-the-art production rendering. Sure, more and better hardware will help film rendering. It always has.

Anandtech have coverage of the SIGGRAPH keynote at the link below.

Headline specs for Turing:

* 754mm^2 die area

* 18.6 billion transistors

* up to 16 TFLOPS

* up to 500 trillion tensor ops per second

It looks like an absolute beast, but don't expect to see anything remotely like it at a consumer price point.


Pricing [1] since the anandtech live blog doesn't list it.

* Quadro RTX 8000 with 48GB memory: $10,000 estimated street price

* Quadro RTX 6000 with 24GB memory: $6,300 ESP

* Quadro RTX 5000 with 16GB memory: $2,300 ESP

[1] https://nvidianews.nvidia.com/news/nvidia-unveils-quadro-rtx...

I know meta comments are unwelcome but "nvidianews.nvidia.com/news/" is so egregious that it caught my eye.

It looks like their entire website is a huge legacy mess.

"news.nvidia.com" leads nowhere.

"nvidia.com/news/" leads to "nvidia.com/content/redirects/news.asp" and produces "Corrupted Content Error"

The page you are trying to view cannot be shown because an error in the data transmission was detected.

I suspect it's something like: spending money on the website appears to do nothing to increase sales.

Alas, there's nothing at https://nvidianews.nvidia.com/news/new.

That's kind of amazing... I'm also impressed you took the time to determine this.

I understand the technical novelty, but from a financial perspective for the NVIDIA company and the customers and data centers they pop these things in: is this a big deal?

Is this just routine news-cycle stuff that Amazon and Microsoft are going to buy en masse, like they already would have no matter what Nvidia came out with?

Or is this a big deal for reasons that you will explain and the stock market is going to go wild attempting to "price this in"?

> Quadro RTX 5000 with 16GB memory: $2,300 ESP

For reference, I purchased my first 3D card from Evans & Sutherland to run Softimage, with 32MB of RAM, in ~1996 for $1,800.

I found an old Computer Shopper from the late 1990s and had forgotten how ridiculously expensive computer equipment was - the real sub-$1000 market wasn't even a thing for PCs until 1997, and the range between a sub-$1000 computer and an expensive one was astounding even for day-to-day tasks.

Prices back then were really high...

I won an AT&T Safari NSX/20 laptop [1] in 1992 in the ACM Programming Competition. RRP was $5749 then, for a 386SX processor running at 20MHz, 4MB RAM and a monochrome screen. $10,200 in today's money. It was actually a beautifully made machine.

A year later, I switched to a Dell with 386DX and a 387 math coprocessor because my PhD needed the number crunching. That cost twice as much (i.e. around $20k in today's money), paid by the military lab sponsoring my research.

In our current times of cheap compute, it is easy to forget how much top-end computers cost 25-30 years ago.

[1] https://books.google.co.uk/books?id=AoKUhNoOys4C&lpg=PP142&o...

>the real sub-$1000 market wasn't even a thing for PCs until 1997

YES! I was at Intel when they were doing the initial research to even see if a <$1,000 machine was feasible, with the Celeron... and this is when they were literally paying millions to companies to optimize their software for Intel processors, so there would be software that felt digestible enough for the market to buy those machines, thinking they were getting compute power for their buck.


My 386SX 20 MHz, 2MB RAM, 40 MB HDD, bought in 1991, was about €1,500 in today's money.

I haven't done the math, but 1996's $1800 seems like more money than 2018's $2300.

Apparently, 1996 $1800 = 2018 $2891...
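A rough CPI adjustment lands in the same ballpark (the CPI-U annual averages below are approximate, from memory, so treat the result as a sanity check rather than an exact figure):

```python
# Approximate US CPI-U annual averages: 1996 ~156.9, 2018 ~251.1.
cpi_1996, cpi_2018 = 156.9, 251.1
adjusted = 1800 * cpi_2018 / cpi_1996
print(round(adjusted))  # roughly $2,880 in 2018 dollars
```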

But the amount of actual capability one gets from one of these cards is thousands of times more than what one could do in 1996.


$40,000+ for Indigo2, also running Softimage! Daylight robbery.

1991: SGI 4D/340 with VGX gfx, 4x MIPS R3000, 64MB RAM, 700MB disk: $180,000.

I sometimes laugh to myself when people complain about the price of GPUs. Yes, $10,000 is a lot, but in historical context, it's pretty reasonable for top of the line technology.

Beyond graphics, I think of all the "terascale" talk in the high-performance computing world when I was in school. Now your consumer GPU does multiple TFLOPS, instead of hoping a supercomputer might possibly reach 1 TFLOP some day for a few lucky users.

The same thing has happened with RAM and storage. My first Linux PC had 20 MB of RAM and 80 MB of disk and was sufficient to do most of my CS projects at university in the early-mid 1990s. Now, a sub $200 smartphone has over 100x the space, while desktops are commonly 2000x.

The research side not only moved on 1000x to "petascale" but that's now boring and there is real talk of "exascale" with the same gleam in the eye. One million times the performance we dreamed of at the beginning of my career, though I think this is partly by expanding the scope of one machine to larger and larger distributed systems as well as scaling up the capacity of individual elements.

As a man from history, I concur. It's amazing what we're getting today at those prices... but they could go down too!

Inflation-adjusted $1800 in 1996 is ~$2850 in 2017.

$10,000? They'll have people buy the whole lot anyways. Buy NVDA

Yes, they will. Not even counting cryptomining, universities and big-businesses use NVidia-powered supercomputers with 5000+ $10k cards in them, for AI deep learning or for astrophysical modelling or other things.

Previously those cards have been the ~$8k-$10k P100 or V100. Not any more.

Interestingly, I game on a P1something, through AWS and parsecgaming.com — very cool tech, and actually worthwhile for me price wise as I don’t game often enough to invest a couple grand in a gaming PC.

AWS and GCE as well, they can put them in their datacenters and run them 24/7, with their customers either renting it for a few minutes to hours for the odd task, researchers for hours / days to run a huge task, and the spot market making sure they never go unused. For them it's more of a matter of how much they're used in terms of % of time, and how much they can charge per compute hour. They can earn the investment back within a year.

The RTX 5000 is actually not too expensive compared to the launch of the Volta series.

«don't expect to see anything remotely like it at a consumer price point.»

Strictly based on the perf metrics you listed, AMD already sells something close (matching TFLOPS within 15%), and at a consumer price point:

AMD Radeon RX Vega 64 Liquid ($699):

• 486 mm²

• 12.5 billion transistors

• up to 13.7 TFLOPS (single precision, which is probably what Nvidia is quoting, although I'd like to see confirmation)

• no tensor ALU though

I haven't kept up with GPUs in a long time. I remember reading that even though Quadro GPUs were for heavy workstation loads, they still didn't compare to consumer GPUs when used for playing games.

How would this Quadro RTX compare to a Titan GTX 1080 Ti for games with standard rendering techniques?

> I remember reading that even though Quadro GPUs were for heavy workstation loads, it still didn't compare to consumer GPUs when used for playing games.

Different focus. Gaming GPUs are tweaked for speed. Workstation GPUs are tweaked for precision. Some other considerations like compatibility, thermal/power efficiency, and duty cycles also factor in.

This was borne out more obviously ~10 years back when you could soft-mod some nvidia gaming GPUs to be quadros, turning a $200 gaming GPU into a $600 workstation card just by melting a hardware fuse. Today, the cards are probably too complicated to allow for that, even if you do find a way of tricking the drivers.

754mm^2 = 1.17 square inches. That's a die size that's bigger than a postage stamp.

Die size is ridiculous, causing such high prices due to exponential costs associated with die size. It seems like an opening for AMD to use their EPYC Infinity Fabric interconnect for their next-gen GPUs in multi-chip modules with smaller dies.

Die prices rise exponentially with area if your product requires a flawless die. But in architectures where bad subsections can get mapped out, such as GPUs and multicore CPUs, the price rise is not exponential.
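A minimal sketch of the flawless-die case using the classic Poisson yield model (all the cost and defect-density numbers below are illustrative guesses, not foundry figures):

```python
import math

def die_cost(die_area_mm2, wafer_cost=9000.0, wafer_diam_mm=300.0,
             defect_density_per_cm2=0.1):
    """Cost per *good* die under a simple Poisson yield model.

    Yield = exp(-D * A) falls exponentially with die area, so cost per
    flawless die grows faster than linearly with size. Parameters are
    made-up round numbers for illustration.
    """
    wafer_area = math.pi * (wafer_diam_mm / 2) ** 2
    dies_per_wafer = wafer_area / die_area_mm2        # ignores edge losses
    yield_frac = math.exp(-defect_density_per_cm2 * die_area_mm2 / 100.0)
    return wafer_cost / (dies_per_wafer * yield_frac)

for area in (100, 300, 754):                          # 754 mm^2 = Turing
    print(f"{area:4d} mm^2 -> ${die_cost(area):7.2f} per good die")
```

Mapping out bad subsections, as the parent says, effectively raises `yield_frac` for the cut-down SKUs, which is exactly why the cost growth stops being exponential in practice.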

Kepler GPU die sizes were pretty small. Maxwell die sizes on the same process were much larger. (Up to 600mm2.)

Pascal die sizes were pretty small. The Turing die size is much larger on (essentially) the same process.

You’re seeing the power of improving yields as the process matures.

It’s likely that a 7nm chip with the same functionality as this one would cost more for the foreseeable future, so it makes sense to make it this big.

The BW of the infinity fabric is but a fraction of what’s needed for a GPU, so that’s a total non-starter.

It's smaller than Volta, but that isn't saying much - as far as I can tell, Volta is the biggest commercially-produced processor die ever.


It's not the world's first ray-tracing GPU. Imagination Technologies had one ages ago.


I'm not sure whether this is the reason in this case, but it's worth noting that NVIDIA marketing (ab)uses the term "GPU" in a very particular way. Basically, their definition is contrived to make the GeForce 256 the first GPU ever.

> NVIDIA invents the graphics processing unit, putting it on a path to reshape the industry. GeForce 256 is launched as the world's first GPU, a term NVIDIA defines as "a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second." [1]

[1] https://www.nvidia.com/en-us/about-nvidia/corporate-timeline...

That's only mildly off-putting to me since nvidia marketing invented the term GPU.

They definitely popularized it, but they didn't invent it. The graphics chip of the PlayStation was referred to as the "GPU" at least as far back as 1995, and I don't think NVIDIA had any involvement in its development or marketing (although the company was founded early enough for that to have been possible).

The Atari Jaguar also had a GPU, and this was in 1993.


It had a "Tom" chip. The question is to whether they called it a GPU. I think they called it DSP back then.

Even 3Dfx, the chief competitor to NVIDIA, made "graphics cards" or "3D accelerators", not "GPUs". At least in that fight NVIDIA used the GPU term first, to emphasize that they did transformation and lighting as well as rasterization. Although it wasn't until a few generations later, with the first register-programmed GeForce architectures, that we had anything resembling what we call GPUs today.

Here is a link to the technical manual to the Jaguar, written for and distributed to developers in 1992: https://www.hillsoftware.com/files/atari/jaguar/jag_v8.pdf

A search indicates many uses of the GPU term.

The original GeForce had hardware transform and lighting (T&L), but lacked FSAA. The Voodoo5, vice-versa.

In a parallel universe where 3dfx survived and thrived, maybe they would be defining FSAA to be an important part of the first modern "GPU"

No, that's kinda a dead-end evolutionary pathway. 3dfx's "FSAA" was, essentially, render at 4x the resolution and down-sample at the DAC. Sure, that works, but about the only thing it does is FSAA or motion blur (but not both, selectively).
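In code, that style of "FSAA" is just a box filter over 2x2 blocks of an oversized frame. A plain-Python sketch on a hypothetical 4x4 "render":

```python
def downsample_2x(img):
    """4x supersampling 'FSAA': average each 2x2 block of the hi-res render."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]

# A hard, jaggy edge in the 4x render...
hi = [[0, 0, 1, 1],
      [0, 0, 1, 1],
      [0, 1, 1, 1],
      [0, 1, 1, 1]]
print(downsample_2x(hi))  # the edge pixel averages to 0.5: jaggies smooth out
```

Which is exactly why it only buys anti-aliasing (or motion blur, if you average in time instead): it is pure averaging, with no programmability anywhere.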

Register-programmed T&L on the other hand was the first programmable SIMD shader pathway. After Stanford showed[1] somewhat remarkably in 2000-2001 that it was powerful enough to be the target of a RenderMan-like shader language, it pushed architectures towards full programmability.

Now we think of the "GPU" as a vector processing unit able to run arbitrary kernels across massively parallel data sets, with high-bandwidth connections to its own high-speed RAM. The fact this is useful for real-time graphics rendering is almost incidental, as TFA demonstrates. That whole revolution, and the impact it had on scientific compute, machine learning, and much else, began with the first register-programmable GeForce (the GeForce 2, I believe? Though Wikipedia says the GeForce 256 supported the same architectural feature, it just wasn't user-accessible).

[1] https://graphics.stanford.edu/projects/shading/ Although perusing the website, it looks like it was a general OpenGL 1.1 pipeline, rather than using the NVIDIA extensions as I remember. However, these were translated by NVIDIA's driver into register presets for the hardware T&L engine.

I also remember ray-tracing being a primary use-case for the Radeon 4870 ten years ago: https://www.techpowerup.com/64104/radeon-hd4800-series-suppo...

Besides the increase in raw compute, what is materially different this time around?

Hardware acceleration of BVH traversal and tight integration with the shader cores to allow efficient scheduling of shader execution and new ray generation when ray hits are detected.
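In software terms, the hardware is doing the equivalent of a loop like this (a toy sketch; the dict-based node layout is invented for illustration and has nothing to do with Nvidia's actual format):

```python
import math

def intersect_aabb(origin, inv_dir, lo, hi):
    """Slab test: does the ray hit the axis-aligned box [lo, hi]?
    (Assumes no zero direction components, for brevity.)"""
    tmin, tmax = 0.0, math.inf
    for a in range(3):
        t1 = (lo[a] - origin[a]) * inv_dir[a]
        t2 = (hi[a] - origin[a]) * inv_dir[a]
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax

def traverse(node, origin, inv_dir, hits):
    """Depth-first BVH walk: prune any subtree whose box the ray misses;
    at leaves, hand the candidate triangles off (here: just collect them)."""
    if not intersect_aabb(origin, inv_dir, node["lo"], node["hi"]):
        return
    if "tris" in node:
        hits.extend(node["tris"])
    else:
        for child in node["children"]:
            traverse(child, origin, inv_dir, hits)

# Two leaf boxes under one root; a ray aimed up the z axis (slightly
# tilted, so no direction component is zero) reaches only the first:
leaf_a = {"lo": (0, 0, 0), "hi": (1, 1, 1), "tris": ["tri_a"]}
leaf_b = {"lo": (5, 5, 5), "hi": (6, 6, 6), "tris": ["tri_b"]}
root = {"lo": (0, 0, 0), "hi": (6, 6, 6), "children": [leaf_a, leaf_b]}

origin, direction = (0.5, 0.5, -1.0), (0.1, 0.1, 1.0)
inv_dir = tuple(1.0 / d for d in direction)
hits = []
traverse(root, origin, inv_dir, hits)
print(hits)  # only leaf_a's triangles survive the pruning
```

The RT cores run the box tests and this traversal loop in fixed function, feeding confirmed hits back to the shader cores, which is the scheduling integration described above.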

That's a software implementation though. These new cards have actual hardware specifically for that.

While the 4800 series Radeon cards supported fully ray-traced pipelines it wasn't necessarily dedicated to it. These have a section of the chip entirely dedicated to ray-tracing computations and acceleration.

Their denoising looks interesting: https://www.youtube.com/watch?v=9yy18s-FHWw

> It's not the world's first ray-tracing GPU. Imagination Technologies had one ages ago.

Caustic Graphics as well, IIRC.

Did Imagination acquire Caustic Graphics?


I can't find any evidence that they actually brought it to market.

Caustic Technologies had this in 2000 and then Arm bought them. I believe the date of 2018 as the release by NVIDIA is actually related to the Caustic Technologies patents expiring.

Always irks me that nVidia claims they invented the GPU. Graphics hardware, including hardware T&L, existed long before nVidia was around. From the press release:

    > NVIDIA's (NASDAQ:NVDA) invention of the GPU in 
    > 1999 sparked the growth of the PC gaming market, 
    > redefined modern computer graphics and revolutionized 
    > parallel computing.
Errrrr, okay.

From Wikipedia: https://en.wikipedia.org/wiki/GeForce_256

    > GeForce 256 was marketed as "the world's first 'GPU', 
    > or Graphics Processing Unit", a term Nvidia defined at 
    > the time as "a single-chip processor with integrated 
    > transform, lighting, triangle setup/clipping, and 
    > rendering engines that is capable of processing a 
    > minimum of 10 million polygons per second."[2]
That's a bit like Honda claiming they invented the automobile because they came up with their own definition of "automobile" that just happens to fit one of their autos.

Certainly, the Geforce series did some things first. It was the first to cram certain things onto a single chip and target it at the consumer market. But their claim to have "invented the GPU" is just silly.

It's important to remember that you can always make it look like you are ahead of Moore's Law curve if you are willing to spend more for silicon and the energy to power it.

This appears to be a GPU design for those applications where extreme (by mobile and desktop standards) power consumption and cost can be justified. Correspondingly NVIDIA has been trying to move into the data center for deep learning and rendering. They pushed out a lot of good research on how to do the computationally expensive parts of rendering on the server, and leave the client to do the last interactive part locally.

The market allows for this silicon giant because of the current Deep Learning and VR hype. If there were to be another AI and VR winter I'd expect NVIDIA's marketing to re-focus on power efficiency and architecture tweaks once again.

VR hype has very little to do with NVIDIA selling these cards. The primary market they are targeting with the new Quadros is offline rendering which is a pretty big market for high end CPUs and which hasn't used GPUs as heavily as you might expect until now because GPUs have not been very efficient at ray tracing extremely large scenes. I was at the presentation today and I don't even remember them mentioning VR. The emphasis was all on these new GPUs accelerating high end offline rendering for movies, TV, architecture, advertising, etc.

100%. I do a lot of renders for our marketing from time to time. Because it's currently done on a CPU, anything even a little complex takes many hours to get retina-quality renders out without noise.

> The market allows for this silicon giant because of the current Deep Learning and VR hype.

One of three things is true:

1) Nvidia has misstepped and no one will buy large, expensive chips

2) Companies are buying large, expensive chips because they're purchasing based on hype rather than use

3) Companies are buying large, expensive chips because even at their prices, their performance delivers greater business value

I'd argue that improvements in ML techniques have tipped us into (3). Intel and Microsoft's fortunes weren't built on being faster: they were built on allowing businesses to do new categories of work on computers.

Amdahl is the new Moore

Don't forget 4) Companies are buying large, expensive chips because they are being used and delivering value, but hype has occluded other more cost effective solutions. See the articles about, 'you don't need deep learning' (or was that big data?)

You gotta trace the money, though. I don't think ML is all hype, but if you only look one level deep you can get some silly conclusions. For example, just because Nvidia can sell these things does not mean the purchaser will ultimately recoup their value, but even if they do the clients of that business might not recoup their costs. At the end of the day, a lot of people are investing huge amounts of money into ML, but it is hugely possible that this is in part due to expectations that won't actually be met in the end.

What they were emphasising today was replacing a $2 million CPU based render farm used for offline rendering with their new hardware at 25% the cost and 10% the power usage for the same or better results. To the extent those numbers are realistic (and it seems like they are) it's going to be a pretty straightforward value proposition for companies that have render farms. This was SIGGRAPH so ML wasn't the main emphasis.

Yep, fair points. I was mostly replying regarding ML hype. Clearly, the value for renderfarms is proven at this point.

There is still the implementation, debugging and testing/validation cost.

Or... What if we do not care if the first, second or third order of customers recoup anything.

What if all that matters is that, by these being available, the first several waves of adopters identify new categories/opportunities of work, and the next leap, in whichever direction, can only manifest on the carcasses of these early adoptions...

I expect there's certainly a lot of that.

But entry-level data analysts are ~US$60k/yr?

At that price, substituting hardware for even low-hanging, first-order prediction models starts to look attractive.

NVidia has increased power efficiency and cost efficiency every generation. Sometimes they haven't increased as much as we would like, but in the absence of serious competition this is what we get.

The chip shown today (GT102?) is much larger than GP102 but smaller than GV100, so it's not clear to me what it should be compared against.

Better than Intel desktop/laptop CPUs, which have not increased core counts in literally a decade.

The market need is not yet clear, at this point. However, the need for CPUs which are individually more powerful and which use up less power has been proven.

They're just focusing on where the money is now.

> If there were to be another AI and VR winter I'd expect NVIDIA's marketing to re-focus on power efficiency and architecture tweaks once again.

Others have already pointed out that Nvidia has been improving efficiency at a pretty steady pace (and they are way ahead of AMD in that respect).

But Volta was much more than just an architectural tweak compared to Pascal.

We don’t know the details yet about Turing, but Jen-Hsun said on stage today that it can execute floating point and integer ops in parallel, which suggests that it’s also based on the Volta architecture.

> power efficiency

I don't know about their marketing, but Volta has pretty impressive perf/watt. Loads of the systems in the Green500 have NVIDIA GPUs.

I totally agree with your core point. We definitely need to be more responsible with how we design next-generation computing chips. We need to think about power consumption, especially in the current world, where climate change and power/resource shortages are looming large on the horizon.

But that’s exactly what Nvidia has been doing over the last 9 years???

Even for the same process, they’ve improved efficiency for the same workloads.

And indeed NVidia placed a big emphasis on how much more power efficient this chip is for its intended workloads than an equivalent CPU based render farm, using less than 10% of the power.

There was a thing called VR hype; that is no longer the case. The new hype is crypto.

So they're not ray tracing everything, right? If I understand correctly, they're "filling in some of the blanks" using an ML algorithm? Does that really work for general purpose rendering?

It is never mentioned explicitly, but my hunch would be that primary visibility is still done using rasterisation and that ray tracing is only used for reflections, shadows and ambient occlusion.

This is how most film rendering was done ten years ago too.

They are properly raytracing soft shadows, no need for AO.

AO and soft shadows are orthogonal and are often used simultaneously. A few direct light sources (point lights, spots, infinite) with soft shadows and additionally an environment light (dome or half dome) with AO.

Yes they are raytracing some pixels and filling the blanks with ML/DL. I believe you could actually ray-trace everything depending on your target output. In these demos they're targeting many dynamic lights, reflections and refractions. If you were content with less reflection bounces, less lights and lower resolution you could easily ray-trace the whole frame with 166 mil rays per frame available.

What is your source for the ML claim? I know very decent and fast screen space filters that do not use any learning and can work with 1spp (pure noise to the human eye).

ML based denoising for various ray tracing applications is a hot topic at SIGGRAPH this year. Here's one NVIDIA paper on the topic from last year: http://research.nvidia.com/sites/default/files/publications/...

There are also some impressive new non-ML denoising results, however, and it's unclear to me at the moment where NVIDIA is using ML based denoising and where they are using more traditional filters in the demos they've been showing. I think at least some of the real time ray traced ambient occlusion and soft shadow demos have been using more traditional, non-ML filtering approaches.

They are ray tracing everything. They use an ML-based filter to reduce sampling noise.
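NVIDIA hasn't published the details of that filter, but the basic idea of reconstructing a clean image from a noisy low-sample render can be illustrated with a much simpler stand-in: a separable Gaussian blur applied to a simulated 1-sample-per-pixel Monte Carlo image. This is only a sketch of the concept; the real denoisers are edge-aware and/or learned.

```python
import numpy as np

def gaussian_kernel(radius, sigma):
    """1-D Gaussian weights, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def denoise(noisy, radius=2, sigma=1.0):
    """Separable Gaussian filter: a crude stand-in for a learned denoiser."""
    k = gaussian_kernel(radius, sigma)
    h, w = noisy.shape
    # Horizontal pass over an edge-padded image
    padded = np.pad(noisy, radius, mode="edge")
    tmp = np.zeros_like(noisy, dtype=float)
    for i, weight in enumerate(k):
        tmp += weight * padded[radius:-radius, i:i + w]
    # Vertical pass
    padded = np.pad(tmp, radius, mode="edge")
    out = np.zeros_like(noisy, dtype=float)
    for i, weight in enumerate(k):
        out += weight * padded[i:i + h, radius:-radius]
    return out

# Simulate a 1-spp Monte Carlo render: true radiance plus sampling noise
rng = np.random.default_rng(0)
truth = np.full((64, 64), 0.5)
noisy = truth + rng.normal(0, 0.3, truth.shape)
clean = denoise(noisy)
# The filtered image is much closer to the truth than the raw samples
assert np.abs(clean - truth).mean() < np.abs(noisy - truth).mean()
```

A real reconstruction filter additionally uses auxiliary buffers (normals, albedo, depth) to avoid blurring across edges, which is where the ML approaches come in.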

96GB of RAM with an NVLink connection! It will be interesting to see what happens on the RTX 2080 front next week; I'm hoping we finally get more RAM!

Disclosure: I work on Google Cloud.

You don’t need to wait, you can put together 8xV100s with NVLINK on major clouds including GCE and AWS. That’s 128 GB of densely connected HBM2.

May I ask... what on earth does one do with 96GB of GPU RAM? I mean, what type of model or problem uses this? Pix2pix at 4K resolution?

I think it’s used to load all the geometry of a really complex scene. (Think Pixar movie.)

With ray tracing, if you can’t load the whole scene in memory, it’s simply not going to work.

Disney recently released a scene of Moana for public use. It has 15 billion primitives!


> Disney recently released a scene of Moana for public use. It has 15 billion primitives!

FWIW, that’s after instancing, which doesn’t take up much memory. The Moana scene’s data files are mostly ptx textures; the geometry is only a fraction of it. The subd geometry might expand to be larger in memory than the textures, though, depending on whether it’s subdivided and how much.

Do you by any chance know how much RAM the CPUs in a render farm need?

That would help to put the 48GB in context...

Answering my own question: http://pharr.org/matt/blog/2018/07/08/moana-island-pbrt-1.ht...

“...with memory use upward of 70 GB during rendering.”

The author is later able to reduce the amount of memory significantly by rewriting small parts of his renderer.

Depends heavily on the production & team tools/workflow, and it’s been a few years since I was in production, but I personally believe hundreds of GBs is reasonably common.

The Moana scene data in PBRT form doesn’t quite contain all the data they had in production, and I don’t know, but PBRT might be better about texture caching than some production renderers too.

Textures is a big one. Film rendering sometimes references terabytes of textures.

Yes, need at least 32GB vram to render them console ports.

How can you trademark Turing, the surname of Alan Turing, the inventor of the Turing machine?

Does this trademark violate the Lanham Act, 15 U.S. Code § 1052?

From the Lanham Act: "Consists of or comprises immoral, deceptive, or scandalous matter; or matter which may disparage or falsely suggest a connection with persons, living or dead, institutions, beliefs, or national symbols, or bring them into contempt, or disrepute; or a geographical indication which, when used on or in connection with wines or spirits, identifies a place other than the origin of the goods and is first used on or in connection with wines or spirits by the applicant on or after one year after the date on which the WTO Agreement (as defined in section 3501(9) of title 19) enters into force with respect to the United States."

"A mark that is primarily a surname does not qualify for placement on the Principal Register under the Lanham Act unless the name has become well known as a mark through advertising or long use—that is, until it acquires a secondary meaning. Until then, surname marks can only be listed on the Supplemental Register."


How is this any different than Tesla?

The fact that we'll have reasonably priced, real-time unbiased global scene rendering within 5-10 years really blows my mind. I knew this was coming for a long time, but seeing it actually happen is so odd.

After seeing the bright potential of an AI-powered future turn into a sad totalitarian reality it's nice to see a "futuristic" technology materialize with little potential for malevolent use.

Little? How about ever more realistic deepfakes?


Fake video is happening regardless (and a lot better) with generative neural network architectures. Also fake video was already possible, we just see its usage ""democratized"", in a way.

Empowering everyone to be able to make Hollywood-grade feature films with like $10k in hardware has a lot of negative side-effects, but also a lot of positive ones. The only real problem is that video is no longer hard evidence in the justice system. It's a huge loss, but that already happened.

I find this product anticlimactic. I'm gonna go back to my 8bit grayscale console.

Unreal and Unity are already used in TV production. This will continue. They will add some ray tracing features in the future.

Real-time TV production is here and it will get more popular. The main reason isn't because of RTX but because Unreal and Unity just look amazing using current real-time techniques.

If any product presentation should be accompanied by lots of high quality demo videos, then this.

> revolutionizing the work of 50 million designers and artists by enabling them to render photorealistic scenes in real time

Is it just me or is this extremely flowery?

Depends on whether or not they're rendering a view of a meadow.

Just you.

Free sync support yet?

How is that relevant for professional hardware?

500 trillion tensor ops/sec? So if that’s one chip... 11x faster than a Google TPU (or ~3x faster than the four chip modules)

No, that's for INT4 ops

Sure, I guess I meant: if the task can be run with 4 bit math...

Google TPU does not have 4 bit ALUs.

I don’t think that negates my point? If you have tensor ops that can get away with 4-bit precision, this is a great chip for you.

Yes, on INT4 tasks this chip will be faster than TPU. That's a fairly rare use case.

Is that a limited use case because not many workloads map to INT4, or has this avenue simply not been explored because there weren't any INT4 processors?

My understanding is that during inference, precision is often not critical, and that some workloads even work with 1 bit?

NN quantization has been an area of active research in the last 3 years, but it's not trivial when going to 4 bits or below. Usually to achieve good accuracy during inference, a model needs to be trained or finetuned at low precision. The simple post training conversion usually won't work (it does not always work even at 8 bits). Models that are already efficient (e.g. MobileNet) are harder to quantize than fat, overparameterized models such as AlexNet or VGG. Increasing a model size (number of neurons or filters) helps, but obviously it offsets the gains in efficiency to some degree. Recurrent architectures are harder to quantize.

See Table 6 in [1] to get an idea of the accuracy drop from quantization, it seems like 4 bits would result in about 1% degradation, which is pretty good. However, as you can tell from the methods they used to get there, it's not easy.

[1] https://arxiv.org/abs/1807.10029
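The core operation behind all of this can be sketched in a few lines: symmetric uniform "fake quantization" maps float weights onto a small signed integer grid and back, and the reconstruction error grows quickly as the bit width shrinks. This is a simplified per-tensor scheme for illustration, not the specific method from the paper above.

```python
import numpy as np

def fake_quantize(x, num_bits=4):
    """Symmetric uniform quantization: map floats to signed num_bits
    integer codes and back, the basic operation behind low-precision
    inference."""
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 7 for INT4
    scale = np.abs(x).max() / qmax           # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)  # integer codes
    return q * scale                         # dequantize for comparison

rng = np.random.default_rng(0)
w = rng.normal(0, 1, 1000)                   # stand-in for a weight tensor

for bits in (8, 4, 2):
    err = np.abs(fake_quantize(w, bits) - w).mean()
    print(f"INT{bits} mean abs error: {err:.4f}")
# The error grows sharply as precision drops, which is part of why models
# usually need low-precision-aware finetuning to survive 4-bit inference.
```

In practice the schemes that work at 4 bits and below use per-channel scales, clipping calibration, and quantization-aware training rather than this naive post-hoc rounding.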

Can this be used in a VR setting? Is this the first step toward building a viable holodeck?

1. Yes 2. No, what are you smoking? I want some.

A "viable" Holodeck is a technology sufficiently advanced compared to the present day so as to be indistinguishable from magic. Ergo, no Holodeck anytime soon. No Arnold Judas Rimmer hard light holograms either because WTF is a hard light hologram in the first place?

photons do have momentum...

enough photons to make you feel that momentum will also carry enough energy to vaporize you.

Well, when you put it that way..

Is this the first step toward building a viable holodeck?

No. Don't be fooled by the marketing. No one knows how to produce realistic-looking 100% CGI. Raytracing is even worse in some respects than traditional raster based graphics, and better in other respects.

Here's a heuristic that has never failed: If someone claims to have found a way to make photorealistic graphics, ignore them, because they're lying.

(This is true until it isn't, but at no point in history has it ever been true, yet. But that doesn't stop thousands of people and companies from claiming it.)

This hasn't been true for a while for certain applications under the right circumstances. There are architectural renderers, product renders and certain movie shots that will fool 99% of people into thinking they're looking at a photograph. Once motion is involved it gets much harder but there are still snippets of rendered video that could pass for real video.

It is true however that the problem is not fully solved in general. There are still things that are not handled very well in still renders and animation (particularly human animation) breaks down in more situations.

Watching the short "The Third & The Seventh"[0] was the last time I believed any photo or video of architecture to be real, and that was already 8 years ago!

[0]: https://vimeo.com/7809605

Photon mapping with physically based quantum BRDFs would be "realistic-looking 100% CGI." Because you'd be doing an exact physics-based simulation of electromagnetic light interacting with physically accurate materials.

There's no way in hell you'd be able to do it real-time with models of sufficient detail though. So I presume you mean "... in realtime."

If there's one thing that VR has taught us is that photorealism is completely unnecessary for immersion.

Pure raytracing, sure, but raytracing assisted by other methodologies can get you farther. Imagine doing low resolution ray tracing and then compositing that information with other rendering techniques. If you raytrace then interpolate, you can get really good results.

AFAIK raytracing is not worse, if given the time.

It needs to be combined with diffuse lighting or it doesn't look right.

I can't decide if the caption writer for the image was incompetent or is putting massive spin on very poor frame rates:

"NVIDIA Turing architecture-based GPUs enable production-quality rendering and cinematic frame rates..."

Raytracing at 24fps?

Well it takes Pixar 29 hours to raytrace a single frame, so I would consider this much faster than current methods.

First, those render times sound hugely inflated to me.

Second, this is apples and oranges. Simple, naive 'ray tracing' is nothing like what Pixar runs. Pixar uses their own renderer, PRMan, which has been developed over multiple decades. There is a big difference in renders for film, including micropolygons, texture filtering, more textures, higher-res textures, way fewer lighting cheats, much more customization, motion blur, depth of field, subsurface scattering, multi-layered glossy surfaces that are neither fully specular nor diffuse, better area lighting models, etc.

With 29 hours per frame, it would take almost 480 years of single-core time to render Toy Story 3's ~144k frames.

I have no idea if the 29 hour figure is accurate. However, regardless of how long it takes, I would assume that frames are rendered in parallel.

The original Coco frame times were 1000hrs/f, thanks to millions of light sources (and coming in at 30GB to 120GB of scene data). But engineers got that down to 50hrs/f over the course of production - light acceleration structures that have now ended up in RenderMan 22. (These numbers are single core benchmarks, so you then need to scale by some factor of numbers of cores you have available). In general, I think it’s the huge scene data size that keeps feature films mostly on the CPU.


Which is why Nvidia is selling GPUs with 48GB of RAM...

It takes 29 hours for a single CPU core (or thread?) to render one frame, which is reasonable for Pixar-quality scenes.

Big Hero 6 was rendered using 55,000 CPU cores in parallel, which would bring the final render time closer to 70 hours for 144k frames (assuming maximum efficiency and ignoring all the tests and overhead).
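The arithmetic behind that estimate is easy to check. Note that the 29 core-hours/frame and 55,000-core figures are the numbers quoted in this thread, not official studio statistics, and the result assumes perfect parallel efficiency:

```python
# Back-of-envelope render farm math using the numbers quoted above
hours_per_frame = 29            # single-core render time per frame
frames = 144_000                # ~100-minute film at 24 fps
cores = 55_000                  # farm size quoted for Big Hero 6

total_core_hours = hours_per_frame * frames
serial_years = total_core_hours / (24 * 365)
wall_clock_hours = total_core_hours / cores   # perfect parallelism

print(f"{total_core_hours:,} core-hours total")   # 4,176,000
print(f"~{serial_years:.0f} years on a single core")
print(f"~{wall_clock_hours:.0f} hours on the full farm")
```

In reality scheduling overhead, failed renders, and per-shot iteration multiply the wall-clock time well beyond this lower bound.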

For reference you can download a production benchmark scene (https://www.blender.org/download/demo-files/) from the Blender team's short film and try to render it at 1080p — my 16-thread Ryzen CPU takes over an hour to finish one frame.

Also, feature films are rendered in 4K or above, which quadruples the render time over 1080p.

Parallelism is a thing.

29 hours wallclock is long for a render but not uncommon.

I'm guessing 29h is kind of an upper limit on the most complex scenes.

But not all scenes have the same number of lights and elements.

It won't take 29h to render the title screen for example.

How exactly is good quality Raytracing at 24 fps not impressive?

IIRC in the conference they showed ~40ms+ per frame for the Star Wars demo so that would be around the 24 fps region.

Is this path tracing or ray tracing? both seem to be mentioned. The examples look too physically accurate for ray tracing but I know it's often incorrectly used as a synonym for a superset of path tracing (which it's not).

It is not incorrect to use "ray tracing" as a generic term. Path tracing is a kind of ray tracing. The actual process accelerated by the hardware is finding the intersection of a ray with a set of triangles, which can be used for any kind of ray tracing including path tracing.

I disagree; it's quite a different algorithm. The main difference is that ray tracing traces a path to each light in the scene at each intersection, which requires far fewer steps to produce a visually pleasing image but is physically inaccurate, whereas path tracing takes no shortcuts: it traces a new ray at each intersection based only on the BRDF until it either reaches the recursion limit or hits a light source. That is why it's so accurate but slow... It's not an extension of the ray tracing algorithm.

But as others point out I suppose this hardware works at a lower level than either of these algorithms by essentially just accelerating ray intersections.
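The control-flow difference being argued about can be sketched in Python-flavored pseudocode; every helper here (intersect, shade, sample_brdf, etc.) is a placeholder, not a real API:

```
# Whitted-style ray tracing: deterministic, branches to each light
# plus mirror reflection rays at every hit
def whitted(ray, depth):
    hit = intersect(scene, ray)
    color = sum(shade(hit, light) for light in visible_lights(hit))
    if depth < MAX_DEPTH and hit.is_specular:
        color += whitted(reflect(ray, hit), depth + 1)
    return color

# Path tracing: stochastic, one BRDF-sampled bounce per path,
# averaged over many samples per pixel
def path_trace(ray, depth):
    hit = intersect(scene, ray)
    if hit.is_light or depth == MAX_DEPTH:
        return hit.emission
    next_ray, weight = sample_brdf(hit)   # Monte Carlo sample
    return weight * path_trace(next_ray, depth + 1)
```

The RT cores accelerate the intersect() step, which both variants (and photon mapping, AO, etc.) share, which is why the hardware is agnostic about which algorithm sits on top.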

You are mistaken about the term "ray tracing". It is a generic term. The specific algorithm you describe is "Whitted ray tracing", a particular type of ray tracing.

That's what most people mean when they say ray tracing... but who calls path tracing "ray tracing"? No one. There are plenty of articles comparing the two, and they almost always refer to the former as just "ray tracing".

Most resources I've seen refer to the whole class of methods that involve tracing rays as "raytracing," and use "path tracing" to refer to the special case of raytracing with a Monte Carlo integrator.

I'm not sure why you are so adamant that it is incorrect; it's fairly common to refer to more than just ray tracing with a Whitted integrator with the term.

These terms tend to be used interchangeably. In the context of your original quote I think path tracing is the more accurate term. Even Arnold Renderer is described as a raytracer. ( https://en.wikipedia.org/wiki/Arnold_(software) )


So... A GPU for (3D) graphics processing? That hardly sounds unique ;)

Would be nice if they’d unveil the next generation of consumer GPUs instead of sitting on them just to milk more profits out of the current generation.

Tech companies rarely sit on finished engineering work. If it's not released, it's because it's not ready.

It is finished, at least the R&D. The GTX 10xx cards are based on Pascal, since which we've had Volta and now Turing. Both Volta and Turing have only been used in workstation and server class cards.

There are a few things in play.

- Nvidia has no competition in the consumer graphics card space; AMD is so far behind that Nvidia can release a (completely amazing) architecture in 2016 and have it still be top of class.

- Consumer graphics demands are still pretty much met with Pascal, though this could be a chicken/egg problem. 4K monitors are pricey, >60fps 4K displays are only now entering the market, and even then a 1080Ti can drive one alright. At every other resolution, you get a 1070, 1070Ti, or a 1080 and you're set. The demand isn't there on the high-end; the only thing a new arch would add is better performance at a cheaper price.

- Every non-consumer application of GPGPUs is exploding. Rendering, AI, Cloud: they all need silicon, and every wafer you "waste" in the low-margin consumer segment is a wafer that could have gone to one of these markets. I mean, it's not that simple, but that's the idea.

- The one consumer use that is exploding (less so today, but in months past) is crypto mining, which is highly volatile. Nvidia likely doesn't want to encourage this use. You've got miners in China buying up thousands of consumer cards, and whenever crypto crashes they enter the used market, driving down demand for first party cards.

- Much of the rest of the consumer markets are surprisingly dominated by AMD. AMD has consoles and Mac on lockdown. A lot of this is because Nvidia has always been unwilling to play in this space, but let's say you're senior leadership at Nvidia. You have two choices: Play with Apple and ship silicon in the Mac even though Apple themselves clearly isn't committed to the platform, or play with Amazon, Microsoft, etc and ship silicon to the datacenter, which Mac users will often end up using anyway (AI, cloud rendering, etc). And hey, you want that local processing power, just use Windows. No brainer.

My guess is that they'll keep Pascal alive for 2018 to clear out inventory, and we'll see a line of Volta-based consumer cards in 2019. They want to establish a completely solid moat in these high-growth markets before fishing for pennies in the consumer markets with their older, established architectures.

The only Volta ASIC so far is the GV100. They are not high-yield parts since, as another poster pointed out, it is likely the largest commercially available ASIC. They'll eventually make lower versions of it, but it hasn't been out that long.

Why would they release new consumer GPUs now?

HPC and datacenter are way more profitable and they already lead gaming. They've had faster gaming silicon for literally years. AMD's failure to release anything competitive is why you don't see faster gaming GPUs.

NVIDIA's biggest threat right now is being labeled a monopoly.

Also there are very few games that use intensive raytracing right now. It'll take some time to get the libraries embedded into graphics frameworks and popular engines. I don't imagine they'll release a consumer Turing card until they have at least one game they can use to market it with.

Or because they need to clear the older inventory a bit more

I would say that happens normally, but recent reports say Nvidia has a large glut of older silicon. I'm sure they want to clear that inventory.

That is set to happen at Gamescom in only a few weeks.
