29 hours for a frame is high but not unusual. Known as "Blinn's Law", the tendency is for scene assets to grow with computing power, eating up any performance gains from newer hardware. Render time for an animated movie today is still of the same order of magnitude as it was in 1995, even on state-of-the-art hardware.
Now, how can they do that at 25fps all of a sudden? They're not. They're using the same method, but not even remotely the same scene complexity or image fidelity.
Ray tracing, and its extension path tracing, scales very well. The same algorithm that works for 500 polygons works for 500 million polygons, as long as you can fit them in memory - it's just going to be slower (building quality acceleration structures is often O(n log(n))). If 10 rays per pixel don't give you the quality you want, you can go up to 100 or 1000 rays per pixel, with render time going up proportionally.
For film, we're rendering hundreds of primary rays per pixel (tens of secondary rays for each primary ray), with hundreds of millions of polygons per scene. I have no insight about the scene of Nvidia's demo, but it is claimed to run at 5 rays per pixel. So even if scenes were identical, it's already 20-500 times faster just by doing 20-500 times less work.
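A toy way to see the rays-versus-quality trade-off described above (my own sketch, not anything from a production renderer or from Nvidia's demo): treat each ray as a random sample that either reaches a light or doesn't, and measure how noisy the averaged pixel is at 10, 100 and 1000 rays. The hit probability of 0.3 and the function names are arbitrary assumptions. For this kind of Monte Carlo averaging, noise falls roughly with the square root of the sample count, so 100x the rays buys about 10x less noise.

```python
import math
import random


def estimate_pixel(samples, rng):
    """Average `samples` random 'ray' contributions for one pixel.

    Toy integrand: a ray contributes 1.0 if it reaches a light, else 0.0.
    The 0.3 hit probability is an arbitrary assumption for illustration.
    """
    hits = sum(1.0 for _ in range(samples) if rng.random() < 0.3)
    return hits / samples


def noise_level(samples, trials=1000, seed=1):
    """Standard deviation of the pixel estimate over many independent trials."""
    rng = random.Random(seed)
    values = [estimate_pixel(samples, rng) for _ in range(trials)]
    mean = sum(values) / trials
    return math.sqrt(sum((v - mean) ** 2 for v in values) / trials)


def relative_work(film_rays_per_pixel, demo_rays_per_pixel):
    """How many times more rays per pixel one budget traces than the other."""
    return film_rays_per_pixel / demo_rays_per_pixel


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "rays/pixel -> noise", round(noise_level(n), 4))
    # Hypothetical per-pixel budgets compared with the claimed 5 rays per pixel.
    print(relative_work(100, 5), relative_work(2500, 5))
```

The 100 and 2500 rays-per-pixel budgets in the last line are made-up inputs chosen only to show how a 20x to 500x difference in work can come from ray counts alone.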
Shading complexity is also a huge difference. Production rendering uses tens of GBs up to the TB range just for textures, on top of procedural shading. Not only does the computing part of that often dominate render times (instead of the ray tracing part), but just paging texture data in and out of memory on demand can become a bottleneck by itself.
That Nvidia demo is a simple scene with no I/O overhead, rendered using a fraction of the rays used for film production, on 18 billion transistors of specialised hardware. Of course it's several orders of magnitude faster, how could it not be?
I remember years ago, us developers would spin up slave programs overnight so that the graphics guys could eke out a few frames of ray traced images.
It's hilarious to me that Blinn's Law is alive and well (although the state of the web itself shows it). This was before 3D cards (and up until the 3dfx / early Nvidia days), and it drove me nuts that every time we managed to get another fps, the graphics guys would add a few hundred polys to be rendered by the realtime engine.
Artists can previsualize almost everything that’s happening in the renderer, often in real-time or close to it. It’s just the final frames that go to production that take hours to render.
I'm a senior engineer, made my own renderer and software "suite" to go with it as a hobby project for the resume, but I'm not sure what else I can make to get my foot in the door. Do you have any general advice?
In my opinion, don't write your own renderer unless you can really invest a lot of time into it. There are countless "spheres in a Cornell box" path tracers out there, you'll not excite anyone with the basics. Contribute to existing software like Blender or Appleseed instead, that will give you a lot more visibility and demonstrate that you can write robust code.
Companies like that are constantly staffing up and laying people off with the film cycles so it shouldn’t be too hard to get in.
Once in you just have to keep your eyes peeled for a good job in an abusive industry.
I get that doing 100 rays per pixel gives you quality where an expert needs to look hard to find defects, given expensive calibrated displays in a well-lit, controlled environment. But a normal user isn’t doing that. Their room lighting is already washing out several artifacts. Their TV sets are not even properly calibrated, or even 4K. For those common people, what is good enough?
In the old days, artists would change the scenes to match the technology. If human skin couldn't be rendered well, you just showed off plastic and metals. Or, they'd make a "cartoony style" and hope that the audience would get used to the plastic barbie-looking skin.
Or in the case of Toy Story: you just make a story revolve around plastic toys. Plastic looks good, remember, in 90s CGI. As long as actual humans are rare, the movie will look "realistic".
But you can't just keep making movies about Toys with plastic-like skin. And improving the algorithms to have realistic skin requires harder calculations such as "subsurface scattering".
Today, artists want to show off more materials, and they also want all those materials to be more believable. Pixar is insane with their details: they spent two years on creating the "Hank the Octopus" in "Finding Dory". And it shows: good reflections off the skin highlight the wetness of the Octopus, Hank walks and moves very much like a real octopus, etc. etc.
In short: there are some materials which don't require much processor time. You probably can get away with a toy-story like plastic look and feel with 10 samples per pixel (talking 90s era here, where things were still relatively flat and primitive).
But if you want to create a CGI Grand Moff Tarkin in an otherwise live-action movie (Star Wars: Rogue One)... you'll need to do better than that. These scenes will naturally require hundreds or even 10,000 samples before the human eye is tricked.
Real-time demo at 19m30s.
But if the consumer can't tell the difference, does it matter?
Many films - in this example because we are talking about films - use a variety of visual techniques to reduce the complexity. Horror films for example rarely show the antagonist. The creature is shown in the dark, for split seconds, until a final expensive moment, leaving your brain to fill the rest in.
Although YOU can't unsee it, neither the audience nor the studio should care that the stormtroopers are rendered on a simple set, with the snowstorm being used solely to avoid the necessity of rendering. The sound of the creature gave the effect of a complex monster lurking in the midst, and I don't need a raytracing demo to show that.
The consumer gets used to it. The CG effects in Toy Story 1 look dated, and if you were to watch the Incredibles and the Incredibles 2 back to back, you would also notice immediately. Or compare the movie "Gravity" to "Babylon 5".
Plus I've not even mentioned hair, fur and volumetrics. A staple in pretty much any feature film FX, they were completely absent in the real time ray tracing demos. Hair and fur are commonly not ray traced as triangles but use custom intersection routines. Those are an extra layer of complexity on top. I don't know yet what Nvidia's own API for the RTX card looks like, but Microsoft DXR and Apple's Metal ray tracing are both restricted to triangle geometry.
This. You could do 100x more polygons and ray tracing, but the consumer knows it is still CG.
I often wonder what will happen next, once the economies of scale run out for shrinking transistors. We have 7nm, 5nm and 3nm, and maybe 450mm wafers that lead to 2nm. We may only have 16x more transistor density improvement left, and it seems 16x performance still isn't enough for CG production.
I do remember Zoolandia as striking in terms of effects though.
I find a lot of movies today are falling into the Lucas trap: too much CGI (cost reigns supreme I guess). It's bad soil for movies...
Let's skip past the discipline and focus on the consequences. What are the consequences here?
Why is Nvidia's new chip potentially failing product market fit?
I'm merely explaining why and how Nvidia can show 5 rays/pixel @ 25fps while we're at the same time budgeting several hours per frame for state of the art production rendering. Sure, more and better hardware will help film rendering. It always has.
Headline specs for Turing:
754mm^2 die area
18.6 billion transistors
up to 16TFLOPs
up to 500 trillion tensor ops per second
It looks like an absolute beast, but don't expect to see anything remotely like it at a consumer price point.
* Quadro RTX 8000 with 48GB memory: $10,000 estimated street price
* Quadro RTX 6000 with 24GB memory: $6,300 ESP
* Quadro RTX 5000 with 16GB memory: $2,300 ESP
It looks like their entire website is a huge legacy mess.
"news.nvidia.com" leads nowhere.
"nvidia.com/news/" leads to "nvidia.com/content/redirects/news.asp" and produces "Corrupted Content Error"
The page you are trying to view cannot be shown because an error in the data transmission was detected.
Is this just routine new cycle stuff that amazon and microsoft are going to buy en masse like they already would have no matter what Nvidia came out with?
Or is this a big deal for reasons that you will explain and the stock market is going to go wild attempting to "price this in"?
For reference, I purchased my first 3D card from Evans & Sutherland to run Softimage with 32MB RAM in ~1996 for $1,800.
I won an AT&T Safari NSX/20 laptop in 1992 in the ACM Programming Competition. RRP was $5749 then, for a 386SX processor running at 20MHz, 4MB RAM and a monochrome screen. $10,200 in today's money. It was actually a beautifully made machine.
A year later, I switched to a Dell with 386DX and a 387 math coprocessor because my PhD needed the number crunching. That cost twice as much (i.e. around $20k in today's money), paid by the military lab sponsoring my research.
In our current times of cheap compute, it is easy to forget how much top-end computers cost 25-30 years ago.
YES! I was at Intel when they were doing the initial research to see whether a <$1,000 machine was even feasible, with the Celeron... and this was when they were literally paying companies millions to optimize for Intel processors, so that there would be software the market found digestible enough to buy, along with the machines, thinking they were getting compute power for their buck.
But the amount of actual capability one gets from one of these cards is thousands of times more than what one could do in 1996.
I sometimes laugh to myself when people complain about the price of GPUs. Yes, $10,000 is a lot, but in historical context, it's pretty reasonable for top of the line technology.
The same thing has happened with RAM and storage. My first Linux PC had 20 MB of RAM and 80 MB of disk and was sufficient to do most of my CS projects at university in the early-mid 1990s. Now, a sub $200 smartphone has over 100x the space, while desktops are commonly 2000x.
The research side not only moved on 1000x to "petascale" but that's now boring and there is real talk of "exascale" with the same gleam in the eye. One million times the performance we dreamed of at the beginning of my career, though I think this is partly by expanding the scope of one machine to larger and larger distributed systems as well as scaling up the capacity of individual elements.
Previously those cards have been the ~$8k-$10k P100 or V100. Not any more.
Strictly based on the perf metrics you listed, AMD already sells something close (matching TFLOPS within 15%), and at a consumer price point:
AMD Radeon RX Vega 64 Liquid ($699):
• 486 mm²
• 12.5 billion transistors
• up to 13.7 TFLOPS (single precision, which is probably what Nvidia is quoting, although I'd like to see confirmation)
• no tensor ALU though
How would this Quadro RTX compare to a Titan GTX 1080 Ti for games with standard rendering techniques?
Different focus. Gaming GPUs are tweaked for speed. Workstation GPUs are tweaked for precision. Some other considerations like compatibility, thermal/power efficiency, and duty cycles also factor in.
This was borne out more obviously ~10 years back when you could soft-mod some nvidia gaming GPUs to be quadros, turning a $200 gaming GPU into a $600 workstation card just by melting a hardware fuse. Today, the cards are probably too complicated to allow for that, even if you do find a way of tricking the drivers.
Pascal die sizes were pretty small. The Turing die size is much larger on (essentially) the same process.
You’re seeing the power of improving yields as the process matures.
It’s likely that a 7nm chip with the same functionality as this one would cost more for the foreseeable future, so it makes sense to make it this big.
The BW of the infinity fabric is but a fraction of what’s needed for a GPU, so that’s a total non-starter.
> NVIDIA invents the graphics processing unit, putting it on a path to reshape the industry. GeForce 256 is launched as the world's first GPU, a term NVIDIA defines as "a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second."
Even 3Dfx, the chief competitor to NVIDIA, made “graphics cards” or “3D accelerators”, not “GPUs”. At least in that fight NVIDIA used the GPU term first, to emphasize that they did transformation and lighting as well as rasterization. Although it wasn’t until a few generations later, with the first register-programmed GeForce architectures, that we had anything resembling what we call GPUs today.
A search indicates many uses of the GPU term.
In a parallel universe where 3dfx survived and thrived, maybe they would be defining FSAA to be an important part of the first modern "GPU"
Register-programmed T&L on the other hand was the first programmable SIMD shader pathway. After Stanford showed somewhat remarkably in 2000-2001 that it was powerful enough to be the target of a RenderMan-like shader language, it pushed architectures towards full programmability.
Now we think of the "GPU" as a vector processing unit able to run arbitrary kernels across massively parallel data sets, with high bandwidth connections to its own high-speed RAM. The fact this is useful for real time graphics rendering is almost incidental, as TFA demonstrates. That whole revolution, and the impact it had on scientific compute, machine learning, and much else, began with the first register-programmable GeForce (the GeForce 2 I believe? But Wikipedia says the GeForce 256 supported the same architectural feature, but was just not user accessible).
https://graphics.stanford.edu/projects/shading/ Although perusing the website, it looks like it was a general OpenGL 1.1 pipeline, rather than using the NVIDIA extensions as I remember. However these were translated by NVIDIA's driver into register presets for the hardware T&L engine.
Besides the increase in raw compute, what is materially different this time around?
Caustic Graphics as well, IIRC.
> NVIDIA's (NASDAQ:NVDA) invention of the GPU in
> 1999 sparked the growth of the PC gaming market,
> redefined modern computer graphics and revolutionized
> parallel computing.
From Wikipedia: https://en.wikipedia.org/wiki/GeForce_256
> GeForce 256 was marketed as "the world's first 'GPU',
> or Graphics Processing Unit", a term Nvidia defined at
> the time as "a single-chip processor with integrated
> transform, lighting, triangle setup/clipping, and
> rendering engines that is capable of processing a
> minimum of 10 million polygons per second."
Certainly, the Geforce series did some things first. It was the first to cram certain things onto a single chip and target it at the consumer market. But their claim to have "invented the GPU" is just silly.
This appears to be a GPU design for those applications where extreme (by mobile and desktop standards) power consumption and cost can be justified. Correspondingly NVIDIA has been trying to move into the data center for deep learning and rendering. They pushed out a lot of good research on how to do the computationally expensive parts of rendering on the server, and leave the client to do the last interactive part locally.
The market allows for this silicon giant because of the current Deep Learning and VR hype. If there were to be another AI and VR winter I'd expect NVIDIA's marketing to re-focus on power efficiency and architecture tweaks once again.
One of three things is true:
1) Nvidia has misstepped and no one will buy large, expensive chips
2) Companies are buying large, expensive chips because they're purchasing based on hype rather than use
3) Companies are buying large, expensive chips because even at their prices, their performance delivers greater business value
I'd argue that improvements in ML techniques have tipped us into (3). Intel and Microsoft's fortunes weren't built on being faster: they were built on allowing businesses to do new categories of work on computers.
Amdahl is the new Moore
What if the only thing that matters is that these are available, and the first several categories of adopters identify new categories/opportunities of work that allow for the next leap, in whichever direction, that can only manifest on the carcasses of these early adoptions...
But entry data analysts are ~USD$60k / yr?
At that price, substituting hardware for even low-hanging, first-order prediction models starts to look attractive.
The chip shown today (GT102?) is much larger than GP102 but smaller than GV100, so it's not clear to me what it should be compared against.
They're just focusing on where the money is now.
Others have already pointed out that Nvidia has been improving efficiency at a pretty steady pace (and they are way ahead of AMD in that respect).
But Volta was much more than just an architectural tweak compared to Pascal.
We don’t know the details yet about Turing, but Jen-Hsun said on stage today that it can execute floating point and integer ops in parallel, which suggests that it’s also based on the Volta architecture.
I don't know about their marketing, but Volta has pretty impressive perf/watt. Loads of the systems in the Green500 have NVIDIA GPUs.
Even for the same process, they’ve improved efficiency for the same workloads.
This is how most film rendering was done ten years ago too.
There are also some impressive new non-ML-based denoising results, however, and it's unclear to me at the moment where NVIDIA is using ML-based denoising and where they are using more traditional filters in the demos they've been showing. I think at least some of the real-time ray traced ambient occlusion and soft shadow demos have been using more traditional non-ML-based filtering approaches.
You don’t need to wait, you can put together 8xV100s with NVLINK on major clouds including GCE and AWS. That’s 128 GB of densely connected HBM2.
With ray tracing, if you can’t load the whole scene in memory, it’s simply not going to work.
Disney recently released a scene of Moana for public use. It has 15 billion primitives!
FWIW, that’s after instancing, which doesn’t take up much memory. The Moana scene’s data files are mostly ptx textures; the geometry is only a fraction of it. The subd geometry might expand to be larger in memory than the textures, though, depending on whether it’s subdivided and how much.
That would help to put the 48GB in context...
Answering my own question:
“...with memory use upward of 70 GB during rendering.”
The author is later able to reduce the amount of memory significantly by rewriting small parts of his renderer.
The Moana scene data in PBRT form doesn’t quite contain all the data they had in production, and I don’t know, but PBRT might be better about texture caching than some production renderers too.
Does this trademark violate the Lanham Act, 15 U.S. Code § 1052?
"Consists of or comprises immoral, deceptive, or scandalous matter; or matter which may disparage or falsely suggest a connection with persons, living or dead, institutions, beliefs, or national symbols, or bring them into contempt, or disrepute; or a geographical indication which, when used on or in connection with wines or spirits, identifies a place other than the origin of the goods and is first used on or in connection with wines or spirits by the applicant on or after one year after the date on which the WTO Agreement (as defined in section 3501(9) of title 19) enters into force with respect to the United States."
"A mark that is primarily a surname does not qualify for placement on the Principal Register under the Lanham Act unless the name has become well known as a mark through advertising or long use—that is, until it acquires a secondary meaning. Until then, surname marks can only be listed on the Supplemental Register."
After seeing the bright potential of an AI-powered future turn into a sad totalitarian reality it's nice to see a "futuristic" technology materialize with little potential for malevolent use.
Empowering everyone to be able to make Hollywood-grade feature films with like $10k in hardware has a lot of negative side-effects, but also a lot of positive ones. The only problem is that video is no longer hard evidence for justice. It's a huge loss, but it happened already.
Real-time TV production is here and it will get more popular. The main reason isn't because of RTX but because Unreal and Unity just look amazing using current real-time techniques.
Is it just me or is this extremely flowery?
My understanding is that during inference, precision is often not critical, and that some workloads even work with 1 bit?
See Table 6 in  to get an idea of the accuracy drop from quantization, it seems like 4 bits would result in about 1% degradation, which is pretty good. However, as you can tell from the methods they used to get there, it's not easy.
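To get a feel for what dropping to a few bits does to the numbers themselves, here is a minimal sketch of plain uniform quantization. It is a toy, assuming Gaussian-distributed "weights" and a single min..max range, and it measures reconstruction error of the values only, not model accuracy; the methods behind the cited results are considerably more involved than this. All names here are my own.

```python
import random


def quantize(values, bits):
    """Uniformly quantize floats to 2**bits levels spanning their min..max range."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1          # number of steps between lowest and highest level
    step = (hi - lo) / levels
    return [lo + round((v - lo) / step) * step for v in values]


def mean_abs_error(original, approx):
    """Average absolute difference between original and quantized values."""
    return sum(abs(x - y) for x, y in zip(original, approx)) / len(original)


# Hypothetical stand-in for a layer's weights: 10,000 samples from N(0, 1).
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

for bits in (8, 4, 2, 1):
    err = mean_abs_error(weights, quantize(weights, bits))
    print(bits, "bits -> mean abs error", round(err, 4))
```

The error grows quickly as bits are removed, which is one way to see why getting good accuracy at 4 bits or below takes more than naive rounding.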
A "viable" Holodeck is a technology sufficiently advanced compared to the present day so as to be indistinguishable from magic. Ergo, no Holodeck anytime soon. No Arnold Judas Rimmer hard light holograms either because WTF is a hard light hologram in the first place?
No. Don't be fooled by the marketing. No one knows how to produce realistic-looking 100% CGI. Raytracing is even worse in some respects than traditional raster based graphics, and better in other respects.
Here's a heuristic that has never failed: If someone claims to have found a way to make photorealistic graphics, ignore them, because they're lying.
(This is true until it isn't, but at no point in history has it ever been true, yet. But that doesn't stop thousands of people and companies from claiming it.)
It is true however that the problem is not fully solved in general. There are still things that are not handled very well in still renders and animation (particularly human animation) breaks down in more situations.
There's no way in hell you'd be able to do it real-time with models of sufficient detail though. So I presume you mean "... in realtime."
"NVIDIA Turing architecture-based GPUs enable production-quality rendering and cinematic frame rates..."
Raytracing at 24fps?
Second, this is apples and oranges. Just looking at 'ray tracing' is simplistic and naive. Pixar uses their own renderer, PRman, which has been developed over multiple decades. There is a big difference in renders for film, including micropolygons, texture filtering, more textures, higher res textures, way fewer lighting cheats, much more customization, motion blur, depth of field, subsurface scattering, multi-layered glossy surfaces that are neither fully specular nor diffuse, better area lighting models, etc.
Big Hero 6 was rendered using 55,000 CPU cores in parallel, which would bring the final render time closer to 70 hours for 144k frames (assuming maximum efficiency and ignoring all the tests and overhead).
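The back-of-envelope behind that figure can be written out as below. It is only a sketch: it assumes the 29 hours per frame are single-core hours and that the farm parallelizes perfectly, and the function name is mine.

```python
def farm_wall_clock_hours(frames, core_hours_per_frame, cores):
    """Ideal wall-clock time if all frames are spread evenly over all cores."""
    return frames * core_hours_per_frame / cores


# 144k frames at 29 core-hours each on 55,000 cores (perfect scaling assumed).
hours = farm_wall_clock_hours(frames=144_000, core_hours_per_frame=29, cores=55_000)
print(round(hours, 1))  # 75.9
```

So with these inputs the ideal figure lands in the mid 70s of hours, which is the same ballpark, before counting test renders, re-renders and scheduling overhead.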
For reference you can download a production benchmark scene (https://www.blender.org/download/demo-files/) from the Blender team's short film and try to render it at 1080p — my 16-thread Ryzen CPU takes over an hour to finish one frame.
Also feature films are rendered in 4K or above which quadruples the render time over 1080p.
But not all scenes have the same number of lights and elements.
It won't take 29h to render the title screen for example.
But as others point out I suppose this hardware works at a lower level than either of these algorithms by essentially just accelerating ray intersections.
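For anyone wondering what "a ray intersection" actually is at the lowest level, here is a textbook ray-triangle test (the Möller–Trumbore algorithm) as a plain Python sketch. This is the standard CPU formulation; I'm not claiming the hardware does it this way internally, and the function names are my own.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: distance t along the ray to the hit, or None on a miss."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                  # ray is parallel to the triangle's plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv              # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv      # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None    # ignore hits behind the ray origin


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(ray_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri))  # 1.0
print(ray_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *tri))    # None
```

A renderer runs a test like this (plus traversal of an acceleration structure to avoid testing every triangle) for every ray, which is why doing it in fixed-function hardware is attractive regardless of which integrator sits on top.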
I'm not sure why you are so adamant that it is incorrect; it's fairly common to refer to more than just ray tracing with a Whitted integrator with the term.
There are a few things in play.
- Nvidia has no competition in the consumer graphics card space; AMD is so far behind that Nvidia can release a (completely amazing) architecture in 2016 and have it still be top of class.
- Consumer graphics demands are still pretty much met with Pascal, though this could be a chicken/egg problem. 4K monitors are pricey, and >60fps 4K is only just now entering the market, and even then a 1080Ti can drive it alright. At every other resolution, you get a 1070, 1070Ti, or a 1080 and you're set. The demand isn't there on the high end; the only thing a new arch would help with is delivering better performance at a cheaper price.
- Every non-consumer application of GPGPUs is exploding. Rendering, AI, Cloud, they all need silicon, and every wafer you "waste" in the low margin consumer segment is a wafer that could have gone to one of these markets. I mean, it's not that simple, but that's the idea.
- The one consumer use that is exploding (less so today, but in months past) is crypto mining, which is highly volatile. Nvidia likely doesn't want to encourage this use. You've got miners in China buying up thousands of consumer cards, and whenever crypto crashes they enter the used market, driving down demand for first party cards.
- Much of the rest of the consumer market is surprisingly dominated by AMD. AMD has consoles and Mac on lockdown. A lot of this is because Nvidia has always been unwilling to play in this space, but let's say you're senior leadership at Nvidia. You have two choices: play with Apple and ship silicon in the Mac even though Apple itself clearly isn't committed to the platform, or play with Amazon, Microsoft, etc. and ship silicon to the datacenter, which Mac users will often end up using anyway (AI, cloud rendering, etc). And hey, if you want that local processing power, just use Windows. No brainer.
My guess is that they'll keep Pascal alive for 2018 to clear out inventory, and we'll see a line of Volta-based consumer cards in 2019. They want to establish a completely solid moat in these high growth markets before fishing for pennies in the consumer markets with their older established architectures.
HPC and datacenter are way more profitable and they already lead gaming. They've had faster gaming silicon for literally years. AMD's failure to release anything competitive is why you don't see faster gaming GPUs.
NVIDIA's biggest threat right now is being labeled a monopoly.