A lot of Nvidia's new RTX tech may be marketed towards gaming but it's actually ...

m12k · on May 11, 2020

If nothing else the tech allows 3D artists and level designers to get instant feedback while they are tweaking a scene, without having to wait for a full render/lightmap bake - having iteration time go from minutes to seconds is a game changer for creative work like that.

fxtentacle · on May 11, 2020

But sadly, it doesn't work like that.

Let's say I render a reflective vase in V-Ray and use NVIDIA OptiX denoising. The ray-tracer will cast a few specular rays so that some pixels will get super bright from having a direct reflection of the light source, while other pixels will not yet receive specular illumination. The OptiX denoising will then blur things out, so that the entire vase is equally brightened up.

But that quick preview looks so significantly different from the final rendering result, that it is effectively useless and, of course, highly misleading. In the final result, only the parts of the surface where the curvature is "just right" will receive specular. In my preview, everything will.

dahart · on May 11, 2020

You’re attempting to draw a broad conclusion from a contrived corner case example. All you have to do is wait a little longer for more specular samples. The question is whether you wait for less time with denoising than without to get a pretty good preview, and the answer is yes. Many artists are really happy with denoising, and would rather have it than not, regardless of whether the processor is a GPU or CPU.

fxtentacle · on May 11, 2020

Well I had to pick an example to illustrate my grief with denoising, but the problems are not limited to my example.

In general, if you take an AI to replace your noisy image with a noise free but plausible alternative, you should be aware that you're replacing content, so that this can change what you see to the point where it is not representative of the fully converged image anymore.

dahart · on May 11, 2020

In general, it's still true that denoising a higher quality input will yield a higher quality (closer to resolved) output, so all you have to do is wait a little longer.

Denoising is typically applied in interactive environments where you don't get a static result, you get a new denoised result that includes more samples every frame. Undersampling is easy to spot because the denoised image jumps a lot every frame. Once it stops moving a lot, you can have a lot more confidence.

So the problem you're citing is not one that you're stuck with, and it's not a problem in practice.

m12k · on May 12, 2020

I don't know if you noticed that one of the use cases I mentioned was level artists. 9 times out of 10, a level artist isn't trying to optimize some specular highlight in a vase, they are trying to see if this room is lit in a good way (aesthetically pleasing, the player can see the right things and so on). So I agree that there are use cases for which a fast denoised preview isn't useful, like the one you mention, but there certainly are ones for which it is quite sufficient, and infinitely better than waiting around for minutes waiting for the lightmaps to bake GI.

virtualritz · on May 11, 2020

Using GPUs for offline rendering to gain speed is a dangerous fallacy. Dangerous because the GPU has many restrictions which developers have to invest time in working around.

I feel in recent years makers of offline renderers have spent more time dealing with these restrictions to get their renderers to work well with GPUs than actually improving their renderers. I.e. reduce level of noise, better algorithms for hair, volumes, etc.

The renderers produce way too much noise (or have prohibitive render times w/o except for big studios).

Solution? Denoising! Which is (mostly) using GPU-powered ML!

The irony of this seems lost on many freelancers who buy expensive GPU rigs to be able to deliver their jobs on the likes of OTOY or Redshift.

For this money you can buy a vastly superior CPU rig. If only your renderer could use it! Most people compare apples to oranges here. When I hear people rave about GPU offline renderers, I usually ask them:

- Have you made time to first pixel part of your comparison (complex scenes can take dozens of minutes before they start rendering on GPU renderers).

- Are you comparing images as they come out of renderer A before denoising to images of renderer B before denoising? Because anyone can pipe their shit through Intel Open Image Denoise after the fact. It's free even.

- Have you actually used a machine that has more than eight cores personally in production? If all your money goes into buying expensive graphics cards your CPU is usually shite.

A notable exception is 3Delight which doesn't even ship a denoiser. On simple scenes the renderer can't beat the GPUs but on anything a bit more complex -- the stuff you deal with in actual production -- it smokes the competition CPU & GPU alike.

Not least in the amount of noise it produces using comparable sampling settings. Without using any GPU compute.

When this GPU craze took over other vendors who's renderers were traditionally CPU bound too, in the last 5-6 years, 3Delight's developers have spent all their time finessing the performance and output quality w/o relying on GPUs. It shows.

dahart · on May 11, 2020

The trend in renderers is extremely clear, nearly everybody is building a GPU version, and the makers and their users all report pretty big speedups with GPUs...

> Because anyone can pipe their shit through Intel Open Image Denoise after the fact. It’s free even.

The same is true of the OptiX denoiser. I’m not sure what point you’re making?

> Not least in the amount of noise it produces using comparable sampling settings. Without using any GPU compute.

The amount of noise has nothing to do with GPU vs CPU.

fxtentacle · on May 11, 2020

No. My scenes usually need 40+ GB of memory to render. Trying to put that onto a 11 GB RAM GPU will swap like hell and be excruciatingly slow.

It's rendering much faster on CPU.

dahart · on May 11, 2020

Huh? No what? What are you referring to? Is this a reply to a different sub-thread?

> Trying to put that onto a 11 GB RAM GPU will swap like hell and be excruciatingly slow

The comment you replied to didn't mention memory, but what renderer are you currently using that will swap while using the GPU?

What are you referring to when you say "that", in "Trying to put that onto a 11 GB RAM GPU"?

Does the RAM limit of your GPU mean that mine will go slower too?

virtualritz · on May 11, 2020

I am reading between the lines here but I think he is saying that the 'trend' you are talking about is a figment of your own and the imagination of some marketing folks from some companies -- combined.

I.e. it has no substance from a user's pov a priori (his point) and it has no substance from the pov of someone looking at the numbers of this a posteriori (my point, elsewhere in this tread, which I am happy to actually back up any time).

I mean a Quadro 8k RTX can stash 48GB. Standard on 3D artist workstations is 128GB today. Even my freelancer friends have this in their boxes now at the very least.

Go figure what is standard RAM size on render rigs on farms these days based on that ...

And that's not even considering compute restrictions on these GPU rigs that make them simply unfit for certain scenes.

dahart · on May 12, 2020

Please do, instead of claiming you can back it up, please just do it, what are you waiting for?

The list of offline renderers adding GPU ray tracing support is pretty long. If you think the trend isn't real, then are you saying you believe the list isn't growing? If you think it's imagination, maybe you could produce the list of serious commercial renderers that are not adding GPU support, and perhaps evidence they're not currently working on it.

RenderMan, Arnold, Blender, Vray, Modo, RedShift, Iray, Clarisse, KeyShot, Octane, VRED, FurryBall, Arion, Enscape, FluidRay, Indigo, Lumion, LuxRender, Maxwell, Thea, Substance Painter, Mantra... pretty sure there are whole bunch more... not to mention Unreal & Unity.

It's quite true that memory limits are a serious consideration. Which is why, currently, GPU renderers that swap aren't generally a thing. They will be in the future, but right now you get CPU fallback, not swap. So seeing the claim about swap in the comment makes it suspect. Despite the trend and various improvements will continue to be a factor for a while as the limits improve. That doesn't change the trend. It means that preview is currently a bigger GPU workflow than final frame.

fxtentacle · on May 12, 2020

V-Ray on GPU will swap in the sense that it offloads textures out of the GPU and then re-uploads them later for another bucket while still rendering the same frame.

And you know, just because everyone is adding GPU support doesn't mean that professionals will switch their entire pipeline and render farms on their heads just to use it.

I acknowledge that they have GPU support and that some people like it, but I personally can usually not use it, so it is also not a purchase decision for me.

Plus, people already have large farms of high-memory high-CPU servers without GPUs, so switching would require lots of expensive hardware purchases.

And you usually render so many frames in parallel that it doesn't really matter if the single frame takes 5 minutes or 50 minutes. You just fire up 10x more servers and your total wait time remains the same.

dahart · on May 12, 2020

> just because everyone is adding GPU support doesn't mean that professionals will switch their entire pipeline and render farms on their heads just to use it.

You’re right, it doesn’t. The fact is that it’s already happening with or without you. Widespread GPU support being added is a symptom of what productions are asking for, not the cause.

virtualritz · on May 11, 2020

> The trend in renderers is extremely clear, nearly everybody is building a GPU version, and the makers and their users all report pretty big speedups with GPUs...

The 'trend' of the US government under president Trump is also extremely clear. Sorry, I couldn't resist. :)

TLDR; This 'trend' is not economically viable except for two parties. Makers of GPUs and companies renting out GPU rigs in the cloud.

Aka: it's just a trend. It's not that anyone sat down and really looked a the numbers. Because if they did this trend wouldn't exist.

It's also history repeating itself for those that do not learn from it. It will not go anywhere. Mark my words.

I've been there, in 2005/2006, when NVIDIA tried to convince everyone that we should buy their Gelato GPU renderer. I can elaborate why that went nowhere and why it will go nowhere again. But it's a tad off topic.

dahart · on May 11, 2020

Feel free to elaborate, I have no idea what your point is here or what you mean with your non-sequitur about the government or how that relates to developers of 3d renderers in any way. I don't know what you mean by "it's just a trend." The fact is that there's evidence for my argument, and you're attempting to dismiss it without any evidence.

Comparing Gelato to RTX seems bizarre, they're not related, other than that they're both Nvidia products. Are you trying to say you distrust Nvidia? RTX already is a commercial success, and there are already dozens of games & rendering softwares using RTX on the market and hundreds more building on it. RTX already went somewhere.

virtualritz · on May 11, 2020

> Comparing Gelato to RTX seems bizarre, they're not related, [...]

I did not compare RTX with Gelato. Where do you find the word 'RTX' in any of my replies?

I compared Gelato as a GPU-based offline renderer with other contemporary GPU offline renderers.

virtualritz · on May 11, 2020

I am not talking about games. I am talking about offline rendering only.

My point was that indeed bizarre things can become trends and it doesn't make them less bizarre.

ben-schaaf · on May 11, 2020

Cycles does the same work regardless of whether it's running on the CPU or GPU. You can even run mixed where both the CPU and GPU do rendering. The amount of noise has nothing to do with it running on the GPU or not, that's simply how path-tracing works...

virtualritz · on May 11, 2020

Maybe you misunderstood me. I didn't peg the amount of noise to CPU vs. GPU rendering.

I said that these renderers produce too much noise. All of them. And the solution is not denoising. The solutions are novel algorithms that produce faster convergence. There are other issues of course. Noise is just one.

What I implied was that the developers would better spend their time fixing these than trying make stuff fit into a GPU.

This[1] is a comparison of light sampling in 3Delight, Autodesk's Arnold and Pixar's RenderMan (RMan).

TLDR; The bottom of the page has a 'Conclusion' section which is worth reading.

The page is from around 2017 if my memory serves me right and was a private page requiring a password at the time. Pixar asked 3Delight to not make it public until RMan 21 was out. After which 3Delight re-ran the tests and RMan came out worse even. They never published the results because they do not care about publicity.

Mind you, these were all pure CPU renderers at the time.

Few people know about this. What everyone is exposed to is just the marketing mumbo jumbo on the vendors websites.

As one can guess even skimming over this comparisons there are many things one must get right.

E.g. speed of convergence. I.e.: what is the threshold of samples I can get away with? Some renderers converge linearly. An image with twice as much samples will have half the noise. But others manage to converge non-linear in a good or a bad way (twice the samples give you more than twice the quality or less than twice the quality).

Another is bias: does the image look the same when it has fewer samples and you squint or is it darker/brighter than a reference that has 'infinite' samples?

Very important for artists doing look development: time to first pixel. The graphs on the website have not changed much with recent versions of Arnold & RMan, unfortunately.

Note that shaders wise 3Delight already used OSL (byte-code, run-time interpreted) whereas Arnold & RMan used C++ (at the time).

Basically, what I tried to say was: if I was Pixar or Autodesk (insert any other renderer vendor here that came from CPU land) and saw this page (which Autodesk & Pixar did then) ... considering that the competition is running on the same metal, a CPU – maybe I'd try to get to the same level of speed/quality before putting a bunch of new problems on my plate that come when you strive for making this all work on a GPU.

[1] https://www.3delight.com/documentation/display/3DSP/Geo+Ligh...

dahart · on May 11, 2020

> And the solution is not denoising. The solutions are novel algorithms that produce faster convergence.

IMO, there are a few things you seem to be confused about.

1- There are no perfect sampling algorithms, the need for denoising will never go away. 3Delight's geo light sampling does not remove the need for denoising.

2- Denoising and better sampling are independent. You can do both, and everyone already is doing both.

3- 3Delight's geo light sampling is a pretty narrow use-case way to compare renderers. Games don't generally use geo lights at all, and films only rarely. What 3delight doesn't show you in that web page is how their sphere, quad & point light sampling algorithms compare to Pixar's & Arnold's, which is what everyone uses in practice. Spoiler alert: it's not better.

> Very important for artists doing look development: time to first pixel.

Your 3delight page doesn't say whether they're using binary or ascii .rib files. (How do I know it's being fair?) It doesn't compare realistic production scenes. It doesn't compare texturing quality, or how well the renderers perform under extreme memory usage. You are being sucked into some marketing, and missing the bigger picture.

virtualritz · on May 11, 2020

1. Of course not. And I didn't say so.

2. I never contested that too. What I did say is that GPU porting of offline renderers is a waste of time better spent otherwise. And that denoising is solving a problem you would have less off if you didn't waste time with the former. And that 3Delight is proof of that.

3. I never said this was a broad use case. I was making a point in the context of ray-tracing which is the topic of this HNs discussion. But I do assure you: comparing these renderers by other means wouldn't make them look better.

> Your 3delight page doesn't say whether they're using binary or ascii .rib files.

This is completely unimportant. The resp. renderer plug-ins were used inside Maya and 'render' was pressed.

3Delight didn't need to generate RIB files. Neither did (P)RMan. The Ri API is a C binding that can talk directly to the renderer or spit out RIB. If Pixar uses this in their Maya plug-in, I dunno.

Every time I used RenderMan from inside Maya I was working at some facility that had their own exporter. Arnold: I have no idea if it writes .ass files or talks directly to the renderer.

In any case, if you checked the numbers on the page for building up the ray acceleration structure it would be obvious to you that any amount of file I/O with the example scene in question is negligible.

I know a bit about this. I was maintainer of the Liquid Maya rendering translator at Rising Sun Pictures and wrote the Affogato Softimage to RenderMan exporter there. I also wrote a direct in-memory export that made Nuke render with 3Delight using the Ri (called AtomKraft for Nuke) and was heavily involved in the Atomkraft for AfterEffects. So I can assure you RIB is not a factor in this comparison. Trust me. :)

Texturing quality is the same with the caveat that Pixar can only use power of two textures. Which can have adverse effects on memory use and all implications coming from that.

Meaning that if I have e.g. a 46k pixels per axis resolution texture I have to upsample that to 64k to get the same quality that I can get in 3Delight using ... well: a 46k texture.

Because the next closest power of two I could use with RMan, 32k, will mean I loose 14k of resolution (aka quality) per axis axis. And lastly, apart from that: if you compare texture cache efficiency between those two, I am happy to bet considerable money on who will come out on top. Guess. :]

> you are being sucked into some marketing, and missing the bigger picture.

What marketing? 3Delight doesn't have any and that page is not known and extremely hard to find even through google unless you enter exact search terms.

I do not need marketing to be a tad informed about these topics. I have been using Pixar's RenderMan for almost 20 years in production and 3Delight for a decade.

dahart · on May 11, 2020

> What I did say is that GPU porting of offline renderers is a waste of time better spent otherwise.

Not true. You can get speedups of 10x-100x by moving to the GPU. There is no known alternative way to spend your time to achieve the same outcome with the same effort. Using ideal (unachievable) perfect importance sampling algorithms might give you 2x, in some cases, if you're lucky.

What is making you think that people aren't improving their rendering algos or renderers, or that GPUs are preventing other improvements?

> And that denoising is solving a problem you would have less off if you didn't waste time with the former. And that 3Delight is proof of that.

Not true. You can't get rid of the need for denoising when using Monte Carlo path tracing techniques, and 3Delight's geo light sampling does not in any way demonstrate that you can.

fxtentacle · on May 11, 2020

You get those 10x speedups only if stuff fits into GPU RAM. For movie cgi, that is usually not the case, meaning no GPU speedup.

dahart · on May 11, 2020

That is currently changing with on-demand loading, streaming, and NVME technologies, larger GPU ram, and multi-GPU systems.

virtualritz · on May 11, 2020

We are speculating here now. I though this discussion was about the status quo; not what we imagine soon to be.

Even if the bandwidth bottleneck gets solved in the future ... scenes from movie sets usually are big enough so transformations are stored as double precision matrices to avoid objects starting to jitter when they are far away from the origin.

Have you checked double precision GFLOPs on you favorite GPUs lately? And then compared those and their prices to some Ryzen 3970X CPU specs and their prices?

dahart · on May 11, 2020

You're wasting your own money if you buy a CPU based on fp64 flops and then start using any of the renderers you've cited so far.

virtualritz · on May 12, 2020

Yeah, right. Numbers?

dahart · on May 12, 2020

Which renderers use fp64?

virtualritz · on May 12, 2020

All of them.

Just for example in the PRMan 3.9 release notes from 2001, it says under 'Miscellaneous Changes': "Some parts of the renderer now use double-precision floating point arithmetic, to avoid round-off error."[1]

Your model-to-camera to camera matrix needs to be only single precision, usually.

But your model-to-world matrix needs to be double precision to avoid jittering.

So to calculate the former you use double precision and then you can truncate the resulting matrix to single precision for use e.g. in shaders. Everything dandy by then, even for GPUs.

But first you need to get there somehow.

So if you have a gazillion instances in your scene, particles, blades of grass, leaves of tress, spaceships, whatever, you need a gazillion matrix multiplications with double precision to build your acceleration structure and to actually start generating pixels.

It's one of many reasons why GPU based renderers performance goes to shit, particularly on time to first pixel, when scenes of such complexity get thrown at them. Contemporary GPUs have comparatively shitty f64 performance.

Edit: added example PRMan changelog backing up the claim in the 1st sentence. This was most likely considering just xforms. For ray-tracing specific issues and f64 see e.g. the PBR book on solutions to ray intersection precision challenges.

[1] https://renderman.pixar.com/resources/RenderMan_20/rnotes-3....

virtualritz · on May 11, 2020

> Not true. You can get speedups of 10x-100x by moving to the GPU. [...]

Ok, I take you by your word. Mind you, all the points I made were about offline rendering. I dare you to put your money where your mouth is.

Disney's Moana Set[1] it a good, publicly available example. Show me how you get a 10× speed up using a GPU over 3Delight[2] with any commercial renderer out there. Really. I look forward to it. Note that I do not take you up on your 100× because it will satisfy me to see a 10× speedup to accept your point. By far.

Price of GPU and CPU rig need to be comparable. Electricity cost must be factored in too so if your render costs twice as much, rig plus running costs, you divide your results by that factor. And vice versa.

And as for your RIB comment earlier: don't worry -- we will add the scene parsing time to the finals; as we should.

Total time will be from render command issued to final image being available on disk.

And while this is an outdated example the scene complexity reflects what users of Houdini, even single artist types, are throwing at renderers nowadays. Or trying to.

> What is making you think that people aren't improving their rendering algos or renderers [...]

Uhhm, experience? I have been using offline renderers in production since 1994. And I see the rate of progress in commercial offline renderers drop in correlation with those people spending more effort on porting their stuff to GPUs. Sure, could be a coincidence and no causality at all. Maybe there are better explanations -- I'm all ears.

> Not true. You can't get rid of the need for denoising when using Monte Carlo path tracing techniques [...]

Says who? I would never make such a claim either way.

So before you mis-quote me again: I ever said you can get rid of the need for denoising and I never said you cannot get rid of the need for denoising for Monte Carlo path tracing. In fact I never even mentioned Monte Carlo path tracing.

Now going back to what I actually wrote: maybe re-read that. Notice the word I put in Italics so it was harder to miss?

Just for the record: you are mis-quoting me for the fourth time in this thread.

[1] https://www.technology.disneyanimation.com/islandscene

[2] Just an indication (white bars only) since this misses data on electricity costs and parsing time (I think, but I will verify with them): https://documentation.3delightcloud.com/display/3DLC/Cloud+R...

dahart · on May 11, 2020

> I never said you cannot get rid of the need for denoising for Monte Carlo path tracing.

You claimed that investing in sampling algorithms would reduce the need for denoising. You claimed that 3Delight's sampling algorithm is proof of that. I disagree, I think denoising gives the same benefit to 3Delight as it does to Arnold & RenderMan, which is reduced time to a noise-free preview.

> In fact I never even mentioned Monte Carlo path tracing.

I don't even know what you mean by this, if we're not talking about Monte Carlo, why are you commenting on denoisers at all? You posted a link to 3delight's geo light sampling marketing comparison. You know that's a Monte Carlo method, right?

> Sure, could be a coincidence and no causality at all. Maybe there are better explanations -- I'm all ears.

Yes, that's right, it could be coincidence.

Do you follow Siggraph or rendering research, and have you noticed that the rate of progress in CPU rendering algorithms has slowed?

Do you have more examples of features that CPU-only renderers have that are surpassing everyone porting to GPU? I've seen only one example in this thread.

> you are mis-quoting me for the fourth time in this thread.

I don't believe that's true. I may be misunderstanding your points or summarizing you incorrectly or in a way you don't like. I am definitely disagreeing with some of your points and trying to explain why, but I don't believe I have mis-quoted you even once.

>> You can't get rid of the need for denoising when using Monte Carlo path tracing techniques [...] > Says who? I would never make such a claim either way.

Wikipedia says so, and I'm happy to repeat it. Would you not make that claim because you haven't studied what Monte Carlo is, or because you know of advanced techniques that are noise free? There are no known noise free Monte Carlo rendering techniques except for on trivial and/or hypothetical scenes. The kind of scenes you're talking about always have randomness (noise), because the very name "Monte Carlo" is referring to random sampling.

I'm happy to explain more about why Monte Carlo comes with noise, in case it's not clear, or why denoisers will always be valuable in the context of Monte Carlo rendering methods...

virtualritz · on May 12, 2020

> You claimed that investing in sampling algorithms would reduce the need for denoising. You claimed that 3Delight's sampling algorithm is proof of that.

One thing I can share is that 3Delight has a found a way to do weighted filtering of pixel samples w/o introducing correlation. They do not publish any papers though.

Many other vendors of renderers have this issue and the noise this correlation produces is hard to remove.

But I'm sure you understand all that since you were offering to lecture me on Monte Carlo methods below. ;)

Because of this e.g. Intel Open Image Denoise needs uncorrelated pixels samples to work at all.

And the crazy thing resulting from this is that Intel ask you to turn off this sort of pixel filtering in your renderer. Which makes the image more noisy.

Aka: you are adding noise to be able to then denoise. More irony ... this time lost on advocates of denoising, I guess.

Quote from the IOID docs: "Weighted pixel sampling (sometimes called splatting) introduces correlation between neighboring pixels, which causes the denoising to fail (the noise will not be filtered), thus it is not supported."[1]

This is actually misinformation (or maybe trending fake news?). What they should have written is "We believe that weighted pixel sampling (sometimes called splatting) introduces correlation between neighboring pixels [in all renderers we know of] [...]"

To quote a 3Delight developer on this text: "That text from Intel, apparently the same as for RenderMan, is pure, unadulterated, pseudo-science. [...] 3Delight will work with [the] Intel denoiser [...]"[2]

Proof: you can feed a pixel filtered image from 3Delight into IOID and it will work as the samples are not correlated[3].

[1] https://openimagedenoise.github.io/documentation.html

[2] https://discordapp.com/channels/618404428776472581/618415516...

[3] https://cdn.discordapp.com/attachments/618415516641525791/70...

virtualritz · on May 12, 2020

> You claimed that investing in sampling algorithms would reduce the need for denoising. You claimed that 3Delight's sampling algorithm is proof of that. I disagree,

Then we agree to disagree. I can back up all my claims with hard data. Can you?

3Delight's changelog gives you access to all versions of the renderer maybe back to 2006 or so. I think their path tracing core is only after 2016.

So we can download any 'sample', throw the same scene at it and calculate noise properties in the resulting image and see how they got better over time. Be my guest. I know the result and it will back up my claim. Cheers! :)

> You posted a link to 3delight's geo light sampling marketing comparison

Nice 2nd try but not biting. This is not marketing material. 3Delight doesn't employ a single marketing person and this was never published at the time when it was relevant (2017).

> I'm happy to explain more about why Monte Carlo comes with noise, in case it's not clear, or why denoisers will always be valuable in the context of Monte Carlo rendering methods...

Thank you for your kind offer, professor, no need. ;)

May I remind you that I said (analogously) that relying on denoisers for improvement is a bad idea not that denoisers are a bad idea per se?

But let's not try diverting this from you being put on the spot to prove your outlandish 10x+ speedup claim, shall we?

Or are you preparing an extra reply to my challenge?

1. GPUs are not faster than CPUs for offline rendering when you factor in costs.

1.1. They are not even faster in the general case when you just go by GFLOPS/US$. You have to look at specific cases to get a speed advantage (exactly my point: you spend R&D on making the general case work within the restrictions of the special case. Example: FP precision).

3. Time spent working around restrictions of GPUs (data transfer, memory size, precision) could be better spent at improving a CPU renderer. Proof: 3Delight.

4. GPUs are not a general alternative to CPU offline rendering because of aforementioned restrictions. That they work in specific cases means the only good reason to invest in R&D supporting them is if your target audience only ever needs to deal with those special cases. No such renderer (claiming to cater only for such audiences) exists. Thus there are no renderers out there which should focus their R&D on GPU or they are simply lying to their customers (and/or themselves).

> Do you follow Siggraph or rendering research, [...]

Yes.

> [...] and have you noticed that the rate of progress in CPU rendering algorithms has slowed?

No.

But I'm curious how you define a "CPU rendering algorithm". I'm all ears.

dahart · on May 12, 2020

> I can back up all my claims with hard data.

That's where I think you're already off the rails, I don't think this particular point can be proven with hard data, it's about goals. Denoising helps whenever there's noise. 3Delight produces noise with low sample numbers, just like all other renderers. A denoiser will make that faster and enable quicker preview turnarounds.

Your argument is that 3Delight's algorithm makes final frame rendering faster, which I'm not arguing with. If it takes longer than 5 seconds to get to final frame, it doesn't matter whether it's faster than others, denoising will provide the same benefit.

> I'm curious how you define a "CPU rendering algorithm". I'm all ears.

I was really referring to just rendering algorithms in general. There were a lot of papers on how to speed up rendering algorithms in the 80s and 90s, and the speedups were large. A lot fewer in the 2000s and 2010s, and the speedups are no longer multiples, but more often measured in percents. Based on your claims, I'd think you'd have noticed this trend. It's directly related to the rate of progress of features and speed that commercial renderers are releasing.

Feel free to provide sources on flops/$ for fp32. (BTW, talking flops/$ ignores RTX.) First Google result disagrees with you https://aiimpacts.org/current-flops-prices/#Graphics_process...

Saying that GPUs aren't a general alternative is true but ignoring the trend. They used to be much less of an alternative. They are currently a fast alternative for some workflows, especially preview. They will in the future become more and more of a general alternative.

virtualritz · on May 12, 2020

> Feel free to provide sources on flops/$ for fp32. [...]

Ok, early morning maths ... been coding for 10 hours but I try:

A Quadro 8000 RTX has about 510 f64 GFLOPS and 16,320 f32 GFLOPS according to Wikipedia[1]. It costs US$5,500.

A Ryzen 3970X has about 1,320 f64 GFLOPS according to this test[2] and 1,900 f32 GFLOPS. It costs US$2,000

Now a Super Micro board with two EPYC sockets is about US$700 according to Google.

So for US$4,700 I get a CPU compute rig with 2,640 f64 GFLOPS. Aka: more than five times as much as the Quadro 8k RTX rig -- at US$800 less.

Let's say, we are being unfair and we limit our scene complexity to one that only requires single precision (f32) floating point operations (see my reply above). That's btw one of those "GPU restrictions" renderer developers then try to work around that I talked about earlier.

I guess I am trying to say: without going single precision GPUs can not even beat CPUs -- they basically suck.

Now with single precision the Quadro 8k RTX is way ahead of the CPU. At 16,310 f32 GFLOPS it is a bit more than four times the compute power of the CPU rig which clocks in at 2×1,900 GFLOPS of f32 precision (mind you just four times, that's why I said if you can even prove a 10x speedup of GPU vs CPU in offline rendering, I take your point).

Now looking at actual rendering time ... from launching a render to getting an image on disk .. can the GPU we're looking at ever be four times as fast as the US$700 cheaper CPU rig with this constraint to f32?

Absolutely. Most definittly. I would never contest this.

Will these constraints satisfy the needs of people doing professional offline rendering today? Definitely not. Shoveling those scenes onto the GPU will rid you of any speed advantage as low as 4x.

Let's not even talk about electricity costs. The AMD CPU draws 35W under full load. So that's 70W for two. The Quadro 8k RTX draws 260W. And this is not considering the AC you need to buy, maintain and power feed too cool a farm of these beasts.

People who say GPUs will hit it big in offline rendering "soon" are missing important details.

And as someone who had to bear with convincing people the other way once in his life already to prevent disaster later on a production I mean this in a very caring way.

Lastly: at that time I just talked about everyone was saying GPUs will "soon" play a bigger role. Then over ten years passed.

In 2016 two friends of mine, the lead developer of 3Delight and a famous CG researcher who works at NVIDIA were chatting over beers at my house in Berlin. And the latter guy asked the 3Delight dude why they are not investing in GPU research.

And the 3Delight dude said: "Because they asked us the same then. And ten years later we're still faster than any GPU renderer out there for the customers we target. And we don't see that changing."

Now it's 2020 and it's still spot on.

[1] https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_proces...

[2] https://www.pugetsystems.com/labs/hpc/AMD-Threadripper-3970x...

ben-schaaf · on May 12, 2020

The 3970x has a 280W TDP. It draws nowhere near 35W under full load. It's been measured in some benchmarks to reach 450W [0].

Luckily for blender we also have a great set of benchmarks with comparable results: https://opendata.blender.org/. While the 3990x tops the charts, even the 2070S beats the 3970x. That's more value, more performance, more efficiency for less cost.

> People who say GPUs will hit it big in offline rendering "soon" are missing important details.

At least for cycles, it's already a hit. There's plenty of services offering rendering services and all the ones I've found offer either both or exclusively gpu rendering.

[0] https://www.phoronix.com/scan.php?page=article&item=amd-linu...

fxtentacle · on May 12, 2020

I would put Blender clearly into "hobbyist" territory, so whether or not it works well on GPU won't really affect the next Pixar movie ;)

ben-schaaf · on May 12, 2020

Ah yes, Blender the "hobbyist" tool that's also:

* Used by NASA for a lot of their 3d modelling

* Used in Spider Man 2 for storyboarding

* Used for some special effects in Red Dwarf

* Used in Captain America: The Winter Soldier for previsualization

* Rendered the entirety of the movie Next Gen

The list goes on and grows every year. No, it won't affect the next Pixar movie, but if that's your standard then 3Delight sounds like just as much of a hobbyist thing.

dahart · on May 12, 2020

Are you just blindly arguing against anything and everything said in favor of GPUs? Why are you stooping to pick on Blender? It is being used in production, so call it what you want.

Pixar is working on a GPU renderer, they’ve been discussing it for a couple of years.

dahart · on May 12, 2020

I asked about fp32. You picked one of the most expensive GPUs NVIDIA makes, and one not built for fp64, in order to compare fp64 flops/dollar, without explanation... to artificially justify using two CPUs instead of one? Your numbers here are pretty contrived. The RTX8000 is not the best flops per dollar for either fp32 or fp64 by a long way. Try again.

virtualritz · on May 12, 2020

I picked a rig that freelancers using Redshift and the like buy to 'render faster'. I hang out on the resp. Discord servers a lot where these things get discussed. So from what I read there this comparison is spot on as far as the current market goes.

dahart · on May 12, 2020

Rationalize it however you like, the conclusion you jumped to using your cherry picked numbers above is off by an order of magnitude.

wtracy · on May 11, 2020

I think the implication is that GPU rendering ends up using lower-precision floating point numbers than are available on modern CPUs, which leads to more rounding errors.

gmueckl · on May 11, 2020

No, CPU renderers also use IEEE single precision floats for performance. Unless you do something really absurd[0], single precision has more than enough range and resolution to not lead to any noticable inaccuracies in the result.

[0] like trying to render an insect eye to scale while having everything shifted 1000km from the origin.

fxtentacle · on May 11, 2020

While that is in general correct, most GPUs do not fully adhere to IEEE or they do not calculate at full resolution.

It's a well-documented fact that the same floating point operations on a GPU will have less precision than on CPU, despite both using the same IEEE storage format.

https://developer.nvidia.com/sites/default/files/akamai/cuda...

dahart · on May 11, 2020

That paper does not say what you claim it does, it is not showing that GPUs are computing at lower resolution, nor that CUDA is failing to adhere to the IEEE 754 standard.

In at least two cases, it says the opposite of what you just claimed:

FMA on the GPU was a way to achieve higher float precision than you could with CPU math, in 2011 when the paper was written. That is changing now with CPU adoption of fma, but that doesn't change the claims of the paper.

The GPU's strict adherence to IEEE 754 is mentioned multiple times.

"The same inputs will give the same results for individual IEEE 754 operations to a given precision on the CPU and GPU. As we have explained, there are many reasons why the same sequence of operations may not be performed on the CPU and GPU. The GPU has fused multiply-add while the CPU does not. Parallelizing algorithms may rearrange operations, yielding different numeric results. The CPU may be computing results in a precision higher than expected. Finally, many common mathematical functions are not required by the IEEE 754 standard to be correctly rounded so should not be expected to yield identical results between implementations."

wtracy · on May 11, 2020

As far as I can tell, Pixar only started using GPUs for previsualization around 2013 or 2014. It looks like Renderman didn't have real GPU super until 2018.

They've never actually confirmed moving from CPUs to GPUs rendering for their final production renders.

So, yeah, I think you're on to something.