
Neural Supersampling for Real-Time Rendering - jsheard
https://research.fb.com/blog/2020/07/introducing-neural-supersampling-for-real-time-rendering/
======
bigdict
Nvidia recently demoed a similar technique running in real time:
[https://www.nvidia.com/en-us/geforce/news/nvidia-dlss-your-q...](https://www.nvidia.com/en-us/geforce/news/nvidia-dlss-your-questions-answered/)

~~~
jsheard
More than demoed, they've shipped DLSS in quite a few games now. The 1.0
version was underwhelming but the 2.0 version works extremely well in
practice.

However, Nvidia is treating DLSS as its secret sauce and not publishing any
details, so Facebook's more open research is interesting even if it's not as
refined yet.

~~~
Scaevolus
This appears to be more accurate than DLSS according to Fig. 11 in the paper,
and has the 16x mode as well as 4x.

~~~
dannyw
Meanwhile DLSS is real time and this takes 100ms a frame.

~~~
goldenkey
DLSS is possibly baked into the graphics cards. It is what the RT cores are
for. Facebook is just releasing a POC. I doubt the code is handwritten CUDA
optimized for performance.

~~~
gerardvivancos
On Nvidia cards the RT cores are the ones handling BVH traversal and ray-
triangle intersection operations.

The part of the hardware that runs the DLSS ML model is the tensor cores. But
the algorithm and the model itself are not baked in; they are provided in the
driver and/or the game.

------
daenz
Relevant performance details:

A Titan V GPU, using 4x4 upsampling at a target resolution of 1080p, takes
24.42ms, or 18.25ms in "fast" mode. This blows out the 11ms budget you have to
render a frame at 90Hz (6.9ms at 144Hz), and it doesn't appear to include
rendering costs at all... that time is purely in upsampling.
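
For reference, the frame-budget arithmetic (a quick Python sketch, using only
the numbers quoted above):

    # Frame budgets vs. the paper's reported upsampling cost (all in ms)
    budget_90hz = 1000 / 90    # ~11.1ms per frame at 90Hz
    budget_144hz = 1000 / 144  # ~6.9ms per frame at 144Hz
    upsample_fast = 18.25      # "fast" mode, 4x4 to 1080p on a Titan V

    # Even the fast mode overshoots the 90Hz budget before rendering anything:
    print(upsample_fast - budget_90hz)   # ~7.1ms over
    print(upsample_fast - budget_144hz)  # ~11.3ms over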

Cool tech but a ways to go in order to make it useful for VR.

~~~
onion2k
_it doesn't appear to include rendering costs at all... that time is purely in
upsampling_

That part wouldn't be an issue if the plan is to render low resolution images
in the cloud and stream them to a device that can upsample them locally. There
wouldn't be any local rendering costs.

~~~
reitzensteinm
I'd be very surprised if this is what it's to be used for. The technique
requires color, depth and motion vectors. That's three separate video
channels, and two of them contain data that isn't usually stuffed into videos.

Any compression artifacts are going to stick out like a sore thumb, so you'll
need to stream very high quality, and you're going to have weird interactions
between different layers being compressed differently.

------
carrolldunham
Scanning through it for the clause that gives away what sleight of hand was
used to correctly get BERLIN from nothing. Suspects:

      > and combines the additional auxiliary information
      > multiple frames
In other words, the label "Low Resolution Input" on the blurry images is
misleading. The image should be labelled "some of the input".

~~~
Vel0cityX
No idea what "some of the input" means, or why you think "Low Resolution
Input" is disingenuous.

It uses color, depth, and subpixel motion vectors from 1-4 previous frames,
all things that modern game engines can easily calculate. You didn't even need
to read the paper to get this info; it's literally in a picture on the blog
post.
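
As a rough sketch of what that input might look like (the shapes and the
4-frame history here are my assumptions, not taken from the paper):

    import numpy as np

    # Hypothetical per-frame buffers a game engine already produces, at the
    # low render resolution (480x270 for a 4x4 upscale to 1080p):
    FRAMES, H, W = 4, 270, 480
    color  = np.zeros((FRAMES, H, W, 3), dtype=np.float32)  # RGB
    depth  = np.zeros((FRAMES, H, W, 1), dtype=np.float32)  # scene depth
    motion = np.zeros((FRAMES, H, W, 2), dtype=np.float32)  # subpixel motion vectors

    # Stacked along the channel axis, this is the kind of tensor the
    # upsampling network would consume:
    net_input = np.concatenate([color, depth, motion], axis=-1)
    print(net_input.shape)  # (4, 270, 480, 6)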

~~~
carrolldunham
Right - so a single low-res image should not be paired with the high-res one
and labelled as input and output, because that implies the algorithm turned
one into the other, which it did not do.

~~~
pseudosavant
This isn't about upsampling low-res bitmaps. It is a technique for upsampling
the output of a _game engine_.

The low-res image is itself output generated from a lot of other data the
game engine produces. That same data, _which is already being generated
anyway_, can also be fed into this to improve the post-processing. Finding
ways to productively reuse existing data is the hallmark of any graphically
impressive game.

I read a detailed write-up on the graphics pipeline of GTA:V on the Xbox 360.
It blew my mind how many different ways they reused every single bit that ever
hit the RAM, which explains how they pulled off those graphics on a system
with half as much RAM as an Apple Watch.

------
tanilama
Isn't this literally NVIDIA's DLSS? And it has already been productized.

~~~
Vel0cityX
Except Nvidia has published pretty much nothing about their method.

~~~
pixelhorse
They did publish a video on how it works, but I can't find it right now.

The inputs are similar:

[https://www.nvidia.com/content/dam/en-zz/Solutions/geforce/n...](https://www.nvidia.com/content/dam/en-zz/Solutions/geforce/news/nvidia-dlss-2-0/nvidia-dlss-2-0-architecture.png)

In contrast to DLSS 1.0, the output of the NN is not color values but sampling
locations and weights, which are used to look up the color values from the
previous low-resolution frames.
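
Conceptually it's something like this (a minimal numpy sketch of the idea
with nearest-neighbor lookups; the names are made up, and the real network
presumably predicts fractional locations with proper filtering):

    import numpy as np

    def resolve_output(prev_frames, locations, weights):
        """Blend colors sampled from the low-resolution frame history.

        prev_frames: (N, H, W, 3)   previous low-resolution frames
        locations:   (OH, OW, K, 3) integer (frame, y, x) per output pixel
        weights:     (OH, OW, K)    NN-predicted blend weights
        """
        OH, OW, K, _ = locations.shape
        out = np.zeros((OH, OW, 3), dtype=np.float32)
        for k in range(K):
            f, y, x = (locations[..., k, i] for i in range(3))
            out += weights[..., k, None] * prev_frames[f, y, x]
        return out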

------
warvstar
Would love to add this to my game engine.

~~~
Vel0cityX
Did you read the paper? Or the benchmarks at least? In its fastest mode it
takes around 18ms, over half your frame budget even if you only target 30fps.

Great start but definitely needs additional work to be usable in games.

~~~
pseudosavant
There is a big difference between latency and throughput. FPS is throughput.
If you assume the entire system is producing only the current frame, then
those numbers are directly correlated. But most systems, especially game
engines/hardware, always have multiple things going on in parallel.

The H.264 encoder on my CPU introduces >16.7ms of latency into a video stream,
but it can encode hundreds of frames per second of SD video all day. Adding ~1
more frame of latency may be worth a quadrupling in image quality/resolution
in most circumstances.
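
A toy model of the distinction (illustrative numbers only, not a claim about
any specific engine):

    # Rendering and upsampling run on different units, so they overlap
    # across frames: the GPU renders frame N while frame N-1 is upsampled.
    render_ms = 10.0
    upsample_ms = 18.0

    # Throughput is set by the slowest stage, not by the sum of the stages:
    fps = 1000 / max(render_ms, upsample_ms)  # ~55 fps

    # Latency IS the sum: each frame passes through both stages before it
    # reaches the display:
    latency_ms = render_ms + upsample_ms      # 28ms, roughly one extra frame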

