
DeepFovea: Neural Reconstruction for Foveated Rendering and Video Compression - Anon84
https://research.fb.com/publications/deepfovea-neural-reconstruction-for-foveated-rendering-and-video-compression-using-learned-statistics-of-natural-videos/
======
juancampa
Here's an anecdote. My friend's apartment got burglarized many years ago and
when we looked at the security footage we clearly saw the thieves taking
everything, their faces were impossible to recognize though, due to low
resolution. Ever since that happened I kept thinking of a video codec that
would store the whole video in low-res but recognize faces and encode those
parts in ultra high-res. I hope research like this can lead to better
security.

OTOH, government surveillance...

~~~
daenz
OpenCV has a module called "super resolution"[0] whose goal is to reconstruct
high-quality images from multiple low-quality images.

[0] [https://www.youtube.com/watch?v=E6mePD21sIU](https://www.youtube.com/watch?v=E6mePD21sIU)
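
The principle behind multi-frame super-resolution is that each low-res frame samples the scene at a slightly different subpixel offset, so the frames can be registered and merged onto a finer grid. A toy NumPy shift-and-add sketch (not OpenCV's actual API; the function name and parameters are made up for illustration):

```python
import numpy as np

def shift_and_add_sr(frames, shifts, scale=2):
    """Toy multi-frame super-resolution by shift-and-add.

    frames: list of HxW low-res arrays, each a subpixel-shifted view
            of the same scene.
    shifts: list of (dy, dx) subpixel shifts, in low-res pixel units.
    Returns an upscaled (H*scale, W*scale) estimate.
    """
    h, w = frames[0].shape
    acc = np.zeros((h * scale, w * scale))
    weight = np.zeros_like(acc)
    for frame, (dy, dx) in zip(frames, shifts):
        # Place each low-res sample at its (rounded) high-res position.
        ys = np.clip((np.arange(h)[:, None] * scale + dy * scale).round().astype(int),
                     0, h * scale - 1)
        xs = np.clip((np.arange(w)[None, :] * scale + dx * scale).round().astype(int),
                     0, w * scale - 1)
        yy = np.broadcast_to(ys, (h, w))
        xx = np.broadcast_to(xs, (h, w))
        np.add.at(acc, (yy, xx), frame)
        np.add.at(weight, (yy, xx), 1.0)
    # Average where we have samples; leave zeros elsewhere (a real
    # implementation would interpolate the remaining holes).
    return np.divide(acc, weight, out=acc, where=weight > 0)
```

A real pipeline would estimate the shifts by registration and regularize the merge; this only shows why several misaligned low-res frames carry more information than one.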

~~~
taneq
Don't they do something like that to boost the resolution of space telescopes?
Also I seem to remember reading something about processing a stream of images
from an earthbound telescope to cancel out atmospheric distortion.

~~~
AlanYx
Space telescopes don't do super-resolution of the type used for video, but
they do something a little similar. They use a technique called aperture
synthesis which combines signals from a collection of instruments so that they
have the same angular resolution as a much larger virtual instrument.
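
The resolution gain follows from the diffraction limit θ ≈ λ/D: in aperture synthesis, the baseline between instruments plays the role of the aperture diameter D. A quick illustrative calculation (the dish size and baseline are made-up numbers, not any particular array's):

```python
def angular_resolution_rad(wavelength_m, aperture_m):
    # Diffraction limit: theta ~ lambda / D (radians); the Rayleigh
    # criterion adds a 1.22 factor for a circular aperture, omitted here.
    return wavelength_m / aperture_m

# A single 25 m dish observing at 21 cm (the hydrogen line):
single = angular_resolution_rad(0.21, 25.0)

# Two such dishes 1 km apart resolve, along the baseline, like a 1 km aperture:
synthesized = angular_resolution_rad(0.21, 1000.0)

print(single / synthesized)  # baseline / dish diameter = 40x finer
```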

------
hwbehrens
The dynamic gaze example really convinced me that eye tracking will be
necessary for immersive VR. If you can achieve a 1+ order of magnitude
improvement in rendering performance with no noticeable loss in quality... it
would be very difficult to leave that on the table.

~~~
pygy_
You can also use a lower frame rate for the peripheral input.

What peripheral vision loses in spatial resolution, it wins back in time.

~~~
_carl_jung
Not necessarily. The "lower framerate" in our peripheral vision is not
perceived as a stuttering sequence of frames, but as a blurry, smooth flow.
Simply using a lower framerate would still be noticeable.

Unless you could engineer a display technology that could do this.

------
pornel
Dropping certain pixels is a very peculiar way of reducing input quality.
Why was that method chosen?

For 3D rendering I guess that's a kind of DLSS, but the paper focuses on video
compression.

For video streams that doesn't seem to make sense. Video codecs are not
pixel-based but block/frequency-based, so you can't save any bandwidth by
dropping pixels. Raw pixels don't compress well, especially weakly correlated
samples like these, so I wouldn't be surprised if sending just the reduced
input for this algorithm cost more than sending a full video stream. And
existing video codecs can already vary quality within the frame very
effectively by varying block sizes and quantization.
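
The per-block quantization point can be sketched with a toy JPEG-style transform: quantize 8x8 DCT coefficients with a coarser step in low-priority regions, which zeroes more coefficients and costs fewer bits. The quantization steps below are illustrative, not any codec's actual tables:

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis, as used for JPEG-style 8x8 block transforms.
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def quantize_block(block, q):
    """Transform an 8x8 block and quantize with a uniform step q.

    A larger q (e.g. for blocks in the periphery of a foveated frame)
    zeroes more coefficients -- fewer bits -- without touching the
    pixel grid itself.
    """
    d = dct_matrix()
    coeffs = d @ block @ d.T
    return np.round(coeffs / q) * q   # coarser q => more zeroed coefficients

block = np.outer(np.linspace(0, 255, 8), np.ones(8))  # smooth vertical gradient
fine = quantize_block(block, q=4)      # "foveal" quality
coarse = quantize_block(block, q=64)   # "peripheral" quality
print(np.count_nonzero(fine), np.count_nonzero(coarse))
```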

~~~
Lorkki
"Compression" is probably just poorly chosen wording. This has more to do with
reducing the number of required samples in applications like eye-tracking VR,
where you can choose to render a dense image for the part that the user is
looking at, while reducing detail in peripheral vision. Current
implementations use one or more fractional resolutions for the periphery and
blend pixels using more traditional methods, which results in blurriness
and/or aliasing artifacts.
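
A gaze-contingent sparse sampling pattern of the kind described above can be sketched as a random mask whose density falls off with eccentricity. The exponential falloff and the density values here are illustrative choices, not taken from the paper:

```python
import numpy as np

def foveated_mask(h, w, gaze, max_density=1.0, min_density=0.05, falloff=0.25):
    """Random sampling mask whose density decays with distance from gaze.

    gaze: (y, x) in pixels. falloff: fraction of the image diagonal over
    which density drops by 1/e. All constants are illustrative.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(ys - gaze[0], xs - gaze[1])
    diag = np.hypot(h, w)
    density = min_density + (max_density - min_density) * np.exp(-dist / (falloff * diag))
    rng = np.random.default_rng(0)  # fixed seed for reproducibility
    return rng.random((h, w)) < density

mask = foveated_mask(216, 384, gaze=(108, 192))
print(f"{mask.mean():.2%} of pixels sampled")
```

A reconstruction network then only ever sees the pixels where the mask is true; everything else must be hallucinated from learned video statistics.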

------
hemogloben
Those are some pretty incredible results. For any single frame I found it hard
to find a significant quality loss between the DeepFovea frame and the
reference (obviously while looking at the Foveal target and trying to compare
peripheral quality), but in motion there was a lot of interframe noise /
aliasing / jitter.

While I'm sure they'll improve on those issues, I'm currently wondering what
kind of peripheral visual trade-offs I'd make; if I had a demo in front of me,
I'd bet that I'd prefer running at higher foveal settings/fidelity with
peripheral artifacts over running at lower overall settings/fidelity to avoid
them.

------
s_gourichon
Had this idea at least 10 years ago. Have many ideas, that said...

Fovea-oriented compression can be useful for optimized bandwidth usage in
video conferencing, too.

One could even implement auto-reframing of the video feed when several
participants are in the same room, without needing a mechanically moving
camera. Or something like liquid rescale, to still get a glimpse of the rest
of the full frame.
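
The core of "liquid rescale" (seam carving) is a dynamic-programming search for a minimum-energy seam of pixels to remove, so the frame shrinks where it is least interesting. A minimal NumPy sketch with a simple gradient-magnitude energy (an illustrative choice, not GIMP's actual implementation):

```python
import numpy as np

def remove_vertical_seam(img):
    """Remove one minimum-energy vertical seam from an HxW grayscale array."""
    h, w = img.shape
    # Gradient-magnitude energy: smooth regions are cheap to remove.
    energy = np.abs(np.gradient(img, axis=0)) + np.abs(np.gradient(img, axis=1))
    # DP table: cost[i, j] = cheapest seam ending at pixel (i, j).
    cost = energy.copy()
    for i in range(1, h):
        left = np.roll(cost[i - 1], 1); left[0] = np.inf
        right = np.roll(cost[i - 1], -1); right[-1] = np.inf
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)
    # Backtrack from the cheapest bottom cell.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(0, j - 1), min(w, j + 2)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    # Drop the seam pixel from every row.
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return img[keep].reshape(h, w - 1)
```

Repeating this per frame is what makes video retargeting hard: seams must also be temporally coherent or the result shimmers.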

Perhaps those ideas were since patented and even developed?

------
hoseja
Can the eye tracking see saccades? Can it keep up if you rapidly refocus?

------
czr
haven't read the paper yet, but this is a silly demo. "turn 10% of pixels
black" is not a good baseline; they should use nearest-neighbor interpolation
(or something) to fill holes in the "sparse" video for a fair comparison.
also, you can clearly see in the hd video that it's temporally unstable
("shimmering"), which is the same problem nvidia has had with dlss forever;
they need to build temporal smoothing in or users will hate it.
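
The nearest-neighbor fill suggested as a baseline is only a few lines using the standard distance-transform trick: for each pixel, look up the index of the nearest known sample. A sketch assuming SciPy is available (`nn_fill` is a made-up name):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def nn_fill(frame, known):
    """Fill unknown pixels with the value of their nearest known pixel.

    frame: HxW array; known: boolean mask, True where a sample exists.
    distance_transform_edt returns, for every pixel, the indices of the
    nearest zero-valued (= known) pixel in the input mask.
    """
    _, (iy, ix) = distance_transform_edt(~known, return_indices=True)
    return frame[iy, ix]
```

Comparing the network against this (rather than against black holes) isolates how much of the quality comes from learned statistics versus trivial interpolation.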

------
bitL
Can this be used to replace Photoshop's Content-aware fill as well? Or does it
require some sparse sampling of the whole area that needs to be reconstructed?

------
naveen99
i was thinking of doing something similar for image segmentation.

