Fooling Around with Foveated Rendering (peterstefek.me)
175 points by underanalyzer on Oct 6, 2020 | 87 comments

Relevant: You can actually perceive the rough size of your fovea with this optical illusion: https://www.shadertoy.com/view/4dsXzM

Very interesting.

I was wondering what determines the speed at which the stars in foveal vision rotate. I assumed the movement is the illusion, but it turns out the stars all rotate - there's an explicit parameter determining their period in the shader. The illusion is that the stars outside your central vision seem frozen!

That's the size that is relevant to foveated rendering. But note that your actual physical fovea is even smaller than this, due to saccades.

By what factor does it increase the apparent fovea size?

I couldn't tell you a scientific answer.

However, I have a vision condition where I can't see blue in my actual fovea. And that point is incredibly small. Maybe 3-4 times smaller than what you are seeing here.

I was under the impression that everyone was blue blind in the fovea due to a lack of short wavelength cones in the fovea. This paper seems to confirm this but it is from 1967 and I am having trouble finding other references. https://www.osapublishing.org/josa/abstract.cfm?uri=josa-57-....

Note that this paper describes a lack of blue cones in only the very center of the fovea, corresponding to 7-8 angular minutes (or 0.03-0.04 mm). That's much smaller than the fovea as a whole, which is about 5 degrees or 1.5 mm. It's unclear whether fudged71 just has this normal condition, or actually has blue blindness over a wider region.
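
For anyone who wants to check the angle-to-millimetre conversions above, here is a rough sketch. It assumes a simplified "reduced eye" with the nodal point about 17 mm in front of the retina; that figure is my assumption, not something from the paper, and real eyes vary.

```python
import math

# Assumption: reduced-eye model, nodal point roughly 17 mm from the retina.
NODAL_DISTANCE_MM = 17.0

def retinal_size_mm(angle_deg: float) -> float:
    """Approximate length on the retina subtended by a visual angle."""
    half_angle = math.radians(angle_deg) / 2.0
    return 2.0 * NODAL_DISTANCE_MM * math.tan(half_angle)

# Blue-cone-free zone quoted above: 7-8 arcminutes (converted to degrees).
blue_free_zone_mm = [retinal_size_mm(arcmin / 60.0) for arcmin in (7, 8)]
# Whole fovea quoted above: about 5 degrees.
whole_fovea_mm = retinal_size_mm(5.0)

print(blue_free_zone_mm, whole_fovea_mm)
```

Under that assumption the results land in the 0.03-0.04 mm range for the blue-free zone and close to 1.5 mm for the whole fovea, which is consistent with the numbers quoted.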

Weirdly for me the "size of the fovea" is much smaller with the windowed view than fullscreen (seems to be the same proportion of the overall image), which doesn't seem to make sense. Perhaps another optical illusion is in effect as well?

The effect only occurs if the elements are small enough, so in fullscreen they are probably too large. You can step away from your screen a bit and you'll see the effect as well in fullscreen.

As far as I understand, the effect is caused by the fact that the fovea has a much higher resolution (in bright light) due to its higher density of cones. At a certain size, the fovea is still able to make out details of the elements, but the rest of the retina cannot. Thus they are detected as rotating in the fovea area, but not outside (since each one is basically just a colored lump there). We still perceive the outside ones as stars (and not just colored lumps) because the brain interpolates them that way from previously perceived information.

Under low light conditions, i.e. when rods would be the main means of vision, an opposite effect can be observed: We are mostly blind in the fovea area due to the lack of rods there. This manifests as objects showing more detail at night when looking slightly past them.

The elements are the same size in windowed as in full screen though.

It depends on the browser. In Safari they aren't.

When I get close to the monitor the # of stars that appear to spin is greater than when I move my head far away. That is, the area of screen space with spinning stars increases as I get closer.

Do other people get the same effect?

I thought the opposite would happen.

I get the same effect. I think it makes sense, even though I understand your reasoning.

Both the fovea and the rest of the eye perceive more detail when you're closer to an object (the fovea is no different in this respect). When you're close enough, the fovea can perceive a larger area in detail, which we see as motion in that animation.

I get the same thing. I think it's because when you get close enough, you can see spinning even in the periphery, just because it's bigger.
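
A toy model of why the spinning region can grow on screen as you lean in. Everything here is an assumption for illustration: the detail threshold is taken to grow linearly with eccentricity, and `E2_DEG` and `FOVEAL_THRESHOLD_DEG` are made-up constants, not measured values.

```python
import math

E2_DEG = 2.0                 # assumed eccentricity at which the threshold doubles
FOVEAL_THRESHOLD_DEG = 0.2   # assumed smallest star whose rotation is visible at fixation

def visible_radius_cm(star_size_cm: float, distance_cm: float) -> float:
    """Screen radius around fixation inside which a star looks like it spins."""
    # Angular size of one star (small-angle approximation).
    star_deg = math.degrees(star_size_cm / distance_cm)
    # threshold(ecc) = T0 * (1 + ecc / E2); solve star_deg >= threshold for ecc.
    max_ecc_deg = E2_DEG * (star_deg / FOVEAL_THRESHOLD_DEG - 1.0)
    if max_ecc_deg <= 0.0:
        return 0.0  # too far away: no star is resolved, even at fixation
    max_ecc_deg = min(max_ecc_deg, 89.0)  # keep tan() well behaved
    return distance_cm * math.tan(math.radians(max_ecc_deg))

for distance in (80.0, 40.0, 20.0):
    print(distance, visible_radius_cm(0.5, distance))
```

In this model the star's angular size grows faster than the threshold at its new eccentricity as you get closer, so the radius measured on the screen gets larger. It is only a sketch of the argument, not a claim about real acuity curves.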

Note that if this isn't working, you are probably too close to your screen. I had to move back several inches to see any of the crosses as still.

This is a fun experiment, but reminded me why I didn't like 3D movies:

You need to look right at where they want you to look - so in this example you need to look right at the center point of the image. If your gaze drifts off to the side then it looks bad.

The same with 3D movies, if you don't want to look directly at what the director wanted you to look at, you end up getting seasick.

Do VR headsets have gaze tracking these days?

I would definitely not dismiss this technique b/c of my quick prototype. There's a ton of work that has gone into the foveated rendering you will see on Quest. Check out this paper: https://ai.facebook.com/blog/deepfovea-using-deep-learning-f... (that I link to in my post) to see what a real version of this technique looks like.

Some games (not sure if all) on the Oculus Quest use fixed foveated rendering, no gaze tracking. If you look to the side of the display, you can notice the rendering getting blocky. If you search for it, you can make out a border from sharp rendering to blocky rendering, but it's seldom noticeable in normal gaming.

Since the field of view is limited in today's VR displays anyhow, you don't lose much from the foveated rendering: if you look to the side of display, much of your own field of view is now black, which is more limiting than the fact that the few pixels that you still see are rendered in a blocky fashion.

In Carmack's talk this year he said foveated rendering was basically a mistake. A lot of it was due to hardware implementation details, but it didn't save many GPU resources for the quality hit you took.

One nice benefit of the Quest 2 is that it can run Quest 2 games without be it.

It's still a feature in the Oculus runtime but they only recommend it in specific situations now, and they try and manage the quality outside of the game because way too many developers just cranked it to Max and it made their games look terrible.

I don't remember him saying it was a mistake, just that it wasn't the magic panacea that people were expecting to catapult us to full retina-resolution VR. It was still a significant improvement (and combined with eye tracking, didn't constitute a quality hit) iirc.

He was saying fixed foveated is not a significant improvement if you set it to a level that isn't perceptible.

You can easily tell when games have it cranked up and the quality drop off at the edge is very severe. Perf improvements are only moderate.

Oh, I missed that part then. Yeah, fixed foveated rendering is a quick and dirty hack, and only really useful on cheap headsets with terrible optics which waste a lot of pixels due to distortion. The proper solution is to improve the optics (a la Valve Index), not use FFR.

Eye tracking foveated rendering is much more useful but a bit of a pipe dream right now and still only gives a max improvement factor of 2-5x or something like that.

"without be it"?

Replying from a phone, lying in bed, using voice to text, too sleepy to proof.

no problem -- I wasn't criticizing, just asking what you meant bc I don't understand what you meant to type.

> The same with 3D movies, if you don't want to look directly at what the director wanted you to look at, you end up getting seasick.

They could do statistical gaze tracking on a 2D version of the movie scene using a small test audience, and then render the 3D scene based on the results.

There will likely always be people impacted by this. For me, human faces are generally not the interesting part of the screen. I'm looking at clothing, set design, fx, etc. I noticed this constantly watching Avatar in 3D.

I happen to also struggle with human face recognition, so it's likely I'm just wired different.

the same with 2D movies. Anime taught me to always look in backgrounds, a habit which makes it clear that live action frequently uses focus to force attention.

Some Netflix series are really bad in this regard. Sometimes the upper and lower part of the picture are really blurry. Maybe they do this to achieve a more cinematic look? But to me it feels like forced tunnel vision and it makes me really uncomfortable.

I don't think any current headsets do gaze tracking, but it is something Sony is looking at.


Vive Pro Eye and Varjo VR2 are some examples. I expect the functionality to be a part of most next gen headsets

That would certainly give VR headsets a huge performance boost.

When you say "3D movies", do you mean animated ones? Because they just simulate shallow depth of field which you also get from real lenses (given the right configuration). It's completely normal to use it as a tool to direct attention in movies.

They mean stereoscopic movies. It is a normal camera trick but it doesn't work well in 3D? Adjusting your eye's stereo focus differently than the lens focus can be too straining.

>gaze tracking these days?

They do (e.g. Vive Pro Eye, HP Reverb G2 Omnicept Edition). And checking that on google is much faster than writing this sentence.

Tangentially related: I've recently been doing some VR work in Unreal Engine, and also reading Blindsight by Peter Watts, and this made me wonder: are saccades truly random? Could they be made predictable through some special way of crafting a headset, or through some stimulation? If so, then perhaps one day we'll be able to render only the pixels the mind can actually perceive at any given moment.

What makes you think saccades are random? When you are reading text, for example, they are methodical.

We have had machines that do what you describe for decades.

Here's Daniel Dennett in Consciousness Explained (1992)

When your eyes dart about in saccades, the muscular contractions that cause the eyeballs to rotate are ballistic actions: Your fixation points are unguided missiles whose trajectories at lift-off determine where and when they will hit ground zero at a new target ...

Amazingly, a computer equipped with an automatic eye-tracker can detect and analyse the lift-off in the first few milliseconds of a saccade, calculate where ground zero will be, and before the saccade is over, erase the word on the screen at ground zero and replace it with a different word of the same length. What do you see? Just the new word, and with no sense at all of anything having been changed. As you peruse the text on the screen, it seems to you for all the world as stable as if the words were carved in marble, but to another person reading the same text over your shoulder (and saccading to a different drummer) the screen is aquiver with changes.

The effect is overpowering. When I first encountered an eye-tracker experiment, and saw how oblivious subjects were (apparently) to the changes flickering on the screen, I asked if I could be a subject. I wanted to see for myself. I was seated at the apparatus, and my head was immobilized by having me bite on a "bite bar". This makes the job easier for the eye-tracker, which bounces an unnoticeable beam of light off the lens of the subject's eye, and analyzes the return to detect any motion of the eye. While I waited for the experimenters to turn on the apparatus, I read the text on the screen. I waited, and waited, eager for the trials to begin. I got impatient. "Why don't you turn it on?" I asked. "It is on," they replied.

> What makes you think saccades are random?

I didn't mean "random" as in governed by an RNG, just "random" as in we can't obviously predict their pattern. The opposite of that would be being able to tell when they'll happen and where they'll end before they start, and/or make them happen on demand and land on desired target.

> We have had machines that do what you describe for decades.

I didn't know that. Thanks for the citation. This leads me to ask: so why aren't we employing this for VR?

Ah I see

wow, is there a video of this? sounds really interesting

Saccades are neither random nor very predictable. They are voluntary. You can predict the "salient" parts of an image that humans will prefer to look at, and use that information to e.g. improve video compression. But your predictions will always be wrong sometimes because humans are unpredictable.

It should be feasible to detect saccades instead of predicting them, and render new pixels more quickly than the eye can move. But it is right at the edge of what's possible, and really needs more reliable eye tracking than is generally available today, along with purpose built rendering techniques and higher resolution + higher field of view displays.
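
To make "detect instead of predict" concrete, here is a minimal velocity-threshold detector. It assumes gaze samples already converted to degrees at a fixed sample rate; the function name and the 100 deg/s default are illustrative choices, not taken from any particular eye tracker.

```python
import math

def detect_saccades(gaze_deg, sample_rate_hz, threshold_deg_per_s=100.0):
    """Return (start, end) sample index pairs where gaze speed exceeds the threshold."""
    saccades = []
    start = None
    for i in range(1, len(gaze_deg)):
        (x0, y0), (x1, y1) = gaze_deg[i - 1], gaze_deg[i]
        # Angular speed between consecutive samples, in degrees per second.
        speed = math.hypot(x1 - x0, y1 - y0) * sample_rate_hz
        if speed > threshold_deg_per_s:
            if start is None:
                start = i - 1          # eye started moving at the previous sample
        elif start is not None:
            saccades.append((start, i - 1))
            start = None
    if start is not None:              # trace ended mid-saccade
        saccades.append((start, len(gaze_deg) - 1))
    return saccades

# Synthetic 1 kHz trace: fixation, a 5 degree jump at 500 deg/s, fixation again.
trace = [(0.0, 0.0)] * 5 + [(0.5 * k, 0.0) for k in range(1, 11)] + [(5.0, 0.0)] * 5
print(detect_saccades(trace, sample_rate_hz=1000))
```

A real system would also need noise filtering and blink handling, and the renderer still has to react before the eye lands, which is where the latency problem comes in.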

One of my favorite books. You might be interested in a fan movie launching next week. http://blindsight.space/


Thanks, very timely! I'm in the middle of reading the sequel/sidequel, Echopraxia, right now.

Not prediction, but: there used to be an Exploratorium exhibit with a large screen which updated randomly. One could toggle between flashing (erase to black) updates and smooth (just overwritten) updates.

The difference in sensation of change was palpable.

Are you talking about Critical Flicker Fusion here?

Possibly. (I'm unfamiliar with the term, TIL!) As I dimly recall from last century, we are sensitive to flicker in the coarse-resolution field, but insensitive to non-flickering change outside the fine-resolution field.

Then you're probably familiar with the 24 frames per second cinema standard for projecting films. Except each frame is actually projected three times, so the screen flashes 72 times per second. (In reality there are far more options today with IMAX and video tech, though.)

Even though there's only 24 images on the film per second, this triple exposure results in a smoother experience for the viewer. This is due to how critical flicker fusion works (a.k.a. persistence of vision). [1] (Though another reason for doing it is simply so that the film won't be burnt, since those xenon gas projection lamps run very hot.) See also beta movement and phi phenomenon. [2]

These physiological phenomena are of course very important to consider when making VR devices and games.

In order to create the effect mechanically in the film projector, each frame has to be stationary before the shutter is opened. If not, all you'd see is a blur. Thus, when the film is pulled forward, the shutter blocks the light from projecting the image onto the screen. In fact, during some half of the movie, people are actually sitting in pitch black darkness. Think about that next time you go to the movies! ;)

Please note that peripheral vision has a higher sensitivity to flicker than foveal vision. I'm unsure how interlacing affects that, though, if applicable. This effect might also be different on video systems where various forms of interlacing may or may not be used.

[1]: https://en.wikipedia.org/wiki/Flicker_fusion_threshold

[2]: https://en.wikipedia.org/wiki/Beta_movement

(Please excuse my lack of sourcing. Most of this is off the top of my head, and from books I am no longer in possession of... But the Wikipedia links should give you a good start.)

> Thus, when the film is pulled forward, the shutter blocks the light from projecting the image onto the screen.

Ironically, it sounds like a complement to saccades! Where saccades involve shutting off vision when your eye moves, this involves shutting off projection when the image moves.

Would eye tracking be fast enough to follow saccades and generate the high resolution parts on the fly? I assumed it would and we'll eventually have VR at resolution of human perception done that way. I hope so!

> Would eye tracking be fast enough to follow saccades

MEMS kHz eye tracking enables even predicting where a saccade will land 20+ ms before: https://www.youtube.com/watch?v=JEg4l5KuQgI&t=452 (AdHawk @ AWE 2018).

All of the current products are basically cameras, right? I’m excited to hear about possible improvements

All that I know of at least?

Eventually, maybe? But I was hoping there would be a way to cheat and somehow force saccades to happen on a pattern that can be predicted beforehand, so that we could render only the parts of the image that the eyes are just about to jump to.

Probably to some extent. Movie makers anticipate what part of the screen viewers will look at. They can control it with content, motion, focus, etc. Maybe you're hoping for something more automatic? For a simple looping video we might get bored and our eyes start wandering around, but if we're paying attention to the content, we probably keep focused on the moving/important objects.

Yes, I'm hoping for something automatic enough to allow for a turbocharged version of Foveated Rendering. One where, instead of being dead ahead of you, the region of high-resolution rendering would follow your eyes as they saccade - so you'd never be able to tell the headset renders only a tiny fraction of the image at any given time, while giving it even more time to produce a high-quality render there.

> Unlike normal programming languages, fragment shaders always execute both parts of each branch due to gpu limitations.

GPUs do have real branching and even loops these days, you just can't have divergent branching within a shader group.

I'm not sure how efficient it is to splatter individual pixels like in the mask used in the article since adjacent pixels will likely need to be evaluated anyway if the shader uses derivatives. Too bad that the author didn't bother to include any performance numbers.
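
A back-of-the-envelope way to look at the derivative concern, as a sketch. It assumes the GPU shades fragments in 2x2 quads and that a quad does work whenever any of its four pixels is kept. It also assumes a uniformly random mask, which is a simplification; the mask in the article is denser near the center.

```python
import random

def touched_quad_fraction(width, height, density, seed=0):
    """Fraction of 2x2 pixel quads containing at least one kept pixel."""
    rng = random.Random(seed)
    # True means the pixel is kept (shaded); False means it is skipped.
    mask = [[rng.random() < density for _ in range(width)] for _ in range(height)]
    touched = 0
    quads = 0
    for y in range(0, height - 1, 2):
        for x in range(0, width - 1, 2):
            quads += 1
            if mask[y][x] or mask[y][x + 1] or mask[y + 1][x] or mask[y + 1][x + 1]:
                touched += 1
    return touched / quads

density = 0.1
expected = 1 - (1 - density) ** 4      # chance that a quad has any kept pixel
measured = touched_quad_fraction(256, 256, density)
print(expected, measured)
```

With one pixel in ten kept, roughly a third of the quads still contain a kept pixel under these assumptions, so the work saved could be noticeably less than the pixel count suggests. How much of that shows up in practice depends on the hardware and the shader.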

Good points. I should have definitely included perf numbers. This shader https://www.shadertoy.com/view/3l23Rh goes from 20-25 fps on my laptop to over 60 when rendering 1/10th of the pixels and up around 57-60 fps when rendering 1/5th of the pixels although I know that's not exactly great benchmarking. I don't think derivatives are a computational problem here (and a few tests seem to confirm it) but the values of the derivatives are definitely ruined in the process.

27.5MB, of which 27MB is five GIFs. Please don’t use GIFs like this. Use videos: they use less bandwidth, use less power, look better, and annoy your “I don’t want videos to autoplay!” users less.

Autoplaying (without the possibility of sound) is a feature, It's a huge aesthetic boon to the web. GIF is an entirely different conceptual flavor to video. It's good precisely because people almost never fuck with the fact that it autoplays.

When I say I don’t want things to autoplay, I mean it. Please don’t try to second-guess me. I don’t want anything autoplaying.

That GIFs autoplay is unequivocally a bug. Just a well-entrenched one that would take a lot of effort to fix, and no one capable of fixing it seems to quite think it’s worth it.

There are multiple reasons I might not want autoplay: bandwidth or performance concerns, accessibility problems from motion, or even just the simple “if you autoplay, then when I reach it it won’t be at the start of the video any more”.

Agreed. I just can't focus on the text when there is giant motion right next to it

This seems something that could be fixed in the browser as a preference.

Unfortunately it’s not that easy. Images (animated or not) go through a very different pipeline to videos, and it would be a lot of work to make animated images behave like videos.

It would be cool to also see a demo where focus follows the face of the bouncing creature, to simulate where the observer would most likely be looking. Maybe add a red dot, and tell people to follow it to simulate gaze tracking.

That's where I expected the article to go, and I was slightly disappointed when that avenue wasn't explored.

> How could this scheme improve if we had access to the internals of the 3d scene? For example, could we adjust our sampling pattern based on depth information?

If we had access to the internals, we could determine the visual salience of every object[1] and move the sampling pattern closer to the most salient one. Since that object is more likely to attract the viewer's attention, it would focus the rendering on those parts of the scene that the viewer actually cares about.

[1] http://doras.dcu.ie/16232/1/A_False_Colouring_Real_Time_Visu...

Related fun: MediaPipe has an iris tracking demo which can run in browser: https://viz.mediapipe.dev/demo/iris_tracking (top-right run button); blog[1].

Maybe "which 3/4 of the laptop screen needn't be rendered/updated fully"? Or "unclutter the desktop - only reveal the clock when it's looked at"? Or "they're looking at the corner - show the burger menu"? Though smoothed face tracking enables far higher precision pointing than ad hoc eye tracking. This[2] fast face tracking went by recently and looked interesting.

[1] https://ai.googleblog.com/2020/08/mediapipe-iris-real-time-i... [2] https://news.ycombinator.com/item?id=24332939

That's nice. Using eye tracking to optimize rendering workloads is low hanging fruit, I think. Hope to see it adopted by video games.

Eye tracking currently has way too much latency to be used effectively in video games.

Just make the fovea in your game larger to account for the lag; and fix temporal stability for things rendered outside of the fovea, and I think it should become unnoticeable
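
A rough sketch of what "make the fovea larger" costs. It pads the high-detail radius by how far the eye could travel during the tracking latency. The 2.5 degree base radius, 50 ms latency and 300 deg/s eye speed are illustrative assumptions, not measurements.

```python
def padded_fovea_radius_deg(base_radius_deg, latency_ms, eye_speed_deg_per_s):
    """Worst-case radius that still covers the gaze after the tracker's delay."""
    travel_deg = eye_speed_deg_per_s * latency_ms / 1000.0
    return base_radius_deg + travel_deg

base = 2.5                                     # assumed fovea radius in degrees
padded = padded_fovea_radius_deg(base, latency_ms=50, eye_speed_deg_per_s=300)
area_ratio = (padded / base) ** 2              # how much more area gets full detail
print(padded, area_ratio)
```

With these numbers the padded region is several times wider than the fovea, so its area grows by more than an order of magnitude. That is a pessimistic bound, since vision is partly suppressed during a saccade, but it shows why lower latency matters so much.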

You could also make the "eye lag" part of the game mechanics. A bit like how your eye can take a second to adjust to darkness. Eyes would become an explicit peripheral device. I also think that the latency problem will be short lived: it's currently slow probably because we haven't had much interest in tracking eyes quickly.

I wonder how it would look if the noise was randomized each frame and the points from the previous frame were considered in future frames to cheaply fill in missing data. I believe a similar technique is used to optimize real time ray tracing due to the low number of pixels that can be traced each frame.
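
Here is a minimal sketch of that idea for a static scene: pick a fresh random set of pixels each frame and keep the last known value everywhere else. The function names are hypothetical, and a real version would need reprojection for moving content.

```python
import random

def accumulate(image, frames, density, seed=0):
    """Sample a new random subset of pixels each frame and reuse older samples."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    history = [[None] * width for _ in range(height)]   # None = never sampled
    for _ in range(frames):
        for y in range(height):
            for x in range(width):
                if rng.random() < density:               # this pixel is shaded this frame
                    history[y][x] = image[y][x]
    return history

def coverage(history):
    """Fraction of pixels that have been sampled at least once."""
    filled = sum(value is not None for row in history for value in row)
    return filled / (len(history) * len(history[0]))

image = [[(x * 7 + y * 13) % 256 for x in range(32)] for y in range(32)]
one_frame = accumulate(image, frames=1, density=0.1)
twenty_frames = accumulate(image, frames=20, density=0.1)
print(coverage(one_frame), coverage(twenty_frames))
```

With a fixed mask the coverage would stay at about one pixel in ten forever. Re-rolling the mask lets the history fill in most of the image after a few dozen frames, at the cost of stale data wherever something moves.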

So basically doing video compression between the shader steps? Interesting.

That’s a very cool idea!

Given that your eye "wants" to follow the action, it would be cool if the foveal focus moved with the character, no? Then if you really didn't "notice" the de-rezed background, that would be the proof of the pudding without having eye-tracking. (At least for scenes with a single action-focus.)

Super cool. The problem, of course, is that this technique presumes that the eye remains fixed in the center of the screen. It doesn’t. But combine this with an eye tracking system that can adjust the screen space fovea center in real time, and you’re laughing.
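
As a sketch of what moving the fovea center would mean for the sampling mask: keep every pixel inside a radius around the gaze point and thin out samples with distance beyond it. The exponential falloff, the parameter names and the 5% floor are assumptions for illustration, not the scheme used in the article.

```python
import math

def sample_probability(px, py, gaze_x, gaze_y,
                       fovea_radius_px=50.0, falloff_px=100.0, floor=0.05):
    """Chance that a pixel is shaded, given the current gaze point on screen."""
    dist = math.hypot(px - gaze_x, py - gaze_y)
    if dist <= fovea_radius_px:
        return 1.0                                   # full detail inside the fovea
    falloff = math.exp(-(dist - fovea_radius_px) / falloff_px)
    return max(floor, falloff)                       # never drop below the floor

# The same pixel gets a different sampling rate as the gaze moves.
print(sample_probability(400, 300, gaze_x=400, gaze_y=300))
print(sample_probability(400, 300, gaze_x=1400, gaze_y=300))
```

An eye tracker would feed `gaze_x` and `gaze_y` every frame; with a fixed center this reduces to the fixed foveated setup discussed above.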


Seems like it would be very interesting to see how changing the sample positions per frame and some sort of technique ala temporal AA would do.

As long as you don't sample outside the fovea it should be ok.

TAA implies a velocity buffer, and that velocity buffer may cause you to sample outside the fovea (of a previous frame), for which you will not have enough samples to do a decent reconstruction of the image.

Quite interesting. Are results available somewhere? The GIF is too compressed to evaluate this kind of content.

I don't get the point of it. It could be my screen size (15" laptop) or resolution (4k), but the gifs looked terrible even when I stared exclusively at the center of the image.

Peter, please reduce the font size. It's absolutely ginormous. The page is completely unreadable without switching to the Reader mode.

1001px wide browser window: https://i.imgur.com/4zdpOo1.png

1000px wide browser window: https://i.imgur.com/HQHkRED.png

Yeah. The culprit is at the end of main.css:

    @media (max-width: 1000px) {
      body {
        font-size: 250%;
        line-height: 175%;
      }
    }

I think this is meant to make it more readable on small devices when in portrait mode, but really the text is too big even for that.

Using Firefox, and this isn't true for me at all. The font is at most one pt. bigger than necessary.

As someone using an old prescription that one pt makes this page fairly pleasant to read compared to most.

Do you have font sizes turned up in your OS? Looking at the site in Chrome on my OnePlus 6 (phone), it's maybe a tiny bit large meaning around 40 characters fit on a single line, but that is far from unreadable.

I've checked a few browsers on desktop and it seems to be no problem at all there, with the added benefit that most (or all?) modern browsers let you easily change the font size that is used with Ctrl +/- (or Cmd) on almost all sites. For instance, I have the fonts scaled to 125% here on HN since I find them a bit small normally. (You can do that on mobile as well, but I'm not sure if that works on a per-site basis like how it works on desktop.)

Not on the desktop, but I'm getting 10 lines of text per screen on an iPad.

Edit - https://i.imgur.com/8RpJbxs.png
