
> But it does put an enormous amount of pressure on the eye tracking. As far as I can tell so far, the role of precise 2D control has been shifted to the eyes.

I've been researching eye tracking for my own project for the past year. I have a Tobii eye tracker which is probably the best eye tracking device for consumers currently (or the only one really). It's much more accurate than trying to repurpose a webcam.

The problem with eye tracking in general is what's called the "Midas touch" problem: everything you look at is potentially a target. If you were to simply connect your mouse pointer to your gaze, for example, any sort of hover effect on a web page would be activated simply by glancing at it. [1]
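A common workaround is dwell-time activation: nothing fires unless your gaze rests on a target for some hundreds of milliseconds. A minimal sketch of the idea (hypothetical names, not any real tracker's API):

    import time

    class DwellSelector:
        def __init__(self, dwell_seconds=0.4):
            self.dwell = dwell_seconds      # gaze must rest this long to "click"
            self.current_target = None
            self.entered_at = None

        def update(self, target_under_gaze):
            # Call once per frame with whatever the gaze ray hits (or None).
            # Returns the target to activate, or None.
            now = time.monotonic()
            if target_under_gaze is not self.current_target:
                self.current_target = target_under_gaze   # new target: restart timer
                self.entered_at = now
                return None
            if self.current_target is not None and now - self.entered_at >= self.dwell:
                fired = self.current_target
                self.current_target = None                # require re-entry to fire again
                return fired
            return None

Dwell is itself a compromise, though: too short and you are back to the Midas touch, too long and every activation feels sluggish.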

Additionally, our eyes are constantly making small movements called saccades [2]. If you track eye movement perfectly, the target will wobble all over the screen like mad. The ways to alleviate this are to expand the target visually so that the small movements are contained within a "bubble", or to delay the targeting slightly so the movements can be smoothed out. But this naturally introduces inaccuracy and latency. [3] Even then, you can easily get a headache from the effort of trying to fixate your eyes on a small target (trust me). Apple is making an effort to predict eye movements to give the user the impression of lower latency and better accuracy, but it's an imperfect solution. Simply put, gaze as an interface will always suffer from latency and unnatural physical effort. Until computers can read our minds, that isn't going to change.
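To make those two mitigations concrete, here is a toy sketch of a smoothing filter (which buys stability at the cost of latency) and a "bubble" snap to the nearest target (which buys stability at the cost of fine positioning). This is illustrative only, not how Tobii or Apple actually filter gaze data:

    class GazeSmoother:
        # Exponential moving average over raw gaze samples.
        # alpha near 1.0 -> responsive but jittery; near 0.0 -> steady but laggy.
        def __init__(self, alpha=0.2):
            self.alpha = alpha
            self.x = None
            self.y = None

        def update(self, raw_x, raw_y):
            if self.x is None:
                self.x, self.y = raw_x, raw_y
            else:
                self.x += self.alpha * (raw_x - self.x)
                self.y += self.alpha * (raw_y - self.y)
            return self.x, self.y

    def snap_to_target(gaze_x, gaze_y, targets, radius):
        # "Bubble" targeting: latch onto the nearest target center within
        # radius, so saccade jitter inside the bubble doesn't move the selection.
        best, best_d2 = None, radius * radius
        for tx, ty in targets:
            d2 = (tx - gaze_x) ** 2 + (ty - gaze_y) ** 2
            if d2 <= best_d2:
                best, best_d2 = (tx, ty), d2
        return best  # None if nothing is close enough

Either way you are trading responsiveness or precision for stability, which is exactly the latency and inaccuracy described above.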

Apple decided to incorporate desktop and mobile apps into the device, so it seems this was really their only choice, as they need the equivalent of a pointer or finger to activate on-screen elements. They could do this with hand tracking, but then there's the issue of accuracy as well as clicking, tapping, dragging or swiping - plus the effort of holding your arms up for extended periods.

I think it's odd that they decided voice should not be part of the UI. My preference would be hand tracking on a virtual mouse/trackpad (smaller, more familiar movements) plus a simple "tap" or "swipe" spoken aloud, with the current system reserved for "quiet" operation. But Apple is Apple, and they insist on one way to do things. They have a video of using Safari, and it doesn't look particularly efficient or practical to me [4].

But who knows - I haven't tried it yet, maybe Apple's engineers nailed it. I have my doubts.

1. https://uxdesign.cc/the-midas-touch-effect-the-most-unknown-...

2. https://en.m.wikipedia.org/wiki/Saccade

3. https://help.tobii.com/hc/en-us/articles/210245345-How-to-se...

4. https://developer.apple.com/videos/play/wwdc2023/10279/




From reading/listening to reports of people who were able to demo the device, I think Apple may have nailed it, or come close. Everyone I've seen has absolutely raved about how accurate and intuitive the eye tracking + hand gesture feels.


I just want to echo this. Most of the reviews I’ve watched or read have raved about the eye tracking, regardless of whether the reviewer was a fan of VR in general. It feels really cool that we might finally have a new computing paradigm. There is so much potential here and I can’t wait to see where things go.


No one has used it for an extended period of time. You don't get a headache right away, but after a while, you feel it.


The people who designed it, built it, and tested it have used it for an extended period of time. It would be un-Apple-like for them to ship a product with such a well-known potential defect in the user experience.


It can't be as bad as the '90s Sega VR, where executives walked out after testing it and then buried the entire project.


Moving your eyes doesn’t cause headaches. Eye strain certainly can. If the resolution is high enough and the lag issues are taken care of, there’s a good chance it can be used for longer periods of time.


Entirely serious question: is this something that can be trained? That is, after a long enough time of use and strengthening of the appropriate muscles and neural pathways, the headaches and such go away?


I think for some people, it won't be a problem. Much the same way some people have no issues spending hours using VR goggles without nausea. There are many people with accessibility issues who manage it, for example, and geeks who just like the idea of living in the future. In my experience, your eyes just get tired, even after you get used to it.

I think unless Apple has really added some sort of magic, it will be an issue for most people to use Vision Pro for anything but passive activities, or with a physical trackpad.


Surely you are already moving your eyes to look at click/tap targets anyway. Why would there be any additional muscle strain or neural pathways needed? Clearly moving our eyes doesn’t cause problems because we move them all the time as it is.


Maybe that's why some people prefer keyboard shortcuts. I don't move my eyes around the keyboard when typing, or when pressing Ctrl+S or Ctrl+C/Ctrl+V, and in Windows I can Alt+Tab repeatedly while the highlight moves, glancing around at the available windows and watching the selected one change in my peripheral vision. I can keep typing in a textbox for a bit while looking away, but with a virtual keyboard, or speech-to-text that might be getting it wrong, that's less of an option.

Compare that with a smartphone: I can move to tap the 'back' button in Safari without moving my eyes to it. Looking "at" the phone generally is enough to see where my finger is, and when clicking on links I can look at the link, start moving my hand toward it, and move my eyes away while my hand is still moving.

Having to coordinate looking and clicking - while I haven't tried it - I can imagine feels more load-bearing for the eyes: more effortful, more constraining, more annoying.


The eye tracking as the interface is what stood out to me as the most radical choice in the device.

I'm doubtful we really know how good it is yet. Without a few hours or days of hard use, it's just not going to be obvious to what degree they've overcome all these long-standing obstacles via "magic".

It's hard to believe voice is anywhere near ready for this application; I still find it highly frustrating after decades of attempts.


You have those problems mostly because your Tobii is an external device sitting much farther away - it can't match the accuracy of eye tracking done by a camera just a few centimeters from your eyes.

With Apple's solution (or any VR headset with eye tracking):

1. You have a very accurate 6-DoF head pose (gyroscope, compass, accelerometer, plus lidar for translation) at 100+ fps minimum, computed very cheaply and with little latency.

2. Cameras point at each eye from a distance of a few centimeters, so the eye region of interest fills the full image resolution. Even VGA resolution (or less) is probably enough for very accurate tracking in this setup.

3. Because the headset covers your face, you don't have any problems with varying lighting conditions or cluttered backgrounds; what's more, they use IR floodlights to light the pupils consistently.

4. Possibly they even use a TrueDepth-style IR dot pattern to get a 3D depth map of your pupils, similar to how the TrueDepth sensor works, with the difference that here they can place the cameras farther apart and get a more accurate disparity map.

5. Who knows, maybe they even use lidar instead of TrueDepth for even lower latency.

6. Because the sensors are so close, they can get a very accurate reconstruction of your eyes and the distance between them - most people's eyes and faces aren't completely symmetric.

A Tobii, by contrast, is a sensor sitting roughly 0.5 m from your face. It has to track your head, estimate its orientation and distance, and the detected eye ROI will be a tiny crop of the image sensor even if the camera had 4K resolution (which I doubt they use). At that distance, if you want 1 cm accuracy in where you are looking on the screen, you need to detect pupil movement of roughly 1 degree. If you want even better, jitter-free accuracy, like 1 mm, you need to detect pupil movement of about 0.1 degree.
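The arithmetic behind that (assuming roughly 0.6 m between eyes and screen, which is my own illustrative number):

    import math

    def visual_angle_deg(target_size_m, distance_m=0.6):
        # Angle subtended at the eye by a target of the given size.
        return math.degrees(math.atan(target_size_m / distance_m))

    print(visual_angle_deg(0.01))   # ~0.95 degrees for 1 cm on screen
    print(visual_angle_deg(0.001))  # ~0.10 degrees for 1 mm

So a remote tracker really does have to resolve pupil rotations on the order of a tenth of a degree to reach millimeter-level accuracy on screen.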


Interesting. You might be right. I'll have to try it and see.


> If you were to simply connect your mouse pointer to your gaze, for example, any sort of hover effect on a web page would be activated simply by glancing at it.

I’m curious how they deal with apparently not doing this, in terms of how you know which button is going to be activated when you click. The product page, under privacy, says that websites and apps don’t get eye tracking data, only a notification when you click, so I guess that means a website won’t display its active/hover effects on a button you’re looking at. Because if it could, it could just link that to analytics in JavaScript and defeat that privacy protection. So if you don’t have that, how do I actually know what’s going to happen when I click?


I’m sure the info is in one of the WWDC videos, but I imagine the hover effect could be activated by the OS without notifying the webpage.


Sure, for a well-built, standards-compliant website making appropriate use of HTML elements. But there are a lot of websites that do their own thing, use lots of custom JavaScript, where standard browser features like tabbing through elements don't work, and which don't work well with existing accessibility tools. I'm skeptical that an OS- or browser-level feature is going to be able to make sense of them. That's their fault, but it's the reality of a huge part of the web.


Heads up: PSVR2 has eye tracking; I know it’s used in some games. Might be something to look into.


PSVR2 uses it for foveated rendering, where the eye tracking doesn’t really have to be that precise. It’s done for performance optimization, not input control.
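Roughly speaking, foveated rendering only needs the gaze estimate to pick which region of the frame gets full resolution, so a degree or two of error barely matters; pointing at a UI element needs the gaze ray to land on a target that may itself be only a degree or two wide. A toy illustration with made-up breakpoints (not Sony's actual pipeline):

    def shading_scale(eccentricity_deg):
        # Fraction of full render resolution as a function of angular
        # distance from the estimated gaze point.
        if eccentricity_deg < 10.0:
            return 1.0    # foveal region: full resolution
        if eccentricity_deg < 25.0:
            return 0.5    # near periphery: half resolution
        return 0.25       # far periphery: quarter resolution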



