Webgazer.js webcam eye tracking on the browser (brown.edu)
138 points by lazyjeff on July 16, 2020 | 54 comments

Disclaimer: I am the primary engineer of a commercial eye tracking system.

Tools like these pop up every now and then and are nice for rough estimation of gaze. Fixation tracking is a large, complicated problem that usually requires some sort of calibration to get more precise results. By sidestepping the calibration problem, much higher subject compliance can be achieved, since you don't have a grad student barking confusing orders at you. The downside is the noise you see in the tracking results. For those interested, a product that produces similar results is Pupil Labs' ambient gaze tracking "Core" research headset [0].

[0] https://pupil-labs.com/products/core/

Author here (I didn't use Show HN because it's a larger collaborative project). Yes, this is basically a tradeoff between accuracy and flexibility + accessibility.

For usability testing, physical eye trackers have to be used one person at a time (no simultaneous use), use an experimenter's time to schedule and administer, can only be used for a short period, and only work with local participants. But yes they will probably always have better accuracy, and detect saccades/fixations better which are also great for psychology studies.

The other thing is that if you want to make a consumer application (like a browser game or an accessibility mode), it's more practical to have people just consent to having their webcam turned on than to go out and buy an eye tracker.

In your estimation, how far away are we from eye-tracking software being able to detect the start of a microsaccade, estimate where the gaze is moving toward, and draw new contents on the screen there before the gaze even reaches that point? I would think that by "hacking" into human brain "vulnerabilities" like saccadic masking and chronostasis, such software could potentially yield seriously trippy and mind-altering results!

There are two parts to implementing such a system, and they're both interesting! The first is detecting a microsaccade, which is already available in research systems I have personally worked on. You can basically crank up the camera frame rate until you're around the microsaccade range and do some clustering analysis on the positional data to decouple movement of the hardware from movement of the face. The second part is having a commercial-grade display system that is fast and reliable enough to present stimulus on. Displays are very fickle in practice, and getting your hands on one that can run with adequate color, contrast, and brightness at speed is currently very difficult.

One angle under research currently is the perception of stimulus during saccades and microsaccades. There's quite a bit of time and effort in the industry going into neurological assessments through saccades, and the tools coming out of this are really starting to come down in price. This opens the door for a bunch of lower-priced research options, such as the parent article, to enable a much more rapid pace of understanding.
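For anyone curious, the textbook version of the detection half can be sketched as a velocity-threshold (I-VT style) pass over gaze samples. This is illustrative JavaScript, not the clustering approach described in research systems, and real microsaccade detection would need far higher frame rates and angular resolution than a webcam provides:

```javascript
// Hedged sketch of velocity-threshold saccade onset detection (I-VT style).
// Samples are {x, y, t}: gaze position in degrees of visual angle, t in ms.
// A saccade onset is flagged at the first sample whose angular velocity
// exceeds the threshold; subsequent above-threshold samples belong to the
// same saccade.
function detectSaccadeOnsets(samples, thresholdDegPerSec = 30) {
  const onsets = [];
  let inSaccade = false;
  for (let i = 1; i < samples.length; i++) {
    const dt = (samples[i].t - samples[i - 1].t) / 1000; // seconds
    const dx = samples[i].x - samples[i - 1].x;
    const dy = samples[i].y - samples[i - 1].y;
    const velocity = Math.hypot(dx, dy) / dt;            // deg/s
    if (velocity > thresholdDegPerSec && !inSaccade) {
      onsets.push(i);       // first sample above threshold
      inSaccade = true;
    } else if (velocity <= thresholdDegPerSec) {
      inSaccade = false;    // back to fixation
    }
  }
  return onsets;
}
```

Microsaccades are only a few arcminutes to a fraction of a degree, so in practice the threshold and the noise floor of the tracker both have to be far tighter than this sketch suggests.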

That's fascinating, but I'm wondering what is the practical value of detecting microsaccades? Aren't they just involuntary twitching?

For the duration of the microsaccade, you're blind. So if something changes onscreen it's much harder to see.

IIRC people have used small orientation changes during microsaccades for redirected walking in VR. You feel like you're walking straight but you're actually curving back on yourself.

Edit: I think that was just detecting full-on saccades but a microsaccade version would be smoother and harder to detect.

Source: https://blog.siggraph.org/2018/05/challenge-accepted-infinit...

They're looking increasingly interesting as a biomarker for MS, among other things [0]. There's a lot we still don't know about our visual systems, and the advent of lower-cost hardware is allowing a huge number of interesting questions to be hypothesized and tested. It's a great time to be in the science behind all of this, like the authors at Brown on this paper!

[0] https://iovs.arvojournals.org/article.aspx?articleid=2670267

I believe this might be available in less than 5 years, potentially less than 2, provided both http://www.adhawkmicrosystems.com/ and https://www.kura.tech/ ship and partner with each other. If either one of those companies dies or their technology doesn't work out like I hope, then it might be a while longer.

"How far away are we" is a strange question to ask about something we already do. You can read about this technique being used to investigate how we read here: https://docs.microsoft.com/en-us/typography/develop/word-rec...

Note the citation to a study that used the technique in 1975.

It depends on your application. If you're trying to replace your mouse with gaze, then yes you need careful calibration. I have worked on some purpose-built gaze interactive experiences and coarse accuracy without calibration can be enough to do something useful.

What are your thoughts on newer ML-based approaches like MIT's "Eye Tracking for Everyone"?


Had a blast at LauzHack '16 using this library to make a Chrome extension enabling "no-hands scrolling"[1]. Sadly, reading web articles while eating with both hands is still a dream. We managed to get okay-ish results on a MacBook Pro in perfect lighting from time to time, but nothing consistent. In variable lighting and on lower-end laptops we found it impossible.

I wonder if it would be possible with a better webcam and good, consistent lighting.

[1] https://github.com/jarlg/lookmanohands
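For what it's worth, the core gaze-to-scroll mapping in an extension like that can be quite small. This is a hypothetical sketch, not code from the linked project: a dead zone in the middle of the viewport for reading, with scroll speed ramping up toward the top and bottom edges:

```javascript
// Hedged sketch of a "no-hands scrolling" mapping. Returns pixels to
// scroll per tick given a gaze y-coordinate and the viewport height.
// Function name and parameters are illustrative, not from lookmanohands.
function scrollDelta(gazeY, viewportHeight, deadZone = 0.4, maxStep = 40) {
  const center = viewportHeight / 2;
  const offset = (gazeY - center) / center;   // -1 (top) .. 1 (bottom)
  if (Math.abs(offset) < deadZone) return 0;  // central dead zone: read here
  // Scale the remaining range into a scroll step; sign gives direction.
  const excess = (Math.abs(offset) - deadZone) / (1 - deadZone);
  return Math.sign(offset) * Math.round(excess * maxStep);
}
```

In the extension this would run inside a gaze callback (e.g. webgazer.setGazeListener), calling window.scrollBy(0, scrollDelta(data.y, window.innerHeight)); with noisy webcam gaze, a generous dead zone is what keeps the page from jittering while you read.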

In the Brown undergraduate computer vision course, one of the homework assignments (or the final project; my memory is a little hazy) was to improve WebGazer's tracking. People were pretty successful from what I remember. As an undergrad I didn't think much of it, but as a grad student I would be slightly mortified if undergraduates were improving on my current research for homework.

That's not too surprising, tbh. Building the baseline solution or platform from the ground up is often the most difficult part, and by the end of such a project you end up with a hundred ideas about how to make it better. But whether due to time constraints, a "perfect is the enemy of done" philosophy, or simply reaching a natural crossroads with multiple valid ways forward, you have to draw a line in the sand and release at some point. It's natural that there may be low-hanging fruit for others to pick up and improve on, particularly if they have different end goals than the initial project had.

Why? Someone smarter will always follow.

Not even just smarter; we all stand on the shoulders of giants, and every year the barrier to entry falls that much more.

Can't wait until this makes its way to adtech. You could have sites that hold their "premium" content hostage until you allow camera access and prove (via eye tracking) that you looked at their ad.

"Wanna play a game? Just move the ball to the hole with your eyes. Oh and we need access to your camera."

“Please drink verification can”

Boy, I love to see an r/lol reference on HN. Gotta be a lot of crossover, right?

There is a short bit like this in the Black Mirror episode "Fifteen Million Merits."

To earn credits you must watch an ad video, but when you look away the video stops and makes a crazy sound so that you'll continue to watch the ad...

I wonder. Obviously it's technologically possible, but would the average user accept it?

It is both possible and being pursued actively, and I believe the average user will accept it. The average user has an uncovered camera on their computer, and almost everyone has an uncovered camera on their phone. Requiring cookie banner notifications did nothing but make websites more annoying to use, with less accountability for site administrators for all the questionable actions they can freely undertake, now with your consent! But it goes far beyond that: there will be cameras embedded in digital billboards/posters you walk past that track your gaze and provide targeted ads based on either facial ID or device ID, the ad effectively learning who you are through your leaky apps or always-on Bluetooth.

I had an idea for an evil-eye game, in the style of Happy Tree Friends or so. That is: a lot of cute creatures around, and nothing happens (they may giggle, jump or blink occasionally) until you look at them. If you do...

That's a really cool use case for the library! I think the calibration stage could be made easier with some "click the randomly appearing circle on screen as fast as you can" gamification, assuming by default that people look at what they click on. It could be a little character that changes color at random, and you need to click it at that point, so the user keeps their eyes on it for a few seconds before clicking.
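A sketch of how such a game might score itself, purely illustrative (this function and its parameters are my own assumptions, not part of WebGazer): treat accuracy as the fraction of gaze predictions that land within some tolerance of the target the user was asked to click:

```javascript
// Hedged sketch: score one round of a click-calibration game.
// gazeSamples are {x, y} gaze predictions collected while the user
// was (presumably) looking at the target they clicked; accuracy is
// the fraction landing within tolerancePx of the target center.
function calibrationAccuracy(gazeSamples, target, tolerancePx = 100) {
  if (gazeSamples.length === 0) return 0;
  const hits = gazeSamples.filter(
    p => Math.hypot(p.x - target.x, p.y - target.y) <= tolerancePx
  ).length;
  return hits / gazeSamples.length;
}
```

A low score on a round could trigger an extra round at that screen position, which is roughly how gamified calibration concentrates training data where the model is weakest.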

If you want to see how well webcam eye tracking works, check this out: https://www.realeye.io/test/f80dd676-3b55-4d2c-931c-d925b068... It uses WebGazer as its core, but with many improvements done by us.

BTW I'm RealEye co-founder so AMA :)

I just don't see the point of this. I don't think I've ever lost track of my eyes.

Sadly my eyes are too small to actually track the pupils. I remember when I got my ID, the lady taking the picture said "Sir, please open your eyes". I looked at her and asked "They are open... you don't see that?" She laughed and asked if I had been smoking. :/

Whoa. I have a strange mix of reactions to this. It's impressive, and interesting, and worthwhile, slightly creepy, and potentially hugely useful for a11y, democratizing cutting-edge HCI, AR, gaming, etc etc. Surprised it's not getting more attention!

One good thing about webcam access on the web is that browsers require explicit permission from the user, and browser tabs clearly indicate when a tab is recording with either the mic or the camera.

Yes! Hence the "slightly" modifier. :)

Maybe because it's been around for a couple of years?

This seems really cool, but it doesn't seem to let me choose which webcam to use. It defaults to a virtual device I use for streaming that isn't set up right now, so I can't really try it out.

I've had this issue with Chrome on OSX where it won't allow camera selection. Firefox (and I think Safari) both do.

It's unclear why Chrome does this: it has a preference for which camera, but that seems to be ignored and it takes the most recently installed one.

The "move the ball around with your eyes" demo doesn't work at all for me in Safari on a MacBook. I can only move the ball with the mouse. The other demo works okay, but very low accuracy (like 50%). Firefox on the same computer is unusably slow, generates a new "data point" (the visible dot) only every few seconds. Accuracy was about 10%.

The listed use cases here struck me as odd. I can't imagine why I, or anyone wanting to read a news article, would allow the site access to my webcam. Nor do I think most users would turn on their cameras to help with any given site's analytics.

So aside from games, what is the actual use case here?

Nielsen in years past would ask volunteers to attach special equipment to their televisions to report back what they were watching, and for how long. Even nowadays, I have a relative who carries an always-listening device to track ambient music being played/heard.

A technology like this would make it easier to run a program with a subset of volunteers online, w/o them needing any special equipment.

Of course, given that slightly more people (than tech communities) are aware of the dangers of an uncovered camera, the possibility of using this broadly w/o consent is nearly nil.

Well, if I were a company designing an information service, perhaps monetized by ads, then it could make sense to collect eyetracking data to see what people tend to look at, perhaps as part of an AB testing regime.

Then the choice comes between: do I want to bring people in to my expensive lab where I can get high-fidelity results, or do I want to potentially get a much larger sample for cheaper at the cost of lower fidelity.
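To make the low-fidelity option concrete: per-pixel gaze accuracy is out of reach for webcam tracking, but binning many noisy samples into a coarse grid still shows where attention concentrates, which is often enough for an A/B comparison. A minimal illustrative sketch (names and cell size are my own, not from WebGazer):

```javascript
// Hedged sketch: aggregate noisy gaze points into a coarse heatmap grid.
// points are {x, y} in page pixels; cell is the bin size in pixels.
// Returns a rows x cols array of sample counts.
function gazeHeatmap(points, width, height, cell = 100) {
  const cols = Math.ceil(width / cell);
  const rows = Math.ceil(height / cell);
  const grid = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (const { x, y } of points) {
    if (x < 0 || y < 0 || x >= width || y >= height) continue; // off-screen
    grid[Math.floor(y / cell)][Math.floor(x / cell)]++;
  }
  return grid;
}
```

Comparing two such grids (variant A vs. variant B of a page) sidesteps the accuracy problem: systematic differences in where the mass lands survive the per-sample noise once the sample count is large.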

I'm starting to develop RSI in my wrists and hands, and have been looking at eye tracking systems as a potential way to help reduce the amount of clicking that I need to do.

I was thinking maybe I could put webgazer.js into a chrome extension and use it to navigate through pages for me, but I don't think it has a click mechanism.

fwiw - I'm a huge vimium extension user as keyboarding is less of a strain than mouse movement.
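Since WebGazer doesn't provide clicking, a dwell-click layer is one common way to add it: fire a synthetic click when the gaze stays inside a small radius for long enough. This is a hypothetical sketch, not part of WebGazer's API:

```javascript
// Hedged sketch of dwell-clicking on top of a gaze stream.
// The class and its parameters are illustrative assumptions.
class DwellClicker {
  constructor(radiusPx = 60, dwellMs = 800) {
    this.radiusPx = radiusPx;
    this.dwellMs = dwellMs;
    this.anchor = null;       // {x, y, t} where the current dwell started
  }
  // Feed gaze samples (x, y in pixels, t in ms).
  // Returns {x, y} when a dwell completes, otherwise null.
  update(x, y, t) {
    if (!this.anchor ||
        Math.hypot(x - this.anchor.x, y - this.anchor.y) > this.radiusPx) {
      this.anchor = { x, y, t };  // gaze moved: restart the dwell timer
      return null;
    }
    if (t - this.anchor.t >= this.dwellMs) {
      const target = { x: this.anchor.x, y: this.anchor.y };
      this.anchor = null;         // reset so the dwell doesn't re-fire every frame
      return target;
    }
    return null;
  }
}
```

In a content script, a completed dwell could then drive document.elementFromPoint(x, y).click(), fed from webgazer.setGazeListener; in practice you'd also want visual feedback (a shrinking ring) so accidental fixations don't click things.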

Quite a few folks in the voice programming community supplement voice commands with eye tracking using dedicated hardware. Talon [1] is considered an excellent resource in that field, and there are also people in the Caster/Dragonfly [2] ecosystem using EyeXMouse [3]; wolfmanstout [4] also has some work worth checking out.

[1] https://talonvoice.com/

[2] https://github.com/dictation-toolbox/Caster

[3] https://github.com/Versatilus/EyeXMouse

[4] https://handsfreecoding.org/2014/11/16/getting-started-with-...

Have you looked at https://eviacam.crea-si.com/ ? Supports Windows, GNU/Linux, and Android, works globally, and runs separately from the browser. (Technically not eye tracking, but close enough)

Wow! I had not, but that looks like it might do the trick. Thanks!

Check out Talon Voice. It's way better than these solutions.

Great question. There's a use for usability testing, where paid participants currently sit at an application in a lab to do eye tracking. So there could be a similar version online if, let's say, Bloomberg is testing a new version of their online terminal. Multiple large companies exist that solely develop eye trackers, so there's quite a market for this.

Plus, it does not have to be monetary compensation. I could imagine eye tracking as an option to get around current newspaper paywalls: if you want to read an article, you can choose to pay (subscribe) as usual, or read it for free if you let the New York Times see what articles you read, for how long, in which order, or what ads you looked at.

I suppose that might be an option, but in my experience people are highly unlikely to grant a site permission to their camera if they aren't using it as part of a conference call.

But I do agree if you have some A/B testing, that the users are doing it as a part of a study, it might be usable.

I would absolutely love to see something like this combined with a live video filter to make actual eye contact during video chat possible.

FaceTime calls (probably only from an iPhone X or later) do this now: https://9to5mac.com/2019/07/03/facetime-eye-contact-correcti...

Can you recommend any literature to get an introduction into eye-tracking at all?

Here's an old-school, high-level overview of pupil tracking methods [0]. The focus recently has been on moving these older computer vision methodologies to ML without introducing too many errors.

[0] http://citeseerx.ist.psu.edu/viewdoc/download?doi=

I tried my best on the calibration demo page in Safari and consistently get a 0% accuracy result. It calibrates sort of ok in Firefox. I guess this is heavily browser dependent? But why?

Hmm, not sure -- do you see the gaze dot when you're looking at the center circle?

I see a gaze dot. It just seems kinda like Safari is accumulating some tracking error that Firefox doesn't. Both browsers find my face accurately.

(side note: when the gaze dot overlaps the calibration circles, the circles become unclickable without some very careful fiddling)

I tried using this one time but unfortunately found it didn't work when I wore my reading glasses. The screen reflection off my glasses just whites out both eyes.

Now add this to video conferencing systems: figure out which speaker is who, and add a button to talk only to them.
