I came to say that this will probably make our fan constantly spin until I saw that this is already in the TODO list of the project.
Every client-side feature such as background blur or face detection will make my fans (or those of every other person I know) spin constantly, so I will never use that feature.
I don't know if there's a software way to solve this problem, or whether we have to wait until everyone's using an M2 or i8 CPU?
Image size is already downscaled to 450 × 380, and AFAIK most ML libraries downscale the input by default. In that context a higher camera resolution only really affects noise levels, due to the effective super-sampling, not the performance of the library (assuming downscaling is insignificant relative to the hand detection part).
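That super-sampling point can be sketched in a few lines: downscaling by averaging N × N pixel blocks cuts per-pixel noise roughly by a factor of N, so a higher-resolution camera mostly buys a cleaner model input, not a slower one. This is a toy illustration, not code from handtrack.js.

```javascript
// Toy sketch: box-filter downscale by an integer factor. Averaging each
// factor x factor block of pixels is the "effective super-sampling" that
// reduces noise; the model's input size (and cost) stays fixed regardless
// of camera resolution.
function boxDownscale(pixels, width, height, factor) {
  const outW = Math.floor(width / factor);
  const outH = Math.floor(height / factor);
  const out = new Float64Array(outW * outH);
  for (let y = 0; y < outH; y++) {
    for (let x = 0; x < outW; x++) {
      let sum = 0;
      for (let dy = 0; dy < factor; dy++) {
        for (let dx = 0; dx < factor; dx++) {
          sum += pixels[(y * factor + dy) * width + (x * factor + dx)];
        }
      }
      out[y * outW + x] = sum / (factor * factor);
    }
  }
  return { pixels: out, width: outW, height: outH };
}

// A 4x4 "frame" becomes a 2x2 one; each output pixel is its block's mean.
const small = boxDownscale(
  [0, 2, 4, 6, 0, 2, 4, 6, 8, 10, 12, 14, 8, 10, 12, 14], 4, 4, 2
);
console.log(small.width, small.height); // 2 2
console.log(Array.from(small.pixels));  // [1, 5, 9, 13]
```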
This was the Apple Silicon optimised Chrome. I tried Firefox on my desktop (9900k) and it had awful results, like 1 fps. I downloaded Chrome specifically for this, and haven't yet tried Safari. I'll do so now.
I'm not hip to the js world. Is there a common pattern for offloading the compute heavy parts of a js workload to a server somewhere? I'd totally just leave my pc running so it could do the heavy lifting for my mobile devices if that meant better battery life.
If there's going to be waste heat, I'd rather it go into the building that I pay to heat.
They don't need to be offloaded to a server, but just optimized: either running in web assembly or running on special machine learning cores (like on Apple devices).
This is running on the GPU. It's using a WebGL-oriented version of TensorFlow. The problem is most likely the sorry state of GPU drivers on macOS, causing your system config to be denylisted so that hardware-accelerated WebGL isn't enabled by default.
A cursory glance shows the output of this project is simply bounding boxes for multiple hands, while the handpose project you mentioned is tuned for a single hand but it provides the key points for fingers and joints. I sense that handpose is probably far more useful for anyone going beyond a prototype.
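To make the comparison concrete, here's roughly what the two output shapes look like. The field names are assumptions based on each project's docs (handtrack.js returns `bbox`/`score` per detection; handpose returns 21 landmarks), so treat the exact structure as illustrative.

```javascript
// Assumed handtrack.js-style output: one axis-aligned box per detected hand.
const boxPrediction = { bbox: [120, 80, 200, 160], score: 0.9 }; // [x, y, w, h]

// Assumed handpose-style output: 21 named keypoints for a single hand (abridged).
const posePrediction = {
  landmarks: [[130, 90, 0], [150, 110, 5] /* ...19 more [x, y, z] points */],
};

// Even with only a box, you can get a usable "cursor": track the box center.
function boxCenter([x, y, w, h]) {
  return { x: x + w / 2, y: y + h / 2 };
}

console.log(boxCenter(boxPrediction.bbox)); // { x: 220, y: 160 }
```

For anything gesture-like (pinch, point, finger counting) you need the landmarks; for "where is the hand on screen" the box center is enough and tends to be more stable.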
I'm not the author of either project, but my impression from some tests has been that hand pose is obviously richer information, but fickle. There's simply a limit to how well a camera operating in the visible light range can do (especially if the palm is not facing the camera). If you don't need to match the image to a full hand pose, then just tracking the hand position is going to be more reliable.
Author here, this is a known issue and being worked on for handtrack.js 2.0.
The face/head errors are likely an artifact of transfer learning (handtrack.js is fine-tuned from an object detection model trained on the COCO object detection dataset, which contains a person category).
The current approach being explored is to expand the training data to include faces/heads, so that the model better discriminates them from hands.
There are also size and speed optimizations being explored (smaller models such as efficientdet, quantization).
It also recognizes my ceiling as a hand when there's no hand in front of the camera. Not enough negative training data?
To satisfy the curiosity of VoidWhisperer: the score was 0.868. And it also sometimes loses track of a hand, in my case quite reliably when it's flat and seen from the side.
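One cheap way to paper over that "loses track" flicker is hysteresis on the confidence score: enter the "hand present" state only on a high score, but stay in it down to a lower one. This is a generic smoothing trick, not anything handtrack.js does; the thresholds here are made up.

```javascript
// Hysteresis on detection confidence: requires a strong detection to start
// tracking, but tolerates weaker ones before declaring the hand lost.
class HandPresence {
  constructor(enter = 0.8, exit = 0.5) {
    this.enter = enter; // score needed to start tracking
    this.exit = exit;   // score below which tracking is dropped
    this.present = false;
  }
  update(score) {
    this.present = this.present ? score >= this.exit : score >= this.enter;
    return this.present;
  }
}

const tracker = new HandPresence();
console.log(tracker.update(0.868)); // true  (above the enter threshold)
console.log(tracker.update(0.6));   // true  (dips, but above exit)
console.log(tracker.update(0.3));   // false (lost)
console.log(tracker.update(0.6));   // false (0.6 < enter, so still lost)
```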
WebXR is a JS API that, among other things, allows you to track hands using the device's capabilities. Handtrack.js is a good alternative for browsers that are not compatible with WebXR.
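The fallback logic described above can be sketched like this. The browser calls are stubbed into comments (they only exist in a WebXR-capable browser); the assumption here is that hand tracking is requested as the `'hand-tracking'` optional session feature, and only the decision function itself is runnable anywhere.

```javascript
// Prefer the platform's native hand tracking (WebXR) when available;
// otherwise fall back to the camera-based handtrack.js path.
function chooseHandTracker({ hasWebXR, hasHandTrackingFeature }) {
  if (hasWebXR && hasHandTrackingFeature) return "webxr";
  return "handtrack.js";
}

// In a browser, the flags would come from something roughly like:
//
//   const hasWebXR = "xr" in navigator &&
//     await navigator.xr.isSessionSupported("immersive-vr");
//   const session = await navigator.xr.requestSession("immersive-vr", {
//     optionalFeatures: ["hand-tracking"],
//   });
//   // session.inputSources[i].hand is then defined on supporting devices.

console.log(chooseHandTracker({ hasWebXR: true, hasHandTrackingFeature: true }));
console.log(chooseHandTracker({ hasWebXR: false, hasHandTrackingFeature: false }));
```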
I used handtrack.js a few months ago to write an in-browser theremin. It's not perfect, but it allowed me to write the whole thing in about three hours and only kind of blew up my 2011 MBP.
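A theremin like that can boil down to one mapping: hand x position (say, the detection box center) drives pitch, with an exponential curve so equal hand movements feel like equal musical intervals. This is a guess at how such a project might work, not the commenter's actual code; the ranges are arbitrary.

```javascript
// Map a horizontal hand position to a frequency, sweeping exponentially
// from minHz at the left edge to maxHz at the right edge of the video.
function xToFrequency(x, videoWidth, minHz = 110, maxHz = 880) {
  const t = Math.min(Math.max(x / videoWidth, 0), 1); // normalize to [0, 1]
  return minHz * Math.pow(maxHz / minHz, t);          // exponential sweep
}

// In the browser this would drive a Web Audio oscillator, roughly:
//   const ctx = new AudioContext();
//   const osc = ctx.createOscillator();
//   osc.connect(ctx.destination);
//   osc.start();
//   osc.frequency.value = xToFrequency(handCenterX, video.width);

console.log(xToFrequency(0, 640));   // 110 (left edge, low A)
console.log(xToFrequency(640, 640)); // 880 (right edge, three octaves up)
```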
Hi, handtrack.js author here,
I am exploring some size and speed optimizations which might help with this (smaller models such as efficientdet, quantization). Will share once something is ready.
How is something like this addressed in ML? In traditional software development there are tools like automated tests & static analysis to prevent your customers running into these types of embarrassing issues.