I came to say that this will probably make our fan constantly spin until I saw that this is already in the TODO list of the project.
Every client-side feature such as background blur or face detection will make my fans (or those of every other person I know) spin constantly, so I will never use that feature.
I don't know if there's a software way to solve this problem, or whether we have to wait until everyone's using an M2 or i8 CPU?
Image size is already downscaled to 450 × 380, and AFAIK most ML libraries downscale the input by default. In that context a higher camera resolution only really affects noise levels, due to the effective super-sampling, not the performance of the library (assuming downscaling is insignificant relative to the hand detection part).
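That super-sampling point can be sketched in a few lines: downscaling by averaging N × N pixel blocks cuts per-pixel noise roughly by a factor of N, so a higher-resolution camera mostly buys a cleaner model input, not a slower one. This is a toy illustration, not code from handtrack.js.

```javascript
// Toy sketch: box-filter downscale by an integer factor. Averaging each
// factor x factor block of pixels is the "effective super-sampling" that
// reduces noise; the model's input size (and cost) stays fixed regardless
// of camera resolution.
function boxDownscale(pixels, width, height, factor) {
  const outW = Math.floor(width / factor);
  const outH = Math.floor(height / factor);
  const out = new Float64Array(outW * outH);
  for (let y = 0; y < outH; y++) {
    for (let x = 0; x < outW; x++) {
      let sum = 0;
      for (let dy = 0; dy < factor; dy++) {
        for (let dx = 0; dx < factor; dx++) {
          sum += pixels[(y * factor + dy) * width + (x * factor + dx)];
        }
      }
      out[y * outW + x] = sum / (factor * factor);
    }
  }
  return { pixels: out, width: outW, height: outH };
}

// A 4x4 "frame" becomes a 2x2 one; each output pixel is its block's mean.
const small = boxDownscale(
  [0, 2, 4, 6, 0, 2, 4, 6, 8, 10, 12, 14, 8, 10, 12, 14], 4, 4, 2
);
console.log(small.width, small.height); // 2 2
console.log(Array.from(small.pixels));  // [1, 5, 9, 13]
```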
This was the Apple Silicon optimised Chrome. I tried Firefox on my desktop (9900k) and it had awful results, like 1 fps. I downloaded Chrome specifically for this, and haven't yet tried Safari. I'll do so now.
I'm not hip to the js world. Is there a common pattern for offloading the compute heavy parts of a js workload to a server somewhere? I'd totally just leave my pc running so it could do the heavy lifting for my mobile devices if that meant better battery life.
If there's going to be waste heat, I'd rather it go into the building that I pay to heat.
They don't need to be offloaded to a server, but just optimized: either running in web assembly or running on special machine learning cores (like on Apple devices).
This is running on the GPU. It's using a WebGL-oriented version of TensorFlow. The problem is most likely the sorry state of GPU drivers on macOS, causing your system config to be denylisted so that hardware-accelerated WebGL isn't enabled by default.
A cursory glance shows the output of this project is simply bounding boxes for multiple hands, while the handpose project you mentioned is tuned for a single hand but it provides the key points for fingers and joints. I sense that handpose is probably far more useful for anyone going beyond a prototype.
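To make the comparison concrete, here's roughly what the two output shapes look like. The field names are assumptions based on each project's docs (handtrack.js returns `bbox`/`score` per detection; handpose returns 21 landmarks), so treat the exact structure as illustrative.

```javascript
// Assumed handtrack.js-style output: one axis-aligned box per detected hand.
const boxPrediction = { bbox: [120, 80, 200, 160], score: 0.9 }; // [x, y, w, h]

// Assumed handpose-style output: 21 named keypoints for a single hand (abridged).
const posePrediction = {
  landmarks: [[130, 90, 0], [150, 110, 5] /* ...19 more [x, y, z] points */],
};

// Even with only a box, you can get a usable "cursor": track the box center.
function boxCenter([x, y, w, h]) {
  return { x: x + w / 2, y: y + h / 2 };
}

console.log(boxCenter(boxPrediction.bbox)); // { x: 220, y: 160 }
```

For anything gesture-like (pinch, point, finger counting) you need the landmarks; for "where is the hand on screen" the box center is enough and tends to be more stable.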
I'm not the author of either project, but my impression from some tests has been that hand pose is obviously richer information, but fickle. There's simply a limit to how well a camera operating in the visible light range can do (especially if the palm is not facing the camera). If you don't need to match the image to a full hand pose, then just tracking the hand position is going to be more reliable.
Author here, this is a known issue and being worked on for handtrack.js 2.0.
The face/head errors are likely an artifact of transfer learning (handtrack.js is fine-tuned from an object detection model trained on the COCO object detection dataset, which contains a person category).
The current approach being explored is to expand the training data to include faces/heads, so that the model better discriminates them from hands.
There are also size and speed optimizations being explored (smaller models such as efficientdet, quantization).
It also recognizes my ceiling as a hand when there's no hand in front of the camera. Not enough negative training data?
To satisfy the curiosity of VoidWhisperer: the score was 0.868. And it also sometimes loses track of a hand, in my case quite reliably when it's flat and seen from the side.
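One cheap way to paper over that "loses track" flicker is hysteresis on the confidence score: enter the "hand present" state only on a high score, but stay in it down to a lower one. This is a generic smoothing trick, not anything handtrack.js does; the thresholds here are made up.

```javascript
// Hysteresis on detection confidence: requires a strong detection to start
// tracking, but tolerates weaker ones before declaring the hand lost.
class HandPresence {
  constructor(enter = 0.8, exit = 0.5) {
    this.enter = enter; // score needed to start tracking
    this.exit = exit;   // score below which tracking is dropped
    this.present = false;
  }
  update(score) {
    this.present = this.present ? score >= this.exit : score >= this.enter;
    return this.present;
  }
}

const tracker = new HandPresence();
console.log(tracker.update(0.868)); // true  (above the enter threshold)
console.log(tracker.update(0.6));   // true  (dips, but above exit)
console.log(tracker.update(0.3));   // false (lost)
console.log(tracker.update(0.6));   // false (0.6 < enter, so still lost)
```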
WebXR is a JS API that, among other things, allows you to track hands using the device's capabilities. Handtrack.js is a good alternative for browsers that are not compatible with WebXR.
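The fallback logic described above can be sketched like this. The browser calls are stubbed into comments (they only exist in a WebXR-capable browser); the assumption here is that hand tracking is requested as the `'hand-tracking'` optional session feature, and only the decision function itself is runnable anywhere.

```javascript
// Prefer the platform's native hand tracking (WebXR) when available;
// otherwise fall back to the camera-based handtrack.js path.
function chooseHandTracker({ hasWebXR, hasHandTrackingFeature }) {
  if (hasWebXR && hasHandTrackingFeature) return "webxr";
  return "handtrack.js";
}

// In a browser, the flags would come from something roughly like:
//
//   const hasWebXR = "xr" in navigator &&
//     await navigator.xr.isSessionSupported("immersive-vr");
//   const session = await navigator.xr.requestSession("immersive-vr", {
//     optionalFeatures: ["hand-tracking"],
//   });
//   // session.inputSources[i].hand is then defined on supporting devices.

console.log(chooseHandTracker({ hasWebXR: true, hasHandTrackingFeature: true }));
console.log(chooseHandTracker({ hasWebXR: false, hasHandTrackingFeature: false }));
```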
I used handtrack.js a few months ago to write an in-browser theremin. It's not perfect, but it allowed me to write the whole thing in about three hours and only kind of blew up my 2011 MBP.
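A theremin like that can boil down to one mapping: hand x position (say, the detection box center) drives pitch, with an exponential curve so equal hand movements feel like equal musical intervals. This is a guess at how such a project might work, not the commenter's actual code; the ranges are arbitrary.

```javascript
// Map a horizontal hand position to a frequency, sweeping exponentially
// from minHz at the left edge to maxHz at the right edge of the video.
function xToFrequency(x, videoWidth, minHz = 110, maxHz = 880) {
  const t = Math.min(Math.max(x / videoWidth, 0), 1); // normalize to [0, 1]
  return minHz * Math.pow(maxHz / minHz, t);          // exponential sweep
}

// In the browser this would drive a Web Audio oscillator, roughly:
//   const ctx = new AudioContext();
//   const osc = ctx.createOscillator();
//   osc.connect(ctx.destination);
//   osc.start();
//   osc.frequency.value = xToFrequency(handCenterX, video.width);

console.log(xToFrequency(0, 640));   // 110 (left edge, low A)
console.log(xToFrequency(640, 640)); // 880 (right edge, three octaves up)
```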
Hi, handtrack.js author here,
I am exploring some size and speed optimizations which might help with this (smaller models such as efficientdet, quantization). Will share once something is ready.
How is something like this addressed in ML? In traditional software development there are tools like automated tests & static analysis to prevent your customers running into these types of embarrassing issues.