Hand-Tracking with Three.js (xl.digital)
135 points by marban on Feb 3, 2023 | 31 comments



Hello! I am the creator of this experiment.

Glad to see the conversation about hand tracking in the browser over here.

This demo was made as part of a series of creative experiments on using real-time hand tracking in the browser for creative interactions. I'll be posting more experiments soon.

Tech background: I'm using MediaPipe to control the hand rig in three.js. MediaPipe provides landmarks that are used to control a three.js Skeleton (a hierarchy of bones with rotations).
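
Roughly like this (a simplified sketch, not the exact code from the demo; the rest direction and bone wiring depend entirely on your rig):

    import * as THREE from 'three';

    // MediaPipe Hands gives 21 landmarks per hand: 0 = wrist,
    // 5/6/7/8 = index MCP/PIP/DIP/tip, etc. Each landmark has
    // normalized x/y plus a relative z.
    const toVec3 = (lm) => new THREE.Vector3(lm.x, -lm.y, -lm.z); // flip into three.js axes

    // Aim one bone along the segment between two landmarks.
    // restDir is the direction the bone points in its rest pose
    // (rig-dependent; +Y below is just an assumption).
    function aimBone(bone, lmFrom, lmTo, restDir) {
      const dir = toVec3(lmTo).sub(toVec3(lmFrom)).normalize();
      // Quaternion rotating the rest direction onto the tracked one.
      // A real rig also has to express this in the parent bone's
      // local space; omitted here to keep the sketch short.
      const target = new THREE.Quaternion().setFromUnitVectors(restDir, dir);
      bone.quaternion.slerp(target, 0.4); // slerp damps landmark jitter
    }

    // Per frame, inside hands.onResults(results => { ... }):
    //   const lms = results.multiHandLandmarks[0];
    //   aimBone(indexProximalBone, lms[5], lms[6], new THREE.Vector3(0, 1, 0));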

Feel free to ask, I will answer any questions!


> creative interactions

Fwiw, some things I've found fun:

- Clip-on fish-eye lens, intended for a phone but fitting on a laptop, to expand the webcam's field of view.

- Additional cameras: on sticks above the screen tips, for high-res stereo positioning over the kbd; asymmetric high-off-to-side, to trade some resolution for some field of view (meh); high-overhead, for whole-workspace tracking.

- Binocular periscope with a webcam splitter and screen-tip mirrors (blech: low-res, awkward, fiddly).

- Look-down mirror on the webcam, partial or full, to get a kbd view (nice in VR).

- Look-down with a curved mirror along the top of the keyboard, to get an "out along the kbd surface" view and crufty touch detection for kbd-as-touch-surface (cute but fiddly; it only makes sense to save a camera or two; caveat: I had high-contrast white hands on a black ThinkPad kbd).

- Tracking markers on fingers (flats, A-frames, or cubes on velcro rings) make for less jittery tracking, but are awkward (meh).

- Markers taped around the keyboard help with calibration.

Magic wand: I found I could more or less manage to type while holding a chopstick. So I stuck a marker cube on one end, and an arc sliced off a small Christmas ball on the tip, so it slides smoothly across (ThinkPad) keys. A barber-pole rotation marker. An anvil'ed tip pressure sensor, a finger microswitch, and a very thin and soft ribbon cable to an Arduino. But I didn't actually get the pressure sensor working before I punted on all this. The chopstick was narrow enough to avoid breaking hand tracking.

Some gotchas: 2K camera resolution was painful for tracking. (Several years ago) MediaPipe finger tracking was annoyingly noisy for doing stereo. You only get one USB2 camera per USB port, even if it's USB3 (maybe USB3 cameras allow working around that limit nowadays?). If you do hand, arm, face, and marker tracking on several cameras, even with native GPU MediaPipe, you're burning a lot of GPU just on the human interface device, before your likely-graphical-itself app even starts. If I had it to do over now, I'd punt on mirrors, use 4K USB3 cameras, and, at least on desktop, more cameras. Nicely merging high-latency camera tracking with lower-latency keyboard, touchpad, and graphics tablets requires changes to the input event pipeline, and adapting apps to deal with "oh my! That space key pressed several keys ago was pressed with a pointer finger at position 3, so that means we roll back app state and then ...".
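
To make that last point concrete, a toy sketch of the buffering involved (all the app.* and nearestFinger names are hypothetical stand-ins): keep recent key events with timestamps, dispatch them provisionally, and re-attribute them once the matching camera frame's tracking result arrives:

    // Toy sketch; app.applyProvisional / app.commitWithFinger /
    // nearestFinger are hypothetical stand-ins for real plumbing.
    const pendingKeys = []; // { key, t } in ms, oldest first

    document.addEventListener('keydown', (e) => {
      pendingKeys.push({ key: e.key, t: performance.now() });
      app.applyProvisional(e.key); // may be rolled back and redone later
    });

    // Arrives ~100ms late, once tracking for a camera frame is done.
    function onTrackingResult(frameTime, fingerPositions) {
      while (pendingKeys.length && pendingKeys[0].t <= frameTime) {
        const ev = pendingKeys.shift();
        const finger = nearestFinger(fingerPositions, ev.key);
        // Now we finally know which finger pressed ev.key, possibly
        // several keystrokes ago; roll back and replay app state.
        app.commitWithFinger(ev, finger);
      }
    }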

Here we are a half-century later, still banging on glorified Xerox Altos. We're so broken.


Correction: the barber pole for optical rotation went with color ends, not the marker cube.

Greenfield, stylus-wise, I'd...

(1) Punt "keyboard as graphics tablet" as a bodge, except for a Mac-ish, non-tiny touchpad paired with a stylusified (2).

(2) Simple chopstick with color ends and a barber pole. Caveat: color blobs with 2K cameras and ambient light are noisy, low-res (and slow) for small gestures. Workday ergonomics favors a resting hand with small motions, rather than movie-style arm waving. The barber pole pushes towards a full-length, not pen-short, chopstick. And I've not seen a nice simple story for finger pressure/buttons. Sensor fusion with hand pose might be interesting? A squishy HID?

(3) Graphics tablets already give clean, fast, high-res 2D, sometimes several cm above the surface, often pressure, sometimes tilt, even rotation. One might add high-latency optical height. Gestures with a distinguishable 2D projection could skip the burden of high-time-res fusion, for a fast dev path to UI software rather than HID struggles. Caveat: MediaPipe hand pose struggles with a thick black stylus, black background, and white hands (at least it did years ago; maybe someone is now training with styluses? Maybe someone's stylus, if recolored, is thin enough?). For the other hand, one can fuse pose with tablet multitouch. Fwiw.


Thanks for the pointers! Do you have any video/media/recordings of your experiments?


Np, tnx for the demo. Sigh, sorry: not really, and nothing easily accessible.

I do that poorly, repeatedly. A mindset of "today's rev n is bad, still unusable; tomorrow's incremental rev n+1 will be slightly better; no point in recording bad, wait for better; will demo at a meetup for friends, but otherwise, who'd care?"... left a sparse trail. Sort of: you might take a picture of your nice finished cake, but of the baking? There have been HN posts on commitment-hacking as a service, e.g., iirc, a Japanese workspace with sign-in like "I'm here to write one chapter, and I'd like person-standing-behind-me level pressure". So perhaps: motivate documenting this week's state, as a service? As finding/creating a community that's interested in such things often seems difficult.

Hmm, here's a snapshot[1] of my late-rev laptop hardware with flop-up kbd cam and (stowed, fold-up) stereo cams (wires not connected). A gaff tape, sticks, velcro, and cardboard aesthetic allows fast and incremental iteration. For wires, I like magnetic USB connectors[2]. Fwiw.

[1] https://twitter.com/mncharity/status/1232446953784369154/pho... https://pbs.twimg.com/media/ERqCfdkX0AEWTN_?format=jpg&name=...

[2] https://twitter.com/mncharity/status/1255300177960808448


Based on the sample at https://google.github.io/mediapipe/solutions/hands, that doesn't even sound all that complex.
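
For reference, getting raw landmarks out of that sample is roughly this much code (paraphrased from the MediaPipe docs, so treat it as a sketch rather than verbatim):

    <script src="https://cdn.jsdelivr.net/npm/@mediapipe/hands/hands.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js"></script>
    <script>
      // Assumes a <video> element showing the webcam feed.
      const video = document.querySelector('video');
      const hands = new Hands({
        locateFile: (f) => `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${f}`,
      });
      hands.setOptions({ maxNumHands: 2, minDetectionConfidence: 0.5 });
      hands.onResults((results) => {
        // 21 landmarks per detected hand, in normalized coordinates.
        console.log(results.multiHandLandmarks);
      });
      new Camera(video, {
        onFrame: async () => { await hands.send({ image: video }); },
      }).start();
    </script>

Driving a full skeleton from those landmarks is presumably where it gets trickier.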


I thought the same until I tried!

As a matter of fact, lots of people on Twitter have been sharing their frustration after attempting the same thing.

This is the funniest one:

https://twitter.com/isjackwild/status/1617559339891396619

Here are some comments on my implementation:

https://twitter.com/SketchpunkLabs/status/161758661970323049...


This is amazing! I had a browser plugin called Flutter years back that could do webcam gesture recognition for scroll and forward/back. This uses three.js, so I wonder how much is CPU vs. GPU, and also how well this could, now or in the future, run under the hood in the background of a web game (or WebXR!) just as the input device and without too much overhead. Great proof of concept!!


Thanks for the nice words! Your plugin sounds like fun. In terms of using hand tracking for web games: my next experiments will use this setup to interact with 3D scenes.


Looking forward to where you take this. Just to clarify, I was just an end user of Flutter, but I was checking for a link (I had it installed years back) and it turns out they got acquired by Google[1], so it could be somewhere in Lens for all I know!

[1] https://www.searchenginewatch.com/2013/10/03/google-acquires...


One use case I immediately thought of for hand-movement tracking like this is helping my disabled brother (tetraplegic) steer efficiently. Using a mouse is sometimes too hard for him, though only in some cases. If one could use this as a macro launcher, or as a more accurate joystick without attaching a real joystick, it could help a lot.
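
For example, a thumb-to-index pinch could fire a macro. A hypothetical sketch on top of MediaPipe Hands results (landmarks 4 and 8 are the thumb and index fingertips per the docs; runMacro is a placeholder):

    const PINCH_ON = 0.05;  // normalized distance; tune per camera/user
    const PINCH_OFF = 0.08; // hysteresis so one pinch fires only once
    let pinched = false;

    function onResults(results) {
      const lms = results.multiHandLandmarks && results.multiHandLandmarks[0];
      if (!lms) return;
      const d = Math.hypot(lms[4].x - lms[8].x, lms[4].y - lms[8].y);
      if (!pinched && d < PINCH_ON) {
        pinched = true;
        runMacro('placeholder-action'); // wire this to whatever launcher you use
      } else if (pinched && d > PINCH_OFF) {
        pinched = false;
      }
    }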


I wish I had found out about MediaPipe before the tail end of grad school, but my collaborator integrated some neat stuff into our project. Very cool; thanks for sharing!


The hand tracking is spot on, judging by the small image in the bottom-left corner: steady and accurate positioning of two hands. :-)

Unfortunately, the rendering of the hands in the large window jumps all over the place on Firefox (Ubuntu, Razer laptop). :-(


It seems quite stable to me (on Chrome) ... until I weave the fingers of my two hands together.

That said, I still think MediaPipe is an excellent piece of ML tech. And, from a dev point of view, it's quite easy to get working for various things in the browser[1][2].

[1] - MediaPipe Selfie Segmentation for real-life background replacement - https://scrawl-v8.rikweb.org.uk/demo/mediapipe-001.html

[2] - MediaPipe Face Mesh for, well, drawing lines on your face - https://scrawl-v8.rikweb.org.uk/demo/mediapipe-003.html


On my Pixel 7 Pro phone, I'm able to get it to represent all kinds of strange hand positions correctly on the screen via my selfie camera.

There is some jumpiness, but I'm holding the phone in one hand and making hand positions with the other, so I'm less concerned about the jiggles.

Naturally, it has a hard time when my fingers are all occluded by one another.

Way to go.


Same problem on Chrome (Mac M1). For weird positions I can understand it, since the computer must still lift the 2D dots from the small view into 3D coordinates, but even when I just show my hand with fingers spread, palm facing the camera, the small image is stable while the 3D image glitches quite profoundly.


Same problem. The hand tracking is really impressive, basically flawless, but the rendered hand is all over the place and doesn't match the debug window at all.


Cool progress, really nice work!

Not quite high enough fidelity to handle ASL though.

Some issues I ran into testing it, if it's something that interests you:

- Cannot distinguish closed vs. open fingers (always adds gaps between fingers, even if they're touching) (B)

- Can't handle crossed fingers (R)

- Doesn't seem to like extended vs. curled fingers in some cases (H)

- Other failed letters: (Q), (E?), (F?), (G), (S), (U/V)

But, when signing naturally, it seems to get the shapes and orientation correct enough that I can understand what I'm seeing. I'm sure there are things it'd trip up on because of some of the above weaknesses in detecting hand shapes, but it does seem to get movement, orientation, and position "good enough".


It doesn't really work well while holding objects or during faster movements, which I imagine would be restrictive for gaming or simulation purposes. It might be useful as a replacement for Leap Motion, though. I can see this working for manipulating a desktop environment.


Congrats OP, very cool, and it runs great for me in Chrome and Firefox. MediaPipe can indeed work for manipulating desktop environments; I've been working off and on at that for a couple of years.[0] It's tricky to make the interactions effective and minimize false positives while also avoiding a heavy cognitive load on the user, but there's lots of potential.

[0] https://www.youtube.com/watch?v=bHjj46AIVxs


This is so awesome. Fantastic demo.

An "air-piano" seems well within the realms of possibility now.


This is unfortunately a bit buggy. I can see in the 2D image that it is tracking when my hand is turned backwards and held relatively flat to the camera, but the 3D view shows my hands as curled.


This is really cool to see. I don't have any problems or bugs using it, but it seems like the 3D rendering of the virtual hands is not quite as fast as the tracking itself. I wonder if and how this could be used in a meaningful way. As a feature for things like Google Quick Draw it would be fun.


Cool mashup! For anyone interested, I found this codepen from Google where you can play with Mediapipe in your browser: https://codepen.io/mediapipe/pen/RwGWYJw


Really well done! I've taken a stab at this problem with mixed results. This is leaps and bounds beyond most of my attempts. Thanks for sharing!


Unfortunately it doesn't let me select which camera to use; Chrome is set to use one, but the site probably uses the first one it finds.


You need a better browser. Firefox lets you choose which camera you want to use.

https://jasper.monster/sharex/firefox_8DYyGO7U9f.png


I guess one could make a fun theremin game out of this.


Wow this works much better than I had expected. Well done!


Jack, it's amazing. Very impressive stuff.


Fails when both hands touch or cross.



