
On-Device, Real-Time Hand Tracking with MediaPipe - neversaydie
https://ai.googleblog.com/2019/08/on-device-real-time-hand-tracking-with.html
======
Havoc
Remember that HN post the other day about someone going on a massive mission
on voice recog because they can't use a mouse due to pain?

Stuff like this makes me hopeful even if it seems like a gimmick when viewed
in isolation.

~~~
WalterGR
_Remember that HN post the other day_

No. Link? Or can you remember any other details?

~~~
slashcom
[http://nsaphra.github.io/post/hands/](http://nsaphra.github.io/post/hands/)

This one perhaps

~~~
melling
Close to 400 comments. It was a good discussion.

[https://news.ycombinator.com/item?id=20662232](https://news.ycombinator.com/item?id=20662232)

------
zawerf
The underlying project MediaPipe looks pretty cool:
[https://github.com/google/mediapipe/blob/master/mediapipe/do...](https://github.com/google/mediapipe/blob/master/mediapipe/docs/hand_detection_mobile_gpu.md#Graph)

I wonder why they didn't build it on top of one of the gazillion flow-based
visual programming languages instead?
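
For anyone curious what "graph" means here: a MediaPipe graph wires "calculator" nodes together into a dataflow pipeline, which is the same idea those visual languages implement. A toy sketch of the concept in plain Python (the node functions and stream names are invented for illustration, not MediaPipe's actual API):

```python
# Minimal dataflow-graph toy: each node consumes named streams and
# produces a named stream, mimicking the shape of a MediaPipe graph.
# All calculator/stream names below are made up for illustration.
class Graph:
    def __init__(self):
        self.nodes = []  # (function, input stream names, output stream name)

    def node(self, fn, inputs, output):
        self.nodes.append((fn, inputs, output))

    def run(self, **streams):
        # Assumes nodes were added in topological order.
        for fn, inputs, output in self.nodes:
            streams[output] = fn(*(streams[i] for i in inputs))
        return streams

g = Graph()
g.node(lambda img: "palm_box", ["input_video"], "detections")
g.node(lambda img, box: "landmarks", ["input_video", "detections"],
       "hand_landmarks")
result = g.run(input_video="frame0")
```

The real thing adds scheduling, per-stream timestamps, and GPU buffers, but the node-and-stream structure is the part a VPL would have given them for free.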

~~~
Mathnerd314
Probably because it uses TensorFlow (lite) and there aren't many (any?) VPLs
supporting C++ integration.

And also Google has NIH syndrome; it's hard to think of any outside projects
they use besides the Linux software stack and LLVM.

~~~
dekhn
Google depends on thousands of external software projects.

They heavily support a VPL, Scratch.

~~~
Mathnerd314
AFAICT the relationship for Scratch is that Google wrote a new VPL (Blockly)
and the MIT Media Lab released it as Scratch 3.0.

I guess the "thousands" is from
[https://opensource.google.com/](https://opensource.google.com/)? In a few
minutes of browsing, I couldn't find anything besides Bullet that wasn't just
Google releasing something as open source.

------
godelski
This is neat. I'm curious about a limitation though.

Was this trained on people with missing or partially missing digits? Like, if
someone is missing the top parts of their third and fourth fingers, does it
always predict Spiderman or Rock for an open hand?

I don't think this is a necessary thing for an openly released piece of
software that's not aimed at edge cases like these. I'm just curious about
limitations and how it deals with edge cases. I also don't currently have a
friend with a missing finger to test with.

(Also, you could probably fine-tune this model to pick up those cases. I'd be
curious how good the results would be, because I imagine it'd be difficult.)
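
For context on how such a misprediction could happen: the model emits 21 landmarks per hand, and gestures are usually derived from them with simple geometry. Here's a toy classifier along those lines (the extension heuristic and gesture labels are my own invention, not MediaPipe's recognizer) that shows why two folded fingers would read as "rock horns":

```python
import math

# MediaPipe hand landmarks are indexed 0-20 (wrist = 0); the tip/PIP
# indices below follow the published landmark layout. The classification
# heuristic itself is a toy.
FINGERS = {              # name: (tip index, PIP-joint index)
    "index":  (8, 6),
    "middle": (12, 10),
    "ring":   (16, 14),
    "pinky":  (20, 18),
}

def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def extended_fingers(landmarks):
    """A finger counts as extended if its tip sits farther from the
    wrist than its PIP joint -- crude, but illustrates the idea."""
    wrist = landmarks[0]
    return {name for name, (tip, pip) in FINGERS.items()
            if _dist(landmarks[tip], wrist) > _dist(landmarks[pip], wrist)}

def classify(landmarks):
    up = extended_fingers(landmarks)
    if up == {"index", "middle", "ring", "pinky"}:
        return "open hand"
    if up == {"index", "pinky"}:
        # Exactly what an open hand with two missing fingertips
        # would collapse to under this heuristic.
        return "rock"
    return "other"
```

Under any heuristic like this, shortened middle and ring fingers on an open hand are geometrically indistinguishable from deliberately folded ones, so the model's answer depends entirely on whether such hands were in the training data.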

------
jedimastert
I've started to get really interested in lower-budget and/or easier motion
tracking for special effects. I've been looking at optical motion tracking
with a multi camera setup, and optical facial tracking. With the right math
and assumptions, you could capture a full performance with little to no
specialized equipment. I've been wondering if ML could output enough detail to
make it feasible.

~~~
soylentgraham
With the right models, and gluing it all together yeah.

For some commercials we've dropped Xsens suits for OpenPose. Facial capture
from afar needs too much in the way of exaggerated movement, but for mouth
capture, audio processing gave more pleasing results. 3D models still aren't
there yet, but for cameras that stay basically in one plane it's good.

We used OpenPose initially to correct capture-suit drift live, but with some
math (I'm a game & computer vision dev) translating to 3D worked pretty well.
As always, you just have to fix outliers.

------
hirundo
> We are excited to see what you can build with it!

The killer app is typing. Qwerty would be nice for a transition, but someone
please invent a gesture "keyboard" better suited to a free-floating hand.
Because of the lack of tactile feedback, I imagine it couldn't be as good as
an actual keyboard. But it could be brilliant as an away-from-keyboard
keyboard.

~~~
h2odragon
There's still ergonomics. Say we teach computers to recognize casual ASL
(which is a big job for people learning ASL, but whatever)... You're not going
to be able to spend as much time using that input method as you could using a
keyboard, because of simple fatigue.

------
debrice
As an engineer I can see what a truly amazing feat this is. As a human being
I'm staggered that it took so much AI and machine-learning effort to do so
little.

~~~
visarga
> so little

Nature took 85 million years to perfect the hand, and dexterous use takes 1-2
years of training for babies. Interpreting the hands of others takes longer.

~~~
undoware
Try _drawing_ hands. (If you're a non-artist like myself.) It's impressively
hard --- they are complex artefacts we can't quite see clearly because we are
so used to them. There are a lot of, as it were, polygons.

If, after staring at these things for several decades, I still can't draw them
with my eyes closed, I will assume that an AI would not find it easy to think
about them either.

~~~
debrice
I think it’s the opposite problem. I can see an AI being able to very easily
draw what it sees but struggling to interpret it. When you learn to draw, one
of the first steps is relearning to see without interpretation. A better
example is how easily kids draw stick figures and how hard it is for a
computer to do the same: for decades we had to put reflective markers on
actors, interpret them as dots, and use those dots to render stick figures.

~~~
undoware
What a great point, thanks!

------
MentallyRetired
Can't wait for the Pixel 4. No doubt this will be put to use.

~~~
haberman
There is this, from July:
[https://www.youtube.com/watch?v=KnRbXWojW7c](https://www.youtube.com/watch?v=KnRbXWojW7c)

------
thorum
I'd like to see this kind of technology employed in VR headsets to allow for
more natural interaction using my hands and fingers instead of controllers.

~~~
TheRealSteel
You can buy a Leap Motion and stick it to the front of a Rift right now. I've
done it; it's amazingly immersive and feels fantastic.

~~~
gfodor
I really wonder how this compares to the LeapMotion tracking. My suspicion is
that the leap tracking is now hardened by years of real world experience, so
it's probably ahead of anything that's still in the R&D stage. But hard to
know without testing it.

~~~
tootie
This article doesn't actually say so anywhere, but it implies it's doing all
this with a regular camera. Leap Motion and others use more complex sensors.
The ML approach is really impressive, but getting clearer input would seem to
be a more reliable approach.

~~~
joshspankit
The Leap Motion actually uses ultrasonics for measurements and then guesses to
translate that to hand tracking. Doing it fully from a camera may actually
improve on it if done right.

~~~
gfodor
That’s the magic leap. The original leap motion uses stereoscopic infrared
cameras afaik.

~~~
joshspankit
Nope, you’re totally right. Don’t know where I got the idea that the little
Leap Motion was ultrasonic. Apologies to the readers.

------
awinter-py
Pose detection is a mixed bag because it encourages always-watching devices
(like how Alexa is the killer app for always-listening mics).

That said, it's incredibly useful for interacting w/ technology in physical
space. I could imagine this doing really well for handheld drone landings or
hybrid human/robot factories.

~~~
jsilence
Could be activated by Bluetooth-based presence detection, leading to face
recognition to verify it's the right user, and then activating gesture
detection.

------
jhull
Thinking of all the baseball applications here: catcher signals, third base
coach, head coach etc.

~~~
mkl
Why would it be useful to have technology detect signals? Aren't people
already going to be doing it?

Genuine question, I don't know how baseball works.

~~~
ereyes01
Hand signals are commonly used in baseball to surreptitiously communicate
intent to teammates without giving away your strategy to opponents. I think
what GP was getting at is that this technology could be used to automate the
reading of hand signals. I'm not sure it would be effective, as the pro
baseball players are already quite sophisticated at both reading and
obfuscating hand signals, at least at the highest levels of the game.

~~~
TOMDM
I'm not sure if the complexity of the codes is significantly higher in the pro
teams, but I found this video on baseball code decoding pretty fun.

[https://youtu.be/PmlRbfSavbI](https://youtu.be/PmlRbfSavbI)

Combined with this, it certainly seems like there's potential for a fully
automated pipeline.

------
Multicomp
This is amazing! Google could now build a chording keyboard, without the extra
device needed.
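
A chording scheme on top of hand tracking is essentially just a map from finger combinations to characters. A toy sketch (the chord assignments here are invented for illustration, not any real chording layout):

```python
# Toy chorded layout: each set of "down" fingers maps to one character.
# The assignments are made up; a real layout would be optimized for
# frequency and comfort.
CHORDS = {
    frozenset({"index"}):                    "e",
    frozenset({"middle"}):                   "t",
    frozenset({"index", "middle"}):          "a",
    frozenset({"index", "middle", "ring"}):  "o",
    frozenset({"thumb", "index"}):           " ",
}

def decode(chord_sequence):
    """Turn a sequence of finger-sets (one per detected chord) into
    text, silently skipping chords the layout doesn't define."""
    return "".join(CHORDS.get(frozenset(c), "") for c in chord_sequence)
```

The hard part wouldn't be the mapping but chord segmentation: deciding, from a noisy 30 fps landmark stream, when one chord ends and the next begins.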

------
lawik
First skim of the title: "What are they tracking now? Wow, very clear language
on the tracking part of the business for a googleblog.com link. Oh... hand
tracking."

But this is very neat.

------
krilly
Wasn't this posted here about 2 days ago?

~~~
scribu
11 days ago, but with no comments:
[https://news.ycombinator.com/item?id=20739a577](https://news.ycombinator.com/item?id=20739a577)

~~~
mkl
Also 11 days ago, with comments:
[https://news.ycombinator.com/item?id=20743575](https://news.ycombinator.com/item?id=20743575)

------
auslander
realtime _hard_ tracking tech. Typo.

------
userbinator
Given the company behind it, I can't get out of my head the thought that this
will find a lot of _other_ applications...

 _Please give a thumbs-up to acknowledge your engagement with this ad_

 _Sorry, the middle finger is not acceptable. Please give a thumbs-up._

~~~
40four
Yeah, I hate to be cynical, but I'm not buying the stated use cases as the
main motivator here for Google. It's cool they are releasing it. Interested to
see what other folks come up with.

I wonder if this could be used to identify people based on hand movements
alone? Like some sort of movement 'fingerprint' or something.

I've got to imagine we all have somewhat different patterns of moving our
hands. Is it possible AI could be trained to study existing footage of a
person and identify them this way? Maybe akin to facial recognition, but with
hand movements instead?

Or maybe they are all too similar to be able to tell one person from another.
Hell if I know, but interesting to think about.
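
As a toy illustration of what such a movement "fingerprint" might look like: reduce a recording of tracked landmarks to a per-landmark speed profile, then compare profiles. (Purely a speculative sketch; no claim that this particular feature would actually distinguish people.)

```python
import math

def speed_profile(track):
    """Mean per-frame speed of each landmark over a recording.
    `track` is a list of frames; each frame is a list of (x, y) points.
    A deliberately crude 'movement signature'."""
    n = len(track[0])
    totals = [0.0] * n
    for prev, cur in zip(track, track[1:]):
        for i in range(n):
            totals[i] += math.hypot(cur[i][0] - prev[i][0],
                                    cur[i][1] - prev[i][1])
    return [t / (len(track) - 1) for t in totals]

def similarity(a, b):
    """Cosine similarity between two speed profiles (1.0 = identical
    movement pattern up to overall scale)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

Whether real people's profiles cluster tightly enough per person (and stay stable across moods, tasks, and camera angles) is exactly the open question.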

~~~
roywiggins
Hand identification has been used to convict at least one person:

[https://www.bbc.co.uk/news/av/stories-45190746/how-a-paedoph...](https://www.bbc.co.uk/news/av/stories-45190746/how-a-paedophile-s-hands-led-to-his-conviction)

