
Learning to Estimate 3D Hand Pose from Single RGB Images - hellomichibye
https://lmb.informatik.uni-freiburg.de/projects/hand3d/
======
kefka
I'm glad they're making inroads here, because gesture and pose calculation and
tracking are awesome.

But... it also seems like more black-box magic math from neural networks. I'm
glad it gets results and all, but it just seems... inelegant. What's the algo
really doing underneath? Did it figure out the joints and their degrees of
freedom?

The results are spectacular, indeed. But it also seems a bit unlike science.
It seems more like an oracle system was able to deduce the results - and we're
no closer to understanding how to do this. Just, we have a trained system that
can.

~~~
return0
Maybe the problem itself is not elegant (which is why it isn't solved in
closed form)? Neural networks learn thousands of different algorithms in
parallel. There's probably not a single idea behind it; there may be
thousands. It's more like experimental science: the system is there, it works,
and someone else needs to analyze it to figure out how it does what it does.
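
The broad strokes aren't mysterious, though: typically one convolutional
network scores likely 2D keypoint locations, and a second, smaller network
lifts those to 3D joint coordinates. A minimal sketch of that shape - assuming
Keras, with a made-up keypoint count and layer sizes, not the paper's actual
architecture:

```python
import tensorflow as tf

NUM_KEYPOINTS = 21  # a common hand-keypoint count; an assumption here

# 2D stage: RGB crop of the hand -> one score map per keypoint
inputs = tf.keras.Input(shape=(128, 128, 3))
x = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
x = tf.keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
heatmaps = tf.keras.layers.Conv2D(NUM_KEYPOINTS, 1, padding="same")(x)

# 3D stage: "lift" the 2D evidence to an (x, y, z) triple per joint
x = tf.keras.layers.Flatten()(heatmaps)
x = tf.keras.layers.Dense(256, activation="relu")(x)
coords = tf.keras.layers.Dense(NUM_KEYPOINTS * 3)(x)
outputs = tf.keras.layers.Reshape((NUM_KEYPOINTS, 3))(coords)

model = tf.keras.Model(inputs, outputs)
model.summary()
```

The "magic" is entirely in the learned weights, which is exactly the part that
still needs the experimental analysis.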

~~~
kefka
That's my point. Anybody can shove in gigs of data, clean it up some, provide
easily digestible samples, make a whole lot of parameters (or let the system
decide), and scrape the results out.

How is it working? Under what conditions does it work, and when does it fail?
Does it work for black people? Does it work for women's hands? (Or does it
work for anyone outside a university?) Does it handle people with hand defects
or missing digits?

That's right... we don't know unless we test it. And only by adding more data
can we even answer those questions. We have no clue how it's working, what
features it's using, or anything - just that it does work, and that it doesn't
under conditions we're unsure of.
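
The only way to answer those questions is to slice the evaluation set and
measure per group. A sketch of what I mean - `predict` stands in for a
hypothetical trained model, and the sample tuples and metric are assumptions,
not anything from this paper:

```python
from collections import defaultdict

def error_by_group(samples, predict, metric):
    """Mean error per demographic/condition slice.

    samples: iterable of (image, true_keypoints, group_tag) tuples.
    predict: the trained model's inference function (hypothetical).
    metric:  error between predicted and true keypoints.
    """
    errors = defaultdict(list)
    for image, truth, group in samples:
        errors[group].append(metric(predict(image), truth))
    return {g: sum(v) / len(v) for g, v in errors.items()}
```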

Now, this is a great starting point for determining the underlying math. But
even the cost of compute seems high compared to what it could ideally be, if
we understood what was going on.

------
iaw
I think this post would be a lot more informative with the original title
"Learning to Estimate 3D Hand Pose from Single RGB Images"

The results are astounding for anyone who's tried to do similar work in the
past. I'm so pleased that they made the code available.

~~~
sctb
Agreed, thanks! We've updated the title from “A 2D Image of your hand and
TensorFlow tell you everything”.

------
hypertexthero
Interesting.

I just finished reading a book called The Hand, by Frank R. Wilson. The author
tells how the human hand has shaped our evolution and suggests, among other
things, that the development of language came from our hands.

A bit heavy at times, but nevertheless warmly recommended.

[http://www.penguinrandomhouse.com/books/191866/the-hand-by-f...](http://www.penguinrandomhouse.com/books/191866/the-hand-by-frank-wilson/9780679740476/)

------
mrfusion
The Vive already has a camera... could this be applied there?

What are some other applications?

------
revelation
Okay, but depth cameras exist, full stop, and they obviate the need for an
entire GPU's worth of processing. It's why Uber and Waymo are sinking billions
into cheap, solid-state LIDAR. Actual measured sensor data is just so much
better.

~~~
genericpseudo
The challenge with LIDAR (for the foreseeable future) is ubiquity. If you're
building a car, then sure, LIDAR, no questions asked. But if you're trying to
pair this with, say, a structure-from-motion pipeline for photogrammetry
_targeting mobile phones as capture devices_, then an RGB-only approach like
this becomes very, very important.

The best data for many applications is often the data you can get at scale.

------
jvanderbot
Here's something from six years ago or so with similar results (and faster):
[https://www.youtube.com/watch?v=qok636pe_qw](https://www.youtube.com/watch?v=qok636pe_qw)

------
wyldfire
The current title "A 2D Image of your hand and TensorFlow tell you everything"
is a little head-scratching. "2D image" as opposed to what - "stereo" 2D
images (or a depth-mapped image)? And "tell you everything" makes it sound
like some kind of fortune teller.

------
jordache
I don't think this is all that amazing.

Human fingers only have a very narrow range of motion in a single direction.
When the 2D profile of the hand changes, the set of finger movements to arrive
at that profile is fairly predictable.
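
To put it concretely: since each joint mostly flexes in one plane, a toy
planar forward-kinematics model already describes most of what a finger's
silhouette can do. A sketch, with made-up segment lengths rather than
anatomical data:

```python
import math

SEGMENTS = [4.5, 2.5, 2.0]  # illustrative phalanx lengths (cm), not anatomy

def fingertip(angles_deg):
    """Fingertip (x, y) for three flexion angles, all in one plane."""
    x = y = 0.0
    theta = 0.0
    for length, angle in zip(SEGMENTS, angles_deg):
        theta += math.radians(angle)  # each joint adds to the cumulative bend
        x += length * math.cos(theta)
        y += length * math.sin(theta)
    return x, y

# A half-curled pose: the 2D profile follows directly from the angles.
print(fingertip((30, 45, 20)))
```

Going from the silhouette back to the angles is the "predictable" part.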

~~~
KineticLensman
> "Human fingers only have a very narrow range of motion in a single
> direction"

Each finger has three joints that can bend in and a little bit back. Each
finger can move from side to side. There is a small amount of rotation around
the base of the finger. The thumb has three joints, the bottom of which is
opposable to create a gripping hand. The entire hand can bend, yaw and rotate
with respect to the arm, and there is further flexibility in the bones of the
palm.
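
A back-of-the-envelope tally of just the degrees of freedom listed above (a
rough model of the description, not an anatomical reference):

```python
# Rough DoF count following the enumeration above; numbers are a model.
dof = {}
for finger in ("index", "middle", "ring", "pinky"):
    dof[finger] = 3 + 1        # three flexion joints + side-to-side movement
dof["thumb"] = 3 + 2           # three joints + opposition at the base
dof["wrist"] = 3               # bend, yaw, and rotation relative to the arm

print(sum(dof.values()))  # 24 - and that's before the palm's flexibility
```

Two dozen coupled parameters is a long way from "a very narrow range of motion
in a single direction".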

If you doubt how complicated the hand is, download a free 3D figure posing
system like Daz Studio [1], instantiate a figure, and then try to make a hand
grip a small object, or gesture (e.g. v-sign, 'ok', hitchhiking, vulcan
greeting).

[1] [https://www.daz3d.com/](https://www.daz3d.com/)

~~~
jordache
A lot of the variables you listed are not relevant. The solution linked here
only derives the 3D positions of the fingers, not the other nuances that can
be expressed by the hand's muscular and skeletal system.

