
Beyond the pixel plane: sensing and learning in 3D - gtmtg
https://thegradient.pub/beyond-the-pixel-plane-sensing-and-learning-in-3d/
======
JeremyHerrman
This is an exciting time for those of us working on computational geometry to
better understand 3D shapes across many industries.

In addition to the architectures mentioned in this great overview, I'm excited
to see progress on spectral and geodesic CNNs for graphs and manifolds. Check
out this other fantastic source for info on 3D ML:
[http://geometricdeeplearning.com](http://geometricdeeplearning.com)

------
jeffchuber
This is a great overview. Also check out CS 468 from Stanford,
[http://graphics.stanford.edu/courses/cs468-17-spring/](http://graphics.stanford.edu/courses/cs468-17-spring/)
"Machine Learning for 3D Data"

Also, if you want to work on this stuff full time:
[https://news.ycombinator.com/item?id=17649726](https://news.ycombinator.com/item?id=17649726)

~~~
bitL
Super cool course! Thanks for the link!

Do you know which programmable RGB-D cameras are the most precise that a
non-professional can buy? I've been trying to extract 3D information from a
single camera via 3D convolutions and RNNs (for a self-driving car project)
and would like to play with real 3D a bit as well.

~~~
gtmtg
(I wrote the original article)

I've been playing around with a few and I'd recommend the Orbbec Astra and the
Intel RealSense (the new D435 is what I've been using) as decent but cheap
cameras if you want to get started! The Asus Xtion PRO LIVE is also quite good
but since it's been discontinued it's pretty hard to find.

The Stereolabs ZED relies on stereo vision but produces output similar to that
of traditional RGB-D cameras, and I've heard good things about it as well!
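
As a rough illustration of why a stereo rig can stand in for an RGB-D sensor:
for a rectified stereo pair, depth follows from disparity as Z = f * B / d, and
each pixel with a depth value can then be back-projected to a 3D point with the
pinhole model. This is only a sketch. The function names and the numbers in the
example (focal length, baseline, principal point) are made up for illustration
and are not the specs or the API of any particular camera.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero or negative disparity means no valid match (or a point at infinity).
        return float("inf")
    return focal_px * baseline_m / disparity_px


def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at a given depth to camera-frame XYZ."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Hypothetical intrinsics: 700 px focal length, 0.12 m baseline, 640x480 image.
z = depth_from_disparity(42.0, focal_px=700.0, baseline_m=0.12)
point = backproject(670, 240, z, fx=700.0, fy=700.0, cx=320.0, cy=240.0)
print(z, point)
```

Running the same back-projection over every pixel of a depth map gives the kind
of point cloud an RGB-D camera would hand you directly.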

~~~
jcims
Any idea if the iPhone X surfaces RGBD from the TrueDepth camera?

~~~
gtmtg
Yes! Haven’t had a chance to play around with it but I’ve been wanting to. See
AVDepthData: docs at
[https://developer.apple.com/documentation/avfoundation/avdep...](https://developer.apple.com/documentation/avfoundation/avdepthdata)
and reference implementation for streaming depth at
[https://developer.apple.com/documentation/avfoundation/camer...](https://developer.apple.com/documentation/avfoundation/cameras_and_media_capture/streaming_depth_data_from_the_truedepth_camera)

------
glalonde
What about pose estimation? E.g., given a well-defined coordinate system, such
as one whose origin is the nose on a face, determine the pose of the face. Is
this still best done with classic optimization formulations like RANSAC/ICP and
a supplied model, or have these been bested by learned models somehow?

~~~
gtmtg
Don't think it's exactly what you're talking about (I'm sure there are other
works much closer to what you have in mind, just can't recall off the top of
my head), but you might find PoseNet
([https://www.cv-foundation.org/openaccess/content_iccv_2015/p...](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Kendall_PoseNet_A_Convolutional_ICCV_2015_paper.pdf))
interesting. It's not explicitly 3D, but it estimates where in a large-scale
scene a picture was taken using an end-to-end convolutional network.

With that said, I think there's still a ton of merit in classical geometric
approaches like ICP — there's a real, geometric basis to why they work.
Convolutional networks can demonstrate some pretty amazing results, but
they're still mostly "black boxes" to us, and a consequence of this is that
it's hard to understand why they work and predict when they'll fail. This blog
post (by the PoseNet author, actually) articulates the viewpoint well:
[https://alexgkendall.com/computer_vision/have_we_forgotten_a...](https://alexgkendall.com/computer_vision/have_we_forgotten_about_geometry_in_computer_vision).
One recent research direction that I personally find really fascinating is
designing deep learning architectures around real geometric properties, e.g.
as in Skydio's deep stereo work:
[https://arxiv.org/pdf/1703.04309.pdf](https://arxiv.org/pdf/1703.04309.pdf)
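
For anyone unfamiliar with ICP, here is a minimal 2D sketch of the idea using
only the Python standard library: repeatedly match each source point to its
nearest target point, then solve in closed form for the rigid transform that
best aligns the matched pairs. The function names are hypothetical, the
brute-force nearest-neighbor search is only sensible for tiny point sets, and
real pipelines work in 3D, reject outliers, and need a reasonable initial guess.

```python
import math


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src points onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    s_dot = 0.0
    s_cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        s_dot += ax * bx + ay * by      # drives the cos(theta) term
        s_cross += ax * by - ay * bx    # drives the sin(theta) term
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that carries the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty


def apply_rigid_2d(points, theta, tx, ty):
    """Rotate points by theta (radians) about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp_2d(src, dst, iterations=20):
    """Align src to dst by alternating nearest-neighbor matching and rigid fitting."""
    current = list(src)
    for _ in range(iterations):
        matched = [
            min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
            for p in current
        ]
        theta, tx, ty = best_rigid_2d(current, matched)
        current = apply_rigid_2d(current, theta, tx, ty)
    return current


# Toy example: an L-shaped "model" and a slightly rotated and shifted copy of it.
model = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
scan = apply_rigid_2d(model, 0.1, 0.05, -0.03)
aligned = icp_2d(model, scan)
error = max(math.dist(a, b) for a, b in zip(aligned, scan))
print(error)
```

The closed-form step is where the geometric basis shows: every iteration
minimizes a well-defined squared-distance objective, so the failure modes (bad
initial guess, wrong correspondences) are comparatively easy to reason about.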

~~~
mncharity
PoseNet on TensorFlow.js does nice head tracking. One can get rough head pose
from nose/eyes/ears, but it's crufty.

[1] Web-browser demo: [https://storage.googleapis.com/tfjs-models/demos/posenet/cam...](https://storage.googleapis.com/tfjs-models/demos/posenet/camera.html)
[2] GitHub: [https://github.com/tensorflow/tfjs-models/tree/master/posene...](https://github.com/tensorflow/tfjs-models/tree/master/posenet)
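
To give a sense of the "rough head pose from nose/eyes" idea, here is a small
heuristic sketch in Python. It is not PoseNet's API and not a calibrated
estimate: the function name, the keypoint format ((x, y) pixels, y pointing
down), and the convention that `left_eye` is the eye appearing further left in
the image are all assumptions. Roll comes from the tilt of the eye line, and
the yaw value is only a unitless proxy (nose offset from the eye midpoint,
measured along the eye line, in units of inter-eye distance).

```python
import math


def rough_head_pose(left_eye, right_eye, nose):
    """Return (roll_degrees, yaw_proxy) from three 2D keypoints given as (x, y)."""
    ex = right_eye[0] - left_eye[0]
    ey = right_eye[1] - left_eye[1]
    eye_dist = math.hypot(ex, ey)
    if eye_dist == 0:
        raise ValueError("eye keypoints coincide")
    # Roll: angle of the line through the eyes relative to the image x-axis.
    roll_degrees = math.degrees(math.atan2(ey, ex))
    # Yaw proxy: project the nose offset from the eye midpoint onto the eye line.
    mx = (left_eye[0] + right_eye[0]) / 2.0
    my = (left_eye[1] + right_eye[1]) / 2.0
    yaw_proxy = ((nose[0] - mx) * ex + (nose[1] - my) * ey) / (eye_dist ** 2)
    return roll_degrees, yaw_proxy


# Made-up keypoints: level eyes, nose shifted 20 px toward the right eye.
print(rough_head_pose((100.0, 100.0), (200.0, 100.0), (170.0, 160.0)))
```

A yaw proxy near zero suggests a roughly frontal face, assuming a symmetric
face and accurate keypoints; mapping it to degrees would need calibration.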

------
ajmarcic
Has any progress been made towards single view 2D -> 3D inference?

~~~
andreyk
Yes, a ton! I think the latest exciting work is pixel2mesh:
[https://arxiv.org/abs/1804.01654](https://arxiv.org/abs/1804.01654). You can
follow the citations in there for other relevant recent work.

