
Recent advances in 3D content understanding - jimarcey
https://ai.facebook.com/blog/pushing-state-of-the-art-in-3d-content-understanding/
======
jcims
Kinda boggles my mind that there hasn't been a stronger push for hardware
support of 3D imaging in phones. It clearly provides more useful information
for analysis. Even when we look at a flat photo, we map it to a 3D projection
in our minds.

~~~
BubRoss
Phones with multiple cameras are starting to become common. Time of flight
sensors that use projected infrared patterns aren't trivial to put in a
phone, so the demand needs to be there. Still, there have been tablets that
have integrated Intel's depth cameras.

~~~
taneq
Nit: Projected patterns are structured light, not time-of-flight. As far as
I’m aware (would love to be wrong!), you can’t do ToF with a traditional CCD
or CMOS sensor, and the resolution is invariably woeful.
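
For intuition, a back-of-the-envelope sketch (my own numbers, not from the
article) of why ToF needs such fast sensors:

    # Time-of-flight principle: depth = (speed of light * round-trip time) / 2.
    C = 299_792_458.0  # speed of light, m/s

    def tof_depth_m(round_trip_s):
        """Convert a measured round-trip time (seconds) to distance (meters)."""
        return C * round_trip_s / 2.0

    # A target 1 m away returns light in ~6.7 nanoseconds, and millimeter
    # precision needs picosecond-scale timing -- hence gated sensors or diodes.
    print(tof_depth_m(6.67e-9))  # ~1.0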

~~~
VikingCoder
What kind of sensors do they use in time-of-flight? Because I think they do
use gated CCD or CMOS...?

~~~
taneq
Hm, last time I checked I thought the sensors were individual diodes but it
looks like you're right, and 'flash LIDAR' cameras do use some funky kind of
CCD. Still relatively low resolution but good to know, thanks!

------
zaroth
The videos showing the algorithm in practice are really nice demos.

I’m curious how big of a step forward this is from the previous state of the
art, and at what computational cost.

Also curious whether the technique scales well to multiple cameras with
overlapping fields of view. That is to say, I assume accuracy can be
increased through sensor fusion in the basic sense of averaging out errors,
but I'm more interested in actually molding a cohesive 3D view of a 360°
environment and understanding that an object at the edge of one camera's
frame is the same object, seen from a different perspective, at the edge of
another's.
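
By the basic averaging case I mean something like inverse-variance weighting
of two overlapping depth estimates (a sketch of my own, not the paper's
method):

    # Fuse two noisy depth estimates of the same point by inverse-variance
    # weighting; the fused estimate is tighter than either input.
    def fuse(d1, var1, d2, var2):
        w1, w2 = 1.0 / var1, 1.0 / var2
        fused = (w1 * d1 + w2 * d2) / (w1 + w2)
        return fused, 1.0 / (w1 + w2)

    print(fuse(10.2, 0.04, 9.9, 0.09))  # -> (~10.11, ~0.028)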

Obviously this seems like it should be extremely useful for AutoPilot.
Compared to the relatively inaccurate positioning of adjacent cars on today's
AutoPilot guidance display, it looks like a big step forward.

I think it’s interesting how the RNN identifies specific types of objects and
then depth-maps them. I assume it can’t just depth-map the whole image
without that first classification step? I’m thinking of the Smart Summon
application, where depth mapping everything around you is pretty crucial and
obviously not entirely working at this point.

------
Darkphibre
I do photogrammetry as a hobby, and would _love_ to see more RGBD cameras on
the market. I've even considered hacking together my own... anyone have
pointers for cost-conscious options?
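
One cheap route I've been eyeing (purely my own sketch): two webcams plus
OpenCV block matching for a rough depth channel.

    # Two calibrated, rectified webcam frames -> disparity map via StereoBM.
    import cv2

    left = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)
    right = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

    # numDisparities must be a multiple of 16; blockSize must be odd.
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left, right)  # int16, disparity scaled by 16

    # With calibration: depth_m = focal_px * baseline_m / (disparity / 16.0)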

A few of the models I've shrunk down and posted:
[https://sketchfab.com/darkphibre](https://sketchfab.com/darkphibre)

~~~
throwaway6734
The new Azure Kinect is really impressive.

I've used the OG Kinect, Kinect v2, and Intel RealSense D435, and it was much
more accurate than all of those.

~~~
ipsum2
Sorry for derailing the thread, but how was the new Azure Kinect's performance
outdoors? I've had trouble finding this information anywhere.

~~~
throwaway6734
np, I have yet to test it outdoors.

------
nojvek
Very impressive research. Just scared of what invasive things Facebook will
use it for.

I hope the researchers are advocating for its ethical use.

------
mrfusion
It really seems like we should be close to having robots that can navigate a
room.

~~~
throwaway6734
You can get pretty far building one yourself using the out-of-the-box ROS
([https://www.ros.org/](https://www.ros.org/)) navigation stack with a
Kinect, an off-the-shelf motor controller, and something like an NVIDIA
Jetson.
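
Once the stack is configured, driving it is a few lines. A minimal sketch
using the standard ROS 1 move_base action API (assumes a map, TF, and a
depth-based costmap are already set up):

    # Send a single navigation goal to move_base (ROS 1, rospy).
    import rospy
    import actionlib
    from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

    rospy.init_node('send_nav_goal')
    client = actionlib.SimpleActionClient('move_base', MoveBaseAction)
    client.wait_for_server()

    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = 'map'
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = 1.0     # 1 m forward in the map frame
    goal.target_pose.pose.orientation.w = 1.0  # facing along +x

    client.send_goal(goal)
    client.wait_for_result()  # the stack plans around obstacles the Kinect sees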

