
3-D Depth Reconstruction from a Single Still Image (2007) [pdf] - Jasamba
http://www.cs.cornell.edu/~asaxena/learningdepth/ijcv_monocular3dreconstruction.pdf
======
GrantS
This is some very cool research, though many may be surprised to learn that it
is an IJCV journal article from 2007, based on a conference paper from NIPS
2005.

Source:
[http://www.cs.cornell.edu/%7Easaxena/learningdepth/](http://www.cs.cornell.edu/%7Easaxena/learningdepth/)

~~~
TheArcane
Gah! I thought this was cutting edge. Is this research used in any recent
work?

~~~
Xcelerate
Definitely. It's been cited 658 times since it was published:

[https://scholar.google.com/scholar?oi=bibs&hl=en&cites=18064...](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=18064180945750989807,18259224307089260510)

------
Xcelerate
As an aside, I really enjoy it when people post articles like this on HN.
Technical, but not too niche, and relevant to today's interests in technology.

------
jvhaarst
The video is a nice watch, especially since this is from 2007, so the state of
the art should be even better by now.
[https://www.youtube.com/watch?v=UZ7_ED9g4FY](https://www.youtube.com/watch?v=UZ7_ED9g4FY)

~~~
pen2l
Google Tango ([https://www.google.com/atap/project-tango/](https://www.google.com/atap/project-tango/))
is able to do pretty impressive 3D reconstruction from what I hear. However,
it has a number of cameras, whereas the linked technique can work with just a
single image (with enough training).

~~~
leeoniya
Reconstructing depth from 2+ offset images is vastly simpler than from a
single frame.
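
For the curious: with two calibrated, rectified cameras the hard part is just
matching pixels between the views; once you have a disparity, depth falls out
of simple triangulation. A minimal sketch in Julia, with made-up numbers that
are not from the paper:

    # depth from stereo disparity, assuming calibrated and rectified cameras
    # f = focal length in pixels, B = baseline in metres, d = disparity in pixels
    depth_from_disparity(f, B, d) = f * B / d

    # e.g. f = 700 px, B = 0.10 m, d = 20 px  =>  3.5 m
    depth_from_disparity(700.0, 0.10, 20.0)

The single-image case has no second view to triangulate against, which is why
it needs learned priors instead.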

~~~
pen2l
Yup, but I think there's a desire for good 3D reconstruction techniques that
work on single images, because in practice you often want to do reconstruction
in extremely small environments where using multiple cameras often isn't
feasible.

~~~
leeoniya
Google Camera can already do this from a single camera. It takes a bunch of
photos as you move the phone around a bit and reconstructs the depth from
multiple shots.

[http://googleresearch.blogspot.com/2014/04/lens-blur-in-new-...](http://googleresearch.blogspot.com/2014/04/lens-blur-in-new-google-camera-app.html)

------
pen2l
I wish these groups would make their code available so others could play with
it and test it out. Some do, but I wish more did.

I would be curious to see the performance of this 3d depth reconstruction
technique for non-rigid environments.

~~~
rboyd
Yeah! Having spent much of the last year plowing through computer vision
papers, I've found it's pretty rare to encounter published code or even
datasets.

Why is this? Do people spend all their time polishing the prose and neglect
the code? Or do they keep it closed for commercial reasons? Or is it just a
cultural thing across academia?

~~~
Xcelerate
Speaking from experience, the code is normally functional but unsightly, so it
would be embarrassing to put it online. Most researchers are happy to email
you the code if you ask for it, though.

Most of my code is for scientific computing and consists of a bunch of one-off
Jupyter notebooks and Julia scripts that are written with two objectives in
mind: get-the-task-done-before-my-advisor-meeting and make-it-as-fast-as-
possible. Good coding practices and code cleanliness are not priorities. And
particularly for code that needs to run on HPC clusters or supercomputers,
it's really hard to make it look nice — one could easily spend weeks designing
a proper, modular system only to throw it all away in favor of using a better
algorithm that was just published. Anecdotally, the vast majority of my code
has produced results that were later discarded.

For example, I have a function named _YPXt_. What does such a woefully named
subroutine do? It multiplies a matrix Y by a permutation matrix P, and then
multiplies that by another (transposed) matrix X. The matrices are small, so
the overhead of using BLAS for the task is wasteful. It's better to use
Julia's macro system to generate the necessary for loops on the fly and let
the compiler SIMD-ify the result. Also, permutation matrix multiplies can be
done in a way that doesn't actually involve creating a full permutation
matrix. And of course the subroutine has side effects — in order to conserve
memory I need to overwrite the input matrices instead of allocating new ones.
Immutable data structures may work great for functional programming, but the
overhead of red-black trees and stateless programming is too high for
numerical data crunching.
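
To make that concrete, here is a rough sketch of what a hand-rolled YPXt might
look like; this is my own illustration rather than the actual code. The
permutation is kept as an index vector instead of a dense matrix, the loops
are written out so the compiler can vectorize them, and the result goes into a
preallocated buffer:

    # Sketch only: computes out = Y * P * X', with the permutation P represented
    # by the index vector p (column l of P has its single 1 in row p[l]), so no
    # permutation matrix is ever materialized.
    function YPXt!(out::Matrix{Float64}, Y::Matrix{Float64},
                   p::Vector{Int}, X::Matrix{Float64})
        m, k = size(Y)       # Y is m×k, X is n×k, so X' is k×n
        n = size(X, 1)
        @inbounds for j in 1:n, i in 1:m
            s = 0.0
            for l in 1:k
                # (Y*P)[i, l] == Y[i, p[l]], so applying P is just an index lookup
                s += Y[i, p[l]] * X[j, l]
            end
            out[i, j] = s
        end
        return out
    end

The real routine presumably overwrites one of its inputs rather than taking an
out argument, per the point about side effects above.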

On top of that, a lot of scientists in academia are unfamiliar with tools like
Git, so manually versioning the code and emailing it back and forth is
unfortunately a necessary part of the collaboration process.

~~~
dr_zoidberg
I completely agree with you, and I can relate to your "objectives in mind".
But you really need to push to get others using Git.

GitHub's GUI client is very easy to use, and with a bit of nagging you can get
even people who don't understand _what_ a VCS is to use it. The first time
they ruin the code and then get the chance to easily roll back to how things
were before, they'll see its value and thank you endlessly for it.

------
AndrewKemendo
Always love seeing computer vision related stuff posted.

Incidentally, we have improved on these techniques (given that this came out
almost a decade ago!) for scale reconstruction with our mobile monocular SLAM
system.

The authors are now killing it worldwide (obviously so in Ng's case), working
on applications.

------
jmcmahon443
tl;dr: You can combine monocular NN CV techniques with stereo techniques for
good, cheap results. AKA: the future of SLAM.

~~~
dheera
I do not think this kind of feature-based estimation of depth will be the
future of SLAM. Point this algorithm at a scale model or a framed photograph
and everything will go haywire. It's great for scene understanding, where you
point it at a photograph or a piece of artwork and want it to understand
what's going on in the image. But it's not for mission-critical mapping and
navigation.

Depth cameras, which are getting cheaper, will be the future of SLAM. SLAM
algorithms for depth cameras are also a _lot_ simpler to write. And with depth
cameras, you're not estimating how far away objects are, you're actually
measuring their distance. Data beats estimation.

~~~
jmcmahon443
Depth/stereo cameras are the future, I agree. But the conclusion of this paper
says that you can cascade the results of this algorithm with the stereo
algorithm fairly easily.
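
One very hand-wavy way to picture that cascade (not the paper's actual MRF
formulation, just an illustration in Julia): treat the monocular prediction as
a prior and fuse it with the stereo measurement per pixel, weighting each by
how much you trust it:

    # Toy per-pixel fusion of a monocular depth prior with stereo measurements,
    # weighted by inverse variance. Purely illustrative, not from the paper.
    function fuse_depth(d_mono, var_mono, d_stereo, var_stereo)
        w_m = 1 ./ var_mono
        w_s = 1 ./ var_stereo
        return (w_m .* d_mono .+ w_s .* d_stereo) ./ (w_m .+ w_s)
    end

Where the stereo matcher is confident it dominates; where it struggles
(textureless or distant regions), the monocular prior takes over, which is
roughly the appeal of combining the two.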

