
Deep3D: Automatic 2D-to-3D Video Conversion with CNNs
http://dmlc.ml/mxnet/2016/04/04/deep3d-automatic-2d-to-3d-conversion-with-CNN.html
======
Animats
This seems to be an AI-based de-lamination algorithm, and those are useful.
You'd like to be able to take video frames and break them down into layers
that reflect what occludes what. That's not necessarily metric depth; a
single object that recedes into the distance, such as a railroad rail,
belongs in one layer. It's relative depth: what's in front of what.
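A crude sketch of that idea, assuming the only available cue is motion parallax from a translating camera (larger apparent motion ≈ closer, hence occluding). The function name and the equal-width binning are my own illustration, not anything from the linked work:

```python
import numpy as np

def delaminate_by_parallax(flow_mag, n_layers=3):
    """Bucket pixels into relative-depth layers by motion parallax.

    Assumes a translating camera, so larger apparent motion means a
    closer surface. Returns an integer layer map: 0 = frontmost.
    """
    # Equal-width thresholds between the slowest and fastest motion
    edges = np.linspace(flow_mag.min(), flow_mag.max(), n_layers + 1)[1:-1]
    # Larger motion -> lower layer index (closer, occludes the rest)
    return n_layers - 1 - np.digitize(flow_mag, edges)

# Toy frame: a fast-moving foreground blob over a slow background
flow = np.full((4, 4), 0.5)
flow[1:3, 1:3] = 3.0
print(delaminate_by_parallax(flow, n_layers=2))  # blob -> layer 0, rest -> 1
```

Real de-lamination would of course need segmentation and occlusion reasoning on top; this only shows the relative-ordering step.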

Systems for this should be trained on data with lots of things moving in
different directions, occluding each other. Football games, for example.
Pedestrians in busy intersections.

This sort of thing is a good front end for many image understanding
applications. You can also do frameless video compression. Framefree did this
a few years ago, but the de-lamination front end of the compressor needed more
work. We have more CPU power now; it's time to revisit that.

------
wutf
This is a negative result :) (the method doesn't work)

------
rasz_pl
More like an automatic blurry copy of the original image :( All the examples
are single frames from movies, so why bother when you can extract accurate 3D
using a multitude of SfM methods (PTAM/SVO/etc.) on whole scenes?
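For reference, the core step those SfM methods share is two-view triangulation. A minimal linear (DLT) sketch with numpy, using toy cameras with identity intrinsics, not actual PTAM/SVO code:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views.

    P1, P2: 3x4 camera projection matrices; x1, x2: (u, v) pixel coords.
    Returns the 3D point in the common world frame.
    """
    # Each observation contributes two linear constraints on X
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the null vector of A (last row of V^T)
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two cameras; the second is shifted one unit along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1); x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))  # recovers ~[0.5, 0.2, 4.0]
```

Of course this needs two genuinely different viewpoints, which is exactly what a single still frame doesn't give you; that's the trade-off being argued about here.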

~~~
ericjang
I think the authors chose the wrong narrative in the motivation for the
paper. Anyway, it's easier to just shoot a movie / VR content with multiple
cameras than to go through it frame by frame making sure the AI isn't
introducing blurry regions, etc.

However, being able to extract 3D scene features from a still 2D image is
super useful. Perhaps in the near future, inference models for image-based
tasks (e.g. ImageNet, robotics) will make use of 3D features learned during
end-to-end training.

The disparity predictions generated here remind me of spatial transformer
layers [http://arxiv.org/abs/1506.02025](http://arxiv.org/abs/1506.02025)
which DeepMind is using pretty heavily in their sequential attention models. I
wonder if they would offer an improvement over the deconv layers.
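If I'm reading the paper right, Deep3D doesn't regress disparity directly: it predicts a per-pixel probability distribution over candidate disparities and renders the right view as a soft (differentiable) selection over shifted copies of the left image. A rough numpy sketch of that selection layer; the wrap-around from np.roll at image borders is my simplification:

```python
import numpy as np

def render_right_view(left, disparity_probs):
    """Soft selection over horizontal shifts (Deep3D-style selection layer,
    as I understand it).

    left: (H, W) image.
    disparity_probs: (D, H, W), summing to 1 over the D candidate disparities.
    The right view is the expectation over shifted copies of the left image.
    """
    right = np.zeros_like(left, dtype=float)
    for d in range(disparity_probs.shape[0]):
        shifted = np.roll(left, -d, axis=1)  # left image shifted by d pixels
        right += disparity_probs[d] * shifted
    return right

# All probability mass on disparity 1: right view = left shifted one pixel
left = np.array([[0.0, 1.0, 2.0, 3.0]])
probs = np.zeros((2, 1, 4))
probs[1] = 1.0
print(render_right_view(left, probs))  # -> [[1. 2. 3. 0.]]
```

A spatial transformer would replace the fixed bank of shifts with a learned, continuous warp, which is presumably what the comparison above is getting at.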

------
toisanji
You can test the model here:
[http://www.somatic.io/models/oEG0wMkR](http://www.somatic.io/models/oEG0wMkR)
The results seem kind of blurry; I hope there is more room to improve this
model.

------
mrfusion
Our brains do this very well, so there must be some algorithm out there for
it.

~~~
losteric
Well, we have two eyes, and each eye on its own provides some depth cues, in
addition to the depth derived from combining the two perspectives. It's
really a completely different problem (a simplified 3D representation of an
actual 3D world).
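The binocular part reduces to a one-line formula: depth Z = f·B/d, for focal length f, baseline B, and horizontal disparity d. The numbers below are purely illustrative (a roughly human-eye baseline and a notional focal length in pixels), not measured values:

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Depth from binocular disparity: Z = f * B / d.

    f_px: focal length in pixels, baseline_m: distance between the
    eyes/cameras in meters, disparity_px: horizontal disparity in pixels.
    Larger disparity means a closer point.
    """
    return f_px * baseline_m / disparity_px

# ~65 mm baseline, notional focal length of 1200 px, 10 px of disparity
print(depth_from_disparity(1200, 0.065, 10))  # -> 7.8 (meters)
```

The monocular cues (perspective, occlusion, familiar object size) are what a single-image method like Deep3D has to lean on instead, which is why it's a different problem.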

~~~
mrfusion
So you can't tell someone how to navigate a room from a photograph?

