
An algorithm that recreates 3D objects from tiny 2D images - janober
https://techcrunch.com/2017/08/23/this-algorithm-cleverly-recreates-3d-objects-from-tiny-2d-images
======
edanm
This article is confusingly written. Its explanations sound silly to anyone
even a little familiar with 3d, since it spends the first half of the article
explaining that this "breakthrough" is "computationally clever and forehead-
slappingly simple", the breakthrough being that you can represent things as
surface-models instead of voxels.

Well, surface rendering is how almost all 3d work is done already, that's
definitely not a breakthrough. You can probably spend an entire career never
dealing with voxels.

What this article never mentions (surprisingly) is that this paper is about
neural networks. I'm not an expert, but as I understand it, voxel
representations have been the standard _specifically when building neural
networks_ to turn 2d images into 3d. The main idea is that you can build a
network that only predicts high-resolution voxels close to the surface of
the model, and predicts very low-resolution voxels everywhere else (say, on
the inside of the model). This means you can represent much larger models in
memory, and also that you don't have to run the NN computations on voxels
that are unlikely to change, since everything "inside" the model was probably
guessed correctly early on.
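The octree-style refinement can be sketched with a toy example (my own construction, not the paper's network - a hardcoded unit sphere stands in here for the learned occupancy prediction): only cells whose corners disagree about occupancy straddle the surface, and only those get subdivided, so the fine-resolution cell count grows with the surface area rather than the volume.

```python
def occupied(x, y, z):
    """Hypothetical ground truth: occupancy of a unit sphere."""
    return x * x + y * y + z * z <= 1.0

def is_boundary(cx, cy, cz, half):
    # A cell straddles the surface if its 8 corners disagree.
    corners = [occupied(cx + sx * half, cy + sy * half, cz + sz * half)
               for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
    return any(corners) and not all(corners)

# Coarse 4x4x4 grid over [-2, 2]^3 (cell half-size 0.5).
half = 0.5
cells = [(-1.5 + i, -1.5 + j, -1.5 + k)
         for i in range(4) for j in range(4) for k in range(4)]

for _ in range(3):  # refine to an effective 32^3 resolution
    boundary = [c for c in cells if is_boundary(*c, half)]
    half /= 2
    # Subdivide each boundary cell into its 8 children (octree step).
    cells = [(cx + sx * half, cy + sy * half, cz + sz * half)
             for (cx, cy, cz) in boundary
             for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

print(len(cells), "surface cells vs", 32 ** 3, "in a dense grid")
```

Running it shows the surface-only refinement ends up with far fewer fine cells than the 32^3 = 32768 a dense grid would need.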

Here's the paper's abstract, much better at explaining itself than the
article:

"Recently, Convolutional Neural Networks have shown promising results for 3D
geometry prediction. They can make predictions from very little input data
such as for example a single color image, depth map or a partial 3D volume. A
major limitation of such approaches is that they only predict a coarse
resolution voxel grid, which does not capture the surface of the objects well.
We propose a general framework, called hierarchical surface prediction (HSP),
which facilitates prediction of high resolution voxel grids. The main insight
is that it is sufficient to predict high resolution voxels around the
predicted surfaces. The exterior and interior of the objects can be
represented with coarse resolution voxels. This allows us to predict
significantly higher resolution voxel grids around the surface, from which
triangle meshes can be extracted. Our approach is general and not dependent on
a specific input type. In our experiments we show results for geometry
prediction from color images, depth images and shape completion from partial
voxel grids. Our analysis shows that the network is able to predict the
surface more accurately than a low resolution prediction."

~~~
xg15
I was stumped by that as well. It spends half the text explaining the well-
known parts (surface models) and then mentions the actual contribution
completely in passing:

> _So first his system renders a 3D reconstruction of the 2D image in very low
> resolution [...] Next, do a higher-resolution render of the area you kept._

Apparently the author thinks _producing a 3D reconstruction of a 2D object_
is trivial, even though that's what the paper is about.

~~~
averagewall
That's not what the paper is about. The abstract quoted by the GP shows that
it has been done before. Probably only very recently, so it will be a
surprise to a lot of readers, but it's perhaps old news in machine-learning
years.

~~~
dahauns
It was actually astonishing to me how old a lot of the machine-learning
methods and algorithms are - quite often stuff I worked with at university
almost 20 years ago. The main difference is that you now have orders of
magnitude more computational power and memory. You can throw datasets at the
NN that would have been prohibitively large back then. Stuff I had to reserve
time for on the uni "super"-computer probably runs on your phone nowadays.

Oh my god, I sound like my father.

------
pasta
Well, I'm sorry, but I think more is going on than the article describes.

For example, the red pickup truck. There is no way the algorithm could create
that model from that image alone (the depth of the trunk).

So my guess is that they use the tiny picture to search a database for similar
pictures and then create a model with all that data.

~~~
blauditore
The linked paper talks about CNNs in its abstract, so those were probably
trained with such samples and thus have that knowledge baked in.

------
13of40
I kind of got lost on the first sentence: "With a lifetime of observing the
world informing our perceptions, we’re all pretty good at inferring the
overall shape of something we only see from the side, or for a brief moment.
Computers, however, are just plain bad at it."

I'm good at inferring the shape of something from a side view well enough that
I can use it in some kludgy mental models, but down the page they're
essentially asking the computer to look at something from the side and render
a precise(ish) 3-D mechanical model of the object from memory. For example, I
have what I think is a pretty good idea of what a Boeing 737 looks like, but
if you asked me to draw it on a piece of paper it would look like a
kindergartner did it. What I'm good at is boiling down features and
distinguishing a 737 from an Oscar Mayer wiener truck. Drawing a
scale-accurate picture of it is a job for artists and savants.

~~~
khedoros1
The computer's a savant at precise drawing, but the visualization isn't the
interesting part. The interesting part is having the computer look at a 2D
picture and come up with a "kludgy mental model" of the 3D shape represented
by the picture. From the example images, it infers information more accurately
than I would've expected.

~~~
jackhack
I can't help but wonder what would result from an Escher drawing.

------
fischerq
As lots of people already pointed out: there is more going on than just a
smart upsampling technique.

See section 4 of the paper - they train on a set of 3d models from some
'ShapeNetCore' dataset, from which they generate sample inputs (renderings of
the model with randomized viewpoint and lighting) and corresponding target
outputs (voxelized model).

They train specialized networks for different classes of objects
('aeroplanes', 'chairs' and 'cars'), so reconstructing all classes at the
same time probably still has some issues.

An interesting point about this coarse-to-fine progression that the article
omits: they use the same trick for training their net - first they train it
to predict the coarse voxels, and once those work, they start predicting the
next level.
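That multi-level training needs coarse targets as well as the full-resolution one. A natural way to derive them (my assumption about the setup, not code from the paper) is to max-pool the ground-truth occupancy, so a coarse voxel counts as occupied if any of its eight children is:

```python
def downsample(grid):
    # 2x2x2 max-pool over a cubic occupancy grid (nested lists,
    # indexed grid[x][y][z]): coarse voxel is 1 if any child is 1.
    m = len(grid) // 2
    return [[[max(grid[2 * x + dx][2 * y + dy][2 * z + dz]
                  for dx in (0, 1) for dy in (0, 1) for dz in (0, 1))
              for z in range(m)]
             for y in range(m)]
            for x in range(m)]

# A 4^3 ground-truth voxelization with one occupied corner block.
fine = [[[1 if x < 2 and y < 2 and z < 2 else 0
          for z in range(4)]
         for y in range(4)]
        for x in range(4)]

coarse = downsample(fine)  # 2^3 target for the first training stage
print(coarse)
```

Chaining this gives a target for every level of the hierarchy, from the coarsest grid the net learns first down to the full-resolution voxelization.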

------
Stanleyc23
This paper from a while back accomplishes this too.
[https://github.com/chrischoy/3D-R2N2](https://github.com/chrischoy/3D-R2N2)

~~~
runesoerensen
Thought of that paper too but didn't remember the name, thanks for sharing!
The new paper also references 3D-R2N2 several times and seems to be heavily
inspired by/adapted from it.

------
ricardobeat
Very interesting tech, though the article does a piss-poor job of explaining
anything about how it works. Rendering surfaces is how literally 99.9% of 3D
graphics work, not a "breakthrough" of any sort.

~~~
khedoros1
99.9% of 3D graphics: "Computer, here's a definition of some 3D surfaces and
their properties. Produce a 2D projection."

This: "Computer, here's a 2D projection. Infer the likely 3D data that would
produce the projection."

~~~
ricardobeat
The comparison the article makes is against rendering a full voxel volume, as
if occlusion were a great insight.

------
AliAdams
I'm looking at the second example from the image in the article - the blue
plane - and can't work out how the algorithm could possibly infer a second
wing from that picture.

~~~
johndough
Presumably the neural network was trained on a dataset where all planes had
two wings, so it will predict planes with two wings.

~~~
TuringTest
Yeah, there's definitely some previous knowledge of the kinds of objects it's
inferring, which is used to deduce the parts.

Look at the chairs' legs: it transforms the flat base of the rotating chair
into something that looks like wheels, and it completely misses the bars
connecting the legs of the second tall chair.

------
AndrewKemendo
I'm very excited about the potential for extracting voxel-specific or
procedurally generated 3D objects from a small number of 2D images -
primarily with a combination of Semantic Segmentation and some form of MVS.
The results of papers like this and other similar ones are great progress.

~~~
Aron
Seems like Google Earth is doing this, yeah?

~~~
dharma1
For the 3D Google Earth views? It's photogrammetry from a 5-camera rig on a
low-flying plane.

[https://youtu.be/suo_aUTUpps?t=2m53s](https://youtu.be/suo_aUTUpps?t=2m53s)

~~~
Aron
Alright, pretty old school. As an aside, one of the first things I did with my
Oculus was pull up Google Earth and lie down in Yosemite Valley, which is
featured around 5 minutes into this video. Cheers.

------
ealloc
"Single Particle Reconstruction" for Cryo-EM reconstruction algorithm, anyone?

SPR is an algorithm already used to reconstruct 3D objects from a set of 2D
images produced by an electron microscope, typically of a protein on a flat
surface.

I haven't read the paper here, but I suspect a key part of the algorithm is
that it can only reconstruct objects with symmetry planes: the airplane,
chair and car are all symmetric across an axis. This greatly constrains the
possibilities the algorithm has to search through. In EM reconstruction the
user often specifies what they think the symmetries are beforehand.
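A toy illustration of why a symmetry assumption is such a strong constraint (my own example, not from either paper): if the object is known to be mirror-symmetric, a reconstruction only has to solve for half the voxels and can fill in the rest by reflection.

```python
def mirror_x(grid):
    # Reflect an occupancy grid (a list of x-slices) across the x axis.
    return grid[::-1]

def complete_with_symmetry(solved_half):
    # Build the full grid from the solved half plus its mirror image.
    return solved_half + mirror_x(solved_half)

# A 4x2x2 grid: suppose we only reconstructed the first two x-slices.
solved = [
    [[1, 0], [1, 1]],
    [[0, 0], [1, 0]],
]
full = complete_with_symmetry(solved)
assert full == mirror_x(full)  # the completed grid is x-symmetric
print(len(full), "slices recovered from", len(solved), "solved slices")
```

Halving the unknowns like this is exactly the kind of prior an EM user supplies up front; a neural net presumably absorbs similar regularities from its training set instead.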

------
EamonnMR
Could be interesting to see it run on old game sprites. The Starcraft
Remastered team could have used this, for example.

~~~
figgis
How would copyright and licenses apply to something like this?

~~~
jackhack
Presumably covered under the common "derivative works" clause.

------
vbuwivbiu
want to see results from a Necker cube

