
3D photographic inpainting from a single source image - dsr12
https://shihmengli.github.io/3D-Photo-Inpainting/
======
echelon
Just take a look at the historical photos results! That's one of the coolest
things I've seen in well over a year. I've come across most of these photos
before in textbooks and on the web, but this demo makes all of those
historical figures and moments in time feel as real as if they were happening
in the world today. It's much more pronounced than colorizing black and white
photos. I feel _connected_.

I'm convinced that the folks that present at SIGGRAPH are capable of nothing
short of black magic wizardry. I can understand the technology being used, but
my visual cortex only sees the impossible becoming real.

This is truly deserving of the word _awe_ some, because I am in awe.

~~~
fxtentacle
Problem is, this paper won't be able to produce those results from historical
photos. I find them quite misleading, to be honest.

You'll need to either use a separate 3D depth estimation AI, or (more likely)
have someone do a manual stereoscopic 3D conversion of your historical image.
Only then (when you have depth data) can the algorithm presented in this paper
start its work.

~~~
johndough
> You'll need to either use a separate 3D depth estimation AI

They seem to be using MiDaS ([https://github.com/intel-isl/MiDaS](https://github.com/intel-isl/MiDaS)) for depth estimation, which does a reasonable job on a random image pulled from Pixabay: [https://i.imgur.com/IfbeaqY.jpg](https://i.imgur.com/IfbeaqY.jpg)
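
For anyone who wants to try it, the torch.hub recipe is roughly this (written from memory of the MiDaS README, so double-check the repo for the exact entry points; "photo.jpg" is whatever image you want a depth map for):

    import cv2
    import torch

    # Load MiDaS and its preprocessing transforms from torch.hub
    midas = torch.hub.load("intel-isl/MiDaS", "MiDaS")
    midas.eval()
    transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

    img = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
    batch = transforms.default_transform(img)

    with torch.no_grad():
        pred = midas(batch)

    # Resize the inverse-depth prediction back to the input size
    depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2], mode="bicubic",
        align_corners=False).squeeze().cpu().numpy()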

~~~
fxtentacle
That looks good, indeed :)

~~~
fxtentacle
In the meantime, I tried out MiDaS on other images and it failed most of the time. It really only works for those stock-ish pictures with a clear foreground and background, and I believe it mainly just detects bokeh blur rather than actually understanding the scene.

~~~
jbhuang0604
Yes, you are right. Existing single-image depth estimation models are far from perfect. We hope to see future development in that direction to further improve the visual quality of 3D photos.

------
ladberg
If you like this then you'll love neural radiance fields:
[http://www.matthewtancik.com/nerf](http://www.matthewtancik.com/nerf)

~~~
IshKebab
Very cool! Not quite the same, because they use 20-50 input images and this uses one. But it looks like it might be really useful for photogrammetry; traditional photogrammetry software sucks.

A very cool application using Matterport that I saw recently is scanning Airbnbs, e.g. [https://breconretreat.co.uk/accommodation/swn-y-nant/floor-plan/#content](https://breconretreat.co.uk/accommodation/swn-y-nant/floor-plan/#content)

------
peteforde
I wonder if this can be tweaked to generate the 45-perspective quilt files required to feed it into a Looking Glass.
[https://lookingglassfactory.com/](https://lookingglassfactory.com/)

~~~
mrfusion
Wow how does that work?

~~~
modeless
It's the same principle as those plastic toys with printed 3D images that
you've surely seen:
[https://www.youtube.com/watch?v=jIfAi_zJ2F4](https://www.youtube.com/watch?v=jIfAi_zJ2F4)

~~~
peteforde
Yeah, just like that.

Except that there are 45 distinct planes, and it interfaces directly with Unity, Unreal, and Three.js in minutes.

Sorry if I'm reading too deeply into your note. There are just so many haters.
Meanwhile, I backed these guys on Kickstarter, have had a unit on my desk for
18 months and think it's one of the most incredible things I've ever had to
experiment with... and it cost me under $500.

~~~
mrfusion
Can you explain it more? Or is there something I can read up on? It seems like
quite a breakthrough.

~~~
peteforde
Can you be a bit more specific?

Are you asking for more information on
[https://lookingglassfactory.com/](https://lookingglassfactory.com/) or
[https://www.youtube.com/results?search_query=lenticular](https://www.youtube.com/results?search_query=lenticular)
?

~~~
fxtentacle
It's a bit sad that their active resolution is so low, with the 2560x1600 panel being divided into a 9x5 quilt. That works out to roughly 284x320 effective resolution per view on the 8.9" display.
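
Back-of-the-envelope, using the same numbers:

    # Effective per-view resolution of a quilt display (sketch; the
    # 2560x1600 panel and 9x5 grid are the figures quoted above)
    panel_w, panel_h = 2560, 1600
    tiles_x, tiles_y = 9, 5
    print(panel_w // tiles_x, panel_h // tiles_y)  # -> 284 320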

------
anigbrowl
Outstanding work. I've been really impressed with the quality of recent
colorization, perspective, and texture reconstruction tools and think they do
a wonderful job of bringing historical or degraded images 'back to life.' I
wonder if we are headed for a future where many of these tools reside client-
side and image/video transmission and storage can be reduced to a lightweight
stream of vector data.

------
wokwokwok
Take a moment and have a deep breath before you get excited about full 3d
reconstruction with a single image.

That isn't what this is.

Watch the video in the bottom right corner, entitled "Comparison with State of
the art".

Now go and rewatch the examples and actually _look_ at the edges of the
objects as they move the camera 'a bit'. You'll see a tonne of artifacting.
Less, clearly, than the existing state of the art, so I tip my hat to the
efforts here.

...but all this is doing is exposing an 'empty gap' created by the perspective shift and then filling that gap with plausible pixels, essentially the equivalent of Photoshop's content-aware fill.

Since humans don't really pay much attention to edge details, the result is quite plausible.
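
To make the "gap plus fill" point concrete, here's a toy version of the idea (my own sketch, not the paper's method; OpenCV's classical inpainting stands in for their learned, layered inpainting):

    import cv2
    import numpy as np

    def toy_parallax_view(rgb, depth, shift_px=20.0):
        # Shift each pixel horizontally by its disparity (inverse depth),
        # painting farthest-first so nearer pixels win the overlaps.
        h, w = depth.shape
        disparity = (shift_px / np.maximum(depth, 1e-6)).astype(int)
        out = np.zeros_like(rgb)
        holes = np.full((h, w), 255, dtype=np.uint8)
        order = np.argsort(depth, axis=None)[::-1]  # farthest first
        for y, x in zip(*np.unravel_index(order, depth.shape)):
            nx = x + disparity[y, x]
            if 0 <= nx < w:
                out[y, nx] = rgb[y, x]
                holes[y, nx] = 0
        # The disocclusions ("empty gaps") get hallucinated content
        return cv2.inpaint(out, holes, 3, cv2.INPAINT_TELEA)

Run it on any RGB image plus a depth map and you'll see exactly the edge artifacts I'm talking about.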

To quote the paper:

> In this work we present a new learning-based method that generates a 3D
> photo from an RGB-D input.

I.e., this work takes a _depth image_ as _the input_ and works on that to generate a 3D photo, rebuilding the full content of each _2D layer_ in the image.

I.e., the output is not a 3D model; it is a series of 2D images at depth intervals, where the occluded content in each layer is in-painted (i.e. generated artificially).

(NB. The 'from a single source image' work used here is not novel; they're
just using existing approaches to estimate a depth image)

~~~
saagarjha
> all

It might be "all" that it's doing, but it does it quite well and in a way that
is quite believable, which is significantly better than what came before,
which makes it almost realistic. That's what you said, but the way you said it
felt like it lessened the achievement.

~~~
wokwokwok
> which is significantly better than what came before

It's a bit better. That's the point I'm making; it's an incremental improvement on an existing process. Read the actual paper, e.g. under 'Quantitative comparison'.

If you think I'm belittling the effort, I'm sorry; that's not my intention. But, for example, the other comments talking about using it to generate a full 3D model to display on a Looking Glass surface, or in VR, display a total lack of understanding of what has been achieved here.

------
Robotbeat
Their example images are still 2D, but this is potentially very interesting for making 360-degree videos much more immersive in virtual reality. (There may be some extra steps needed to ensure consistent rendering across adjacent frames.)

------
jsilence
Would be awesome if this could be integrated into gallery software as a 3D Ken Burns effect. The artificial camera wouldn't have to move much, so the inevitable artifacts would be far less visible.

------
fxtentacle
The main issue I see with this work is that it requires RGB-D data, meaning you have to do a lidar scan or something similar to measure the depth map. Alternatively, you could pay someone to draw one by hand, but that takes a long time.

So basically, if you have impossible-to-get input data, this network can do its magic.

What it then does is hallucinatory inpainting, something like Photoshop's content-aware fill. If there's a tree in your photo, it will make up a fake background behind it, so that you could move or remove the tree without things looking weird.

~~~
yorwba
> So basically, if you have impossible to get input data, this network can do
> its magic.

Except they evidently got the input data for the examples in the paper, so it
can't be impossible to get.

They cite at least two different methods for adding depth information to a
single image to generate the necessary RGBD data, different views of which can
then be rendered with their inpainting applied:

- [https://arxiv.org/abs/1907.01341](https://arxiv.org/abs/1907.01341)

- [https://research.cs.cornell.edu/megadepth/](https://research.cs.cornell.edu/megadepth/)

~~~
fxtentacle
Yes, you can always create that data by hand, but it's too expensive and, hence, impossible to scale.

As for using other AIs, they tend not to work well on more complex images.

But in any case, getting the data you need to be able to use this paper is very challenging.

Edit: I should probably say that I have hands-on experience with MegaDepth and MiDaS and that it was underwhelming. Both of them assume a depth gradient running from near at the bottom of the image to far at the top, and both assume that the optical variation will be in the foreground. A photo of a dining table from the side is already enough to confuse both of them.

------
ConradKilroy
This looks like Lytro
[https://en.wikipedia.org/wiki/Lytro](https://en.wikipedia.org/wiki/Lytro)

~~~
ladberg
Kind of an interesting tangent, but the founder of Lytro (Ren Ng) is the
advisor on a really cool paper that came out recently and reminds me a lot of
this: [http://www.matthewtancik.com/nerf](http://www.matthewtancik.com/nerf)

Basically, it's similar to this but also re-lights reflective parts at the
cost of needing more than one source image.

------
signaru
Now we just need an automated image stabilizer for those shaky videos...

Just kidding. This is awesome! Just imagine the possibilities with enough
computing power.

------
frfl
Anyone with ML development experience know what changes are needed to make
this work on a CPU without a CUDA GPU? Seems heavily coupled to CUDA.

~~~
johndough
Here is a quick hack to make it work on CPU: [https://github.com/983/3d-photo-inpainting](https://github.com/983/3d-photo-inpainting)

Maxed out at 4 GB RAM for 256x144 images for me.

PyTorch CPU installation instructions: [https://pytorch.org/get-started/previous-versions/](https://pytorch.org/get-started/previous-versions/)

OpenCV should work without CUDA. If not, build from source with the `WITH_CUDA` flag disabled (`-DWITH_CUDA=OFF`).
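
The core of a port like that is usually just replacing hard-coded CUDA calls with a device fallback, roughly like this (a generic sketch, not the fork's actual diff; "checkpoint.pth" is a placeholder path):

    import torch

    # Pick CUDA when present, otherwise fall back to CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # map_location lets a checkpoint saved on a GPU machine load on a
    # CPU-only one ("checkpoint.pth" is a placeholder, not the real file)
    state = torch.load("checkpoint.pth", map_location=device)
    # then: model.load_state_dict(state); model.to(device)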

------
mopierotti
It would be interesting to see this applied as a photo viewing app for VR, or
applied to videos.

------
akdor1154
Is this what is already implemented on Facebook's newsfeed for certain photos?

~~~
viggity
I can't speak for all photos, but I'm fairly certain those 3D-looking photos on FB must be taken with a stereoscopic camera (2+ lenses). They can calculate depth when they know the distance between the lenses (and the differences in focal lengths, etc.).

That isn't to say they couldn't do it retroactively for single-lens photos, but I'm guessing right now they're not.
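
The relation is the standard pinhole-stereo one (illustrative numbers, not FB's actual calibration):

    # depth = focal_length * baseline / disparity
    f_px = 1000.0        # focal length, in pixels (made-up figure)
    baseline_m = 0.012   # distance between the two lenses, metres (made-up)
    disparity_px = 24.0  # horizontal shift of a feature between the views
    print(f_px * baseline_m / disparity_px)  # -> 0.5 metres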

------
felixyz
The experience of Mark Twain's shoes was worth it for me. So dapper.

------
amelius
This would be great as a CSS scroll effect /s

------
muglug
Very cool work, but it's a little jarring to see photos of segregation, war
and famine being used to show off the algorithm
([https://filebox.ece.vt.edu/~jbhuang/project/3DPhoto/3DPhoto_...](https://filebox.ece.vt.edu/~jbhuang/project/3DPhoto/3DPhoto_Legacy.mp4))

~~~
craftinator
Please do not allow me to see that which I do not want to see.

~~~
sp332
It's weird to use negative content in a promotional context. I agree with
Robotbeat but it's still fair to say that it's "jarring".

