
3D Object Manipulation in a Single Photograph using Stock 3D Models - nkurz
http://www.cs.cmu.edu/~om3d/
======
mxfh
Does anybody remember _MetaCreations Canoma_, released in 1999?

It also worked with only one photo.

Using known 3rd-party geometries of identifiable objects instead of
reconstructing by hand seems like a very logical extension in retrospect.

[http://www.canoma.com/](http://www.canoma.com/)

[http://digitalurban.blogspot.de/2006/12/great-software-
from-...](http://digitalurban.blogspot.de/2006/12/great-software-from-past-
canoma.html)

As cited both in the paper and by _Canoma_, this 1996 paper by _Paul Debevec_
is really where it all started: _Modeling and Rendering Architecture from
Photographs: A hybrid geometry- and image-based approach_

[http://www.pauldebevec.com/Research/debevec-
csd-96-893.pdf](http://www.pauldebevec.com/Research/debevec-csd-96-893.pdf)

The video is still very impressive:
[https://www.youtube.com/watch?v=RPhGEiM_6lM](https://www.youtube.com/watch?v=RPhGEiM_6lM)

------
drcode
Darn, it looks like it won't be long before photo editing software can (1)
find stock models for all objects in a scene, (2) align them perfectly, (3)
let you manipulate them arbitrarily, and (4) render an output picture, with
all the changes applied, that is virtually indistinguishable from a real
photograph.

Once this happens (and it doesn't look like it'll take long) photography will
no longer be an accurate reference for knowledge about the real world.

~~~
bhouston
> photography will no longer be an accurate reference for knowledge about the
> real world.

I think we passed that point a few years ago. The trick now is lowering the
barrier so that it is accessible to more people.

The state of the art in rendering looks perfect:

[http://www.ronenbekerman.com/inspiration/](http://www.ronenbekerman.com/inspiration/)

Our product, Clara.io, makes it possible to render things like this out in
real-time:

[https://clara.io/view/bee73adb-
ed90-47c0-8048-93accd56ff80/r...](https://clara.io/view/bee73adb-
ed90-47c0-8048-93accd56ff80/render)

~~~
afro88
Still not there for me. The devil is in the detail - the sauce and the
strawberries in the food picture, the books in the desk picture. Also, I still
haven't seen a rendering of a human being that looks genuinely photorealistic.

I used to love that technology was pushing towards this point when I was a
kid. Now it scares me. Maybe I'm getting old..

~~~
thomaseng
These renderings of a girl look pretty damn realistic to me:
[http://vimeo.com/40602544](http://vimeo.com/40602544)

~~~
pja
Those are pictures of an adult, not a child. When you use the word "girl" to
describe an adult woman you're implicitly belittling her. Don't be that guy.

~~~
goblin89
> Those are pictures of an adult, not a child. When you use the word "girl" to
> describe an adult woman you're implicitly belittling her. Don't be that guy.

Sorry, I'm not a native English speaker (so I'm not confident enough to
downvote or anything), but judging this use of “girl” as the female version
of “boy”, while ignoring the overall register, doesn't seem fair to me. I
would have no objections if thomaseng's comment were more formal:

> I would consider these pictures of a girl quite lifelike

But it's not.

If I were the author, and the pictures were of a man, I'd totally say “guy”.
Once you flip the gender, “guy” seems to become “girl”, not “woman”. (Again,
given the overall informal style used.)

And as for the word “guy”, it doesn't sound in any way belittling to a
grown-up man (and you just used it yourself).

~~~
pja
I would say that the male equivalent of "girl" is "boy", not "guy".

The English language is often unhelpful here: a word that exists for one
gender may have no exact equivalent for the other, or else the equivalent
carries other connotations. Master vs. mistress, for instance.

~~~
barrkel
And the female equivalent of guy is girl. The word 'girl' genuinely has more
shades than the submissive little box you want to put it in.

------
benwen
Reminds me of the Running Man (1987) scene where, in supposed real-time, a
video production editor synthetically composes Arnold Schwarzenegger's and
Jesse Ventura's characters together in a deathmatch. One would have to go from
rigid-component origami birds on static frames in this CMU paper to semi-solid
human figures on moving frames in the movie. 3D models of famous actors'
bodies are already made for special effects, painstakingly rendered and
composited together in batch mode.

(Personal recollection: there was a solid model of Shaq's head at the 3D
modeling company Viewpoint Datalabs back in the day. His head is _huge_.)

Stills from Running Man taken at about 01:19 -
[http://imgur.com/rQlxigG](http://imgur.com/rQlxigG)

------
bhouston
This is a neat approach. Basically it is a combination of:

(1) Fitting 3D stock models to objects in the photograph using a simple but
interactive ray-casting approach.

(2) Estimating soft lighting on objects fairly convincingly.

(3) Re-rendering the stock models using the estimated lighting and textures
of the original photographs.

There are real limitations to this, but I think the automated lighting
estimation is just cool and has wide applications in the visual effects
space.

~~~
modulus1
And they appear to have some way to fill in the part of the photo occluded by
the cut-out objects.

~~~
bhouston
They never seem to mention that in the paper, at least not prominently
(though admittedly I only skimmed it today). But Photoshop already has a
built-in tool for this, so I guess they can just use the standard methods,
which seem to work fairly well.

~~~
smackfu
"We compute a mask for the object pixels, and use this mask to inpaint the
background using the PatchMatch algorithm [Barnes et al. 2009]"

PatchMatch algorithm:
[http://gfx.cs.princeton.edu/pubs/Barnes_2009_PAR/index.php](http://gfx.cs.princeton.edu/pubs/Barnes_2009_PAR/index.php)
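(For the curious: PatchMatch's contribution is making the nearest-patch
search fast, by sampling candidates randomly and propagating good matches
between neighbouring pixels. Here is a naive brute-force sketch of the
underlying fill idea in Python/NumPy, just to show what the search is
computing; it is not the PatchMatch algorithm itself, and real inpainting
works on colour images and larger patches.)

```python
import numpy as np

def naive_patch_fill(img, mask, patch=3):
    """Fill each masked pixel with the centre of the best-matching
    source patch from the unmasked region, by brute-force search.
    PatchMatch reaches this kind of answer orders of magnitude faster
    by randomized sampling and propagation instead of scanning all
    candidate patches."""
    h, w = img.shape
    r = patch // 2
    out = img.astype(float).copy()
    for y, x in np.argwhere(mask):
        tgt = out[y - r:y + r + 1, x - r:x + r + 1]
        valid = ~mask[y - r:y + r + 1, x - r:x + r + 1]
        best_val, best_d = out[y, x], np.inf
        for sy in range(r, h - r):
            for sx in range(r, w - r):
                if mask[sy, sx]:
                    continue  # don't copy from the hole itself
                src = out[sy - r:sy + r + 1, sx - r:sx + r + 1]
                if src.shape != tgt.shape:
                    continue  # target patch was clipped at a border
                # distance over the known (unmasked) target pixels only
                d = np.sum((src[valid] - tgt[valid]) ** 2)
                if d < best_d:
                    best_d, best_val = d, out[sy, sx]
        out[y, x] = best_val
    return out
```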

~~~
bhouston
Yes, that is the one that Photoshop adopted and renamed "Content-Aware
Fill"! Details:
[http://www.adobe.com/technology/projects/patchmatch.html](http://www.adobe.com/technology/projects/patchmatch.html)

------
dm2
This is very impressive, but were the fingers behind the paper crane drawn in
by hand? I don't see how any algorithm could create that kind of content.

I'd really like to see a video of someone starting with an image and using
these algorithms and tools to create one of these effects from start to
finish.

~~~
nkurz
In the full paper they say "We use a separately captured background photograph
for the chair, while for all other photos, we fill the background using
Context-Aware Fill in Photoshop."

So I think the fingers were indeed filled in algorithmically. This is
plausible since, as best as I can tell, current Content-Aware Fill
algorithms are based on magic.

Follow some of the links here for examples and detailed explanation:
[http://www.adobe.com/technology/projects/content-aware-
fill....](http://www.adobe.com/technology/projects/content-aware-fill.html)

~~~
dchichkov
Yes, they are based on magic!

As per Wikipedia: "Imagination, also called the faculty of imagining, is the
ability to form new images and sensations that are not perceived through
senses such as sight, hearing, or other senses." Imagination is magic.
Everyone knows that. So, in general, any generative model is ;)

~~~
dm2
"Any sufficiently advanced technology is indistinguishable from magic."

[https://en.wikipedia.org/wiki/Clarke's_three_laws](https://en.wikipedia.org/wiki/Clarke's_three_laws)

------
wildpeaks
If you like this kind of effect, you should also check out VideoCopilot,
because inserting 3D objects on top of reference images or video is a
recurring use case for After Effects (it even ships with a lite version of
Cinema 4D now).

Example with a 3D truck:
[http://www.videocopilot.net/tutorials/3d_truck_compositing/](http://www.videocopilot.net/tutorials/3d_truck_compositing/)

This and Photoshop's Content-Aware Fill (to help fill the holes left by
removing the object from the reference image) are very handy for achieving
such effects.

------
Osmium
Ah, it's time for SIGGRAPH again! Excellent. As a layman, I always look
forward to the new "looks like magic" results that come out of there.

------
bakbek
Take this approach, which is geared towards pre-made 2D still imagery, and
apply it to rendered stills out of a 3D model, and some serious MAGIC can
take place!

In that scenario you already have all the 3D elements in hand, so there is
no need to search for them, and you have the complete environment as well.
Lots of things that used to call for re-rendering could be done with this
approach post-render.

------
phkahler
We need a way to do digital signatures on images such that they cannot be
faked. It should verify the image, location, time, and serial number. I know,
this seems impossible since someone (the camera) needs to know the private key
and that could be compromised.

~~~
terhechte
I've thought about this before, and it is actually pretty easy: you just
compute a hash of the image and push it into the Bitcoin blockchain as a
transaction. Done. The only downsides are that it costs a wee bit of money
and that you need to be connected to the internet at the point when you
shoot the image (or at the point when you want the image verified). See:

[http://www.proofofexistence.com/about](http://www.proofofexistence.com/about)
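Minus the blockchain transaction itself, the client side is just a file
digest. A minimal sketch in Python (assuming a SHA-256 digest, which is the
usual choice for this kind of timestamping):

```python
import hashlib

def document_digest(path):
    """Compute a SHA-256 digest of a file, the value a timestamping
    service would anchor in a Bitcoin transaction. Anyone holding the
    original file can recompute the digest later and check it against
    the one recorded in the blockchain at a known date."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # read in chunks so large images don't need to fit in memory
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()
```

Note this only proves the file existed unmodified at registration time, not
where or how it was produced (as pointed out below).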

~~~
bhouston
That only establishes the date of registration, not whether it was taken with
a camera at a specific location on a specific date.

~~~
terhechte
Oh right, I had only thought about proofing that an image hasn't been tampered
with after a certain date, but of course it could have been tampered with
before that.

------
stevebot
This is cool. As a non-photo editor, can someone explain this statement to
me?

"the user (c) interactively aligns the model to the photograph and provides a
mask for the ground and shadow"

What is a mask for ground and shadow and how hard is it to develop one?

~~~
bhouston
A mask in photoediting is usually a gray scale image that is white where you
want to remove things and black where you want to keep things and often is in-
between where you want soft edges.

To create these in photoshop you can use the magic wand tool and it selects
things with similar colors. But you can create these types of masks in a
variety of ways.

------
jgreen10
Why do the legs of the rendered chair connect at the bottom, when they are
hidden in the picture and the stock 3D model does not show them?

~~~
DanBC
They use publicly available 3d models. I'm more interested in how they get the
colour of the hidden legs and hidden base of the chair correct.

~~~
3rd3
They fill hidden parts according to symmetries they find in the object’s
geometry, or fall back on the model’s textures or user-defined input if
there is no symmetry.

 _> For areas of the object that do not satisfy the criteria of geometric
symmetry and appearance similarity, such as the underside of the taxi cab in
Figure 1, the assignment defaults to the stock model appearance. The
assignment also defaults to the stock model appearance when after several
iterations, the remaining parts of the object are partitioned into several
small areas where the object lacks structural symmetries relative to the
visible areas. In this case, we allow the user to fill the appearance in
these areas on the texture map of the 3D model using PatchMatch._

------
jbhatab
I tried downloading and running this, but both had issues. Is there a
mailing list I can get on for when it is officially launched?

------
syshen
You can also check out this app,
[https://itunes.apple.com/us/app/insta3d-instantly-create-
you...](https://itunes.apple.com/us/app/insta3d-instantly-create-
your/id883125430?mt=8), which turns a person's selfie picture into a 3D
avatar model.

------
nileshtrivedi
So if you have a 3D model of a person, you can make completely fake videos of
them doing something?

~~~
ygra
I guess that was possible beforehand. Just take a look at movies.

