
Total Moving Face Reconstruction - mxfh
http://grail.cs.washington.edu/projects/totalmoving/
======
hunvreus
Can't help but think of "The Running Man" watching Schwarzenegger's face being
rendered in 3D.

It's terrifying to think that in the next 5 to 10 years we won't be able to
distinguish a forged, high-definition video of pretty much anybody from the
real thing.

~~~
canadaj
Terrifying would be continuing to take everything we see as fact in 5 to 10
years. I think we are in a special time where if you see a video of pretty
much anybody, you can usually tell right away if it's fake or not. Sometimes
it's hard, but it can usually be debunked. Before that, you really couldn't
fake any videos without it being obvious.

In 5 to 10 years, hopefully we will have learned to never ever ever take
anything we see as fact, because we absolutely will not be able to distinguish
rendered video from the real thing.

~~~
anigbrowl
True, but the downside of this is that anyone caught in the act will use this
as a defense of first resort - it'll be the 2020s' equivalent of 'my Twitter
account was hacked' until we establish some sort of reliable ELA (error-level
analysis) tool to grade source material.

Indeed, as there are more and more cameras around (including autonomous ones
of increasingly tiny size) imagery of the videographer will probably become a
major authentication factor.

~~~
canadaj
I believe everything you have said is true, especially with cameras becoming
smaller and more common.

However, I'll still be optimistic and hope that with the increasing number of
cameras, people will be less likely to engage in activities where they
shouldn't. The opposite argument would be that with the increasing technology
to fake such an activity, the amount of 'my twitter account was hacked'
incidents will rise.

I guess only time will tell?

~~~
guelo
But the "activities where they shouldn't" will be defined by the powerful.
It's total state or corporate control, depending on your dystopian future
preference.

~~~
kordless
The solution to this problem of pools of power probably lies in deciding what
is and isn't allowed using a continuous, consensus-based method. To enable
that, we need realtime feeds of everything, available to all, and some sort of
filtering mechanism to deliver video to the right people for evaluating
consensus on whatever happens that is contestable. Today's 'powerful' are only
powerful because they control the information. Whatever gets built to improve
upon society needs to ensure it can't be wielded for self-gain or increased
leverage.

~~~
pjc50
Social regulation by Youtube commenters? What fresh hell is this?

Consensus is not sufficient. This is why rights-based approaches are
developed. It has to be OK to live an unpopular lifestyle that's not harmful
to others.

~~~
kordless
I agree, which is why I said "deliver video to the right people for evaluating
consensus on whatever happens that is contestable". That gives you the right
to live a way that is 'contestable' without having to pay a price for it
(because you have a right to it).

------
bsenftner
I run a startup specializing in this space called the 3D Avatar Store
(www.3d-avatar-store.com).

3D reconstruction of human faces is right on the edge of mainstream. I'm
betting on it, personally.

Our system is similar to theirs, but more general: we laser scanned 300,000
real people and then associated each laser scan with dozens of photos of that
person taken from different angles, lighting conditions and expressions. That
data set was then used for neural net training - actually a pipeline of
neural nets.

We can accept one photo and get back a good-quality 3D model, a series of
photos for better quality, or HD-quality video and get back frame-by-frame
expression reconstructions just like their solution. In fact, our system is
able to recover 36 people per video feed in real time, and it can handle 4
video feeds at once. We don't need as much reference information as they do,
because we trained our system to generally understand the human facial form,
whereas their solution operates in isolation on a single reconstruction
operation.

Our current system is targeted as a WebAPI for games and serious simulations -
enabling 3rd parties to implement "put yourself in the game" functionality. As
such we have 3 different geometry outputs aimed at game/simulation developers.
We also do facial recognition, and we have a special "forensic" output for
that.

Our current "best output" is purposely "Pixar-like" rather than realistic.
Making them realistic tends to freak people out - especially women (it seems
our culture has trained women to have an idealized self-image, and when
presented with their non-mirror true form, they don't like it).

You can learn more at these links:
[https://3d-avatar-store.com/Web-API-Features-May-2014](https://3d-avatar-store.com/Web-API-Features-May-2014)
[https://3d-avatar-store.com/3D-Avatar-Creation-walkthru](https://3d-avatar-store.com/3D-Avatar-Creation-walkthru)
[https://3d-avatar-store.com/New-Face-Finder](https://3d-avatar-store.com/New-Face-Finder)

~~~
kemelmi
Very nice! Our work is different, though: we create high-detail moving 3D
models from YouTube videos -- without any manual interaction (it looks like
there is quite a bit of interaction in the creation of an avatar on
3d-avatar-store).

Ira
[http://homes.cs.washington.edu/~kemelmi/](http://homes.cs.washington.edu/~kemelmi/)

~~~
bsenftner
The interaction is primarily there to support users who supply poor-quality
photos. Given a photo taken with an actual lens (not a mobile phone's
pinhole), the manual portion can be skipped. Plus, since we are only exposing
single-photo input (because, given the opportunity to supply multiple photos,
most users supply multiple garbage photos), certain profile features are
difficult to recover. So we have a "3D detailing" interface so people can
adjust their profile and add smile creases and so on. That 3D detailing
interface also allows for exaggeration - which is how the avatar on our home
page is presented.

Your work is very nice as well. Like yours, our video version requires no
manual interaction. It's primarily used by government agencies, and we've not
exposed it to the public yet.

------
phkahler
I like that they show cases where it has problems. It's very much "here's what
we can do, and here's where it doesn't work." There is no hype, no claims of
"novelty", no speculation on uses, just results. I wish this were far more
common.

~~~
Iv
I would still like to see the reconstruction from an angle other than the
original. Depending on how much cheating you accept, it is not too hard to get
a good result from that angle only.

~~~
pimlottc
They demonstrate the reconstruction at different angles in the video at 1:49.

------
anigbrowl
Somewhat off-topic, but I wonder why facial recognition/modeling experts seem
to persistently ignore ears and jawline. As someone who works in film and does
some picture editing (though it's not my primary skillset), I can say that
ears are just as individual as other parts of the face, and they're one of the
trickiest things for makeup artists to work on. As CG in movies and videogames
keeps improving, my suspension of disbelief is often broken by noticing
problems with the ears, e.g. watching a CG anime film and noticing that
everyone has the same ear shape.

~~~
swartkrans
> As CG in movies and videogames keeps improving, my suspension of disbelief
> is often broken

I have never really seen a CG portrayal of people in live-action films that
didn't break suspension of disbelief; it always falls right into uncanny-valley
stiffness. The exception is Avatar, which for whatever reason doesn't
seem to have had the problems other films have.

~~~
forrestthewoods
That's a silly statement. If it didn't break suspension of disbelief then it's
because you didn't even realize it was CG! If you go to movies at all then I
can all but guarantee you there are CG shots that you had no idea were CG.

Here's some super impressive CG from the first Captain America movie back in
2011. Crazy insane stuff they're doing. And that's for a very, very extreme
case. You'd better believe they are doing slick stuff in non-extreme cases!
[http://www.fxguide.com/featured/case-study-how-to-make-a-captain-america-wimp/](http://www.fxguide.com/featured/case-study-how-to-make-a-captain-america-wimp/)

For Iron Man it's not quite the same because it's a suit, but it's just as
impressive. Since Iron Man 2 there has been no full suit worn by an actor.
There are no legs, and at this point there are barely even arms. There's a
chest piece and an open-mask face piece and that's about it.
[http://movies.stackexchange.com/questions/2198/how-are-the-iron-man-suit-scenes-filmed](http://movies.stackexchange.com/questions/2198/how-are-the-iron-man-suit-scenes-filmed)

In The Social Network there are a lot of scenes with the Winklevoss twins.
Spoiler: they didn't use twin actors. They used two actors and CG'd the face
of one onto the other's body. No way in hell did you call this out on first
viewing. (number 3)
[http://www.totalfilm.com/features/50-cgi-scenes-you-didn-t-notice/the-social-network-2010](http://www.totalfilm.com/features/50-cgi-scenes-you-didn-t-notice/the-social-network-2010)

~~~
rlx0x
I think we are talking about CGI faces here, and that's just not doable yet;
this research won't change a thing about it. The last movie I remember having
a CG face was Clu in Tron Legacy, which certainly was state-of-the-art CG.
Unfortunately, 'state of the art'[1] in face modeling/animation is just not
perfect yet.

[1]
[https://www.youtube.com/watch?v=CvaGd4KqlvQ](https://www.youtube.com/watch?v=CvaGd4KqlvQ)

~~~
forrestthewoods
That link is real time. State-of-the-art server-farm rendering is orders of
magnitude better.

"The last movie I remember having a CG face was Clu." By strict definition, if
you remember it having a CG face then it wasn't good. It's not possible for
you to remember a good CG face, because if it's good you wouldn't know it was
CG!

The next big test might be Paul Walker in Fast and Furious 7 or Philip Seymour
Hoffman in Hunger Games 3. Both are big-budget movies where the actor died and
CG will be used in at least some places. That's the ultimate test. I think we
can already get away with CG if the actor isn't known. But for actors whose
faces we know and whose mannerisms we subconsciously recognize, it's another
step up in difficulty.

------
sabalaba
3D reconstruction is used in state-of-the-art facial recognition as well.[1]
Essentially you reconstruct the face in 3D, rotate the 3D model to the front,
project it back into 2 dimensions, and then feed it through a deep CNN.
Because this gives you very good alignment, you can do tricks like not sharing
weights across the entire image. That is, each section of the input vector is
known to correspond to a certain part of the face and thus can learn unique
parameters that are well suited to that specific region.
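The "no shared weights" trick can be sketched concretely. The snippet below is a toy, hand-rolled version (not Facebook's actual implementation): a locally connected layer is a convolution in which every output position gets its own filter, which only pays off when the input is pre-aligned, as with frontalized faces.

```python
import numpy as np

def locally_connected(x, w, b):
    """A 'locally connected' layer: like a convolution, but WITHOUT
    weight sharing -- every output position (i, j) has its own k x k
    filter. This only helps when inputs are well aligned (e.g.
    frontalized faces), so that position (i, j) always covers the
    same facial region and its filter can specialize to it.

    x: (H, W) single-channel input.
    w: (H-k+1, W-k+1, k, k) one filter per output position.
    b: (H-k+1, W-k+1) per-position biases.
    """
    oh, ow, k, _ = w.shape
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i+k, j:j+k] * w[i, j]) + b[i, j]
    return out

# On a constant input, a shared-weight convolution would produce a flat
# response everywhere; unshared weights respond differently per position.
rng = np.random.default_rng(0)
response = locally_connected(np.ones((5, 5)),
                             rng.standard_normal((3, 3, 3, 3)),
                             np.zeros((3, 3)))
```

The cost of dropping weight sharing is a huge parameter count, which is exactly why it is only practical after the alignment step described above.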

The paper claims that it takes about 105 seconds to render a single frame. So
one second of 30 fps video would take about 52 minutes to render. I would have
to read more in depth to see what kind of savings can be had by sharing
information across frames. (The paper also doesn't mention the use of GPU
acceleration.)
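The back-of-the-envelope arithmetic from the reported per-frame cost:

```python
seconds_per_frame = 105                 # per-frame cost reported in the paper
fps = 30
minutes_per_video_second = seconds_per_frame * fps / 60
# 105 s/frame * 30 frames = 3150 s = 52.5 minutes per second of footage
```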

[1]
[https://www.facebook.com/publications/546316888800776/](https://www.facebook.com/publications/546316888800776/)

~~~
danbruc
Roughly, they use a model of the face, render it, compare the source image
with the rendered image, estimate changes to the position, orientation and
deformation of the model that will yield a better match, and then repeat this
until the result is good enough. While you can probably exploit temporal
coherence between frames, the process is inherently pretty expensive due to
its iterative nature. But it may also be relatively easy to parallelize
because of this.
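That render-compare-update loop can be sketched in a few lines. This is a toy stand-in, not the paper's method: `render` here is a hypothetical 1-D "renderer" (a Gaussian blob whose position is the model parameter) and the update is plain finite-difference gradient descent, but the control flow is the same as in analysis-by-synthesis face fitting.

```python
import numpy as np

def fit_by_synthesis(source, render, params, step=1e-3, iters=300):
    """Analysis-by-synthesis: render the model under the current
    parameters, measure the photometric error against the source,
    nudge the parameters downhill, and repeat."""
    params = np.asarray(params, dtype=float)
    eps = 1e-4
    for _ in range(iters):
        err0 = np.sum((render(params) - source) ** 2)
        grad = np.empty_like(params)
        for i in range(len(params)):        # finite-difference gradient
            p = params.copy()
            p[i] += eps
            grad[i] = (np.sum((render(p) - source) ** 2) - err0) / eps
        params -= step * grad               # move toward a better match
    return params

# Toy "renderer": a 1-D Gaussian blob positioned by the single parameter.
xs = np.linspace(0.0, 1.0, 50)
def render(p):
    return np.exp(-((xs - p[0]) ** 2) / 0.01)

source = render(np.array([0.6]))            # the "observed" frame
fitted = fit_by_synthesis(source, render, [0.5])
```

The inner loop is what makes the process expensive (one render per parameter per iteration), and also what makes it parallelizable: each probe render is independent.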

------
daniel_reetz
I'd like to ask the authors how they managed to do such great/natural looking
reconstructions of the eyes. Eyes are tough because they're naturally
specular, transparent in places, and refractive.

~~~
danbruc
After quickly reading the paper, I think they did nothing special for the
eyes; it's probably just the result of the final refinement step. The result
loses a bit of its magic, though, when you read in more detail how it works.
But it remains impressive.

------
macca321
I think I'm missing the point. Why are all the reconstructed videos from the
same angle? It would demonstrate it better if they repositioned the camera.

~~~
omellet
If you watch the George Bush segment, it shows the reconstructed video from a
different angle (straight-on, eliminating the head movement).

------
debt
I've been increasingly interested in the face. Human beings must have some
incredible mental calculations going on when parsing a face. We're an evolved
species that uses the face as a form of communication.

I love the attached video in the link because it perfectly isolates the face.
If you look closely you can see tiny, minute movements within the face as each
person talks: the eyes shifting, the face rotating, looking in various
directions, the forehead crunching, the eyebrows raising, smiling, etc. All of
these "cues" combine to create a message that we interpret instantly.

Facial recognition seems, at least on the surface, to be an extremely complex
innate human ability, and it has inspired me lately to read more on the
subject.

~~~
Throwaway12830
It really is incredible. I mean, in the original link, look at the image of
Arnold used for the video before playing. It's a blurry, greyscale section of
his face. Nonetheless, most people could easily recognize that face as being
him.

Billions of people in this world, we all have a very similar facial structure
with two eyes, a nose and a mouth, and yet you can recognize that small blurry
face in a fraction of a second.

I imagine it's similar with animals. You can have thousands of birds in a
flock, and they can recognize their mate instantly. To us, we'd have to
carefully analyze the birds for days or weeks to make that same match.

~~~
31reasons
What's amazing is that scientists discovered that the brain allocates a single
neuron that fires for each recognized face. For example, you have one special
neuron in your brain that fires when you see Bill Clinton's face. And that
exact neuron fires no matter which picture of Bill Clinton you happen to see.

~~~
lutusp
> What's amazing is that scientists discovered that the brain allocates a
> single neuron that fires for each recognized face.

This is nonsense, citation needed.

> For example, you have one special neuron in your brain that fires when you
> see Bill Clinton's face.

This is New Age science, i.e. not science at all. It's a myth. One neuron is
equivalent to one bit in a computer. One bit of information is insufficient to
distinguish between faces.
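The information-theoretic version of this point is easy to make concrete. Granting, purely for the sake of argument, the "one neuron = one bit" simplification, distinguishing N individuals takes at least log2(N) bits of state:

```python
import math

def bits_to_distinguish(n: int) -> int:
    """Minimum number of bits needed to give each of n individuals
    a distinct binary code: ceil(log2(n))."""
    return math.ceil(math.log2(n))

# A single bit separates only two categories; telling apart the
# ~7 billion faces on Earth would need at least 33 bits of state.
```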

> And that exact neuron fires no matter what picture of Bill Clinton you
> happened to see.

Maybe you could learn a tiny bit of neuroscience before spreading this kind of
nonsense.

~~~
31reasons
You need to learn some neuroscience, buddy. One neuron is not one bit. A
neuron works in a much more complex way than a single bit in a computer. A
neuron has hundreds to thousands of axons and dendrites and connects with
other neurons in a dense network. I am surprised by your knowledge of
neuroscience, yet you attack me. Very strange.

EDIT: As dragonwriter says below, lower levels of the neural network are
responsible for general facial recognition, but they trigger more specific
neurons once the recognition goes from generic face -> specific person. Even
in artificial neural networks, a single unit is sufficient at the end of the
classification.

~~~
lutusp
In that case, your original claim was false -- either one neuron is acting
alone as you claimed, or it isn't. Your claim was that one neuron was acting
alone, which is absurd.

A neuron is interconnected to many others, but this doesn't mean one neuron is
many neurons, any more than one binary bit is 64 binary bits by virtue of its
position in a binary word.

------
Harshit15
This can help a lot in recreating the faces of avatars in animated movies and
games; they have a tough time tracking facial details using small markers. I
was wondering whether shadow and shine removal could solve the issues shown at
the end. An example, described as shadow correction, is implemented by these
autonomous car designers:
[http://www.igvc.org/design/2013/US%20Naval%20Academy.pdf](http://www.igvc.org/design/2013/US%20Naval%20Academy.pdf)

------
imaginenore
This kind of advancement is one of the reasons I don't post photos of myself
online. In a few years we will be capable of making videos with anybody's face
replaced with anybody else's. It will be trivial to produce a fake video that
can cause all kinds of legal trouble.

And yes, I realize it's even possible now, but with all the new algos and
software coming out it will be easy enough for somebody to just mess with
people's lives for fun.

~~~
hnriot
It doesn't matter if _you_ post photos of yourself online; very likely it will
be others that do so. And your image is being digitally captured constantly:
if you travel on public transport, go through an airport, or do anything in a
public place, your photo has been captured many times. Figuring out that it's
you is also pretty straightforward. All it takes is a DMV hack, or Facebook
(where one of your coworkers tagged you in a photo), or Instagram (where your
niece tagged you by the BBQ on July 4th), etc.

~~~
nkozyra
In the future, we are _all_ in porn.

~~~
elwell
While I think this is a derangement of society, I can imagine a porn app that
lets you drop in a Facebook ID to superimpose your crush's likeness on a porn
actor.

------
aresant
Technology like this makes me wonder how long a shelf life video "evidence"
has.

Or perhaps these same algos will also provide utility in detecting / decoding
"fakes", sort of like edge tracking / error-level analysis do today.

------
tantalor
It's a shame they removed the speech audio from the demo video. It would have
been much easier to judge the visual accuracy if the reconstructed lip motion
were combined with the original sound.

------
igriffer
Hi! Does anybody have the source code? I'd like to try this method =) These 3D
reconstructions are great material for face recognition!

------
Aqwis
Very impressive. How large does the photo collection of the individual have to
be to achieve results like those in the video?

~~~
anigbrowl
The source videos they used as examples are about 600 frames each, which is
~40 seconds of video. For a politician or professional actor this would be
easy to achieve. Likewise if you wanted a good 3d model of yourself or a
friend you could do it cheaply using this method.

~~~
Aqwis
I may be misreading the paper, but as far as I understand you need a
collection of photos _in addition to_ the video.

~~~
anigbrowl
I thought that too, but then they have some ambiguous sentences like 'In this
paper we target high detail reconstruction from a single video captured in the
wild, i.e., under uncontrolled imaging conditions,' where it seems as if they
treat the video itself as their input library. Other parts of the paper
support your interpretation. It's annoyingly vague.

...but still, outstanding work, even if it requires manual curation of the
input set.

------
SnowProblem
This will be huge for VR.

------
31reasons
This + Virtual Gesture Tracking + VR = Virtual Meetings

------
polskibus
The burning question for me is: when will we see this amazing algorithm
implemented as part of OpenCV?

------
Htsthbjig
It's quite dangerous what this technology will mean in the future: they could
manufacture evidence against you, publish it, and then let the masses lynch
you.

I think this is what happened with the recent decapitation videos; they were
reconstructed from home videos.

IMO the videos with the people dead on the floor are real, but the videos
where they talk are staged.

Today we know there was a CIA team whose job was faking videos of Osama bin
Laden:
[http://blog.washingtonpost.com/spy-talk/2010/05/cia_group_had_wacky_ideas_to_d.html](http://blog.washingtonpost.com/spy-talk/2010/05/cia_group_had_wacky_ideas_to_d.html)

Remember that Osama bin Laden appeared and disappeared according to US army
interests at the time, finally dying in very strange circumstances (and being
buried at sea, preventing anyone else internationally from confirming by DNA
that it was Osama).

For me it is staged because current technology can convincingly synthesize a
face only when there are no strong emotions, and the same goes for the voice.

With strong emotions it becomes very easy for family and friends to notice, as
people make specific gestures, most of which are not recorded on video.

That people are perfectly calm before dying I could understand, but that they
stay calm while saying exactly what their captors want, I can't.

Also, before the videos most of the population in the UK did not want to go to
war; after the videos (with a UK native), most of them supported war. Cui
bono?

