
Face2Face: Real-Time Face Capture and Reenactment of RGB Videos - signa11
http://niessnerlab.org/projects/thies2016face.html
======
sondh
Imagine some day a hacker will be able to mimic speech with the next version
of Tacotron 2[1] and then use it with the next version of Face2Face... Scary
future ahead.

[1]
[https://news.ycombinator.com/item?id=16014047](https://news.ycombinator.com/item?id=16014047)

~~~
hacker_9
Wow those text to speech examples are amazing. The use in videogames could be
huge. Exciting stuff.

~~~
Razengan
My thoughts exactly!

On an earlier Tacotron article:
[https://news.ycombinator.com/item?id=15963888](https://news.ycombinator.com/item?id=15963888)

------
stuntkite
I knew this was coming and it's more surreal than I thought it would be. We
crossed the uncanny valley. I'm not sure we are prepared for trusted sources
to be faked this well and this easily. It will be interesting to see how we as
a society start to cope with an acid-trip level of real-life unreliable
narrator.

~~~
bsaul
This could be a new use for DRM: certifying that the image stream is real and
unaltered (or altered only by trusted sources).

~~~
beagle3
DRM does not do that. Plain old public key cryptography does.
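
A rough sketch of what that could look like, not a real scheme: the camera
signs a hash of each frame with a private key, and anyone holding the public
key can check that the frame was not altered afterwards. The toy RSA key, the
`sign`/`verify` helpers and the frame bytes below are all made up for
illustration; a real system would use a vetted crypto library, far larger
keys, and proper padding.

```python
import hashlib

# Toy RSA key built from two well-known Mersenne primes (assumption for the
# sake of a stdlib-only example; far too small and unpadded for real use).
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(frame: bytes) -> int:
    """Hash the frame bytes and reduce the hash into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(frame).digest(), "big") % N


def sign(frame: bytes) -> int:
    """Camera side: sign the frame digest with the private exponent D."""
    return pow(digest(frame), D, N)


def verify(frame: bytes, signature: int) -> bool:
    """Viewer side: check the signature using only the public values (N, E)."""
    return pow(signature, E, N) == digest(frame)


frame = b"raw bytes of one video frame"  # hypothetical frame data
sig = sign(frame)
print(verify(frame, sig))                 # untouched frame
print(verify(frame + b" (edited)", sig))  # altered frame
```

Note this only proves the bytes came from whoever holds the private key; it
says nothing about whether the scene in front of the camera was staged.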

~~~
stuntkite
I read something recently about photographers pushing for cryptographic keys,
maybe it was even registering media in the blockchain. Certifying origin and
authorship for exactly this situation. I think it was a podcast. Maybe Open
Source.

------
minxomat
Could be huge in dubbing or even just ADR for movies. I'll take a slightly
uncanny smile over the abomination of some dubs any day.

------
tziki
Since this is around 18 months old, does anyone know what the current state
of the art is?

~~~
yomansat
I found this from ~3 months ago, the recent addition to github:

[https://google.github.io/tacotron/publications/uncovering_la...](https://google.github.io/tacotron/publications/uncovering_latent_style_factors_for_expressive_speech_synthesis/index.html)

Seems to have to do with manipulating the speech styles in how we speak.

Listen to the "neutral" blue lagoon, vs token 6 (tired sound), vs "mix w/ token
7 (prominence)" (very confident).

Paper:
[https://arxiv.org/pdf/1711.00520.pdf](https://arxiv.org/pdf/1711.00520.pdf)

------
okket
Previous discussion:
[https://news.ycombinator.com/item?id=11314931](https://news.ycombinator.com/item?id=11314931)
(2 years ago, 35 comments)

------
nl
Isn't this around 18 months old now?

GANs should be able to do a lot better now.

~~~
sschueller
Yes, this is from 2016.

~~~
arbie
Mods, please add 2016 to post title.

