
Lyrebird – An API to copy the voice of anyone - adbrebs
https://lyrebird.ai/demo
======
eadz
Combined with Face2Face[1] live video impersonation, it is truly time to be
very careful verifying videos or even live streams.

[https://www.youtube.com/watch?v=ohmajJTcpNk](https://www.youtube.com/watch?v=ohmajJTcpNk)

~~~
nihonde
Without a doubt, our concept of personal identity will be completely
unreliable within a few generations. Forget about privacy--we will soon have
literally no way to verify who we're talking to.

~~~
proaralyst
Crypto would still work, and this tech isn't going to work face-to-face.

~~~
anigbrowl
Neither will insulate you from a deception which you wish to perpetrate upon
yourself, and identifying the latter is a trick that con artists specialize
in.

------
pbhjpbhj
Last week on BBC Radio 4 I heard about a woman who was losing her voice to
disease (MND, maybe?); a similar system was being anticipated, and she was
saving voice samples to seed it with.

She had been a singer and strongly identified herself with her voice; she
wanted to be able to use a speech synthesis system that had her own voice
pattern.

Apologies if this was already mentioned, but it seems to be a use others here
hadn't considered.

~~~
k2xl
I was just thinking that Stephen Hawking would perhaps be interested in using
this to replace his current voice synthesizer (feeding in old interviews of
him when he could talk). He has said that he has adopted the current voice
since he has associated it with his own, but I wonder if he would prefer his
old _actual_ voice.

~~~
Ensorceled
I think Hawking is now so firmly tied to that voice that he would probably
never switch for public speaking engagements and the like.

I could see him doing such a switch for personal interactions.

~~~
Negative1
I recall his biggest qualm with his current synthesized voice was that it did
not come with a British accent. :-)

I'm not sure how sentimental he is, but he does seem quite tied to that voice:
there have been lots of advancements in voice synthesis since he originally
got it, and yet he's chosen to keep this one.

~~~
mabbo
I wonder if he'd accept the same voice, but with a different accent?

Should be just as possible in the present or near future.

------
qeternity
While all of these vec2speech-type models are impressive, I get the feeling
that most of the commenters didn't listen to any of the samples. It's still
distinctly robotic sounding, probably has quite a bit of garbage output that
needs to be filtered manually (as is often the case with these nets), and is a
far cry from fooling a human.

~~~
Infernal
In the first clip, I'd say 80% of the soundbites were obviously robot-like,
but one or two of the "Obama" quotes were startlingly clear - "The good news
is, that they will offer the technology to anyone" - I can't hear anything
wrong with that in the first clip at all. If they were all that quality I'd
say we'd be easily fooled. As a proof of concept this is pretty big.

~~~
bpicolo
I can definitely hear issues with that phrase. It has quite robotic drop-offs.

Though coming soon: Neural networks to determine whether speech is NN-
generated? :P

~~~
uremog
And then the generators train their own generation NN against that :P

~~~
hoschicz
And you have invented Generative Adversarial Networks. They are the basis of
all new ML findings, like pix2pix.
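
For the curious, the adversarial loop can be sketched in a few lines. This is a
toy 1-D illustration (the setup and parameters are invented for the example,
not taken from pix2pix or Lyrebird): the "generator" is a single learned shift
of Gaussian noise, and it moves toward the real data's mean precisely because
that is what fools a logistic "discriminator".

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w, b):
    # Logistic score: estimated probability that a sample is "real".
    z = np.clip(w * x + b, -60, 60)  # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))

# Real data ~ N(4, 1); the "generator" is just a learned shift of N(0, 1) noise.
g_bias = 0.0
w, b = 1.0, 0.0
lr = 0.05

for step in range(2000):
    real = rng.normal(4.0, 1.0, 32)
    fake = rng.normal(0.0, 1.0, 32) + g_bias

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    dr, df = discriminator(real, w, b), discriminator(fake, w, b)
    w += lr * np.mean((1 - dr) * real - df * fake)
    b += lr * np.mean((1 - dr) - df)

    # Generator step: move the shift so that D(fake) rises.
    fake = rng.normal(0.0, 1.0, 32) + g_bias
    df = discriminator(fake, w, b)
    g_bias += lr * np.mean((1 - df) * w)

# The learned shift tends to settle near the real mean of 4.
print(round(g_bias, 2))
```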

------
paraschopra
I appreciate the ethics link up there in the menu. Not sure if I noticed it on
any other AI startup (or for that matter, any startup). Given how complex the
world is becoming due to ever increasing co-dependence with tech, I can see
how such pages could become as important as 'pricing' or 'sign up' pages. (The
privacy issues with Unroll.me, Uber and a thousand other such services will
only accelerate this trend).

Good job, team Lyrebird. My feedback is that while the inclusion of ethics
page is great, it could do with more content on your vision and what you will
not let your tech be used for. I know others can develop similar tech, but it
will be good to read about YOUR ethics.

[Edited for clarity]

~~~
tigerBL00D
I agree, it is reassuring to see that the team is thinking about ethical
implications.

Judging by the samples from the homepage there are audible artifacts in the
recordings resulting from synthesis. I doubt these would pass scrutiny if
presented as evidence in court. In some ways forging a voice is like forging a
signature, truth can be exposed with enough effort.

------
keithwhor
I love this. The business model is too good to be true.

1. Open source voice-copying software

2. At worst, create an entire market of voice-fraudsters; at best, very few
voice-fraudsters but a very high and very real perception of fear of such

3. Become leading security experts in voice fraud detection

4. Sell software / time / services to intelligence agencies, governments, law
enforcement, news networks

Ethically I'm a bit concerned with (2), but realistically the team is right
--- this technology exists, it will certainly be used for good and for bad,
and they're positioning themselves as the leading experts.

I'm interested to see which VCs and acquirers line up here. Applying a voice
to any phrase seems useful for voice assistants (Amazon Alexa, Google Home)
but I don't think that's the $B model.

~~~
swamp40
You could charge 99 cents to have Siri talk in your favorite actor's voice.

------
pinpeliponni
Funny thing is, this is approximately where the CIA was with similar
technology close to 2000. They did some demos for politicians showing how they
could fake messages in anyone's voice. That stuff is golden for propaganda
purposes, and for confusing things like military chains of command. Today the
CIA has probably worked out all the robotic artifacts already, and their
output really is indistinguishable.

~~~
dmix
> Funny thing is, this is approximately where the CIA was with similar
> technology close to 2000

Source?

~~~
abdias
Not OP, but here is one related source: [http://www.washingtonpost.com/wp-
srv/national/dotmil/arkin02...](http://www.washingtonpost.com/wp-
srv/national/dotmil/arkin020199.htm)

I do not think the technology involved artificially generated voice, though,
but simply morphing someone's voice into sounding like the target voice.

------
yladiz
This is pretty cool (although I have no idea what other technologies exist
for this kind of thing), but it's definitely not convincing enough to a human
listener. It sounds like it might be convincing enough for some programs
like "Hey, Siri", but it's not gonna convince your mom. You can listen to the
samples on the page linked here and immediately tell that Obama and Trump
don't sound quite human.

~~~
ehsankia
Well, the question is, do they just need to throw more computational power /
training at this algorithm or is that the peak of their implementation?

This is something Google has been working a lot on [1] and Baidu also recently
posted about their results too [2]. We're definitely pretty close to passing
the human detectable level.

[1] [https://deepmind.com/blog/wavenet-generative-model-raw-
audio...](https://deepmind.com/blog/wavenet-generative-model-raw-audio/)

[2] [http://research.baidu.com/deep-voice-production-quality-
text...](http://research.baidu.com/deep-voice-production-quality-text-speech-
system-constructed-entirely-deep-neural-networks/)

~~~
backpropaganda
Google and Baidu have only demonstrated single speaker TTS. Lyrebird's the
first to demonstrate being able to generate arbitrary voices. Since this came
out of a research lab, I would guess that the quality would only improve if
they are given more compute and data.

~~~
jtbayly
The Google one is able to generate arbitrary voices based on the recordings
used for training. So much so that they made it generate piano music.

------
LegendaryPatMan
This is pretty basic at the moment and it's terrifying. Yeah, it has an MS Sam
feel to it, but as the tech improves, and we know it will, you could use a
service like this to put words in someone's mouth. Think about how you could
trip up a CEO or a politician by playing some random clip of something they
never said. Once that gets into the zeitgeist, judgments will be made in the
court of public opinion devoid of facts or real evidence. You could destroy
democracy or people's lives with technology like this.

~~~
return0
There are human impersonators already. I suppose it's not that easy to fake a
visible, high-ranking person for long.

~~~
ygjb
Individual impersonators are not the threat. It's the glut of impersonators
that will present the real challenge. It would be very helpful to see a study
done with these platforms as they mature to determine what percentage of the
population is more easily fooled by these.

For example, as an individual with hearing problems, I may not be so easily
able to determine a synthesized recording from an actual recording - for a
short period of time. With longer recordings it may become more obvious.

------
got2surf
This is exciting! If you look at historic speeches (e.g. from American Rhetoric
[http://www.americanrhetoric.com/top100speechesall.html](http://www.americanrhetoric.com/top100speechesall.html)),
there are large variations in average characteristics between various
styles/contexts (on average, pitch/volume/speed are different for
inspirational vs somber speeches, for example). But there are also really
large differences in the variation - an inspirational speech may be marked by
large swings from quiet, reflective pieces to booming, rousing calls-to-action
while a somber speech has fewer swings in delivery.

For the examples given of various intonations from Obama/Trump, some
intonations are much more natural than others. It would be interesting to
decide how to parametrize a sentence for the intended intonation (based on
word2vec analysis of the words in the sentence, punctuation cues, and perhaps
a specified category of "emotional delivery").

It would be interesting at the sentence-level, but also at the macro speech-
level to include the right "mix" of intonations for a specific context. On a
related note, it would be interesting to study the patterns of intonations in
successful vs unsuccessful outbound sales calls, for example, to learn how to
best simulate a good human sales voice.

------
amarant
Is there any copyright protection for a person's voice? If not, David
Attenborough and Morgan Freeman will be lead voice actors in my next game
project.

~~~
sudhirj
I don't think you can claim that your game was voiced by either of them, but I
don't see how using this would be any more infringing than using a tuned
synthesizer.

~~~
TeMPOraL
He probably won't be able to claim the game is "voiced by", but maybe he can
get away with saying it features "voices of"?

~~~
amarant
Actually I wasn't planning on saying either, just having the calm voice of
Morgan Freeman tell me I need to shoot the zombies or whatever the game will
be, and then David explaining with fascination how such a poor shot could have
survived this far whenever you miss.

------
eps
Impressive.

But also enabling the next gen of "Mom, I'm in Mexican jail. Quickly wire me
$2,000 so I can get out." scams.

~~~
PetitPrince
Wouldn't you need tons and tons of training data? Obama, Clinton and Trump
are public personalities, so it's easy to have many hours of recording of
their voices.

A random relative; not so much.

~~~
mverwijs
They claim to only need 1 minute of recording. In this age of all the kids
sharing everything all the time that shouldn't be too hard to acquire.

~~~
ygjb
Even better, targeted attacks to collect a person's voice could involve
contacting them for an opinion survey about a product, service, or political
issue they value. Gleaning something like that from social media profiles is
fairly easy.

~~~
carewornalien
"My voice is my passport. Verify me."

------
celticninja
Is this enough to beat voice recognition software?

If you thought fake news was bad before wait until these 'secret' recordings
start getting released and reported on.

~~~
simcop2387
That was one of my first thoughts too [1]. I doubt it currently will do so, as
it does sound fairly robotic. It does sound closer than other demos I've
heard, however, so I think we're probably on the way.

[1] [https://youtu.be/-zVgWpVXb64?t=41](https://youtu.be/-zVgWpVXb64?t=41)

~~~
TeMPOraL
Throw some hard compression on it to get additional distortion, and I feel you
could get away with claiming it was a recording done by a low-power bug.

------
JustFinishedBSG
Cooler:
[http://www.dtic.upf.edu/~mblaauw/IS2017_NPSS/](http://www.dtic.upf.edu/~mblaauw/IS2017_NPSS/)

[https://arxiv.org/abs/1704.03809](https://arxiv.org/abs/1704.03809)

~~~
kastnerkyle
This model is quite cool, but also quite a bit different than what lyrebird.ai
is doing. NPSS has a lot of extra information in the control inputs about
pronunciation and timing (the part-of-phoneme timer feature) - this means that
most of the "hard parts" (in my opinion) for naturalness are control inputs to
NPSS/WaveNet style models, rather than variables the model must generate
globally and consistently as in lyrebird. At generation time NPSS appears to
generate each component autoregressively as well, but I am not clear on
whether the demo samples do this or if they use "true" values for f0 at least
- what forces the model to sing the exact same melody, if many melodies are
possible given the underlying audio information?

Also note that NPSS has some amount of post-processing, at least reverb and
perhaps other common musical mixing - we don't really know how these samples
are generated, and I have a hard time deciphering exactly what inputs are
required, and what are generated from the paper alone. However, I really,
really, really like NPSS - I just don't think the comparison you are making is
valid here.

These features (f0, duration, pronunciation) are some of the most difficult
things to learn to model from datasets of speech and text directly, and I am
not sure how they got the subset used (I think only f0 and
pronunciation/phoneme) for this NPSS model. Giving creators fine-grained
control of the performance (as in NPSS) is quite cool, and if these systems
can get fast enough I think the possibilities are really exciting. The same
things could likely be done with lyrebird as well - there is no real "tech
reason" you couldn't add more conditional inputs, with finer grained
information/control.

The key part in my mind is deciding what amount of complexity to show to a
user, and what amount to try and capture inside the model - some people may
want to control (for example) duration and f0 directly for a performance,
while others may want to just upload clips to an API and get reasonable
results back, with less ability to control each sample (they can still curate
themselves for the "best" samples). Lyrebird.ai is handling the latter case,
while the former case would require quite a bit more intervention from the
average user, almost becoming like an instrument a la the original Voder [0].
However, you could potentially have _both_ approaches as a kind of
beginner/advanced mode, but advanced mode needs a user interface, and probably
near-realtime feedback.

I used to really strongly believe that the audio model was going to be the
hard part of "neural" TTS (blame my background in DSP perhaps), but post-
WaveNet the game has really changed a lot - conditional audio models are
something we are starting to know how to do pretty well.

The text pipeline of most TTS systems is still the craziest part in my mind,
check out a "normal" feature extraction of 416 hand-specified features [1]!
These extractions can be upwards of 1k features per timestep/frame, and
generally require a lot of linguistic knowledge to specify for new languages.
It seems (given Alex Graves' demo [2], char2wav [3], tacotron[4]) that we are
making progress on learning this information directly from text, which in my
mind is a key breakthrough for TTS in languages besides English, where lots of
work on English pronunciation has been done already and is generally
available.

[0]
[https://www.youtube.com/watch?v=TsdOej_nC1M](https://www.youtube.com/watch?v=TsdOej_nC1M)

[1] [https://github.com/CSTR-
Edinburgh/merlin/blob/master/misc/qu...](https://github.com/CSTR-
Edinburgh/merlin/blob/master/misc/questions/questions-radio_dnn_416.hed)

[2]
[https://www.youtube.com/watch?v=-yX1SYeDHbg&t=38m00s](https://www.youtube.com/watch?v=-yX1SYeDHbg&t=38m00s)

[3]
[http://josesotelo.com/speechsynthesis/](http://josesotelo.com/speechsynthesis/)

[4] [https://google.github.io/tacotron/](https://google.github.io/tacotron/)
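
To make the f0 feature above concrete: f0 is the fundamental frequency (pitch)
contour of the voice. Here is a minimal, generic autocorrelation pitch tracker
- a toy sketch with invented names and thresholds, not NPSS's or Lyrebird's
actual analysis front end:

```python
import numpy as np

def estimate_f0(x, sr, fmin=80.0, fmax=400.0):
    # Autocorrelation pitch tracker: find the lag (within a plausible
    # pitch range) at which the signal is most similar to itself.
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

# A pure 200 Hz tone at 16 kHz should come back as ~200 Hz.
sr = 16000
t = np.arange(sr // 4) / sr
f0 = estimate_f0(np.sin(2 * np.pi * 200 * t), sr)
print(round(f0, 1))
```

A real system would track f0 frame-by-frame over time (giving the contour NPSS
takes as a control input), but the per-frame computation looks like this.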

~~~
yishhh
Hi Kyle, I was wondering if the Lyrebird implementation will be open-sourced
on GitHub, as I am currently hoping to improve on it by incorporating prosody
into the speech synthesis. Thanks!

------
joshmarlow
Finally, I can have Morgan Freeman narrate my major life events.

Update: Reading changelogs before deployment never sounded better!

~~~
anigbrowl
As a skilled vocal impersonator I read all my comments aloud in the voice of
Morgan Freeman before posting them. Ordinary sentiments such as 'these pickles
are quite tasty' are suddenly transformed into profound insights on the human
condition.

------
cjlars
I was wondering when CG Sir David Attenborough would get here and start
narrating my day to day.

~~~
seanhandley
Actually, when the sad day comes and he passes away I'd be very comforted to
hear his voice on new nature documentaries. I just can't watch them unless
he's narrating.

~~~
mhandley
I agree. This leads to an interesting question: can the estate of a deceased
person sell or license their voice rights for new future performances? I
suspect the law has some catching up to do.

~~~
anigbrowl
This is not qualitatively different from existing situations, and will be only
a minor legal wrinkle. Estates have been licensing the likeness of dead people
for commercial purposes for a good while, now those likenesses are simply more
sophisticated.

I'll tell you what will get complicated, copyright holders complaining that
their product was used as input for the training algorithm and demanding a
slice of any profits because they made the famous individual more famous by
casting them.

------
sna1l
Charles Schwab uses a voice phrase to authenticate you for access to your
account, which is already pretty brittle, but I hope this makes them
reconsider more urgently.

------
ksec
1. Is this company new?

2. Is this better than what Google or Baidu are doing?

3. I remember reading that Adobe has something similar.

4. Why (what happened) is it that all of a sudden we have four companies
making voice breakthrough tech like this?

5. What happens to voice acting? Places like Japan highly value voice actors.
Is a voice even patentable?

~~~
lordCarbonFiber
Specifically on point 4: Google published their WaveNet paper a couple of
months ago. I wouldn't be surprised if some, if not all, of these current
breakthroughs are built on that foundation. This sort of application was the
first thing that came to my mind after reading the paper.

[https://deepmind.com/blog/wavenet-generative-model-raw-
audio...](https://deepmind.com/blog/wavenet-generative-model-raw-audio/)

~~~
kastnerkyle
There is an older paper [0] and demo from [1] Alex Graves that inspired a ton
of work around handwriting, and then speech. Previous work from Jose Sotelo
et. al. (including me) called char2wav [2] is a close neighbor to Graves'
approach, though he (Graves) never published the approach for speech so we
don't really know. Google's recent Tacotron paper [3] is also a relative to
these approaches.

WaveNet certainly changed the game in many ways, but approaches to TTS using
RNNs have different roots. WaveNet and friends (incl. DeepVoice and NPSS
linked elsewhere in this thread) are largely focused on audio modeling, and
generally use something closely related to the "classic" TTS pipeline for text
in the frontend. The audio modeling results are stellar, and really blew me
away personally - basically changing my perspective on what is possible in
audio modeling overnight.

RNN models try to tackle the whole problem (text + audio modeling) at once,
though currently (all?) RNN and attention-style models need intermediate /
high-level hints or pretraining from things like vocoder representations or
spectrograms, versus WaveNet's approach of using the waveform directly. So
they are complementary in many ways, and I am sure we will see people trying
to combine them soon. char2wav has this flavor by using SampleRNN, our lab's
take on raw waveform generation; though we are still working on fully end-to-
end from-scratch training, the inference path is truly end-to-end. Though
there are still many details to work out as far as output quality, it seems
possible that this will be a productive approach (though I am quite biased).

We see similar directions in neural machine translation (NMT) moving from word
level representations to word parts or characters directly - one of the big
reasons deep learning has come so far, so fast is that a lot of techniques
from other subfields can be utilized for new domains, and I think there is a
lot more fertile ground for crossover in both directions.

Heiga Zen has a great overview talk about how speech synthesis, as a field,
overlaps between different approaches and factorizations [4]. His work on
parametric synthesis and TTS generally has laid the foundation for a _lot_ of
recent advances, and he was also a co-author on WaveNet!

[0] [https://arxiv.org/abs/1308.0850](https://arxiv.org/abs/1308.0850)

[1]
[https://www.youtube.com/watch?v=-yX1SYeDHbg&t=38m0s](https://www.youtube.com/watch?v=-yX1SYeDHbg&t=38m0s)

[2]
[http://josesotelo.com/speechsynthesis/](http://josesotelo.com/speechsynthesis/)

[3] [https://google.github.io/tacotron/](https://google.github.io/tacotron/)

[4]
[https://www.youtube.com/watch?v=nsrSrYtKkT8](https://www.youtube.com/watch?v=nsrSrYtKkT8)
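
To make the spectrogram intermediate mentioned above concrete: the "hints"
these models predict are essentially magnitude STFT frames. A minimal, generic
sketch (toy code with invented parameter choices, not any lab's pipeline):

```python
import numpy as np

def stft_mag(x, n_fft=256, hop=64):
    # Magnitude STFT: slide a Hann-windowed frame along the signal and
    # keep the magnitude of each frame's FFT. Rows are time, columns
    # are frequency bins - the spectrogram a TTS model might predict.
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = x[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)  # shape: (n_frames, n_fft // 2 + 1)

# A 440 Hz tone at 8 kHz: energy should concentrate near one frequency bin.
sr = 8000
t = np.arange(sr) / sr
spec = stft_mag(np.sin(2 * np.pi * 440 * t))
peak_bin = spec.mean(axis=0).argmax()
print(peak_bin)  # bin width is 8000/256 = 31.25 Hz, so 440 Hz lands near bin 14
```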

------
Nadya
I see a lot of people claiming that certain things will now be untrustworthy.

As if _human_ voice imitators have not existed and could not be paid for prior
to this. For $5 you can get Stewie Griffin [0] or Barack Obama [1] to say
whatever you want them to say. Any audio-only messages of well known figures
should already be considered "compromised" and untrustworthy. Even without the
technology to impersonate them.

This should be more concerning for "normal people". It isn't that you can no
longer trust an audio-only recording of Obama, but that you may no longer be
certain an audio recording is from your best friend. (E: Once the technology
improves a bit more, of course.)

[0] [https://www.fiverr.com/joe_stevens/talk-like-stewie-
griffin-...](https://www.fiverr.com/joe_stevens/talk-like-stewie-griffin-for-
you)

[1] [https://www.fiverr.com/celebimpression/do-a-custom-barack-
ob...](https://www.fiverr.com/celebimpression/do-a-custom-barack-obama-
impersonation)

------
drusepth
This is awesome. As someone exploring the fictional storytelling space, this
seems like it'd have a lot of fun applications in that space as well.

How difficult is it to create/tune voices from parameters rather than training
from an audio clip? I build software where people create fictional characters
for writing, and having an author "create" voices for each character would be
an amazing way to autogenerate audiobooks with those voices, or interact with
those characters by voice, or just hear things written from their point of
view in their voice for that extra immersion. Having an author upload voice
clips of themselves mimicking what they think a character should sound like
would work, but it would probably keep traces of their original voice (and feel
"fake" to them because they can recognize their own voice), no?

Can't wait to see how this pans out. Signed up for the beta and will
definitely be pushing it to its limits when it's ready. :)

------
carlob
I wonder how dependent this is on language: can we make Trump speak Chinese
using a one minute audio track of him speaking English?

~~~
simcop2387
I'd imagine you might get close, but it'd probably work best if you can get
the person to use all the phonemes that you want to reproduce. That said
depending on how good it is, it might slur other phonemes together to
approximate it, which would probably work to give it the accent that the
speaker would likely have.

~~~
Sunset
It all sounded sort of slurry and muffled. Maybe if you're imitating a
naturally slurred speaker it would be more effective.

------
echelon
It sounds like they're training a parametric speech synthesis platform on
samples in order to learn the parameters. I wonder if there are approaches to
generating n-phones for concatenative models, or using a hybrid approach.

I built a toy concatenative Donald Trump speech system [1], but I don't have
an ML background. I've been taking Andrew Ng's online course in addition to
Udacity's deep learning program in an attempt to learn the basics. I'm hoping
I can use my dataset to build something backed by ML that sounds better.

Is anyone in the Atlanta area interested in ML? I'd love to chat over coffee
or join local ML interest groups.

[1] [http://jungle.horse](http://jungle.horse)
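
To make the concatenative idea concrete: at its simplest, concatenative
synthesis selects prerecorded units and joins them with short crossfades to
hide the seams. A toy sketch with synthetic "units" (illustrative only, with
invented names, not jungle.horse's implementation):

```python
import numpy as np

def crossfade_concat(units, fade=32):
    # Join consecutive units with a linear crossfade: the tail of one
    # unit ramps down while the head of the next ramps up, softening
    # the waveform discontinuity at each join point.
    out = units[0]
    ramp = np.linspace(0.0, 1.0, fade)
    for u in units[1:]:
        head = out[:-fade]
        joint = out[-fade:] * (1 - ramp) + u[:fade] * ramp
        out = np.concatenate([head, joint, u[fade:]])
    return out

# Two pretend "phone" units from a unit inventory: short tones at 8 kHz.
sr = 8000
t = np.arange(800) / sr
a = np.sin(2 * np.pi * 220 * t)
b = np.sin(2 * np.pi * 330 * t)
y = crossfade_concat([a, b])
print(len(y))  # 800 + 800 - 32 overlap = 1568 samples
```

Real unit-selection systems add pitch/duration modification and a search over
the unit inventory, which is where most of the quality comes from.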

~~~
kastnerkyle
I tried similar approaches long ago (~2 years now?) with something related to
RNN-RBM and it showed some slight glimmer of promise, and still think there
might be some clever ways to combine concatenative methods and deep learning
to avoid a lot of the noise issues present in parametric models. Then again,
maybe it just needs to train longer - it's always hard to tell. I liked
jungle.horse, awesome stuff!

------
Tloewald
This is very exciting to me because it lets RPGs provide spoken dialog for
everything (I'm waiting to see if they can do emotions at all convincingly).
Even big budget games suffer from "you can call your character anything as
long as it's 'Shepard'" simply because you can't mention the character's name
or any other user content safely.

------
retox
Through the tinny speaker of my mobile phone the Obama in the first sample is
almost spot on. Some speed issues with Trump but really impressive.

------
joeblau
I wonder how accurately this would reproduce dead musicians' voices. I've had
this idea for about 8 years called the Notorious BIG project. I have about 20
acapellas that I was originally going to manually chop into a song. Neural
nets can pretty much solve this now.

------
jtbayly
Can we get these speeches in audio form now?

[https://medium.com/@samim/obama-rnn-machine-generated-
politi...](https://medium.com/@samim/obama-rnn-machine-generated-political-
speeches-c8abd18a2ea0)

------
kristaps
As noted in other comments, all the samples still sound very robotic, so this
is probably "just" a method to tune the parameters of an existing voice
synthesizer to mimic a real person's voice as closely as it allows.

~~~
TillE
That's exactly what it sounds like. The same old mediocre TTS with voices
modified to mimic specific well-known voices.

It's impressive for what it is, but a lot of people here seem way too excited.
This isn't any kind of breakthrough, and only the shortest hand-picked snippet
would fool anyone.

------
Ensorceled
The samples all sound a little like Rich Little and Stephen Hawking's love
child doing impressions: they won't fool very many people.

But, you can certainly see where this is going and that's the worrisome part.

~~~
dmix
I'm sure it will improve dramatically over the years. This seems to be a
problem with all digital voice software, it's never entirely human sounding.
Pretty good starting point though.

------
ageofwant
Oh yeah. The troll embedded deep in my soul giggles in glee.

However, the day some shill tries to sell me travel insurance in departed
nana's voice will be the day I start signing my voice convos with a PGP key.

------
felipemesquita
This site has a "demo" section featuring only SoundCloud clips. It leans hard
on the present tense ("In a world first, Montreal-based startup Lyrebird today
unveiled" and "Record 1 minute [...] and Lyrebird can [...] use this key to
generate anything") but has no actual product or beta version. Adobe had a
much more impressive sneak peek of a similar product called VoCo:
[https://www.youtube.com/watch?v=I3l4XLZ59iw](https://www.youtube.com/watch?v=I3l4XLZ59iw)

~~~
Gaelan
To be fair, that demo could have been staged whereas we can be pretty darn
sure those aren't Trump's actual words.

~~~
felipemesquita
I agree. My main issue with them is that those clips could have been
painstakingly produced with far-from-shipping software and lots of manual
tinkering, while the copy on the site mostly reads like there's a product out.
The part about being far from shipping could also be true of Adobe's software,
but I think their presented result (assuming it is real) sounded better, and
they were more honest about the stage the product is at.

------
backpropaganda
Relevant discussion from 17 hours ago:
[https://news.ycombinator.com/item?id=14177589](https://news.ycombinator.com/item?id=14177589)

------
return0
We need a new markup language for intonation and emotion.

~~~
LesZedCB
[https://xkcd.com/1709/](https://xkcd.com/1709/)

------
scibolt
Voice Actors out of business! :D

------
anigbrowl
Excellent work. This will find widespread application in the film/tv/music
industry and beyond (and we're not that far away from being able to do the
same thing for video). Unfortunately it will also be widely abused, but given
the near-inevitability of such technological development I'm already
reconciled to that :-/

------
jpsim
Curious choice to name a company & product with a name that sounds like "Liar
Bird" when spoken. To me, that looks like they're fully embracing the concept
that this can be used for nefarious purposes. If one of their goals is to
bring attention that this technology exists and can be misused, the name
reinforces that.

~~~
coldsmoke
I guess they've named it that because the lyrebird is an amazing impersonator.
The end of this BBC clip blew my mind the first time I saw it.
[https://youtu.be/VjE0Kdfos4Y](https://youtu.be/VjE0Kdfos4Y)

But you may have a point, and the ethics section makes it clear that they are
indeed very aware of that this may be misused.

------
LordKano
This is impressive. There is now a way for Morgan Freeman and James Earl Jones
to be able to narrate movies forever.

------
mericsson
Related Economist article: [http://www.economist.com/news/science-and-
technology/2172112...](http://www.economist.com/news/science-and-
technology/21721128-you-took-words-right-out-my-mouth-imitating-peoples-
speech-patterns)

------
sehugg
Sounds great, I was trying something like this in Keras but didn't get very
far:
[https://github.com/sehugg/kerasspeechcodec](https://github.com/sehugg/kerasspeechcodec)

------
augustt
Any ideas on what the underlying technology looks like? Maybe some kind of GAN
for audio...

------
ParadisoShlee
The audio feed sounds like they're real and drunk... so that's impressive

------
bisRepetita
1. Buy the rights for "Car Talk" re-broadcast.

2. Record new, current ads using Click and Clack's voices.

3. If the voices sound a little too "mechanic", pretend it's a joke.

------
dyu-
This trump version [1] is quite believable. [1]
[https://soundcloud.com/user-535691776/trump-6](https://soundcloud.com/user-535691776/trump-6)

------
hayd
And just as my bank offers a "login via speaking" option. Lovely.

------
cocoa19
This technology reminded me of 24 (TV series).

The plot of season 2 has Jack Bauer proving that a recording between a
terrorist and high-ranking Middle East officials (the "Cyprus recording") was
forged to push the US president into starting a war.

------
koolba
The President Obama voice sounds decent, but the President Trump and Senator
Clinton voices sound like robots. Reminds me of the crappy text-to-speech
program that came with Windows.

------
vermontdevil
Coming soon - fake videos of future political candidates saying outrageous
things that will derail their campaigns.

Maybe from now on - just learn ASL. Hard to fake a distinctive signing style.

------
Markoff
It's an interesting development, but it sounds too robotic: there is zero
intonation/punctuation, zero variation in the voice depending on the mood of
the speaker, etc. In the end it's extremely robotic, and if someone really
needed to fake someone else's voice convincingly, it would still be easier to
hire a professional voice imitator.

------
inetknght
Site doesn't load at all on my machine without some javascript from Cloudflare
for Ajax.

I guess this product isn't for me then.

------
gwbas1c
Now we can't trust the news anymore. In a year or two we'll never know if
recordings are real or not.

------
Sunset
Now make it say the Navyseal copypasta with Trump's voice, but make him speak
slowly and with emphasis.

------
abetusk
Does anyone know of any free/open source alternatives to this? Is it too new
to expect a FOSS library?

------
mzzter
Trump 6 speaking "... my intonation is always different" sounds very
convincingly human.

------
w8rbt
_"Believe only half of what you see and nothing that you hear."_ \-- Edgar
Allan Poe

------
olleromam91
So all my voice commands can be recorded and my voice can be replicated.
Cool... I guess.

------
leke
OMG I want to play with this so bad.

------
wirddin
If they can pull this off with the API, there are millions of dollars on the
table.

~~~
olegkikin
Until someone implements it in a few lines of Tensorflow and posts it on
Github.

------
nerfhammer
Hello. My name is Werner Brandes. My voice is my passport. Verify me.

------
rajacombinator
Wow had no idea something like this was possible. Very impressive.

------
theemathas
It's a matter of time before this can compete with Vocaloid.

------
mod
Does the API get better results with more training data?

------
weenkus
A bit scary thinking someone could do this with ease.

------
hoodoof
It feels like the future has arrived.

------
simlevesque
Great stuff! Respect from the 514.

------
gator-io
So much potential for mischief!!

------
xumx
Be Right Back (Black Mirror)

Let's do it.

------
rglover
This is fucking terrifying.

------
selbekk
Scary.

------
kkotak
RIP Dan Castellaneta.

------
backpropaganda
[deleted]

~~~
IanCal
They both sound right to me.

------
redsummer
I wonder if you could do this with singing? Feed it acappela Bowie, Sinatra,
Elvis songs, then give it new text, and out comes a similar voice and melody.

------
redsummer
I can't wait for Richard Burton to read me the news.

------
ChairmanPao
Now people can deny saying things caught on tape. Just show this technology to
a jury considering taped evidence, and bring in some experts to testify on how
it works.

The samples weren't that convincing to me, but could probably be used to
switch a word here and there. That may be enough.

------
lucidrains
Lol, I totally called this.

------
amarant
They lost me at "... Consumers are still not lining up to buy EV's"

What the fuck are they talking about?

------
afinlayson
This is how a lot of tech companies make proper text-to-speech; this one was
just done using the vast amount of audio that's out there for these people.

Soon Trump will use this to state that things he's said are fake news. God
help us all.

~~~
Gaelan
They claim they only needed one minute of sample audio.

We really need to start requiring that all public announcements (news, press
releases, etc) are digitally signed and put into the blockchain.

~~~
ionwake
But who would sign it?

~~~
Gaelan
Trump. We'd need to teach people to assume anything unsigned is fake.
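The signing idea above can be sketched with Python's stdlib `hmac` module. This is a symmetric simplification for illustration only: a real scheme would use asymmetric signatures (e.g. Ed25519), so that anyone can verify a statement with a public key while only the press office holds the signing key. The key and statement below are hypothetical.

```python
import hashlib
import hmac

# Hypothetical signing key; a real deployment would use an asymmetric
# private key held only by the publisher.
SECRET_KEY = b"press-office-signing-key"

def sign(message: bytes, key: bytes = SECRET_KEY) -> str:
    """Return a hex MAC published alongside an official statement."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, signature: str, key: bytes = SECRET_KEY) -> bool:
    """Check a statement against its published signature.

    compare_digest avoids leaking information via timing differences.
    """
    return hmac.compare_digest(sign(message, key), signature)

statement = b"Official statement: the recording is genuine."
sig = sign(statement)

# A tampered statement fails verification, even with a perfect voice clone.
assert verify(statement, sig)
assert not verify(b"Official statement: the recording is fake.", sig)
```

Under this model, a convincing fake recording would still lack a valid signature, which is the "assume anything unsigned is fake" policy in code form.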

------
stefek99
I have two domains:

\- legalscreenshot.com

\- legalprintscreen.com

I also developed a concept of a "Reality Check", similar to the Turing Test
(when VR and AI become so convincing that >50% of people won't distinguish
them from base reality)... Too bad I'm on the corporate network and my
personal website is blocked: [https://genesis.re/wiki](https://genesis.re/wiki)

Aside: do you believe psychedelics should be part of obligatory astronaut
training?

