
ZeroSpeech Challenge 2019: TTS without T - ivan_ah
https://zerospeech.com/2019/
======
ivan_ah
The final paper submissions deadline was April 5 2019, so we can check the
results now:

[https://zerospeech.com/2019/results.html](https://zerospeech.com/2019/results.html)

^ click on the green (+) to see details for each submission, including
reconstructed audio samples.

------
vojta_letal
Super interesting. Yet IMO kids have more than just audio available. They have
several senses, most importantly their eyesight. Hence this comparison does
not seem spot-on.

~~~
jacobush
But blind kids also learn to talk quickly.

~~~
jcoffland
They have touch, smell and taste.

------
ChuckMcM
Evil much? This is, as described, a way to impersonate anyone else on the
phone. I can see that capability being abused: combined with caller ID
spoofing, a crook could drain the bank accounts of half the elderly population
of any developed country.

~~~
ozzmotik
So could anyone with a predilection for voice acting and a proper
understanding of how to imitate others. One way or another, bad things are
going to happen, because there will always be people who use their skills
or tools for purposes other than those they were intended for. I don't think
it's a valid argument to dismiss the utility and power of something just
because people might do what people have always done.

~~~
ChuckMcM
I see it as the difference between APTs and script kiddies. Anyone with an
investment of time and a certain minimum of skill can reconnoiter a system,
identify a weakness, exploit it for access and do bad things. The level of
investment is high, so the expected reward has to be worth it. Once exploits
are packaged into a piece of code you can just install and run to break into
random systems, the investment is low, so for systems vulnerable to those
attacks you get many, many more incidents.

Finding the requisite voice acting skills, training for the voice in question,
and then identifying a number and calling it is a big investment of time,
money and resources. (Voice actors aren't cheap, especially if you're
asking them to defraud someone's grandparents.)

But if this project succeeds in mimicking a person's voice saying whatever
you want it to say, then anyone hanging around a theme park could capture
voice signatures and family relationships, and then turn around and exploit
them for fraud without anyone else getting involved.

To me, that would open up a large attack surface that is currently poorly
defended at best.

