
DeepMind introduces ‘EATS’: adversarial, end-to-end approach to text-to-speech - Yuqing7
https://syncedreview.com/2020/06/09/deepmind-introduces-eats-an-adversarial-end-to-end-approach-to-tts/
======
ma2rten
It seems like the abstract is a little misleading. WaveNet scores about a third
of a point higher on a 1-to-5 scale. Tacotron 2 [1] scores even better,
reporting a result that is almost human-level:

      EATS          4.083
      WaveNet       4.41
      Tacotron 2    4.53
      Human Speech  4.55

It seems like a stretch to me to say it's "comparable to the state-of-the-art
models".

[1]
[https://google.github.io/tacotron/publications/tacotron2/ind...](https://google.github.io/tacotron/publications/tacotron2/index.html)

~~~
quantumwoke
You can't compare the scales directly across different papers, because MOS
(Mean Opinion Score) is a subjective rating collected in 0.5 increments,
usually from a different Mechanical Turk sample each time. It's a shame the
EATS paper didn't run its own MOS evaluation for more of the standard models.
However, as the paper points out, the real benefits here are reduced
inference/training time and improved attention learning, which many current
models struggle with.
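To make the rater-pool point concrete, here's a toy sketch (the ratings are made up for illustration, not taken from either paper): the same audio scored by a slightly stricter pool can shift MOS by more than the EATS-vs-WaveNet gap.

```python
def mos(ratings):
    """Mean Opinion Score: the average of subjective 1-5 ratings,
    each given in 0.5 increments."""
    return sum(ratings) / len(ratings)

# Identical (hypothetical) audio rated by two simulated rater pools,
# one slightly stricter than the other -- numbers are invented.
lenient_pool = [4.5, 5.0, 4.5, 4.0, 5.0]
strict_pool  = [4.0, 4.5, 4.0, 3.5, 4.5]

gap = mos(lenient_pool) - mos(strict_pool)
print(round(gap, 2))  # ~0.5 difference from the rater pool alone
```

A half-point swing from rater strictness alone is larger than most of the model-to-model gaps in the table above, which is why cross-paper MOS comparisons are shaky.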

~~~
ma2rten
EATS and WaveNet are from the same research group, and the EATS and Tacotron
papers report almost the same score for the human sample. None of the papers
said they used Mechanical Turk; Google has its own internal MT-like service,
which is probably more consistent.

------
TulliusCicero
> more recently saw its self-taught agents thrash pros in the video game
> StarCraft II.

Clarification: DeepMind never fully 'beat' SC2 pros the way they did in Go.

Their AI was super impressive, certainly far beyond any other that has been
developed. But they only got as far as consistently doing 'okay' against top-
tier players on the ladder, and that with some significant caveats to its
performance. In particular, it's debatable to what extent it was overpowering
some players through sheer APM rather than superior tactics or strategy. Yes,
this is true even after they put in certain restrictions.

And given their recent silence on the issue, it looks like they may have given
up, which would be unfortunate.

~~~
loufe
No, it's not true. If you followed closely, their APM was hamstrung to the
point of being lower than human players' in many cases. In many of the replays
I've seen, it was the strategy that helped them win. Though when they lost
against grandmasters, it was often from obviously poor decisions caused by a
rabbit hole of thinking.

~~~
ajuc
The APM limit was averaged over longer periods of time, so the AI "saved up"
APM during the boring parts of the game and then spent it all in short bursts
of over 1000 APM during crucial battles. It was especially visible during the
blink-stalker Protoss vs. Protoss game against MaNa (a very micro-intense
matchup).
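The windowed-average mechanism can be sketched in a few lines (a toy model of the idea; the class name, window length, and cap are assumptions for illustration, not AlphaStar's actual numbers):

```python
from collections import deque

class WindowedAPMLimiter:
    """Toy cap on actions per minute, averaged over a sliding window.
    An agent that idles for a while banks budget it can then spend in
    one burst, far above the nominal APM cap."""

    def __init__(self, max_apm=600, window_s=5.0):
        self.window_s = window_s
        self.budget = max_apm * window_s / 60.0  # actions allowed per window
        self.stamps = deque()                    # timestamps of recent actions

    def try_act(self, now):
        # forget actions that have slid out of the averaging window
        while self.stamps and now - self.stamps[0] > self.window_s:
            self.stamps.popleft()
        if len(self.stamps) < self.budget:
            self.stamps.append(now)
            return True
        return False

lim = WindowedAPMLimiter(max_apm=600, window_s=5.0)
# after idling, the whole 50-action budget can fire in a single instant,
# yet the 5-second average never exceeds 600 APM:
burst = sum(lim.try_act(now=10.0) for _ in range(100))
print(burst)  # 50
```

A per-second (or per-frame) cap instead of a windowed average would have prevented exactly this kind of burst.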

Blink stalkers are very cost-effective units that are balanced by being very
hard to micromanage perfectly. It's basically impossible over a large area
(in battles spanning more than one screen), because which of your stalkers is
targeted and needs to blink at any given moment is pseudorandom, and you need
to move the screen to that spot to blink the stalker in time. The AI knows the
health of all units without needing to move the screen to see it, so it knows
where to move the camera in advance.
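As a rough illustration of why that global view matters, here's the per-frame decision a blink-micro controller boils down to (field names and the threshold are invented for the example, not from any StarCraft II API):

```python
def stalkers_to_blink(stalkers, threshold=0.3):
    """Return the ids of stalkers that should blink back this frame.
    An agent reads every unit's shields and health at once, across all
    screens; a human has to move the camera to see the same information."""
    ids = []
    for s in stalkers:
        left = (s["shields"] + s["health"]) / (s["max_shields"] + s["max_health"])
        if left < threshold and s["blink_ready"]:
            ids.append(s["id"])
    return ids

army = [
    {"id": 1, "shields": 0,  "health": 20, "max_shields": 80, "max_health": 80, "blink_ready": True},
    {"id": 2, "shields": 70, "health": 80, "max_shields": 80, "max_health": 80, "blink_ready": True},
]
print(stalkers_to_blink(army))  # [1] -- only the focused, nearly-dead stalker
```

The check itself is trivial; the human bottleneck is gathering the inputs, which is exactly what the AI gets for free.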

The AI perfectly micromanaged a full-supply army of blink stalkers during a
battle that spanned three screens. No human could physically do that with the
current StarCraft UI.

It's as if chess pieces weighed 100 kg each and the AI was steering an
industrial robot :) Interesting, but it's mostly measuring something that
isn't about AI :)

In the next game MaNa went immortals (units that specifically exist to counter
stalkers in PvP) and still lost, because the AI's blink-stalker micro was just
so impossibly good.

~~~
YeGoblynQueenne
>> The APM limit was averaged over longer periods of time, so the AI "saved
up" APM during the boring parts of the game and then spent it all in short
bursts of over 1000 APM during crucial battles.

To be fair to AlphaStar, figuring out that it's a good idea to do this is not
trivial. You could say it learned a kind of exploit (of the artificial
restrictions placed on it).

~~~
ajuc
AIs tend to find exploits in the rules instead of actually doing what we want
them to; it's pretty common.

~~~
loufe
I suppose that is part of how humans understand neural networks. When you
strip away the human perspective, we do pretty much the same thing.

------
seedless-sensat
Speech samples in the DeepMind publication post:
[https://deepmind.com/research/publications/End-to-End-
Advers...](https://deepmind.com/research/publications/End-to-End-Adversarial-
Text-to-Speech)

~~~
scotth
"Ablation: No Monotonic Interpolation" is my favorite — wait for it...

~~~
lunixbochs
I ran into that with Tacotron sometimes too

------
thom
What's the state of the art in TTS that might conceivably be runnable on a
modern, offline mobile device?

~~~
ekelsen
WaveRNN. [https://arxiv.org/abs/1802.08435](https://arxiv.org/abs/1802.08435)

~~~
StavrosK
Is there any implementation? I'd love to be able to run something on my
computer and generate speech.

~~~
ekelsen
The closest open source project is probably
[https://github.com/mozilla/LPCNet/](https://github.com/mozilla/LPCNet/)

------
The_rationalist
It seems that, contrary to what their SOTA baseline suggests, text-to-speech
is more or less a solved problem, at least for English (on their metric, MOS):
[https://paperswithcode.com/sota/speech-synthesis-on-north-
am...](https://paperswithcode.com/sota/speech-synthesis-on-north-american-
english) The SOTA has a 4.526 MOS vs. their 4.084 and their 4.41 baseline,
while human performance is 4.55.

------
The_rationalist
They don't seem to have improved the state of the art :/ But it might be a
promising new research direction.

[https://paperswithcode.com/paper/end-to-end-adversarial-
text...](https://paperswithcode.com/paper/end-to-end-adversarial-text-to-
speech)

------
hortense
It's crazy that Google is doing all that research and giving it away for free
to all their competitors.

~~~
garmaine
So is Baidu.

The whole industry benefits from this sharing of knowledge.

Also, the best aren't going to work where they can't publish.

~~~
heavenlyblue
Also true: Google doesn't really have competitors, so publishing only
reinforces their monopoly.

------
theblackcat1002
Arxiv pdf link :
[https://arxiv.org/pdf/2006.03575.pdf](https://arxiv.org/pdf/2006.03575.pdf)

------
PaulHoule
I love the block diagram of the thing.

------
sabujp
amazing as always, the current home tts is pretty good now as well

~~~
visarga
What is 'current home tts'? Is it some kind of TTS model you can run on your
computer?

I'm wondering why macOS, which used to have a superior set of TTS voices,
hasn't implemented any of the new neural voice engines. It's been years since
Tacotron and WaveNet, and still the same crappy system voices.

~~~
forgingahead
Speculating, but perhaps they meant "the current Google Home TTS".

~~~
sabujp
yes, thanks for the clarification :)

------
The_rationalist
Could that be applied to speech to text too?

