ZeroSpeech Challenge 2019: TTS without T

ivan_ah · on Feb 1, 2019

Dataset info:

- The Voice Dataset contains one or two talkers, with around 2 hours of speech per talker. It is intended for building an acoustic model of the target voice for speech synthesis.

- The Unit Discovery Dataset contains read text from 100 speakers, with around 10 minutes of speech from each speaker. It is intended to allow for the construction of acoustic units.

That's not a lot of audio to learn a whole language system from, so some breakthroughs will be needed to make this work.
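
For a rough sense of scale, here's a back-of-the-envelope tally of the totals. The constants are just the "around" figures quoted above taken at face value (an assumption, since the real durations will vary), and the function names are made up for illustration:

```python
# Approximate figures from the dataset descriptions, taken at face value.
VOICE_HOURS_PER_TALKER = 2.0      # "around 2h of speech per talker"
UNIT_SPEAKERS = 100               # speakers in the Unit Discovery Dataset
UNIT_MINUTES_PER_SPEAKER = 10.0   # "around 10 minutes" from each speaker


def voice_hours(talkers: int) -> float:
    """Approximate total hours in the Voice Dataset for a given talker count."""
    return talkers * VOICE_HOURS_PER_TALKER


def unit_discovery_hours() -> float:
    """Approximate total hours in the Unit Discovery Dataset."""
    return UNIT_SPEAKERS * UNIT_MINUTES_PER_SPEAKER / 60.0


if __name__ == "__main__":
    # The Voice Dataset has one or two talkers, so report the range.
    print(f"Voice Dataset: {voice_hours(1):.1f} to {voice_hours(2):.1f} h")
    print(f"Unit Discovery Dataset: {unit_discovery_hours():.1f} h")
```

So, under those assumptions, that's roughly 2 to 4 hours for the target voice and a bit under 17 hours for unit discovery.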