Oh it's not "hard" to get training data, you just need loads of money to buy the...

dharma1 · on Oct 24, 2017

or have more effective ways to collect tons of open source speech data. The Mozilla Common Voice project is really cool, but they should make it way easier for people to contribute.

Like adding a mic button for voice search next to the main search toolbar in Firefox, and then asking for permission to use that data for research.

adrianbg · on Oct 24, 2017

Hah, sure. By "hard," I meant that it's the largest hurdle. And even the common research datasets probably aren't enough to give you results competitive with Google, etc. AFAIK Google uses its own hand-transcribed data.