
The Common Voice Project by Mozilla reached its first goal: 1k hours in English - stergro
https://voice.mozilla.org/en/datasets
======
stergro
The project wants to build up a dataset to train neural networks for speech
recognition software. The first goal for every language is collecting 1,200
hours. English has reached this, but a dataset is only released twice a year,
which is why the download page still shows 1,100 hours.

Other languages are looking good too. German has almost reached 600 hours and
French almost 500 hours, and the website was localized into many other
languages, small and big, in the last year.

The main factors in building a good dataset are (a) having a diverse group of
donors and (b) having enough sentences. So if you want to help, you can donate
your voice, validate other people's recordings, or collect more sentences to
record. All three tasks are equally important for creating a good dataset.

[https://voice.mozilla.org](https://voice.mozilla.org)

