Hacker News
Audio Datasets for Machine Learning (lionbridge.ai)
55 points by TakakiTohno 10 days ago | 3 comments





IMHO the speech dataset list is missing some other interesting free corpora, e.g. TED-LIUM, VoxForge, and Common Voice. A more comprehensive (but not complete) list can be found here: https://github.com/kaldi-asr/kaldi/tree/master/egs (the download links are in the scripts)

Also see the "Heidelberg Spiking Datasets": https://ieee-dataport.org/open-access/heidelberg-spiking-dat...

The Spoken Wikipedia corpus is especially impressive.
