Deep Neural Networks for Acoustic Modelling and Speech Recognition [pdf] (2012)

radarsat1 · on March 8, 2016

What is the current state of open source speech recognition?

It would be immensely useful to be able to run a monologue or dialog wav file through a program and get more-or-less good text, even if there are some errors. As far as I know this is still quite a difficult problem, requiring an immense amount of data, and good language models, but I wouldn't be surprised if today there are some pre-trained models available that can be run using one of the many machine learning Python toolkits?

smhx · on March 8, 2016

This tracks the current state of speech recognition. https://github.com/syhw/wer_are_we

amelius · on March 8, 2016

Cool. We need more projects like this!

praccu · on March 8, 2016

Kaldi is the best, in my opinion. It has a few open acoustic models. Unfortunately, speech is just hard; acoustic models are onlying one part of these systems.

dharma1 · on March 8, 2016

This looks good http://gitxiv.com/posts/WmodZtsPAWE2cEw8f/eesen-end-to-end-s...

cfcef · on March 8, 2016

Given the Baidu work, I think we can safely say that Hinton et al's forecasts 4 years ago were on the money. Deep approaches are now clearly dominant and have yielded fantastic performance.

praccu · on March 8, 2016

Huh?

The linked paper is 4 years old. DNNs have been dominant in speech since 2012. No one uses GMM systems anymore.

Baidu's approach isn't even the best (IBM's system tends to beat theirs on accuracy, and google tends not to publish numbers on known benchmarks), it's notable mostly for its use of RNNs to do pronunciation and language modeling (although they also tack on a mod-KN LM).

gok · on March 8, 2016

Needs a (2012) tag

Jasamba · on March 8, 2016

there you go. :)