
Deep Neural Networks for Acoustic Modelling and Speech Recognition [pdf] (2012) - Jasamba
http://research.microsoft.com/pubs/171498/HintonDengYuEtAl-SPM2012.pdf
======
radarsat1
What is the current state of open source speech recognition?

It would be immensely useful to be able to run a monologue or dialogue WAV file
through a program and get more-or-less good text, even if there are some
errors. As far as I know this is still quite a difficult problem, requiring an
immense amount of data and good language models, but I wouldn't be surprised
if today there are some pre-trained models available that can be run using one
of the many Python machine learning toolkits.

~~~
smhx
This tracks the current state of speech recognition.
[https://github.com/syhw/wer_are_we](https://github.com/syhw/wer_are_we)

~~~
amelius
Cool. We need more projects like this!

------
cfcef
Given the Baidu work, I think we can safely say that Hinton et al.'s forecasts
4 years ago were on the money. Deep approaches are now clearly dominant and
have yielded fantastic performance.

~~~
praccu
Huh?

The linked paper is 4 years old. DNNs have been dominant in speech since 2012.
No one uses GMM systems anymore.

Baidu's approach isn't even the best (IBM's system tends to beat theirs on
accuracy, and Google tends not to publish numbers on known benchmarks); it's
notable mostly for its use of RNNs to do pronunciation and language modeling
(although they also tack on a mod-KN LM).

------
gok
Needs a (2012) tag

~~~
Jasamba
There you go. :)

