
Deep Speech 2: End-To-End Speech Recognition in English and Mandarin - sherjilozair
http://arxiv.org/abs/1512.02595
======
dplarson
During the GPU Technology Conference (GTC) 2015, Andrew Ng showed a live demo
of Deep Speech (1?) [0] (demo starts at the ~41-minute mark). There are other
videos showing Deep Speech, but I found this one the most useful/interesting
(of the ones I've seen).

[0]
[http://www.ustream.tv/recorded/60113824](http://www.ustream.tv/recorded/60113824)

~~~
ilurk
non-flash version of the video

[https://www.youtube.com/watch?v=qP9TOX8T-kI](https://www.youtube.com/watch?v=qP9TOX8T-kI)

------
svantana
The results are comparable to human transcribers, they note -- which is more a
testament to the low quality of Mechanical Turk work than the high quality of
this system. Surely a word error rate of 8% (for clean speech) would be
unacceptable for a paid transcription service?
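For reference, word error rate is conventionally computed as the word-level edit distance between the hypothesis and the reference transcript, divided by the reference length. A minimal sketch (function name and example strings are mine, not from the paper):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (r != h)))     # substitution
        prev = cur
    return prev[-1] / len(ref)

# One substituted word out of six -> WER of 1/6, i.e. ~16.7%
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

At 8% WER, roughly one word in twelve is wrong, which is why it looks high next to professional transcription.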

~~~
mabbo
I suffer from auditory dyslexia (not officially diagnosed, but it runs in my
family and I have all the symptoms, etc). I have a slightly lower than average
word recognition rate, especially if I can't see the speaker's lips. Yet I get
by, because usually the context and the words I'm expecting are enough.

This makes me ask two questions: #1- Do systems like this _need_ court-
reporter-level word recognition in order to be useful? Or can they compensate
for mistakes by using the context? #2- Could we improve these systems by also
feeding them video of the speakers' lips?

Maybe I should go do a masters to figure out the answers.
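On #1, systems in this family do use context at decode time: candidate transcriptions from the acoustic model are rescored with a language model. A toy illustration of the idea (the bigram table, candidate strings, scores, and weight `alpha` are all invented for this example):

```python
import math

def lm_logprob(words, bigrams):
    """Sum of log bigram probabilities, with a small floor for unseen pairs."""
    return sum(math.log(bigrams.get((a, b), 1e-4))
               for a, b in zip(words, words[1:]))

def rescore(candidates, bigrams, alpha=0.8):
    """Pick the (text, acoustic_score) pair maximizing acoustic + alpha * LM."""
    return max(candidates,
               key=lambda c: c[1] + alpha * lm_logprob(c[0].split(), bigrams))

# Made-up bigram probabilities standing in for a trained language model.
bigrams = {("recognize", "speech"): 0.2,
           ("wreck", "a"): 0.01, ("a", "nice"): 0.05, ("nice", "beach"): 0.02}

# The second candidate "sounds" slightly better acoustically,
# but the language model prefers the contextually likely one.
candidates = [("recognize speech", -4.1),
              ("wreck a nice beach", -4.0)]
best = rescore(candidates, bigrams)
print(best[0])  # -> recognize speech
```

So even without court-reporter-level acoustic accuracy, context can rescue a lot of errors, much like the compensation described above.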

~~~
robrenaud
Here is a paper that combines audio and video for speech recognition, and they
find that video helps, especially in noisy environments.

[https://www.uni-ulm.de/fileadmin/website_uni_ulm/allgemein/2...](https://www.uni-ulm.de/fileadmin/website_uni_ulm/allgemein/2014_iwsds/iwsds2014_lp_receveur.pdf)

~~~
mabbo
Awesome!

Noisy environments are exactly when seeing the lips is a huge deal for me. I
have a friend who has a tendency to absent-mindedly place his hand in front of
his mouth. In a quiet office or home, no issue. In a bar? He's pressed mute,
as far as I'm concerned.

------
espadrine
Will they publish code and learned weights for English and Mandarin?

One great impact of the Deep Dream team contributing tools and libraries to
the community was a wealth of applications from a wide variety of people.

~~~
robohamburger
If they do, it might make projects like
[https://jasperproject.github.io/](https://jasperproject.github.io/) a lot
nicer. It sounds like, at least in their demo, they were using a top-of-the-
line GPU to run the network, though.

------
oska
Previous discussion of an article about work by this team:

[https://news.ycombinator.com/item?id=10358072](https://news.ycombinator.com/item?id=10358072)

------
melling
About an hour in, near the end, Ng addresses the stupid digression the
industry recently had about evil AI destroying the world.

~~~
sp332
An hour into what?

~~~
haarts
The video linked up there.

