Well, it is actually very demanding. ASR systems usually work with the speed of ...

nl · on June 3, 2015

This isn't entirely accurate. Or rather, it is accurate as far as it goes, but doesn't tell the whole story.

Training a neural network uses a lot of computational power. From memory, I think training the Android voice recognition model took weeks on Google's GPU cluster ([1] talks about 95 hours for partial training, but I don't think that's the production system).

However, once the network is trained it doesn't use much power at all. The trained network can run on a mobile phone, and it doesn't even drain the battery much.

[1] http://static.googleusercontent.com/media/research.google.co...

_r5wf · on June 3, 2015

I was talking about run-time operation, not training. Yes, training DNNs is much more time consuming, but my point is that using them is not cheap either. As mentioned, processing 1 second of speech in, let's say, 0.5 seconds is expensive, considering that a web search is done in sub-millisecond time. Of course, I am assuming speech recognition is done on the server side.
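
To put a number on that: the usual way to express this cost is the real-time factor (RTF), i.e. processing time divided by audio duration. A rough sketch follows; the function names are made up for illustration, the 0.5 second figure is only the hypothetical from above, and the streams-per-worker estimate assumes a single worker handling streams back to back with no other overhead.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio decoded."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def live_streams_per_worker(rtf: float) -> float:
    """How many live streams one worker could keep up with.

    Assumption: the worker does nothing else and has no scheduling overhead.
    """
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Hypothetical figure from the comment: 1 s of speech decoded in 0.5 s.
rtf = real_time_factor(processing_seconds=0.5, audio_seconds=1.0)
print(rtf)                           # 0.5
print(live_streams_per_worker(rtf))  # 2.0
```

So at an RTF of 0.5, one worker is fully occupied by just two people talking at the same time.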

jdiez17 · on June 3, 2015

Very interesting links, thanks for sharing. I'm not sure if you're familiar with Android's speech recognition, but it seems to work offline as well. I wonder if they offload the computation to their servers when you're online and compute it locally when you're not. However, the latency seems to be on the same order of magnitude in both cases.

_r5wf · on June 3, 2015

Yes, it works offline, and it is a marvel IMO. It seems all the work is done on the phone when you are offline, and it performs close to its server counterpart. The latency is probably due to the nature of live ASR processing: the system cannot recognize word sequences immediately, just as humans can't.

There is a paper from Google on the issue:

http://static.googleusercontent.com/media/research.google.co...