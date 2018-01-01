Hacker News
Show HN: Automatic Speech Recognition in TensorFlow (github.com)
131 points by zzw922cn 10 months ago | 12 comments



Just skimming the code, this is an impressive piece of work, probably one of the most comprehensive open source ASR implementations using deep learning I've seen yet. It certainly looks correct based on the careful documentation (English comments would also help), but the most important part is providing a pre-trained model. Validating TF code can be an expensive and painstaking process, and if you have the weights, sharing them proves that your net works. This may seem obvious, but very few open source implementations actually take that step, and I'm not sure why.


I suspect that the size of the weights file might have something to do with that. 500 MB+ is not unusual, and if enough people hit your small VPS or Amazon account, that can cause you to be rate capped, go down, or go broke.

It would be nice if a universal 'model+dataset+weights' sharing service would spring into being.


Seems like this is a perfect place to use Bittorrent: a large, desirable file for which you own the distribution permissions.


Yeah, that seems really obvious now that you say it. Not sure if there is such a thing as a cheap seedbox around.


If you quantize the model to 8 bits, you already get roughly a factor of 4 reduction in model size (32-bit floats down to 8-bit integers), and GitHub Releases allows files up to 2 GB.
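
For anyone wondering where the size saving comes from, here is a minimal sketch of 8-bit linear quantization in plain Python. It is not TensorFlow's own quantization tooling, the function names are made up for illustration, and the random "weights" stand in for a real layer. Assuming 32-bit float weights, storing one byte per weight gives 4x on the raw weight data; anything beyond that would need fewer bits per weight or extra compression on top.

```python
from array import array
import random


def quantize_uint8(weights):
    """Map float weights onto 256 evenly spaced levels between min and max."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all weights are equal
    # Clamp to 0..255 so rounding noise can never produce an invalid byte.
    codes = bytes(min(255, max(0, round((w - lo) / scale))) for w in weights)
    return codes, lo, scale


def dequantize_uint8(codes, lo, scale):
    """Recover approximate float32 weights from the 8-bit codes."""
    return array("f", (lo + c * scale for c in codes))


# Hypothetical stand-in for one layer's weights, stored as 32-bit floats.
random.seed(0)
weights = array("f", (random.gauss(0.0, 0.1) for _ in range(10_000)))

codes, lo, scale = quantize_uint8(weights)
restored = dequantize_uint8(codes, lo, scale)

ratio = (len(weights) * weights.itemsize) / len(codes)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"size ratio: {ratio:.1f}x, max abs error: {max_error:.5f}")
```

The per-weight error is bounded by about half a quantization step (`scale / 2`); whether that is acceptable for recognition accuracy is something you would have to measure on the actual model.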


>It would be nice if a universal 'model+dataset+weights' sharing service would spring into being.

I've asked for this multiple times and yet every time someone comes out of the woodwork saying it's been done or that nobody would use it.

People usually link to the Caffe model zoos or something like this (potentially unsafe according to Chrome) https://www.gradientzoo.com/


The weights are also useful to build on, as an initialisation for a related task, or to look for semantic vectors without having to do the (often lengthy) training oneself.

I second your call for more publishing of trained models as well as source code and applaud the OP for doing so.

Great post.


I know nothing about ML, but I'm willing to read manuals, write scripts, and deal with tedious technical stuff; is it feasible to use[1] this without really understanding how it works?

[1] "Use" being defined as having short spoken phrases[2] trigger my scripts. [2] I'm willing to accept significant restrictions on the nature of these phrases, such as intentionally making them sound very different.


Alternatives, for reference:

https://github.com/mozilla/DeepSpeech

https://github.com/pannous/tensorflow-speech-recognition

https://github.com/buriburisuri/speech-to-text-wavenet


Nice! Just curious, is the git repo all one needs in order to run speech to text on their own PC? How accurate is this thing? Is there a demo somewhere perhaps?


Good job! How much time did it take to train this model? I've heard TensorFlow is slow compared to other toolkits.


How does one evaluate an utterance?




