
Recurrent Neural Networks - A Short TensorFlow Tutorial - mrubashkin
https://github.com/silicon-valley-data-science/RNN-Tutorial
======
sillysaurus3
_If you have a NVIDIA GPU with CUDA already installed_

It's unfortunate OpenCL mostly failed to gain mindshare. It would've let
anyone with a powerful video card enjoy the benefits of acceleration, rather
than half the market.

Cool tutorial. Thanks for sharing!

~~~
nikcub
I don't think anybody is opposed to Tensorflow on OpenCL, it's just that CUDA
is so common.

You can follow the issue to add OpenCL support here:

[https://github.com/tensorflow/tensorflow/issues/22](https://github.com/tensorflow/tensorflow/issues/22)

and one of the projects here:

[https://github.com/benoitsteiner/tensorflow-opencl](https://github.com/benoitsteiner/tensorflow-opencl)

From what I understand, it requires Linux at the moment since it is built on
ComputeCpp.

~~~
ReverseCold
Tensorflow and Theano themselves only really support Windows. Unless you have
a spare 5 hours to set up (and more hours to maintain), you NEED Linux for any
serious ML.

Note: I'm just a beginner at ML, but this was my experience setting things up
for the first time.

~~~
nl
_Tensorflow and Theano themselves only really support Windows._

I think you mean "only really support Linux"? The rest of your comment reads
like you know that.

TensorFlow at least has now begun supporting Windows in the main release, but
you are absolutely correct in saying it has much better support on Linux.

------
nl
This is very good - I've never seen an RNN-for-speech-in-TensorFlow model
before.

Note, though, that the real problem here is the lack of training data.

In a recent podcast I heard that the Baidu speech recognition team uses "small
models" of 10,000 hours of speech. I forget how big the production quality
models were, but it was at least 5 times that.

This model uses ~1500 hours[1]. It's very impressive it does as well as it
does just using that.

[1] [https://svds.com/tensorflow-rnn-tutorial/](https://svds.com/tensorflow-rnn-tutorial/)

~~~
woodson
Mozilla DeepSpeech has been around for a while
([https://github.com/mozilla/DeepSpeech](https://github.com/mozilla/DeepSpeech)).
In fact, this code looks like it was heavily copied from there (see the
attribution notices in the comments).

Their examples use much less data, just 5 utterances from the LibriSpeech
training set, which is perfectly fine for a tutorial, since training on 1500h
worth of speech data takes from several days to multiple weeks, depending on
your hardware.

[edit: IMHO, the tutorial from the Bay Area DL School is more useful to get
started: [https://github.com/baidu-research/ba-dls-deepspeech](https://github.com/baidu-research/ba-dls-deepspeech)]

~~~
posterboy
[https://github.com/baidu-research/ba-dls-deepspeech](https://github.com/baidu-research/ba-dls-deepspeech) is 404

------
aetherspawn
Is there a way to decode arbitrary wav files after cloning the repo and
training the model? I couldn't tell from the tutorial and README whether it
was capable of that.

~~~
mrubashkin
Hey Aetherspawn, the repo does not currently have code for decoding
individual user-supplied .wav files that are not in the train/dev/test sets.
We'll polish up our code that simplifies decoding, add it to the repo soon,
and then shoot you a message.

------
iplaman
Thank you, this is very interesting. I wonder about your initial training
config: for demo purposes, wouldn't it be more time-efficient to use more wav
samples with fewer epochs?

~~~
mrubashkin
I agree that it would be more efficient to have more wav files in the GitHub
repo, but we kept them minimal to reduce the total file size when cloning the
repository. You can find more of the LibriSpeech data here:
[http://www.openslr.org/12/](http://www.openslr.org/12/)

We kept the epochs at 100 to demonstrate the negative consequences of
overfitting the training data when evaluating on the dev or test sets. We
could probably reduce that to ~50, though, to save time in training :)
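The epochs-vs-overfitting trade-off described above is often handled by early stopping on the dev-set loss rather than a fixed epoch count. A generic, framework-agnostic sketch (not code from this repo; the loss curve is synthetic, purely for illustration):

```python
# Early-stopping sketch: stop training once the dev-set loss has not
# improved for `patience` consecutive epochs, instead of always running
# a fixed number of epochs (e.g. 100) and risking overfitting.
def train_with_early_stopping(dev_losses, patience=5):
    """dev_losses: iterable of per-epoch dev-set losses (one per epoch)."""
    best_loss, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(dev_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # dev loss plateaued or worsened: likely overfitting
    return best_epoch, best_loss

# Synthetic dev-loss curve: improves for 30 epochs, then drifts upward
# as the model starts to overfit the training data.
losses = [1.0 / (e + 1) if e < 30 else 1.0 / 30 + 0.01 * (e - 30)
          for e in range(100)]
stop_epoch, stop_loss = train_with_early_stopping(losses)
# With this curve, training halts shortly after epoch 29, the minimum.
```

In a real run, `dev_losses` would be produced lazily, one evaluation per training epoch, so the loop stops the training itself rather than post-processing a finished curve.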

------
abainbridge
Is this speech-to-text or text-to-speech or something else?

