
A TensorFlow Implementation of DeepMind's WaveNet Paper
https://github.com/ibab/tensorflow-wavenet
======
jonnycowboy
Do you know how long it took to train using that dataset and with what
hardware configuration?

------
teddyknox
Doing a forward pass for every sample sounds like it would be prohibitive for
real-time applications.

~~~
nicklo
It absolutely is. DeepMind reported that generating 1 second of audio takes
about 90 minutes.

~~~
throwawaymsft
Assuming it's computation bound, that's a factor of 5400 (~13 doublings in CPU
power required to get to real time, assuming no algorithmic improvements).
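The arithmetic behind that figure can be checked in a couple of lines (a quick sketch of the numbers quoted above, nothing more):

```python
import math

# DeepMind's reported figure: ~90 minutes of compute per 1 second of audio
compute_seconds_per_audio_second = 90 * 60  # = 5400x slower than real time

# Hardware-speed doublings needed to reach real time,
# assuming no algorithmic improvements
doublings = math.log2(compute_seconds_per_audio_second)
print(round(doublings, 1))  # ~12.4, i.e. about 13 doublings
```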

~~~
mattnewton
Do they mention it was CPU trained? I assumed GPU. If it was CPU trained, I
wonder what the operations keeping it off the GPU were?

~~~
Houshalter
Google has special neural net ASICs now.

~~~
ogrisel
Google never stated they use those to train models as far as I know. It seems
that they are primarily used to spare energy when deploying trained models at
scale.

~~~
Houshalter
There's no reason they couldn't use them to train, as long as they can account
for the lower-precision operations. I think it would be much cheaper to train
on them, at that scale anyway.

~~~
dharma1
Afaik the Google TPU does inference only, at 8 bits. I don't think it's
possible to train a neural network at 8-bit precision at this point in time.
FP16 works for training though, and is twice as fast as FP32 on certain Nvidia
chips.

~~~
Houshalter
Backpropagation can work at any precision, as long as you use stochastic
rounding (so that the rounding errors are uncorrelated). Without stochastic
rounding, even 16 bits will have rounding-error bias.

[http://arxiv.org/abs/1412.7024](http://arxiv.org/abs/1412.7024)
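Stochastic rounding is simple to sketch: round up with probability equal to the fractional part, so the rounded value is unbiased in expectation. This is a toy NumPy illustration, not the paper's implementation:

```python
import numpy as np

_rng = np.random.default_rng(0)

def stochastic_round(x, step):
    """Round x to a multiple of `step`, rounding up with probability
    equal to the fractional part, so the expected error is zero."""
    scaled = np.asarray(x) / step
    floor = np.floor(scaled)
    frac = scaled - floor
    round_up = _rng.random(scaled.shape) < frac
    return step * (floor + round_up)

# Averaged over many samples, the expected value equals the input,
# so rounding error does not accumulate as a bias during training.
samples = stochastic_round(np.full(100_000, 0.3), step=1.0)
print(samples.mean())  # close to 0.3, vs. 0.0 for deterministic rounding
```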

~~~
dharma1
OK. I was going by this: [https://petewarden.com/2016/05/03/how-to-quantize-neural-net...](https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/)

I haven't seen 8-bit training implemented in any (public) frameworks yet -
that's not to say it's not possible. If it works then that's great, especially
for specialised hardware.

------
alexbeloi
I had plans to do the same, I'm glad somebody beat me to it.

~~~
Kenji
That is an admirable mindset - I cannot help but be a bit frustrated when
someone independently implements my ideas first. On the one hand, it validates
the idea. On the other hand, building it would be fun, but now that someone
else did it, doing it again would be akin to reinventing the wheel and it is
more productive to turn your attention to something new (unless you think you
can execute the project much better).

~~~
vintermann
In this case, it's DeepMind's idea anyway :)

I get more disappointed when the opposite happens. I think something like,
"Yeah, I'm totally going to add support in torch for noisy activation
functions like in this paper!"
([https://arxiv.org/pdf/1603.00391.pdf](https://arxiv.org/pdf/1603.00391.pdf)).
Then I procrastinate and put it off. Then I think, "No matter, someone else
has surely done it by now". Then they haven't.

------
blaurence5
Here's a Theano implementation: [https://github.com/huyouare/WaveNet-Theano](https://github.com/huyouare/WaveNet-Theano)

------
Dowwie
I wonder whether accents could be layered on top of the trained data?

~~~
aab0
Probably. You can add 'speaker' as a bit of metadata to the samples (this is
what is meant by 'conditioning on') and teach it to speak like different
people. If you have a diverse sample of speakers and add 'accent' as another
variable, it might well learn to disentangle individual speakers from their
accents, and then you can control generated accents by changing the metadata.
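In a sketch, conditioning on metadata just means looking up a learned embedding per ID and feeding it into the network alongside the audio. The names and sizes below are hypothetical, not from the WaveNet paper or this repo:

```python
import numpy as np

# Hypothetical vocabulary sizes and embedding width (illustration only)
n_speakers, n_accents, embed_dim = 10, 4, 16

rng = np.random.default_rng(0)
speaker_embed = rng.normal(size=(n_speakers, embed_dim))  # learned in practice
accent_embed = rng.normal(size=(n_accents, embed_dim))    # learned in practice

def conditioning_vector(speaker_id, accent_id):
    """Global conditioning: look up an embedding for each metadata field
    and combine them; the model would add this vector as a bias inside
    each layer, letting you steer the voice at generation time."""
    return speaker_embed[speaker_id] + accent_embed[accent_id]

# Same speaker, two different accents -> two different conditioning inputs
h1 = conditioning_vector(speaker_id=3, accent_id=0)
h2 = conditioning_vector(speaker_id=3, accent_id=2)
print(h1.shape)  # (16,)
```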

------
sargun
I'm really interested in building an RNN and training it against something
like movies. I'd love to then take it and input a song and translate it to a
music video that's a composite. I'm also interested in the legal ramifications
of doing such a thing...

Does anyone know of prior art?

