
It's in the abstract. "... For the audio synthesis model, we implement a variant of WaveNet that requires fewer parameters and trains faster than the original ..."[1]

[1]: https://arxiv.org/abs/1702.07825




Disclosure: I'm one of the co-authors of the QRNN paper (James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher) produced by Salesforce Research.

There are many interesting advances in the Deep Voice paper and implementation, but the part I'm excited by (and which might transfer to other tasks that use RNNs) is the demonstration that QRNNs generalize to speech too - in this case replacing WaveNet's transposed convolutions for upsampling and conditioning.

"WaveNet uses transposed convolutions for upsampling and conditioning. We find that our models perform better, train faster, and require fewer parameters if we instead first encode the inputs with a stack of bidirectional quasi-RNN (QRNN) layers (Bradbury et al., 2016) and then perform upsampling by repetition to the desired frequency."

QRNNs are a variant of recurrent neural networks. They're up to 16 times faster than even Nvidia's highly optimized cuDNN LSTM implementation and give comparable or better accuracy on many tasks. This is the first time they've been tried on speech, so seeing the Deep Voice authors find that the advantages hold across the board (better, faster, smaller) is brilliant!
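For anyone who hasn't seen a QRNN before, here's a rough, unoptimized PyTorch sketch of a single layer with fo-pooling (names and sizes are my own, not from either paper). It shows the key idea: a causal convolution computes all gate pre-activations in parallel across time, and the only sequential work left is a cheap element-wise recurrence.

    import torch
    import torch.nn as nn

    class QRNNLayer(nn.Module):
        def __init__(self, input_size, hidden_size, kernel_size=2):
            super().__init__()
            self.hidden_size = hidden_size
            self.kernel_size = kernel_size
            # One convolution yields candidate (z), forget (f) and output (o) gate pre-activations.
            self.conv = nn.Conv1d(input_size, 3 * hidden_size, kernel_size)

        def forward(self, x):
            # x: (batch, time, input_size)
            x = x.transpose(1, 2)                                 # (batch, input_size, time)
            x = nn.functional.pad(x, (self.kernel_size - 1, 0))   # left-pad so the conv is causal
            z, f, o = self.conv(x).chunk(3, dim=1)                # each (batch, hidden, time), computed in parallel
            z, f, o = torch.tanh(z), torch.sigmoid(f), torch.sigmoid(o)

            # fo-pooling: c_t = f_t * c_{t-1} + (1 - f_t) * z_t, then h_t = o_t * c_t
            c = x.new_zeros(x.size(0), self.hidden_size)
            outputs = []
            for t in range(z.size(2)):
                c = f[:, :, t] * c + (1 - f[:, :, t]) * z[:, :, t]
                outputs.append(o[:, :, t] * c)
            return torch.stack(outputs, dim=1)                    # (batch, time, hidden_size)

In practice the pooling loop is fused into a single CUDA kernel, and since all the heavy matrix math lives in the parallel convolution, the layer trains much faster than an LSTM.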

If you're interested in the technical details, our blog post[1] gives a broader overview and our paper[2] goes into greater depth.

[1]: https://metamind.io/research/new-neural-network-building-blo...

[2]: https://arxiv.org/abs/1611.01576



