There are many interesting advances in the Deep Voice paper and implementation, but the part I'm most excited by (and which might transfer to other tasks that use RNNs) is the demonstration that QRNNs generalize to speech too - in this case in place of WaveNet.
"WaveNet uses transposed convolutions for upsampling and conditioning. We find that our models perform better, train faster, and require fewer parameters if we instead first encode the inputs with a stack of bidirectional quasi-RNN (QRNN) layers (Bradbury et al., 2016) and then perform upsampling by repetition to the desired frequency."
QRNNs are a variant of recurrent neural networks. They're up to 16 times faster than even Nvidia's highly optimized cuDNN LSTM implementation and give comparable or better accuracy on many tasks. This is the first time they have been tried on speech - seeing the advantages hold across the board (better, faster, smaller) is brilliant!
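The speed advantage comes from where the sequential work lives. In a QRNN, the gates are produced by convolutions that run in parallel across all timesteps; only an elementwise pooling step is recurrent. A minimal sketch of the fo-pooling recurrence from Bradbury et al. (2016), with the gate tensors generated randomly rather than by the convolutions that would produce them in a real layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fo_pool(z, f, o):
    """QRNN fo-pooling: c_t = f_t * c_{t-1} + (1 - f_t) * z_t, h_t = o_t * c_t.

    z, f, o: (timesteps, hidden) gate activations, normally produced by
    masked convolutions computed in parallel over the whole sequence.
    Only this cheap elementwise loop is sequential.
    """
    c = np.zeros(z.shape[1])
    h = np.empty_like(z)
    for t in range(z.shape[0]):
        c = f[t] * c + (1.0 - f[t]) * z[t]  # gated running state
        h[t] = o[t] * c                     # output gate
    return h

T, H = 8, 4
z = np.tanh(np.random.randn(T, H))       # candidate activations
f = sigmoid(np.random.randn(T, H))       # forget gate
o = sigmoid(np.random.randn(T, H))       # output gate
print(fo_pool(z, f, o).shape)  # (8, 4)
```

Unlike an LSTM, there are no matrix multiplications inside the loop, so the recurrence is trivially cheap and the heavy lifting parallelizes across time.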
If you're interested in the technical details, our blog post provides a broader overview and our paper goes into greater depth.