Looking to convert text to audio (narration) but don't want the narration to sound robotic. What's the best-in-class ML model that does this kind of synthesis?
Currently, no TTS system sounds fully human. If you are looking for good TTS services, the neural voices from Google, IBM, and Amazon are promising, since they are built on recent deep learning techniques. They are still fairly monotonous, though: you can shape the speech prosody with SSML tags, but the results are not good.
https://cloud.google.com/text-to-speech (select voicetype: wavenet) https://text-to-speech-demo.ng.bluemix.net
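For anyone who hasn't used SSML prosody tags, here is a minimal sketch of what that markup looks like when built in Python. It only assembles the SSML string and does not call any cloud API. The helper names (`prosody`, `build_ssml`) and the specific rate, pitch, and pause values are made up for illustration, and support for individual attributes varies between services, so check your provider's docs.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, rate="medium", pitch="+0st", volume="medium"):
    """Wrap text in an SSML <prosody> element (hypothetical helper)."""
    return (
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>{escape(text)}</prosody>"
    )


def build_ssml(segments, pause="300ms"):
    """Join (text, options) segments into one <speak> document,
    with a short pause between segments."""
    separator = f" <break time={quoteattr(pause)}/> "
    body = separator.join(prosody(text, **opts) for text, opts in segments)
    return f"<speak>{body}</speak>"


# Illustrative values only: slow and lower for the setup, faster and higher after.
ssml = build_ssml([
    ("Once upon a time,", {"rate": "slow", "pitch": "-2st"}),
    ("everything changed & fast!", {"rate": "fast", "pitch": "+2st"}),
])
print(ssml)
```

The resulting string is what you would pass as the SSML input to one of the services above instead of plain text. Even with hand-tuned tags like these, expect the output to sound adjusted rather than natural.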
And if you want to train your own model, [Tacotron 2](https://ai.googleblog.com/2017/12/tacotron-2-generating-huma...) is a good place to start. https://github.com/mozilla/TTS