
I've used Amazon's text-to-speech API to create voice-over for my video - rayalez
https://www.youtube.com/watch?v=dGx-IVI3Cg8
======
rayalez
I have trouble speaking, and I don't really like my voice or accent, so I decided
to try Amazon Polly and see whether it would help me create a watchable video.

It ended up working surprisingly well. I used reveal.js to generate
slides and PhantomJS to automatically render them into images, then
edited everything together in Kdenlive. The entire process of turning my article
into a video took a couple of hours.
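
For anyone who wants to try the narration step, here is a rough sketch of how it could look in Python. This is not the exact script used for the video. It assumes the article is plain text with one blank-line-separated paragraph per slide, that boto3 is installed, and that AWS credentials are configured. The function names, the `audio` directory, and the `Joanna` voice are placeholders chosen for illustration.

```python
import re
from pathlib import Path


def split_into_slides(article):
    """Treat each blank-line-separated paragraph as the narration for one slide."""
    paragraphs = re.split(r"\n\s*\n", article)
    # Collapse hard line wraps inside a paragraph into single spaces.
    return [" ".join(p.split()) for p in paragraphs if p.strip()]


def synthesize_slides(slides, out_dir="audio", voice_id="Joanna"):
    """Write one MP3 per slide via Polly (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so split_into_slides works without it

    polly = boto3.client("polly")
    Path(out_dir).mkdir(exist_ok=True)
    paths = []
    for i, text in enumerate(slides, start=1):
        response = polly.synthesize_speech(
            Text=text, OutputFormat="mp3", VoiceId=voice_id
        )
        path = Path(out_dir) / f"slide_{i:03d}.mp3"
        path.write_bytes(response["AudioStream"].read())
        paths.append(path)
    return paths
```

Keeping one audio file per slide makes it easy to line each clip up with its rendered image in the video editor.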

I think it turned out pretty cool and I figured I'd share it with you guys.
It's imperfect, but it's really impressive how far text-to-speech engines have
come.

At this point it still probably makes more sense to spend $10-$20 on Fiverr for
a professional voiceover, but I can totally see how in the near future
it will be possible to automatically generate high quality videos.

I think I'm onto something here, maybe I'll make a SaaS tool out of this. Very
cool stuff.

~~~
ineedasername
Fascinating. Is the API simply text-in to audio-out, or can the text be marked
up with something like SSML? The ability to insert appropriate
pauses for pacing, change intonation, etc. would be very useful.
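
To make the pacing idea concrete, here is a minimal sketch of the kind of markup I mean, assuming the service accepts standard SSML `<speak>` and `<break>` elements. The helper name `to_ssml` and the default pause length are made up for illustration.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences, pause_ms=400):
    """Join sentences into one SSML document with a fixed pause between them."""
    # Escape &, < and > so plain text cannot break the XML markup.
    parts = [escape(s) for s in sentences]
    pause = f'<break time="{pause_ms}ms"/>'
    return "<speak>" + pause.join(parts) + "</speak>"


print(to_ssml(["Welcome to the video.", "Let's get started."]))
```

With boto3, a string like this would presumably be passed to `synthesize_speech` with `TextType="ssml"` instead of as plain text.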

~~~
ineedasername
Oh wow, they go further than SSML, with visemes and their phoneme maps as
well. [0] It's a shame there doesn't seem to be a quality IDE for such things.

[0] http://docs.aws.amazon.com/polly/latest/dg/speechmarks.html
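
As far as I can tell from that page, speech marks come back as line-delimited JSON, one object per line with `time` (milliseconds), `type`, and `value` fields. Here is a small sketch of pulling a viseme timeline out of that output. The function names and the sample data are invented for illustration and are not real Polly output.

```python
import json


def parse_speech_marks(raw):
    """Parse line-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def viseme_timeline(marks):
    """Return (time_ms, viseme) pairs, e.g. to drive lip-sync animation."""
    return [(m["time"], m["value"]) for m in marks if m["type"] == "viseme"]


# Hand-written sample in the documented shape, not captured from the service.
sample = "\n".join([
    '{"time": 0, "type": "word", "start": 0, "end": 2, "value": "Hi"}',
    '{"time": 5, "type": "viseme", "value": "k"}',
    '{"time": 90, "type": "viseme", "value": "a"}',
])
print(viseme_timeline(parse_speech_marks(sample)))
```

With boto3, the marks would presumably be requested by calling `synthesize_speech` with `OutputFormat="json"` and a `SpeechMarkTypes` list such as `["word", "viseme"]`.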

