This is cool from the tech vantage point. But the labor-sympathizer in me would like to see a person employed to do it rather than licensing the right to their image.
They make it sound like they invented their own TTS model. I wonder, did they actually, or is this Eleven Labs, some other API, or StyleTTS 2 or something?
I mean, it seems like there are a ton of papers attempting realistic TTS, but it's hard to find something really equivalent to the Eleven Labs voice clone that doesn't have a non-commercial restriction on the weights or code. Maybe they really did train a model from scratch?
It is a really big company. Maybe they have resources for that.