This basically already exists. Siri and similar TTS voices today are built from a large amount of recorded speech from one person. There's a lot to get right for it to sound natural beyond just hitting the phonemes: you have to handle the transitions between phonemes, pitch declination over the course of an utterance, etc.
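To make those two issues concrete, here's a toy sketch in Python. It is not how Siri or any real TTS engine works; the function names, the linear pitch fall, and the default 220 Hz to 180 Hz range are all assumptions picked for illustration. It shows a falling pitch target per unit (a crude stand-in for declination) and a linear crossfade to smooth the join between two recorded units.

```python
def declining_pitch(n_units, start_hz=220.0, end_hz=180.0):
    """Return one pitch target per unit, falling linearly from start to end.

    A linear fall is an assumption; real declination is messier.
    """
    if n_units <= 0:
        return []
    if n_units == 1:
        return [start_hz]
    step = (end_hz - start_hz) / (n_units - 1)
    return [start_hz + i * step for i in range(n_units)]


def crossfade(a, b, overlap):
    """Join two lists of samples, blending the end of `a` into the start of `b`.

    The last `overlap` samples of `a` are mixed with the first `overlap`
    samples of `b` using linearly increasing weights, so the join has no
    hard jump.
    """
    overlap = min(overlap, len(a), len(b))
    if overlap <= 0:
        return a + b
    head = a[:len(a) - overlap]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight given to b, strictly between 0 and 1
        mixed.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return head + mixed + b[overlap:]


if __name__ == "__main__":
    print(declining_pitch(3))
    print(crossfade([1.0] * 4, [0.0] * 4, 2))
```

Even this toy version hints at why naive concatenation sounds robotic: without the blend you get clicks at every join, and without the falling pitch every unit sits at the same flat level.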
I've even seen a demo that converted one person's voice into another's (without going through text) while trying to preserve the delivery (pauses, stresses, etc.). It was kinda cool, but you wouldn't genuinely mistake it for the other person.
On the positive side, you might end up with a Culture-type situation where it's impossible to blackmail anyone, because the authenticity of any evidence can no longer be verified.
You could credibly put any words in the mouth of anyone.