OpenAI's TTS does this. You can hear it in regular ChatGPT's voice mode (which this demo is based on; it uses the same animations on the robot's face). It will also occasionally hallucinate syllables or whole nonsense words, although that is rarer.
Is there a setting for this in the ChatGPT app? I have never once noticed it produce an "uh" or repeated syllable like "I... I think I did pretty well."
Really? Have you used it much? I haven't used it a ton but it definitely says "uh" and has various other artifacts. Maybe they have improved it recently but it was quite obvious when I first got access. Or maybe some of the voices are more prone to it than others.
The naturalness of the speech is extremely good, though.
It sounds so human; a person would also stutter at an introspective question like this. I wonder if their text-to-speech was trained on human data and produces these artifacts of human speech, or if it is intentional.
I believe it's the ElevenLabs API with the Stability setting turned down a little bit. It is definitely trained on human speech, and when you use a setting somewhat lower than the default, it inserts those kinds of natural imperfections and pauses, which sounds very realistic.
I'm not sure when OpenAI added them, but you can hear similar things when using the ChatGPT voice mode on iOS. Sometimes it feels almost like a latency stutter and other times it feels intentionally human.
I use ChatGPT voice a lot, and it is prone to this exact type of stutter. I don’t think it’s intentional. I think there are certain phonetic/tonal linkages that are naturally “weird” (uncommon in the training corpus) and that the AI struggles with. Why this struggle manifests as a very human-like stutter is a fascinating question.