
Interesting stutter at 0:53



OpenAI's TTS does this. You can hear it in regular ChatGPT's voice mode (which this demo is based on, it uses the same animations on the robot's face). It will also sometimes randomly hallucinate syllables or whole nonsense words, although that is rarer.


Is there a setting for this in the ChatGPT app? I have never once noticed it produce an "uh" or repeated syllable like "I... I think I did pretty well."


I use the conversation mode (headphones icon) and it regularly says "uh". It’s cute but spurious and unnecessary.


Really? Have you used it much? I haven't used it a ton but it definitely says "uh" and has various other artifacts. Maybe they have improved it recently but it was quite obvious when I first got access. Or maybe some of the voices are more prone to it than others.

The naturalness of the speech is extremely good, though.


Huh, TIL.

I noticed the stutter too. Interesting to see that this is just what TTS does now, and not a sign of a human behind a sound filter.


Similar at 1:47: "I... I think"

It sounds so human; a person would also stutter at an introspective question like this. I wonder if their text-to-speech was trained on human data and produces these artifacts of human speech, or if it is intentional.


I believe it's the Eleven Labs API with the Stability setting turned down a little bit. It is definitely trained on human speech, and when you use a somewhat lower setting than the default, it inserts those kinds of natural imperfections and pauses, which sounds very realistic.


I would have added umms and hmms artificially just to make the latency less apparent, so I'd say there's a good chance that's what they did lol


Now that you mention it, I, uh, also add umms when my speech pathways experience high latency.


I'm not sure when OpenAI added them, but you can hear similar things when using the ChatGPT voice mode on iOS. Sometimes it feels almost like a latency stutter and other times it feels intentionally human.


I use ChatGPT voice a lot, and it is prone to this exact type of stutter. I don’t think it’s intentional. I think there are certain phonetic/tonal linkages that are naturally “weird” (uncommon in the training corpus) and that the AI struggles with. Why this struggle manifests as a very human-like stutter is a fascinating question.



