
Is it just me, or does the spoken content not correspond with the written prompt in many cases? I’m sure it’s just a problem of matching the right file to the text in the HTML, though, and not a TTS problem.

[Edit: My bad, I looked at the page on a phone screen, where only the text and the first audio playback button are visible.]




The first column of audio is just a sample of that person reading different text. That’s what the model gets to hear to learn what they sound like, before trying to speak the text in their voice.


Ah, thanks! I looked at the page on a phone screen, where only the text and the first audio playback button are visible. My bad.


The speaker prompt is the sample speaker voice reading a random text; that’s one piece that the model uses as input. The second column corresponds to the human speaker reading the text (ground truth). The next two columns are the baseline and VALL-E, respectively, producing text-to-speech given only the first column and the text as input.


I did the same thing: on mobile, the many column headings are not discoverable in portrait.



