Play Dialog: A contextual turn-taking TTS model like NotebookLM Playground

mahmoudfelfel · 2024-11-13T20:19:55 1731529195

PlayAI (formerly PlayHT) founder here. This is a native multi-turn voice model built for conversations, such as real-time agents or podcasts. Try it through our playground (https://play.ai/playground) or API (https://docs.play.ai/). Feel free to ask anything.

guytv · 2024-11-13T20:20:09 1731529209

Ouch. If you know Arabic or Hebrew, try selecting those languages and typing something in—it’s hilarious.

Looks like they’re “testing in production.”

mahmoudfelfel · 2024-11-13T20:24:45 1731529485

The currently deployed model is English-only; we are rolling out a multilingual version later this week!

guytv · 2024-11-23T10:45:41 1732358741

Oh, the UI lets you select from various languages, so I assumed they would work. Thanks!

Stanleyc23 · 2024-11-13T23:18:42 1731539922

overall impressive. noticed a weird quirk: it reads $100 million as "one hundred dollar million" instead of "one hundred million dollars"
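A reading like "one hundred dollar million" would be consistent with the "$100" being expanded before the following "million" is taken into account, though that is a guess and not something confirmed about PlayAI's pipeline. As a minimal sketch of a client-side workaround, assuming you preprocess text before sending it to any TTS API, a regex pass can move the currency word after the magnitude. The helper name `normalize_currency` is hypothetical and only covers US dollars with a spelled-out magnitude.

```python
import re

# Hypothetical preprocessing helper, not part of any PlayAI API.
# Matches "$<number> <magnitude>", e.g. "$100 million" or "$1.5 billion".
_CURRENCY = re.compile(
    r"\$(\d+(?:\.\d+)?)\s+(thousand|million|billion|trillion)\b",
    re.IGNORECASE,
)


def normalize_currency(text: str) -> str:
    """Rewrite "$100 million" as "100 million dollars".

    Amounts without a spelled-out magnitude (e.g. "$5") are left unchanged.
    """
    return _CURRENCY.sub(r"\1 \2 dollars", text)


if __name__ == "__main__":
    print(normalize_currency("They raised $100 million last year."))
```

The output of this pass would then be sent to the TTS engine in place of the raw text.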

byearthithatius · 2024-11-14T00:19:14 1731543554

Love the idea, but this is not good yet. Mine had random changes in the pace/cadence of speech and was basically uncanny valley territory.

owenpalmer · 2024-11-14T15:41:14 1731598874

This is impressively low latency. Also, it's cool to see another option for TTS with real-time streaming.

dulldata · 2024-11-13T19:38:34 1731526714

if you are a developer, there's an API - https://docs.play.ai/tts-api-reference/endpoints/v1/tts/stre...

yawnxyz · 2024-11-13T23:29:14 1731540554

did you listen to the output of your own demo?

> Speaker 1: Dang man, I’d come find you for sure.

that part sounds like a broken robot

yavorgiv · 2024-11-14T09:52:12 1731577932

Hey, can you give an example? The model is not perfect and this is our first version, so it will get better and faster for sure. Still, I generated the full prompt you referenced and it sounds good to me. It adds some laughter, but to my mind that makes it sound less robotic.

https://drive.google.com/file/d/1JzfweTdvCWzJ6Wwv0KdgfaxZcyn...

Speaker 1: Oh yes, the deep sea, nature’s basement. Home to creatures so bizarre, even nightmares are like, “Nah, I’ll pass.”

Speaker 2: Right? It's like the ocean was running a clearance sale on leftover parts. “Hey, who wants a fish with a lightbulb head? No one? Alright, let’s just drop this bad boy in the Mariana Trench.”

Speaker 1: Oh man, let’s start with a classic: the anglerfish. It’s a fish that decided it was uh, tired of chasing its food and thought, “What if I just dangle a glow stick on my head and let dinner come to me?”

Speaker 2: Honestly, I respect that. Can you imagine if we had that? Like, I’m sitting on my couch with a glowing Dorito on my forehead, waiting for snacks to find me.

Speaker 1: Dang man, I’d come find you for sure.

treesciencebot · 2024-11-13T20:00:00 1731528000

i don't think anyone has done real-time multi-speaker dialog generation before

Asjad · 2024-11-13T20:03:34 1731528214

[flagged]

byearthithatius · 2024-11-14T00:22:32 1731543752

Fake account / bot meant to promote the company. Look at comment history.

aspenmayer · 2024-11-14T02:30:41 1731551441

Please email hn@ycombinator.com to report these kinds of things.