Hacker News | arkobel's comments

The lack of parallel accent data makes this a fundamentally unsupervised problem. Curious whether it leans more on latent disentanglement than on direct supervision.

Have you compared with Krisp-TT models? https://krisp.ai/blog/krisp-turn-taking-v2-voice-ai-viva-sdk... Krisp LLC also shares an End-of-Turn Test dataset. Did you test your model on that? https://huggingface.co/datasets/Krisp-AI/turn-taking-test-v1

Also, can you share some information about the model size and FLOPs?

