Hacker News
Ask HN: Is there a whisper-like speech-to-text that detects the speaker?
13 points by authorfly 9 months ago | 3 comments
I found some commercial (expensive) offerings that do this, but there doesn't seem to be an open-source way to categorise the output of Whisper by speaker/source. Is there one?

I'm thinking of using this for podcast analysis.




whisper.cpp supports a model with "speaker segmentation", also called "local diarization". It is called "local" because it doesn't name the distinct speakers; it only tells you when the speaker changes. See https://github.com/ggerganov/whisper.cpp/issues/1715#issueco.... Once you have compiled whisper.cpp and downloaded the model, run `main` with that model and the `-tdrz` option.
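Because the `-tdrz` output only marks speaker changes, a natural first step is to group consecutive segments into anonymous speaker turns. Here is a minimal sketch that assumes each output line looks like `[00:00:00.000 --> 00:00:02.000]   text` and that a change is marked by a trailing `[SPEAKER_TURN]` token; check both against the output of your whisper.cpp build. The function name and sample lines are hypothetical.

```python
import re

# Assumed shape of one whisper.cpp output line, e.g.
# "[00:00:00.000 --> 00:00:03.800]   Hello there. [SPEAKER_TURN]"
LINE_RE = re.compile(r"^\[(\S+) --> (\S+)\]\s*(.*)$")
TURN_MARKER = "[SPEAKER_TURN]"  # assumed marker printed when -tdrz is on


def group_into_turns(lines):
    """Group transcript lines into anonymous speaker turns.

    A marker at the end of a segment means the *next* segment starts
    a new turn. The speakers themselves are never identified.
    """
    turns = []
    current = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip log output and blank lines
        start, end, text = match.groups()
        changed = text.endswith(TURN_MARKER)
        if changed:
            text = text[: -len(TURN_MARKER)].rstrip()
        current.append((start, end, text))
        if changed:
            turns.append(current)
            current = []
    if current:
        turns.append(current)
    return turns


sample = [
    "[00:00:00.000 --> 00:00:02.000]   So what got you into podcasting? [SPEAKER_TURN]",
    "[00:00:02.000 --> 00:00:05.000]   Honestly, it started as a hobby.",
    "[00:00:05.000 --> 00:00:07.000]   Then it grew. [SPEAKER_TURN]",
    "[00:00:07.000 --> 00:00:08.500]   Interesting.",
]
for i, turn in enumerate(group_into_turns(sample)):
    print(i, " ".join(text for _, _, text in turn))
```

Each turn keeps its segments' timestamps, so later steps can still map a turn back to a position in the audio.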


Thank you, this is exactly it, perfect. Next I'll try to work out which speaker is which. I guess that might be easier for podcasts where one side asks the questions and the other mostly answers, but perhaps not!
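For the question-and-answer case, one rough heuristic is to assume exactly two speakers who strictly alternate at every detected turn, then call whichever side asks more questions the host. This is an illustrative sketch, not something whisper.cpp provides; the function name, the "host"/"guest" labels, and the question-mark heuristic are all assumptions.

```python
def label_two_speakers(turn_texts):
    """Assign alternating labels to anonymous turns and guess the host.

    Assumes exactly two speakers who strictly alternate, so every
    detected turn change flips between them. The side whose turns
    contain more question marks is guessed to be the host.
    """
    sides = [i % 2 for i in range(len(turn_texts))]  # 0, 1, 0, 1, ...
    questions = [0, 0]
    for side, text in zip(sides, turn_texts):
        questions[side] += text.count("?")
    host_side = 0 if questions[0] >= questions[1] else 1
    return [
        ("host" if side == host_side else "guest", text)
        for side, text in zip(sides, turn_texts)
    ]


turns = [
    "So what got you into podcasting?",
    "Honestly, it started as a hobby. Then it grew.",
    "And how long did that take?",
    "About two years.",
]
for speaker, text in label_two_speakers(turns):
    print(f"{speaker}: {text}")
```

The strict-alternation assumption is fragile: one missed or spurious turn marker flips every label after it, so this is best treated as a first guess to check by ear.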


"Diarization" is your search term, e.g. https://github.com/MahmoudAshraf97/whisper-diarization
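A common pattern in such tools is to run Whisper for timestamped text and a separate diarization model for timestamped speaker labels, then join the two by time. The sketch below shows a simple way to do that join (each transcript segment takes the speaker it overlaps most); it is not taken from the linked repository, and the `Segment` type, speaker IDs and timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    label: str    # transcript text or speaker id


def overlap(a, b):
    """Length in seconds of the time interval shared by a and b."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(transcript, speakers):
    """Tag each transcript segment with the speaker it overlaps most."""
    result = []
    for seg in transcript:
        best = max(speakers, key=lambda s: overlap(seg, s), default=None)
        if best is None or overlap(seg, best) == 0.0:
            name = "UNKNOWN"  # no speaker segment covers this text
        else:
            name = best.label
        result.append((name, seg.label))
    return result


transcript = [
    Segment(0.0, 2.1, "So what got you into podcasting?"),
    Segment(2.3, 6.8, "Honestly, it started as a hobby."),
]
speakers = [Segment(0.0, 2.2, "SPEAKER_00"), Segment(2.2, 7.0, "SPEAKER_01")]
for name, text in assign_speakers(transcript, speakers):
    print(f"{name}: {text}")
```

Matching whole segments is coarse: if a speaker change falls inside a Whisper segment, the text goes to one speaker, which is why finer-grained tools align at the word level instead.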



