
We did a lot of work at https://www.quillmeetings.com to build a diarization and speaker recognition pipeline that runs locally on Mac and Windows. Basically, we create embeddings for short segments of the audio, much as you might create embeddings of text for a RAG system, and then cluster them so that segments from the same voice end up grouped together. (That simplifies away a lot of details from the "last 80%", which has taken a lot of effort to get working...)
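To make the embed-then-cluster idea concrete, here is a rough sketch, not Quill's actual pipeline. It assumes each audio segment has already been turned into an embedding vector (the three-dimensional vectors below are made up for illustration; real speaker embeddings have hundreds of dimensions) and uses a simple greedy threshold clustering as a stand-in for whatever clustering a production system would use. The function names and the 0.8 threshold are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cluster_segments(embeddings, threshold=0.8):
    """Greedy clustering: put each segment with the most similar existing
    speaker centroid, or start a new speaker if none is similar enough.
    The threshold is an arbitrary illustrative value."""
    centroids = []  # running mean embedding per speaker
    counts = []     # number of segments merged into each centroid
    labels = []     # speaker index assigned to each segment
    for emb in embeddings:
        best, best_sim = None, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], emb)]
            counts[best] = n + 1
            labels.append(best)
    return labels

# Toy embeddings for four consecutive audio segments (made-up numbers).
segments = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.85, 0.15, 0.05],
    [0.05, 0.95, 0.0],
]
labels = cluster_segments(segments)
print(labels)  # [0, 1, 0, 1]
```

Segments 0 and 2 land in one cluster and segments 1 and 3 in another, which is the "who spoke when" output a diarizer needs. The hard parts glossed over here include overlapping speech, choosing the number of speakers, and noisy short segments.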

The speaker recognition can't be as accurate as listening to each stream separately, as Zoom itself can do, but it also learns your contacts over time and can recognize voices in ad-hoc in-person meetings, etc., which I've found really magical since we launched it.
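One plausible way to "learn contacts over time" is to keep a stored voiceprint per contact and compare new speaker embeddings against it. The sketch below is an assumption about how that could work, not a description of Quill's implementation: the `enroll` and `identify` helpers, the moving-average update, the 0.75 threshold, and the toy vectors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def enroll(contacts, name, embedding, weight=0.2):
    """Store a voiceprint for a contact, or blend a new sample into the
    existing one with an exponential moving average (illustrative choice)."""
    old = contacts.get(name)
    if old is None:
        contacts[name] = list(embedding)
    else:
        contacts[name] = [(1 - weight) * o + weight * e
                          for o, e in zip(old, embedding)]

def identify(embedding, contacts, threshold=0.75):
    """Return the best-matching contact name, or None if no stored
    voiceprint is similar enough to the given embedding."""
    best_name, best_sim = None, threshold
    for name, voiceprint in contacts.items():
        sim = cosine(embedding, voiceprint)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name

contacts = {}
enroll(contacts, "Alice", [0.9, 0.1, 0.0])       # first meeting with Alice
who = identify([0.85, 0.15, 0.05], contacts)     # similar voice later on
stranger = identify([0.0, 0.1, 0.95], contacts)  # voice not seen before
print(who, stranger)  # Alice None
```

Each confirmed match can be fed back through `enroll`, so the stored voiceprint drifts toward how the person sounds across different rooms and microphones.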



Ah yes, a locally-run, mostly-accurate speaker recognition pipeline that isn't open source. Love to see cool features locked away while the rest of us plebs make do with whatever scraps the OSS world has managed to build. But hey, at least it kind of works, so you can enjoy your slightly-wrong diarization in private.

Truly the future of meetings.


not open source :/



