I found some commercial (expensive) offerings that do this, but there doesn't seem to be an open-source way to categorise the output of Whisper by speaker/source. Is there one?
whisper.cpp supports a model with "speaker segmentation" or "local diarization".
It is called "local" because it doesn't name the distinct speakers;
it only tells you when the speaker changes.
See https://github.com/ggerganov/whisper.cpp/issues/1715#issueco....
Once you compile whisper.cpp and download the model,
run `main` with that model and the `-tdrz` flag.
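Roughly, the whole flow looks like this (a sketch, not verified against the current tree: I'm assuming the `small.en-tdrz` model name accepted by the repo's download script, that `your-audio.wav` is a placeholder for your own 16-kHz WAV, and that speaker changes are marked with `[SPEAKER_TURN]` in the output):

```sh
# Build whisper.cpp and fetch the tinydiarize-enabled model
# (small.en-tdrz is the name used by the repo's download script).
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make
./models/download-ggml-model.sh small.en-tdrz

# Transcribe with local diarization; segments where the speaker
# changes should carry a [SPEAKER_TURN] marker.
# your-audio.wav is a placeholder: whisper.cpp expects 16-kHz WAV input.
./main -m models/ggml-small.en-tdrz.bin -tdrz -f your-audio.wav > transcript.txt
```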
Thank you, this is exactly it. I will now try to work out which speaker is which. I guess for podcasts, where one side asks questions and the other tends to respond, it might be easier, but perhaps not!
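For a two-person recording, the simplest heuristic I can think of is to flip a label at every turn marker. A rough sketch, assuming the `[SPEAKER_TURN]` marker and the `transcript.txt` from the command above; it falls apart as soon as a turn is missed or a third person speaks:

```sh
# Alternate between two labels, flipping after each line that
# contains a [SPEAKER_TURN] marker.
awk '{ printf "SPEAKER %s: %s\n", (s % 2 ? "B" : "A"), $0;
       if (index($0, "[SPEAKER_TURN]")) s++ }' transcript.txt
```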