whisper.cpp supports a model with "speaker segmentation" or "local diarization".
It is called "local" because it doesn't name the distinct speakers;
it only tells you when the speaker changes.
See https://github.com/ggerganov/whisper.cpp/issues/1715#issueco....
Once you compile whisper.cpp and download the model,
run `main` with that model and the `-tdrz` flag.
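For reference, the end-to-end steps look roughly like this. This is a sketch assuming the `small.en-tdrz` model and the make-based build; the audio filename is a placeholder, and newer releases may name the binary differently (e.g. `whisper-cli` instead of `main`):

```sh
# clone and build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make

# download a tinydiarize-compatible model (small.en-tdrz)
./models/download-ggml-model.sh small.en-tdrz

# transcribe with speaker-turn detection enabled
# (podcast.wav is a placeholder for your own 16 kHz WAV file)
./main -m models/ggml-small.en-tdrz.bin -f podcast.wav -tdrz
```

If it behaves as described in the linked issue, the output is a single stream of timestamped segments, with a `[SPEAKER_TURN]` marker appended wherever a speaker change is detected.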
Thank you, this is exactly it. I will now try to detect the speaker. For podcasts, where one side asks questions and the other tends to respond, I guess it might be easier, but perhaps not!