Hacker News

Great question! Whisper processes audio in 30-second chunks, but on a fast GPU a single pass can finish in only 100 milliseconds or so. So you can re-run it 10+ times per second and get around 100 ms of latency. Even better, actually, because Whisper will sometimes predict past the end of the audio.

This is an advantage of running locally. Running Whisper this way is inefficient, but I have a whole GPU sitting there dedicated to one user, so it's not a problem as long as it's fast enough. It wouldn't work well for a cloud service trying to maximize GPU utilization, but there are other ways of doing real-time speech recognition that could be used there.
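The loop described above can be sketched roughly like this. This is a minimal simulation, not Whisper itself: `transcribe()` here is a hypothetical stand-in for a real model call (e.g. something like `model.transcribe(window)` in the openai-whisper library), and the constants are assumptions based on the description.

```python
import numpy as np

SAMPLE_RATE = 16000      # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 30      # Whisper's fixed 30-second input window
HOP_SECONDS = 0.1        # re-run ~10x per second for ~100 ms latency

def transcribe(window: np.ndarray) -> str:
    """Stand-in for a real Whisper call. On a fast GPU the real call
    takes ~100 ms, so it keeps up with the 100 ms hop. Here it just
    reports how much non-silent audio the window contains."""
    return f"{np.count_nonzero(window)} non-silent samples"

def stream(chunks):
    """Feed incoming audio chunks into a rolling 30 s window
    (left-padded with silence) and transcribe after each hop."""
    window = np.zeros(SAMPLE_RATE * WINDOW_SECONDS, dtype=np.float32)
    results = []
    for chunk in chunks:
        # Shift the window left and append the newest audio at the end.
        window = np.concatenate([window[len(chunk):], chunk])
        results.append(transcribe(window))
    return results

# Simulate half a second of audio arriving in 100 ms chunks.
hop = int(SAMPLE_RATE * HOP_SECONDS)
chunks = [np.ones(hop, dtype=np.float32) for _ in range(5)]
print(stream(chunks)[-1])   # 5 hops * 1600 samples = 8000 non-silent samples
```

Each incoming 100 ms chunk slides the full 30-second window forward and triggers a fresh transcription of the whole window, which is the "inefficient but fine with a dedicated GPU" trade-off described above.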
