I thought whisper and others took large chunks (20-30 seconds) of speech, or a complete wave file as input. How do you get real-time transcription? What size chunks do you feed it?
To me, STT should take a continuous audio stream and output a continuous text stream.
Whisper and Moonshine both work on chunks, but for Moonshine:
> Moonshine's compute requirements scale with the length of input audio. This means that shorter input audio is processed faster, unlike existing Whisper models that process everything as 30-second chunks. To give you an idea of the benefits: Moonshine processes 10-second audio segments 5x faster than Whisper while maintaining the same (or better!) WER.
Also, with Kyutai you can feed continuous audio in and get continuous text out.
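In other words, Whisper always pays for a full 30-second window (shorter inputs are padded out), while Moonshine's cost follows the actual clip length. A tiny illustration of just the padding difference, no model involved:

```python
import numpy as np

SAMPLE_RATE = 16000

def pad_to_30s(audio: np.ndarray) -> np.ndarray:
    """Whisper-style: every input is zero-padded out to a fixed 30 s window."""
    target = 30 * SAMPLE_RATE
    return np.pad(audio, (0, max(0, target - len(audio))))

# A 10-second clip still costs a full 30 s window under Whisper,
# while Moonshine's compute scales with the 10 s it was actually given.
clip = np.zeros(10 * SAMPLE_RATE, dtype=np.float32)
print(len(pad_to_30s(clip)) / SAMPLE_RATE)  # 30.0
print(len(clip) / SAMPLE_RATE)              # 10.0
```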
Having used Whisper and seen the poor quality caused by its 30-second chunks, I would stay far away from software that works on even shorter durations.
The short duration effectively means the transcription starts producing nonsense as soon as a sentence gets cut off in the middle of a chunk.
Oh, this does sound cool. Couple of questions that aren't clear from the readme (to me).
What exactly does the silence detection mean? Does that mean it'll wait until a pause, then send the audio off to Whisper, return the output, and stop the process?
Same question with continuous. Does that just mean it keeps going until Ctrl+C?
Nvm, answered my own question, looks like yes for both[0][1]. Cool this seems pretty great actually.
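For anyone curious, here's a rough sketch of what that pause-then-flush loop could look like. I'm assuming a simple energy threshold here; the actual project may well use a proper VAD instead:

```python
import numpy as np

SILENCE_RMS = 0.01      # energy threshold; tune for your mic and noise floor
SILENCE_FRAMES = 15     # this many consecutive quiet frames ends an utterance

def utterances(frames):
    """Buffer audio frames and yield a chunk each time a pause is detected."""
    buf, quiet = [], 0
    for frame in frames:
        buf.append(frame)
        rms = float(np.sqrt(np.mean(frame ** 2)))
        quiet = quiet + 1 if rms < SILENCE_RMS else 0
        if quiet >= SILENCE_FRAMES and len(buf) > SILENCE_FRAMES:
            yield np.concatenate(buf)   # hand this chunk off to the model
            buf, quiet = [], 0
    if buf:
        yield np.concatenate(buf)       # flush whatever is left at the end
```

In one-shot mode you'd stop after the first yield; continuous mode just keeps the loop running until you interrupt it.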
Agreed, both of those make sense, but I was thinking realtime. (Pipes can stream data; I'd find it useful to have something that can stream STT output to stdout in realtime.)
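Something like this is the shape I mean: raw 16-bit mono PCM on stdin, text on stdout, flushed as soon as it's ready so the next process in the pipe sees it live. The transcribe() hook is hypothetical, a stand-in for whatever streaming model you plug in:

```python
import sys
import numpy as np

SAMPLE_RATE = 16000
CHUNK = SAMPLE_RATE // 2   # read half a second of audio per iteration

def transcribe(chunk: np.ndarray) -> str:
    """Hypothetical hook: feed the chunk to your streaming STT model here."""
    return ""

while True:
    raw = sys.stdin.buffer.read(CHUNK * 2)   # 2 bytes per int16 sample
    if not raw:
        break
    audio = np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0
    text = transcribe(audio)
    if text:
        print(text, end=" ", flush=True)     # flush so the pipe sees it live
```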
FYI:
owhisper pull whisper-cpp-large-turbo-q8
Failed to download model.ggml: Other error: Server does not support range requests. Got status: 200 OK
But the base-q8 works (and works quite well!). The TUI is really nice. Speaker diarization would make it almost perfect for me. Thanks for building this.
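For what it's worth, that error suggests the host serving the large-turbo file ignores the Range header: a server that supports partial (resumable) downloads answers 206 Partial Content, while 200 OK means it just sent the whole file. You can check with a quick request; the URL below is a placeholder:

```python
import requests

# Placeholder URL; substitute the model file that failed for you.
url = "https://example.com/model.ggml"

resp = requests.get(url, headers={"Range": "bytes=0-1"}, stream=True)
if resp.status_code == 206:
    print("server honors Range requests (resumable download OK)")
else:
    # 200 means the server ignored the Range header and sent the whole
    # file, which is what the downloader refused to accept.
    print(f"no range support: got {resp.status_code}")
```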
Sorry, maybe I missed it, but I didn't see this list on your website; I think it would be a good idea to add this info there. Besides that, thank you for the effort and your work! I will definitely give it a try.
Here is the list of local models it supports:
- whisper-cpp-base-q8
- whisper-cpp-base-q8-en
- whisper-cpp-tiny-q8
- whisper-cpp-tiny-q8-en
- whisper-cpp-small-q8
- whisper-cpp-small-q8-en
- whisper-cpp-large-turbo-q8
- moonshine-onnx-tiny
- moonshine-onnx-tiny-q4
- moonshine-onnx-tiny-q8
- moonshine-onnx-base
- moonshine-onnx-base-q4
- moonshine-onnx-base-q8