
I use VAD to chunk audio.

Whisper and Moonshine both work on chunks, but for Moonshine:

> Moonshine's compute requirements scale with the length of input audio. This means that shorter input audio is processed faster, unlike existing Whisper models that process everything as 30-second chunks. To give you an idea of the benefits: Moonshine processes 10-second audio segments 5x faster than Whisper while maintaining the same (or better!) WER.

Also for kyutai, we can input continuous audio in and get continuous text out.

- https://github.com/moonshine-ai/moonshine

- https://docs.hyprnote.com/owhisper/configuration/providers/k...
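The VAD-based chunking mentioned above can be sketched as a toy energy-threshold segmenter. This is illustrative only: the function name and thresholds are made up, and real pipelines typically use a trained VAD such as webrtcvad or Silero rather than raw amplitude:

```python
def chunk_by_silence(samples, sample_rate=16000, frame_ms=30,
                     energy_threshold=500, min_silence_frames=10):
    """Split 16-bit PCM samples into speech chunks.

    A frame counts as silence when its mean absolute amplitude is below
    energy_threshold; a run of min_silence_frames silent frames closes
    the current chunk. (Toy energy-based VAD, not webrtcvad/Silero.)
    """
    frame_len = sample_rate * frame_ms // 1000
    chunks, current, silent_run = [], [], 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(abs(s) for s in frame) / max(len(frame), 1)
        if energy < energy_threshold:
            silent_run += 1
            if current and silent_run >= min_silence_frames:
                chunks.append(current)   # pause long enough: emit chunk
                current, silent_run = [], 0
            continue
        silent_run = 0
        current.extend(frame)            # speech frame: keep accumulating
    if current:
        chunks.append(current)           # flush trailing speech
    return chunks
```

Each returned chunk can then be handed to the STT model independently, which is what keeps Moonshine's variable-length inputs short.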





Having used Whisper and found the output nearly useless because of its 30-second chunking, I would stay far away from software working on even shorter durations.

The short duration effectively means that the transcription will start producing nonsense as soon as a sentence is cut off in the middle.


Something like that in a CLI tool that just gives text to stdout would be perfect for a lot of my use cases!

(maybe with an `owhisper serve` somewhere else to start the model running or whatever.)


I wrote a tool that may be just the thing for you:

https://github.com/bikemazzell/skald-go/

Just speech to text, CLI only, and it can paste into whatever app you have open.


Oh, this does sound cool. Couple of questions that aren't clear from the readme (to me).

What exactly does the silence detection mean? Does it wait until a pause, then send the audio off to Whisper, return the output, and stop the process? Same question with continuous: does that just mean it keeps going until Ctrl+C?

Nvm, answered my own question; looks like yes for both [0][1]. Cool, this seems pretty great actually.

[0] https://github.com/bikemazzell/skald-go/blob/main/pkg/skald/...

[1] https://github.com/bikemazzell/skald-go/blob/main/pkg/skald/...


Are you thinking about the realtime use-case or batch use-case?

For just transcribing file/audio,

`owhisper run <MODEL> --file a.wav` or

`curl https://something.com/audio.wav | owhisper run <MODEL>`

might make sense.


agreed, both of those make sense, but I was thinking realtime. (Pipes can stream data; I'd find it useful to have something that can stream STT output to stdout in realtime.)
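A minimal sketch of that kind of realtime stdin-to-stdout loop, assuming a hypothetical `transcribe` callback wrapping whatever STT engine is in use (this is not owhisper's actual API):

```python
import sys

FRAME_BYTES = 16000 * 2 // 10  # 100 ms of 16 kHz, 16-bit mono PCM

def stream(transcribe, stdin=None, stdout=None):
    """Read fixed-size PCM frames from stdin and write transcribed text
    to stdout line by line. `transcribe` is a callback bytes -> str,
    returning "" when the frame produced no new text."""
    stdin = stdin or sys.stdin.buffer
    stdout = stdout or sys.stdout
    while True:
        frame = stdin.read(FRAME_BYTES)
        if not frame:
            break  # upstream pipe closed
        text = transcribe(frame)
        if text:
            stdout.write(text + "\n")
            stdout.flush()  # flush each line so downstream pipes see it immediately
```

With a loop like this, something along the lines of `arecord -f S16_LE -r 16000 -c 1 | the-tool` would stream mic audio in and transcript lines out.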

It's open-source. Happy to review & merge if you can send us a PR!

https://github.com/fastrepl/hyprnote/blob/8bc7a5eeae0fe58625...



