
I'm using Whisper to transcribe notes I record with a lavalier mic during my bike rides (wind is no problem), but I'm using OpenAI's hosted service. When it was released, I tested it on a Ryzen 5950X and it was too slow and memory hungry for my taste. Using the large model was necessary for that use case (also, I'm recording in German).



The original release was full precision model weights running in an old version of PyTorch with no optimizations.

Fast forward to now and you have faster-whisper (built on CTranslate2) and distil-whisper optimized weights.

Between the two of them, Whisper Large uses something like 1/8th the memory and is likely at least an order of magnitude faster on your hardware.

German has no effect on these speed and memory figures, and on accuracy Whisper actually achieves a lower word error rate for German than for English.
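To make the suggestion concrete, here is a minimal sketch of transcribing a German recording with faster-whisper using int8 quantization, which is where most of the memory saving comes from. The file name "ride_notes.wav" and the helper function are illustrative, not from the thread.

```python
def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcript segment as a timestamped line."""
    return f"[{start:7.2f} -> {end:7.2f}] {text.strip()}"


def transcribe(path: str) -> list[str]:
    """Transcribe a German audio file with faster-whisper (CTranslate2 backend)."""
    from faster_whisper import WhisperModel  # pip install faster-whisper

    # int8 quantization cuts memory dramatically vs. the original fp32 release.
    model = WhisperModel("large-v3", device="cpu", compute_type="int8")
    # Setting the language skips autodetection; Whisper supports German natively.
    segments, info = model.transcribe(path, language="de")
    return [format_segment(s.start, s.end, s.text) for s in segments]


if __name__ == "__main__":
    for line in transcribe("ride_notes.wav"):
        print(line)
```

Note that faster-whisper decodes segments lazily, so memory stays flat even on long recordings.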


With Whisper, you can also find many smaller models fine-tuned for a particular language, so even the smaller sizes can perform adequately.
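Loading such a fine-tune is a one-liner with the Hugging Face transformers pipeline. A minimal sketch, assuming transformers is installed; the model id built below is a placeholder, not a real repository, so substitute any actual German Whisper fine-tune from the hub.

```python
def pick_model_id(language: str, size: str = "small") -> str:
    """Build a *placeholder* hub id for a language-specific fine-tune."""
    return f"some-org/whisper-{size}-{language.lower()}"  # hypothetical naming


def transcribe_with_finetune(audio_path: str, language: str = "german") -> str:
    from transformers import pipeline  # pip install transformers

    # Smaller fine-tuned checkpoints trade the large model's generality for
    # per-language accuracy at a fraction of the memory and compute.
    asr = pipeline("automatic-speech-recognition", model=pick_model_id(language))
    return asr(audio_path)["text"]
```

Searching the hub for "whisper" plus the language name is usually the quickest way to find these community checkpoints.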



