Based on Whisper and Whisper.cpp, I created a comparison of transcription performance (quantitative metrics such as relative speed).
You can find the code in the Colab, and a how-to guide with visualizations in the blog post.
In the future, I'd love to add word error rate (WER) evaluation and visualizations based on ground-truth data.
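For anyone unfamiliar with WER: it is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. This is not part of the current comparison; below is only a minimal sketch of how it could be computed, and the function name `word_error_rate` is my own placeholder. It splits on whitespace and does no text normalization (casing, punctuation), which a real evaluation would probably need.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words.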
Bonus: Normally you would log these results to Weights & Biases from Python, but you can also log runs of the C++ binary / the CLI by using `subprocess`.
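Here is a rough sketch of the `subprocess` idea, not the exact code from the Colab: Python launches the CLI, times it, and passes the numbers on to Weights & Biases. The helper names `run_and_time` and `log_to_wandb`, the project name, and the definition of relative speed as audio duration divided by processing time are all assumptions made for illustration.

```python
import subprocess
import time


def run_and_time(cmd, audio_seconds):
    """Run a CLI command and return timing metrics as a dict."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    elapsed = time.perf_counter() - start
    return {
        "elapsed_s": elapsed,
        # Assumed definition: seconds of audio processed per second of wall time.
        "relative_speed": audio_seconds / elapsed,
        "stdout": result.stdout,
    }


def log_to_wandb(metrics, project="whisper-comparison"):
    """Send the numeric metrics to Weights & Biases (needs `pip install wandb`)."""
    import wandb  # imported lazily so run_and_time works without wandb installed

    run = wandb.init(project=project)  # project name is a placeholder
    run.log({k: v for k, v in metrics.items() if isinstance(v, (int, float))})
    run.finish()
```

Usage would look something like `metrics = run_and_time(cmd, audio_seconds=11.0)` followed by `log_to_wandb(metrics)`, where `cmd` is the argument list for your whisper.cpp build (binary name, model path, and audio file depend on your setup).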
Would love to know what you think of this comparison and what features / attributes you would like to see in a more sophisticated comparison.