
The problem with open source / free solutions is data. Kaldi is open source and gets state-of-the-art results, but the training data costs a lot of money. Training the models is doable on a commodity GPU, although it takes quite a while.


I didn't realize Kaldi could get state of the art results. Do you say that because you know of people doing that, or is your comment based on knowing the architecture of Kaldi?


The 8.5% in this file is what you'd compare to Microsoft and IBM's recent ~5% results.

https://github.com/kaldi-asr/kaldi/blob/master/egs/fisher_sw...

Kaldi hasn't been in first place on that dataset recently, but it was a few years ago.

On other, more researchy datasets (e.g., distant speakers or languages other than English), the best system is often based on Kaldi.
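For anyone unfamiliar with the metric: figures like 8.5% and ~5% are word error rates, i.e. the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch in Python; the function name is my own, and real scoring tools also normalize the text before comparing, which this skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 100% when the hypothesis has many insertions, since the denominator is the reference length only.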


One reason for the discrepancy between quoted numbers is that, if you are only after pushing that number down and not particularly interested in getting a scalable system, then you are free to run as many systems as you like in as many configurations as possible and then try to combine their outputs (ROVER etc.).
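To make the combination idea concrete, here is a heavily simplified sketch of the voting step. It assumes the hypotheses have already been aligned into equal-length slots, with an empty string marking a gap. Real ROVER builds that alignment itself with dynamic programming and can weight votes by confidence; none of that is shown here, and the function name is made up for illustration.

```python
from collections import Counter


def vote_aligned(hypotheses):
    """Majority vote per slot over pre-aligned word sequences.

    Each hypothesis is a list of words of the same length; "" marks a gap.
    Ties go to the system listed first.
    """
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    length = len(hypotheses[0])
    if any(len(h) != length for h in hypotheses):
        raise ValueError("hypotheses must be aligned to the same length")

    combined = []
    for slot in zip(*hypotheses):
        counts = Counter(slot)
        # max() returns the first maximal key, and Counter keeps insertion
        # order, so ties resolve in favor of the earliest system.
        winner = max(counts, key=counts.get)
        if winner:  # drop slots where the gap wins the vote
            combined.append(winner)
    return combined


if __name__ == "__main__":
    systems = [
        ["the", "cat", "sat", ""],
        ["the", "cat", "sat", "down"],
        ["a", "cat", "sat", "down"],
    ]
    print(" ".join(vote_aligned(systems)))
```

Each extra system means another full decode of the audio, which is why this helps a benchmark number more than it helps a deployable system.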


Yeah. I don't get the impression that the Kaldi core team has been trying very hard recently to get SOTA on eval2k/switchboard. This number uses one acoustic model with a trigram LM decode + fourgram rescoring -- there isn't even a neural net language model in there. If I remember correctly, Microsoft's first "human parity" result used something like three acoustic models and at least four types of language models. This Kaldi model is competitive with the best single acoustic model Microsoft used.
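For readers who haven't seen two-pass decoding, a toy sketch of the rescoring idea: the first pass produces candidates with a cheap LM, and a second pass re-ranks them with a stronger one. This is plain n-best rescoring with made-up scores and a hypothetical `lm_logprob` callback, not how Kaldi implements it.

```python
def rescore_nbest(nbest, lm_logprob, lm_weight=1.0):
    """Pick the best candidate after adding a second-pass LM score.

    nbest: list of (text, acoustic_logprob) pairs from the first pass.
    lm_logprob: function mapping text to a log-probability under the
                stronger language model (hypothetical, supplied by caller).
    """
    if not nbest:
        raise ValueError("n-best list is empty")
    best_text, _ = max(
        nbest,
        key=lambda item: item[1] + lm_weight * lm_logprob(item[0]),
    )
    return best_text


if __name__ == "__main__":
    # Invented scores purely for illustration.
    candidates = [
        ("wreck a nice beach", -9.5),
        ("recognize speech", -10.0),
    ]
    toy_lm = {"wreck a nice beach": -6.0, "recognize speech": -2.0}
    print(rescore_nbest(candidates, toy_lm.__getitem__))
```

In this toy example the acoustically preferred candidate loses once the stronger LM is applied.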


Fully agree. I think their work on training data augmentation (e.g., their ICASSP paper, http://danielpovey.com/files/2017_icassp_reverberation.pdf, or the ASPiRE model before) has a bigger impact on the practical usefulness of ASR than getting an X% relative improvement over the previous SOTA on the eval2000 set.
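As a rough illustration of what reverberation augmentation means in practice: a common approach is to convolve clean speech with a room impulse response (RIR) so the training data sounds like it was recorded in a room. The sketch below is pure Python with toy numbers and a function name of my own; it is not the pipeline from the paper, which I'd expect to involve many RIRs plus added noise.

```python
def add_reverb(signal, impulse_response):
    """Convolve a clean signal with a room impulse response.

    Both arguments are lists of float samples. The output has
    len(signal) + len(impulse_response) - 1 samples.
    """
    if not signal or not impulse_response:
        raise ValueError("signal and impulse response must be non-empty")

    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            # Each input sample is smeared forward in time by every RIR tap.
            out[i + j] += sample * tap
    return out


if __name__ == "__main__":
    clean = [1.0, 0.0, 0.0]   # a single click
    rir = [1.0, 0.5]          # direct path plus one half-strength echo
    print(add_reverb(clean, rir))
```

For real audio you would use an FFT-based convolution from a numerical library, since this direct form is quadratic in the number of samples.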



