It's amazing that Google, Amazon, and Apple, with all of their billions of dollars and research teams, couldn't train a speech recognition model that works reliably most of the time, regardless of accent or background noise, or even when none of those "problems" are present.
Then OpenAI showed with its Whisper model that it is very much possible. Whisper is so good that I sometimes feel like it's magic. I have tried it in all sorts of situations, such as non-native speakers and severe background noise, and it still does incredibly well in both English and Arabic (the only two languages I have tested extensively).