I'm surprised to hear this organisation is successfully doing ML speech-to-text. Is it running 100% of volume in production? Or is it more of a pilot-type thing? I know of a French multinational bank that just spent 2 years trying to get an ML speech transcription system up and running for transcribing conversations with customers, but due to unreliable results, it recently put the project on ice. Their experience was much along the lines of everything discussed in this comment thread.
I think the expectation that you can "drop some AI in" to do XYZ job is off in the same way that "we're going to drop a humanoid robot in" to do XYZ job is off.
Where I've seen it used well is as a piece in a larger system of automation. In the healthcare case, it's doing a first pass at transcribing an audio dictation so that a transcriptionist can then start with a 90%+ accurate document.
This is tough: their role shifts somewhat (more editor/correctionist than true transcriber), and not everyone makes that transition well, but the end result is 2x+ efficiency gains.