It'll expand though.
This is quite a big practical shift in ML: we moved from research on specialized solutions for separate problems to generalized solutions that work on large classes of problems with minimal adaptation, greatly reducing the need for domain-specific knowledge. Domain knowledge is still usually useful, but it's no longer absolutely necessary the way it was before.
It has definitely lowered the barrier to entry for those fields, but I'm not sure it has removed it altogether, at least not just yet.
That's the whole point: in many domains the accumulated feature-engineering knowledge isn't necessary anymore, because you can train a deep network to learn equivalent features implicitly from the data, and (depending on the problem) the features learned in the initial layers may well be better than anything people have engineered by hand.
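To make that contrast concrete, here's a toy sketch (PyTorch assumed; the kernel values and shapes are purely illustrative): a fixed Sobel kernel stands in for a hand-engineered edge feature, while a trainable conv layer of the same shape is free to learn whatever kernel the task actually rewards.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hand-engineered feature: a fixed 3x3 Sobel kernel for vertical edges.
sobel = torch.tensor([[-1., 0., 1.],
                      [-2., 0., 2.],
                      [-1., 0., 1.]]).view(1, 1, 3, 3)

# Learned feature: same shape, but the kernel is a free parameter that
# gradient descent shapes into whatever the task actually rewards.
learned = nn.Conv2d(1, 1, kernel_size=3, bias=False, padding=1)

x = torch.randn(1, 1, 28, 28)               # dummy grayscale image
engineered = F.conv2d(x, sobel, padding=1)  # response you designed by hand
adapted = learned(x)                        # response the network will learn
```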
Take your speech example: speech recognition models used to contain explicit phonetic systems (where the chosen set of phonemes mattered), with separate acoustic and language models. Now you can get decent results from an end-to-end system by throwing all that accumulated phonetic knowledge into the trash bin and training a single model straight from the sound data to the output text characters. (The input is usually not a raw waveform but a frequency-domain representation, e.g. https://en.wikipedia.org/wiki/Mel-frequency_cepstrum - but that's understanding of the audio data format, not "understanding of speech".)
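For what that pipeline looks like in code, here's a rough sketch (librosa and PyTorch assumed; the model, vocabulary, and all the numbers are made up for illustration): mel features in, per-frame character probabilities out, trained with CTC loss so no phoneme inventory, separate language model, or forced alignment is needed.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

sr = 16000
wav = np.random.randn(sr).astype(np.float32)  # 1s of noise standing in for speech

# Frequency-domain features: log-mel spectrogram, shape (time, n_mels)
mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=80)
feats = torch.from_numpy(librosa.power_to_db(mel).T)

vocab = "abcdefghijklmnopqrstuvwxyz '"  # output characters; CTC blank = index 0
model = nn.Sequential(                  # toy stand-in for a real acoustic model
    nn.Linear(80, 256), nn.ReLU(),
    nn.Linear(256, len(vocab) + 1),
)
log_probs = model(feats).log_softmax(-1)  # per-frame character distribution

# CTC aligns frames to the target text on its own, which is exactly
# what lets you skip explicit phonetic segmentation and alignment.
target = torch.tensor([vocab.index(c) + 1 for c in "hello world"])
loss = nn.CTCLoss()(log_probs.unsqueeze(1),              # (T, batch=1, C)
                    target.unsqueeze(0),                 # (batch=1, S)
                    torch.tensor([log_probs.shape[0]]),  # input length
                    torch.tensor([len(target)]))         # target length
loss.backward()  # end to end: audio features -> output characters
```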