Ask HN: What is the max audio quality to feed to machine-learning transcription?

Since machine learning doesn't map directly to human perception, I was wondering how wasteful it is to feed something like 160 kbps speech audio to a transcription service, and how much processing time and compute could be saved by reducing it to something much smaller.
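
To put rough numbers on the storage side, here is a back-of-the-envelope sketch. It assumes constant-bitrate audio, and the 24 kbps figure is only an arbitrary comparison point I picked, not a known-good setting for any transcription service. The function name is made up for illustration.

```python
def audio_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size in megabytes (10^6 bytes) of constant-bitrate audio."""
    bits = bitrate_kbps * 1000 * duration_s  # kbps -> total bits
    return bits / 8 / 1_000_000              # bits -> bytes -> MB


one_hour = 3600  # seconds
print(audio_size_mb(160, one_hour))  # 72.0
print(audio_size_mb(24, one_hour))   # 10.8
```

So an hour of speech would shrink from about 72 MB to about 11 MB in this comparison. What I can't estimate is whether transcription accuracy or speed changes along with it.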

I know next to nothing about this field, but as someone interested in archival and in the current state of transcription, I was wondering what the optimal approach is when ripping videos from different sources and extracting the audio to transcribe it programmatically.
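
For concreteness, the kind of extraction step I have in mind looks roughly like the sketch below. It only builds an ffmpeg command line. The 16 kHz mono target, the Opus codec, and the 24k bitrate are my assumptions rather than known-good values, and it presumes an ffmpeg build with libopus available. The function name and file names are hypothetical.

```python
import subprocess


def build_ffmpeg_cmd(src: str, dst: str,
                     sample_rate: int = 16000,
                     bitrate: str = "24k") -> list[str]:
    """Build an ffmpeg command that drops video and downconverts the audio.

    The default sample rate and bitrate are guesses, not recommendations.
    """
    return [
        "ffmpeg",
        "-i", src,                 # input video or audio file
        "-vn",                     # discard the video stream
        "-ac", "1",                # downmix to mono
        "-ar", str(sample_rate),   # resample the audio
        "-c:a", "libopus",         # encode with Opus (needs libopus in ffmpeg)
        "-b:a", bitrate,           # target audio bitrate
        dst,
    ]


cmd = build_ffmpeg_cmd("talk.mp4", "talk.opus")
print(" ".join(cmd))
# To actually run the conversion (requires ffmpeg on PATH):
# subprocess.run(cmd, check=True)
```

Whether a given transcription service accepts Opus, or gains anything from audio reduced this way, is exactly the part I don't know.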