I often have voice recordings with a lot of background noise (e.g. a public lecture in a room with poor acoustics, recorded from a phone in the audience; there are usually sounds of paper rustling, noises from the street, etc.). Is this "source separation" the sort of thing that could help, or does anyone have other tips? The best approach I have so far is based on this page, https://wiki.audacityteam.org/wiki/Sanitizing_speech_recordi... :
(1) Open the file in Audacity and switch to Spectrogram view,
(2) apply a high-pass filter with a cutoff of ~150 Hz, i.e. filter out frequencies lower than that (which tend to be loud anyway),
(3) don’t remove the higher frequencies (which aren’t loud), because they are what make the consonants understandable (apparently),
(4) look for specific noises, select the rectangle, and use “Spectral Edit Multi Tool”.
But if machine learning can help that would be really interesting! This Spleeter page does mention “active listening, educational purposes, […] transcription” so I'm excited.
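For anyone who wants to script step (2) instead of doing it by hand, here is a rough sketch of a ~150 Hz high-pass filter in plain Python. It is not what Audacity does internally; it assumes a standard second-order (biquad) high-pass using the widely published "Audio EQ Cookbook" coefficient formulas, and the names high_pass and rms are made up for illustration. Samples are assumed to be a list of floats from a mono recording.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz=150.0, q=0.7071):
    """Second-order high-pass filter (biquad, assumed design; not Audacity's code).

    Attenuates content below cutoff_hz and leaves higher frequencies,
    which carry the consonants, largely untouched.
    """
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)

    # Cookbook high-pass coefficients, normalised so that a0 == 1.
    a0 = 1.0 + alpha
    b0 = (1.0 + cos_w0) / 2.0 / a0
    b1 = -(1.0 + cos_w0) / a0
    b2 = (1.0 + cos_w0) / 2.0 / a0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0

    out = []
    x1 = x2 = y1 = y2 = 0.0  # previous two inputs and outputs
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def rms(samples):
    """Root-mean-square level, used here only to compare before and after."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

if __name__ == "__main__":
    fs = 16000  # assumed sample rate for the demo
    hum = [math.sin(2 * math.pi * 50 * i / fs) for i in range(fs)]       # low rumble
    speechy = [math.sin(2 * math.pi * 2000 * i / fs) for i in range(fs)]  # consonant range
    print("50 Hz level kept:  ", rms(high_pass(hum, fs)) / rms(hum))
    print("2000 Hz level kept:", rms(high_pass(speechy, fs)) / rms(speechy))
```

A single second-order stage like this rolls off fairly gently below the cutoff, so stubborn low-frequency rumble may need the filter applied twice or a steeper design. It does nothing for noises that overlap the speech band (paper rustling, for instance), which is where step (4) or the ML tools come in.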
I'd generally try iZotope RX for cleaning up audio. Dialogue Isolate is probably the exact feature you would want (and I gather it is often used in movies to clean up on-location dialogue), but it's only in the most expensive Advanced version.
Cheaper versions of RX still have various noise reduction tools, de-verb for reducing reverb and room echo, and a range of spectral editing tools as well.
You could give the Nvidia RTX Voice plugin a shot if you have one of the compatible cards. I'm not sure how it deals with low background noises; the YouTube reviews mostly tested it with over-the-top cases like a vacuum cleaner next to the speaker.
https://krisp.ai uses machine learning to remove background noise. I've used it with Zoom calls and it works really well. I think they don't currently have an "upload audio" feature for existing recordings, but it would be awesome if they offered this in the future.
Sorry it's not something you can use now, but I just thought I would mention it! I also did a quick Google search but unfortunately I couldn't find any AI noise removal tools that might solve this problem.