
RNNoise: Learning Noise Suppression - clouddrover
https://people.xiph.org/~jm/demo/rnnoise/
======
guiriduro
Oddly, I found the RNNoise suppression more distracting than the less
sophisticated Speex suppressor. If you'll forgive the figurative language:
compared with the noisy tracks, Speex de-emphasises the noise into a quieter
but smoother robotic 'bokeh', which allowed me to concentrate on the main
speakers.

RNNoise on the other hand seemed to detect silences well, but left artifacts
in the speech such that it had a choppy and artificial feel. Lacking the
smoothness in the background I found I was more distracted by the distortions
in the words.

~~~
jmvalin
Interesting -- and unexpected. I also wrote the Speex suppressor and one of
the things that specifically annoyed me about it was the robotic noise and the
pseudo-reverberation it adds to the speech, but it seems like some people
(like you) like that. Trying to understand exactly what you don't like about
RNNoise... is it how the remaining background noise sounds or how sharply it
turns on/off?

I did a quick hack to RNNoise to smooth out the attenuation and prevent it
from cancelling more than 30 dB. I'd be curious if it improves or makes things
worse for you (compared to the samples in the demo):
[https://jmvalin.ca/misc_stuff/rnn_hack1/](https://jmvalin.ca/misc_stuff/rnn_hack1/)
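
For anyone curious what that hack looks like in spirit, here's a hypothetical stand-alone sketch (not RNNoise's actual code): clamp the per-band gains to a -30 dB floor, then smooth them over time with a one-pole filter so the attenuation can't switch on and off sharply.

```python
import numpy as np

def limit_attenuation(gains, floor_db=-30.0, smooth=0.6):
    """Post-process per-band suppression gains (values in [0, 1], one row
    per frame): clamp them so no band is attenuated by more than 30 dB,
    then smooth them over time with a one-pole filter so the gain can't
    jump abruptly between frames."""
    floor = 10.0 ** (floor_db / 20.0)     # -30 dB -> ~0.0316 linear gain
    clamped = np.maximum(gains, floor)
    out = np.empty_like(clamped)
    prev = np.ones(clamped.shape[1])      # start fully open
    for t in range(clamped.shape[0]):
        prev = smooth * prev + (1.0 - smooth) * clamped[t]
        out[t] = prev
    return out
```

The `floor_db` and `smooth` values here are made-up illustrations of the two knobs being traded off, not the constants the actual hack uses.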

~~~
mtrimpe
Subjectively it seems to me that the RNNoise sample doesn't trigger my brain
to attempt to fill in the gaps.

With the Speex/raw ones I have all the data so if I listen to it again over
and over I can get more out of it eventually.

With the RNNoise one I obviously don't have enough extra data to even try
doing that, so all I can do is blame the algorithm.

Perhaps what you really want is an algorithm that lets through a bit more of
the 'possible noise' for the human brain to have another go at.

~~~
jmvalin
What you're describing is more or less why noise suppression algorithms in
general cannot really improve intelligibility of the speech. Unless they're
given extra cues (like with a microphone array), there's nothing they can do
in real-time that will beat what the brain is capable of with "delayed
decision" (sometimes you'll only understand a word 1-2 seconds after it's
spoken). So the goal of noise suppression is really just making the speech
less annoying when the SNR is high enough not to affect intelligibility.

That being said, I still have control over the tradeoffs the algorithm makes
by changing the loss function, i.e. how different kinds of mistakes are
penalized.

~~~
mtrimpe
Perhaps being more lenient in noisier situations could be an interesting
tradeoff then. At lower noise levels it's already pretty good...

------
piceas
I was curious how RNNoise would perform on a noisy street scene. I grabbed a
section from a random noisy video and ran it through RNNoise, as well as a
naive pass through Audacity's noise removal plugin with the noise profile
sampled from the 43rd second. The speaker distortion, as noted by ZeroGravitas
in their fancy example, is quite evident, but I'm still pretty impressed.

audacity screenshot
[https://d4344e4d9b25f298d9ea-790118db7dd23376c2de685644429e7...](https://d4344e4d9b25f298d9ea-790118db7dd23376c2de685644429e75.ssl.cf2.rackcdn.com/spect.png)

input
[https://d4344e4d9b25f298d9ea-790118db7dd23376c2de685644429e7...](https://d4344e4d9b25f298d9ea-790118db7dd23376c2de685644429e75.ssl.cf2.rackcdn.com/input.mp3)

RNNoise
[https://d4344e4d9b25f298d9ea-790118db7dd23376c2de685644429e7...](https://d4344e4d9b25f298d9ea-790118db7dd23376c2de685644429e75.ssl.cf2.rackcdn.com/rnnoise.mp3)

naive audacity filter
[https://d4344e4d9b25f298d9ea-790118db7dd23376c2de685644429e7...](https://d4344e4d9b25f298d9ea-790118db7dd23376c2de685644429e75.ssl.cf2.rackcdn.com/audacity_naive_noise_reduction_from43rdsecond.mp3)

source [https://www.youtube.com/watch?v=4HpF-IoK2y8](https://www.youtube.com/watch?v=4HpF-IoK2y8)

~~~
sunsetMurk
Thank you for these samples, very cool.

To me it sounds like we've got some room for improvement all-around... but
color me impressed. I'm also always impressed by Audacity's noise removal when
I use it for stupid simple voice-overs. I'd bet this deep learning approach
will do nothing but improve, quickly.

~~~
baldfat
Well, there are different tools for cleaning up an audio track.

$50 will get you Audio Cleaning Lab. If I were home I would do a quick auto-
clean and then hand-scrub the file. [http://www.magix.com/us/audio-cleaning-lab/detail/](http://www.magix.com/us/audio-cleaning-lab/detail/)

$1199 will get you iZotope's RX. I actually normally get better results from
the $50 one than when I use this tool at a friend's shop.
[https://www.izotope.com/en/products/repair-and-edit/rx-post-production-suite.html](https://www.izotope.com/en/products/repair-and-edit/rx-post-production-suite.html)

Spectral editors are amazing for removing certain sounds while keeping the
overall sound intact. This is where we need to move: editing at the spectral
level, which will have a much higher CPU overhead.

Here is a great article showing different hands-on techniques for noise
removal. [https://www.soundonsound.com/techniques/noise-reduction-tools-techniques](https://www.soundonsound.com/techniques/noise-reduction-tools-techniques)

------
jonathanstrange
It would be great to have a good open source noise suppression tool in VST
format. The leading software solutions are fairly expensive, e.g. iZotope RX.
Only Accusonus Era-N is kind of affordable, but at the price of not being
tweakable at all.

By the way, good noise suppression hardware is also comparatively expensive,
see for example the Cedar DNS 2 [1]. There could be some business
opportunities in that area.

[1] [https://www.cedar-audio.com/products/dns2/dns2.shtml](https://www.cedar-audio.com/products/dns2/dns2.shtml)

~~~
vortico
Xiph's implementation of RNNoise is licensed under the 3-Clause BSD license,
so it would be easy to wrap a VST around it with a simple GUI.

------
tgirod
A long time ago I used to make amateur remixes, and one tricky part was
isolating vocals from the remixed track. To do that I was using the noise
removal tool: select a part of the track without vocals, run a spectral
analysis on it, and then subtract the result from the whole track. Most of the
time the result was terribly mangled, but sometimes I got something usable.

This demo got me thinking: if I want to remove something very specific from
one track instead of learning a generalized filter, can I train this model
with a smaller dataset, like a few seconds from that track?
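
For context, the old workflow described above is essentially spectral subtraction. Here's a rough sketch of the idea, assuming a noise-only clip is available (a toy version, not what any particular editor actually ships): average the noise magnitude spectrum, subtract it from each frame, and resynthesize.

```python
import numpy as np

def spectral_subtract(signal, noise_sample, frame=512, hop=256, over=1.0):
    """Toy spectral subtraction: estimate an average noise magnitude
    spectrum from a noise-only clip, subtract it from every frame of the
    signal, and resynthesize with overlap-add (keeping the noisy phase)."""
    win = np.hanning(frame)

    def stft(x):
        n = 1 + (len(x) - frame) // hop
        return np.array([np.fft.rfft(win * x[i * hop:i * hop + frame])
                         for i in range(n)])

    noise_mag = np.abs(stft(noise_sample)).mean(axis=0)       # noise profile
    frames = stft(signal)
    mag = np.maximum(np.abs(frames) - over * noise_mag, 0.0)  # subtract, floor at 0
    clean = mag * np.exp(1j * np.angle(frames))               # keep original phase

    out = np.zeros(len(signal))
    wsum = np.zeros(len(signal))
    for i, f in enumerate(clean):                             # overlap-add
        out[i * hop:i * hop + frame] += win * np.fft.irfft(f, frame)
        wsum[i * hop:i * hop + frame] += win ** 2
    return out / np.maximum(wsum, 1e-8)
```

The flooring at zero is exactly what produces the 'terribly mangled' musical-noise artifacts when the noise estimate doesn't match the track.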

~~~
djaychela
I'm not sure that would work well, as it would need to understand the cycle of
the (unwanted) backing. In a simple backing track there might be a kick drum,
then a snare, and the system would need to know where those 'should' be in
relation to the backing. I don't think it would work that way, particularly
given how the removal of the unwanted noise is achieved (altering the response
of each band of frequencies).

I'd think it would be possible to create something that would do what you're
looking for, but it would be much more complex than the above (and -way-
beyond what I'm capable of at the moment, maybe in a couple of years I'll be
able to do something like it).

I've had more luck with taking the backing and using phase cancellation to
remove it from different sections of a song. If you get a track where the
backing is simple, sequenced, and built from samples/repeatable synths (so
that the sound is identical each time it happens), it's possible to take that
non-vocal section, align it with the vocal section on another track, and
reverse its phase to get cancellation. You have to be precise and get lucky in
terms of the rest of the track, but it is possible. There is, of course, the
old stereo-swap-and-reverse-phase trick, which removes everything that's not
panned centrally; that can get you a lot of mileage.
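
The stereo-swap-and-reverse-phase trick is simple enough to demonstrate with synthetic signals; anything mixed identically into both channels cancels:

```python
import numpy as np

def remove_center(stereo):
    """Cancel anything panned dead centre (often the vocal) by
    subtracting one channel from the other; only 'side' content survives.
    stereo: array of shape (n_samples, 2)."""
    return stereo[:, 0] - stereo[:, 1]

# Toy demo: a 'vocal' mixed identically into both channels cancels,
# while a left-panned 'guitar' survives.
t = np.arange(1000) / 1000.0
vocal = np.sin(2 * np.pi * 5 * t)       # panned dead centre
guitar = np.sin(2 * np.pi * 3 * t)      # left channel only
stereo = np.stack([vocal + guitar, vocal], axis=1)
side = remove_center(stereo)            # equals the guitar alone
```

In real mixes stereo reverb and compression on the vocal mean the two channels are never perfectly identical, which is why the trick only gets you partway.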

As mentioned in another comment, though, getting hold of acappellas/stems can
be much better, and having listened to some from classic tracks, you can learn
a lot about production in a short time by doing so.

------
lbill
The RNNoise suppression is less appealing to my ear than the Speex
suppression... But:

- the approach is pretty cool!

- as mentioned in the article, it might be very useful when applied to multiple
speakers (conferencing)

- it might be very interesting for speech recognition software

Also, as a sound guy, when I have a noisy signal I sometimes remove the noise
a bit too heavily, then mask the artifacts with some background music. I will
definitely try that with the RNNoise suppression!

------
b0rsuk
I used to be interested in such things, but then I found a really nice
Funk/Acid Jazz/House/Soul playlist on YouTube (50 videos). Some of them I
don't like, but overall it's very enjoyable and puts me in a good mood when
programming. It helps that I'm new to Funk.

I think Funk and related genres are particularly suited for tasks that demand
concentration. Funk de-emphasises melody in favor of rhythm. Melody calls for
"active" listening. Funk is at the same time predictable and varied. I spend
very little time clicking "next track". For me it's very stimulating
listening.

So, a problem that could potentially be solved by neuroscience research and
programmers (I understand this is interesting in itself) has been solved by
good old playlists for me. And experimentation (trying new music).

~~~
s_kilk
> I found a really nice Funk/Acid Jazz/House/Soul playlist on Youtube (50
> videos)

Link? :)

~~~
sunsetMurk
yes please @b0rsuk! I'd like the link too. I've been digging funk these days.

Here is an instrumental Funk playlist on Spotify that I have been passively
listening to:
[https://open.spotify.com/user/spotify/playlist/37i9dQZF1DX8f...](https://open.spotify.com/user/spotify/playlist/37i9dQZF1DX8f5qTGj8FYl)

------
abetusk
From the article:

    As strange as it may sound, you should
    not be expecting an increase in intelligibility.

I thought one of the reasons hearing aids were so bad was that they pick up
noise equally. Wouldn't this method have a direct impact on making hearing
aids better?

I also have a real hard time differentiating people talking in Google
hangouts, say, especially if they're using silverware on porcelain. Wouldn't
this type of noise suppression help in this case as well?

Seems like pretty awesome stuff.

~~~
jmvalin
My comment about intelligibility refers to a human (with normal audition)
directly listening to the output. When the output is used in a hearing aid, a
cochlear implant, or a low bitrate vocoder, then noise suppression may be able
to help intelligibility too.

------
mbrumlow
From what I can tell, it seems the RNN learned when the speaker was talking.
It then just makes sharp cuts to blank out the audio when the speaker is not
talking. It does not appear to have learned how to extract just the
frequencies of the speaker, but rather just when a speaker is speaking.

I feel this could be taken a step further, such that when the speaker and an
overlapping loud sound happen at the same time, it is able to extract just the
speaker's voice.

Now obviously this is easier said than done.

~~~
jmvalin
A lot of people get the impression it's only cancelling where there's no
speech, but it's also cancelling during speech -- just not as much. If you
look at the spectrogram at the top of the demo, you can see HF noise being
attenuated when there's LF speech and vice versa.
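
To make that per-band attenuation concrete, here is a toy sketch of applying independent gains to frequency bands of one FFT frame (the band layout below is made up for illustration; RNNoise itself works on roughly Bark-spaced bands):

```python
import numpy as np

def apply_band_gains(frame_fft, gains, band_edges):
    """Scale each frequency band of one FFT frame by its own gain.
    band_edges[i]:band_edges[i+1] are the bin indices of band i."""
    out = frame_fft.copy()
    for g, lo, hi in zip(gains, band_edges[:-1], band_edges[1:]):
        out[lo:hi] *= g          # g near 0 suppresses, near 1 passes through
    return out
```

So during speech the bands carrying the voice keep gains near 1 while noise-dominated bands are pushed toward 0, which is why the HF noise can be cut even while LF speech is passing through, and vice versa.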

------
blubb-fish
Wearing my Bose QC25 and still hearing my colleagues talking - I would really
like to have ANC that filters out speech.

~~~
dharma1
ANC is basically realtime phase reversal of the waveform and needs almost
zero latency. It's hard to get a neural network to run fast enough, especially
on an embedded chip.
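
The latency point is easy to demonstrate numerically: at 1 kHz, half a millisecond of processing delay turns the anti-noise into reinforcement instead of cancellation.

```python
import numpy as np

sr = 48000                                 # sample rate (Hz)
t = np.arange(sr) / sr
noise = np.sin(2 * np.pi * 1000 * t)       # 1 kHz tone reaching the ear

anti = -noise                              # ideal zero-latency anti-noise
residual = noise + anti                    # perfect cancellation: all zeros

late_anti = -np.roll(noise, 24)            # 0.5 ms (24 samples) of latency
late_residual = (noise + late_anti)[24:]   # half a period late: the "anti"
                                           # noise now doubles the tone
```

Real ANC headsets get away with this partly because they only target low frequencies, where a fixed processing delay is a much smaller fraction of the period.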

------
olejorgenb
Would be cool to try to train it using the technique from
[https://blog.openai.com/deep-reinforcement-learning-from-human-preferences/](https://blog.openai.com/deep-reinforcement-learning-from-human-preferences/)

------
jokoon
One application that would be awesome with NN is instrument separation.

That way one could build much better music visualizations programs, and also
be a little more creative. I know I have some ideas if I could do it...

~~~
radarsat1
There's definitely some more work to be done on that topic, but it's been
researched at least since 1995 or so; search for "source separation".

------
poorman
This could be incredibly useful for processing spectra produced by a mass
spectrometer (which are notoriously noisy).

------
radarsat1
I don't see a link to the dataset on that page. Is the data publicly
available? I would love to play with it.

~~~
jmvalin
For training I've had to use some non-free data, but there's also some free
stuff around. The speech from the examples is from SQAM
([https://tech.ebu.ch/publications/sqamcd](https://tech.ebu.ch/publications/sqamcd))
and I've also used a free speech database from McGill
([http://www-mmsp.ece.mcgill.ca/Documents/Data/](http://www-mmsp.ece.mcgill.ca/Documents/Data/)).
Hopefully if a lot of people "donate their noise", I can make a good free
noise database.

~~~
radarsat1
Thanks!

