Show HN: Alsa_rnnoise – RNNoise-based noise removal plugin for ALSA (sr.ht)
137 points by ArsenArsen on Jan 31, 2021 | hide | past | favorite | 47 comments



Ever since I got a new microphone I've been having issues with large quantities of background noise, primarily typing sounds, leaking in through it. A friend of mine had told me about RNNoise, describing it as "very good" at removing noise, so I decided to test it. My initial test consisted of piping raw PCM from arecord through rnnoise's denoiser demo into aplay. The results frankly shocked me. When I was not speaking there was no noise (or sound) at all (this is due to rnnoise's voice detection system, which essentially mutes the microphone when there's no voice), and when I was talking, the sound of my keyboard was a lot quieter without my voice being affected. That led me to develop alsa_rnnoise, to get good real-time noise cancellation.

alsa_rnnoise is a very simple ALSA filter plugin that runs its input through rnnoise before handing it back to ALSA. It operates in real time and adds very little latency (the amount depends on what size frames ALSA delivers, but is nominally less than 10 ms). After enabling it, all the annoying background sounds were gone. My intended use cases for this were VoIP, screencasting and streaming, and, as far as I can tell, it works great for all three.


A few weeks ago, I tried doing this with the LADSPA version of RNNoise and ALSA, but couldn't get it to work reliably.

I have also experimented with NoiseTorch, which routes the microphone through LADSPA using Pulseaudio, but it didn't work reliably either. The biggest problem is that Pulseaudio loads one CPU thread to 100% even when there is no audio input, which makes it a deal breaker for laptops.

I will definitely check this out. RNNoise is truly amazing tech, but it is not as accessible as I would like. The best use of it is in the Mumble client, where it is an optional setting.

It is a shame Nvidia has taken over this space completely with RTX Voice. RNNoise does a comparable job without the need for an Nvidia GPU, but I guess RNNoise is just not as easy to set up.


I believe Pulse also has a LADSPA module that you could try.


Hey, thanks for building this! There are multiple options like this for Pulseaudio, but last I researched, nothing for pure ALSA. On a system without Pulseaudio this is obviously better; great to have it.

The Pulseaudio plugins like NoiseTorch have the issue of significant system load even without current sound input (something about how the loopback works, iirc). Will this ALSA plugin share that issue, or will the system load be lower when there is no sound input?


The plugin uses very little CPU, and is entirely inactive when not in use (i.e. when data isn't being pulled, the transfer function isn't being called) due to how ALSA works.

EDIT: Do note, though, that each process pulling audio will be denoising independently, so the usage scales linearly with the number of clients. This is due to how ALSA plugins work. Regardless of that, on a Ryzen 5 1600X (the only CPU I can test on), the plugin uses 2.5% of a single core when recording mono 48 kHz audio.


Excellent.

I'm testing this right now and am noticing that some more info about the installation could be helpful. Specifically, when installing rnnoise as shown in the readme, it of course goes to /usr/local/lib, but /usr/local/lib/pkgconfig was not in my distro's PKG_CONFIG_PATH. Maybe there could be a hint to set that when calling `meson build` if rnnoise can't be found?

Packaging software is always annoying, sorry for dragging you into that mud. Ideally distros will pick it up and compiling manually will become unnecessary. I would have left this as an issue, but I saw no issue tracker on the project page.


There's an issue tracker on that page, under tickets, but I'd prefer if you took discussion to the attached mailing list first, before it hits the official tracker.

As for packaging, that's my field of work for some projects I'm working on, so it's not unfamiliar to me. The only problem is that the RNNoise upstream lacks releases, although there's discussion about changing that.


Okay. To also mention the result: Installation worked, alsa plugin worked and the filter does work. Nice, thanks again.

With extreme sounds (a vacuum) in the background the voice gets a bit more distorted than ideal, but something like a keyboard gets filtered nicely to be less noisy. I assume that's just how RNNoise behaves; I'm mentioning it because of the sound quality discussion above. Also on that note: just activating the alsa_rnnoise filter does not noticeably lower recording quality, at least not that I can tell.


> There are multiple options like this for Pulseaudio,

Mind sharing? I didn't manage to find any that was easy to install/configure, so it would help me a lot. Thanks!


NoiseTorch is the one I use on my Pulseaudio-enabled laptop, https://github.com/lawl/NoiseTorch. It has a GUI and is very easy to install, seems to work well. Significant cpu usage when active, so I only load it when it's needed, but that's okay for me.


I'm intrigued by this. I've never been really satisfied with any software noise reduction system, but this sounds like a phenomenal improvement.

Tried installing it on Ubuntu LTS via the latest PulseEffects, but it seems it switched to PipeWire 0.3, which is not available yet, so I can't really run it.

My solution so far has been a headset with a boom microphone, like the CoolerMaster MH630 (one of the better boom mics, see https://www.rtings.com/headphones/reviews/cooler-master/mh63... for a sound demo at the "Recording quality" section). When it comes to noise reduction, bringing the microphone as close to the mouth as possible is a really good way to get an amazing SNR boost immediately (even for omnidirectional mics). Unfortunately that's a pain with large capsule condenser microphones (unless you're ready to have a large boom arm hanging about on your desk and accept view obstruction, and you add postprocessing to remove the very bassy proximity effect)

Another benefit of headsets with boom mics is consistency (without DSP). No matter how you move, the distance and angle to the microphone is always identical, and therefore the sound is very consistent (save for your own personal loudness)

You can of course add DSP (limiter, noise reduction etc) to that, but the better input you provide, the better output you get.


Sound engineer here.

RNNoise is an amazing feat, but please, don't overdo it. Most of the time, you don't really want complete ambient noise elimination, as human speech appearing from dead silence sounds unnatural. Moreover, most noise reduction software is considerably less effective at reducing noise while a person is speaking, either removing too much and producing degraded speech (the worst case) or too little. If possible, always add your noise reduction gradually, stop when it sounds good to your ear, and then back off a bit.

If you're doing voice recording/streaming, please, get to know Expanding and Compression first, and only after configuring your sound processing chain add noise reduction in.

One of the serious offenders is OBS Studio, which recently added an RNNoise filter but provides no means of mixing the processed sound with the dry signal (in other words, the filter is always 100% on). A wet/dry mix knob is badly needed for most filters there.

I'm very saddened by the state of sound quality in lots of otherwise amazing videos people have been producing lately, and now I'm considering writing a guide on voice processing for streams/conferences/etc. for techy people, if anyone's interested.


Great post.

I'm also an audio engineer. This is the truth.

In an audio recording featuring spoken voice, two sounds are present: the spoken voice and the room ambiance in the background. We typically refer to the latter as "room tone."

Even though we don't usually explicitly realize this, our ears/brain implicitly do. So, when people overdo noise removal, we implicitly hear the difference since half of the sounds that compose your filtered output are now gone. We tend to associate such recognizable "noise gating" with lower production quality and we find that generally such processing leads to lower intelligibility of the human voice.


The addition of an artificial ambient background is known as "comfort noise" for those who are interested to look further into it; usually it's done on the receiver end.


I'd be quite interested in such an article, again, my goal (besides VoIP) is screencasting and/or streaming, so any bit of advice someone with experience might have is greatly useful.

I'll look into expansion and compression, and I could implement a wet/dry setting that multiplies the source samples and then mixes them into the result, if I understood the concept right.

EDIT: RNNoise seems to be alright when it comes to canceling noise during speech too, I didn't notice it overdoing it.


> I could implement a wet/dry setting that multiplies the source samples and then mixes them into the result, if I understood the concept right.

Haven't tested your version yet, but the werman/noise-suppression-for-voice plugin introduces some delay, and a naive wet/dry control (or mixing with the original sound source in some other way) doesn't work, so it might turn out to be not so simple.


Right now there's no such feature in place, but I imagine keeping the buffer from before denoising and mixing it into the denoised result (plus the multiplication) will do what you're describing? It may increase volume, I might need to reduce the volume of the denoised audio first. I'll play around with it, and am open to hearing what you've got to say about it.


I wouldn't be too worried about it unless you're working at a level where you know why to be worried about it (i.e. you're mixing audio as part of what you're doing, not because you just need the audio output to work). For instance, I'd take missing comfort noise ten times over everyone hearing my water heater kick up once on a conference call or while playing a team shooter.

That being said, RNNoise isn't so much filtering background noise as guessing when to drop the levels, and as you mention it really doesn't block much when it detects you're speaking; rather, it lets almost everything through until you stop.

RTX Voice set the gold standard in filtering IMO, though, and as amazing a feat as RNNoise is (I certainly couldn't do better), it's just not that good in comparison. I'm not sure what they did to make their model so good, but I can use a boom mic set to omni, run a fan at high speed into the mic, bang on the desk repeatedly with one hand, have the water heater making noise, my phone vibrating on the table, a car alarm going in the background, the cat scratching a post, and so on, and as long as I remember to talk at a normal volume it's damn near indistinguishable from talking in a quiet room. It may sound preposterous or like I'm exaggerating for effect, but I'll be damned, it actually filters that well. I didn't believe it until I tried. It only gets "bad" when the noise is so loud on the microphone that your voice starts to sound a bit distorted, but it's still isolated. It does let cat meows through, though that is technically voice, and I'm not sure how you could identify it was a meow without massive latency to hear the whole thing first.

That being said, they seem to have completely fucked something up porting it to Nvidia Broadcast, as the mic filtering in that leaks to the point it was like it wasn't even on.


UPD: I've written the guide. *Voice recording and processing for talks, streaming and conferencing. The Reference.*

I'm not so good with short names and the post itself is pretty long.

Here's the link: https://indiscipline.github.io/post/voice-sound-reference/


I think pretty much everyone who does A/V production (and some people who don't, like me) would be interested in such a guide. Please do write it!


Your guide would be a blessing for techies looking to improve their audio quality. Please, do it!


> RNNoise is an amazing feat, but please, don't overdo it. Most of the time, you don't really want complete ambient noise elimination, as human speech appearing from dead silence sounds unnatural.

No. Most sane programs don't do comfort noise, because it is anything but comfort. Data should be transmitted iff you speak.


Just tried the pulseaudio https://github.com/lawl/NoiseTorch and I must say it makes an astonishing difference: https://www.youtube.com/watch?v=5rAfyMrE49o&feature=youtu.be

Though the basics, like getting a dynamic microphone close to my mouth, probably make a bigger difference, hah.


Is there any RNNoise-based alternative for macOS? I managed to install the plug-in, but I find it hard to pipe the audio into it.


I noticed that RNNoise doesn't appear to be an open model: you can't re-train it from scratch, since the source data isn't publicly available (or doesn't exist?), even if you had enough hardware.


The documentation is a bit poor. The original data is available for download (with more info about the entire process, most of which is outside my grasp, as I am not an ML person) in the demo blog post: https://jmvalin.ca/demo/rnnoise/ (towards the bottom of the page)


Coming back with information from #xiph on freenode:

  16:57 <ArsenArsen> where and under what license is the training data used for RNNoise?
  18:38 <rillian> ArsenArsen: There's a copy of what I believe is the training data on the xiph server, but afaik it's never been published
  18:39 <rillian> the original submission page has an EULA waiving copyright and liability claims, and agreeing that it _may_ be released CC0.
  18:40 <rillian> it looks like that didn't actually happen.
  18:41 <rillian> there may have been concerns about auditing it for privacy issues, but there's a lot of audio to listen to, 6.5G compressed
  18:41 <rillian> jmspeex, TD-Linux: what's the status of publishing the rnnoise training data?
  18:43 <jmspeex> Are you talking about the data that was used to train the default RNNoise model or the noise that got collected with the demo?
  18:43 <rillian> jmspeex: I think debian just cares about the training data for the default model.
  18:44 <jmspeex> There was never plan to release that -- it includes data from databases we cannot release
  18:44 <jmspeex> but I don't see what the issue is. Distributing the model is not the same as distributing the data
  18:45 <rillian> ah, I see. I didn't realize you'd used proprietary sources as well.


Any idea about the license for the original data?


The paper links to the McGill TSP speech database (English & French) as one of the sources of the data, which claims to be BSD licensed:

http://www-mmsp.ece.mcgill.ca/Documents/Data/


The other source of data mentioned in the paper is the NTT Multi-Lingual Speech Database for Telephonometry, which seems to be commercial, so presumably under a proprietary license.

https://www.ntt-at.com/product/multilingual/ https://www.ntt-at.com/product/speech2002/


Hmm, OTOH, the 6.4GB data tarball says that it is from contributors who responded to the demo and is licensed under CC0.


+1, that data is CC0, and I believe that's all the data that was used for training.


No, exactly none of that data was used for training. The training was done before the demo that was asking for noise contributions. The contributions are CC0, but were never used (i.e. totally unknown dataset quality).


So far we have 3 ideas!


Also any idea if the training required nvidia GPUs or was it done on CPUs or GPUs with non-proprietary drivers?


There are training instructions in the repository. The training scripts appear to use some pretty standard ML libraries (I see Keras and mentions of TensorFlow), so I imagine the requirements are the same as theirs.

I don't feel I'm qualified to elaborate on this specifically, again, I'm no ML person. For more info look here: https://github.com/xiph/rnnoise/tree/master/training https://github.com/xiph/rnnoise/blob/master/TRAINING-README


If you're using Windows, I recently found this small tool to reduce background and keyboard/mouse noises: https://closedlooplabs.com. It's not open source as far as I'm aware, but it's way cheaper than krisp.ai's subscription model.


It is possible to use VST2 on Windows. This way you get RNNoise and the advantages of Free software.

https://github.com/werman/noise-suppression-for-voice


rnnoise is fantastic. I use it in an Equalizer APO filter chain on my gaming machine along with an EQ and compressor which are fed from a dynamic mic. I consistently get comments about the quality of my mic setup in-game and on Discord.

The best part is that it has almost no impact on voice quality, unlike Krisp and some other options I have tried. Singing into the filter chain even sounds good, with the exception of when my 5-year-old daughter joins in. rnnoise seems to think that her voice is noise and tries to intermittently filter it out, which causes a volume warble while we sing together. To be fair, 99.9% of the time her voice should definitely be considered noise I want filtered out. ;)


Looks promising.

But I've always wondered how Krisp.ai achieves such good results, considering that it works on the local device and the model is quite small (a few hundred MB). It really impresses me. Disclaimer: I'm not affiliated in any way with Krisp.ai, just a happy user.


Is there something like a GUI that generates .asoundrc files?

The syntax is not exactly intuitive.


There's no GUI for it, but I'm willing to help you with it. If you have no asoundrc (i.e. you let ALSA figure it out for you), the example in the README will work. Otherwise, you can email me or post on the mailing list and I'll get back to you.


Right now I have multiple capture devices (front mic, rear mic, capture) which can be selected in alsamixer via "Input source". It would be great to have another "virtual" capture device that uses e.g. front mic but routed through rnnoise so that rnnoise can be easily turned on/off by selecting the input source.

Unrelated to this, I want to be able to easily switch between my headphones and HDMI output, where HDMI out needs to be routed through dmix->alsaequal->softvol->HDMI. I gave up after spending two hours tinkering.


I'm pretty sure the switch you're talking about is hardware-level. If you want to turn rnnoise on/off, you could create a new pcm rather than overriding the default, and then have the software select it. The README example asoundrc is for the most part the same; you'd just remove this section:

  pcm.!default {
      type asym
      playback.pcm "cards.pcm.default"
      capture.pcm "rnnoise"
  }
As for the second thing, you could do what I do and use pcm_jack, like this:

  pcm.!default {
      type asym
      playback.pcm "plug:jack"
      capture.pcm "plug:rnnjack"
  }

  pcm.rnnjack {
      type rnnoise
      slave.pcm "plug:jack"
  }

Keep in mind this will need more setup (specifically to set JACK up) and is definitely overkill, but it may be fun. Also, this config is exactly how my setup works.


For PulseAudio, there is https://github.com/lawl/NoiseTorch


I would recommend PulseEffects. It has a much nicer UI than NoiseTorch, supports a number of effects besides noise cancellation (if desired), supports auto-start, supports custom models (but has a good default included), doesn't require a sudo password and doesn't burn CPU if your mic isn't being used.

The support in PulseEffects is new but has been working well for me. I have had zero issues and don't even think about it anymore.


Just installed it. It is great! Thanks for the recommendation.



