
Audio Processing for Dummies - gkbrk
http://adventures.michaelfbryan.com/posts/audio-processing-for-dummies/
======
holy_city
I wrote this crate [1], a compressor in Rust. A compressor is the opposite of
a noise gate: gain reduction is applied once the signal passes a threshold,
rather than while it is under the threshold.

If you want a really great approach to noise gating, a fixed threshold is fine,
but it works better when you apply it to the difference of two envelope
followers: one with a short attack and long release (tracks the input) and one
with a long attack and short release (tracks the noise floor). It takes a bit
to set up, but it's a stupid simple way to get extremely effective gating and
is easy to fine tune for your application. A lot of Voice Activity Detection
(VAD) works this way; it's just a matter of tuning the coefficients and
thresholds for your input.
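
A minimal sketch of the two-follower idea, assuming one-pole followers and a
hard on/off gate. The names (`Envelope`, `dual_envelope_gate`) and the time
constants are made up for illustration; they are not taken from the crate in
[1], and real values would need tuning for your input.

```rust
/// One-pole envelope follower with separate attack and release coefficients.
struct Envelope {
    attack: f32,  // smoothing coefficient used while the level is rising
    release: f32, // smoothing coefficient used while the level is falling
    level: f32,
}

impl Envelope {
    /// Times are in seconds. exp(-1 / (t * fs)) is the usual mapping from a
    /// time constant to a one-pole coefficient.
    fn new(attack_s: f32, release_s: f32, sample_rate: f32) -> Self {
        let coeff = |t: f32| (-1.0 / (t * sample_rate)).exp();
        Envelope { attack: coeff(attack_s), release: coeff(release_s), level: 0.0 }
    }

    fn process(&mut self, x: f32) -> f32 {
        let rect = x.abs();
        let c = if rect > self.level { self.attack } else { self.release };
        self.level = c * self.level + (1.0 - c) * rect;
        self.level
    }
}

/// Gate driven by the difference between a fast "signal" follower and a slow
/// "noise floor" follower. Samples pass through only while that difference
/// exceeds `threshold`.
fn dual_envelope_gate(input: &[f32], sample_rate: f32, threshold: f32) -> Vec<f32> {
    // Illustrative time constants (assumptions, not tuned values).
    let mut signal = Envelope::new(0.002, 0.200, sample_rate); // short attack, long release
    let mut floor = Envelope::new(2.0, 0.020, sample_rate); // long attack, short release
    input
        .iter()
        .map(|&x| {
            let diff = signal.process(x) - floor.process(x);
            if diff > threshold { x } else { 0.0 }
        })
        .collect()
}
```

Because the floor follower slowly rises to meet steady background noise, the
difference stays small during noise and only jumps when something louder than
the recent floor arrives.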

Also useful references for envelope following are the DAFX text [2], Will
Pirkle's textbook on audio in C++ [3] and Zölzer's text [4].

[1] https://github.com/m-hilgendorf/rusty-compressor

[2] https://www.amazon.com/DAFX-Digital-Effects-Udo-Z%C3%B6lzer/dp/0470665998

[3] https://www.amazon.com/Designing-Audio-Effect-Plugins-C/dp/1138591939/ref=cm_cr_arp_d_product_top?ie=UTF8

[4] https://www.amazon.com/Digital-Audio-Signal-Processing-Z%C3%B6lzer/dp/0470997850

(pdfs can be found around the internet)

~~~
jcims
The examples used in the OP are helped by having an RF squelch to zero out the
noise floor. If there was no squelch, the difficulties finding a good static
(har har) threshold would have been much more apparent.

~~~
tripzilch
Can you explain what you mean by "squelch"? I'm assuming it's a kind of filter
but I can't find it in the code.

------
atoav
Interesting project. In my experience as a re-recording mixer for film, the
best denoisers currently on the market work as a sort of spectral noise gate:
instead of running one noise gate over the whole audio spectrum, you split
it into frequency bands, each with its own threshold. These thresholds are
tuned by "learning" the spectral shape of the background noise itself, and
usually the user can still modify the weights with a curve (e.g. if you want
to gate high frequencies more).

This has the benefit of producing more natural results while keeping speech
understandable. It might interest you, because with a 2000x speed margin you
_could_ still crank the algorithm up.
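
A rough sketch of the per-band idea on a single frame, assuming a naive DFT
and treating each frequency bin as its own band. The function names and the
`margin` parameter are invented for illustration. Commercial denoisers are far
more elaborate (windowing, overlap-add, smoothing over time), so treat this as
a toy model rather than how any particular product works.

```rust
use std::f64::consts::PI;

/// Naive O(N^2) DFT of a real frame, returning (re, im) per frequency bin.
fn dft(frame: &[f64]) -> Vec<(f64, f64)> {
    let n = frame.len();
    (0..n)
        .map(|k| {
            frame.iter().enumerate().fold((0.0, 0.0), |(re, im), (t, &x)| {
                let angle = -2.0 * PI * (k * t) as f64 / n as f64;
                (re + x * angle.cos(), im + x * angle.sin())
            })
        })
        .collect()
}

/// Inverse DFT, keeping the real part only (the input frame was real).
fn idft(bins: &[(f64, f64)]) -> Vec<f64> {
    let n = bins.len();
    (0..n)
        .map(|t| {
            let sum: f64 = bins
                .iter()
                .enumerate()
                .map(|(k, &(re, im))| {
                    let angle = 2.0 * PI * (k * t) as f64 / n as f64;
                    re * angle.cos() - im * angle.sin()
                })
                .sum();
            sum / n as f64
        })
        .collect()
}

/// "Learn" one threshold per bin: the largest magnitude seen in noise-only
/// frames, scaled by `margin`. All frames are assumed to have equal length.
fn learn_thresholds(noise_frames: &[Vec<f64>], margin: f64) -> Vec<f64> {
    let bins = noise_frames.first().map_or(0, |f| f.len());
    let mut thresholds = vec![0.0_f64; bins];
    for frame in noise_frames {
        for (th, (re, im)) in thresholds.iter_mut().zip(dft(frame)) {
            *th = th.max(margin * re.hypot(im));
        }
    }
    thresholds
}

/// Zero every bin whose magnitude is below its threshold, then resynthesize.
/// `thresholds` is assumed to have one entry per sample in `frame`.
fn spectral_gate(frame: &[f64], thresholds: &[f64]) -> Vec<f64> {
    let gated: Vec<(f64, f64)> = dft(frame)
        .into_iter()
        .zip(thresholds)
        .map(|((re, im), &th)| if re.hypot(im) >= th { (re, im) } else { (0.0, 0.0) })
        .collect();
    idft(&gated)
}
```

A user-adjustable curve like the one described above would amount to
multiplying each learned threshold by a per-bin weight before gating.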

~~~
danpeddle
Multi band compressors also help in this domain, choosing which frequency band
to attenuate or boost. For multi channel audio, phase becomes important too.
Not for nothing is a typical trick in mastering to restrict low frequencies to
mono.

~~~
SuddsMcDuff
I always thought the primary reason for that was that stereo low frequencies,
when recorded onto vinyl records, hugely increase the risk of the needle
'jumping' out of the groove and skipping into another one.

------
jonplackett
Is an audio stream really just a bunch of samples of volume? I had always
assumed you needed to record different wavelengths of sound to capture
different pitches. Is my mental model completely wrong? How do you get such
variety in sound just from recording the volume?

~~~
fireattack
You don't record each "wave" (nor is it even possible), you record the final
result of all the waves superposed together, just like how your ears work.
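
To make that concrete, here is a small sketch (the function name and
parameters are made up for illustration) that "records" several sine waves the
way a microphone would: one number per sample, equal to the sum of all the
waves at that instant.

```rust
use std::f32::consts::PI;

/// Sample the sum of several sine waves. Each (frequency_hz, amplitude) pair
/// is one "wave"; the recording only ever stores their combined value.
fn record_mixture(partials: &[(f32, f32)], sample_rate: f32, num_samples: usize) -> Vec<f32> {
    (0..num_samples)
        .map(|n| {
            let t = n as f32 / sample_rate;
            partials
                .iter()
                .map(|&(freq, amp)| amp * (2.0 * PI * freq * t).sin())
                .sum::<f32>()
        })
        .collect()
}
```

Pitch is carried by how quickly those sample values wiggle over time, not by
any single sample on its own.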

~~~
jcims
>You don't record each "wave" (nor is it even possible)

Depends on what you mean by 'record'. Generally the air pressure is sampled
instantaneously, but it can be broken down into frequency components before
recording/perceiving. That's kind of how our ears work and in DSP you do an
FFT (which you can then pass through an iFFT to synthesize the original
signal).

~~~
fireattack
It has been a while since I studied DSP, so I'm not sure if the FFT can
mathematically recover all the original sine waves (the FT can). But I don't
think it is possible practically, at least.

~~~
kortex
Yes, if you use the FFT or any discrete FT, you can perfectly recreate the
signal _up to the Nyquist frequency_. To perfectly recreate the whole signal,
you'd need infinite Fourier coefficients.
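
A small sketch of the round trip, assuming a naive O(N^2) DFT rather than a
real FFT library (the function names are made up for illustration). For a
finite block of samples the transform is invertible up to floating-point
rounding; the Nyquist caveat is about what the samples can say about the
original continuous signal.

```rust
use std::f64::consts::PI;

/// Naive discrete Fourier transform of a real signal.
/// Returns (real, imaginary) pairs, one per frequency bin.
fn dft(signal: &[f64]) -> Vec<(f64, f64)> {
    let n = signal.len();
    (0..n)
        .map(|k| {
            signal.iter().enumerate().fold((0.0, 0.0), |(re, im), (t, &x)| {
                let angle = -2.0 * PI * (k * t) as f64 / n as f64;
                (re + x * angle.cos(), im + x * angle.sin())
            })
        })
        .collect()
}

/// Inverse DFT, keeping only the real part (the input was real).
fn idft(spectrum: &[(f64, f64)]) -> Vec<f64> {
    let n = spectrum.len();
    (0..n)
        .map(|t| {
            let sum: f64 = spectrum
                .iter()
                .enumerate()
                .map(|(k, &(re, im))| {
                    let angle = 2.0 * PI * (k * t) as f64 / n as f64;
                    re * angle.cos() - im * angle.sin()
                })
                .sum();
            sum / n as f64
        })
        .collect()
}
```

Calling `idft(&dft(&samples))` should give back `samples` to within rounding
error.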

~~~
fireattack
Thanks

------
nippoo
The interesting thing with this example in particular is that the audio output
from the radio is already very well-gated thanks to an RF squelch (i.e. muted
until a high enough RF level or pilot tone is detected; otherwise it would
sound like a loud FM radio tuned to an empty band between transmissions and
the gate wouldn't work). So internally there's already a voltage signal
somewhere to indicate mute/unmute, with better precision than deriving it from
the audio.

Of course it's a fun software exercise, but if you were building a hardware
product out of it you'd be better off splitting the audio into distinct
transmissions based on the squelch signal rather than an audio gate.

As it happens some modern avionic radio systems let you individually play back
the most recent few messages for exactly the reasons the author describes
(it's easy to miss)

~~~
MichaelFBryan
> The interesting thing with this example in particular is that the audio
> output from the radio is already very well-gated thanks to an RF squelch

Author here. Luckily for my purposes I can rely on the radio already having a
squelch, so I could get away with using the naive solution.

If it was the raw audio I'd probably need to implement something more
sophisticated such as splitting the audio by frequency and giving each band a
noise threshold (as atoav mentioned).

------
Mirioron
While this is an article about programming, I'd like to mention that if you
need to remove background noise from an audio clip, then Audacity has an
excellent filter for this. Of course that's nowhere near as adaptable as doing
it in code yourself.

~~~
Joeboy
If you don't care about open source and don't mind spending money, iZotope RX 7
has _much_ better denoising tools than Audacity. It's actually the main reason
I now have Windows installed on my desktop (alongside Ubuntu).

~~~
tripzilch
I know a lot of things about Audacity can be a bit clunky, but the _quality_
of the denoise tool in Audacity is actually extremely good. I read its code
one time, and I was impressed.

I'm not familiar with RX7, maybe it does better audio quality but it's
probably also got a better interface/workflow. Which is definitely worth a lot
in a professional setting.

------
franky47
Are there companies/projects that are working on cleaning up radio signals
like that in real time with digital signal processing?

I used to work in the music industry, where there are many other real-time
processing tools that can be applied to radio voice to increase clarity (e.g.
removing the static noise under speech, re-equalizing the limited bandwidth to
increase perceptibility, de-reverberation, etc.).

Considering such radio communications are often used in mission-critical
scenarios, one would think clarity of speech would be a factor to consider.

~~~
henrikeh
Digital communication has a wholly different set of requirements than audio
processing. With audio it is good enough to sound good, but digital
communication has far more specific metrics of performance.

Digital communication is also far more structured, which makes it possible to
implement a large number of techniques for improving signal quality.

Almost all digital communication systems would have something along these
lines: adaptive equalization, carrier tracking, symbol clock tracking, forward
error correction, and many, many more techniques.

~~~
nsomaru
Where can one read about these techniques and are there open source
implementations?

~~~
henrikeh
I’ve only learned about this through my engineering studies, so I can only
really recommend textbooks. I found Proakis’ Digital Communications quite
good, but it doesn’t go very deep.

I don’t know of any online resources, but I’d love to have the time to write
some. Signal processing for communication is fascinating and has a wide impact
on everyday life.

Also wrt. implementations: I think GNU Radio might be a good place to look,
but honestly the actual implementation of these algorithms is often very
simple, it is the theory behind them that gets hairy.

------
tripzilch
Is this a good example of how things are done in Rust, or is it
over-engineered?

I'm pretty sure you should be able to code a thing like a noise gate in 10-15
lines of code.
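
For a sense of scale, a bare-bones gate along those lines might look like the
sketch below. This is not the article's `NoiseGate`; the function name and the
`hold` parameter are assumptions made for illustration, and it skips the
splitting-into-files part the article also handles.

```rust
/// Naive noise gate: zero out every sample whose magnitude is below
/// `threshold`, but keep the gate open for `hold` samples after the last loud
/// sample so quiet zero-crossings inside speech are not chopped out.
fn noise_gate(input: &[f32], threshold: f32, hold: usize) -> Vec<f32> {
    let mut remaining = 0; // samples left before the gate closes again
    input
        .iter()
        .map(|&x| {
            if x.abs() >= threshold {
                remaining = hold;
                x
            } else if remaining > 0 {
                remaining -= 1;
                x
            } else {
                0.0
            }
        })
        .collect()
}
```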

... and then there's this rather odd statement:

> If we want to use the NoiseGate in realtime applications we’ll need to make
> sure it can handle typical sample rates.

Maybe I just think it's funny. Back in the early 2000s I used to worry about
that and even back then, it's... 44100 samples per second. A whole second to
process no more than 44100 numbers, maybe twice that if you have stereo.
Unless you're doing something really weird (and Rust should be pretty
performant, right?) that shouldn't make the machine even blink.

------
krick
I understand that implementing stuff in Rust can just be fun on its own, but,
I mean, in case somebody really just wants to solve the problem Michael had
here:

sox -V3 N11379_KSCK.wav part.wav silence 1 3.0 0.1% 1 0.3 0.1% : newfile : restart

------
ngcc_hk
Very interesting. I am starting out with bird sounds and bird ID. Hope this can
help. Thanks.

~~~
jcims
That's awesome! One very simple thing you can do with that is pass the signal
through a bandpass filter (say 2kHz-10kHz) before evaluating power levels to
minimize false positives from wind and other environmental noises.
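
A minimal sketch of that pre-filtering step, assuming a single second-order
band-pass section with coefficients from the widely used RBJ "Audio EQ
Cookbook" formulas. The type and function names are made up for illustration,
and one biquad has fairly gentle slopes, so a real detector might cascade
several.

```rust
use std::f32::consts::PI;

/// Second-order band-pass filter (RBJ cookbook form, 0 dB gain at the center).
struct BandPass {
    b0: f32, b2: f32, a1: f32, a2: f32, // b1 is always zero for this filter
    x1: f32, x2: f32, y1: f32, y2: f32, // previous inputs and outputs
}

impl BandPass {
    fn new(low_hz: f32, high_hz: f32, sample_rate: f32) -> Self {
        let center = (low_hz * high_hz).sqrt(); // geometric mean of the band edges
        let q = center / (high_hz - low_hz);
        let w0 = 2.0 * PI * center / sample_rate;
        let alpha = w0.sin() / (2.0 * q);
        let a0 = 1.0 + alpha;
        BandPass {
            b0: alpha / a0, b2: -alpha / a0,
            a1: -2.0 * w0.cos() / a0, a2: (1.0 - alpha) / a0,
            x1: 0.0, x2: 0.0, y1: 0.0, y2: 0.0,
        }
    }

    fn process(&mut self, x: f32) -> f32 {
        let y = self.b0 * x + self.b2 * self.x2 - self.a1 * self.y1 - self.a2 * self.y2;
        self.x2 = self.x1;
        self.x1 = x;
        self.y2 = self.y1;
        self.y1 = y;
        y
    }
}

/// RMS level of the signal after band-limiting it to roughly [low_hz, high_hz].
fn band_limited_rms(input: &[f32], low_hz: f32, high_hz: f32, sample_rate: f32) -> f32 {
    let mut filter = BandPass::new(low_hz, high_hz, sample_rate);
    let sum_sq: f32 = input.iter().map(|&x| filter.process(x).powi(2)).sum();
    (sum_sq / input.len() as f32).sqrt()
}
```

Comparing `band_limited_rms(chunk, 2000.0, 10000.0, 44100.0)` against a
threshold, instead of the raw level, means low-frequency rumble contributes
much less to the decision.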

Another method that can be effective with the identification itself is to
convert the signal to a spectrogram and classify the resulting images with a
neural net.

~~~
jacquesm
I'd vote for the second, but you won't need a neural net; a sum of squared
differences against a set of reference spectra would probably get you 99% there.
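
Roughly what that comparison could look like, assuming each recording has
already been reduced to a fixed-length magnitude spectrum. The function names
are made up for illustration, and any labels you pass in are whatever you
choose to call your references.

```rust
/// Sum of squared differences between two equally long magnitude spectra.
fn ssd(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "spectra must have the same number of bins");
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Return the label of the reference spectrum closest to `query`,
/// or None if there are no references.
fn closest_reference<'a>(query: &[f32], references: &[(&'a str, Vec<f32>)]) -> Option<&'a str> {
    references
        .iter()
        .map(|(label, spectrum)| (*label, ssd(query, spectrum)))
        .min_by(|(_, d1), (_, d2)| d1.total_cmp(d2))
        .map(|(label, _)| label)
}
```

In practice the spectra would probably need normalizing for loudness first,
otherwise a quiet recording of the right bird can score worse than a loud
recording of the wrong one.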

------
polishdude20
Isn't this really easy to do on these clips because there is no noise when
there is no audio transmission? It would be awesome if you showed some
examples where there is a bunch of noise between audio transmissions.

