If you want a really great approach to noise gating, a fixed threshold is fine, but it works better applied to the difference of two envelope followers: one with a short attack and long release (tracks the input), and one with a long attack and short release (tracks the noise floor). It takes a bit to set up, but it's a stupid simple way to get extremely effective gating and is easy to fine-tune for your application. A lot of Voice Activity Detection (VAD) works this way; it's just a matter of tuning the coefficients and thresholds for your input.
Also useful references for envelope following are the DAFX text, Will Pirkle's textbook on audio in C++, and Zölzer's text
(pdfs can be found around the internet)
This has the benefit of producing more natural results while keeping speech understandable. It might interest you, because with a 2000x speed margin you could still crank the algorithm up.
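The two-follower trick described above can be sketched in a few lines. This is a rough illustration in Python/NumPy, not anyone's production gate: the attack/release coefficients, the threshold, and the test signal are all made up and would need tuning for real material.

```python
import numpy as np

def envelope(x, attack, release):
    """One-pole envelope follower over |x|: `attack` is the smoothing
    coefficient used while the level rises, `release` while it falls
    (both in 0..1; closer to 1 = slower)."""
    env = np.empty(len(x))
    level = 0.0
    for i, v in enumerate(np.abs(x)):
        coeff = attack if v > level else release
        level = coeff * level + (1.0 - coeff) * v
        env[i] = level
    return env

# Toy signal: a 440 Hz tone burst buried in low-level noise.
fs = 44100
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
x = np.where((t > 0.4) & (t < 0.6), np.sin(2 * np.pi * 440 * t), 0.0)
x += 0.01 * rng.standard_normal(fs)

fast = envelope(x, attack=0.01, release=0.999)   # short attack, long release: tracks the signal
slow = envelope(x, attack=0.9999, release=0.9)   # long attack, short release: tracks the noise floor
gate_open = (fast - slow) > 0.1                  # fixed threshold on the difference
```

The nice property is that if the noise floor drifts up or down, the slow follower drifts with it, so the effective threshold adapts on its own.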
So yes, an (uncompressed) audio stream is just a sequence of samples of pressure (or displacement, depending on the medium).
The variety you get comes from the high sampling frequency: sampling fast enough lets you represent vibrations of the air pressure across a huge range of frequencies.
> I had always assumed you needed to record different wavelengths of sound to capture different pitches
Why? Look at a mono vinyl record. It has a single groove, which is the soundwave carved into the vinyl. Digital audio is the same, except you store a digital image of the groove.
>Is my mental model completely wrong?
>How do you get such variety in sound just from recording the volume?
By recording the volume 44100 times per second.
Samples of pressure (or displacement, minor phase difference), so kinda yeah.
How does one get so much variety? By sampling really often. If you sample 40000 times per second, you actually get the full information for every pitch between 0 and 20000 Hz.
If you don't sample often enough, you don't get all the information needed to store the sound. So yes, we kinda do indirectly capture all the different pitches.
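The "sample often enough" rule is easy to see numerically. A hedged sketch (tone frequencies and sample rates here are arbitrary): a tone below half the sample rate comes through at its true frequency, while a tone above half the sample rate gets folded down to a false one (aliasing).

```python
import numpy as np

def dominant_freq(tone_hz, fs, n=8000):
    """Sample a sine at rate fs and return the strongest frequency bin."""
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * tone_hz * t)
    spectrum = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(n, d=1 / fs)[np.argmax(spectrum)]

print(dominant_freq(1000, fs=8000))  # 1000.0 -- below fs/2, captured faithfully
print(dominant_freq(5000, fs=8000))  # 3000.0 -- above fs/2, folds down to 8000 - 5000
```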
The magic is the linearity of the signals. They are additive. The different signals pass through the same stream of samples without interacting.
If you add the samples of a 400 Hz sine wave to those of a 300 Hz sine wave, you can transfer them together in one stream of samples, then separate them later.
Mathematically: f(ax + by) = a·f(x) + b·f(y)
So different wavelengths added together will tell your brain there are different pitches at the same time.
When lots of different wavelengths pile up at similar volumes at the same time with no pattern, you can imagine the added waves becoming a mess. That is what is called noise.
But when a high pitch wave is 'traveling' on top of a low pitch wave your brain can distinguish the different pitches.
What reaches your ears is continuous differences in the air pressure. That's more-or-less a single changing 'amplitude'. From that, your brain can pull apart the fundamental and overtone pitches from each source ... and sort them out so you can identify each one. Amazing technology ... almost indistinguishable from magic!
These days lots of people haven't seen audio on an oscilloscope ... which shows changes in amplitude with time. Look at this video. You'll see that at each point in time (left to right), there's ONE amplitude, whether he plays one string or more.
Both exist and are being used, but samples are the default in uncompressed audio.
Depends on what you mean by 'record'. Generally the air pressure is sampled instantaneously, but it can be broken down into frequency components before recording/perceiving. That's kind of how our ears work and in DSP you do an FFT (which you can then pass through an iFFT to synthesize the original signal).
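The FFT → iFFT round trip mentioned above is a lossless change of representation, which is quick to verify (here on an arbitrary random block; any signal works):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)        # any signal: here one block of random samples

X = np.fft.rfft(x)                   # analysis: break into frequency components
x_back = np.fft.irfft(X, n=len(x))   # synthesis: recombine them

print(np.allclose(x, x_back))        # True -- no information was lost
```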
Of course it's a fun software exercise, but if you were building a hardware product out of it you'd be better off splitting the audio into distinct transmissions based on the squelch signal rather than an audio gate...
As it happens some modern avionic radio systems let you individually play back the most recent few messages for exactly the reasons the author describes (it's easy to miss)
Author here. Luckily for my purposes I can rely on the radio already having a squelch, so I could get away with using the naive solution.
If it was the raw audio I'd probably need to implement something more sophisticated such as splitting the audio by frequency and giving each band a noise threshold (as atoav mentioned).
I'm not familiar with RX7; maybe it delivers better audio quality, but it's probably also got a better interface/workflow, which is definitely worth a lot in a professional setting.
Plus I don’t think it’s fair to say “not anywhere as acceptable as doing it in code yourself”. I understand the point of your disclaimer but as someone who’s been writing code for three decades, I really love the fact that we no longer have to write our own tools.
I used to work in the music industry, where there are many other real-time processing tools that can be applied to radio voice to increase clarity (e.g. removing the static noise under speech, re-equalizing the limited bandwidth to improve intelligibility, de-reverberation, etc.)
Considering such radio communications are often used in mission-critical scenarios, one would think clarity of speech would be a factor to consider.
For the first couple of video courses I made my work flow was:
- Open REAPER (a software DAW with a business model like Sublime Text)
- Create and save a noise gate sample based on my specific room noise using its "ReaFIR" VST plugin until it was good
- Use a piece of software (ASIO Link Pro) to act as a virtual audio switch
- Open my video recording tool of choice (Camtasia)
- Configure Camtasia's microphone to be a virtual device that ASIO Link Pro created on my system
So now the end result is that the audio being recorded live is already processed by REAPER and has no background noise. I also added effects like a compressor and EQ to offset my very thin-sounding microphone at the time. Totally seamless with no latency issues. You could do the same thing with OBS, and OBS is even easier since you can directly use VST plugins, so you don't need REAPER or ASIO Link Pro running. You can download REAPER's VSTs separately.
Nowadays I just use hardware to do the same thing in real time so I don't need to think about it and I have flexibility in the recording tools I use, although the above approach works fine once you have it all configured. You would just open REAPER before your recording tool and it was ready to go.
Personally I would never go back to editing in post production for recording voices with a microphone. The amount of time the above saves is massive because it's all done on the fly with zero human intervention once it's set up. There's no complex import / export / import flow needed or fiddling with things for every recording.
Digital communication is also far more structured, which makes it possible to implement a large number of techniques for improving signal quality.
Almost all digital communication systems include something along the lines of: adaptive equalization, carrier tracking, symbol clock tracking, forward error correction, and many, many more techniques.
I don't know of any online resources, but I'd love to have the time to write some: signal processing for communication is fascinating and has a wide impact on everyday life.
Also wrt. implementations: I think GNU Radio might be a good place to look, but honestly the actual implementation of these algorithms is often very simple, it is the theory behind them that gets hairy.
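To illustrate how simple the core of one of these can be, here's a toy forward error correction scheme: a repetition code with majority-vote decoding. Real systems use far stronger codes (convolutional, LDPC, etc.), but the encode/corrupt/decode shape is the same.

```python
def encode(bits, n=3):
    """Toy FEC: repeat each bit n times."""
    return [b for bit in bits for b in [bit] * n]

def decode(received, n=3):
    """Majority vote over each group of n -- corrects up to (n-1)//2 flipped bits per group."""
    return [int(sum(received[i:i + n]) > n // 2) for i in range(0, len(received), n)]

msg = [1, 0, 1, 1]
sent = encode(msg)
sent[1] ^= 1          # flip one bit in transit
print(decode(sent))   # [1, 0, 1, 1] -- the error is corrected
```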
I'm pretty sure you should be able to code a thing like a noise gate in 10-15 lines of code.
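A naive gate really is about that size. A sketch in Python (the threshold and hold-time values are invented; a real gate would also smooth the transitions to avoid clicks):

```python
def noise_gate(samples, threshold=0.05, hold=2000):
    """Naive noise gate: pass samples while the level has recently
    exceeded the threshold; output silence otherwise. `hold` keeps
    the gate open for a while after the level drops, to avoid
    chattering open and closed on every zero crossing."""
    out = []
    open_for = 0
    for s in samples:
        if abs(s) > threshold:
            open_for = hold
        out.append(s if open_for > 0 else 0.0)
        open_for = max(0, open_for - 1)
    return out
```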
... and then there's this rather odd statement:
> If we want to use the NoiseGate in realtime applications we’ll need to make sure it can handle typical sample rates.
Maybe I just think it's funny. Back in the early 2000s I used to worry about that and even back then, it's... 44100 samples per second. A whole second to process no more than 44100 numbers, maybe twice that if you have stereo. Unless you're doing something really weird (and Rust should be pretty performant, right?) that shouldn't make the machine even blink.
sox -V3 N11379_KSCK.wav part.wav silence 1 3.0 0.1% 1 0.3 0.1% : newfile : restart
Another method that can be effective for the identification itself is to convert the signal to a spectrogram and classify the resulting images with a neural net.
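Computing the spectrogram is the easy half of that idea. A minimal short-time-Fourier-transform sketch with NumPy (window and hop sizes are arbitrary choices; real pipelines would also take a log and normalize before feeding a classifier):

```python
import numpy as np

def spectrogram(x, win=256, hop=128):
    """Magnitude STFT: each column is the spectrum of one windowed
    slice, giving a time-frequency image suitable for an image
    classifier."""
    window = np.hanning(win)
    frames = [x[i:i + win] * window for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T   # shape: (freq bins, time frames)

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)   # a steady 1 kHz test tone
img = spectrogram(x)
print(img.shape)                   # (129, 61) with these settings
```

For the test tone, every column has its energy concentrated in the bin nearest 1 kHz, which is exactly the kind of structure a classifier picks up on.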