
Beamforming in PulseAudio - arunarunarun
https://arunraghavan.net/2016/06/beamforming-in-pulseaudio/
======
mgraczyk
Great to see this work being talked about. I work on the media signal
processing team at Google. My team built this beamformer before I started, but
I'm happy to see it being used in PulseAudio. The paper hasn't been released,
but the nonlinear beamformer code is open source. You can find it in WebRTC.

[https://chromium.googlesource.com/external/webrtc/+/master/w...](https://chromium.googlesource.com/external/webrtc/+/master/webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc)

~~~
arunarunarun
Kudos to your team -- having access to the fantastic work you guys have done
(especially the AudioProcessing module that we use in PulseAudio) has been
very useful indeed.

------
JorgeGT
If anyone is looking to experiment with beamforming algorithms, here is a free
dataset from MIT with a 1020-microphone array:
[http://groups.csail.mit.edu/cag/mic-array/](http://groups.csail.mit.edu/cag/mic-array/)

------
mmastrac
Kudos to the author. It takes a lot to write about something in a way that
makes it accessible to folks not skilled in the art.

He's written some other interesting posts on PulseAudio features too, but this
one is better written and more approachable:

[https://arunraghavan.net/2016/05/improvements-to-pulseaudios...](https://arunraghavan.net/2016/05/improvements-to-pulseaudios-echo-cancellation/)

[https://arunraghavan.net/2011/08/hello-hello-hello/](https://arunraghavan.net/2011/08/hello-hello-hello/)

------
chipsy
Have some DSP resources:

Richard G. Lyons, Understanding Digital Signal Processing [0]

Gareth Loy, Musimathics: The Mathematical Foundations of Music (volume 2) [1]

r8brain-free-src (high quality sample rate conversion algorithms) [2]

KVR's DSP forum, frequented by actual pro audio developers [3]

[0] [https://www.amazon.com/Understanding-Digital-Signal-Processi...](https://www.amazon.com/Understanding-Digital-Signal-Processing-3rd/dp/0137027419?ie=UTF8&tag=stackoverfl08-20)

[1] [https://www.amazon.com/gp/product/026251656X/ref=pd_cp_0_1?i...](https://www.amazon.com/gp/product/026251656X/ref=pd_cp_0_1?ie=UTF8&refRID=589F25CMK0PJ0P3B1WRW)

[2] [https://github.com/avaneev/r8brain-free-src](https://github.com/avaneev/r8brain-free-src)

[3] [https://www.kvraudio.com/forum/viewforum.php?f=33](https://www.kvraudio.com/forum/viewforum.php?f=33)

------
brandmeyer
The reason the author's beamformer doesn't work very well may have nothing
whatsoever to do with his implementation. A short-baseline microphone array
won't be able to have good selectivity in the human hearing range no matter
what you do. All synthetic aperture systems are fundamentally limited by the
wavelength of the thing being sampled. You need an aperture many wavelengths
wide in order to get a significant effect at any given frequency. The tiny
two-mic device shown in the picture will only be an effective aperture in the
near-ultrasonic range.

That's why commercially available phased array microphone systems that
actually work are typically one to several meters wide.
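
A quick back-of-the-envelope check supports this (a sketch; the 5 cm baseline
and the speed of sound are assumed values, not measurements of the author's
laptop):

    # Aperture size in wavelengths for a small two-mic array.
    # Assumptions: 5 cm baseline, speed of sound c = 343 m/s.
    c = 343.0        # m/s
    baseline = 0.05  # meters (assumed laptop mic spacing)

    for freq in (300, 1000, 3400, 20000):  # speech band plus near-ultrasonic
        wavelength = c / freq
        print("%6d Hz: wavelength %5.1f cm, aperture %.2f wavelengths"
              % (freq, wavelength * 100, baseline / wavelength))

    # At 1 kHz the array spans only ~0.15 wavelengths -- far short of the
    # "many wavelengths" needed for sharp selectivity; the aperture only
    # approaches a few wavelengths near 20 kHz.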

~~~
throwaway7767
My reading of the article is that he used the same laptop to test both his own
beamforming algorithm and the Google one. Since the Google one seems to work
very well, the limit is likely not the hardware.

I'm actually very surprised that it works as well as it does, based on the
sample recordings.

~~~
brandmeyer
Any claim that someone beat the diffraction limit should be viewed with the
same level of skepticism as faster-than-light particles, engines that don't
conserve momentum, perpetual motion machines, etc.

------
Shish2k
I wonder if this could be auto-calibrated: prompt the user with "sit in a
quiet place and say '1, 2, 3'", then brute-force the audio offsets to find the
highest peak signal. From then on, the mic focus could follow the user's head
as they move by constantly trying slightly different offsets and jumping to a
new offset whenever one sounds stronger.
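
A minimal sketch of that brute-force search, assuming integer-sample offsets
and made-up names (a real implementation would need the fractional delays the
article discusses):

    import numpy as np

    def best_offset(mic_a, mic_b, max_offset=64):
        # Try every candidate offset and keep the one that makes the summed
        # (delay-and-sum) signal loudest -- the calibration idea above.
        best, best_energy = 0, -np.inf
        for off in range(-max_offset, max_offset + 1):
            shifted = np.roll(mic_b, off)  # crude integer-sample delay
            energy = np.sum((mic_a + shifted) ** 2)
            if energy > best_energy:
                best, best_energy = off, energy
        return best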

~~~
throwaway7767
If this becomes popular, I could see laptop manufacturers putting information
about the microphone positions relative to the screen/camera somewhere (in the
ACPI tables?). Then there would be less need for calibration.

Of course, we'd have the same problem as with all the other information in
there: some cheap equipment would ship bogus values because someone copied the
ACPI tables from the previous hardware without verifying them...

------
zimbatm
Wouldn't it be possible to assume that, by default, the user will be
positioned between both microphones? It would be less than ideal, but just
keeping the sound that hits both mics at the same time could be a nice default
improvement.
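
That default amounts to a zero-offset delay-and-sum: sound arriving at both
mics simultaneously (from straight ahead) adds coherently, while off-axis
sound partially cancels. As a sketch (array names are assumptions):

    # Broadside beam: just average the two mic signals with no delay.
    def broadside(mic_a, mic_b):
        return 0.5 * (mic_a + mic_b)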

~~~
arunarunarun
That is actually the default with the webrtc beamformer (the sample recordings
point straight ahead, which effectively works out to being between the two
microphones).

~~~
mgraczyk
Yes, this is the default. The beamformer is steerable, and you'll notice that
a significant portion of the code is responsible for steering.
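
For the curious, "steering" in a delay-and-sum picture means computing per-mic
delays for the target direction, something like this sketch (illustrative
only -- the WebRTC nonlinear beamformer is considerably more sophisticated):

    import numpy as np

    def steering_delays(mic_x, angle_rad, c=343.0):
        # A plane wave from angle theta (0 = broadside) reaches a mic at
        # position x along the array axis with relative delay x*sin(theta)/c.
        return np.asarray(mic_x) * np.sin(angle_rad) / c

    # Two mics 5 cm apart, beam steered 30 degrees off broadside:
    print(steering_delays([-0.025, 0.025], np.radians(30)))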

------
tacos
Repost, as a semi-useful thread below didn't meet humor standards and people
who aren't logged into HN should see it, too.

If you need an FIR filter, click here and push the button. It generates the
code, too.

[http://t-filter.engineerjs.com/](http://t-filter.engineerjs.com/)
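
For those who'd rather stay in code, scipy can design a comparable filter (a
sketch; the tap count and cutoff are made-up specs, not something the tool
produced):

    from scipy.signal import firwin, lfilter

    # 63-tap low-pass FIR, cutoff at 0.2 x Nyquist (arbitrary spec).
    taps = firwin(numtaps=63, cutoff=0.2)
    # y = lfilter(taps, 1.0, x)  # apply to a signal x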

Also, if you don't know what you're talking about, kindly refrain from
wandering into it in the middle of an article that might otherwise be useful.
"An Intro To Beamforming" is a hell of a lot stronger if it doesn't have
several flaming errors about basic DSP in the middle of it. Those sorts of
errors may cause experts discovering you for the first time to avoid your
project rather than devote time to fixing it.

~~~
amluto
Sad that the thread got flagged into oblivion.

> There is a fair amount of academic work describing methods to perform
> filtering on a sample to provide a fractional delay. One common way is to
> apply an FIR filter. However, to keep things simple, the method I chose was
> the Thiran approximation — the literature suggests that it performs the task
> reasonably well, and has the advantage of not having to spend a whole lot of
> CPU cycles first transforming to the frequency domain (which an FIR filter
> requires).

I'm not a real DSP expert by any stretch of the imagination, but applying an
FIR filter does _not_ require transforming to the frequency domain. To the
contrary: a basic FIR filter just means that the ith output sample is
a[0]*x[i] + a[1]*x[i-1] + a[2]*x[i-2] + ... + a[N-1]*x[i-N+1], where x is the
input, a is the filter coefficients, and N is the (finite) length of the
filter. (If the input is all zeros except that x[0]=1, then the input is an
impulse and a is the output, i.e. the impulse response. Hence the name: Finite
Impulse Response.)
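
To make the point concrete, here's the direct form in a few lines (a sketch;
no FFT anywhere, and numpy.convolve computes the same thing):

    import numpy as np

    def fir_direct(x, a):
        # y[i] = a[0]*x[i] + a[1]*x[i-1] + ... + a[N-1]*x[i-N+1]
        y = np.zeros(len(x))
        for i in range(len(x)):
            for k in range(min(len(a), i + 1)):
                y[i] += a[k] * x[i - k]
        return y

    x = np.zeros(8); x[0] = 1.0        # an impulse in...
    a = [0.5, 0.3, 0.2]
    print(fir_direct(x, a))            # ...gives the coefficients back out:
                                       # the (finite) impulse response
    print(np.convolve(x, a)[:len(x)])  # same result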

Some very long FIR filters are more efficient to apply by Fourier transforming
the input, but that's almost certainly not the case here. It's worth noting
that FIR filters vectorize _very_ nicely.

(My FIR description isn't quite 100% accurate. I described only the
discrete-time causal case. If you drop the causality requirement (which is
fine, but can be awkward in real-time processing), then you add negative
indices to a. If you switch to continuous time, you end up with a convolution
integral instead of a sum of products.)

~~~
reitanqild
> Sad that the thread got flagged into oblivion.

I sometimes feel the PC threshold is a bit high in some threads. In this case,
however, even I felt the snark level was painful.

As a productive engineer, I often put out internal tools or POCs that could be
snarked at in this way.

I do, however, expect people to improve them or explain the better way instead
of posting dismissive comments on what this guy has done _on top of_ his open
source contributions.

That said: I'm happy to see it reposted, although I would have preferred that
he remove the defence of his previous post, as it added confusion to the new
post as well.

------
anonbanker
Still nothing that isn't already possible in JACK, over Ethernet or WiFi.

~~~
Bromskloss
How do you mean?

~~~
anonbanker
PulseAudio is still trying to catch up to features JACK implemented years ago.
One of those is beamforming, which could easily be done in any post-processing
application plugged into the JACK chain (trivial in Ardour, or even in FL
Studio under WineASIO, i.e. Wine for realtime music applications). This has
been done in live recording environments using a couple of mic'ed-up laptops
over 802.11n/ac for a while now; producing a properly-synced master recording
of the performance takes less than 15 minutes after the show is complete.

