Hacker News
Beamforming in PulseAudio (arunraghavan.net)
121 points by arunarunarun on June 29, 2016 | 32 comments

Great to see this work being talked about. I work on the media signal processing team at Google. My team built this beamformer before I started, but I'm happy to see it being used in PulseAudio. The paper hasn't been released, but the nonlinear beamformer code is open source. You can find it in WebRTC.


Kudos to your team -- having access to the fantastic work you guys have done (especially the AudioProcessing module that we use in PulseAudio) has been very useful indeed.

If anyone is looking to experiment with beamforming algorithms, here is a free dataset from MIT with a 1020-microphone array: http://groups.csail.mit.edu/cag/mic-array/

Kudos to the author. It takes a lot to write about something in a way that makes it accessible to folks not skilled in the art.

He's written some other interesting posts on PulseAudio features too, but this one is better written and more approachable:



Have some DSP resources:

Richard G. Lyons, Understanding Digital Signal Processing [0]

Gareth Loy, Musimathics: The Mathematical Foundations of Music (volume 2) [1]

r8brain-free-src (high quality sample rate conversion algorithms) [2]

KVR's DSP forum, frequented by actual pro audio developers [3]

[0] https://www.amazon.com/Understanding-Digital-Signal-Processi...

[1] https://www.amazon.com/gp/product/026251656X/ref=pd_cp_0_1?i...

[2] https://github.com/avaneev/r8brain-free-src

[3] https://www.kvraudio.com/forum/viewforum.php?f=33

The reason the author's beamformer doesn't work very well may have nothing whatsoever to do with his implementation. A short-baseline microphone array won't be able to have good selectivity in the human hearing range no matter what you do. All synthetic aperture systems are fundamentally limited by the wavelength of the thing being sampled. You need an aperture many wavelengths wide in order to get a significant effect at any given frequency. The tiny two-mic device shown in the picture will only be an effective aperture in the near-ultrasonic range.

That's why commercially available phased array microphone systems that actually work are typically one to several meters wide.
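
To put rough numbers on the wavelength argument, here is a small sketch. The 343 m/s speed of sound and the 5 cm and 1 m apertures are illustrative assumptions, not measurements of the laptop in the article.

```python
# Illustrative only: how many wavelengths wide is a given aperture?
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 degrees C


def wavelength_m(freq_hz):
    """Acoustic wavelength in metres at the given frequency."""
    return SPEED_OF_SOUND_M_S / freq_hz


def aperture_in_wavelengths(aperture_m, freq_hz):
    """Aperture width expressed as a multiple of the wavelength."""
    return aperture_m / wavelength_m(freq_hz)


if __name__ == "__main__":
    # Hypothetical apertures: a 5 cm laptop mic pair and a 1 m array.
    for aperture_m in (0.05, 1.0):
        for freq_hz in (300.0, 1000.0, 3400.0, 20000.0):
            ratio = aperture_in_wavelengths(aperture_m, freq_hz)
            print(f"{aperture_m:4.2f} m at {freq_hz:7.0f} Hz: {ratio:6.2f} wavelengths")
```

Under these assumptions the 5 cm pair stays narrower than one wavelength up to roughly 6.9 kHz, while the 1 m array is already about three wavelengths wide at 1 kHz.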

As I understood the article, he used the same laptop to test both his own beamforming algorithm and the Google one. Since the Google one seems to work very well, it seems likely that the hardware is not the limit.

I'm actually very surprised that it works as well as it does, based on the sample recordings.

Any claim that someone beat the diffraction limit should be viewed with the same level of skepticism as faster-than-light particles, engines that don't conserve momentum, perpetual motion machines, etc.

I wonder if this could be auto-calibrated, like prompt the user with "sit in a quiet place and say '1,2,3'" then brute-force the audio offsets to get the highest peak signal; and from then on you could have the mic focus follow the user's head as they move it by constantly trying slightly different offsets and jumping to a new offset if one sounds stronger?
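
A minimal sketch of the brute-force idea, for anyone who wants to try it. It scores each candidate offset by cross-correlation rather than by the peak level of the summed signal, which is a swapped-in choice; the function name `best_offset` and the synthetic burst are made up for illustration, and real speech would need windowing and sub-sample offsets.

```python
import math


def best_offset(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best lines up with `ref`.

    A positive lag means `other` is a delayed copy of `ref`.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation of the two channels at this candidate lag.
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    # Synthetic stand-in for the spoken prompt: a decaying burst that
    # reaches mic B three samples after mic A.
    burst = [math.sin(0.9 * n) * math.exp(-0.05 * n) for n in range(200)]
    mic_a = burst + [0.0] * 10
    mic_b = [0.0] * 3 + burst + [0.0] * 7
    print(best_offset(mic_a, mic_b, max_lag=8))
```

Tracking a moving head would amount to rerunning the same search over a small range of lags around the last result.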

If this becomes popular, I could see laptop manufacturers putting information about the microphone position relative to the screen/camera somewhere (in the ACPI tables?). Then there would be less need for calibration.

Of course, we'll have the same problems as all the other information in there, that some cheap equipment will have bogus values as someone just copied the ACPI tables from the previous hardware without verifying the information...

Better yet, you could listen for breathing sounds if they're not speaking, and use those to find speakers. ;-)

Wouldn't it be possible to assume that by default, the user will be placed between both microphones? It would be less than ideal but just keeping the sound that hits both mics at the same time could be a nice default improvement.

That is actually the default with the webrtc beamformer (the sample recordings point straight forwards which effectively works out to being between the two microphones).

Yes this is the default. The beamformer is steerable and you'll notice that a significant portion of the code is responsible for steering.
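
As a toy illustration of the "keep what hits both mics at the same time" default, here is a zero-delay sum of two channels. This is not the WebRTC beamformer, only plain averaging, and the tone frequency, sample rate, and six-sample delay are hypothetical values chosen so the cancellation is exact.

```python
import math


def sum_channels(left, right):
    """Zero-delay 'delay-and-sum': average the two mics sample by sample."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, rate_hz, count, delay_samples=0):
    """Sine tone, optionally delayed by a whole number of samples."""
    return [math.sin(2.0 * math.pi * freq_hz * (n - delay_samples) / rate_hz)
            for n in range(count)]


if __name__ == "__main__":
    rate = 48000
    centred = tone(4000.0, rate, 480)
    # Hypothetical off-axis source: reaches the second mic 6 samples later,
    # which is half a period of a 4 kHz tone at 48 kHz sampling.
    late = tone(4000.0, rate, 480, delay_samples=6)
    print(rms(sum_channels(centred, centred)))  # centred source is preserved
    print(rms(sum_channels(centred, late)))     # off-axis source cancels
```

Six samples at 48 kHz corresponds to about 4.3 cm of extra path at 343 m/s. The cancellation is only this complete at that one frequency; other frequencies from the same direction are attenuated less or not at all.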

Repost as a semi-useful thread below didn't meet humor standards and people who aren't logged into HN should see it, too.

If you need an FIR filter, click here and push the button. Generates the code too.


Also if you don't know what you're talking about, kindly refrain from wandering into it in the middle of an article that might otherwise be useful. "An Intro To Beamforming" is a hell of a lot stronger if it doesn't have several flaming errors about basic DSP processing in the middle of it. Those sorts of errors may cause experts discovering you for the first time to avoid your project, not devote time to fixing it.

I see a lot of rudeness on Hacker News, but your comments in this thread went past careless rudeness. You were a genuine asshole. It doesn't matter how mistaken the other person was, or how far short of your technical glory they fall, you can't treat others like that here.

Had this been a first offense, I would have posted something like: "If you know more than other people, the thing to do is provide correct information. Then we all learn." And I'd have pointed out that going on about "several flaming errors about basic[s]", without specifying what those errors actually are, is just a putdown that adds no real information.

But you've done this many times before, we've given you many warnings already, and you've ignored them. That indicates that you don't want to use this site as intended, so I've banned your account.

The "several flaming errors" were previously mentioned here and as comments on his website (not by me). To detail them again could also be construed as piling on. You are reading this thread outside the timeline in which it occurred and taking the most negative possible connotation of my actions.

Even in this thread people can't quite decide where my remarks lie on the spectrum. (I assure you this is not some game I'm playing to test the waters.) Likewise in a previous thread your banhammer was overruled by the community.

I personally think you're overreacting (calling me a "genuine asshole" ... really, dang?) but I don't wish to cause you any additional stress by being an unintentional canary in this coal mine. I will however deduct a few points for waiting for the thread to die, zeroing out an upvoted comment containing useful information, then banning my account at 1am on a holiday weekend. I know you abhor off-topic meta discussion but, as a fellow moderator who handles forums far larger and friendlier than this one, that's pretty weak. Good luck, you have my word that I won't reappear under a different name.

I didn't call you an asshole; I said you had been one in this thread, which was a carefully circumscribed statement. I doubt very much that you are an asshole or even had a negative intention. But what you did is worse than most misbehavior on HN, because it cuts into what should be the heart of this place: substantive, respectful discussion about others' work. Worse, you'd done it repeatedly already, and been asked repeatedly to stop. That's why I chose a negative interpretation in this case.

As for your procedural complaints, holiday weekends (what are those?) had nothing to do with it, 1am enters the picture because it wasn't until 1am that I had time to deal with yesterday's user flags, and there was no 'waiting for the thread to die'. It takes a long time to write comments like the one explaining why we were banning you. Had I wanted to shove this under the rug, I'd not have bothered.

If you'd like to change your ways and treat others respectfully in comments here from now on, I'm more than happy to unban you. Other users have gone from posting assholish comments on a regular basis (I may even have been one myself) to becoming good citizens of this community. If you want to do that too, you're welcome here. But this requires sincerely accepting the model of how HN is supposed to work and doing your best to abide by it. When we give people repeated warnings and they ignore them, it's hard not to conclude that they have no intention of doing so.

There's a difference between being a prick to fellow community members versus being tough on submissions. I rank myself in the top 5% regarding expertise on a few topics that come up here regularly, and most (not many, most) of the submissions are simply tragic. To enter a thread late with a hundred young people politely chatting about something that's completely wrong or completely plagiarized says quite a bit about your ability to maintain decorum. But I don't think it's quite as flattering regarding your ability to maintain quality.

Googling a machine learning or signal processing technique from 40 years ago, seeing a link to HN on the first page of Google results that gets it wrong, and having the only comment that actually provides counterpoint or correction missing because you pushed a magic button? I can't believe that's the impact either of us are looking for with regards to contributing to world knowledge.

Likewise I increasingly find myself reading the top comment then scrolling down and squinting at the dimmed out bottom comment. Half the time it's someone being an idiot. The rest of the time it's useful counterpoint or something correct but imperfectly phrased. Groupthink and tone-policing-by-committee may be a bigger problem than someone who dares to use the adjective "flaming" to qualify the nature of errors in a submission.

I too hate dealing with the stack of complaints at the end of a long day, and I too hate seeing the same names causing trouble. It certainly is hard to remain impartial given even a small number of people who are obsessed with the "report misbehavior" button or, worse, those who harbor grudges and abuse the feature. In general those people need to toughen up and the people who are causing the grief need to turn it down half a notch. But HN, in technology and in personnel, lacks the nuance necessary to correct and communicate evolving community standards.

There's nothing more boring (and more damaging) than a public debate around a moderator's standards versus a popular user who bumps against the edge on occasion. Out of respect for your work (and my time) I'm stepping out as that's precisely where this is headed. Good luck with the site, I'm all too aware of the effort you put into it.

Sad that the thread got flagged into oblivion.

> There is a fair amount of academic work describing methods to perform filtering on a sample to provide a fractional delay. One common way is to apply an FIR filter. However, to keep things simple, the method I chose was the Thiran approximation — the literature suggests that it performs the task reasonably well, and has the advantage of not having to spend a whole lot of CPU cycles first transforming to the frequency domain (which an FIR filter requires).

I'm not a real DSP expert by any stretch of the imagination, but applying an FIR filter does not require transforming to the frequency domain. To the contrary: a basic FIR filter just means that the ith output sample is a[0]x[i] + a[1]x[i-1] + a[2]x[i-2] + ... + a[N-1]x[i-N+1], where x is the input, a holds the filter coefficients, and N is the (finite) length of the filter. (If the input is all zeros except that x[0]=1, then the input is an impulse and a is the output, i.e. the impulse response. Hence the name: Finite Impulse Response.)

Some very long FIR filters are more efficient to apply by Fourier transforming the input, but that's almost certainly not the case here. It's worth noting that FIR filters vectorize very nicely.

(My FIR description isn't quite 100% accurate. I described only the discrete-time causal case. If you drop the causality requirement (which is fine but can be awkward in real-time processing) then you add negative indices to a. If you switch to continuous time, you end up with a convolution instead of a sum of products.)
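
The time-domain sum described above can be sketched in a few lines. This is a plain, unoptimised illustration of the causal discrete-time case only; the function name `fir_filter` and the three taps are made up for the example.

```python
def fir_filter(coeffs, samples):
    """Causal direct-form FIR: y[i] = sum over k of coeffs[k] * samples[i - k].

    Samples before the start of the input are treated as zero.
    """
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, a in enumerate(coeffs):
            if i - k >= 0:
                acc += a * samples[i - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    taps = [0.25, 0.5, 0.25]            # hypothetical 3-tap low-pass
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
    print(fir_filter(taps, impulse))    # the impulse response: the taps, then zeros
```

No transform is involved: each output sample costs one multiply-add per tap, which is why short filters like this are cheap and vectorize well.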

> Sad that the thread got flagged into oblivion.

I sometimes feel the PC threshold is a bit high in some threads. In this case however even I felt the snark level was painful.

As a productive engineer I often put out internal tools or POCs that would be possible to snark at in this way.

I do however expect that people improve them or explain the better way instead of posting dismissive comments on what this guy has done on top of his open source contributions.

That said: I'm happy to see it reposted although I would prefer if he removed the defence for his previous post as it did add confusion to the new post as well.

Thanks for the explanation. I've corrected the post to not assert that the FFT is necessary.


1. I was pretty clear about not being a DSP-person

2. The module provides infrastructure for other beamformer implementations to be plugged in; my implementation is intended to be a trivial test

3. module-beamformer did not ship (the link points to a branch in my git repository)

> What is it about audio that attracts this curious level of "engineering"?

What is it about HN that attracts haters of open source POCs? Why don't you contribute and implement an FIR filter? You probably won't, because it's easier to bash someone's work rather than do the work yourself.


I believe that you know a lot about signal processing, but please don't express it in a snarky, dismissive way. If you can't post civilly and substantively, please don't post at all.



POC? Why is this a race thing?

proof of concept.

For what it's worth, I had exactly the same thought.

My dang-compliant version of the same sentiment would go something like, "If you're going to write posts for public consumption, please consider limiting the topic to things you actually know about. You never know who might read and/or learn from what you write."

Still nothing that isn't already possible in JACK, over Ethernet or Wi-Fi.

How do you mean?

PulseAudio is still trying to catch up to features JACK implemented years ago. One of these is beamforming, which could easily be done in any post-processing application plugged into the JACK chain (trivial to do in Ardour, or even in FL Studio under WineASIO, i.e. Wine for real-time music applications). This has been done in live recording environments using a couple of mic'ed-up laptops over 802.11n/ac for a while now; producing a properly sync'ed master recording of the performance takes less than 15 minutes after the show is complete.

JACK is an audio daemon for Linux that performs a similar function to PulseAudio, but is intended for a wholly different audience.

JACK is intended as a low-latency audio and MIDI router for audio production. It routes audio data between applications and whatever analog/digital/MIDI I/O you have plugged in that Linux supports reasonably well. In other words, it's for building an audio workstation.

So you could, provided you had the skills, quite easily create an audio processing pipeline that will do any type of 'beamforming' you want and more.

PulseAudio, on the other hand, is a general-purpose desktop and mobile audio daemon designed to make it simple to manage the typical hardware you find in a typical PC. This sort of stuff is designed to make it nice to have a webcam for a simple blog or talking to your mom; it's not necessarily the best choice if you want to run a music studio from your laptop.
