I believe this minimal implementation actually over-compensates for pan law because it has a -6 dB center. In other words, if you ask this tool for zero panning, it will actually divide the amplitude by 2 (probably not what most people would expect!). In DAWs, a -3 dB or 0 dB center is more common. In a tool like this I would suggest a 0 dB center by default, so that zero panning does not change the amplitude. This is what your second link calls the triangular pan law.
Having had to do some mass audio processing on datasets recently, I discovered SoX, a self-proclaimed "swiss army knife" for sound processing. Delighted to find multiple bindings in Go, I nonetheless eventually decided to just do it in Python, as there was a very solid library there and it was simpler for my use case.
Yeah, you're right, it is. I was strongly considering just writing a bash script, but I had a bunch of probabilistic stuff going on for applying effects, so I stuck with Python.
This is a fun tutorial for processing audio in Go, but generally panning will sound bad if you only adjust the amplitude/level.
The two important missing pieces are phase delay and frequency-specific attenuation (the "head" model). Audio coming from the left side of your head reaches your left ear first, then your right ear later. Also, your head blocks certain frequencies more than others. Both of these effects are super noticeable and necessary to make panning sound "3D".
Adding these things to the Go code would actually be fairly straightforward: you just need to add a filter to `applyPan`. The filter would only need around 12 taps to sound good.
Here's a javascript tutorial which includes both of these additional features:
You say this like it's bog-standard, but adding HRTF and compensation for speaker distance are not part of a standard panning algorithm. This functionality is more likely to live in another module, such as one for ambisonics or correction.
HRTF is often used in game audio and AR [2], but never [1] as a mix effect in commercial DAWs.
HRTF is a blunt instrument anyway. It's supposed to model your head, not a generic imaginary standard head, and that's impossible without sticking a microphone inside each ear.
It's also a headphone-only effect. As soon as you play the sound through speakers you get room reflections and all kinds of other artefacts which wash out the spatial detail.
[1] Almost never. There are HRTF plugins, but they're hardly ever used in music mixing.
[2] Apple has some HRTF patents for recording audio on multiple channels - not just two - using HRTFs to capture apparent depth.
Sorry, this is just wrong. What is your basis for saying that HRTFs are never used in commercial DAWs? Have you read anything by Bob Katz or Rashad Becker? Or perhaps you're not aware of the several options people reach for to perform multichannel mixing using ambisonics:
It isn't about speaker distance, it's about distance between ears.
I agree that HRTFs are uncommon in mixing, but spatialization without some frequency-dependent attenuation generally sounds "flat" on headphones.
Although VR systems and big-budget spatialization like Atmos will account for speaker distance (so-called near-field effects), it's almost impossible to notice those effects with stationary sources and no head tracking.
The tutorial you linked to doesn't explain how to implement any of that. But it doesn't matter, because the panner in the article is fine as it is. Simple panning is often the best option in music, since filtering your sounds will just make your mix sound bad on common sound systems. Left/right delay will sound great on headphones but create phase effects on common speaker setups, often making things worse. I don't think a general-purpose tool should add potentially unwanted effects like that.
To be fair, it’s not general, it’s a “speaker only” tool. For headphones, it’s incredibly unnatural to have audio only coming into one ear without a delayed signal in the other. That simply never happens in the real world.
That can easily get pretty tricky if you don't want to depend heavily on libraries, which, last time I checked, weren't great for Go in this area. I have worked with ALSA (Advanced Linux Sound Architecture) a bit, implementing basic MIDI input from a keyboard, plus sound generators, plugins with Go interfaces and pkg/plugin, and streaming to audio output... and it's a mess of devices, buffers, and formats. And that's only for one OS. If you start doing anything non-trivial, making the programs work well in real time is not for the faint of heart.
If there were a single format supported everywhere, minimum buffer sizes, and a common API for all OSes, it would be a whole other, much more pleasant story.
[ Edited/Deleted reply because I missed the fact that this code distributes a single mono signal across 2 outputs. The terminology for this stuff is never totally clear: some people would call this a mono panner, some would call it a stereo panner, some call it a 1in/2out panner ]
Hey there, yep I'm actually aware of what you're saying. I actually wrote this at the bottom of the post:
"There is actually a flaw with this panning function that we are using. However it is not apparent to us yet because we can only set a pan for an entire audio source"
I'm working from something simple up to something more complex, tackling it in small parts. Next I'm writing about applying breakpoints, where the pan can be set throughout the track and you can notice the power dip - and then we'll work on fixing that. ;-)
* https://en.wikipedia.org/wiki/Pan_law
* https://www.image-line.com/support/flstudio_online_manual/ht... (search for "panning law")