Hacker News
Show HN: Intentionally subtle headphone crossfeed filter (github.com)
51 points by LnxPrgr3 on Dec 31, 2014 | 21 comments



A long time ago, recordings were made by putting a microphone or three in a place near where the audience was sitting at a performance.

If you get it right -- don't flip the phase of the signals, use appropriate microphones, and get good levels all the way through, record to a really good medium -- you can get amazing results.

If you are certain that the music will be played back through headphones, you can embed two mikes in a mock-up of a human head -- better yet, inside the ears, and require playback through in-ear monitors -- and get an amazing sense of realism.

That's not the way most music is produced these days, though. That's like a single-author software project with everything written from scratch: there are some, but not many.

Music is rarely made in real-time at a live performance. It's usually made in an editing suite after dozens, hundreds or even thousands of individual recordings, samples and synthesized effects are made, gathered or licensed. These elements are brought together, altered, fused, and made into a final bitstream that represents the producer's best effort at realizing their musical intent.

You can get amazing things out of that process, too. Works that could not reasonably be made any other way.

However, in that workflow, the original real world spaces in which the source materials were made -- if any -- are mixed and lost. To produce stereo in a two-channel final recording, many tracks are simply assigned to one channel or both channels at specified volumes. This loses or confuses phase and delay information that humans use for spatial location.

A crossfeed filter is another post-processing step, this time in the hands of the listener at playback, that can simulate some of that spatial information. It can do so only to a degree: the more spatial information available at the start of the process, the more can be reintroduced.

Eventually we might see artists routinely producing multiple final products, some of which have been rendered to virtual spaces for headphones or IEMs, others being set for surround configurations or stereo playback. Better software tools will help with that -- a scriptable 3D GUI that places your virtual instruments and virtual audience in a virtual ambience could be immensely useful.


I agree that space is often lost in modern productions of some genres, mainly folk music. In most other cases, however, I'd say loss of space doesn't happen all that much, or isn't a concern. The most common cases I can think of:

- You're working in a genre that involves lots of electronic sounds, where space is something you can go crazy with. Effects that spatialize sound realistically are routinely used and abused by electronic artists. Rock music is also generally mixed with very aggressive panning and spatial effects - it's just cooler than tame realism.

- In the case of soundtracks, mockups might be used instead of getting the music performed. All the good orchestral sample banks provide a unified sense of space, giving you the choice of mixing near/far mics, for example. You might lose some sense of space by mixing different sample banks, but that's nothing compared to the loss of realism in expressive performance that you'll get anyway.

- In the case of classical music, recording is usually done in a proper, old-school, realistic fashion.


We already have software to independently place sound sources within an acoustic environment[1]. Convolution provides an incredibly convincing sense of space, by directly recording the reverberant properties of a real environment. We can also render multichannel mixes that will work on playback systems from 2 to 64 channels[2]. Unfortunately, we can't rely on the integrity of the signal chain outside of a very limited number of applications, like Dolby-certified movie theatres.

At present, stereo mixing is constrained by the need for compromise. We just don't have the ability to accurately ascertain how the listener is experiencing our mix. They might be using mono equipment that only plays one channel or that sums both, they might be listening over a highly compressed stream or a poor FM signal that mangles the stereo image, they might be using headphones (or one earbud), or a whole range of stereo speaker systems that may or may not have glaring technical faults (channels out of phase, comb filtering due to poor placement etc).

A mix that creates an effective illusion of space on one system can sound downright unpleasant on another, so the main goal is compromise. You narrow the stereo image to avoid hard pans that sound unpleasant on headphones, you avoid big phase differences that would cause cancellation when summed to mono and so on. Modern stereo is basically mono with bells on.

My hope is that high-resolution music services like Pono, Qobuz and Tidal will help reignite an appetite for audio quality and create a market for mixes that demand a quality playback environment. I'm not overly optimistic, however - the vast majority of listeners are simply indifferent.

[1] https://vsl.co.at/en/Vienna_Software_Package/Vienna_MIR_PRO

[2] https://en.wikipedia.org/wiki/Dolby_Atmos


In-ear binaural microphones can be used to reproduce with headphones what a listener hears during a live performance.


How expensive would that be? Plenty of people already wear headphones to shows. I bet some of them would love to have a high-quality audio souvenir, replicating exactly what they heard.


One example is the Roland CS-10EM microphones/earphones. A pair is $99 on Amazon.


Disclaimer: I was inspired to post by this: https://news.ycombinator.com/item?id=8816344. You should check out his work too. He attempts to simulate surround sound—my program limits itself to stereo audio.

This filter is designed to be unobtrusive—ideally, it's easier to hear the effect of turning it off than turning it on. A huge goal was preserving my headphones' frequency response—I chose them because they sounded great to me in that regard, and I wasn't about to break that to fix stereo imaging.

If you're lacking a Mac or the desire to deal with the libsndfile version, here's a demo: https://www.dropbox.com/s/t5dvah8zke3uky8/HellOfAGuy_demo.mp.... The source audio came from http://www.audiogroup.web.fh-koeln.de/anechoic.html.


What does it actually do?

When a part is panned hard to one channel of a stereo track, it can sound pretty unnatural (sometimes even annoying - makes me feel like I'm half deaf...) on headphones. The only thing I'd really want to do to avoid this is mix the mono signal with the stereo, so that 100% right becomes e.g. 75% right and 25% left. (Fully mono would be 50/50 on all sounds, obviously.)
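That mono-blend idea can be sketched directly (the 0.75/0.25 split is the commenter's example figure; the function name is illustrative): bleed a fraction of each channel into the other, gain-only, with no delay.

```c
/* Intensity-only crossfeed: bleed a fraction of each channel into the
 * other. With feed = 0.25, a hard-right source ends up 75% right and
 * 25% left; feed = 0.5 collapses the image to mono. */
static void intensity_crossfeed(double in_l, double in_r, double feed,
                                double *out_l, double *out_r) {
    *out_l = (1.0 - feed) * in_l + feed * in_r;
    *out_r = (1.0 - feed) * in_r + feed * in_l;
}
```

This narrows the image but, as the reply below this comment points out, it only restores the intensity cue — it adds none of the timing or filtering a real room would.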


Turns out, we have two left-right localization cues—intensity difference and timing difference. Sounds on the left reach the right ear slightly delayed and filtered. This filter aims to reproduce that, effectively mimicking well-placed stereo speakers on headphones.


Thanks for the clarification. I've played a bit with the Haas effect myself, it's a nice way to separate sounds when you do mix for headphones.

Besides the delay, what filter does your program apply when carrying stuff from the left to the right?


This is the goal transfer function (x is frequency in Hz):

    return pow(10, (x <= 250 ? 1 : 1 + 1.5*log2(x/250) + (x > 5000 ? 5*log2(x/5000) : 0)) / -20);
This is the header library that generated the filter to match this as closely as possible: https://gist.github.com/LnxPrgr3/8262666

Then the result is time-reversed, its sign is flipped, and an impulse is added at sample 0. At playback time, the filter converts the input to mid/side stereo, applies this kernel to the side channel, then converts back to L/R.
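The mid/side step is just a sum/difference rotation; a sketch of the two conversions (function names are illustrative, and the side-channel filtering itself is omitted here):

```c
/* L/R <-> mid/side. A crossfeed kernel applied to the side (L-R) channel
 * only leaves the mono (mid) content untouched; when the side filter is
 * the identity, converting back recovers L/R exactly. */
static void lr_to_ms(double l, double r, double *mid, double *side) {
    *mid  = 0.5 * (l + r);
    *side = 0.5 * (l - r);
}

static void ms_to_lr(double mid, double side, double *l, double *r) {
    *l = mid + side;
    *r = mid - side;
}
```

Working in mid/side is a neat choice here: anything common to both channels (the "mono" part of the mix) passes through the filter chain completely unchanged, so only the stereo difference is touched.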


Is that based on a comb filter with truncated notches at 250 and 5000 Hz (a 2ms phase separation where applied)?


I've been using this one for foobar2000 since I can remember - http://www.naivesoftware.com/software.html. It's also subtle but effective. The qualitative difference is that it sounds, for example, like the band is playing in front of you, not that you have two speakers stuck to the sides of your head.


I'm not on a Mac, so I can't test this right now, but I'm not sure this sounds like a good idea. The stereo image is a big part of the mixdown, especially in bass-heavy electronic music where a lot of big sounds need to compete for space in the frequency spectrum. Won't this possibly destroy the mix in those cases?


It shouldn't, unless the mix down was done for headphones.

If it wasn't mixed for headphones, the problem is headphones exaggerate stereo separation. See: http://www.cns.nyu.edu/~david/courses/perception/lecturenote...

The goal is to simulate playback on speakers. If it breaks music mastered for speakers, I'd consider that a bug.


Here is an example song that would benefit from a plug-in like this: https://www.youtube.com/watch?v=KeIQvxSS2M8. The guitar riff that hard-pans from left to right can sound jarring when listening on headphones without a crossfeed filter. With the filter, the sound is more natural.

I'm surprised Apple hasn't added crossfeed functionality to any of their audio products. I remember discovering crossfeed back in the early 2000s. My audio player was foobar2000, which had a plugin for crossfeed. When I switched to Mac in the late 2000s, I had to go back to listening to music without it. It still perplexes me that, with the explosion in high-end headphone popularity, this hasn't become a standard feature in audio players.


The most accurate way to make a stereo recording sound like it's coming from two speakers when listening with headphones is to use a head-related transfer function (HRTF), which delays and filters audio sources in a virtual 3-D space the same way their positions, the human head, and the ears do. When waves pass through and around the head and ears from different directions, the loudness and frequency content are changed before the sound reaches the sensors in the inner ear.
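At its core, HRTF rendering is convolution: each source is convolved, per ear, with a measured head-related impulse response (HRIR) for its direction. A direct-form sketch (the HRIR data itself would come from a measurement set; this is a generic FIR convolution, not this project's code):

```c
/* Direct-form FIR convolution: y[n] = sum_k h[k] * x[n-k].
 * An HRTF renderer runs one such convolution per ear, using the HRIR
 * measured for the source's direction relative to the head.
 * y must have room for xlen + hlen - 1 samples. */
static void convolve(const double *x, int xlen,
                     const double *h, int hlen,
                     double *y) {
    for (int n = 0; n < xlen + hlen - 1; ++n) {
        double acc = 0.0;
        for (int k = 0; k < hlen; ++k) {
            int i = n - k;
            if (i >= 0 && i < xlen)
                acc += h[k] * x[i];
        }
        y[n] = acc;
    }
}
```

Real implementations use FFT-based (fast) convolution rather than this O(n·m) loop, but the result is the same.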


I tried this with over half a dozen songs, including the one listed in the README (Ingrid Michaelson - Be OK), but I can't hear any difference when I turn the crossfeed filter on and off. I am using the iPhone's default headphones while listening to these songs.

Am I doing something wrong? Is the difference very subtle?


I would've thought headphone quality mattered, but my boss heard the effect through iPhone earbuds.

The effect is fairly subtle. To make it as clear as possible, try leaving it on for a couple minutes, then switch it off and listen for a bit. Or, play this example, which is the clearest demo I've found: https://www.dropbox.com/s/t5dvah8zke3uky8/HellOfAGuy_demo.mp.... The song's played twice—with and without the filter.


This seems cool.

How portable is the actual filter code? It seems to use the Accelerate framework.


The filter lives in crossfeed.c, with an API described in crossfeed.h. It's pure math and makes no system calls itself—should be plenty portable.

The filter designer does use Accelerate, but it's only needed to generate the arrays for crossfeed.c. Porting it to another FFT library should be easy enough, though I should mention the designer in this repo needs updating: the current filter was actually generated with this header library: https://gist.github.com/LnxPrgr3/8262666



