If you get it right -- don't flip the phase of the signals, use appropriate microphones, and get good levels all the way through, record to a really good medium -- you can get amazing results.
If you are certain that the music will be played back through headphones, you can embed two mikes in a mock-up of a human head -- better yet, inside the ears, and require playback through in-ear monitors -- and get an amazing sense of realism.
That's not the way most music is produced these days, though. That's like a single-author software project with everything written from scratch: there are some, but not many.
Music is rarely made in real-time at a live performance. It's usually made in an editing suite after dozens, hundreds or even thousands of individual recordings, samples and synthesized effects are made, gathered or licensed. These elements are brought together, altered, fused, and made into a final bitstream that represents the producer's best effort at realizing their musical intent.
You can get amazing things out of that process, too. Works that could not reasonably be made any other way.
However, in that workflow, the original real world spaces in which the source materials were made -- if any -- are mixed and lost. To produce stereo in a two-channel final recording, many tracks are simply assigned to one channel or both channels at specified volumes. This loses or confuses phase and delay information that humans use for spatial location.
Crossfeed filters are another post-processing step, this time in the hands of the listener, that can simulate some of that spatial location information. How well they do so depends on what survives the mix: the more spatial information available at the start of the process, the more can be reintroduced.
Eventually we might see artists routinely producing multiple final products, some of which have been rendered to virtual spaces for headphones or IEMs, others being set for surround configurations or stereo playback. Better software tools will help with that -- a scriptable 3D GUI that places your virtual instruments and virtual audience in a virtual ambience could be immensely useful.
- You're working in a genre that involves lots of electronic sounds, and space is something you can go crazy with. Effects that spatialize sound realistically are routinely used and abused by electronic artists. Rock music is also generally mixed with very aggressive panning and spatial effects - it's just cooler than tame realism.
- In the case of soundtracks, mockups might be used instead of getting the music performed. All the good orchestral samplebanks provide a unified sense of space, giving you the choice of mixing near/far mics, for example. You might lose some sense of space by mixing different samplebanks, but that's nothing compared to the loss of realism in expressive performance that you'll get anyway.
- In the case of classical music, recording is usually done in a proper, old-school, realistic fashion.
At present, stereo mixing is constrained by the need for compromise. We just don't have the ability to accurately ascertain how the listener is experiencing our mix. They might be using mono equipment that only plays one channel or that sums both, they might be listening over a highly compressed stream or a poor FM signal that mangles the stereo image, they might be using headphones (or one earbud), or a whole range of stereo speaker systems that may or may not have glaring technical faults (channels out of phase, comb filtering due to poor placement etc).
A mix that creates an effective illusion of space on one system can sound downright unpleasant on another, so the main goal is compromise. You narrow the stereo image to avoid hard pans that sound unpleasant on headphones, you avoid big phase differences that would cause cancellation when summed to mono and so on. Modern stereo is basically mono with bells on.
My hope is that high-resolution music services like Pono, Qobuz and Tidal will help reignite an appetite for audio quality and create a market for mixes that demand a quality playback environment. I'm not overly optimistic, however - the vast majority of listeners are just indifferent.
This filter is designed to be unobtrusive -- ideally, it's easier to hear the effect of turning it off than turning it on. A huge goal was preserving my headphones' frequency response -- I chose them because they sounded great to me in that regard, and I wasn't about to break that to fix stereo imaging.
If you're lacking a Mac or the desire to deal with the libsndfile version, here's a demo: https://www.dropbox.com/s/t5dvah8zke3uky8/HellOfAGuy_demo.mp.... The source audio came from http://www.audiogroup.web.fh-koeln.de/anechoic.html.
When a part is recorded to just one channel of a stereo track, it can sound pretty unnatural on headphones (sometimes even annoying - it makes me feel like I'm half deaf). The only thing I'd really want to do to avoid this is mix the mono signal with the stereo, so that 100% right becomes e.g. 75% right and 25% left. (Fully mono would be 50/50 on all sounds, obviously.)
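That kind of plain linear crossfeed -- no delay, no filtering -- is a one-liner per channel. A sketch (function name is mine; the 0.25 in the usage below is just the ratio from the 75/25 example):

```c
/* Plain linear crossfeed: mix a fraction of each channel into the other.
 * amount = 0.0 leaves the signal untouched; amount = 0.5 collapses to mono. */
static void crossfeed_mix(const float *in_l, const float *in_r,
                          float *out_l, float *out_r,
                          int n, float amount) {
    for (int i = 0; i < n; ++i) {
        out_l[i] = (1.0f - amount) * in_l[i] + amount * in_r[i];
        out_r[i] = (1.0f - amount) * in_r[i] + amount * in_l[i];
    }
}
```

With amount = 0.25, a sample that was 100% right comes out 75% right and 25% left.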
Besides the delay, what filter does your program apply when carrying stuff from the left to the right?
return pow(10, (x <= 250 ? 1 : 1 + 1.5*log2(x/250) + (x > 5000 ? 5*log2(x/5000) : 0)) / -20);
Then, it's time-reversed and the sign flipped, and an impulse is added at sample 0. This filter translates to mid/side stereo, applies this kernel to the side channel, then converts back to L/R for playback.
If it wasn't mixed for headphones, the problem is headphones exaggerate stereo separation. See: http://www.cns.nyu.edu/~david/courses/perception/lecturenote...
The goal is to simulate playback on speakers. If it breaks music mastered for speakers, I'd consider that a bug.
I'm surprised Apple hasn't added crossfeed functionality to any of their audio products. I remember discovering crossfeed back in the early 2000s. My audio player was foobar2000, which had a plugin to do crossfeed. When I switched to Mac in the late 2000s, I had to go back to listening to music without crossfeed. It still perplexes me that, with the explosion in popularity of high-end headphones, this hasn't become a standard feature of audio players.
Am I doing something wrong? Is the difference very subtle?
The effect is fairly subtle. To make it as clear as possible, try leaving it on for a couple minutes, then switch it off and listen for a bit. Or, play this example, which is the clearest demo I've found: https://www.dropbox.com/s/t5dvah8zke3uky8/HellOfAGuy_demo.mp.... The song's played twice -- with and without the filter.
How portable is the actual filter code? It seems to use the Accelerate framework.
The filter designer does use Accelerate, but it's only needed to generate the arrays for crossfeed.c. Porting to another FFT library should be easy enough, though I should mention that the designer in this code needs updating: the current version was built with this header library: https://gist.github.com/LnxPrgr3/8262666