Mathematical Analysis of The Beatles’ Magical Mystery Chord (kevinhouston.net)
97 points by ColinWright on Dec 14, 2014 | 20 comments



This was enjoyable.

I wonder if software exists that is aware of the overtones of each particular note of a given instrument. Theoretically, this would be a more precise way to pull a chord out of a recording without having to manually compensate for the overtones later. Standard Fourier analysis is based on sine waves, which aren't the best model for complex tones like those of a piano or a 12-string guitar.

Mathematically, the best approach I'm aware of in the long run would be to study the harmonic spectrum of the exact instrument as closely as possible, and then perform inner products against the spectrum of the mystery sound being analyzed. There's no guarantee that the note-vectors would be orthogonal (probably not), so the problem becomes more interesting in that you might want to perform a sparsity maximization on the resulting output - that is, look for the smallest number of notes capable of matching the recording within epsilon error in sample (or spectrum) space. Sparsity maximization is NP-hard, so a good way to approximate it is to perform L1-minimization using linear programming. Another approach might be to look for a balance between the absolute error term and the sparsity term.
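A rough sketch of that idea in Python, using SciPy's linear-programming solver. The note spectra here are made up (a handful of 1/k partials per note) purely for illustration; a real version would use spectra measured from the actual instrument:

    import numpy as np
    from scipy.optimize import linprog

    # Toy dictionary: one column per candidate note, each a made-up
    # magnitude spectrum of a few decaying partials.
    N_BINS, F_MAX = 2048, 4000.0
    note_freqs = 440.0 * 2 ** (np.arange(-12, 13) / 12.0)  # two octaves around A4

    def note_spectrum(f0, n_partials=6):
        freqs = np.linspace(0.0, F_MAX, N_BINS)
        spec = np.zeros(N_BINS)
        for k in range(1, n_partials + 1):
            if k * f0 > F_MAX:
                break
            spec[np.argmin(np.abs(freqs - k * f0))] += 1.0 / k
        return spec

    A = np.column_stack([note_spectrum(f) for f in note_freqs])

    # Pretend "mystery chord": D4, F#4 and A4 mixed at different levels.
    x_true = np.zeros(len(note_freqs))
    x_true[[5, 9, 12]] = [1.0, 0.7, 0.5]
    b = A @ x_true

    # Basis pursuit: minimise sum(x) subject to A x = b, x >= 0.
    # (With non-negative amplitudes the L1 norm is just sum(x).)
    res = linprog(c=np.ones(A.shape[1]), A_eq=A, b_eq=b,
                  bounds=(0, None), method="highs")
    print(np.flatnonzero(res.x > 1e-6))  # ideally recovers the three notes: [5 9 12]

The "within epsilon error" version would replace the equality constraint with inequality bands around b, which is still a linear program.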


Yes, software like this exists. Using multiple overtones to reconstruct a single note is pretty basic in this field; if you search for "wave to midi" software (for example) you'll find a few programs that do this. The most advanced right now might be Melodyne [1] (used in the article), which also lets you pitch-shift the detected notes (and all their partials) without affecting the rest of the sound. It's quite proprietary, though, and I don't think the details of their algorithms are available, but it does a good job on a variety of instruments. Polyphonic detection can never be perfect, of course; especially with layered organic instruments such as voice and winds, it becomes impossible to determine which note a particular partial belongs to.

[1]: http://en.wikipedia.org/wiki/Celemony


Agreed -- Just looking for individual spikes in the harmonic spectrum is naive. However, I think even if you have a good idea of which instrument it is, there are other factors involved which could make matching a generic instrument model more difficult (e.g. the player, the playing space, instrument variations, etc.). For instance, with the French horn (it's what I know) there are sometimes drastic differences between players and instruments -- the most stark being possibly the Vienna horn vs. the usual double horn.


If you know the number of notes being played, you can solve it in polynomial time. You could run it with a few variations in the assumed number of notes.
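For what it's worth, a small Python sketch of that brute-force idea with made-up spectra: for a fixed assumed note count k it just scores every k-subset of candidates with a least-squares fit, which is polynomial in the number of candidates.

    import numpy as np
    from itertools import combinations

    def best_k_notes(spectra, observed, k):
        # spectra: (bins x candidates) dictionary; observed: chord spectrum.
        best = (np.inf, None, None)  # (error, subset, amplitudes)
        for subset in combinations(range(spectra.shape[1]), k):
            sub = spectra[:, subset]
            amps, *_ = np.linalg.lstsq(sub, observed, rcond=None)
            err = np.linalg.norm(sub @ amps - observed)
            if err < best[0]:
                best = (err, subset, amps)
        return best

    # Toy demo: mix candidates 2 and 5, then recover them.
    rng = np.random.default_rng(1)
    spectra = rng.random((256, 12))
    observed = 1.0 * spectra[:, 2] + 0.6 * spectra[:, 5]
    print(best_k_notes(spectra, observed, k=2))  # error ~0 for subset (2, 5)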

The harmonic spectrum of each instrument will be extremely difficult to get, though, as each note can have a different waveform and will be affected by unknown non-linear effects in the instrument itself as well as in mixing (e.g. compression or distortion).


Not really. First, you can simulate overtones by where you strike a guitar. Second, we don't know the model of the tube amps they were using. Tube amps do some really funky mixing, which can throw off FFTs and give weird/wrong results.


>First, you can simulate overtones by where you strike a guitar.

If you know the make of the guitar and the strings, where it was picked each time, the settings of the tone and volume knobs, and the position of the pickup switches.

>Tube amps do some really funky mixing, which can throw off FFTs and give weird/wrong results.

>unknown non-linear effects...distortion


Interesting reading. Just today I was doing some internet research on a similar topic.

I want to take a piece of salsa music and extract the pattern each of the instruments makes: to figure out whether the clave is 2/3 or 3/2, which patterns the bongo plays, and where the '1' (the phrase start) is. And - of course - I want all of this to happen auto-magically.

It sounds like there must be a solution for this, but I did not find a ready-made answer. And my own music skills are of little use here.


Randy Bachman (of Bachman-Turner Overdrive) did a very interesting analysis of the chord from a guitar player's perspective on CBC a few years ago:

http://www.openculture.com/2011/12/guitarist_randy_bachman_d...


The author of the linked article addresses Bachman's story, and offers some seemingly compelling reasons why he thinks Bachman was mistaken about some things.

I say "seemingly compelling" because much of this is over my head!


What a wonderful year-end gift to lovers of music, math and the Beatles. So thank you.


So, so many words. But you can stop reading right here:

"The track was from a CD and therefore suffers from compression issues. That is, some of the frequencies may be missing due to the method employed to fit everything on the CD"

Programmers dicking around with audio and getting it all wrong is the new programmers dicking around with graphic design/UX and getting it all wrong.

FFTs really don't work for this stuff. There's not enough frequency resolution at the low end. Even higher up, the most common error is an octave error, which can be caused by even-harmonic distortion. Also, the brain is great at filling in a missing fundamental; mixing engineers use this to clear low-frequency mud out of a mix.

Tinker with AutoTune for even a few minutes and you'll quickly learn that the name itself is a lie. It doesn't Auto-matically Tune anything. It generates a crude, crappy attempt at tracking pitch (and monophonic pitch only), which then requires a trained musician to wiggle against using a mouse in a tiny window for hours and hours. Get it wrong and it sounds like a robot. Most people get it wrong. "Wrong" has become a sound. Now people get it wrong on purpose.

Melodyne is more interesting but also trips over itself.

This is a hard, hard problem. Even if you can find the loudest frequency and map it to a note, it doesn't mean it's the note in question.


I think the difficulty in audio engineering is in the fact that audio and music in particular are such cultural constructions. Take a modern dance track back a thousand years and it wouldn't be a hit; it would probably get you killed as a witch. No wonder Autotune is so hard to get "right".

When you find the loudest frequency and map it to a note, it is most definitely the note in question (in the mathematical sense). But cultural and stylistic expectations of the human brain might not agree with the math.

That's why I think, somewhat surprisingly, audio engineering will remain a human career for as long as music remains a human art. You can't say that about many automatable professions.


Sounds like trying to write software to identify and enhance the most "important" elements of paintings, or to clarify the theme of each stanza of a poem.


You accurately describe the many subtleties in audio engineering and pitch perception, but you omit the conclusion that, at least for pathological cases, there is no objective "note in question." Two reasonable people, both musically educated, can have different interpretations of a note, especially in the context of a particular chord or musical arrangement.


Agree, with caveats: in the context of vocal tuning there's often a desired note that represents the melody. And objectively there's often a note that a musician intended to play.

Even before mixing, the instrument and signal path are working against capturing that intent: the inharmonicity of the string itself, nonlinearities in the microphones and amplifiers, resonances in the instrument soundboards and the rooms themselves. You're already chasing ghosts!

Then add in a mixing engineer boosting and reducing frequencies, a tape machine varying in speed ever so slightly, and a mastering compression step throwing away amplitude detail -- and you've really got to wonder what the hell someone thought they were doing by running FFTs on a 50 year old analog mix and pretending they've got insight into the original solo instruments.


FFTs actually work for this kind of stuff, but not in the way he's trying to apply them.

Instead of trying to reconstruct the sound directly, you could use the data to get hints about which instruments were involved (by looking at their characteristic spectral distributions). From there you could try to "fill" the observed frequency pattern with those instruments playing the notes that seem most plausible.
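A minimal sketch of that matching step in Python, assuming you already have per-instrument template spectra (here just random placeholders); non-negative least squares spreads the observed spectrum across the templates:

    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(0)
    templates = rng.random((512, 20))        # stand-in instrument/note spectra
    true_weights = np.zeros(20)
    true_weights[[3, 7]] = [1.0, 0.5]
    observed = templates @ true_weights      # toy "recording" spectrum

    # How much of each template is needed to explain the observation?
    weights, residual = nnls(templates, observed)
    print(np.round(weights, 2))              # large weights on templates 3 and 7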

Still much easier said than done, but that's what comes to my mind.

If anyone's ever interested in doing this kind of analysis, I can't recommend this tool [1] enough; it is one of the easiest to use, yet very versatile and helpful when dealing with these kinds of problems. It is also free.

[1]: SPEAR, http://www.klingbeil.com/spear/


The tradeoff in an FFT is time vs. frequency: you can have accurate time information or accurate frequency information, but not both. If two notes cross a time window, you're going to get the note in between. But if you shrink the window, you don't get enough frequency resolution. Because pitch is logarithmic (the difference between C1 and D1 is about 4Hz; between C4 and D4 it's about 32Hz), FFTs really don't work that well for this sort of stuff. Short, low notes in particular are a killer, and the loss of bucket resolution down low makes detection of the fundamental very difficult. The big bumps in the wigglegrams that tool provides are not particularly useful for the specific task at hand.
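To put rough numbers on that (assuming equal temperament and a 0.1-second analysis window):

    import numpy as np

    def hz(semitones_from_a4):
        return 440.0 * 2 ** (semitones_from_a4 / 12.0)

    c1, d1 = hz(-45), hz(-43)
    c4, d4 = hz(-9), hz(-7)
    print(f"C1-D1 spacing: {d1 - c1:.1f} Hz")   # ~4 Hz
    print(f"C4-D4 spacing: {d4 - c4:.1f} Hz")   # ~32 Hz

    fs, window = 44100, 4410                    # 0.1 s of audio
    print(f"Bin width: {fs / window:.1f} Hz")   # 10 Hz - wider than a whole tone at C1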


Most music analysis tools use a sliding window combined with a windowing function to correct for most of this. You'll still have your frequencies smeared a little in time if you're plotting f against t, but the windowing helps limit the smear.


What? No... you can approximate both time and frequency as accurately as you want to.


What tacos is trying to say is that in an FFT, you will have the same number of samples in the time domain and the frequency domain (weeeeellll, I'm simplifying a bit: half of those 'frequency domain' samples are actually phase information, so you really have half as many frequency-domain samples as you do time-domain samples). So, if you wish to have a high resolution in the frequency domain, implying lots of samples, you're going to need a total sample time that is quite long, and you won't be able to say exactly when each of the different frequencies was actually present in the sound.

Take a concrete example. Let's assume that I have a recording in PCM at 44.1kHz, and I want to see all of the frequencies present, to a resolution of 10Hz. Nyquist tells us the sample can only contain frequencies up to 22.05kHz (well, one can hope; if the recording didn't filter out everything above that when encoding to PCM, we're going to have a mess of aliasing going on, which is outside the scope of this post - just assume the person doing the encoding knew what they were doing, mkay...). So, 22.05kHz. And we want a resolution of 10Hz, so we're going to need 22.05kHz / 10Hz = 2205 samples. Of course, we actually need double that in the time domain (we throw away the phase information, remember), so we need 4410 samples in the time domain to get our 2205 samples in the frequency domain. 4410 samples at 44.1kHz is 0.1 seconds.

If I decide that I want better frequency resolution, say down to 1Hz, then I quickly arrive at the conclusion that I need a full second of samples. In other words, your resolution in the time domain decreases as you improve the resolution in the frequency domain.
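The arithmetic is just bin spacing = sample rate / window length, so as a quick sanity check in Python:

    fs = 44100                       # sample rate, Hz
    for resolution in (10.0, 1.0):   # desired bin spacing, Hz
        n = int(fs / resolution)     # samples per analysis window
        print(f"{resolution} Hz bins -> {n} samples -> {n / fs:.1f} s of audio")
    # 10.0 Hz bins -> 4410 samples -> 0.1 s of audio
    #  1.0 Hz bins -> 44100 samples -> 1.0 s of audio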

You can improve a bit on this situation by using a sliding window with a carefully chosen windowing function. The frequency information will still be smeared out in time, but the windowing function reduces the contribution of samples the further they are from the centre of the window, which cuts down on spectral leakage and sharpens up the smear a bit.
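For the curious, here is a minimal sketch of that sliding Hann window using SciPy's STFT, on a made-up test tone rather than real audio; a 4096-sample window gives roughly 10.8 Hz bins, and the 2048-sample hop keeps roughly 46 ms time steps:

    import numpy as np
    from scipy.signal import stft

    fs = 44100
    t = np.arange(0, 1.0, 1.0 / fs)
    x = np.where(t < 0.5,
                 np.sin(2 * np.pi * 440.0 * t),    # A4 for the first half second
                 np.sin(2 * np.pi * 523.25 * t))   # then C5

    f, frame_times, Zxx = stft(x, fs=fs, window="hann",
                               nperseg=4096, noverlap=2048)
    dominant = f[np.abs(Zxx).argmax(axis=0)]       # loudest bin in each frame
    print(np.round(dominant))  # jumps from ~440 Hz to ~525 Hz, quantised to the bins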



