This is really, really awesome. Some other ideas that would be really cool:
1) Do key detection and pitch-shift all the loops to a common key before processing. That might make more of the melodies come through the eigenvectors.
2) Visualize the loop point cloud - maybe with like 10-50 dimensions of PCA followed by 2 or 3 dimensions of t-SNE.
3) Maybe some form of earlier dimensionality reduction? I.e. you could do a short-time Fourier transform and then threshold and bin frequencies - then invert the transform to reduce the sounds to their basic characteristics. That might keep the first 20 or so eigenvectors from each being a slightly different kick drum. For example, if you have kick drums with fundamentals at 21Hz, 22Hz, 23Hz, 24Hz etc. those will each require two eigenvectors to represent the sine and cosine phases of the signal - but if you could "project" every kick drum sound so they were close to linearly related then PCA could isolate them with fewer eigenvectors. (A rough sketch of this idea follows the list.)
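To make idea 3 concrete, here is a minimal sketch of what I mean (my own, assuming 44.1 kHz mono loops and using librosa, which the article doesn't use; the threshold choice is arbitrary):

import numpy as np
import librosa

def simplify_loop(y, keep_fraction=0.05, n_fft=2048, hop=512):
    """Crudely reduce a loop to its strongest spectral content before PCA sees it."""
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    mag = np.abs(S)
    # Keep only the loudest few percent of bins in each frame, zero the rest.
    thresh = np.quantile(mag, 1.0 - keep_fraction, axis=0, keepdims=True)
    S_thresholded = np.where(mag >= thresh, S, 0.0)
    # Invert back to a waveform of the original length.
    return librosa.istft(S_thresholded, hop_length=hop, length=len(y))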
I would love to play with some sort of live music generation system based on this - really, really interesting ideas. And goes to show what can be built with traditional data analysis techniques and a clever idea!
EDIT: Also if you uploaded your preprocessed data I think that would be really amazing.
Thank you for these great suggestions. I will make the dataset available eventually. It's about 15 gigs so I just need some time to do it right. I will let you know when I have it released.
These are all good suggestions. "Earlier dimensionality reduction" could possibly be used to reduce noise in the eigenvectors: while PCA removes noise as defined by dimensions with low overall variance, the results still sound noisy from a sound engineering perspective.
I saw some research years ago about breaking down sound into pitched and unpitched components, using some kind of kernel transformation in Fourier space. If I remember rightly, this reduced the usual 'artifacts' you get when manipulating Fourier transforms of white noise. Perhaps a good candidate for this case. Sorry I can't remember more, though I think the talk was by this guy: https://www.york.ac.uk/music/staff/academic/jez-wells/
Either way, to the OP who I presume can only be Dr Brady of the Museum of Techno; good work old chap, you deserve a sherry.
How do you visualize 10-50 dimensions? I'm asking because in Algebra courses I was told more than once to try not to visualize vectors of higher dimensions, and I understood that to mean it would at least be difficult.
It's often a fool's errand, but on real-world datasets people make scatter plots after using t-SNE to project to 2 or 3 dimensions - in many cases it works out alright.
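For what it's worth, a minimal sketch of that pipeline with scikit-learn (the variable names and the random stand-in data are mine, not the author's):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.randn(1000, 4410)                  # stand-in for the loops-as-rows matrix
X50 = PCA(n_components=50).fit_transform(X)      # 10-50 PCA dimensions first
X2 = TSNE(n_components=2, perplexity=30).fit_transform(X50)  # then 2-D t-SNE

plt.scatter(X2[:, 0], X2[:, 1], s=3)
plt.title("Loop point cloud: PCA(50) -> t-SNE(2)")
plt.show()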
Hi, I am the author of the article. I would appreciate any comments/questions/suggestions.
Update: I switched the audio files to Soundcloud now so everything should work again.
Eigencrisis! It looks like the UCI servers are having trouble. I uploaded the audio files to the following Soundcloud playlist and I am working on embedding the soundcloud player in place of the files.
First and foremost: This is really cool, and thank you for sharing! (It's also the first explanation of the terms in the matrix equation for SVD that I've happened across that has really clicked for me: much appreciated.)
Here's my worry, speaking as someone who knows physics but not all that much audio processing. I feel like this approach will inevitably put a whole lot of (unhelpful!) emphasis on the detailed phase information of the various sounds. The near-perfect cancellation of the average track illustrates this: two loops of the exact same bass drum beat offset by a fraction of a second will be treated as orthogonal or even opposite by this algorithm, but they're essentially identical as perceived by the listener.
Conceptually, I imagine what you'd want is some way of encoding the various loops whose average came out sounding like a real average, rather than as nearly silent due to phase cancellation. My first instinct is to say "take the FFT of each loop first, and then run your PCA on that". Maybe that's not the right answer: like I said, I'm not an audio processing expert. But I suspect that right now your analysis is spending a huge fraction of its effort effectively trying to get the first drum beat to happen at precisely the right fraction of a second, and separately to get the second drum beat to happen at precisely the right fraction of a second, and separately the third, and the fourth, and so on. And heaven help you if the different loops' bass drums are tuned to marginally different notes.
Edit: Only after writing this did it sink in that the top overall comment's remarks (by "highd") about earlier dimensional reduction were getting at the same issue. I'll leave this here, just in case the different framing is useful.
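For concreteness, here's roughly what I mean, as a sketch rather than the author's pipeline (X stands in for the loops-as-rows matrix): run PCA on magnitude spectra so the phase that causes the cancellation is thrown away first. Getting audio back out would then need some phase reconstruction, but the components should stop fighting over sample-level alignment.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(200, 44100)          # stand-in for the real loop matrix
mags = np.abs(np.fft.rfft(X, axis=1))    # magnitude spectra: phase is dropped here
pca = PCA(n_components=20).fit(mags)
print(pca.explained_variance_ratio_[:5])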
That's my thinking too. PCA completely ignores the phase shift symmetry and treats a phase-shifted sine wave as a different thing.
One could definitely get something better by treating the audio loops as translation-invariant. I thought about trying to do a symmetry-invariant (or 'group-action-invariant') PCA but I could not find a good way of doing that. More advanced methods, e.g. 1-d convolutional networks, WaveNet etc., do have this translation-invariance built into them.
Interestingly, the second eigenvector is not just a phase-shifted version of the first eigenvector but contains slightly higher-frequency information as well; still, I totally agree with what you said.
In [1]: import numpy as np

In [2]: X = np.random.randn(10000, 100)

In [3]: %timeit np.apply_along_axis(np.mean, 0, X)
100 loops, best of 3: 7.91 ms per loop

In [4]: %timeit X.mean(axis=0)
1000 loops, best of 3: 1.21 ms per loop
This is impressive, did you actually hand-prepare loops from 10k tracks? Even at 1 track per minute, which is pretty fast, that's 166 hours, a whole month of working full time!
This is great. I could totally see "21 Transition 1420 Twm" appearing on an album as a self-deprecating joke. Mathematically generic techno and it still sounds good :)
Even though the sampling loses information about the wave; from the sampled data, your computer’s digital-to-analog converter can perfectly reconstruct the portion of the sound that contains all the frequencies up to half of the sampling frequency. This is the Nyquist Theorem. So we are O.K.
Pedantic note: This is only true in a theoretical setting where you have infinitely precise samples. Since each sample has only so-and-so many bits, and calculations are also lossy, you cannot reconstruct the wave perfectly. But it's good enough. There are also formats with more bits per sample to increase quality.
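A tiny illustration of that point (my own numbers, not from the article): quantizing a sine wave to 16 bits leaves a small but nonzero error, on the order of half a least-significant bit.

import numpy as np

sr = 44100
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)             # the "infinitely precise" signal
q = np.round(x * 32767) / 32767             # 16-bit quantization
print("max error:", np.max(np.abs(x - q)))  # ~1.5e-5, i.e. good enough in practice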
Beautiful work, and I think he should probably make a VST or something out of it. Producers are always looking for new ways to break sounds apart and put them back together.
FFT filters and the like are already a thing; I'm not sure what extra practical benefit you think this would bring. Nevertheless it's a good start and well-documented.
PCA can decompose sound into more accurate basis functions (if you look at covariance), so if you combine a few of them with different coefficients, you might be able to generate some cool sounds much faster than with traditional methods. Later you might feed this to an RNN and perhaps compose brand new good-sounding songs automatically, make a new Vocaloid band in Japan and go on tour, generating new music at each performance ;-)
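Something like this, as a rough sketch (the names and the stand-in data are mine, not from the article): fit PCA over loops-as-rows, then treat the coefficients as knobs and render the mix to a wav file.

import numpy as np
from scipy.io import wavfile
from sklearn.decomposition import PCA

sr = 44100
X = np.random.randn(200, sr)               # stand-in for the preprocessed loops
pca = PCA(n_components=20).fit(X)

coeffs = np.random.randn(20) * 3.0         # "knob" settings for each component
y = pca.mean_ + coeffs @ pca.components_   # mix the components into one signal
y /= np.max(np.abs(y))                     # normalize before writing
wavfile.write("eigenloop.wav", sr, (y * 32767).astype(np.int16))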
I am actually very interested in its application to musical patterns, i.e. the actual notes rather than the audio. I think there's already a tool that uses this to generate rich and musically correct MIDI on the fly, but I'm having trouble remembering the name/manufacturer now. Future Retro, maybe.
I don't get it. I guess he is deconstructing music by PCA instead of frequency or time. What is a practical application or interpretation of this? (not being critical, just trying to understand the motivation)
IIRC PCA finds the "stochastically optimal" basis functions for you, whereas with FFT, DCT etc. you supply the basis functions yourself and then just do simple linear projections onto that basis.
Thank you for your question. At this point, this is just to see how PCA works on this kind of dataset. Normally, PCA is used to reduce the dimension of the data in a way that loses as little information as possible. It can be useful for saving computation on classification tasks, for example.
It's a cool exploration for curiosity's sake. It seems weird to use it vs. FFT, though, because while some have said PCA is supposed to be "more optimal", we have prior knowledge that the music is actually constructed along the FFT dimensions. So I wouldn't expect PCA to be any better.
FFT and PCA are totally different. FFT is a fixed linear transform - PCA finds the linear projection to k dimensions that retains the most variance from the original dataset. If you projected onto the top k frequencies you'd just get a couple of tones, while PCA finds linear combinations of the original signal that are most "descriptive" in a sense.
Also human hearing is much closer to a wavelet transform or STFT than FFT.
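A toy illustration of the difference (entirely synthetic data, not the article's loops): keeping k low-frequency DFT coefficients uses the same basis for every dataset, while PCA picks k directions from this particular dataset, so it keeps more of the variance.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n, d, k = 500, 1024, 16
basis = rng.standard_normal((8, d))                          # a few shared "sources"
X = rng.standard_normal((n, 8)) @ basis + 0.1 * rng.standard_normal((n, d))

F = np.fft.rfft(X, axis=1)
F[:, k:] = 0                                                 # keep k lowest-frequency bins
X_fft = np.fft.irfft(F, n=d, axis=1)

pca = PCA(n_components=k).fit(X)                             # keep k data-derived directions
X_pca = pca.inverse_transform(pca.transform(X))

print("FFT residual:", np.linalg.norm(X - X_fft))
print("PCA residual:", np.linalg.norm(X - X_pca))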
I was starting on the opposite tack, though I haven't got far: synthesizing/sampling kicks, snares and bass lines from scratch, which is pretty tricky. A kick, for example, is a sine wave swept down; you play with the ADSR envelope, saturate/compress, and then you typically layer 3 separate kick tracks by phase-aligning them and doing more post-processing. There's a lot to getting a good-sounding kick with presence, attack and body.
The reference on this is Rick Snoman's Dance Music Manual, 3rd ed.
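For anyone curious, here's a toy version of that recipe (my own parameters, not Snoman's): a downward pitch sweep, a decay envelope, and a tanh stage standing in for the saturation.

import numpy as np
from scipy.io import wavfile

sr = 44100
t = np.arange(int(0.5 * sr)) / sr             # half a second of kick
freq = 150 * np.exp(-t / 0.08) + 45           # sine swept down from ~195 Hz to ~45 Hz
phase = 2 * np.pi * np.cumsum(freq) / sr      # integrate frequency to get phase
amp = np.exp(-t / 0.25)                       # simple decay envelope, no sustain
kick = np.tanh(2.5 * amp * np.sin(phase))     # crude saturation for presence
wavfile.write("kick.wav", sr, (kick * 32767).astype(np.int16))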
IMO you are limiting yourself by staying in the analog mental model - ADSR, oscillators etc. are what made analog circuits generate sound, but in the digital world they aren't necessary anymore and you can view all of it as just (periodic) function composition in different spaces and model far beyond what analog can reach. Some new VSTis have completely abandoned the analog style already.
From a musical perspective the analog mental model produces very clean and high quality results. My experience of advanced digital techniques is that they sound somehow lower quality.
This is not to be confused with the analog/digital debate, in which I am a fan of digital: digital synths that use the analog mental model can and do sound great e.g. virus, sylenth, zebra, fm8. It's more that the analog model evolved from how our ears perceive things while newer paradigms while mathematically neat often lead to unwanted artifacts, or boil down to much the same sort of thing as before e.g. NI Razor.
Then again we might be talking about different things, bitL - if you want to post links to the sort of sounds you're talking about then I'm all ears.
Rick Snoman is fairly basic though; I'd say the definitive synth guide was the old Sound on Sound Synth Secrets series (Google it). Or better, just hang out on forums with more experienced people and post up links to sounds you like, asking 'how do I make this?'
> It's more that the analog model evolved from how our ears perceive things while newer paradigms while mathematically neat often lead to unwanted artifacts,
There are basically two approaches to designing sounds: you either model perception of audio or you model the physics that produces audio. Analog (subtractive, additive, FM) synthesis is basically entirely in the first camp. Digital is strictly more flexible, so you have more options, like Karplus-Strong (sketched below), which gives you a simplified model of a plucked string.
The most "mathematically neat" synthesis techniques are still the old analog models: you start with a simple mathematical function like a sawtooth wave and put it through a four-pole filter. Or you stack a few different overtones on top of a fundamental. It's hard to get more simple/neat in terms of the math. The digital synths can be fairly complicated, mathematically, when they're not emulating analog synths.
I don't know what you're talking about when you say that advanced digital techniques sound somehow lower quality.
Things like impulse responses are everywhere: reverb, guitar amp sims, and distortion plugins. They sound better than ever.
> or boil down to much the same sort of thing as before e.g. NI Razor.
Wait, what? I own a copy of NI Razor and I don't see how it boils down to the "same sort of thing as before". Sure, it's an additive synth, and that has existed before, but it's not really a rehash of something I'd seen before, it has controls which simply don't exist in other synthesizers.
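Since Karplus-Strong came up above, here is a minimal toy version of it (my own sketch, not any particular synth's implementation): a burst of noise recirculating through an averaging delay line gives a plucked-string tone, a nice example of the "model the physics" camp.

import numpy as np
from scipy.io import wavfile

sr = 44100
freq = 110.0                              # A2
N = int(sr / freq)                        # delay-line length sets the pitch
buf = np.random.uniform(-1, 1, N)         # initial burst of noise ("the pluck")
out = np.empty(sr)                        # one second of output
for i in range(len(out)):
    out[i] = buf[i % N]
    # Average each sample with its neighbour: the low-pass loop that damps the "string".
    buf[i % N] = 0.996 * 0.5 * (buf[i % N] + buf[(i + 1) % N])
wavfile.write("pluck.wav", sr, (out * 32767).astype(np.int16))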
I'll bite on some guesses, not sure which the OP is talking about.
A) Historically, before computing was fast enough, digital synthesizers often had major issues with aliasing (which could be defined as "lower quality", and certainly as unwanted artifacts - although, conversely, some saw the aliasing as a sort of "character").
IMHO this is no longer as big of a problem with the fast computing power today: techniques like oversampling, BLEP for oscillators, etc. exist to minimize aliasing. There are possibly a few mathematical issues out there for any "real time" oriented calculations due to the quantized nature of sampled audio, but software engineers are even coming up with techniques to deal with some of them (this paper -- https://www.native-instruments.com/fileadmin/ni_media/downlo... -- is well known for going over some solutions for filters, for instance).
B) The other possibility: the nice thing (or not so nice thing, depending on your point of view) about subtractive synthesis with pure waveforms is that you are usually starting out with waveforms that are harmonic in nature (like a square wave or sawtooth). With techniques like additive (Razor's technique), digital FM / PM, or the like, it's much easier to get non-harmonic sounds. I wouldn't call this "lower quality", just different.
By "mathematically neat" I mean what you said; periodic function composition which is a nice general model in the mathematical sense but not specific to audio synthesis like the "ADSR" model is. So we may be singing from the same hymnbook here.
What I meant by advanced digital techniques is, I suppose, anything that takes things into the frequency domain, manipulates them and converts back. Physical Modelling, Impulse Responses and Digital FM are all fine things but not in this category. I used to have a plugin that did weird things like letting you stretch or clip the frequency domain, for example - i.e. a 1-2kHz bandlimited signal would become a 0.5-4kHz signal. Some of the effects in Razor also do things in the frequency domain that you wouldn't really do in the standard subtractive model.
My personal perception of Razor is that if you don't use those effects then it just sounds like a subtractive synth (same as before), while if you do use them it sounds a bit lo-fi (undesirable in my book though I appreciate that can be an artistic choice). Digital FM on the other hand sounds very clean to me.
I've been mucking around with LMMS[1] lately and I found a great video that shows you how to synthesize a kick drum using ZynAddSubFX: https://www.youtube.com/watch?v=coVl1D-q7Bo. Basically, it's all about having no sustain and white noise. Also, a tiny bit of reverb with an HPF goes a long way.
Also, as an alternative to PCA, try ICA (independent component analysis). This will attempt to find non-Gaussian sources in the data, and it might be worth seeing if it can carve out more interesting portions of the sound.
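A hedged sketch of what that could look like with scikit-learn's FastICA on the same loops-as-rows matrix (variable names and stand-in data are mine):

import numpy as np
from sklearn.decomposition import FastICA

X = np.random.randn(200, 44100)           # stand-in for the preprocessed loops
ica = FastICA(n_components=20, max_iter=1000)
mix = ica.fit_transform(X)                # per-loop coefficients, shape (200, 20)
waveforms = ica.mixing_.T                 # 20 waveforms analogous to the eigenvectors

Whether the "independent" waveforms sound any cleaner than the eigenvectors is an open question, but it's cheap to try.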
I'm working full time on a new DAW that should make writing music a lot faster and easier. Current DAWs don't really understand music theory or the creative process. Also the note input process and experimentation is extremely time consuming and the DAW never helps. Current DAW : my thing = Windows Notepad : IDE. The HN audience is definitely one of my core groups.
Current DAWs don't really understand music theory, that's right.
But there are such things as Synfire Pro by Cognitone, for example, which on the other hand is unusable as a DAW, exactly like all the other software for music composition.
I would love to see a usable combination of both software categories eventually.
One question left: Your software won't be FOSS, or will it?
I'm aware of Synfire - I definitely took some inspiration from it, but mine will have a faster workflow, will be cheaper (Synfire Pro is like a thousand USD), will look better visually, etc.
I definitely think that current DAWs are terrible at the composition stage of things, but I wouldn't really want to give up the ease of arrangement / sound design / mixing that they provide. Something that could be used for composition that fed MIDI into my existing DAW would be great.
Thinking about this even more, I feel like the composition could be represented as a series of transformations that go from simple to complex (a toy sketch follows below). So you start out with:
* Track in C#Maj, 124BPM, 4/4
* Chord progression I–V–vi–IV over 4 bars.
* OK, but the rhythm is that the current chord repeats on every beat, like a stab
* Timing should be early by half a beat, so it would be | I I I V| V V V vi| vi vi vi IV| IV IV IV I|
* another transformation to adjust the voicing, etc
* Then you can create a bass track from this that mirrors the lowest note but changes the rhythm, etc.
Any change along the way doesn't affect something else unless it should, and then you're given a notice that you should check what was affected.
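To make that idea a bit more concrete, a toy sketch of the first few transformations (my own illustration, not a feature of any existing DAW): pick the key and the numerals, expand them to triads, then apply the "stab on every beat, last stab anticipates the next chord" rule.

C_SHARP_MAJOR = [1, 3, 5, 6, 8, 10, 12]        # pitch classes of C# major (C#, D#, E#, F#, G#, A#, B#)
DEGREES = {"I": 0, "IV": 3, "V": 4, "vi": 5}   # scale degree of each numeral's root

def triad(numeral, octave=4):
    """Stack root, third and fifth of the given degree in C# major."""
    d = DEGREES[numeral]
    notes = []
    for step in (0, 2, 4):
        idx = d + step
        notes.append(C_SHARP_MAJOR[idx % 7] + 12 * (idx // 7) + 12 * octave)
    return notes

progression = ["I", "V", "vi", "IV"]           # one chord per bar, 4/4 at 124 BPM
events = []                                    # (beat index, MIDI notes)
for bar, numeral in enumerate(progression):
    nxt = progression[(bar + 1) % len(progression)]
    stabs = [numeral, numeral, numeral, nxt]   # last stab anticipates the next bar's chord
    for beat, chord in enumerate(stabs):
        # (the half-beat push could be added by offsetting each beat index by -0.5)
        events.append((bar * 4 + beat, triad(chord)))

print(events[:8])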
You may want to try out RapidComposer, Sundog Song Studio, or Odesi to get a sense of what that workflow entails right now. I bought each of those (two of them on sale), hoping I would come to terms with them, but they just don't work for me. It's a mix of "the UX could be better" in some cases, but also some underlying complexity that shows up when you're given a blank piano roll and every possible axis of music theory exposed as an affordance in the UI.
What's working for me right now? ChordPulse [0] which contains arranger keyboard style presets, plus a few options for sequencing and detailing the arrangement. Export to MIDI, add melodies and tweaks on top, and the song is ready. There are much more complicated versions of this formula around like Band in a Box, but they both have things I don't need, and aren't quite as good at this basic workflow.
I tend to mix (and do everything as I go) in Ableton. I just got a Push 2 so I've been building out some templates and racks to try to skip some of the tedium of trying to explore chord/melodic ideas quickly.
I was reminded of this behemoth: https://www.youtube.com/watch?v=ndb339l81pU - the interface is terrible, but the DAW should basically know what kind of chord you're playing and be able to make transformations to it.
I'm relatively new to producing in general, but one of my main issues with composing in Ableton is just not having an easy way to reference the rest of my MIDI clips while working on one of them. Or if I decide I want to change my chord progression or my voicing, I then have to go and check every other clip to make sure they all match, or at least work with the change.