1) Do key detection and pitch-shift all the loops to a common key before processing. That might make more of the melodies come through the eigenvectors.
2) Visualize the loop point cloud, maybe with 10-50 dimensions of PCA followed by 2 or 3 dimensions of t-SNE.
3) Maybe some form of earlier dimensionality reduction? E.g. you could do a short-time Fourier transform, then threshold and bin frequencies, then invert the transform to reduce the sounds to their basic characteristics. That may make it so that the first 20 or so eigenvectors aren't all slightly different kick drums. For example, if you have kick drums with fundamentals at 21Hz, 22Hz, 23Hz, 24Hz etc., those will each require two eigenvectors to represent the sine and cosine phases of the signal, but if you could "project" every kick drum sound so they were close to linearly related, then PCA could isolate them with fewer eigenvectors.
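A toy check of the two-eigenvectors-per-pitch argument in item 3, assuming pure sinusoids as stand-ins for kick fundamentals (real kicks also have pitch sweeps and decay, so this is only an illustration). Because sin(wt + phi) = cos(phi)*sin(wt) + sin(phi)*cos(wt), every phase of one pitch lies in the same 2-D subspace, and each additional pitch adds two more dimensions. The sample rate and helper names are arbitrary choices for the sketch.

```python
import numpy as np

def n_components(X, rel_tol=1e-8):
    """Number of principal components with non-negligible variance."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

sr = 1000                        # toy sample rate (Hz); one-second clips
t = np.arange(sr) / sr
rng = np.random.default_rng(0)

def tones(freq, count):
    """`count` pure tones at `freq` Hz with random phases (hypothetical kick fundamentals)."""
    phases = rng.uniform(0, 2 * np.pi, size=count)
    return np.sin(2 * np.pi * freq * t[None, :] + phases[:, None])

one_pitch = tones(22, 200)
four_pitches = np.vstack([tones(f, 50) for f in (21, 22, 23, 24)])

print(n_components(one_pitch))     # 2
print(n_components(four_pitches))  # 8
```

So four slightly detuned kicks already cost eight components, which is the redundancy the "project to something close to linearly related" idea is trying to remove.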
I would love to play with some sort of live music generation system based on this - really, really interesting ideas. And it goes to show what can be built with traditional data analysis techniques and a clever idea!
EDIT: Also if you uploaded your preprocessed data I think that would be really amazing.
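For suggestion 2), here is a compact NumPy-only sketch of PCA followed by exact t-SNE. The helper names, the toy data (two Gaussian blobs standing in for flattened loop waveforms) and the hyperparameters (perplexity 10, 500 steps, learning rate 100, early exaggeration 4) are all assumptions; for real data, a library implementation such as scikit-learn's sklearn.manifold.TSNE would be faster and better tuned.

```python
import numpy as np

def pca(X, k):
    """Project rows of X onto their top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def _affinities(D, perplexity, iters=50):
    """Per-point Gaussian affinities whose entropy matches log(perplexity)."""
    n = D.shape[0]
    P = np.zeros((n, n))
    target = np.log(perplexity)
    for i in range(n):
        d = np.delete(D[i], i)
        lo, hi, beta = 0.0, np.inf, 1.0
        for _ in range(iters):
            p = np.exp(-(d - d.min()) * beta)
            p /= p.sum()
            H = -np.sum(p * np.log(p + 1e-12))
            if abs(H - target) < 1e-5:
                break
            if H > target:   # too flat: sharpen the Gaussian
                lo = beta
                beta = beta * 2 if hi == np.inf else (beta + hi) / 2
            else:            # too peaked: widen it
                hi = beta
                beta = (beta + lo) / 2
        P[i, np.arange(n) != i] = p
    return P

def tsne(X, dims=2, perplexity=10.0, steps=500, lr=100.0, seed=0):
    """Plain exact t-SNE with early exaggeration and momentum (no gains, no Barnes-Hut)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)   # squared distances
    P = _affinities(D, perplexity)
    P = np.maximum((P + P.T) / (2 * n), 1e-12)                      # symmetrised joint P
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, dims))
    velocity = np.zeros_like(Y)
    for t in range(steps):
        sqY = np.sum(Y ** 2, axis=1)
        num = 1.0 / (1.0 + sqY[:, None] + sqY[None, :] - 2 * Y @ Y.T)   # Student-t kernel
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        exaggeration = 4.0 if t < 100 else 1.0
        W = (exaggeration * P - Q) * num
        grad = 4.0 * (W.sum(axis=1)[:, None] * Y - W @ Y)   # sum_j W_ij (y_i - y_j)
        momentum = 0.5 if t < 250 else 0.8
        velocity = momentum * velocity - lr * grad
        Y = Y + velocity
    return Y

rng = np.random.default_rng(1)
# Two toy "loop families": 20 vectors around 0 and 20 around 3, in 500 dimensions.
loops = np.vstack([rng.normal(0.0, 1.0, (20, 500)), rng.normal(3.0, 1.0, (20, 500))])
Y = tsne(pca(loops, 20), dims=2, perplexity=10.0)
```

Running PCA first, as suggested, denoises the distances and keeps the O(n^2) t-SNE step cheap.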
I saw some research years ago about breaking down sound into pitched and unpitched components, using some kind of kernel transformation in Fourier space. If I remember rightly, this reduced the usual 'artifacts' you get when manipulating Fourier transforms of white noise. Perhaps a good candidate for this case. Sorry I can't remember more, though I think the talk was by this guy: https://www.york.ac.uk/music/staff/academic/jez-wells/
Either way, to the OP, who I presume can only be Dr Brady of the Museum of Techno: good work, old chap, you deserve a sherry.
Update: I've switched the audio files to Soundcloud, so everything should work again.
Eigencrisis! It looks like the UCI servers are having trouble. I uploaded the audio files to the following Soundcloud playlist and I am working on embedding the soundcloud player in place of the files.
Sorry for the inconvenience.
Here's my worry, speaking as someone who knows physics but not all that much audio processing. I feel like this approach will inevitably put a whole lot of (unhelpful!) emphasis on the detailed phase information of the various sounds. The near-perfect cancellation of the average track illustrates this: two loops of the exact same bass drum beat offset by a fraction of a second will be treated as orthogonal or even opposite by this algorithm, but they're essentially identical as perceived by the listener.
Conceptually, I imagine what you'd want is some way of encoding the loops so that their average comes out sounding like a real average, rather than nearly silent due to phase cancellation. My first instinct is to say "take the FFT of each loop first, and then run your PCA on that". Maybe that's not the right answer: like I said, I'm not an audio processing expert. But I suspect that right now your analysis is spending a huge fraction of its effort trying to get the first drum beat to happen at precisely the right fraction of a second, then separately the second drum beat, then the third, the fourth, and so on. And heaven help you if the different loops' bass drums are tuned to marginally different notes.
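A minimal NumPy sketch of both the concern and the "FFT first" fix, using a hypothetical toy loop (four decaying-sine "kicks") rather than the real data. Delaying the same loop by half a period of the kick's oscillation makes the two waveforms anti-correlated, so their average partly cancels. Their FFT magnitude spectra, however, are identical, so PCA on magnitudes puts them at the same point. This is exact only for circular shifts, a reasonable model for loops that repeat, and discarding phase means the features can no longer be inverted straight back to audio.

```python
import numpy as np

def kick(length=400, cycles_per_sample=0.01, decay=60.0):
    """Toy decaying-sine kick drum (a stand-in for real audio)."""
    i = np.arange(length)
    return np.exp(-i / decay) * np.sin(2 * np.pi * cycles_per_sample * i)

def make_loop(n=4096, step=1024):
    """Toy four-on-the-floor loop: one kick every `step` samples."""
    x = np.zeros(n)
    for start in range(0, n, step):
        x[start:start + 400] += kick()
    return x

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def magnitude_features(loops):
    """FFT magnitude of each row; unchanged by circular time shifts."""
    return np.abs(np.fft.rfft(loops, axis=1))

def pca_scores(F, k):
    """Coordinates of each row of F on its top-k principal components."""
    Fc = F - F.mean(axis=0)
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ Vt[:k].T

a = make_loop()
b = np.roll(a, 50)   # same beat, 50 samples late: half a period of the kick's sine

print(cosine(a, b))                                  # negative: the two loops partly cancel
print(np.sum(((a + b) / 2) ** 2) / np.sum(a ** 2))   # energy of the average relative to one loop

rng = np.random.default_rng(0)
loops = np.vstack([a, b, rng.normal(size=(5, a.size))])  # plus a few unrelated noise "loops"
S = pca_scores(magnitude_features(loops), k=3)
print(np.allclose(S[0], S[1]))   # True: a and its shifted copy land on the same point
```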
Edit: Only after writing this did it sink in that the top overall comment's remarks (by "highd") about earlier dimensionality reduction were getting at the same issue. I'll leave this here, just in case the different framing is useful.
One could definitely get something better by treating the audio loops as translation-invariant. I thought about trying a symmetry-invariant (or 'group-action-invariant') PCA, but I could not find a good way of doing that. More advanced methods, e.g. 1-D convolutional networks and WaveNet, do have this translation invariance built in.
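One way to make the 'group-action-invariant' idea concrete, offered as a sketch rather than anything the author tried: augment the data with every circular shift of every loop before computing the second-moment matrix. That matrix is then circulant, so its eigenvectors are exactly the DFT basis and its eigenvalues are the average power spectrum. In other words, shift-invariant PCA collapses to the Fourier transform (centering would only change the DC eigenvalue, since the mean of all shifted copies is a constant signal). The toy "loops" below are random noise.

```python
import numpy as np

def shift_averaged_second_moment(X):
    """Average of x x^T over every loop x in X and every circular shift of it."""
    N, n = X.shape
    shifted = np.stack([np.roll(X, s, axis=1) for s in range(n)])   # (n, N, n)
    flat = shifted.reshape(-1, n)                                   # one row per shifted loop
    return flat.T @ flat / (N * n)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 64))   # five toy "loops" of 64 samples
C = shift_averaged_second_moment(X)

# C is circulant, so the DFT diagonalises it: its eigenvalues are the average power spectrum.
eig = np.sort(np.linalg.eigvalsh(C))
power = np.sort((np.abs(np.fft.fft(X, axis=1)) ** 2).sum(axis=0) / (X.shape[0] * X.shape[1]))
print(np.allclose(eig, power))                                     # True
print(np.allclose(C, np.roll(np.roll(C, 1, axis=0), 1, axis=1)))   # True: shift-invariant
```

This is one argument for the "FFT first" suggestions elsewhere in the thread: once you insist on invariance to circular shifts, the principal components are fixed in advance and only the spectrum carries information.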
Interestingly, the second eigenvector is not just a phase-shifted version of the first but also contains slightly higher-frequency information. Still, I totally agree with what you said.
In [1]: import numpy as np
In [2]: X = np.random.randn(10000, 100)
In [3]: %timeit np.apply_along_axis(np.mean, 0, X)
100 loops, best of 3: 7.91 ms per loop
In [4]: %timeit X.mean(axis=0)
1000 loops, best of 3: 1.21 ms per loop
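Outside IPython, the same comparison can be reproduced with the standard-library timeit module. This sketch also checks that both calls return the same column means; the repeat counts are arbitrary, and timings will vary by machine and NumPy version.

```python
import timeit
import numpy as np

X = np.random.default_rng(0).standard_normal((10000, 100))

slow = np.apply_along_axis(np.mean, 0, X)   # calls np.mean once per column from Python
fast = X.mean(axis=0)                       # a single vectorised reduction
assert np.allclose(slow, fast)

for label, stmt in [("apply_along_axis", lambda: np.apply_along_axis(np.mean, 0, X)),
                    ("X.mean(axis=0)", lambda: X.mean(axis=0))]:
    best = min(timeit.repeat(stmt, number=20, repeat=3)) / 20
    print(f"{label}: {best * 1e3:.2f} ms per call")
```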
But honestly, there are so many components that sound similar that I don't feel we've truly gotten to something fundamental here.
My first thought is to start with a Fourier transform, and then look at dimensionality reduction from there.
(see also http://archive.museumoftechno.org/exhibition_detail.php?id=4 )
Pedantic note: this is only true in a theoretical setting where samples are infinitely precise. Since each sample has only a finite number of bits, and calculations are also lossy, you cannot reconstruct the wave perfectly. But it's good enough, and there are also formats with more bits per sample to increase quality.
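To put numbers on "good enough", here is a small sketch that rounds a full-scale 997 Hz sine to N bits and compares the measured signal-to-quantization-noise ratio with the textbook rule of thumb SNR ≈ 6.02*N + 1.76 dB. Clipping at the top code and dither are ignored, and the test-tone frequency is an arbitrary choice.

```python
import numpy as np

def quantize(x, bits):
    """Round a signal in [-1, 1] to a uniform grid with 2**bits steps across that range."""
    step = 2.0 / 2 ** bits
    return np.round(x / step) * step

def snr_db(clean, noisy):
    noise = noisy - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))

sr = 48000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 997 * t)   # full-scale test tone, one second

for bits in (8, 16, 24):
    measured = snr_db(tone, quantize(tone, bits))
    predicted = 6.02 * bits + 1.76
    print(bits, round(measured, 2), round(predicted, 2))
```

Each extra bit buys about 6 dB, which is why 16-bit (roughly 98 dB) is usually fine for playback and 24-bit is preferred for headroom while producing.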
Also, human hearing is much closer to a wavelet transform or STFT than to a single FFT.
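A short sketch of that difference, with a hypothetical test signal (440 Hz for half a second, then 880 Hz). A Hann-windowed STFT tracks which pitch is playing when, while a single FFT over the whole clip would show both peaks with no ordering in time. The frame size of 1024 and hop of 256 are arbitrary choices.

```python
import numpy as np

def stft(x, frame=1024, hop=256):
    """Magnitude short-time Fourier transform with a Hann window (frames x bins)."""
    win = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    frames = np.stack([x[s:s + frame] * win for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 8000
t = np.arange(sr) / sr
x = np.where(t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))   # A4, then A5

S = stft(x)
freqs = np.fft.rfftfreq(1024, d=1 / sr)
# Roughly 440 Hz in the first frame, roughly 880 Hz in the last (to the nearest 7.8 Hz bin).
print(freqs[S[0].argmax()], freqs[S[-1].argmax()])
```

The trade-off is resolution: longer frames give finer frequency bins but blur timing, which is the compromise wavelet-style transforms relax by using different window lengths at different frequencies.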
The reference on this is Rick Snoman's Dance Music Manual, 3rd ed.
This is not to be confused with the analog/digital debate, in which I am a fan of digital: digital synths that use the analog mental model can and do sound great, e.g. Virus, Sylenth, Zebra, FM8. It's more that the analog model evolved from how our ears perceive things, while newer paradigms, though mathematically neat, often lead to unwanted artifacts, or boil down to much the same sort of thing as before e.g. NI Razor.
Then again, we might be talking about different things, bitL. If you want to post links to the sort of sounds you're talking about, then I'm all ears.
Rick Snoman is fairly basic, though. I'd say the definitive synth guide was the old Sound on Sound Synth Secrets series (google it). Or, better, just hang out on forums with more experienced people and post links to sounds you like, asking 'how do I make this?'
There are basically two approaches to designing sounds: you either model the perception of audio or you model the physics that produces audio. Analog (subtractive, additive, FM) synthesis is basically entirely in the first camp. Digital is strictly more flexible, so you have more options, like Karplus-Strong, which gives you a simplified model of a plucked string.
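A minimal Karplus-Strong sketch in NumPy, assuming the classic formulation: one period of white noise is fed into a delay line, and each new sample is the average of the two oldest samples, scaled by a decay factor. The 220 Hz pitch and 0.996 decay are arbitrary, and because of the two-point average the true pitch is about sr / (period + 0.5) rather than exactly freq.

```python
import numpy as np

def karplus_strong(freq, sr=44100, duration=1.0, decay=0.996, seed=0):
    """Basic Karplus-Strong pluck: a noise burst recirculating through an averaging delay line."""
    rng = np.random.default_rng(seed)
    period = int(round(sr / freq))                 # delay-line length in samples
    total = int(sr * duration)
    y = np.zeros(total)
    y[:period] = rng.uniform(-1.0, 1.0, period)    # the "pluck": one period of noise
    for n in range(period, total):
        prev = y[n - period - 1] if n > period else 0.0
        y[n] = decay * 0.5 * (y[n - period] + prev)   # two-point average acts as a gentle lowpass
    return y

note = karplus_strong(220.0)
```

The averaging filter damps high harmonics faster than low ones, which is what turns the initial noise burst into a decaying, string-like tone.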
The most "mathematically neat" synthesis techniques are still the old analog models: you start with a simple mathematical function like a sawtooth wave and put it through a four-pole filter. Or you stack a few different overtones on top of a fundamental. It's hard to get more simple/neat in terms of the math. The digital synths can be fairly complicated, mathematically, when they're not emulating analog synths.
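A sketch of both "mathematically neat" recipes under simplifying assumptions. The sawtooth is built additively from its 1/k harmonic series, stopping at Nyquist, and is then run through four cascaded one-pole lowpass stages as a crude stand-in for a four-pole (24 dB/octave) filter. A real ladder filter also has resonance feedback and nonlinearity, which this omits; the pitch and cutoff are arbitrary.

```python
import numpy as np

def additive_saw(freq, sr=44100, n=4410):
    """Sawtooth built by stacking harmonics k*freq with amplitude 1/k (below Nyquist only)."""
    t = np.arange(n) / sr
    k = np.arange(1, int((sr / 2) // freq) + 1)[:, None]
    partials = ((-1.0) ** (k + 1)) / k * np.sin(2 * np.pi * freq * k * t)
    return (2 / np.pi) * partials.sum(axis=0)

def four_pole_lowpass(x, cutoff, sr=44100):
    """Four cascaded one-pole lowpass stages (about 24 dB/octave rolloff, no resonance)."""
    a = 1.0 - np.exp(-2 * np.pi * cutoff / sr)
    y = np.asarray(x, dtype=float)
    for _ in range(4):
        out = np.empty_like(y)
        state = 0.0
        for i, v in enumerate(y):
            state += a * (v - state)   # one-pole smoothing toward the input
            out[i] = state
        y = out
    return y

def high_band_fraction(x, sr=44100, above=2000.0):
    """Share of the signal's energy above `above` Hz."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return power[freqs > above].sum() / power.sum()

saw = additive_saw(110.0)
filtered = four_pole_lowpass(saw, cutoff=800.0)
print(high_band_fraction(saw), high_band_fraction(filtered))   # the filter removes most upper harmonics
```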
I don't know what you're talking about when you say that advanced digital techniques sound somehow lower quality.
Things like impulse responses are everywhere: reverbs, guitar amp sims, and distortion plugins. They sound better than ever.
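The core operation behind convolution reverbs and amp-sim IRs is plain linear convolution of the dry signal with an impulse response. Here is a sketch using zero-padded FFTs; real-time plugins typically use partitioned convolution to keep latency low. The "room" below is hypothetical exponentially decaying noise, not a measured IR.

```python
import numpy as np

def fft_convolve(x, ir):
    """Linear convolution of x with ir via zero-padded FFTs."""
    n = len(x) + len(ir) - 1
    size = 1 << (n - 1).bit_length()   # next power of two >= n
    Y = np.fft.rfft(x, size) * np.fft.rfft(ir, size)
    return np.fft.irfft(Y, size)[:n]

sr = 8000
rng = np.random.default_rng(0)
# Hypothetical impulse response: half a second of exponentially decaying noise.
ir = rng.standard_normal(sr // 2) * np.exp(-np.arange(sr // 2) / (0.1 * sr))
dry = np.zeros(sr)
dry[0] = 1.0          # a click at the start...
dry[sr // 2] = 0.5    # ...and a quieter one half a second later
wet = fft_convolve(dry, ir)
```

Each click in the dry signal is replaced by a scaled copy of the IR, which is exactly what "playing a sound in that room" means for a linear, time-invariant system.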
> or boil down to much the same sort of thing as before e.g. NI Razor.
Wait, what? I own a copy of NI Razor and I don't see how it boils down to the "same sort of thing as before". Sure, it's an additive synth, and that has existed before, but it's not really a rehash of something I'd seen before, it has controls which simply don't exist in other synthesizers.
A) Historically, before computing was fast enough, digital synthesizers often had major issues with aliasing (which could be called "lower quality" and could certainly be called unwanted artifacts, although conversely some saw the aliasing as a sort of "character").
IMHO this is no longer as big of a problem with today's computing power: techniques like oversampling, BLEP for oscillators, etc. exist to minimize aliasing. There are possibly a few mathematical issues left for any "real time" oriented calculations due to the quantized nature of sampled audio, but software engineers are coming up with techniques to deal with some of them (this paper -- https://www.native-instruments.com/fileadmin/ni_media/downlo... -- is well known for going over some solutions for filters, for instance).
B) The other possibility: the nice thing (or not so nice thing, depending on your point of view) about subtractive synthesis with pure waveforms is that you usually start out with waveforms that are harmonic in nature (like a square or sawtooth wave). With techniques like additive (Razor's technique), digital FM/PM, or the like, it's much easier to get non-harmonic sounds. I wouldn't call this "lower quality", just different.
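For the BLEP approach mentioned in A), here is a sketch of the widely used polynomial variant (PolyBLEP) applied to a sawtooth: a two-sample polynomial correction is subtracted around each wraparound of the phase. The measurement helper is an assumption made for this sketch. It treats every bin that is not a multiple of the 1 kHz fundamental as aliasing, which works here because one second at 44.1 kHz gives 1 Hz bins.

```python
import numpy as np

def naive_saw(freq, sr, n):
    phase = (np.arange(n) * freq / sr) % 1.0
    return 2.0 * phase - 1.0

def poly_blep(phase, dt):
    """Polynomial band-limited step correction around each wrap of the phase."""
    out = np.zeros_like(phase)
    m = phase < dt                 # just after the discontinuity
    x = phase[m] / dt
    out[m] = x + x - x * x - 1.0
    m = phase > 1.0 - dt           # just before the discontinuity
    x = (phase[m] - 1.0) / dt
    out[m] = x * x + x + x + 1.0
    return out

def polyblep_saw(freq, sr, n):
    phase = (np.arange(n) * freq / sr) % 1.0
    return 2.0 * phase - 1.0 - poly_blep(phase, freq / sr)

def inharmonic_fraction(x, freq, sr):
    """Share of spectral energy NOT on multiples of freq (i.e. aliasing), for a 1 s signal."""
    power = np.abs(np.fft.rfft(x)) ** 2
    bins = np.arange(len(power))   # 1 Hz per bin when len(x) == sr
    harmonic = bins % int(freq) == 0
    return power[~harmonic].sum() / power.sum()

sr, f0 = 44100, 1000
print(inharmonic_fraction(naive_saw(f0, sr, sr), f0, sr))
print(inharmonic_fraction(polyblep_saw(f0, sr, sr), f0, sr))   # noticeably smaller
```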
What I meant by advanced digital techniques is, I suppose, anything that takes things into the frequency domain, manipulates them, and converts back. Physical Modelling, Impulse Responses and Digital FM are all fine things but not in this category. I used to have a plugin that did weird things like letting you stretch or clip the frequency domain, e.g. a 1-2 kHz band-limited signal would become a 0.5-4 kHz signal. Some of the effects in Razor also do things in the frequency domain that you wouldn't really do in the standard subtractive model.
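A crude, hypothetical example of this kind of frequency-domain manipulation: take one FFT of the whole signal, move bin k to bin round(k*factor), and invert. This is a linear stretch, not the log-scale mapping in the 1-2 kHz to 0.5-4 kHz example. Real plugins also work frame by frame on an STFT, where keeping phases consistent across frames is a common source of artifacts.

```python
import numpy as np

def stretch_spectrum(x, factor):
    """Naive frequency scaling: move each FFT bin k to bin round(k * factor), then invert."""
    X = np.fft.rfft(x)
    Y = np.zeros_like(X)
    targets = np.round(np.arange(len(X)) * factor).astype(int)
    keep = targets < len(X)                 # drop anything pushed past Nyquist
    np.add.at(Y, targets[keep], X[keep])    # several bins may land on one target
    return np.fft.irfft(Y, len(x))

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1500 * t)
y = stretch_spectrum(x, 2.0)
print(np.abs(np.fft.rfft(y)).argmax())   # 3000: the 1.5 kHz tone now sits at 3 kHz
```

Unlike resampling, this moves every partial by a factor rather than shifting the whole sound in time as well, and on anything other than a steady tone it quickly produces the smeared, unnatural results that make such effects recognizable.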
My personal perception of Razor is that if you don't use those effects then it just sounds like a subtractive synth (same as before), while if you do use them it sounds a bit lo-fi (undesirable in my book though I appreciate that can be an artistic choice). Digital FM on the other hand sounds very clean to me.
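On digital FM/PM being clean yet easily inharmonic: here is a sketch of two-operator phase modulation (the form many "FM" synths actually compute) with a 1000 Hz carrier, a 1414 Hz modulator and modulation index 1, all arbitrary choices. The partials land at 1000 ± n*1414 Hz (negative ones folding back to positive frequencies) with Bessel-function amplitudes, rather than on a harmonic series.

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr                # one second, so FFT bin k is exactly k Hz
fc, fm, index = 1000, 1414, 1.0       # carrier, modulator (ratio ~ sqrt(2)), modulation index

# Two-operator phase modulation: the modulator is added to the carrier's phase.
x = np.sin(2 * np.pi * fc * t + index * np.sin(2 * np.pi * fm * t))

mag = np.abs(np.fft.rfft(x))
peaks = np.sort(np.argsort(mag)[-5:])   # the five strongest partials, in Hz
print(peaks)                            # 414, 1000, 1828, 2414, 3828 Hz
```

Set fm to an integer multiple of fc instead and the same code produces a harmonic spectrum, which is why FM patches can swing between clean tones and bell-like or metallic sounds.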
Sign up here https://docs.google.com/forms/d/1-aQzVbkbGwv2BMQsvuoneOUPgyr...
And I'll ping you when it's released.
But there are tools such as Synfire Pro by Cognitone, which, on the other hand, is unusable as a DAW, just like all the other software for music composition.
I would love to see a usable combination of both software categories eventually.
One question left: Your software won't be FOSS, or will it?
It won't be FOSS, no.
Not saying compositional aids couldn't be easier or have different UX paradigms, but there are tools in today's DAWs.
Would be pretty cool if Jacob Collier got together with a developer to write a compositional aid tool for harmonising.
* Track in C#Maj, 124BPM, 4/4
* Chord progression I–V–vi–IV over 4 bars.
* OK, but the rhythm is that the current chord repeats on every beat like a stab
* Timing should be early by half a beat, so it would be | I I I V| V V V vi| vi vi vi IV| IV IV IV I|
* Another transformation to adjust the voicing, etc.
* Then you can create a bass track from this that mirrors the lowest note but changes the rhythm, etc.
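A hypothetical sketch of the first few transformations in that list, using MIDI note numbers (C#4 = 61) and plain Python lists. The representation and helper names are illustrative assumptions, not any DAW's API. It reproduces the bar pattern written above, where the last beat of each bar anticipates the next chord, and derives an off-beat bass line two octaves below each chord's lowest note.

```python
C_SHARP_4 = 61                         # MIDI note number for C#4
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets of the major scale

def triad(degree, root=C_SHARP_4):
    """Diatonic triad (root, third, fifth) on scale degree 1-7 of the major key at `root`."""
    idx = degree - 1
    return [root + 12 * ((idx + s) // 7) + MAJOR_STEPS[(idx + s) % 7] for s in (0, 2, 4)]

progression = [1, 5, 6, 4]   # I-V-vi-IV, one chord per bar
beats_per_bar = 4

# Stab rhythm: the bar's chord is struck on every beat.
stabs = [deg for deg in progression for _ in range(beats_per_bar)]

# Anticipation: the last beat of each bar already plays the next bar's chord.
anticipated = [
    progression[(i // beats_per_bar + 1) % len(progression)] if i % beats_per_bar == beats_per_bar - 1 else deg
    for i, deg in enumerate(stabs)
]

# (start_beat, notes) events; the bass mirrors each chord's lowest note two octaves down, on the off-beat.
chords = [(float(i), triad(deg)) for i, deg in enumerate(anticipated)]
bass = [(start + 0.5, [min(notes) - 24]) for start, notes in chords]
```

Because every later step is derived from progression, changing the chords or the key in one place updates the stabs and the bass together, which is the kind of linked editing the comments below are asking for.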
What's working for me right now? ChordPulse, which contains arranger-keyboard-style presets plus a few options for sequencing and detailing the arrangement. Export to MIDI, add melodies and tweaks on top, and the song is ready. There are much more complicated takes on this formula, like Band in a Box, but they have things I don't need and aren't quite as good at this basic workflow.
I was reminded of this behemoth: https://www.youtube.com/watch?v=ndb339l81pU - the interface is terrible, but the DAW should basically know what kind of chord you're playing and be able to make transformations to it.
I'm relatively new to producing in general, but one of my main issues with composing in Ableton is not having an easy way to reference the rest of my MIDI clips while working on one of them. Or, if I decide to change my chord progression or my voicing, I then have to go and check every other clip to make sure they all match or at least work with the change.
Edit: the audio files suddenly work, but I still can't access the website.