
Yes. I do some creative work in audio and wanted to cluster a fairly large database of audio files. I needed a dimensionality reduction technique to take a set of audio descriptors down to two values (to use as coordinates) in order to create an explorable 2D space. I actually ended up using MFCCs[1] rather than typical audio descriptors[2] as my analysis data, as there are lots of issues with making the numbers meaningful to begin with. I ended up with around 140 numbers for each audio file by taking the MFCCs and computing statistics over three derivatives, so that the data somewhat represented change over time. I tried out a number of reduction techniques, and PCA was one of them. Perceptually, the groupings it produced were weak; the techniques I found useful were ISOMAP[3], t-SNE[4] and lately UMAP[5]. t-SNE[4] and UMAP[5] have given me the best perceptual groupings of the audio files.
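A minimal sketch of that kind of pipeline, not the code from the repository below: it assumes each file's MFCCs are already available as an (n_coeffs, n_frames) array (for example from a library such as librosa), and the function names `summarize_mfccs` and `embed_2d` are made up for illustration. The choice of four statistics per coefficient is also an assumption, so the vector length here will not match the roughly 140 numbers mentioned above. scikit-learn's PCA and t-SNE stand in for the reduction step; UMAP could be swapped in at the same place.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def summarize_mfccs(mfcc, n_derivatives=3):
    """Collapse an (n_coeffs, n_frames) MFCC matrix into one fixed-length vector.

    Statistics are taken over the MFCCs themselves and over successive
    frame-to-frame differences, so the vector reflects change over time.
    The four statistics used here are an illustrative choice.
    """
    current = np.asarray(mfcc, dtype=float)
    stats = []
    for _ in range(n_derivatives + 1):
        stats.extend([
            current.mean(axis=1),
            current.std(axis=1),
            current.min(axis=1),
            current.max(axis=1),
        ])
        # Next derivative: difference between neighbouring frames.
        current = np.diff(current, axis=1)
    return np.concatenate(stats)


def embed_2d(features, method="tsne", seed=0):
    """Reduce an (n_files, n_features) matrix to 2D coordinates."""
    features = np.asarray(features, dtype=float)
    # Standardize each column so no single statistic dominates the distances.
    scaled = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-9)
    if method == "pca":
        return PCA(n_components=2).fit_transform(scaled)
    # t-SNE needs perplexity < n_samples; shrink it for small corpora.
    perplexity = min(30.0, (len(scaled) - 1) / 3.0)
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="pca", random_state=seed)
    return tsne.fit_transform(scaled)
```

Each row of the result is an (x, y) position for one audio file, so PCA and t-SNE layouts of the same corpus can be plotted side by side and judged by ear.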

You can see some of the code here on GitHub, although a lot of it depends on having some audio to test with, as well as on other closed-source projects (sorry, I have no control over that).

https://github.com/jamesb93/data_bending

[1] - https://en.wikipedia.org/wiki/Mel-frequency_cepstrum

[2] - http://recherche.ircam.fr/anasyn/peeters/ARTICLES/Herrera_19...

[3] - https://scikit-learn.org/stable/modules/generated/sklearn.ma...

[4] - https://scikit-learn.org/stable/modules/generated/sklearn.ma...

[5] - https://umap-learn.readthedocs.io/en/latest/




The quality of the mapping to this 2D space would seem to be pretty subjective, but incredibly powerful once an intuition for it is developed. For example, I've been watching a bunch of guitar pedal reviews, and it would be very cool to visualize the 'topography of tone' comparatively between products.

Does the work you're doing give you the ability to compare the outcomes of various reduction schemes?


The quality of the mapping is entirely subjective. From person to person and corpus to corpus, a different kind of mapping will work better, depending on the known and tacit goals for the results.

The work I am doing now lets me compare different plots and experiment with another layer of processing that runs clustering on the output of the dimensionality reduction. I then manually play through the clusters to judge how good the groupings seem to me, in terms of how 'close' the audio samples within a cluster are and how well one cluster is differentiated from another.
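A rough sketch of that extra clustering layer, under assumptions: the clustering algorithm is not named here, so k-means from scikit-learn is used as a stand-in, and `cluster_embedding` is a hypothetical helper name. The silhouette score is an addition for a quick numeric comparison between plots; it does not replace listening through the clusters.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def cluster_embedding(coords, names, n_clusters=4, seed=0):
    """Cluster 2D coordinates and group file names by cluster label.

    coords: (n_files, 2) array from a dimensionality reduction step.
    names:  one identifier per row, e.g. the audio file paths.
    Returns (groups, score), where groups maps label -> list of names and
    score is the silhouette score (closer to 1 means tighter, better
    separated clusters).
    """
    coords = np.asarray(coords, dtype=float)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(coords)

    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[int(label)].append(name)

    return dict(groups), float(silhouette_score(coords, labels))
```

Running this on the output of two different reductions gives one playlist per cluster to audition, plus a number that can flag layouts whose clusters overlap heavily.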

I hope to have some documentation on my project soon, as it's going to be part of my dissertation in music technology.

Also, to answer your question about guitar pedals - Stefano Fasciani has done some work on finding smoother parameter combinations across chaotic synthesisers.

https://youtu.be/yMLl7-aI_kc

That is perhaps up your alley!


Very cool! If it's appropriate I hope you post about it here when complete.

Also, that TSAM vid is exactly what I had in mind, thanks!


Ooh, corpus based concatenative synthesis! I look forward to reading through these links. The only features that I have had luck with are spectral centroid (for ‘brightness’) and some measure of loudness, but I have always wanted more. Thanks for posting this stuff!
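For reference, a small NumPy-only sketch of those two features. The function names are made up for illustration, the Hann window and whole-signal analysis are assumptions (real tools usually work frame by frame), and RMS level in dB is just one possible "measure of loudness", not a perceptual loudness model.

```python
import numpy as np


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency in Hz (higher reads as 'brighter')."""
    signal = np.asarray(signal, dtype=float)
    # Window the signal to reduce spectral leakage before the FFT.
    magnitudes = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = magnitudes.sum()
    if total == 0:
        return 0.0  # Silence has no meaningful centroid.
    return float((freqs * magnitudes).sum() / total)


def rms_db(signal):
    """Root-mean-square level in dB relative to full scale (1.0)."""
    signal = np.asarray(signal, dtype=float)
    rms = np.sqrt(np.mean(np.square(signal)))
    # Floor the value to avoid log10(0) on silent input.
    return float(20.0 * np.log10(max(rms, 1e-12)))
```

With one centroid and one level value per file, the pair can be used directly as 2D coordinates for basic corpus navigation.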


I'm not sure if I'm parsing this conversation correctly, but 'corpus based concatenative synthesis' reminds me of Scrambled? Hackz! - https://www.youtube.com/watch?v=eRlhKaxcKpA


Hey no worries! In the realm of audio descriptors centroid and brightness go far in terms of mapping on to something perceptual. I use them all the time for doing basic corpus navigation and concat synthesis. This indeed was an experiment because I wanted something that respected more perceptual features of the sound. I am in the process of writing my results up into a thesis actually.



