Hacker News
Neural Translation of Musical Style (imanmalik.com)
152 points by pshaw on June 9, 2017 | 29 comments



i'm glad to see someone talking about dynamics rather than pitch for once. it's really what breathes life into music anyway. the results are impressive. one thing i worry about is that midi patches are often very sloppy about subtle changes to velocity: sometimes 60-70 sound the same, and then 71 is way louder, or the tone is much different. that means that even though you have a nice, dynamic range of velocities, it can get aliased in a way that robs it of some of that. not sure what kind of midi patches you are using for the sample generation. if you use a piano simulator, or a really high quality patch library, you are probably ok.

the other thing i would note is that there is no right or wrong way of coloring a piece with dynamics. it's a huge area for artistic choice. there doesn't seem to be a great nomenclature for really detailed dynamics in the western music tradition, so it hasn't really been captured in written scores, but i think it can be as much (or more!) a part of the composition as the pitches are. a solo snare drum piece would be a good example of this. what's cool about what you did is that it gets away from the monotonous dynamics that are the default when you write stuff in a daw by hand.

my own personal approach to this problem has been to treat dynamics as compositional elements independent of pitches or rhythms, and realize this through code. it's a pain in the ass to explore dynamic ideas when you have to change all the note velocities by hand, like you said, so i decouple the dynamic curves / sequences / ??? and map them onto different regions of pitches and rhythms. it's nice to add a breathing quality to a line by adding some kind of cyclical dynamic, then also being able to add a crescendo on top of that, or being able to decrescendo while still maintaining some kind of pattern of accents or dynamic phrasing. and then being able to change all that quickly, instead of by hand.
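for example, a rough sketch of that kind of layering in python / numpy (the curve shapes, lengths, and numbers here are all made up for illustration):

    import numpy as np

    n_notes = 64                                    # length of the line, in notes
    t = np.arange(n_notes)

    breathing = 0.15 * np.sin(2 * np.pi * t / 8)    # cyclical "breathing" curve
    crescendo = np.linspace(0.4, 0.9, n_notes)      # long crescendo underneath
    accents   = np.where(t % 4 == 0, 0.1, 0.0)      # accent every 4th note

    # stack the layers, then map to midi velocity (1-127)
    dynamics   = np.clip(breathing + crescendo + accents, 0.05, 1.0)
    velocities = (dynamics * 127).astype(int)

each layer can be swapped out or remapped onto a different region of the line without touching the pitches or rhythms, which is the whole point.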


> it's nice to add a breathing quality to a line by adding some kind of cyclical dynamic, then also being able to add a crescendo on top of that, or being able to decrescendo while still maintaining some kind of pattern of accents or dynamic phrasing.

Love that. As a singer, that's what I've trained all my life. Even when it's just a single marking on the page, you read it within its context: is this on a phrase, what's the emphasis of the words/syllables, how does it relate to the pitch of the line (e.g. a lot of the time a dim. on an ascending line merely means "don't be an amateur and get louder"), what's happening with the other voices/instruments. Nuancing a crescendo from one large section to another while also giving respect to the phrases within. Fun challenges.


> i'm glad to see someone talking about dynamics rather than pitch for once. it's really what breathes life into music anyway.

So half of Bach's music was dead when he wrote it?


what do you think makes Glenn Gould's recordings so great?


As the article itself notes, performance and composition are distinct.


For comparison, three other human performances of the same Chopin piece:

https://www.youtube.com/watch?v=V7SvQzkZmuM

https://www.youtube.com/watch?v=GOe670xcKhk

https://www.youtube.com/watch?v=fRqynzR_8Ts

Both of the performances in the demo do a mediocre job with the shapes in the music, including the phrasing and dynamics.

I suspect more people would be able to hear a clear difference if a more representative human performance was used for the comparison.

As is often the case with ML in music, the bar is higher than it seems to be.


as i understand it the human piece was chosen from the available dataset

your comment seems to imply intentional misrepresentations

the thing about recordings of performances of music from these periods and composers is that the music is public domain but the performer can copyright the performance

if the human-performance midi recordings dataset used in the thesis had legally been able to also include the performances by Valentina Lisitsa, Pollini, and Horowitz, i am unable to see how the net would fail to make use of their contribution

also, for the best results those performers would need to be involved in the production of those midi files, because they carry with them a lot of subjective meta-information

i commend the human performers in the available midi files for their effort, both in their expression of the piece and in their desire to make music accessible in a verbosely spec'd digital data standard

regardless, i feel the really impressive part of the thesis is taking the droned midi and altering it to sound like the human midi.. which i believe is the point, more so than the lowest-common-denominator wow! effect of an author-defined 'musical turing test'

i mean, really.. access to the thesis, a full blog write-up, a repo containing code and a jupyter notebook, and the dataset used;

this work was excellent and the write-up phenomenal


[spoilers:] When hearing the first human performance, I also thought that it shows some artistic choice with a few strong notes, but overall isn't played well. Still, the AI version is impressive!


This is really cool, but the difference between A and B was immediately obvious to me.

I couldn't find it – did OP say whether or not the respondents were musicians?


Yes, I'd say the most obvious tell was the strict quantization. If they can transfer some of the timing information it might be more convincing.


I thought the most obvious giveaway was that the melody was not articulated at all by the robot. Aside from that, the dynamics in the robot version were completely absent. I actually didn't think it was even a very good song until I listened to the human playing it, then I kinda liked it.


It might be simple to take the amount that each note is ahead or behind where it would be quantized to and model that parameter exactly the same way that dynamics are modeled.
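For instance, a rough sketch of extracting that per-note offset with pretty_midi (the file path and the fixed 16th-note grid are placeholder assumptions; a real version would derive the grid from the tempo map):

    import pretty_midi

    def timing_offsets(midi_path, grid=0.125):
        """Offset of each note from the nearest grid line, in seconds.

        grid=0.125 s corresponds to 16th notes at 120 bpm.
        """
        pm = pretty_midi.PrettyMIDI(midi_path)
        offsets = []
        for inst in pm.instruments:
            for note in inst.notes:
                nearest = round(note.start / grid) * grid
                offsets.append(note.start - nearest)  # <0 = ahead, >0 = behind
        return offsets

Each offset could then be treated as an extra per-note target, exactly like velocity.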


OP here

Thanks for the comment! The respondents weren't musicians; they were selected randomly, as I just wanted to see if StyleNet could fool the average person. However, I will definitely be running surveys with musicians as I continue my work with StyleNet.


First off, super cool project with impressive results!! I am not a musician, but I was fairly reliably (>75%) able to identify the NN. It took listening to a few examples, but I quickly realized that smoothness was the giveaway. Humans make much more significant jumps when going from soft to aggressive or vice versa, whereas the NN tended to smooth out these changes. I'm going to have my sister (a classically trained pianist) take a listen and see what she thinks.

Edit: spelling


So my sister went 14/18. She says,

"The jazz ones are very obvious. Also the voicing of the parts in the first chunk of songs was more nuanced with the human than the AI. The jazz is too straight in the runs with the AI, too "perfect". Real life players stall/hitch, even if just a little! But really, it's pretty impressive - way better than old school canned midi player stuff!"

How effective do you think this approach could be at altering timings to imitate that style?


Wow, thanks so much for the feedback! It really helps to hear such detailed feedback from a musician.

Learning timing imperfections with my current setup shouldn't be too difficult to implement. Considering that it can predict velocity quite well, I assume that it would be able to pick up timing too. It's definitely the next thing I plan on experimenting with.


Please compress your .wav files to mp3/mp4/ogg/whatever. And please only load the music when I want to play it. The size of that page is ridiculous.
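For what it's worth, a minimal batch conversion looks something like this in Python, assuming pydub with ffmpeg installed and that the files live under an assets/audio directory:

    from pathlib import Path
    from pydub import AudioSegment   # needs ffmpeg available on the PATH

    # convert every .wav in the audio directory to a 192 kbps .mp3
    for wav in Path("assets/audio").glob("*.wav"):
        AudioSegment.from_wav(str(wav)).export(
            str(wav.with_suffix(".mp3")), format="mp3", bitrate="192k")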


Just (losslessly) converted the wavs into mp3s. Hope it loads faster now.


Slightly pedantic, but there is no such thing as lossless mp3 compression.


Also pedantic, but lossless mp3 compression is possible; it just requires the decompressed signal to be identical to the original.


More just having fun now, but I feel lossless compression implies deterministic decompression, i.e. there is one and only one signal which compresses to a given compressed signal.

Even if you had a signal which compressed to itself, it seems to me that there would likely be other possible signals which would compress to an identical compressed signal.


Good point: even if compressing and then decompressing doesn't change anything, you still need to know that nothing changed; otherwise you lose information about the compression error.


Could you provide a baseline case of a completely random musical style? I know it will sound terrible, but I may be too much of a musical pleb to even notice the difference.


That's a really good idea! I've produced mp3s with randomised velocities for the Chopin track and also the Yiruma track. I'll be adding them to the blog post. Thanks!

Chopin: http://imanmalik.com/assets/audio/random_chpn.mp3

Yiruma: http://imanmalik.com/assets/audio/random_y.mp3


I've added the randomised tracks to the blog :)


I know the focus is on neural networks and music, but there's an underrated art of data curation and preprocessing of which this is an interesting example.


Really really cool, but anyone with some classical music background (my grandpa was a professional cellist, and I played cello for 8 years) can easily guess which one the bot is in the A/B test.


> So the StyleNet model has successfully passed the Turing Test and can generate performances that are indistinguishable from that of a human.

Not quite. There is only a ~2% chance that those ~72 people who made a choice would have gotten such a good result (62% correct) by random guessing, i.e. the respondents could tell the difference better than chance. Still an impressive result.
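For reference, that ~2% figure is roughly what a one-sided binomial test gives, assuming 62% of ~72 responses works out to about 45 correct:

    from scipy.stats import binom

    n, correct = 72, 45                  # ~62% of ~72 respondents
    p = binom.sf(correct - 1, n, 0.5)    # P(at least 45 correct by pure guessing)
    print(p)                             # ~0.02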


This is some cool stuff. >:(



