Deep learning for assisting the process of music composition (highnoongmt.wordpress.com)
75 points by albertzeyer on Aug 16, 2015 | 30 comments

Well, this field is really exploding right now! I was curious about the performance and searched around a bit: in another post, the author gives a slightly more detailed explanation of how the tunes are automatically turned into audio:

"I convert each ABC tune to MIDI, process it in python (with python-midi) to give a more human-like performance (including some musicians who lack good timing, and a sometimes over-active bodhran player who loves to have the last notes :), and then synthesize the parts with timidity, and finally mix it all together and add effects with sox."


The generation of tunes by the RNN is pretty nice and definitely the trending topic, but I think I'm more impressed by the little performance script that he's put together. The output is quite pleasant and I'm curious about the code that generates the bodhran part. Hope this gets open-sourced!

(Off-topic to the guy who submitted this: thank you for making OpenLieroX and turning my university into a chaotic LAN party on many an occasion.)

This was posted twice. We kept this thread as the earlier of the two, but changed the URL to the more explanatory post. The other URL is http://www.eecs.qmul.ac.uk/~sturm/research/RNNIrishTrad/inde... and actually plays the music. The other HN thread was https://news.ycombinator.com/item?id=10069007, but we moved the comments here.

This is great! He is using deep learning as it should be used in regards to music - not as the sole generator of songs (no technique is quite up to that yet) but as a source of inspiration for a 'proper' musician, who can take its output and do cool things with it. As a bagpipe player, I can hear ideas for several new pieces among the output he posted! I see this in the same line as IBM's 'Chef Watson' - great if a sufficiently skilled person is there to supervise :) Good work.

Fiddle player for the last 18 years: can confirm that this is pretty much what most traditional music circles sound like (especially their endlessness).

At a workshop I attended, American piper Kieran O'Hare was once decrying the typical American "intensification" pattern as applied to seisiúns. It becomes about people's egos and shoehorning in their sets or showing that they can keep up with the best. Really, it should be about friends socializing and sharing music. Tunes-tunes-tunes-tunes is antisocial and a bit mechanical when it comes down to it.

At another workshop, Kevin Crehan (Junior Crehan's grandson) noted how it was strange for seisiúns to end up in pubs. Originally, they were in people's houses. They were intimate gatherings of family and friends, and some of the heights of artistry in Irish trad music clearly stem from this quieter and more intimate context.

For the most part Bay Area seisiúns seem to include a healthy dose of socializing and human interaction. Haven't been down to Mountain View yet. I hear those guys play fast. O'Flaherty's in San Jose is pretty raucous, but even so, they manage to get in a good amount of socializing.

You techies that show up to The Plough and the Stars sessions -- know that you are in one of the premier sessions in North America, both for the skill of musicians and the health of the community that surrounds it. Don't expect to be entertained, like it's a show for you. It's a gathering for the musicians to share music with other musicians. If you like Irish trad, listen carefully, because the very best music comes out when it's musicians sharing with their friends. (Also listen patiently, because there's still a certain diversity of musicianship.)

Also, when someone is singing a slow air, then it's good manners to be quiet and listen! (And very bad manners if you don't!)

In Newfoundland, anyway, it wasn't just the tunes that used to be in people's homes, it was the dances, too: "Mostly they'd have a dance in the kitchen. It was all wooden floors. Some of the kitchens was a bit small so they'd have to take out the stove, lift it outdoors in the yard for to make more room for dancing...and they'd play all night and in the morning they'd bring in the stove again for to boil the kettle and have a cup of tea." (Vince Collins)

That was also once the pattern for Irish traditional dancing, with even more wrinkles on top of that. (Churches built the parish halls and promoted Ceili dancing as a more chaste form of social dancing.) Newfoundland has benefited in terms of the preservation of its cultural heritage through a degree of isolation.

You're missing an 'i' and a fada: it's 'seisiún'.

Thanks, but it's too late to edit.

I added it, hoping you don't mind.

I really appreciate the effort that went into the performance part of this work. There was a real effort to try and make it sound like a reasonable representation of humans playing...a little off beat, out of sync at times. Instead of just hammering the notes out like I hear with lots of these systems, it makes it listenable...I've had the endless trad on for 15 minutes now in the background.

I also like how the basic structure of the musical forms has mostly carried through the model. That seems to be a good "sniff test" of whether the model is producing reasonable output: does the musical structure make sense, as well as the notes? It makes it feel like there was a little bit of planning.

Great work.

>I really appreciate the effort that went into the performance part of this work. There was a real effort to try and make it sound like a reasonable representation of humans playing...a little off beat, out of sync at times.

That said, there are certain ways that humans interact when they play together (even if they track their parts independently).

Just being a little off beat randomly doesn't capture that, and can sound just as fake as being perfectly on beat.

There are a few algorithms that model how actual players interact. Here's a relevant study: http://scitation.aip.org/content/aip/magazine/physicstoday/a...

There are several more for real-life like quantization/humanization.
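To make the point above concrete, here is a rough sketch of the difference between independent jitter and timing offsets that drift. It uses a simple AR(1) process, which is a stand-in chosen for illustration and not the method from the linked study. The function names, the `memory` parameter and the millisecond spread are all assumptions.

```python
import random


def independent_jitter(n, spread_ms=15.0, seed=0):
    """Each note gets its own unrelated timing offset (in milliseconds)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, spread_ms) for _ in range(n)]


def correlated_jitter(n, spread_ms=15.0, memory=0.9, seed=0):
    """AR(1) offsets: each offset leans on the previous one, so the player drifts.

    `memory` (between 0 and 1) is a hypothetical knob: higher values mean
    slower drift. This is an illustrative model, not a published algorithm.
    """
    rng = random.Random(seed)
    # Scale the per-note noise so the long-run spread stays near spread_ms.
    step = spread_ms * (1.0 - memory ** 2) ** 0.5
    offsets = []
    current = 0.0
    for _ in range(n):
        current = memory * current + rng.gauss(0.0, step)
        offsets.append(current)
    return offsets


def lag1_autocorrelation(xs):
    """How strongly each offset resembles the one just before it."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den
```

With a few thousand notes, `lag1_autocorrelation` should come out near zero for the independent offsets and close to `memory` for the correlated ones. That is one crude way to check whether a humanizer has any "drift" at all. It still says nothing about players reacting to each other.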

Thanks everyone! This is Bob L. Sturm.

Pierrec and bane: My scripts are just a hodgepodge of bash and python. Happy to share (email me). I do not care much for MIDI piano; and since this music is typically monophonic, why not just use all the typical instruments? I generate the bodhran part from the MIDI and randomly choose to play a note, which kind of note to play, and whether to double up a note. I also give to each player the option to be late or early. Sometimes it gets a bit much, but is fun nonetheless.

bane: One of the reasons why I got into this music generation work is that it provides a sanity check of the internal models, just like speech recognition: look at the transcription to confirm the model is paying attention to relevant aspects. You may be interested in my "horse" article: "A Simple Method to Determine if a Music Information Retrieval System is a “Horse”" (http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6...). Another reason is that I like to compose, I like traditional music, and here we go!

Yenrabbit: Thank you! I agree completely. It is quite hopeful to believe a single artificial system will produce "music." It is merely shifting characters around according to a probabilistic model in light of constraints it has learned (such as four quarter notes to a measure in common time), and it is up to musicians to "realise" it. Certainly, a lot is missing from the reduction to ABC. Music is much more than a sequence of arbitrary symbols. :)

All: you can browse all the tunes so far generated here: http://www.eecs.qmul.ac.uk/~sturm/research/RNNIrishTrad/Sess...

Many thanks for your comments!
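For anyone curious what the bodhran description above might look like in code, here is a minimal sketch. It is not the author's script: it works on plain lists of note onsets (in beats) instead of python-midi objects, and the stroke names, probabilities and function names are all made up for illustration. It also assumes that "late or early" means one fixed shift per player, which the comment does not actually specify.

```python
import random

# Hypothetical stroke names; the comment does not say which kinds of note exist.
STROKES = ("low", "high", "rim")


def bodhran_part(melody_onsets, play_prob=0.7, double_prob=0.2, seed=None):
    """Derive drum hits from melody onsets, returned as (onset, stroke) pairs."""
    rng = random.Random(seed)
    hits = []
    for i, onset in enumerate(melody_onsets):
        if rng.random() >= play_prob:
            continue  # the drummer sits this note out
        stroke = rng.choice(STROKES)  # which kind of note to play
        hits.append((onset, stroke))
        if rng.random() < double_prob:
            # Double up: add a second hit halfway to the next melody note.
            if i + 1 < len(melody_onsets):
                next_onset = melody_onsets[i + 1]
            else:
                next_onset = onset + 0.5  # assumed gap after the last note
            hits.append((onset + (next_onset - onset) / 2.0, stroke))
    return hits


def shift_player(onsets, max_shift=0.05, seed=None):
    """Make one player a little early (negative) or late (positive) throughout."""
    rng = random.Random(seed)
    shift = rng.uniform(-max_shift, max_shift)
    return [max(0.0, t + shift) for t in onsets]
```

Something like `bodhran_part([0, 1, 2, 3], seed=42)` gives a different drum line for each seed, and `shift_player` can be applied to each instrument's onsets separately before synthesis.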


I get what you're saying, buuut, just for fun:

Western Pop / Hip Hop blended with Arabic / Middle Eastern tunes https://www.youtube.com/watch?v=5BzkbSq7pww

Arabic or Indian / Asian styles can be measured, modified, and released in Western markets. https://www.youtube.com/watch?v=pxr0Cofbmpk


This seems to be almost working in a musical sense. (The Irish songs seem to be simple enough for the NN to understand them back to front; having a lot of 'similar' samples definitely helps too.)

The NN seems to be able to assemble the repeat sections with different endings, and to produce a song with two distinct sections.

But they seem to all be in the same key and time signature

The NN will imitate whatever it is trained on - it is just that some styles of music may require more powerful neural networks.

Your point about "Irish songs seem to be simple enough for the NN to understand it back to front" is a good one. There are a number of aspects of the compositional parameters here that I think make it much easier to generate this kind of music.

Firstly, it's modal harmony, not diatonic. In this case, dorian mode in A. If you've ever improvised in say, a whole tone scale, you notice that you can play almost anything and it sounds good. Modal music works in a similar way. A lot of the things you find in diatonic harmony: the tension between tonic and dominant, chord progressions in general, key changes, chromatic inflections, etc....are all absent in this kind of music. Which isn't to say anything bad about modal music, just that it's much simpler. Because of the added complexity of diatonic harmony, there are many many more ways the music could "go wrong" so to speak. Most people are so grounded in diatonic harmony that they would easily perceive even small mistakes (or statistically speaking, deviations from the norm) without necessarily being able to explain what rule, exactly, is being broken.

It's also monophonic music; there's just the one voice and no accompaniment (other than a completely static rhythmic accompaniment that was added to the performance).

Finally, even within this much simpler framework, I'd argue this tune gets it wrong in a big way: it doesn't know how to come to an end. It just sort of stops, in medias res. In a lot of folk music, you'll find that the way a tune is ended still tends to hearken back to diatonic harmony: some sort of motion from dominant (e), maybe even with a raised 7th scale degree to tonic (a) that's outlined by the melody. That doesn't happen here, which is why the tune sounds like it just got cut off.

I find these NN experiments in music generation quite interesting conceptually, but so far the results--as music--have been pretty disappointing. I suspect that you could actually build a model that would allow for algorithmic generation of folk tunes that would produce music that would probably be more satisfying. The number of rules that govern a lot of kinds of folk music is small enough that you could encode many or at least most of them in your model. [1] However, at the end of the day you'd still just have a model that would only generate a fairly limited spectrum of folk music--say Irish gigues, reels, hornpipes, etc--whereas the dream with NNs, Markov models and other statistical methods is that you could plug in any corpus of songs without understanding a thing about their harmony, form, structure, melodic patterns, etc. and get back music that sounds the same.

[1] and this is really massively simplifying on my part w/r/t the varied amount of folk music out there, some of it quite complex

Folk tunes are more complicated than they sound. Jigs, reels, etc are all classes of tune, with broadly similar features. But there are further sub-groupings based on date of composition, composer, and even location.

So if you feed an ML system a generic mix of folk tunes without understanding how the subgroupings work, you'll get a messy blob of musical data out. It will sound sort-of interesting in a work-in-progress way, but you will always have to hand-edit it to get something acceptable. And even then it will probably be mediocre rather than memorably great. And if it sounds at all good, you'll likely find you've created a mashup machine, not a true composer.

Really, it's like training an ML on "ballads". You'll get a few features that are similar, but everything else will be too noisy to be anything other than a crude attempt.

So I think good musical imitation is probably a lost cause, because the rules are so complex and contingent, even for "simple" music, that there simply isn't enough consistency to do the job.

At the same time the differences from the template create recognisable styles, which have emotional and other associations. So the differences are significant in their own way - but even noisier as a recognition problem.

People interested in the history of computer-generated music should look into David Cope's experiments in musical intelligence: http://artsites.ucsc.edu/faculty/cope/experiments.htm

This is really interesting, but it also makes the failure points of computer generated music evident. Namely: it works best with simple, "jammy" music; it requires a large existing base of music to pull from; and it has no way of using compositional techniques for rhetorical effect. (Increasing/decreasing tension, "going somewhere", expressing emotions, etc.) In other words, it can't create innovative music or music that has something interesting to say. It can only recombine the past.

I like the idea of using these tools for educational purposes, however. Can we derive the "rules" for music of different cultures by feeding them through this kind of algorithm? If so, it would give us a fantastic insight into different musical traditions around the world, even if it couldn't write that music for us.

> In other words, it can't create innovative music or music that has something interesting to say. It can only recombine the past.

So far. I hope we'll be seeing progress on that front in the future, too.

The generated snippets have a lot of pentatonic intervals. Of course pentatonics make music more harmonic, but we are used to more dissonance adding color. I cannot look at the sheet music, but I guess the generated music has less dissonance than original Irish fiddle.

Apologies if this is a stupid question, but what is meant by ABC tune in this context? It seems to be a method of musical notation?
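For context on the question above: ABC is a plain-text music notation that is widely used for folk tunes. A tune starts with header fields such as `X:` (reference number), `T:` (title), `M:` (meter), `L:` (default note length) and `K:` (key), followed by the notes as letters with `|` for bar lines. The sketch below splits a tune into those two parts. The tune itself is made up for illustration and is not from the generated set, and `parse_abc` is a hypothetical helper that ignores most of the real ABC standard.

```python
# A made-up four-bar tune in A dorian, 6/8 time, written in ABC notation.
ABC_TUNE = """X:1
T:Example Jig (made up for illustration)
M:6/8
L:1/8
K:Ador
|:EAA ABc|BAG AGE|EAA ABc|BAG A3:|
"""


def parse_abc(text):
    """Split an ABC tune into a dict of header fields and a list of body lines."""
    header = {}
    body = []
    in_body = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        is_field = len(line) > 1 and line[0].isalpha() and line[1] == ":"
        if not in_body and is_field:
            header[line[0]] = line[2:].strip()
            if line[0] == "K":
                in_body = True  # the K: field is the last line of the header
        else:
            in_body = True
            body.append(line)
    return header, body


header, body = parse_abc(ABC_TUNE)
print(header["M"], header["K"])  # meter and key
print(body[0])                   # the notes, one character at a time
```

Because a tune is just a short string of characters like this, a character-level model can be trained on a large collection of them directly, which is what makes ABC convenient for this kind of experiment.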

It has a lot of similarity with human-made trad music, but to my ear it sounds very different from music. Something is definitely missing. Inflatable dolls look like the real thing, somewhat, on the surface... still a long way to go for synthetic music to fool a musician's ear, IMHO...

This type of stuff is super cool.

Submitting this twice simultaneously may be causing more confusion than anything. I didn't see this one at first and only commented on the other one (which seems to be magically higher on the front page):


Thanks. We've now merged the threads.

You could use LZW or prefix trees for assisting music composition. Rather less mystification with those tools though.
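As a rough illustration of the prefix tree idea above, here is a small sketch that stores short runs of notes in a trie and suggests what usually came next. The class names, the `depth` limit and the ranking rule are assumptions made for this example, and the note tokens are just strings.

```python
class Node:
    """One trie node: how often this path was seen, plus its continuations."""

    def __init__(self):
        self.count = 0
        self.children = {}


class PrefixTree:
    def __init__(self, depth=4):
        self.depth = depth  # longest run of tokens stored as one path
        self.root = Node()

    def add(self, tokens):
        """Insert every window of up to `depth` tokens as a path from the root."""
        for start in range(len(tokens)):
            node = self.root
            for token in tokens[start:start + self.depth]:
                node = node.children.setdefault(token, Node())
                node.count += 1

    def _walk(self, path):
        node = self.root
        for token in path:
            node = node.children.get(token)
            if node is None:
                return None
        return node

    def suggest(self, recent, top=3):
        """Try the longest recent context first, then back off to shorter ones."""
        longest = min(self.depth - 1, len(recent))
        for k in range(longest, 0, -1):
            node = self._walk(recent[-k:])
            if node is not None and node.children:
                ranked = sorted(
                    node.children.items(),
                    key=lambda item: (-item[1].count, item[0]),
                )
                return [token for token, _ in ranked[:top]]
        return []


tree = PrefixTree(depth=4)
tree.add(["E", "A", "A", "A", "B", "c"])
tree.add(["E", "A", "A", "B", "A", "G"])
print(tree.suggest(["A", "A"]))  # most frequent continuations first
```

Unlike a neural network, every suggestion here can be traced back to exact passages in the training tunes, which is the "less mystification" part. The flip side is that it can only ever replay continuations it has literally seen.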

I thought web pages that automatically played sound went out with the nineties.
