Classical music generation with recurrent neural networks (hexahedria.com)
443 points by hexahedria on Aug 8, 2015 | 120 comments

As a student of classical music, I'd like to point out what I believe to be the main in-humanness in the generated music that makes it stray from "real" classical.

Some have cried violation of counterpoint, but I don't think that is such an issue here. CP is mostly about how notes are linked together on a small scale, and this is rather the network's strong point: it seems to concentrate on the associativity between notes and between chords, and any given moment seems to be composed with a vision that only extends to a few bars (or even just a few beats) around that moment.

The main problem is therefore large-scale structure. For one, recurrence of melodies (at key moments and transpositions) is crucial to creating the emotional value of classical music. None of this appears here.

And secondly, possibly the greatest shortfall of the present neural network (I'm ignoring performance, of course) is the harmonic structure. Classical music, let's say later than medieval and earlier than late-romantic in style, generally has the harmonic structure of a recursive cadence. Harmonic cadences are what give emotional power to harmony, but this NN is painfully incapable of creating any.

That being said, I don't think this problem is inherent to the approach of creating music with NNs. Right now it sounds like what you'd get with a well-crafted Markov chain, but NNs can go beyond this, and this article is exactly the kind of thing that will instigate this evolution.

I agree, I kind of feel like there should be a couple different layers of generation.

A "planning" layer that lays out the song plan (ABACABBA, etc.)

A composing layer that fills in those sections. And maybe even generates some slight differences between the same-named sections for variety.

A performance layer that plays it back with a simulation of human performance metrics (slight jitter to note placement, emotive crescendos, suggestive variations in note-length, etc.).

Maybe this kind of thing can also be learned by a secondary NN. It just needs to be trained with data collected over large scale sections of the example music.
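As a rough illustration of those three layers (everything here is hypothetical: the function names, note choices, and jitter amounts are invented for the sketch, and each layer would be a trained model rather than random choice):

```python
import random

def plan_form():
    """Planning layer: choose a large-scale song form."""
    return random.choice(["ABACABA", "AABA", "ABAB"])

def compose_section(name, length=8):
    """Composing layer: fill a named section with MIDI note numbers.
    The same letter always reuses one seed melody, slightly varied."""
    random.seed(name)                     # same letter -> same base melody
    base = [random.choice([60, 62, 64, 65, 67, 69, 71]) for _ in range(length)]
    random.seed()                         # re-randomize for the variation
    varied = list(base)
    i = random.randrange(length)
    varied[i] += random.choice([-2, 2])   # one slightly altered note per pass
    return varied

def perform(notes, base_dur=0.5):
    """Performance layer: add timing jitter and duration variation."""
    t = 0.0
    events = []
    for n in notes:
        jitter = random.uniform(-0.02, 0.02)          # slight note placement jitter
        dur = base_dur * random.uniform(0.9, 1.1)     # suggestive length variation
        events.append((round(t + jitter, 3), n, round(dur, 3)))
        t += base_dur
    return events

form = plan_form()
piece = [note for section in form for note in compose_section(section)]
performance = perform(piece)
```

The point is just that form, content, and performance are separable concerns, each of which could be learned independently.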

But this NN doesn't solve the greatest problem in Classical music: that only 3% of people take the time to appreciate it.

Let's design a neural network to appreciate classical music, then spawn a few billion instances. Greatest problem: solved.

http://xkcd.com/1546/ comes to mind.

Why do you think that is the greatest problem? If it's a concern about audience numbers, what's the goal and why? In classical music, my experience is that there are many more highly accomplished practitioners than can ever be supported by audience demand, but that's only a problem if there's an argument for a set goal for audience figures. There are of course thousands of pursuits which demand time for their appreciation.

Greatly more than 3% of people appreciate classical music. I've been to countless sold-out classical symphonies. It may not be their favorite genre, but people pay attention and enjoy it. It's only 3-7% of new music sales in stores, which is all the media wants you to think about.

For pop-rock and other styles of mainstream music there are massive libraries of chord progressions that analyse the structure (or big picture) of all the "hits". Those would be really helpful with the "planning" layer... i.e.:


I bet you could get much more accurate results by feeding all those progressions in as one of the reference layers.
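Such a "progression library" reference layer could be as simple as a table of Roman-numeral progressions expanded into concrete chords (a toy sketch; the progression list and key handling here are illustrative only):

```python
# A few well-known progressions, e.g. the ubiquitous I-V-vi-IV
PROGRESSIONS = [
    ["I", "V", "vi", "IV"],
    ["I", "IV", "V", "I"],
    ["ii", "V", "I"],
]

MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]          # semitone offsets from the key root
DEGREES = {"I": 0, "ii": 1, "iii": 2, "IV": 3, "V": 4, "vi": 5, "vii": 6}

def expand(progression, key_root=60):  # 60 = middle C, so C major by default
    """Turn Roman numerals into MIDI triads built on the major scale."""
    chords = []
    for numeral in progression:
        d = DEGREES[numeral]
        root = key_root + MAJOR_SCALE[d]
        third = key_root + MAJOR_SCALE[(d + 2) % 7] + (12 if d + 2 >= 7 else 0)
        fifth = key_root + MAJOR_SCALE[(d + 4) % 7] + (12 if d + 4 >= 7 else 0)
        chords.append((root, third, fifth))
    return chords

# expand(["I", "V", "vi", "IV"]) gives the C, G, Am, F triads in C major
```

A planning layer could then condition the note-level generator on one of these expanded progressions instead of letting harmony drift freely.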

There is a series of works by François Pachet and his collaborators at Sony's research lab in Paris that uses HMMs with constraints to obtain something that preserves long-range structure. I don't have the links at the moment, but he got impressive results.

Will "NN Teacher" be a job title soon? In the present situation we have an NN who learned empirically and who needs a bit of structure: "Here are the sound plans you need to study, here are the melodies, here are the styles and trends".

You mention markov chains and as a student of English literature I'd like to point out what I believe to be the main in-humanness in generated prose that makes it stray from "real" prose.

Some have mentioned pointless sentences, but I don't think that is such an issue for modern literature. Modern prose mostly has to do with what words come after each other on a small scale, such as a single phrase, or the associativity between words and between sentences, and at any given moment most words are only used for a few sentences, then replaced by other words.

The main problem is therefore large-scale structure, chapter to chapter. For one, the recurrence of a theme throughout several volumes of work is crucial to creating its emotional value and cohesiveness. None of this appears in Markov chains.

And secondly, possibly the greatest shortfall of Markov Chains (ignoring performance) is in plot. They're just not very imaginative. Prose, let's say later than ancient epics, has interesting, recursive plot devices. Plot is what gives emotional power to sentences, and most markov chains are painfully incapable of creating any.

That being said, I don't think this problem is inherent to the approach of writing prose using Markov Chains. Right now it sounds like what you'd get with a crowd-sourced writing prompt, but Markov Chains can go beyond this - and literary criticism is what it will take to bring Markov Chains up to the standard of popular writers like Stephen King.

It's pretty clear that in the near future, we will laugh at the idea of reading something that someone hand-typed from their own thoughts, instead of simply reading the output of a one-dimensional random walk. I hope if I have grandchildren who are as interested in modern literature as I am, their degree will be a bachelor of science, with the only work they have to do being to quantify and remove any remaining guesswork from the science of computer-generated writing. Even today, it is hard to understand why anyone still reads writing written by someone. It's as quaint as a telegram.

The main path going forward in improving Markov chains is expanding the individual "units" that are selected by these chains. Current Markov chains only randomly select words to string together into semi-coherent thoughts, but what if they instead randomly selected paragraphs, or even entire chapters? This would promote large-scale structure within the book, and ensure the plot will always be varied and engaging.

I hope that your grandchildren's Bachelor of Science would train them in curating brilliant works of art that can then be fed into the Markov chain to perform the one-dimensional random walk.

I was actually making fun of this way of producing "art" (such as music.) Obviously since the algorithm has no human 'state of mind' or emotion, it cannot include any in the art whatsoever. My comment was completely satirical.

My comment was satirical too actually. The whole idea of thinking computers have created something "new and original" just by shoving together bits and pieces of human-produced works of art seems somewhat ridiculous. It's not entirely ridiculous, as the computer did do some manual labor of piecing together the words, but ultimately, you are relying on a corpus of pre-existing work and then reassembling it together. Any emotion (or indeed, most of anything good) comes from the corpus, not from the bot analyzing and imitating the corpus.

Precisely. Compare the example in the post with this: https://www.youtube.com/watch?v=ZevgEUVeZ9Y

Notice the overarching theme, the experience that the music "shows you", the arc it leads you through. In comparison, the generated music is but a chain of pleasant and nice-sounding moments, with no overarching emotional arc uniting them.

Now is that the exclusive domain of humans? Will NNs eventually write music that passes for a human composition to a skilled human jury (a musical Turing test)?

Yes. It sounds good at the scale of a few seconds, but then it's clear there's no higher level structure. That's about what you'd expect from a recurrent ANN. Nice demo, though.

> but NNs can go beyond this, and this article is exactly the kind of thing that will instigate this evolution.

Thought experiment: one day it is not uncommon for some deeply moving, emotional works to be composed entirely by computers (though performed by humans). How will we, as a society, react to that? Will we reject it as being 'unauthentic', or use it as an opportunity for introspection?

It would even probably not be unreasonable to think that these NNs will be given names, and different NNs will be characteristically different, so then you might see a Wikipedia page or an album from a specific NN in much the same way we'd see a human composer today. How weird would that be?

Edit: or, even, NNs trained on specific composers (and the music they'd have heard), to try and create 'new' works from existing long-dead composers, or even just to complete famous half-finished works. How blasphemous would that be? But for a sufficiently competent NN, I can imagine the output might be very interesting.

Edit 2: (apologies, but this is really out there) I have also wondered how much information is sufficient to 'recreate state'–that is to create an 'AI' that passably mimics a specific real person (kinda like a Turing Test), and in that sense creates a pseudo-immortality.

Arguably, what makes a worthy creative different from someone just spewing out content is their ability to filter output by applying their taste. There’s the never-ending loop where you create and evaluate and necessarily throw away.

A person with developed taste who ‘filters’ someone else’s creation, recognizes great work, and helps the creator shape their output is thus a part of the above loop, and depending on specifics may be of importance comparable to or greater than that of the creator.

The balance between the contributions of the different entities that cause a work to be created and known, or a style to be formed, is always fluid (among singers, performers, front-personas, authors, producers, mentors, labels, etc.). Replace one of the components with a computer and the overall picture doesn’t change much.

If a piece of music was generated by software, then whoever set that software up and filtered its output is the creator. That may include the programmer, those who were using the software, and other people who directly influenced the creation in a significant way.

If a person using some generative algorithm doesn’t feel like their input was substantial enough, they might use a pseudonym. Attributing music to a computer explicitly would be a purely marketing move, and it doesn’t change the fact that the author is always a conscious being (or beings), which, unless we’re in the singularity, a computer isn’t.

> Attributing music to a computer explicitly would be a purely marketing move, and it doesn’t change the fact that the author is always a conscious being (or beings), which, unless we’re in the singularity, a computer isn’t.

I don't know. I'm sympathetic to this view (and for the record, I wasn't going anywhere near a hard AI/singularity argument), but on the other hand, I think after enough iterations you won't be able to find where that human input actually comes in. When we finally get a NN that produces something actually great, will we be able to point to a specific line of code, or a specific input, or a specific programmer whose taste resulted in that? We already struggle to understand the inner workings of neural nets.

So you can argue the 'taste' step comes into the selection process. Somebody has to sift through the output of the NN to choose what's good and what isn't. But what if that's automated? A different output to a different member of the population, so then the NN can test itself, and it decides what is worthy of output on a larger scale? Then you can't point to any one individual either.

So it's a semantic point. I think you're right, fundamentally. But I think we can very quickly reach a point where we have to travel through a very long rabbit hole to get back to that key human influence.

>Thought experiment: one day it is not uncommon for some deeply moving, emotional works to be composed entirely by computers (though performed by humans). How will we, as a society, react to that? Will we reject it as being 'unauthentic', or use it as an opportunity for introspection?

>It would even probably not be unreasonable to think that these NNs will be given names, and different NNs will be characteristically different, so then you might see a Wikipedia page or an album from a specific NN in much the same way we'd see a human composer today. How weird would that be?

I know you're talking about Neural Networks. But I'd like to point you to Vocaloids. As far as names, personalities, and music go - it's a good match for what you're getting at. Hell - one of them is even famous for having concerts! [0]

Hatsune Miku, Rin Kagamine, Luka Megurine, Neru Akita

[0] https://www.youtube.com/watch?v=dhYaX01NOfA

I really love your critique, because it seems consonant with the work my lab does. We work with EM neuron images, and we have a DNN trained on image patches that are quite small, since the training cost is quite high. If I recall correctly, it was trained for months using 7x7x7 patches on several 100^3-voxel volumes. The output of the network was fractured, and at the time it was the best we could do. However, because humans can see the bigger picture (bigger than 7x7 patches on 2D image displays at a time), we can resolve splits in the image segmentation that the AI is bluntly unable to. It's worth noting that my lab uses convolutional neural networks (CNNs), not recurrent neural networks (RNNs).

It seems to me that this is a technical problem of being able to train and run a large enough network to approach human abilities in pattern recognition. Sometimes this is easier than others.

However, a disclaimer, though I hang out with machine learning people, I'm not yet one myself.

The author could change the generation of the piece by generating a short theme, fixing that theme at particular points in the note sequence, then generating the gaps, and repeating this as recursively as he wants.

This would provide some recurrence of melodies and would sound infinitely better. The problem with these kinds of models is the window-based structure, with no notion of an overall theme. The theme would then be added by the human, and everything else can be filled in using the model.
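A toy version of that theme-pinning idea might look like this, with a dummy stand-in for the trained network's conditional sampling (all the names here are invented for the sketch):

```python
import random

def dummy_model_fill(length, context):
    """Stand-in for conditional sampling from the real model:
    in practice you would sample notes conditioned on `context`."""
    random.seed(str(context))             # deterministic for the demo
    return [random.choice([60, 62, 64, 67, 69]) for _ in range(length)]

def generate_with_theme(theme, anchors, total_len):
    """Pin `theme` at each index in `anchors`; the model fills the rest."""
    piece = [None] * total_len
    for a in anchors:
        piece[a:a + len(theme)] = theme   # fix the theme at key moments
    i = 0
    while i < total_len:
        if piece[i] is None:
            j = i
            while j < total_len and piece[j] is None:
                j += 1                    # find the end of this gap
            context = [n for n in piece if n is not None]
            piece[i:j] = dummy_model_fill(j - i, context)
            i = j
        else:
            i += 1
    return piece

theme = [64, 62, 60, 62, 64]
piece = generate_with_theme(theme, anchors=[0, 16, 32], total_len=40)
```

The recursion the parent mentions would just apply the same pinning at coarser scales (sections as "themes" within the whole piece).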

Yes, in short the music doesn't make sense - there is no story in it, no message. But it's still impressive that the network generated this music.

Why can't it just be the story of a random walk through classical-music-space?

It can, of course. The same way I can just produce random noise and say it is music (avant-garde), and those who don't like it just understand nothing about art.

Or, a more illustrative analogy: suppose it was generating text instead of music and produced a sequence of words - some unfinished sentences, maybe even complete sentences, maybe even sentences that sometimes share subjects (names). But the text as a whole does not deliver any story. You might say: "maybe it's a story, where all these words follow each other". Well, maybe, for someone. Maybe it is avant-garde poetry.

But for me this opus has no real meaning.

Sort of like 'Finnegans Wake' :)

Finnegans Wake is packed full of self-referential metadata and has a highly coherent structure.

In this sense, it's relevant to the discussion because it may appear to have been created by Markov chains whereas in fact it's intelligently molded and makes (more and more interesting) sense from the many perspectives you start to have as you spend your life with it.

Excellent project here, which even from reading the first page you'll have an a-ha moment: http://www.wakeinprogress.com/2010/10/introduction-to-charac...

Because for it to be that you'd need a workably good - if not fully complete - definition of classical-music-space. And this clearly doesn't have that.

Because no message is not a message. Not even a medium.

Well, this was trained on select pieces from 25 composers. Maybe it would be better trained on a single artist's catalog or something like genre. Pandora could do interesting things with their Music Genome Project.

I would say the next step is analyzing structure, but this makes writing music stupid easy. Just wait for something interesting and "everything is a remix" it.

> That being said, I don't think this problem is inherent to the approach of creating music with NNs. Right now it sounds like what you'd get with a well-crafted Markov chain, but NNs can go beyond this, and this article is exactly the kind of thing that will instigate this evolution.

That's a very thoughtful conclusion.

Since you are a classical music student, I'd like to ask: did you think the sample piece on the page was truly classical?

I'm no expert, but to my ear it sounded more like a later Baroque period piece, or very early Classical at most.

In your opinion, is it possible to learn classical music theory without being a musician? Also, is it something that can be learned by reading and listening, or does it require a teacher to be present?

I think that the doctrine "do, and understand" applies particularly well to classical music theory and composition, though I'm very biased in that regard. My post above mostly concerns the "theory of harmony", and my recommended way of learning that is to grab a textbook on the subject (one with exercises) and do the composition exercises. No need to play any instrument; as a geek you can use any software that features a good piano roll. And here's the most opinionated part: a teacher is useful but not strictly necessary.

As someone who studied music but never got the commitment and discipline to play it well, I can tell you that learning music theory can be easier than playing it - at least for people like me.

This is strange because I can paint and sculpt very well, but playing is such a struggle for me... I really envy people that can play or sing naturally.

It sounds kind of like a mashup of Philip Glass and Mozart.


It is hard to look at this post and its results and not be reminded of Lady Lovelace's words from nearly 200 years ago.

"[The Analytical Engine] might act upon other things besides number, were objects found whose mutual fundamental relations could be expressed by those of the abstract science of operations, and which should be also susceptible of adaptations to the action of the operating notation and mechanism of the engine...

Supposing, for instance, that the fundamental relations of pitched sounds in the science of harmony and of musical composition were susceptible of such expression and adaptations, the engine might compose elaborate and scientific pieces of music of any degree of complexity or extent."[0]

Edit: s/300/200/. Thanks to icebraining I stand corrected.

[0] https://en.wikipedia.org/wiki/Ada_Lovelace#Conceptual_leap

It's worth pointing out that the idea to have automata make music predates Lovelace. Music making automata were a staple of the renaissance. For example the mathematician and astronomer Johannes Kepler, when visiting the "Kunstkammer" of Rudolf II in 1598, was amazed at an automaton representing a drummer who could "beat his drum with greater self-assurance than a live one" [1].

Incidentally, Kepler corresponded with Wilhelm Schickard on the latter's "arithmeticum organum", the first ever proper mechanical calculator (it could do addition, subtraction, multiplication and division).

Automating creativity was very much an idea with much currency in the renaissance. Indeed some of the key advances in mechanical automata, which later evolved into computers, were driven by the desire to automate creativity [2]. The "conceptual leap" that some people lazily ascribe to Lovelace wasn't hers!

[1] Jessica Wolfe, "Humanism, machinery, and Renaissance literature".

[2] Douglas Summers Stay, "Machinamenta: The thousand year quest to build a creative machine". Associated blog: http://machinamenta.blogspot.com

Sorry for the well, actually, but it's nearly 200 years, not 300 :)

This is fantastic! I've been meaning to make an RNN to generate EDM.

For beginners, I'd like to summarize some excellent resources to start:

- Coursera course by Andrew Ng (he explains everything magically. The best course I've done online) https://class.coursera.org/ml-003/lecture

- Neural Networks and Deep Learning http://neuralnetworksanddeeplearning.com/ (I highly recommend trying to write your own backprop and MNIST dataset classifier. I wrote mine in JS and it gave me a lot of confidence)

- Oxford ML class (2015) https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearni... (perhaps the most recent MOOC on ML online. Things are progressing so fast in deep learning that it's worthwhile to do multiple ML courses to get different perspectives)

- I also enjoyed http://karpathy.github.io/2015/05/21/rnn-effectiveness/ for inspiration, and the video https://www.youtube.com/watch?v=xKt21ucdBY0 is a FANTASTIC summary. HN earlier pointed out that the course videos are out: http://cs224d.stanford.edu/syllabus.html

- Again for inspiration into big challenges, this talk from Andrew Ng is a good one https://www.youtube.com/watch?v=W15K9PegQt0

When char-rnn was announced I did some crude hacking and started throwing MIDI files at it. It worked surprisingly well (although, TBH, still quite badly), and it very quickly discovered avant-garde prog jazz:


This is something that has crossed my mind recently... Did you reach any conclusions/results? How did you layer the different MIDI tracks? It would be pretty cool to build a VSTi or AU with some AI implemented...
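For anyone curious, one crude way to flatten MIDI-like events into text that char-rnn can train on - a guess at the general approach, not necessarily what the earlier commenter did - is to serialize each note event as a token and interleave tracks by onset time:

```python
def events_to_text(tracks):
    """tracks: list of lists of (start_time, pitch, duration) tuples.
    Returns one line of space-separated tokens like 't0:p60:d2', with
    tracks interleaved by start time so simultaneous notes from
    different tracks end up adjacent in the text."""
    merged = []
    for track_id, events in enumerate(tracks):
        for start, pitch, dur in events:
            merged.append((start, track_id, pitch, dur))
    merged.sort()                     # order by onset time, then track
    tokens = ["t%d:p%d:d%d" % (s, p, d) for s, _, p, d in merged]
    return " ".join(tokens)

melody = [(0, 72, 2), (2, 74, 2), (4, 76, 4)]
bass = [(0, 48, 4), (4, 43, 4)]
text = events_to_text([melody, bass])
# concatenate this over many files and feed it to char-rnn as plain text
```

Decoding the sampled output back into MIDI is the mirror image: parse each token and re-split by track or just dump everything onto one channel.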

When people started switching to online streaming services, like Spotify, I always hoped that something like this would replace them. This isn't quite there yet, and all we really have is the 'classical station', but wouldn't it be great if there were just continuously generated genre-specific streams like this?

In the future we can call it 'radio'.

Edit: Not sure why this seems to be attracting downvotes. Just to clarify: I'm completely serious. I wish I could just tune in to this neural net for the next few hours, and it strikes me as a perfect form of what we call radio now.

I actually think for certain genres of music this could work.

Lots of pop, country, top-40 rock, and certain jazz subgenres could probably all just be generated continuously - streamed through some kind of low-bandwidth description language, with local synthesis building the song up.

In certain milieus (shopping malls, background music, elevators) nobody would even notice.

xoxos, who's created some of the nicest algorithmic music generators out there, asks us in a post [1] to imagine a future where creators of algorithms are considered in the same way as today's musicians, and their algorithms are considered much like today's recorded music. The audience/performers, instead of playing an audio file, could "play" an algorithm.

I think we're still a few generations away from such a thing becoming mainstream, but I'd love to be proven wrong!

[1]: http://www.kvraudio.com/forum/viewtopic.php?p=5210459#p52104...

When you said "play an algorithm", it reminded me of this idea I had where I wanted to play back the execution of a running program and map the assembly or IL to notes/frequencies/instruments/sound. Literally, "playing the code as if it were music".

Then I had this strange thought - what if you could monitor the cacophony of your running systems, and detect a problem, or a certain event, just by the presence of a particular audio theme or tune. I bet an infinite loop would be pretty annoying and obvious. Just as long as the server getting overloaded doesn't sound like getting rick-rolled ("Never Gonna Give You Up" by Rick Astley).
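A playful sketch of that idea (the opcode names and the pitch mapping are arbitrary choices; a real version would hook a tracer or profiler): hash each executed operation to a MIDI pitch, then convert it to a frequency with the standard f = 440 * 2^((n-69)/12) tuning formula.

```python
def op_to_pitch(op, low=48, high=84):
    """Map an operation name to a MIDI note number in [low, high)."""
    return low + sum(ord(c) for c in op) % (high - low)

def pitch_to_freq(n):
    """Standard MIDI tuning: A4 (note 69) = 440 Hz."""
    return 440.0 * 2 ** ((n - 69) / 12)

# The "trace" here is just a hard-coded list of opcode-like names;
# hooking sys.settrace or a profiler would produce it live.
trace = ["LOAD_FAST", "LOAD_CONST", "BINARY_ADD", "STORE_FAST"]
melody = [round(pitch_to_freq(op_to_pitch(op)), 1) for op in trace]
```

An infinite loop really would be obvious: the same short pitch pattern repeating forever.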

People did this with old computers that emitted lots of radio interference like the PDP-1. It was possible to debug in exactly the way you're describing by listening to a radio held up to the CPU.


This is really cool stuff - the network structure reminds me a lot of Graves' MDRNN[1] and Grid LSTM[2], as well as some work I helped with (ReNet [3])

I wonder if the structure over frequency/time is too "regular" - in general for sound the frequency correlation and the time correlation are on wildly different scales.

Also if you are looking to go farther you might consider adding NADE or an RBM [4] on top, or latent variables in the hiddens [5][6] to add more stochasticity.

There was some alternate work by Kratarth Goel extending RNN-RBM to LSTM and DBN; it might give you some ideas to look at [7]. I know when we messed with bidirectional LSTM + DBN for MIDI generation it led to the kind of "jumbled/dissonant" sound you seem to be having - don't know what to make of it here. You might consider bi-directionality over the notes, though it makes the generation way more annoying.

Awesome work! I will definitely be sharing around and checking out your code.

[1] http://arxiv.org/pdf/0705.2011.pdf

[2] http://arxiv.org/abs/1507.01526

[3] http://arxiv.org/abs/1505.00393

[4] http://www-etud.iro.umontreal.ca/~boulanni/ICASSP2013.pdf

[5] http://arxiv.org/abs/1411.7610

[6] http://arxiv.org/abs/1506.02216

[7] http://arxiv.org/pdf/1412.6093.pdf

The biggest reason I want this kind of thing to succeed is to see a piece of artificially-generated music pass the "musical Turing test". If you play a piece of generated music for someone, (no matter how good it is), as long as you tell them that an algorithm produced it, you'll always have that guy who tells you "it doesn't have the 'soul' that real music has".

So make them prove it. Put them through numerous statistically significant double-blind tests with human-vs-computer generated music. I think at some point in the near future no one will be able to legitimately claim that there's anything "magical" about human-produced art.

I'm pretty sure David Cope has passed that point already with his algorithmic techniques. However, the kind of formal double-blind that you suggest would be difficult to put together in a way that everyone agrees upon unanimously. It's also a problem that most of his algorithmically created music has never been performed and only exists as poor mock-ups of a performance.

I've thought a lot about this. It's a very interesting problem to construct a musical Turing test because music is such a relative and personal thing. The "best" works of all time are meaningless to someone with no relationship to the piece, the composer, or the time period. And the spectrum of material is very wide. You can have composers use all kinds of algorithms and methods as "composition tools" [1] to produce works that sound very algorithmic and repetitive to the unfamiliar [2] or seemingly random [3] all the way to the theorized "machine composer" that can also produce seemingly organic music or obviously algorithmic music. Which is more genuine: the algorithmic composition made by a human or the one by a machine/program? What if they use the same algorithmic tools?

Before one constructs a musical Turing test of music from machines vs. humans, I'd like to see a test first of whether people can discern that algorithmic or formulaic composition techniques were used in a (human-composed) piece, or whether it was all "organic" (I don't even know how to define that musically; composition is rarely a purely inspirational and complete endeavor - it's lots of editing, transformations, permutations, cut/paste, selective "not caring" in some spots). I think the answer to that question is a more insightful place to start, as I think the general public isn't aware of how many formulas and algorithms (conscious and subconscious) composers actually use to construct their music.

[1] https://en.wikipedia.org/wiki/Schillinger_System (interesting side note: Glenn Miller used the rhythmic tools of the Schillinger System to construct the catchy rhythm component and form of this little ditty you might have heard https://www.youtube.com/watch?v=xPXwkWVEIIw)

[2] https://en.wikipedia.org/wiki/John_Adams_%28composer%29

[3] https://en.wikipedia.org/wiki/Arnold_Schoenberg

I think that for some genres of music (or perhaps anything "not electronic"), another problem lies in audio synthesis. Part of that "soul" lies not just in the notes played, the volume they are played at, and the duration they are played, but also in manipulation of timbre, the type of attack, and all the other expressive factors that a skillful player of an instrument can bring out. As a brass and wind instrument player, it's painfully obvious to me whenever a song uses fake trumpets or cheesy synthesized shakuhachi. Even for something like violins providing chords behind a melody, it's not hard to discern when they are real and when they are fake. So if we really wanted to make it double-blind, we'd need the actual sound produced to be at a level of a professional musician -- I'd imagine having musicians play back generated sheet music wouldn't work, as they would necessarily impart their own interpretation.

But as another commenter said, part of what makes human art interesting is the life experience that forms it. If you reduce music to merely sound, then yes, nothing prevents artificially-generated music from competing against human music. Perhaps we will discover that nothing audible distinguishes the two. But the story of human life that produces the sound of Lee Morgan vs Miles Davis, or hearing the influence of a mentor like Katsuya Yokoyama in a pupil like Tajima Tadashi performing songs others have also performed, is to me as interesting as the music they produce.

Ultimately, any algorithm will possess biases present from its original authorship by a human who has tuned it to "sound right," which is shaped by their own life experiences... so can artificially generated music ever be considered purely artificially generated? (You could also go into things like: for music in non-equal-temperament scales, why did the algorithm's author pick a particular tuning?)

Finally, one thing I'd like to add is that I could see cases where abstract music would actually be more challenging than structured western music. An algorithm that could compose a compelling shakuhachi honkyoku would be impressive.

There's so much to love about this post - and I've only just glossed over the details.

I'm really impressed with the quality of the music that's come out of this.

It feels like there's not that much in the way of dynamics in it (every note is hit with the same force) - is that right? I suspect that these pieces, played by a professional who could add more of the human element to the feel, would sound really good. Obviously, that's sort of against the point, but then again, Ravel wasn't much of a pianist (apparently!) but he could compose amazing music – so it's not totally cheating.

Yeah, I simplified it by not using any dynamics in the generation process. I could probably add some version of dynamics using the MIDI velocity, actually, but I haven't done that yet. Also, I generated the mp3 files from MIDI using GarageBand, which doesn't help with the flat dynamics.
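One simple way that could work (purely a guess at an approach, not what the author has implemented): scale each note's MIDI velocity with a slow phrase-level envelope, so the output swells and fades instead of every note landing with the same force.

```python
import math

def add_dynamics(num_notes, base_velocity=80, depth=30, phrase_len=32):
    """Assign each note a MIDI velocity (1-127) following a slow sine
    'phrase' envelope: crescendo into the phrase, diminuendo out."""
    velocities = []
    for i in range(num_notes):
        swell = math.sin(2 * math.pi * i / phrase_len)   # -1 .. 1
        v = int(base_velocity + depth * swell)
        velocities.append(max(1, min(127, v)))            # clamp to MIDI range
    return velocities

vels = add_dynamics(64)
```

A learned version would of course predict velocity per note from the training data, but even a fixed envelope like this would break the flatness.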

The article says it was trained on a collection of MIDI music, which tends to lack the dynamic feel of live performance.

This is really a great application of RNNs. Could you comment on how long it took to train, once everything was set up?

Also, how long does it take to generate one of the songs, using the AWS instance you describe in the article?

I trained it for about 24 hours, although there didn't seem to be that much improvement after the first 12 or so. Generating a song with the trained network actually happens almost in real time. I'm tempted to try to make it continuously generate new music and stream it, but even the small cost for the instance would start to add up, so I haven't actually tried setting that up yet.

Another interesting data point: the learned set of weights ends up being about 15MB.

Nicely done, but wait, 15MB?? Clearly the model isn't big enough :) Are you in an under- or overfitting regime?

As I was listening through the samples it seemed to me that it would start out quite energetic and then converge on repetitive, slow chords. Any ideas on why this could be happening? Or perhaps it's not true.

Also, you should label the samples with numbers so that it's possible to refer to them easily. I liked 4th from bottom quite a bit in the beginning.

What's wrong with 15MB? A GoogLeNet model is only ~50MB, and it can recognize objects in real-world images with good accuracy.

Yes, and if you look at the images deep dream generates from that data, it's not even close. This isn't about recognizing music.

Actually, I was looking for a nice sound source for a pet project, gridmix.[1] If you'd be alright with it, I could continuously stream a live feed to one of the cells.

It would help demo the point of the service, and it would be a very fitting sound generator :)

What do you think?

Astonishingly good write-up, by the way. Very impressive!

[1] http://gridmix.fishing/

If you are ok with someone else setting up a service like this, could you please share the model data? I have some free computing power, and I wanted to play with deep learning for quite some time. (Also, If I set up a service like this, I'll link back to your blog post)

Not quite classical - "jazzical" perhaps? There is a freeform nature to the music that is more Free Jazz than classical, but the underlying classical style of the input is still readily apparent.

It might be interesting to feed it Jazz instead. Jazz in Jazz out.

Oddly enough, it has the characteristics of bad jazz; like porno music, but classical.

> It ended up costing about $5 for all of the setup, experimentation, and actual training.

This is the most amazing piece of information. I bet there are tons of low-hanging fruit that, thanks to the openness of academic papers on the subject and cheap hardware and computational power, can provide phenomenal results in the field, even for hobbyists.

I'll read this properly in the morning, but my first impression: I think if you want it to be classical music then it has to obey the rules of counterpoint, and the pitches should wander and resolve according to those rules. It's the polyphony and the interaction between the voices that sounds wrong. I'm not sure that counterpoint is a suitable job for an NN to figure out.

> I think if you want it to be classical music then it has to obey the rules of counterpoint, and the pitches should wander and resolve according to those rules.

Uh, that's polyphonic music. There's plenty of classical music that isn't particularly polyphonic. Most of post-Baroque music is more homophonic: voices tend to move together in chords rather than independently. Counterpoint still appears but it's not the foundation.

counterpoint is also used for the individual voices in the chords. you don't just bang out one chord after another. that's why you switch between different inversions as the chords progress - to avoid parallel 5ths etc

also in this case he is not working with chords. they are just pitches moving.

Yes. Sorry, but as a music geek, this is actually pretty terrible - not even close to bad first year composition student pastiche, and a long way short of David Cope's EMI, which is probably the current state of the art.

I wish coders who are trying to do something in a creative domain would learn the basics and not just assume they can throw some simple algorithms at an artform and get anything close to an acceptable result.

No one is going to take a coder seriously if they can't code fizzbuzz.

Here's a thing to know: all the arts have their own equivalents. If you don't know what they are, learn them. Then maybe you can start thinking about non-toy algorithms and data structures that are going to impress an audience that cares more about quality of output than implementation details.

Most people who start working with domain-specific knowledge find it's much harder than they think.

This isn't interesting because it generates the best algorithmic music; it's interesting because he trained it on data and didn't implicitly encode (very much) music theory into it. In other words, the fact that it sounds musical at all is due to the power of the neural network, and not to a carefully human-curated set of music-theoretical constraints within which an RNG merely selects the remaining free variables. It's learning music by reading music, and it's generating music from a model constructed from the data itself.

But that's not actually true. It's a note sequence mash-up machine, not a music theory machine.

That's the point I'm making. You'd get similar-sounding results by taking semi-random snippets of the source data and splicing them together with a tiny bit of glue logic.

The NN is more or less doing that anyway, but by more roundabout means.

It's a long way from there to being able to say that it has a non-trivial model of classical theory.

The point of this type of modeling is to see how far a black box can get. I don't think anyone is claiming this LSTM is creating "state of the art" art.

Although I agree with you about the music part, the truth is the post is actually about proving the opposite. Sure, you have scales and counterpoint and harmony to care about, but given enough computational means, a machine will eventually figure them out from previous examples without knowing the theory.

The title of the post as it appears here on HN is misleading. Actually the original title does not mention classical music, and the original post as a whole mentions the word classical exactly once, when referring to the Classical Piano Midi Page[0], from which he took the material he used to train the neural network. So in a way the entire discussion here is misguided.

The post is interesting, I guess, for its discussion of neural networks, but I fear that attempting to train an AI system for 24 hours to produce anything which purports to imitate an artform developed by countless generations of human beings is a bit pretentious.

As an aside, I find it a bit alarming that people are seemingly so eager to generalize the term "classical music", without taking into account that it refers to a field which is almost infinite in its diversity of forms and styles.

[0] http://www.piano-midi.de/

I don't think the conversation is misguided. The meaning of a word varies with context, and this neural network was not trained on, say, Hindustani classical or twelve-tone music.

The vagueness of "classical music" can be annoying sometimes, but in this case people approximately agree on a contextual definition. In the top comment, a precision is brought: "later than medieval and earlier than late-romantic in style", which encompasses almost all of the set used to train the neural network (and consequently, the style that the network is trying to reproduce). Sure, it's a broad range, but the theory that can be used to study it is surprisingly detailed and generalizable.

For anyone needing a collection of MIDI files to experiment with, here is an excellent dump.


Starting with classical music is far too ambitious. Music can be viewed as having five major attributes:

- Melody

- Harmony

- Timbre

- Rhythm

- Form/structure

In classical music typically all five are important. I think it would be better to start with techno (the dance music subgenre, not electronic music in general), where only rhythm and timbre are important. I think this has much better chances of generating something enjoyable to listen to.
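To make the "only rhythm and timbre" point concrete, a techno-style pattern needs almost no machinery. Here is a toy step sequencer; the instrument names, the forced four-on-the-floor kick, and the trigger probabilities are all invented for illustration, not taken from any real system:

```python
import random

# One rule per instrument: given a 16th-note step index, does it trigger?
# Kick and clap are deterministic; hi-hats are probabilistic, favoring
# offbeats. All the numbers here are made up.
PATTERN_RULES = {
    "kick": lambda step, rng: step % 4 == 0,        # four on the floor
    "clap": lambda step, rng: step % 8 == 4,        # backbeat
    "hat":  lambda step, rng: rng.random() < (0.9 if step % 2 == 1 else 0.3),
}

def generate_bar(seed=0):
    """Generate one 16-step bar as {instrument: [bool] * 16}."""
    rng = random.Random(seed)
    return {name: [rule(step, rng) for step in range(16)]
            for name, rule in PATTERN_RULES.items()}

bar = generate_bar()
for name, hits in bar.items():
    print(f"{name:>4}: " + "".join("x" if h else "." for h in hits))
```

Even something this crude produces a recognizable genre skeleton, which is the parent's point: the structural bar for rhythm-driven genres is much lower than for classical form, even if the timbre/sound-design bar is not.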

> where only rhythm and timbre are important.

The problem is, the fewer elements you have, the more weight each one carries and the higher quality it needs to sound "good". When you look at electronic music genres that focus only on timbre (often called sound design), people spend years perfecting their craft. And while we have some broad notions of how to construct pleasing melodies and harmonies without first listening to them, I doubt anybody can construct new sounds and have them sound good first go, without a human ear to guide the process (which is what we are asking a program to do). Sound design simply doesn't currently have the depth of analysis and understanding that melody and harmony have.

What composition algorithms lack is not the ability to compose like humans, but a life that would give them perspective, and a story, a context, to compose within.

Music is (dis)harmonies over rhythmic patterns. There isn't anything inherently artistic about humans that computers can't replicate with time. Even the ability to compose an original song isn't beyond the algorithms.

The irony is that performing musicians are actually striving for, but failing to reach, the level of perfection that computers so naturally have.

And so for computers to sound more human like they have algorithms that make them more "sloppy".

Then again, a lot of music is really formulaic anyway, and computers are used for most of it. Within a few years there will be nothing to hinder some sort of computer star from being born. But it's probably never going to connect with us the way another human can. Not for now, at least.

Off-topic, but: that's a cool background and site design in general.


Very impressive. With some effort it could generate names for each song as well. The fact that the sample outputs are all exactly the same length highlights that the songs are "generated".

This is fantastic. Would love to hear training on something a bit more contemporary, i.e., 20th century or beyond. Fascinating stuff.

just awful. david cope has for years been computer-generating in-the-style-of (learning from scores) music that's actually decent (unless he's hoaxing us and just writing it himself). https://www.youtube.com/watch?v=PczDLl92vlc

Interesting - it's pretty linear right now, but it could be much more convincing if it also had some understanding of form, e.g. Sonata-Allegro form - themes, recapitulation, etc. I wonder if the training could involve a process that abstracts the seed data similar to Schenkerian Analysis and then extrapolating out from there.

If you're interested in this, https://www.youtube.com/watch?v=OTHggyZAot0 which was given at EuroPython 2012 on "Music Theory - Genetic Algorithms and Python" is also good.

In my mind the value of classical music is the emotional expression of a human being. Substituting the human with a neural network replaces the meaning of classical music for me. Therefore it does not make sense for me.

I've often wondered what would happen if someone developed a neural network system (or similar) that could reliably produce great melodies. It would have quite a profound impact on the history of music.

It would be interesting to hear what some of the sources are in the training set. My first impression was Goldberg variations meets ragtime.

It's absolutely incredible. I wonder if this says something particular about classical music versus popular music. Classical music is often appreciated for its compositional complexity, but this raises the possibility that sophisticated mathematical expression in music was idolized by a certain age, and particularly an age in which massive, cheap computational power was nonexistent.

What I am saying is that it would be harder for a neural net to produce a pop song, than a sophisticated classical work, and if true that suggests that pop music is sophisticated in ways that aren't easily quantifiable by mathematical expression. I wonder if this is an 'evolutionary' pressure on music creation.

I still think it would be hugely easier to automatically produce a plausible pop song than a piece of music from the Classical era (c. 1770-1820). The output of this neural net is really neat, but (at this point) there is no way that it would ever be mistaken for an actual classical piece by an educated listener, and it still seems pretty far from that. (David Cope has produced some really impressive output, but I still think it would have taken a lot less effort to produce pop music.)

I agree, if we are speaking about composition alone, for example comparing two pieces played on the same piano. But what I think is interesting about the contrast is how pop music has evolved into a science of performance, and while I think a computer synthesizer and algorithm can render a convincing performance of Bach, the same can't be said of pop music. I know that we don't have the real Bach here to perform, but that's kind of the point of what I am getting at, which is the idea of what these musics actually are in cultural math, if that makes any sense. I say it's easier to replicate the idioms of a classical piece than of pop, because pop has complexity in performance, which actually means recordance, if that word exists.

I say you're wrong - and I'm saying that as someone who has worked on both.

Bach, Mozart, and Beethoven are so complex there are no living human composers who can produce convincing imitations - never mind machines.

There are a few people who can do clever improvisations in-the-style-of, but that music completely lacks the big structures, broad relationships, and metaphorical depth of the real thing.

In comparison, electronic pop styles are much more formulaic, and the forms and changes are much more predictable.

I think we're less than ten years away from completely generative pop, vocals, sound design, and all. Good computer generated classical music is going to take quite a while longer.

Fair enough! I agree with your statements on the classic composers; however, modern composition in its highest art forms has moved with the rest of art towards abstraction, accident, pure invention, and the contemplation of pure noise. You might say the sound of reality is so complex that no living human composer can produce a convincing imitation either! (in the role of devil's advocate at the crossroads)

You might also say that all composers are doing clever improvisations in the style of the sound of reality :}

I agree that pop is simpler and more formulaic, but I'd argue that the practice can result in complexity that belies its seeming simplicity. While simple, pop music is volatile, so a popular form one year might look entirely different from a popular form the next. It isn't driven by the same stylistic conformity as music in the classical periods was.

Pop music, when it is well done, speaks in many layers, across cultural realities. I know that Bach, for example, did the same, but I'd also argue, given the fact that composers are speaking on many levels, that many of those cultural realities are lost to the modern listener, some of whom are appreciating classical music in the mode you describe: an art of big structures, broad relationships, and metaphorical depth about ideas and things that they have no cultural basis to understand (and I think this idea goes a way towards explaining the increasing fragmentation and deconstruction of classical order in the high art of modern musical composition).

Therefore machinery might have an easier time reproducing classical music than popular music, in spite of its simpler melodic formulas.

Pop has actually ossified over the last ten years or so. Dubstep is more than fifteen years old now, and the Paul Van Dyk album I've just listened to sounds a lot like the last Paul Van Dyk album a few years ago.

Then you get this:


and this:


and the now famous this:


and you start to wonder how much entropy there is to analyse.

Modern classical has gone the way of modern art. It's now basically a marketing exercise. The musical experience is secondary.

But then I expect creative AIs to be better at marketing too...

Ahahah! Thanks for the clips and good points! And in spite of how formulaic pop music is at the composition level, I still think it would be harder to AI a convincing pop song than a classical piece, because so much of pop's information is encoded in performance, which ironically is thanks to the ability to record instead of notate. You are right that it would be much harder to fool an advanced listener.

To me, the videos you shared say something about this whole AI problem and the future of the music market: interactivity and engagement with creation in musical forms and performances will replace the traditional performer/audience format. I think the problem of artificiality, which we might as well call the ease with which music can be entirely reproduced by algorithm, won't work in the traditional performer/audience mode. Instead, all the power to cast musical forms that are effective at entering and altering the psyche of the individual listener will be sublimated and abstracted just beyond their conscious intention, so that they can become a player in the musical work. This happens now too, and it is why the AI performer won't work except as a hoax in our current format: participation and identification with the singer are fundamental to the music experience.

Those country singers are singing about manly things, mostly getting girls into bed. You may be able to program a computer algorithm to crank out shallow man tunes like that, but it will also have to be tuned in to the culture of the times, because I can hear a lot of change in those country tunes compared to the country tunes of ten years ago. There are layers and layers. But primordially, there is identification with cultural realities, and these might well be generated by machinery, and already are; however, they will fundamentally align with the desires of the human creator, unless someone creates an artificial consciousness to compete with our human one.

So, barring that, musical AI will enhance and entice us towards a deeper and better human experience, which the machine knows not!

No human composer will ever produce convincing imitations because the only test that will be accepted as "convincing" is the judgment of an expert. Experts are familiar with the complete works of those famous composers, so they will immediately know the imitation is an imitation. Their knowledge makes them incapable of unbiased judgment. A blind test is impossible. Even tricking the expert into thinking a new genuine work has been discovered won't help, because they will know that is much less likely than an imitation being written.

I've listened to a lot of music by those composers, but not all of it, and I'm confident a skilled imitator could fool me.

That's a different problem. Fooling an amateur isn't so hard - as this thread shows.

Fooling a musicologist is much harder. David Cope thinks he's done it already. I'm not entirely convinced, but the output of EMI does a reasonable impersonation of pastiche, which is about as good as it gets for now.

> I think we're less than ten years away from completely generative pop, vocals, sound design, and all.

I highly doubt this, because of the vocals. Generating coherent, appealing lyrics and synthesizing a natural-sounding voice isn't going to happen in 10 years' time. It would be really cool if I were wrong, though.

See also, Vocaloid.

It's not so different to autotuned vocals now. The most recent versions are already significantly better than the original implementation.

I think the problem you're pointing to is not that pop music is somehow more difficult or more performance based - a lot of pop music is basically electronic music and has relatively little performance element. What you're noticing is that music (and not just pop music), and in fact, all art, is anti-inductive [1].

What I mean by that is that our expectations of music are constantly changing and evolving, and the minute something is created that captures our imagination, creating something similar becomes "not that interesting". It would probably be fairly easy to create an engine that produces pop music from the 80s or 90s or even 2000s, but the closer you get to today, the more the music is either a copy/rearrangement of an existing pop song, or it's not recognised as pop music.

The very nature of music is constantly reinventing itself, everyone always looking for that "new sound". These neural networks, as they are at the moment, can, it seems, pick up an existing "sound" and learn to reproduce it, but creating "new sound" is another matter, at the moment at least.

[1] http://slatestarcodex.com/2015/01/11/the-phatic-and-the-anti...

Yep, exactly what I am pointing to. I meant performance-based in that its complexity comes from how it interacts with culture, language, other music, ideas, etc.: cultural production. True also of classical composition, but as you say, it's already been done, and so the pattern can be reproduced. I was also adding the idea that the mathematical and notational rule-logic of a lot of classical music lends itself to machine-based reproduction in ways that modern pop pastiche does not, and that's counterintuitive, as some comments noted, because the actual pop product itself is entirely machine-mediated.

You mean like the sometimes sublime slow movements of Mozart piano concertos as in for instance the G major or A major - which virtually any piano student beyond the basic level can manage?

If they wish, idiots (like me) can use Lilypond (a free tool for creating musical scores) or similar software to create extremely complex music based, if you like, on some math. Any work has a level of complexity, of course, but to suggest that a measure of it is in any way a feature to be admired is not an argument that I can imagine going down very well in musical circles. To be frank, it would be laughable.

I implore you to delete this before simon cowell finds it

This is very cool. Thanks so much for sharing.

Haven't read it yet, but that's amazing!

I have listened to the music examples, though, and I'd heard of recurrent neural networks before.

Great resource for indie game developers!

OK, just 'an idea'. Music is human impression. So anyone who wants to train an NN to impress humans with music must first train an NN to model human impression, with all its limitations of perceived frequencies, reaction and tolerance to repeated patterns, and so on. Then you may use the result as a limiting envelope for the 'composing layer'. The good part is that you will probably not need a human to estimate the "humanness" of the result.

Thank you for the detailed essay and sharing the code.

Have you considered training with the works of one composer at a time?

I've definitely considered it, but so far I haven't found many single-composer datasets that are large enough to train with. I also think it would be pretty cool to try training it with music from specific musical periods.

I've wondered if it is possible to teach such subdivisions by simply including them as metadata and then using the same metadata as primes. So for example, you'd train your RNN on a big dataset of Bach, Mozart, etc, where each line of music is prefixed "BACH |" and then when you went to generate samples, you'd pass in as initial state "BACH |". Presumably the RNN would gradually learn that "BACH |" samples sound different from "MOZART |" samples and would adjust the conditional probabilities appropriately. Similarly if you wanted specific time periods. (And if the style metadata tends to be forgotten even with the LSTM, the metadata tag could be reinjected every _n_ steps.)

(The nice thing about this metadata hack, if it worked, is that you could deploy variants of it without having to rewrite or modify your existing RNNs, necessarily. For example, you could do this easily with 'char-rnn' by simply using 'paste' or 'sed' to prefix some metadata to each line of the input file, without any changes to 'char-rnn' itself, since it already reads in files and has a '-primetext' option in generating samples. I've been meaning to try this out.)
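The `sed` version of that metadata hack really is only a couple of lines. A sketch, where the filenames and the note text are hypothetical stand-ins for real training data:

```shell
# Tag each line of each composer's training file with its source, then
# concatenate into one corpus for char-rnn. Filenames and contents are
# hypothetical examples, not from the original post.
printf 'C4 E4 G4\nD4 F4 A4\n' > bach.txt
printf 'C4 G4 C5\n' > mozart.txt

sed 's/^/BACH | /'   bach.txt   >  corpus.txt
sed 's/^/MOZART | /' mozart.txt >> corpus.txt
cat corpus.txt

# At sampling time you would then prime with, e.g., -primetext "BACH | "
```

The tags just become part of the character stream, so the network treats them like any other strongly predictive prefix.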

bach doesn't have enough music? it seems like his music would be particularly suited, based on the results i listened to

I was able to find quite a bit of Bach music, actually, and I used it when I was first experimenting with this idea. At the time, I hadn't added dropout, so the output wasn't as interesting. I'll definitely try retraining the updated network with Bach.

wow that is really cool. of course no one could confuse the results for a human, but they are interesting to listen to, so clearly you are on to something. every so often i see something along these lines, but this is by far the best result, and most interesting write up. well done

I mean, maybe not for the best of humans, but for an intermediate one: I'm sure most people wouldn't notice anything "wrong" with the first sample if they weren't told in advance.

I actually found them really enjoyable and have been listening to them; except for the parts where it stays in one chord for a really long time, which the author mentions, it's eerily similar IMHO.

I wouldn't say no one. It sounds amazing to me, but I also like Wesley Willis.
