Some have cried violation of counterpoint, but I don't think that is such an issue here. CP is mostly about how notes are linked together on a small scale, and this is rather the network's strong point: it seems to concentrate on the associativity between notes and between chords, and any given moment seems to be composed with a vision that only extends to a few bars (or even just a few beats) around that moment.
The main problem is therefore large-scale structure. For one, recurrence of melodies (at key moments and in different transpositions) is crucial to creating the emotional value of classical music. None of this appears here.
And secondly, possibly the greatest shortfall of the present neural network (I'm ignoring performance, of course) is the harmonic structure. Classical music, let's say later than medieval and earlier than late-romantic in style, generally has the harmonic structure of a recursive cadence. Harmonic cadences are what give emotional power to harmony, but this NN is painfully incapable of creating any.
That being said, I don't think this problem is inherent to the approach of creating music with NNs. Right now it sounds like what you'd get with a well-crafted Markov chain, but NNs can go beyond this, and this article is exactly the kind of thing that will instigate this evolution.
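For comparison, here is a minimal sketch of the kind of first-order Markov chain the network is being likened to. It counts which pitch follows which in a training melody, then takes a random walk over those counts. The toy melody and the function names are made up for illustration.

```python
import random
from collections import defaultdict

def train_markov(melody):
    """Count first-order transitions: which pitch follows which."""
    transitions = defaultdict(list)
    for current, nxt in zip(melody, melody[1:]):
        transitions[current].append(nxt)
    return transitions

def generate(transitions, start, length, rng=None):
    """Random walk over the transition table; each note depends only on the previous one."""
    rng = rng or random.Random()
    out = [start]
    for _ in range(length - 1):
        choices = transitions.get(out[-1])
        if not choices:  # dead end: this pitch never had a successor
            break
        out.append(rng.choice(choices))
    return out

# Hypothetical training melody as MIDI pitch numbers (a C major fragment).
melody = [60, 62, 64, 65, 67, 65, 64, 62, 60, 64, 67, 72, 67, 64, 60]
table = train_markov(melody)
print(generate(table, 60, 16, random.Random(0)))
```

Each note depends only on the one before it, which is why such a chain can sound locally plausible while having no large-scale structure at all.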
- A "planning" layer that lays out the song plan (ABACABBA, etc.)
- A composing layer that fills in those sections, and maybe even introduces slight differences between same-named sections for variety.
- A performance layer that plays it back with a simulation of human performance metrics (slight jitter in note placement, emotive crescendos, suggestive variations in note length, etc.).
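A rough sketch of how those three layers might fit together, under heavy assumptions: the composing layer just draws random pitches from a C major scale as a stand-in for a real model, and the humanization amounts are arbitrary. All names here are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int       # MIDI pitch number
    start: float     # onset in beats
    duration: float  # length in beats
    velocity: int    # loudness, 1-127

SCALE = [60, 62, 64, 65, 67, 69, 71, 72]  # C major, an arbitrary choice

def plan(form="ABACABBA"):
    """Planning layer: the song plan as a sequence of section labels."""
    return list(form)

def compose(labels, beats_per_section=8, rng=None):
    """Composing layer: write one phrase per distinct label, reuse it for
    repeats, and re-pick one note of each repeat for variety."""
    rng = rng or random.Random()
    phrases = {}
    notes = []
    for i, label in enumerate(labels):
        if label not in phrases:
            phrases[label] = [rng.choice(SCALE) for _ in range(beats_per_section)]
            phrase = phrases[label]
        else:
            phrase = list(phrases[label])
            j = rng.randrange(len(phrase))
            phrase[j] = rng.choice(SCALE)  # slight difference between same-named sections
        offset = i * beats_per_section
        notes.extend(Note(p, offset + k, 1.0, 80) for k, p in enumerate(phrase))
    return notes

def perform(notes, jitter=0.02, rng=None):
    """Performance layer: small timing jitter, a crescendo across the piece,
    and slight variation in note length."""
    rng = rng or random.Random()
    total = max(n.start for n in notes) or 1.0
    out = []
    for n in notes:
        velocity = int(60 + 40 * (n.start / total)) + rng.randint(-5, 5)
        out.append(Note(
            n.pitch,
            max(0.0, n.start + rng.uniform(-jitter, jitter)),
            n.duration * rng.uniform(0.85, 1.0),
            max(1, min(127, velocity)),
        ))
    return out

rng = random.Random(42)
performance = perform(compose(plan(), rng=rng), rng=rng)
print(len(performance), performance[0])
```

Keeping the layers separate means the plan can be swapped (or written by a human) without touching the composer or the performer.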
But this NN doesn't solve the greatest problem in classical music: that only 3% of people take the time to appreciate it.
I bet you could get much more accurate results by feeding all those progressions in as one of the reference layers.
Some have mentioned pointless sentences, but I don't think that is such an issue for modern literature. Modern prose mostly has to do with what words come after each other on a small scale, such as a single phrase, or the associativity between words and between sentences, and at any given moment most words are only used for a few sentences, then replaced by other words.
The main problem is therefore large-scale structure, chapter to chapter. For one, the recurrence of a theme throughout several volumes of work is crucial to creating its emotional value and cohesiveness. None of this appears in Markov chains.
And secondly, possibly the greatest shortfall of Markov chains (ignoring performance) is in plot. They're just not very imaginative. Prose, let's say later than ancient epics, has interesting, recursive plot devices. Plot is what gives emotional power to sentences, and most Markov chains are painfully incapable of creating any.
That being said, I don't think this problem is inherent to the approach of writing prose using Markov chains. Right now it sounds like what you'd get with a crowd-sourced writing prompt, but Markov chains can go beyond this, and literary criticism is what it will take to bring Markov chains up to the standard of popular writers like Stephen King.
It's pretty clear that in the near future, we will laugh at the idea of reading something that someone hand-typed from their own thoughts, instead of simply reading the output of a one-dimensional random walk. I hope if I have grandchildren who are as interested in modern literature as I am, their degree will be a bachelor of science, with the only work they have to do being to quantify and remove any remaining guesswork from the science of computer-generated writing. Even today, it is hard to understand why anyone still reads text written by a person. It's as quaint as a telegram.
I hope that your grandchildren's Bachelor of Science would train them in curating brilliant works of art that can then be fed into the Markov chain, which performs the one-dimensional random walk.
Notice the overarching theme, the experience that the music "shows you", the arc it leads you through. In comparison, the generated music is but a chain of pleasant and nice-sounding moments, with no overarching emotional arc uniting them.
Now is that the exclusive domain of humans? Will NNs eventually write music that passes for a human composition to a skilled human jury (a musical Turing test)?
Thought experiment: one day it is not uncommon for some deeply moving, emotional works to be composed entirely by computers (though performed by humans). How will we, as a society, react to that? Will we reject it as 'inauthentic', or use it as an opportunity for introspection?
It probably wouldn't even be unreasonable to think that these NNs will be given names, and different NNs will be characteristically different, so then you might see a Wikipedia page or an album from a specific NN in much the same way we'd see a human composer today. How weird would that be?
Edit: or, even, NNs trained on specific composers (and the music they'd have heard), to try and create 'new' works from existing long-dead composers, or even just to complete famous half-finished works. How blasphemous would that be? But for a sufficiently competent NN, I can imagine the output might be very interesting.
Edit 2: (apologies, but this is really out there) I have also wondered how much information is sufficient to 'recreate state'–that is to create an 'AI' that passably mimics a specific real person (kinda like a Turing Test), and in that sense creates a pseudo-immortality.
A person with developed taste who ‘filters’ someone else’s creation, recognizes great work, and helps the creator shape their output is thus a part of the above loop, and depending on specifics may be of importance comparable to or greater than that of the creator.
The balance between contributions of different entities that cause a work to be created and known, or a style to be formed, is always fluid (among singers, performers, front-personas, authors, producers, mentors, labels, etc.). Replace one of the components with a computer and the overall picture doesn’t change much.
If a piece of music was generated by software, then whoever set that software up and filtered its output is the creator. That may include the programmer, those who were using the software, and other people who directly influenced the creation in a significant way.
If a person using some generative algorithm doesn’t feel like their input was substantial enough, they might use a pseudonym. Attributing music to a computer explicitly would be a purely marketing move, and it doesn’t change the fact that the author is always a conscious being (or beings), which, unless we’ve reached the singularity, a computer isn’t.
I don't know. I'm sympathetic to this view (and for the record, I wasn't going anywhere near a hard AI/singularity argument), but on the other hand, I think after enough iterations you won't be able to find where that human input actually comes in. When we finally get a NN that produces something actually great, will we be able to point to a specific line of code, or a specific input, or a specific programmer whose taste resulted in that? We already struggle to understand the inner workings of neural nets.
So you can argue the 'taste' step comes into the selection process. Somebody has to sift through the output of the NN to choose what's good and what isn't. But what if that's automated? What if it serves different outputs to different members of the population, so the NN can test itself and decide what is worthy of output on a larger scale? Then you can't point to any one individual either.
So it's a semantic point. I think you're right, fundamentally. But I think we can very quickly reach a point where we have to travel through a very long rabbit hole to get back to that key human influence.
I know you're talking about Neural Networks. But I'd like to point you to Vocaloids. As far as names, personalities, and music go - it's a good match for what you're getting at. Hell - one of them is even famous for having concerts! 
Hatsune Miku, Rin Kagamine, Luka Megurine, Neru Akita
It seems to me that this is a technical problem of being able to train and run a large enough network to approach human abilities in pattern recognition. Sometimes this is easier than others.
A disclaimer, however: although I hang out with machine learning people, I'm not yet one myself.
This would provide some recurrence of melodies and would sound infinitely better. The problem with these kinds of models is their window-based structure, which has no notion of an overall theme. The theme would then have to be added by a human, and everything else could be filled in by the model.
But for me this opus has no real meaning.
In this sense, it's relevant to the discussion because it may appear to have been created by Markov chains whereas in fact it's intelligently molded and makes (more and more interesting) sense from the many perspectives you start to have as you spend your life with it.
Here's an excellent project; even from reading the first page you'll have an aha moment: http://www.wakeinprogress.com/2010/10/introduction-to-charac...
I would say the next step is analyzing structure, but this makes writing music stupid easy. Just wait for something interesting and "everything is a remix" it.
That's a very thoughtful conclusion.
I'm no expert, but to my ear it sounded more like a later Baroque period piece, or very early Classical at most.
This is strange because I can paint and sculpt very well, but playing is such a struggle for me... I really envy people that can play or sing naturally.
"[The Analytical Engine] might act upon other things besides number, were objects found whose mutual fundamental relations could be expressed by those of the abstract science of operations, and which should be also susceptible of adaptations to the action of the operating notation and mechanism of the engine...
Supposing, for instance, that the fundamental relations of pitched sounds in the science of harmony and of musical composition were susceptible of such expression and adaptations, the engine might compose elaborate and scientific pieces of music of any degree of complexity or extent."
Edit: s/300/200/. Thanks to icebraining I stand corrected.
Incidentally, Kepler corresponded with Wilhelm Schickard on the latter's "arithmeticum organum", the first ever proper mechanical calculator (it could do addition, subtraction, multiplication and division).
Automating creativity was an idea with much currency in the Renaissance. Indeed, some of the key advances in mechanical automata, which later evolved into computers, were driven by the desire to automate creativity. The "conceptual leap" that some people lazily ascribe to Lovelace wasn't hers!
 Jessica Wolfe, "Humanism, machinery, and Renaissance literature".
 Douglas Summers Stay, "Machinamenta: The thousand year quest to build a creative machine". Associated blog: http://machinamenta.blogspot.com
For beginners, I'd like to summarize some excellent resources to start with:
- Coursera course by Andrew Ng (he explains everything magically; it's the best course I've done online) https://class.coursera.org/ml-003/lecture
- Neural Networks and Deep Learning neuralnetworksanddeeplearning.com/ (I highly recommend trying to write your own backprop and MNIST dataset classifier. I wrote mine in JS and it gave me a lot of confidence)
- Oxford ML class (2015) https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearni... (perhaps the most recent MOOC on ML online. Things are progressing so fast in deep learning that it's worthwhile to do multiple ML courses to get different perspectives)
- I also enjoyed http://karpathy.github.io/2015/05/21/rnn-effectiveness/ for inspiration, and the video https://www.youtube.com/watch?v=xKt21ucdBY0 is a FANTASTIC summary. HN pointed out earlier that the course videos are out: http://cs224d.stanford.edu/syllabus.html
- Again for inspiration into big challenges, this talk from Andrew Ng is a good one https://www.youtube.com/watch?v=W15K9PegQt0
In the future we can call it 'radio'.
Edit: Not sure why this seems to be attracting downvotes. Just to clarify: I'm completely serious. I wish I could just tune in to this neural net for the next few hours, and it strikes me as a perfect form of what we call radio now.
Lots of pop, country, top-40 rock, and certain jazz subgenres could probably all just be generated continuously, streamed as some kind of low-bandwidth description language, with local synthesis building the song up on the listener's end.
In certain milieus (shopping malls, background music, elevators) nobody would even notice.
I think we're still a few generations away from such a thing becoming mainstream, but I'd love to be proven wrong!
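To make the "description language plus local synthesis" idea concrete, here is a hypothetical sketch: each note travels as 4 bytes (pitch, velocity, onset and length in sixteenth-note steps), and the receiver renders it with plain sine waves. The format and names are invented for illustration; a real system would more likely stream MIDI and use a proper synthesizer.

```python
import math
import struct

# Hypothetical wire format: one note = 4 unsigned bytes
# (MIDI pitch, velocity, onset step, length in steps).
NOTE_FORMAT = "BBBB"

def encode(notes):
    """Pack (pitch, velocity, onset_step, length_steps) tuples into bytes."""
    return b"".join(struct.pack(NOTE_FORMAT, *n) for n in notes)

def decode(data):
    """Unpack the byte stream back into note tuples."""
    size = struct.calcsize(NOTE_FORMAT)
    return [struct.unpack_from(NOTE_FORMAT, data, i) for i in range(0, len(data), size)]

def midi_to_hz(pitch):
    """Equal-tempered frequency, with A4 (MIDI 69) at 440 Hz."""
    return 440.0 * 2 ** ((pitch - 69) / 12)

def synthesize(notes, samples_per_step=1000, sample_rate=8000):
    """Render notes locally as summed sine waves, returning floats in [-1, 1]."""
    end_step = max(onset + length for _, _, onset, length in notes)
    samples = [0.0] * (end_step * samples_per_step)
    for pitch, velocity, onset, length in notes:
        freq = midi_to_hz(pitch)
        amp = velocity / 127 / len(notes)  # crude scaling so the sum can't clip
        first = onset * samples_per_step
        for k in range(length * samples_per_step):
            samples[first + k] += amp * math.sin(2 * math.pi * freq * k / sample_rate)
    return samples

# A C major chord followed by a G: four notes, 16 bytes on the wire.
score = [(60, 100, 0, 8), (64, 100, 0, 8), (67, 100, 0, 8), (67, 90, 8, 8)]
wire = encode(score)
audio = synthesize(decode(wire))
print(len(wire), len(audio))
```

At 4 bytes per note, an hour of fairly dense music (say 8 notes per second) is about 115 KB, versus roughly 58 MB for the same hour as a 128 kbps MP3. A real stream would need relative onsets, since a single byte only covers 256 steps.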
Then I had this strange thought - what if you could monitor the cacophony of your running systems, and detect a problem, or a certain event, just by the presence of a particular audio theme or tune. I bet an infinite loop would be pretty annoying and obvious. Just as long as the server getting overloaded doesn't sound like getting rick-rolled ("Never Gonna Give You Up" by Rick Astley).
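A toy version of that idea, assuming a stream of load readings between 0 and 1: each reading is quantized onto a pentatonic scale, so normal fluctuation sounds like a wandering melody and a pegged system collapses into one repeated high note. The scale, the mapping, and the names are all hypothetical.

```python
PENTATONIC = [60, 62, 64, 67, 69, 72, 74, 76, 79, 81]  # C major pentatonic, MIDI pitches

def load_to_pitch(load):
    """Quantize a load reading in [0, 1] onto the scale; overload sits at the top."""
    load = min(max(load, 0.0), 1.0)
    index = min(int(load * len(PENTATONIC)), len(PENTATONIC) - 1)
    return PENTATONIC[index]

def sonify(readings):
    """Turn a series of readings into a sequence of pitches."""
    return [load_to_pitch(r) for r in readings]

healthy = [0.2, 0.35, 0.3, 0.45, 0.25]
stuck = [0.99, 1.0, 1.0, 0.98, 1.0]  # e.g. a process spinning in an infinite loop
print(sonify(healthy))
print(sonify(stuck))
```

Using a pentatonic scale means any sequence of readings stays consonant, so the one thing that stands out to the ear is the drone of a stuck system.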
I wonder if the structure over frequency/time is too "regular" - in general for sound the frequency correlation and the time correlation are on wildly different scales.
Also, if you are looking to go further, you might reconsider adding NADE or RBM on top, or latent variables in the hiddens to add more stochasticity.
There was some alternate work by Kratarth Goel extending RNN-RBM to LSTM and DBN; it might give you some ideas to look at. I know when we messed with bidirectional LSTM + DBN for MIDI generation, it led to this kind of "jumbled/dissonant" sound you seem to be having; I don't know what to make of it here. You might consider bidirectionality over the notes, though it makes the generation way more annoying.
Awesome work! I will definitely be sharing around and checking out your code.
So make them prove it. Put them through numerous statistically significant double-blind tests with human-vs-computer generated music. I think at some point in the near future no one will be able to legitimately claim that there's anything "magical" about human-produced art.
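One way to make "statistically significant" concrete is an exact one-sided binomial test: if listeners can't tell human from computer music, each answer is a coin flip, so the question is how likely it is to score at least that well by guessing. The panel size and scores below are invented for illustration.

```python
from math import comb

def binomial_p_value(correct, trials, p=0.5):
    """P(X >= correct) for X ~ Binomial(trials, p): the chance of doing at
    least this well by pure guessing."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(correct, trials + 1))

# Hypothetical panel: 100 blind trials, each asking "human or computer?"
for correct in (55, 60, 65):
    print(correct, round(binomial_p_value(correct, 100), 4))
```

With 100 trials, 59 or more correct answers are needed before pure guessing drops below the conventional 5% level, so a panel scoring in the mid-50s hasn't shown that it can tell the two apart.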
Before one constructs a musical Turing test of music from machines vs. humans, I'd first like to see a test of whether people can discern if algorithmic or formulaic composition techniques were used in a (human-composed) piece, or whether it was all "organic". (I don't even know how to define that musically; composition is rarely a purely inspirational and complete endeavor. It's lots of editing, transformations, permutations, cut/paste, and selective "not caring" in some spots.) I think the answer to that question is a more insightful place to start, as I think the general public isn't aware of how many formulas and algorithms (conscious and subconscious) composers actually use to construct their music.
(Interesting side note: Glenn Miller used the rhythmic tools of the Schillinger System to construct the catchy rhythm component and form of this little ditty you might have heard: https://www.youtube.com/watch?v=xPXwkWVEIIw)
But as another commenter said, part of what makes human art interesting is the life experience that forms it. If you reduce music to merely sound, then yes, nothing prevents artificially generated music from competing against human music. Perhaps we will discover that there is no audible difference between the two. But the story of human life that produces the sound of Lee Morgan vs. Miles Davis, or hearing the influence of a mentor like Katsuya Yokoyama in a pupil like Tajima Tadashi performing songs others have also performed, is to me as interesting as the music they produce.
Ultimately, any algorithm will possess biases present from its original authorship by a human who has tuned it to "sound right," which is shaped by their own life experiences... so can artificially generated music ever be considered purely artificially generated? (You could also go into things like, for music in non-equal temperament scales, why did the algorithm's author pick a particular tuning?)
Finally, one thing I'd like to add is that I could see cases where abstract music would actually be more challenging than structured western music. An algorithm that could compose a compelling shakuhachi honkyoku would be impressive.
I'm really impressed with the quality of the music that's come out of this.
It feels like there's not that much in the way of dynamics in it (every note is hit with the same force) - is that right? I suspect that these pieces, played by a professional who could add more of the human element to the feel, would sound really good. Obviously, that's sort of against the point, but then again, Ravel wasn't much of a pianist (apparently!) but he could compose amazing music – so it's not totally cheating.
Also, how long does it take to generate one of the songs, using the AWS instance you describe in the article?
Another interesting data point: the learned set of weights ends up being about 15MB.
As I was listening through the samples it seemed to me that it would start out quite energetic and then converge on repetitive, slow chords. Any ideas on why this could be happening? Or perhaps it's not true.
Also, you should label the samples with numbers so that it's possible to refer to them easily. I liked 4th from bottom quite a bit in the beginning.
It would help demo the point of the service, and it would be a very fitting sound generator :)
What do you think?
Astonishingly good write-up, by the way. Very impressive!
It might be interesting to feed it Jazz instead. Jazz in Jazz out.
This is the most amazing piece of information. I bet there is tons of low-hanging fruit that, thanks to the openness of academic papers on the subject and cheap hardware and computational power, can provide phenomenal results in the field, even for hobbyists.
Uh, that's polyphonic music. There's plenty of classical music that isn't particularly polyphonic. Most of post-Baroque music is more homophonic: voices tend to move together in chords rather than independently. Counterpoint still appears but it's not the foundation.
Also, in this case he is not working with chords. They are just pitches moving.
I wish coders who are trying to do something in a creative domain would learn the basics and not just assume they can throw some simple algorithms at an artform and get anything close to an acceptable result.
No one is going to take a coder seriously if they can't code fizzbuzz.
Here's a thing to know: all the arts have their own equivalents. If you don't know what they are, learn them. Then maybe you can start thinking about non-toy algorithms and data structures that are going to impress an audience that cares more about quality of output than implementation details.
Most people who start working with domain-specific knowledge find it's much harder than they think.
That's the point I'm making. You'd get similar-sounding results by taking semi-random snippets of the source data and splicing them together with a tiny bit of glue logic.
The NN is more or less doing that anyway, but by more roundabout means.
It's a long way from there to being able to say that it has a non-trivial model of classical theory.
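The baseline described above, semi-random snippets of the source spliced together with a little glue logic, might look something like this. The glue rule used here (accept a snippet only if it starts within a few semitones of where the previous one ended) is one hypothetical choice among many, and the corpus is made up.

```python
import random

def splice(corpus, n_snippets, snippet_len=4, max_leap=4, rng=None, tries=100):
    """Build a melody from random fixed-length snippets of the corpus.
    The only 'glue logic': a snippet is accepted if its first note lies within
    max_leap semitones of the previous snippet's last note."""
    rng = rng or random.Random()
    starts = range(len(corpus) - snippet_len + 1)
    out = list(corpus[:snippet_len])  # open with the corpus's own first snippet
    for _ in range(n_snippets - 1):
        for _ in range(tries):
            s = rng.choice(starts)
            snippet = corpus[s:s + snippet_len]
            if abs(snippet[0] - out[-1]) <= max_leap:
                out.extend(snippet)
                break  # accepted; move on to the next snippet
    return out

# Hypothetical source material as MIDI pitch numbers.
corpus = [60, 62, 64, 65, 67, 65, 64, 62, 60, 64, 67, 72, 67, 64, 60, 59, 60]
print(splice(corpus, 6, rng=random.Random(3)))
```

Every four-note chunk of the output is verbatim source material, and the joins are merely smooth; none of it encodes any model of harmony or form.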
The post is interesting, I guess, for its discussion of neural networks, but I fear that training an AI system for 24 hours to produce anything which purports to imitate an artform developed by countless generations of human beings is a bit pretentious.
As an aside, I find it a bit alarming that people are seemingly so eager to generalize the term "classical music", without taking into account that it refers to a field which is almost infinite in its diversity of forms and styles.
The vagueness of "classical music" can be annoying sometimes, but in this case people roughly agree on a contextual definition. The top comment adds a clarification: "later than medieval and earlier than late-romantic in style", which covers almost all of the set used to train the neural network (and consequently the style that the network is trying to reproduce). Sure, it's a broad range, but the theory that can be used to study it is surprisingly detailed and generalizable.
In classical music typically all five are important. I think it would be better to start with techno (the dance music subgenre, not electronic music in general), where only rhythm and timbre are important. I think this has much better chances of generating something enjoyable to listen to.
The problem is, the fewer elements you have, the more important each one is and the higher the quality needed to sound "good". When you look at electronic music genres that focus only on timbre (often called sound design), people spend years perfecting their craft, and whilst we have some broad notions of how to construct pleasing melodies and harmonies without first listening to them, I doubt anybody can construct new sounds and have them sound good on the first go without a human ear to guide the process (which is what we are asking a program to do). Sound design simply doesn't have the depth of analysis and understanding that melody and harmony currently have.
Music is (dis)harmonies over rhythmic patterns. There isn't anything inherently artistic about humans that computers can't replicate with time. Even the ability to compose an original song isn't beyond the algorithms.
The irony is that performing musicians are actually striving for, but failing at, reaching the perfection level that computers so naturally have.
And so, for computers to sound more human-like, they use algorithms that make them more "sloppy".
Then again, a lot of music is really formulaic anyway, and computers are used for most of it. In a few years, nothing will prevent some sort of computer star from being born. But it's probably never going to connect with us the same way another human can. Not for now, at least.
What I am saying is that it would be harder for a neural net to produce a pop song, than a sophisticated classical work, and if true that suggests that pop music is sophisticated in ways that aren't easily quantifiable by mathematical expression. I wonder if this is an 'evolutionary' pressure on music creation.
Bach, Mozart, and Beethoven are so complex there are no living human composers who can produce convincing imitations - never mind machines.
There are a few people who can do clever improvisations in-the-style-of, but that music completely lacks the big structures, broad relationships, and metaphorical depth of the real thing.
In comparison, electronic pop styles are much more formulaic, and the forms and changes are much more predictable.
I think we're less than ten years away from completely generative pop, vocals, sound design, and all. Good computer generated classical music is going to take quite a while longer.
You might also say that all composers are doing clever improvisations in the style of the sound of reality :}
I agree that pop is simpler and more formulaic, but I'd argue that the practice can result in complexity that belies its seeming simplicity. While simple, pop music is volatile, so a popular form one year might look entirely different from a popular form the next. It isn't driven by the same stylistic conformity as music in the classical periods was.
Pop music, when it's well done, speaks in many layers, across cultural realities. I know that Bach, for example, did the same, but I'd also argue, given that composers are speaking on many levels, that many of those cultural realities are lost to the modern listener, some of whom are appreciating classical music in the mode you describe: an art of big structures, broad relationships, and metaphorical depth about ideas and things that they have no cultural basis to understand (and I think this idea goes some way towards explaining the increasing fragmentation and deconstruction of classical order in the high art of modern musical composition).
Therefore, machinery might have an easier time reproducing classical music than popular music, in spite of popular music's simpler melodic formulas.
Then you get this:
and the now famous this:
and you start to wonder how much entropy there is to analyse.
Modern classical has gone the way of modern art. It's now basically a marketing exercise. The musical experience is secondary.
But then I expect creative AIs to be better at marketing too...
I've listened to a lot of music by those composers, but not all of it, and I'm confident a skilled imitator could fool me.
Fooling a musicologist is much harder. David Cope thinks he's done it already. I'm not entirely convinced, but the output of EMI does a reasonable impersonation of pastiche, which is about as good as it gets for now.
I highly doubt this, because of the vocals. Generating coherent appealing lyrics and synthesizing voice to sound natural isn't going to happen in 10 years time. Would be really cool if I was wrong, though.
It's not so different to autotuned vocals now. The most recent versions are already significantly better than the original implementation.
What I mean by that is that our expectations of music are constantly changing and evolving, and the minute something is created that captures our imagination, creating something similar becomes "not that interesting". It would probably be fairly easy to create an engine that produces pop music from the 80s or 90s or even 2000s, but the closer you get to today, the more the music is either a copy/rearrangement of an existing pop song, or it's not recognised as pop music.
The very nature of music is constantly reinventing itself, everyone always looking for that "new sound". These neural networks, as they are at the moment, can, it seems, pick up an existing "sound" and learn to reproduce it, but creating "new sound" is another matter, at the moment at least.
If they wish, idiots (like me) can use LilyPond (a free tool for creating musical scores) or similar software to create extremely complex music based, if you like, on some math. Any work has a level of complexity, of course, but to suggest that a measure of that is in any way a feature to be admired is not an argument that I can imagine going down very well in musical circles. To be frank, it would be laughable.
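To illustrate how little effort "complex music based on some math" takes, here is a hypothetical script that maps the digits of pi onto consecutive C major scale degrees and writes a minimal LilyPond score in absolute pitch mode. The mapping and names are arbitrary choices for illustration; saving the output as a `.ly` file and running `lilypond` on it should engrave it.

```python
# Digits 0-9 mapped to ten consecutive C major scale degrees,
# in LilyPond's absolute pitch notation (c' is middle C).
DIGIT_TO_PITCH = ["c'", "d'", "e'", "f'", "g'", "a'", "b'", "c''", "d''", "e''"]

def digits_to_lilypond(digits, duration="8"):
    """Turn a string of decimal digits into a minimal LilyPond score."""
    notes = " ".join(DIGIT_TO_PITCH[int(d)] + duration
                     for d in digits if d in "0123456789")
    return '\\version "2.18.2"\n{ ' + notes + " }\n"

# The first 20 digits of pi.
PI_DIGITS = "31415926535897932384"
score = digits_to_lilypond(PI_DIGITS)
print(score)
```

The result is a perfectly valid score of arbitrary length and "complexity", which is rather the point: generating it says nothing about whether it is worth listening to.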
Have you considered training with the works of one composer at a time?
I actually found them really enjoyable; I've actually been listening to them, and as the author says, apart from the parts where it stays on one chord for a really long time, it's eerily similar IMHO.