As usual*, the result sounds awfully unstructured and unenjoyable, and could just as well be achieved by a random walk through a musical scale: put some basic music theory in program form and you get these small harmonic structures pretty easily.
(*since projects like this seem to pop up every couple of months)
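To be concrete about "basic music theory in program form": here's a minimal sketch (plain Python, my own toy example, nothing to do with the article) that already produces those small harmonic fragments just by constraining a random walk to a scale and to small steps:

    import random

    C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]  # one octave of MIDI pitches

    def random_walk_melody(length=16, start=0):
        # Wander over the scale in steps of 1-2 degrees; staying inside
        # one scale is all it takes to sound vaguely "harmonic".
        idx, notes = start, []
        for _ in range(length):
            notes.append(C_MAJOR[idx])
            idx = max(0, min(len(C_MAJOR) - 1, idx + random.choice([-2, -1, 1, 2])))
        return notes

    print(random_walk_melody())  # e.g. [60, 64, 62, 65, ...]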
As other people have already mentioned, the part that usually gets neglected is the overarching dramatic structure of a musical piece. Compare a complete Shakespeare play to a pile of randomly thrown-together half-sentences.
I don't fully understand the fascination with generating music via some AI/neural-learning buzzword-bingo technique that is invariably kickstarted by brute-force-analysing a human-made music corpus.
What is much more interesting in a musical sense is to generate _new_ music that cannot be composed or played by a human. That plays to the strengths of the machines: sonification of large datasets, sonification of function behaviour, sonification of the binary world that is so different from ours. All of that is far more interesting than the tenth failed emulation of a simple folk song.
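For example, a minimal sonification sketch (standard-library Python; the mapping is my own arbitrary choice, not an established method) that turns the behaviour of the logistic map into a note sequence, so the music says something about the function's chaotic dynamics instead of imitating a folk corpus:

    def logistic_notes(r=3.9, x=0.5, n=64, low=48, high=84):
        # Iterate the logistic map x -> r*x*(1-x) and map each iterate
        # onto a MIDI pitch between 'low' and 'high'.
        notes = []
        for _ in range(n):
            x = r * x * (1.0 - x)                      # chaotic regime for r ~ 3.9
            notes.append(low + int(x * (high - low)))  # scale iterate to pitch range
        return notes

    print(logistic_notes())  # feed the pitches to any synth or renderer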
Nevertheless, as a student's piece about programming neural networks it's certainly OK, and the presentation is nice, but the result is uninspiring: like building a car tire out of bananas just because it's possible. Just let the folk songs belong to the actual folk.
As a side note: what would happen if the result were millions of super nice, catchy folk tunes at the press of a button? Would it be the end of pop music as we know it? Maybe then I'd retract my opinion.
That is certainly more musically interesting from the perspective of what can be achieved with computers. However, the translation of that tennis data into electronic music involved a lot of artistic freedom, and much more precise information is needed on how the translation was done.
How else, if not by human interference, could there be a 4/4 techno beat and an offbeat hi-hat? A much better interpretation would take the quite irregular time-based counting structure of tennis ('point', 'advantage', 'game', 'set') and render it as a rhythm, without the boring techno bed. The sounds here are at least much more fitting, since they are synthetic and adequately mixed.
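To sketch what I mean (the event stream and the gap/accent values below are made up by me, not derived from the actual project):

    # Tennis scoring events become onset times: bigger scoring units get
    # longer gaps and stronger accents, so the rhythm stays irregular
    # instead of being quantised to a 4/4 techno grid.
    events = ["point", "point", "advantage", "point", "game",
              "point", "point", "point", "point", "game", "set"]

    GAP = {"point": 0.4, "advantage": 0.7, "game": 1.5, "set": 3.0}     # seconds
    ACCENT = {"point": 0.3, "advantage": 0.5, "game": 0.8, "set": 1.0}  # loudness

    t = 0.0
    for ev in events:
        print(round(t, 2), ev, ACCENT[ev])  # (onset, event, accent) -> synth
        t += GAP[ev]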
Maybe all these cases of "music done with some neural machine learning, presented as awful-sounding MIDI piano renderings" exist because music is a universally liked phenomenon that can be naively represented as pitch over time, and it's appealing for research students in this particular field to take that as an anchor for their experiments.
As a general tip for these projects:
If you make music a main plot point, then learn some basic audio DSP and render your experiment with well-behaved sine waves, or get a DAW and put a nice preset sound on it. Basically, put some minimal effort into the actual musical presentation. Awful music is much more unbearable than awful graphics.
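For instance, rendering pitches as click-free sine waves takes nothing beyond Python's standard library (a minimal sketch; the short linear fades are what keep the sines "well behaved"):

    import math, struct, wave

    RATE = 44100

    def render(notes, dur=0.25, path="out.wav"):
        frames = bytearray()
        for note in notes:
            freq = 440.0 * 2 ** ((note - 69) / 12)  # MIDI pitch -> Hz
            n = int(RATE * dur)
            for i in range(n):
                env = min(1.0, i / 500, (n - i) / 500)  # fade in/out, no clicks
                s = env * 0.5 * math.sin(2 * math.pi * freq * i / RATE)
                frames += struct.pack("<h", int(s * 32767))  # 16-bit PCM
        with wave.open(path, "w") as f:
            f.setnchannels(1)
            f.setsampwidth(2)
            f.setframerate(RATE)
            f.writeframes(bytes(frames))

    render([60, 62, 64, 65, 67])  # writes a clean sine rendition to out.wav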
OP here, you bring up some fair points and I wanted to respond.
I definitely agree that these experiments, and others like them, have so far produced unstructured and unenjoyable music compared to human-level compositions. I also agree that you could achieve similar results with a rule-based system involving random walks through a musical scale, common chord progressions, etc.
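For example, a hypothetical rule-based baseline along those lines - a common C-Am-F-G progression with a melody picked from chord tones - takes only a few lines (my own toy sketch, not what the network does):

    import random

    # I-vi-IV-V in C major, as MIDI triads.
    PROGRESSION = [[60, 64, 67], [57, 60, 64], [53, 57, 60], [55, 59, 62]]

    def rule_based_bar(chord, beats=4):
        # Pick chord tones (sometimes an octave up) for each beat.
        return [random.choice(chord) + random.choice([0, 12]) for _ in range(beats)]

    for chord in PROGRESSION * 2:
        print(chord, "->", rule_based_bar(chord))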
However, to me what's exciting about this and similar projects is that the network is learning these rules on its own. The whole point of machine learning is that we don't have to explicitly state, or even understand, the underlying structure or music theory, because the algorithm figures it out on its own. And sure, right now it is only learning structure that we could also codify manually as rules. But who's to say that, as neural network techniques become more advanced, they won't be able to learn more abstract concepts such as the overarching dramatic structure of a musical piece?
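To make the contrast with a rule-based baseline concrete, here is a minimal sketch of the core idea (assuming PyTorch; an illustration only, not the exact architecture from the article): an LSTM trained purely to predict the next note token, with no music theory coded in anywhere:

    import torch
    import torch.nn as nn

    VOCAB = 128  # hypothetical: one token per MIDI pitch

    class NoteRNN(nn.Module):
        def __init__(self, vocab=VOCAB, embed=64, hidden=256):
            super().__init__()
            self.embed = nn.Embedding(vocab, embed)
            self.lstm = nn.LSTM(embed, hidden, batch_first=True)
            self.head = nn.Linear(hidden, vocab)

        def forward(self, tokens, state=None):
            out, state = self.lstm(self.embed(tokens), state)
            return self.head(out), state

    model = NoteRNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    seq = torch.randint(0, VOCAB, (8, 65))  # stand-in for real note sequences

    # Shift by one: the network's only job is "given the notes so far,
    # guess the next one"; any scale or chord structure it exhibits
    # afterwards was inferred from data, never written down as a rule.
    opt.zero_grad()
    logits, _ = model(seq[:, :-1])
    loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB),
                                       seq[:, 1:].reshape(-1))
    loss.backward()
    opt.step()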
I'm quite sure I have seen other attempts to generate music with RNNs recently, although I don't remember exactly where anymore. You don't cite many references to other approaches, only the one by Boulanger-Lewandowski from 2012.
I did a quick search, and I probably missed a lot, but I found these:
I haven't really looked into any of these, so I'm not sure about the differences. But it would be good if you cited some other relevant works and pointed out the differences.
In my opinion, anyone who works on music generation should take a look at Karma (http://karma-lab.com/) for a baseline of what can be achieved by simple math and plain old programming. It's probably not particularly interesting from a programmer's perspective (it's closed-source and, to the best of my knowledge, doesn't use anything fancy), but the end results are spectacular and are used in real music.
It is an integral tool for my own music. My chamber music project is essentially a real-time electroacoustic music generator that analyzes microphone input. So, a soloist plays along and influences what the tool plays, and the two play in "harmony."
This was new around the late 90s, I think. Yes, it's a good example of how 80s-style programming combined with knowledge of music theory can give quite nice mega-arpeggios.
But this also misses the point: the strength of our digital machines is not generating the same old musical variations that we could just as well compose ourselves if we weren't so lazy, but showing us music and sounds that we cannot compose or play by hand. The machine's music should tell us something about itself, not try to mimic us.
A similar approach with a different goal - that of finding and showing the sound patterns that make music attractive to humans and other animals - could be very useful.
I would love to see a real-time bebop improvisation generator in the style of Charlie Parker, Sonny Rollins, Bill Evans, et al. Bebop is definitely a musical (jazz) language that I bet would be well suited to RNNs.
I'd be interested in jazz harmonisation of an input melody line. Not Band-in-a-Box-level results, but something that actually sounds like it was done by a competent arranger, like this: https://www.youtube.com/watch?v=Eaqf-wRSx7E
It feels like it would be an achievable goal, given the right kind of training material.
Maybe hire a session pianist for a few days to harmonise a bunch of key- and tempo-normalised jazz standards on a MIDI keyboard, so that the harmonisations and the melody input become separate, labeled data?
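As a sketch of how such paired data might be loaded (assuming the pretty_midi package and a hypothetical file layout like standards/tune01_melody.mid next to standards/tune01_harmony.mid, which I just made up):

    import glob
    import pretty_midi

    def load_notes(path):
        # Flatten all instruments into a sorted (start, end, pitch) list.
        midi = pretty_midi.PrettyMIDI(path)
        return sorted((n.start, n.end, n.pitch)
                      for inst in midi.instruments for n in inst.notes)

    pairs = []
    for melody_path in sorted(glob.glob("standards/*_melody.mid")):
        harmony_path = melody_path.replace("_melody", "_harmony")
        pairs.append((load_notes(melody_path), load_notes(harmony_path)))
    # 'pairs' is now labeled training data: melody in, harmonisation out.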
Both of the pieces at the top of the article sound to me like off-key, broken-record renditions of the main refrain of "Jesu, Joy of Man's Desiring". It's as if the RNN cannot hold enough state to express the structure of a real musical piece, so it just emits riffs of main themes from its training set here and there.
What would be somewhat impressive is if it spontaneously figured out the note sequence I hear re-expressed in bits and pieces across the various jigs and folk pieces in its training set, kind of like this:
Cool. Anyone have an opinion on the "state of the art" for music generation? I realize this is entirely subjective. This one sounds pretty interesting! It'd be awesome to get something like this onto a top-10 list and have it start influencing man-made music. We can't be that far off. Kids these days love techno, and that is easy to synthesize relative to music with original lyrics and voices.
Well there are a couple different paths, if I can offer up a bit of perspective.
There's the "generated" music concept sort of like this, that basically creates the piece from zero to finished product. As in, there are tones and sounds and maybe some rhythm in it. Basically it makes a track. There was a post here recently about a 'brain support' music generation program/service thing, and I'm pretty sure the sounds they use would fit in the above description.
The other concept is "element" music generation. This would be a plug-in or software piece that works for a specific instrument. Apple's GarageBand has Drummer[1] and I've had good results using it so far. I think there are others on the market and different examples of a similar concept, like Instant Haus[2]. These aren't stand-alone music generation pieces, but resources upon which to build into a whole.
Cool, thanks. I find this area really interesting, and it's relatively unexplored, at least to the degree that profitable ventures are only occasionally pursued in it. Yet it seems to me there would be a market for it. Not that there needs to be one, but there could be, and there will be at some point.
Any musical program running on a machine that outputs a continuous number stream shall be written so that, on execution, it gives the machine the ability to interpret and express a part of its current inner state.
That inner state shall be as uncolored by human prejudice and mood as possible, and so must the program itself be written (a dilemma).
The human shall fully acknowledge their role as initiator of a universe of binary logic unfolding over time, thus letting the machine be the sole composer, the sole conductor, and the sole performer of itself, without any further interference.
Fascinating piece of research, and the details in the write-up mostly clicked for me even though I know it's a level far above my head. Well done, and I'm glad to have come across it.