Music Composition with Deep Learning: A Review (arxiv.org)
125 points by pramodbiligiri 19 days ago | 87 comments

I'm working on audio AI, both academic research and as founder of Spooky Labs. We don't have a webpage yet, but we do have clients. We are using deep learning to create rich new synthesizers that sound like they were designed by aliens, as well as novel vocal manipulation techniques.

We've spoken to musicians and producers who are excited about new tools, new sounds, and assistants that automate boring parts of the workflow.

But when the problem is framed as "music composition", it just leaves me scratching my head. Like, who's clamoring for that? I'm unaware in the history of music any automatically generated music that isn't seen as an oddity. Even if techniques improve, it's not a really sexy sell. People simply want to listen to music created by people, even if AI music were perfect. Only in commercial applications like stock music or jingles is AI composition in demand.

I understand that you can move the goalposts and say: "This isn't about total AI composition, it's about co-composition!" But honestly, I think it's just framing the problem wrong to talk about composition, and it's led to some really strange solution-in-search-of-a-problem research agendas. People should be thinking about it through the lens of: How do you use AI to create tools that musicians want?

Other than for research purposes (extending the capabilities of AI) — I 100% agree with your sentiment regarding the relative worthlessness of AI composition.

As an aside, it is my understanding that there is no measure of something like consonance/dissonance in complex musical forms. There are models of dissonance in 2 and 3 note chords but even these have gaps. I'm suggesting that the research on "what sounds good to people" is surprisingly immature in contemporary science. This is surprising because that question played a major role in the history of science. For instance, many of the very first experiments conducted at the Royal Society (c1660s) investigated harmony — and arguably the first scientific experiment was designed to evaluate a mathematical model of music (Viz, the 5th century BC Pythagoreans demonstrating their integer ratio theory of harmony by casting bronze chimes at those ratios).

So, I'm surprised that there isn't more interest today. That said, there was a recent breakthrough in the science of harmony: an integrated model of vertical and horizontal harmony. https://spj.sciencemag.org/journals/research/2019/2369041/#:....

There are measures, and they're not particularly complex. But consonance/dissonance and "sounds good" are not the point.

Music is evocative and social, and it's the semantics of both that are under-researched.

You won't get a grasp of either by throwing a corpus into a bucket and fishing things out of it with statistics.

> You won't get a grasp of either by throwing a corpus into a bucket and fishing things out of it with statistics.

We can agree to disagree on that. Or, in any case, it's an empirical question. If we had a large corpus of music labeled by annotated "feels", I think we'd learn an immense amount about how music evokes feelings. (I'm not sure I feel comfortable with the term "semantics" applied to music)

In the meantime, I've been building a corpus of the EEG response to music: https://ui.adsabs.harvard.edu/abs/2020arXiv200908793S/abstra...

Regarding the measures of dissonance, I'm only familiar with things like measures of roughness and harmonic entropy. If you know more, I'd appreciate sharing.

We don't have a large corpus of music labelled "feels" because "feels" and "evocations" are not atomic objects.

It's not obvious they're objects at all.

Debussy's La Mer is an excellent evocation of the sea, but you're not going to learn anything useful by throwing it into a bucket with a sea shanty. Or with Britten's Four Sea Interludes.

The absolute best you'll get from this approach is a list of dismally on-the-nose reified cliches - like the music editor who dubs on some accordion music when a thriller has a scene in Paris.

It's also why concepts like harmonic entropy don't really help. You can't parse "dissonance" scientifically in that way, because the measure isn't the amount of dissonance in a chord on some arbitrary scale, even if that measure happens to have multiple dimensions.

It's how the dissonance is used in the context in which it appears. There are certainly loose mappings to information density - too little is bad, too much is also bad - but it's not a very well explored area, and composers work inside it intuitively.

So there is no fitness/winning function you can train your dataset on. Superficial similarity to a corpus is exactly that, and misses the point of what composition is.

Music has invariant temporal forms that reliably communicate feelings, based on the context. A musical cadence versus none, a change in rhythm, lingering on a note... The common nature of these forms lends themselves to common feelings about them. When two people are open to music, with a similar experience, they roughly feel the same thing. Perhaps not exactly, but music as a technology for exchanging non-verbal experiences, feels, is surprisingly consistent — why does film music have such a common effect on the emotional vibe of the scene?

In the future, if we could gather and annotate people's feelings in response to musical forms (including but far more than consonance and dissonance), I'm sure this would enable an AI-based model of the emotional resonances of various musical elements (and their multi-level representations in the neural network). Then, compositional models could be trained using real-time aesthetic rating devices (e.g., reporting on pleasure/uncomfortable and interestingness/boringness).

Now, this system would hypothetically be able to manipulate emotions, at least to the extent that a composer can now.

Is that useful in anything but a creepy way? Well, maybe you could add filters to existing compositions to change their vibe... like, the "humanize" button in Logic Pro gives a looser feel. You might be able to apply filters that could make a song feel more longing or hopeful.


James Tenney has a book on this - "history of consonance and dissonance"

Great! I don't know that one—AND it is on LibGen. Looks good so far (1988).

I think apart from the layman's resistance to the idea of nonhumans being creative, the other problem is that the basic theories of composition musicians use are pretty well understood and based on applying some surprisingly simple maths to some instruments whose timbres are very complex (and "what works" is a moving target bound to cultural familiarity with instruments, intonation, musical phrasing and comfort with dissonance; not an AlphaGoZero situation where the ML process can expose that the wealth of existing human theory is objectively inferior at achieving a win condition.)

Sure, a Deep Learning process with a suitably curated dataset can rediscover the basic principles of Western music theory... but why? Humans understood these things when creating the music that makes up the corpus, and non-AI software can already turn those patterns into musical structures which can be played by accurate synthesis, which only lacks a bit of nuance.

The human musician looking for AI-powered shortcuts doesn't want a lot of computation thrown at learning how to approximate a 12 bar blues or arpeggiate chord tones, they want it thrown at a post-production process (or better still, a pedal) that makes their mechanical playing of simple licks approximate BB King or an autotune that makes their average voice in a lower pitch sound like Whitney Houston.

>I think apart from the layman's resistance to the idea of nonhumans being creative,

What I think is funny is the resistance to the idea that a lot of what people do isn't all that special.

It all makes me think of the evolution of supermarket cashiers. It used to be a fairly skilled profession, now people are just used because of their abilities in object manipulation.

I disagree that there isn't already generative music that is enjoyed just for its aesthetic qualities. I'm a huge fan of Brian Eno's work, which is highly generative (and he's even published apps that will simply generate ambient music for you 24/7).

"Since I have always preferred making plans to executing them, I have gravitated towards situations and systems that, once set into operation, could create music with little or no intervention on my part. That is to say, I tend towards the roles of planner and programmer, and then become an audience to the results" - Brian Eno

I think generative music is particularly effective as ambient background or as part of a larger art installation. Also, if you're working on developing new synthesizers I'd highly recommend you get to know Eno's work.

As a musician, I'd love automated composition of human level music. Especially if you could navigate and blend and transform the space of styles, artists, and instruments. Co-composition has a place.

The reason such tools have historically been viewed as oddities is because they were and are objectively bad. 3 year old prodigies can exceed even the best output of software. In recent years, the models have improved, but the problem space is huge and the algorithms aren't human level yet.

When the tools reach human performance levels, people won't care if it's human or software. They'll only care if it's enjoyable.

All creative endeavors are at risk of being automated by ai in the next few decades. Software will exceed the best of human creativity, and that's a good thing.

No, it isn't. It'll make people stop making music. Even if all you care about is one particular way to evaluate the end product, it means there will be no new inputs, resulting in the death of creativity. And once that's been reached, it's almost impossible to revive.

People will most definitely not stop making music, if only because the physical act of playing an instrument can never be replaced by software.

Agreed. Moreover, if a genuinely sentient AI or MI as Minsky called it, comes to pass then it would then be fascinating to listen to it if it cared to compose music; in its absence musical composition is always the result of someone's human experience of being in this universe - necessarily as a human organism. And this is true even if the music is a research exploration of evocative (to a human of course) sound patterns.

Very true! I play guitar but I’ll never be a Tommy Emmanuel. I don’t care, I play for the joy of it :)

> if only because the physical act of playing an instrument can never be replaced by software.

As an instrumentalist, I would love to believe this but more and more this does not appear to be the case.

I have talked to many people of different ages who really don't see any difference between someone actually playing an instrument in front of you and listening to pre-recorded music.

It's funny because it's in the context of me liking to play live music, and they never seem to realize they're saying, "What you do is meaningless."

I think it's that they have no personal experience with playing an instrument, combined with the fact that the dominant forms of pop music now don't have much if any place for instrumentalists.

Nobody is talking about playing music. And physical playing was replaced by software quite a while ago. Not if you play it for your own fun, but almost nobody can distinguish a computer rendering from the real thing on a recording.

Computers can already generate convincing telephone calls and news articles. Yet people haven't stopped answering the phone or believing the news. /s

Actually, I have stopped answering the phone.

In my view there's already a sufficient quantity of perfect classical music, that for most people, they could listen endlessly, and it would always seem new to them. Especially if hearing some familiar material once in a while is part of the enjoyment.

Yet musicians and audiences got violently sick of perfect classical music, and started creating "bad" music, just for the experience of trying something new. And this has happened over and over in music and other art forms.

I won't say computers won't ever be able to do that, but it suggests a level of AI beyond simply curve fitting the existing sequences of notes in recorded or written tunes. At the same time, what music evolves into with the use of AI may open up new areas for exploration of new human music.

As a kid, I played music that my mom didn't recognize as music. Eventually the AI will be motivated to create music that humans don't like, but that the computers prefer. That will be the singularity. ;-)

This is cynical. The existence of skilled grandmasters and unattainably higher skilled computers hasn't killed chess.

Chess is a great analogy. Now, because of great computers, folks can choose to play against a “master” chess player whenever they like, whether they have an internet connection and a proper rank or not.

Being able to jam with an AI group of your favorite jazz bests seems like a great stand-in for when you can’t get a real trio or quartet together, and if sufficiently good at attending to the ideas of the “live player”, would probably “raise all boats,” making better composers overall.

I agree that what you propose would be better than nothing, but it will still be lacking.

Assuming the software can get there, we also need to add more sensors. The head nods, eye contact, subtle facial gestures, and other body language that are such an important part of a collaborative jazz ensemble will have to be sensed. However, even if the computer can be enhanced with sensors, now you've got to communicate from the computer(s) back to the other players. So, either advanced robotics, or ensembles adapt other signaling schemes that the computer player(s) can engage. It is not a simple problem: even if AI can be made to be "creative" and "musical," that is not the whole story.

I don’t think “equal or better than human” is the bar we’re trying to beat.

I was more responding to both the claim that “AI will kill human composition” (it won’t) and “talking about computer-based music composition is framing the question wrong, since it doesn’t provide value” (it does)

> This is cynical.

Just because it doesn't subscribe to the utopic "computers will create the best music ever" fantasy?

> computers hasn't killed chess

Because there's very little monetary value in seeing computers play chess. That's not the case in music: the entire infrastructure, from tools to conservatoires, comes from selling music. If that chain crumbles, music will be dead.

The infrastructure isn't about the music, though. It's about people, whether they are pop stars, ferociously good classical virtuosi, temperamental maestros, introverted but sympathetic innovators, or even just amateur bands/ensembles that enjoy the social aspects of making music.

That's where the big money is, not elevator music generated by computers.

The big money is where mass audiences are. Music's infrastructure is for a large part dependent on it. It encompasses everything from instrument manufacturers to piano teachers to concert organizers to masterclasses. If --as assumed in the post I responded to-- software starts producing better music than people, or does so cheaper, or gets more clicks, etc., the infrastructure will disappear. It'll take a few decades, but that can kill off teaching and affordable instruments, taking everything except amateur music for the wealthy with it.

>> People simply want to listen to music created by people, even if AI music were perfect. Only in commercial applications like stock music or jingles is AI composition in demand. I understand that you can move the goalposts

With new tech, especially automation, moving goalposts isn't cheating. If successful, it almost always gets used in totally different ways than the human equivalent.

A common pattern is imagining automation through "robots," like in The Jetsons. An automated hotel has a robot desk clerk, robot maid, etc. IRL, an automated hotel may have no check-in, rooms designed to be self-cleaning, and guests who must make their own beds. Maybe it's not even a hotel anymore, but a room or pod here and there. Robotic maids act as a stand-in for things that are hard to imagine.

>> AI to create tools that musicians want?

So... this is a good compass for advanced stages, "productizing." Describe your goals in terms of what it can do for users. For earlier, research oriented stages, the more abstract definition is good. Composition works for that:

Music composition (or generation) is the process of creating or writing a new piece of music, defined as a succession of pitches or rhythms, or both, in some definite patterns.

How that is ultimately used/sold/productized is certainly up for debate, but that definition sends you on a path that's different from "tools for musicians."

I made https://stevenwaterman.uk/musetree which is a custom frontend for OpenAI's MuseNet. I've used that to generate music that I use on my Twitch streams, games, and other things. I'm not kidding myself that it's as good as human-written music, nor is it truly ai-written music (as there's a lot of human selection in the process), but people frequently comment on the music and enjoy it, without realising it's AI-written

Some examples:



* Edited to fix the link, my bad

That site is not working for me. Port 443 isn't open.

No one is clamoring for it because it isn't good enough yet, same reason no one is asking for AI designed houses, or business logos, or food. But that doesn't mean it can't get there eventually. It's also a very interesting scientific question, wrapped up with big questions about the nature of creativity and the capabilities of machines.

> Like, who's clamoring for that?

> even if AI music were perfect

I remember Milli Vanilli (who I quite liked, to be honest), and a thousand manufactured bands since who at least make enough of an effort not to get caught out like that again. If that part of the industry (which seems like a large chunk) could get rid of the inconvenience of having to write/find good songs, I expect they would, and get back to exploiting pretty young things with not enough talent.

In short, those who value money ahead of music. On the other hand, if it improved the rubbish in the pop chart then I might embrace it!

As a music-theory-unsophisticate, I foresee a lot of potential for deep learning in creating musical arrangements for existing compositions using all kinds of available instruments. How might a Beethoven symphony be arranged for a typical 4 part rock band? What if you added a piano? What other instruments might add further value, esp. toward certain ends, like when learning to play an instrument? Could a deep net learn to accompany adaptively, as only a capable human can today? Could it offer instructional suggestions for improvement, especially subtleties in timing or dynamics?

I also see value in transferring musical style for a piece from one genre to another (let's say, from jazz to rock & roll). Or from one set of instruments (or voices) to others. If done well, that could provide a lot of potentially entertaining interpretations or mashups, or at worst some novel medleys. Sinatra raps. Nat King Cole does bluegrass.

Because examples of uses like these aren't likely to be plentiful, I suspect these won't be so much supervised learning tasks as reinforcement learning (RL) tasks. The trick then will be to somehow construct useful (and insightful) reward functions for each new use -- basically aesthetic meta-learning for RL.

If RL reward functions can learn to write winning advertising jingles, it's hard to imagine what creative doors they won't open, eventually.

> People simply want to listen to music created by people

I don't think people care who composed the music, I believe people want to see music played and sung by people. Going to a concert where you're just looking at a computer generating sound won't inspire the crowd, but if the band plays something composed by an AI, who cares? Even nowadays music has multiple collaborators on a composition, and I almost never pay attention to who they all are.

While I agree the term "music composition" is not marketable, I disagree on all other points. Actually I think the only thing pulling for AI is the idea that people would prefer to listen to something generated by a computer. People no longer view human sources as trustworthy, they no longer experience natural things as authentic. They want computer-certified facts, machine-performed actions.

You don't even need an AI to make the kind of music people like. You just need to dumb down the music--which is what has happened gradually since the gilded age. Now the charts are topped with monotone "robot" melodies and little to no chord structure, synths using standard basic wave forms, all rhythms quantized to a grid and all voices auto-tuned. You don't need an AI to generate this type of music by machine; ping pong balls in a tumbler would suffice.

The role of AI, then, is not that you need it to make the music, but you need an AI to claim authenticity. In the minds of today's people, the computer-made music must be better, must be smarter, than human-made music.

The band Yacht used AI to write an album called Chain Tripping that's really good IMO. They transcribed their old music to MIDI, fed it to an AI, had it generate short snippets of new MIDI, and then cut and paste those bits to get a score, which they performed.

I suspect the drums and bass are human-written, as they seem more coherent and "whole song-aware" than the other parts, but I have no way to confirm that.


I believe one of the problems is that we don't know what the goalposts are. People experience music differently, ranging from barely tolerating it, to pursuing it as an academic study, to making massive sacrifices in order to be immersed in it. The conscious lifestyle choice of many full time musicians is bewildering to many techies. This suggests we don't know the "customer" or the "market" at all, or at the very least, that they are fragmented.

I can imagine a future where AI can come up with new stylistic concepts, maybe even "predicting" how some music genre will evolve, which is certainly something I'd be interested in seeing explored. Imagine an AI that was able to come up with something remotely close to rock music after being fed 1930s-1940s blues music. It'd be pretty cool to see what it came up with after being fed music from the 2000s-2010s.

It might be centuries before AI is writing good novels, but music seems a much easier nut to crack. Hell I've seen people sit down at a drum machine and make something enjoyable for minutes in literally seconds. Music doesn't have to be complicated, let alone deeply meaningful, to sound good.

AI already makes cool art -- e.g. the Google Deep Dream videos.

The same was said about synthesised music in the 40s and early 50s. It was mostly a lab curiosity.

Now look at it.

In some ways you're not wrong. But only in some ways.

>who's clamoring for that?

One use I can see is assist software for someone churning out commercial music. Maybe a standalone music generator (composition plus synth) would be useful for content generators avoiding copyright.

As copyright violation policing gets both easier and heavier, and programmatically generated music becomes easier, I have to wonder how those two things will collide.

Current techniques are pretty successful at improvisation or 'noodling', not quite what most folks would call "proper" composition but a good inspiration for it. The paper is not very comprehensive, there's plenty of interesting stuff that it doesn't mention.

I don't know, I'd pay for something that gave me music I loved, 100% of the time (or, hell, even 2% of the time). Beats sifting through the infinite trove of music just so I can stumble upon something I like by chance. My success rate there is more like 0.1%.

> People simply want to listen to music created by people

People also want a chunk of that Bob Dylan/Beatles fame without necessarily having the talent or the luck to compose songs like they did.

Let the AI do the work and then only be there for the fun part.

This is an interesting and informative article but to be a bit meta I'm concerned when I see articles like this on HN because they don't usually do a great job of summarising the work that has been done in a certain area before the advent of deep neural nets. This is important because very often, especially when it comes to generative art, the standard approaches used before deep neural nets could do things that modern deep neural nets cannot do, in particular when it comes to structured generation.

For example, algorithmic music is the subject area of generating music with algorithmic approaches, not necessarily using a computer. The Wikipedia page seems to be a bit poor in detail but it lists a number of different approaches most of which are not machine learning approaches:


I'm by no means an expert but that's the point. When a non-expert reads an article like the one above, I fear they may get an impression that neural nets are the first approach ever to generate music, or that they are the best approach ever to generate music, or anyway some kind of misunderstanding that is natural to draw from incomplete information.

The thing to try and keep in mind is that computer scientists, and other scientists and creative people, had been able to do amazing things with the tools they had at their disposal long before the advent of deep neural nets. And that there are many such tools that are not deep neural nets. Somehow these amazing things flew under the radar of techies - until deep neural nets came along and suddenly everyone is amazed that "wow, neural nets can do X!". Well, what else can do X? That's something worth trying to find out.

are you referring to the historic pre-classical period prior to 0 BIM (Before ImageNet)

I'm talking, for example, about the difference between images of plants generated by L-systems [1] and images of plants generated by GANs [2].


[1] https://en.wikipedia.org/wiki/L-system

[2] https://www.easyzoom.com/imageaccess/2128f27845ed4921b314300...
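
For readers who haven't met L-systems: a minimal sketch of the string-rewriting core, using Lindenmayer's classic "algae" rules (A -> AB, B -> A). The function name `lsystem` is just an illustrative choice, and turning the resulting string into a plant drawing (e.g. via turtle graphics) is left out here.

```python
def lsystem(axiom, rules, steps):
    """Rewrite every symbol in parallel, `steps` times.

    Symbols without a rule are copied unchanged.
    """
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s


# Lindenmayer's algae system: A -> AB, B -> A
ALGAE = {"A": "AB", "B": "A"}

if __name__ == "__main__":
    for n in range(5):
        print(n, lsystem("A", ALGAE, n))
```

The structure of the output is fully determined by two tiny rules, which is the kind of explicit, inspectable structure a GAN does not give you.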

Decent review! I've been fascinated by this topic for a while. I think the real magic is of course in the specifications of the details, but I do think it's inevitable that AI-generated (or assisted) music will come to dominate the field as it democratizes what is currently a more specialized skill set.

I've always found the resistance to the idea of AI-generated music a bit odd, but I think it stems from a more philosophical idea about where beauty comes from. For example, it's tempting to imagine that the beauty of a Beethoven symphony comes from Beethoven himself, thus his music facilitates an intense personal emotional connection across time and space. The idea of AI music challenges this notion a bit; if AI "composed" something beautiful, and its decisions were not founded on emotions, where is the beauty coming from? Of course, I'd say the beauty is in the natural phenomenon of music itself; the beauty is in the human brain's ability to have a sense of "music" at all. In this way, a connection to Beethoven through his music is a shared recognition of the beauty of certain natural musical phenomena. The beauty did not come from Beethoven, rather it was "captured" and shared by Beethoven.

As far as the business side goes, I've definitely found that there's interest in at least AI-assisted music composition, both from a melody-generating app I used to sell some years ago (limited as it was), and just collecting emails for a new web app I'm working on at https://www.tunesage.com/ ... I know there are already other services out there too for AI music, and I expect the space will continue to grow. I think it's gonna be awesome. :)

I'm afraid that AI will so overwhelm us with perfect, amazing music (and other creations) that we will come to a familiar situation that Alan Watts describes: "And you would, naturally as you began on this adventure of dreams, you would fulfill all your wishes. You would have every kind of pleasure you could conceive."

And then in this world overwhelmed with perfection we will fall back to that untraceable aspect of human creation that keeps us in awe. Of course we will be much more comfortable in that advanced world in a sense of fulfilled basic needs, but self actualisation (from the pyramid of needs) I guess will never be achieved by AI :)

The beauty in human creation doesn't rest entirely in the artefact. A large part of it rests in the knowledge (of the beneficiary) that another skilled human has dedicated time, effort, and intelligence to uniquely produce what is in front of them.

It doesn't stop at music. You see it everywhere: fashion (hand-made), food (famous chefs), music (live performances), coffee (moka pots), sex (orgasm stimulated by other humans), etc. It isn't a pursuit of the best (or perfection), but of what has more soul in it. And unfortunately AI music lacks any soul.

> A large part of it rests in the knowledge (of the beneficiary) that another skilled human has dedicated time, effort, and intelligence to uniquely produce what is in front of them.

True, but I think this is subjective guessing rather than "knowledge". After all, if I don't find a piece of art beautiful or even interesting, I don't care how much time and effort and intelligence went into it (at least in so far as an assessment of beauty goes); as an artist, I can only wish that such things were a guarantee of beauty.

Subjective guessing could be converted into knowledge by information sharing. I believe one could be progressively persuaded to (1) sympathize with the artist, then (2) sympathize with the artefact, and maybe (3) come to cherish it. Depending on how the information is revealed, & in the hope that it is meaningful to the customer. Restaurants have long known & used this procedure—they include source of material and cooking method in the name of the food. You may have no affection for the pasta in front of you but maybe sharing the "home made" information with you could elicit a little more excitement.

I don't think AI music necessarily lacks soul, if it is used as a tool by a human. Just the same as a photograph can carry soul, not just a painting.

Good point. I don't have an immediate response, and making a "degree of soul" argument sounds lazy to me. I'll ruminate over this for a while along the lines of what role is played by the tool, and within what range of excellence it immediately throws any piece of work.

>I've always found the resistance to the idea of AI-generated music a bit odd

There was a section in 1984 about how the proles are fed machine-generated music. I think a bit of the resistance comes from this.

It all comes down to generating tokens. The dominant paradigm is to compose the music by generating the next token based on the last few seen tokens, then take that and generate the next token, and so on. No one has been able to significantly improve upon that. All development in the last ten years or so has been about creating ever larger models with more data so that emitted sequences will stay coherent for longer. The problem with this paradigm is that it works amazingly well for short sequences, but often fails to stay coherent for longer sequences. So composing interesting ten-second instrumental music is a solved problem - composing full songs three to five minutes long is far beyond the reach of the current state of the art.
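
To make the "next token from the last few tokens" loop concrete, here is a minimal sketch. It swaps the neural network for a plain n-gram lookup table, so it only illustrates the generation loop, not how real models predict. The names `train_ngram` and `generate`, and the toy corpus, are made up for illustration.

```python
import random
from collections import defaultdict


def train_ngram(tokens, k=2):
    """Map each run of k tokens to the tokens that followed it in the corpus."""
    table = defaultdict(list)
    for i in range(len(tokens) - k):
        table[tuple(tokens[i:i + k])].append(tokens[i + k])
    return table


def generate(table, seed, length, rng):
    """Autoregressive loop: look at the last k tokens, pick a next token,
    append it, and repeat until `length` tokens exist or no continuation
    is known for the current context."""
    out = list(seed)
    k = len(seed)
    while len(out) < length:
        choices = table.get(tuple(out[-k:]))
        if not choices:
            break
        out.append(rng.choice(choices))
    return out


if __name__ == "__main__":
    corpus = "C E G E C E G E C".split()  # toy "melody", purely illustrative
    table = train_ngram(corpus, k=2)
    print(generate(table, ["C", "E"], 8, random.Random(0)))
```

Because the model only ever sees the last `k` tokens, nothing in the loop knows about verses, bridges or where the piece is heading, which is one way to picture why long-range coherence is the hard part.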

Text-generating models suffer from the same problem. E.g. GPT-3 will generate two to three hundred tokens or so and then it will begin to spout nonsense. That it still works so well is because it's trained on a massive corpus and because its tokens are richer. It operates on the word level, while most music composition models operate on the note level, and notes are more analogous to characters than to words. If someone were able to figure out how to represent multi-instrumental compositions using "music vectors" (analogous to word vectors), that would probably lead to a major breakthrough.

By 'token' do you mean a note, a chord, general structure (verse, bridge, etc.), a well known phrase?

Music is so full of cliche I don't doubt that someone will write software (probably already has) to assist a composer, generate drum parts or harmony, etc.

I like the idea of a Rhythm changes generator. With enough monkeys the Flintstone Theme eventually pops out.

A token can be almost anything as long as a sequence of tokens represents something meaningful. For music, I and many others have represented music as sequences of notes interleaved with rests of varying length. E.g. "A--3 R3 C#-2": three tokens to be interpreted as "play an A note in the 3rd octave at tic 0, wait 3 tics, play a C# note in the 2nd octave at tic 4". As you can imagine, the number of tokens grows very fast, especially for polyphonic music.
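A rough sketch of how such a token string could be decoded into timed events. The exact format is a guess from the single example above: a two-character pitch name padded with "-", a "-" separator, then the octave, with "R" followed by a number meaning a rest. To land the C# on tic 4 as in the example, this sketch also assumes every note occupies one tic itself. The function name `decode` is made up.

```python
def decode(tokens):
    """Turn note/rest tokens into (tic, pitch, octave) events."""
    events = []
    tic = 0
    for tok in tokens:
        if tok.startswith("R"):
            # Rest token, e.g. "R3": advance the clock by 3 tics.
            tic += int(tok[1:])
        else:
            # Note token, e.g. "A--3" or "C#-2" (assumed layout:
            # two-char pitch padded with "-", separator, octave).
            pitch = tok[:2].rstrip("-")
            octave = int(tok[3:])
            events.append((tic, pitch, octave))
            tic += 1  # assumption: a note occupies one tic
    return events

events = decode("A--3 R3 C#-2".split())
print(events)  # [(0, 'A', 3), (4, 'C#', 2)]
```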

You might say that that level of detail is unnecessary for most music. For a 12-bar blues in E you might only need five tokens, "blues 12 bars key E", to represent the whole song. But what if you want to add some cool pentatonic riffs to it and maybe a key change in the middle? Either you add token types for "key change in the middle" and "cool pentatonic riffs" or you will need many more tokens to represent your song.

One can think of it as compression. English text is trivial to compress because you just match each word to a word index in a lookup table. Compositions are way harder to compress.

An important reason why it is so hard to compress them is that errors frequently don't really matter in the same way they do with language. You can really play a melody with many, many variations and it is still (to a point) the same melody. This is much harder to do with language, and has important implications when thinking about what a composition "actually is".

Probably these would be harmonic units. Unfortunately notes aren't just characters in this. There's also voice leading, and so harmony "words" run into each other and can't be mutually exclusively tokenized. The better analogy might be poetry, and I'm not sure if there are good poetry generators, either.

> It all comes down to generating tokens.

So counterpoint is just out the window, is it?

Counterpoint is a special case of multiple voices and is handled by interleaving. Let A_i be what instrument A does at time step i and B_i what instrument B does at the same time step. Then with a five-token lookback: f(A_1 B_1 A_2 B_2 A_3) = B_3 and f(B_1 A_2 B_2 A_3 B_3) = A_4 and so on. There are variations on this theme. For example, you could generate the previous token at each time step instead of the next one but, essentially, it is the same idea.
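To make the interleaving concrete, here is a small sketch that weaves two voices into one stream and lists the (context, target) pairs a model with a five-token lookback would be trained on. The function names and the placeholder tokens are illustrative only.

```python
def interleave(voice_a, voice_b):
    """Merge two voices into one stream: A_1 B_1 A_2 B_2 ..."""
    stream = []
    for a, b in zip(voice_a, voice_b):
        stream.extend([a, b])
    return stream

def training_pairs(stream, lookback=5):
    """Each target token is predicted from the `lookback` tokens before it."""
    return [
        (tuple(stream[i:i + lookback]), stream[i + lookback])
        for i in range(len(stream) - lookback)
    ]

voice_a = ["A1", "A2", "A3", "A4"]
voice_b = ["B1", "B2", "B3", "B4"]
stream = interleave(voice_a, voice_b)
pairs = training_pairs(stream)
for context, target in pairs:
    print(" ".join(context), "->", target)
```

The first two pairs are exactly the f(A_1 B_1 A_2 B_2 A_3) = B_3 and f(B_1 A_2 B_2 A_3 B_3) = A_4 cases: the model alternates between answering for one voice and the other.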

Counterpoint is just a different way to answer the question "what note should follow the one I just played". There's nothing particularly special about it from an ML perspective.

A key question imho is this: will algorithmic music composition ever produce anything other than more or less sophisticated muzak, effectively recycling a particular musical corpus of a particular music tradition? Producing a memorable, emotionally moving piece is not something that is generally reducible to prescription. If the DL-based pattern matching is sufficiently intelligent and produces something worthwhile, will it actually feel different from an existing piece?

Music perception and appreciation is a deeply human feature (is there a "purpose" to it or is it actually a piece of redundant brain code? https://brianjump.net/2020/11/02/why-does-music-exist/) that has both a biological and cultural basis. There is a feedback loop of contemporary musical sounds a young brain is exposed to and the perception machinery and associated innate pleasurable responses.

Think of the countless cultural innovations that underpin human music: tunings, scales, rhythms, genres and the deep link to voice and song. None of those has any (ex-ante) algorithmic feel (to the chagrin of Pythagoreans and other ex-post numerologists). Coming up with music feels literally like plucking beautiful sounds from thin air.

In any case it's a fascinating research domain...

> A key question imho is this: will algorithmic music composition ever produce anything else than more or less sophisticated musac, effectively recycling a particular musical corpus of a particular music tradition?

A related question is: how many human composers manage to do this? It seems that we are fairly OK with a large body of composers "effectively recycling a particular musical corpus".

There is indeed a lot of "filler" type production and even the most talented creators are far from always producing unique pieces that touch audiences (let alone endure).

But this misses the point: some creators do occasionally hit the magic sweet spot and invent something that makes the rest worthwhile. Heck, some creators invent new genres.

A DL-type algorithm will not produce a new "classic" except by a "thousand monkeys hitting keyboards" type of accident... at least not until we dig far deeper into what makes some music stand out. And an algorithm as conventionally defined is unlikely to ever produce a new musical style.

The true opportunity (as always with algorithms) is the augmentation of creative talent.

I'm not convinced (though I'm also not insisting that you're wrong).

Studies of the human creative process that I've read don't provide a lot of support for the idea that there is a magic step here. You probably know the quote: "good artists borrow, great artists steal".

The best definition I've seen for the success of a piece of music is this: "What emotion is the artist trying to convey and how well does it convey it?"

Throughout the composing, arranging, recording, mixing, and mastering process, there are thousands of choices to be made, and the correctness of each choice is entirely linked back to that goal: Does the choice help to convey the emotion, or does it detract from it?

To that end, there is no correct choice, no correct or optimal harmony, no correct note, no correct rhythm, no correct timbre. It's all contextual in relation to conveying the desired emotion.

I'm really not sure how you could ever train a NN to make choices in that regard without first trying to teach them how to understand the impacts of their choices on the emotions conveyed.

At best, you may be able to train a NN to reproduce emotionally-void works in a particular style, and perhaps assign some emotion through the timbres selected (ambient music comes to mind here). Still, this isn't much of an achievement. You could easily codify the rules taught in Music 101 about harmonization and melody composition to a computer and have it spit out bland but pleasant excerpts, no deep learning required.
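As a rough illustration of how little it takes, here is a toy rule-based harmonizer with no learning involved. It assumes a melody in C major and applies a single textbook-style rule: give each melody note the first of the primary triads (I, IV, V) that contains it. The names and the preference order are choices made for this sketch, not a standard algorithm.

```python
# Primary triads in C major, listed in order of preference.
TRIADS = {
    "I": {"C", "E", "G"},
    "IV": {"F", "A", "C"},
    "V": {"G", "B", "D"},
}

def harmonize(melody):
    """Assign each melody note the first primary triad that contains it."""
    chords = []
    for note in melody:
        for name, members in TRIADS.items():
            if note in members:
                chords.append(name)
                break
        else:
            raise ValueError(f"{note!r} is not a note of C major")
    return chords

# Hypothetical melody fragment, pitch names only.
melody = ["E", "D", "C", "D", "E", "E", "E"]
chords = harmonize(melody)
print(list(zip(melody, chords)))
```

The result is always consonant and always dull, since the rule knows nothing about cadences, voice leading, or what the passage is supposed to make anyone feel.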

I’ve been working on an IDE for music composition http://ngrid.io. Launching soon.

The article starts really badly with "Music is generally defined as a succession of pitches or rhythms, or both, in some definite patterns."

The big three in music are pitch, rhythm and TIMBRE, guys, TIMBRE (the "sound" or "tone" of the piece).

Why is Beethoven's Ode To Joy somewhat annoying on the recorder, and can drive a room of hardened Germans to tears when played by a top-notch symphony orchestra? Timbre!

In the construction of classical musical pieces, this is controlled with orchestration. It's why a Hans Zimmer score sounds punchy or ethereal or whatever is called for by the script.

I basically stopped there, because, well, if you don't care how your music _sounds_ it isn't going to sound very good.

I personally disagree. Timbre will give the music a certain quality, and while it's no doubt a big part of the experience, I think the dynamics of the piece will be much more central to the emotional response.

If a song has a sax solo that really touches your heart, and you replaced it with a trombone playing the exact same solo instead, the timbre would be completely different, but the emotions that it would evoke would likely be almost identical. The timbre would mainly affect how well one or the other might blend with the rest of the orchestration. If someone played Ode To Joy on a recorder with proper dynamics, I can guarantee they could make it sound good. All my personal opinion of course.

> The big three in music are pitch, rhythm and TIMBRE,

The big three are normally described as melody, harmony and rhythm, and yes, it should be four because timbre is clearly central to the construction and experience of music.

The paper begins with David Cope. He evidently has a YouTube channel with algorithmic music.


David Cope has written several books and journal publications about his automated composition methods, some examples below:



I wonder why they don't mention the problem of the audio quality of the output. As far as I know, the best models work on magnitude spectrograms and have issues with recreating the phase information. Sub-par algorithms like Griffin-Lim are used instead.

The review is about composition in the score domain, not synthesis in the audio domain.

Check Differentiable Digital Signal Processing (DDSP) for state-of-the-art data-driven synthesis in the audio domain.

It's a very poor "score" domain with no timbral information at all.

See my comment elsewhere: https://news.ycombinator.com/item?id=28355582

I was talking about musical scores. They shouldn't have any timbral information.

It will only be a matter of time until the AI learns that what sounds pleasing to most humans is often some form of, or extract from, Pachelbel's Canon in D.

It is sickening how often I hear it

Ever since I saw the VAE learn to synthesize simple tunes, and then listened to Hello World by Skygge, I have been thinking about the inflection point of the music industry [1][2]. At first some of the music would be a close collaboration between a machine and an artist, but at some point, I wonder if we'd be primarily listening to tunes generated to our specific taste, similar to Spotify's Discover Weekly, though it would all be much closer to what we want to listen to (exploit & explore) at a given time.



I'm not particularly familiar with music creation, but it always seemed to me that neural nets would be better at handling music production than composition. That is, changing the sound to be, say, a Trent Reznor style output given the original.

Does that make any sense?

As someone with a fair amount of experience in both domains, I'd agree with this.

I see a lot more utility in sculpting sound and helping streamline the production side of things than the uses of NNs on the composition side.

There already are tools for automatically composing melodies and harmonies, and some of them are actually quite good, but they're hardly used.

Automated music (art) generation + NFTs = new printing money aka new trading bubble?

I am eager to be able to choose a classical music piece (e.g. Bach Cello Suite #1 in G) and ask an AI to generate an infinite sequence very close to its style but varying slightly.

Not a single mention of Jukebox.
