We've spoken to musicians and producers who are excited about new tools, new sounds, and assistants that automate boring parts of the workflow.
But when the problem is framed as "music composition", it just leaves me scratching my head. Like, who's clamoring for that? I'm not aware of any automatically generated music in the history of music that isn't seen as an oddity. Even if techniques improve, it's not a really sexy sell: people simply want to listen to music created by people, even if AI music were perfect. Only in commercial applications like stock music or jingles is AI composition in demand.
I understand that you can move the goalposts and say: "This isn't about total AI composition, it's about co-composition!" But honestly, I think talking about composition is just framing the problem wrong, and it's led to some really strange solution-in-search-of-a-problem research agendas. People should be thinking about it through the lens of: how do you use AI to create tools that musicians want?
As an aside, it is my understanding that there is no measure of something like consonance/dissonance in complex musical forms. There are models of dissonance in two- and three-note chords, but even these have gaps. I'm suggesting that the research on "what sounds good to people" is surprisingly immature in contemporary science. This is surprising because that question played a major role in the history of science. For instance, many of the very first experiments conducted at the Royal Society (c. 1660s) investigated harmony, and arguably the first scientific experiment ever was designed to evaluate a mathematical model of music (viz., the fifth-century-BC Pythagoreans demonstrating their integer-ratio theory of harmony by casting bronze chimes at those ratios).
So, I'm surprised that there isn't more interest today. That said, there was a recent breakthrough in the science of harmony: an integrated model of vertical and horizontal harmony.
Music is evocative and social, and it's the semantics of both that are under-researched.
You won't get a grasp of either by throwing a corpus into a bucket and fishing things out of it with statistics.
We can agree to disagree on that. Or, in any case, it's an empirical question. If we had a large corpus of music labeled by annotated "feels", I think we'd learn an immense amount about how music evokes feelings. (I'm not sure I feel comfortable with the term "semantics" applied to music)
In the meantime, I've been building a corpus of the EEG response to music:
Regarding the measures of dissonance, I'm only familiar with things like measures of roughness and harmonic entropy. If you know more, I'd appreciate sharing.
It's not obvious they're objects at all.
Debussy's La Mer is an excellent evocation of the sea, but you're not going to learn anything useful by throwing it into a bucket with a sea shanty. Or with Britten's Four Sea Interludes.
The absolute best you'll get from this approach is a list of dismally on-the-nose reified cliches - like the music editor who dubs on some accordion music when a thriller has a scene in Paris.
It's also why concepts like harmonic entropy don't really help. You can't parse "dissonance" scientifically in that way, because the measure isn't the amount of dissonance in a chord on some arbitrary scale, even if that measure happens to have multiple dimensions.
It's how the dissonance is used in the context in which it appears. There are certainly loose mappings to information density - too little is bad, too much is also bad - but it's not a very well explored area, and composers work inside it intuitively.
So there is no fitness/winning function you can train your dataset on. Superficial similarity to a corpus is exactly that, and misses the point of what composition is.
In the future, if we could gather and annotate people's feelings in response to musical forms (including but far more than consonance and dissonance), I'm sure this would enable an AI-based model of the emotional resonances of various musical elements (and their multi-level representations in the neural network). Then, compositional models could be trained using real-time aesthetic rating devices (e.g., reporting on pleasure/discomfort and interestingness/boringness).
Now, this system would hypothetically be able to manipulate emotions, at least to the extent that a composer can now.
Is that useful in anything but a creepy way? Well, maybe you could add filters to existing compositions to change their vibe... like how the "humanize" button in Logic Pro gives a looser feel. You might be able to apply filters that could make a song feel more longing or hopeful.
Sure, a deep learning process with a suitably curated dataset can rediscover the basic principles of Western music theory... but why? Humans understood these things when creating the music that makes up the corpus, and non-AI software can already turn those patterns into musical structures, playable by accurate synthesis that lacks only a bit of nuance.
The human musician looking for AI-powered shortcuts doesn't want a lot of computation thrown at learning how to approximate a 12-bar blues or arpeggiate chord tones. They want it thrown at a post-production process (or better still, a pedal) that makes their mechanical playing of simple licks approximate B.B. King, or an Auto-Tune that makes their average voice in a lower pitch sound like Whitney Houston.
What I think is funny is the resistance to the idea that a lot of what people do isn't all that special.
It all makes me think of the evolution of supermarket cashiers. It used to be a fairly skilled profession; now cashiers are kept around mainly for their abilities in object manipulation.
"Since I have always preferred making plans to executing them, I have gravitated towards situations and systems that, once set into operation, could create music with little or no intervention on my part. That is to say, I tend towards the roles of planner and programmer, and then become an audience to the results" - Brian Eno
I think generative music is particularly effective as ambient background or as part of a larger art installation. Also, if you're working on developing new synthesizers I'd highly recommend you get to know Eno's work.
The reason such tools have historically been viewed as oddities is because they were and are objectively bad. 3 year old prodigies can exceed even the best output of software. In recent years, the models have improved, but the problem space is huge and the algorithms aren't human level yet.
When the tools reach human performance levels, people won't care if it's human or software. They'll only care if it's enjoyable.
All creative endeavors are at risk of being automated by AI in the next few decades. Software will exceed the best of human creativity, and that's a good thing.
As an instrumentalist, I would love to believe this but more and more this does not appear to be the case.
I have talked to many people of different ages who really don't see any difference between someone actually playing an instrument in front of you and listening to pre-recorded music.
It's funny because it's in the context of me liking to play live music, and they never seem to realize they're saying, "What you do is meaningless."
I think it's that they have no personal experience with playing an instrument, combined with the fact that the dominant forms of pop music now don't have much if any place for instrumentalists.
Actually, I have stopped answering the phone.
In my view there's already enough perfect classical music that most people could listen endlessly and it would always seem new to them, especially if hearing some familiar material once in a while is part of the enjoyment.
Yet musicians and audiences got violently sick of perfect classical music, and started creating "bad" music, just for the experience of trying something new. And this has happened over and over in music and other art forms.
I won't say computers won't ever be able to do that, but it suggests a level of AI beyond simply curve fitting the existing sequences of notes in recorded or written tunes. At the same time, what music evolves into with the use of AI may open up new areas for exploration of new human music.
As a kid, I played music that my mom didn't recognize as music. Eventually the AI will be motivated to create music that humans don't like, but that the computers prefer. That will be the singularity. ;-)
Being able to jam with an AI group of your favorite jazz greats seems like a great stand-in for when you can't get a real trio or quartet together, and if it were sufficiently good at attending to the ideas of the "live player", it would probably "raise all boats," making better composers overall.
Assuming the software can get there, we also need to add more sensors. The head nods, eye contact, subtle facial gestures, and other body language that are such an important part of a collaborative jazz ensemble will have to be sensed. But even if the computer can be enhanced with sensors, you've then got to communicate from the computer(s) back to the other players. So: either advanced robotics, or ensembles adopting other signaling schemes that the computer player(s) can engage with. It is not a simple problem; even if AI can be made "creative" and "musical," that is not the whole story.
I was more responding to both the claim that “AI will kill human composition” (it won’t) and “talking about computer-based music composition is framing the question wrong, since it doesn’t provide value” (it does)
Just because it doesn't subscribe to the utopic "computers will create the best music ever" fantasy?
> computers hasn't killed chess
Because there's very little monetary value in seeing computers play chess. That's not the case in music: the entire infrastructure, from tools to conservatoires, comes from selling music. If that chain crumbles, music will be dead.
That's where the big money is, not elevator music generated by computers.
With new tech, especially automation, moving goalposts isn't cheating. If successful, it almost always gets used in totally different ways than the human equivalent.
A common pattern is imagining automation through "robots," like in The Jetsons. An automated hotel has a robot desk clerk, robot maid, etc. IRL, an automated hotel may have no check-in, rooms designed to be self-cleaning, and guests who must make their own bed. Maybe it's not even a hotel anymore, but a room or pod here and there. Robotic maids act as a stand-in for things that are hard to imagine.
>> AI to create tools that musicians want?
So... this is a good compass for advanced stages, "productizing." Describe your goals in terms of what it can do for users. For earlier, research oriented stages, the more abstract definition is good. Composition works for that:
Music composition (or generation) is the process of creating or writing a new piece of music, defined as a succession of pitches or rhythms, or both, in some definite patterns.
How that is ultimately used/sold/productized is certainly up for debate, but that definition sends you on a path that's different from "tools for musicians."
> even if AI music were perfect
I remember Milli Vanilli (who I quite liked, to be honest), and a thousand manufactured bands since who at least make enough of an effort not to get caught out like that again. If that part of the industry (which seems like a large chunk) could get rid of the inconvenience of having to write/find good songs, I expect they would, and get back to exploiting pretty young things with not enough talent.
In short, those who value money ahead of music. On the other hand, if it improved the rubbish in the pop chart then I might embrace it!
I also see value in transferring musical style for a piece from one genre to another (let's say, from jazz to rock & roll). Or from one set of instruments (or voices) to others. If done well, that could provide a lot of potentially entertaining interpretations or mashups, or at worst some novel medleys. Sinatra raps. Nat King Cole does bluegrass.
Because examples of uses like these aren't likely to be plentiful, I suspect these won't be so much supervised learning tasks as reinforcement learning (RL) tasks. The trick then will be to somehow construct useful (and insightful) reward functions for each new use -- basically aesthetic meta-learning for RL.
If RL reward functions can learn to write winning advertising jingles, it's hard to imagine what creative doors they won't open, eventually.
I don't think people care who composed the music; I believe people want to see music played and sung by people. Going to a concert where you're just looking at a computer generating sound won't inspire the crowd, but if the band plays something composed by an AI, who cares? Even nowadays music has multiple collaborators on a composition, and I almost never pay attention to who they all are.
You don't even need an AI to make the kind of music people like. You just need to dumb down the music, which is what has happened gradually since the Gilded Age. Now the charts are topped with monotone "robot" melodies and little to no chord structure, synths using standard basic waveforms, all rhythms quantized to a grid and all voices auto-tuned. You don't need an AI to generate this type of music by machine; ping-pong balls in a tumbler would suffice.
The role of AI, then, is not that you need it to make the music, but you need an AI to claim authenticity. In the minds of today's people, the computer-made music must be better, must be smarter, than human-made music.
I suspect the drums and bass are human-written, as they seem more coherent and "whole song-aware" than the other parts, but I have no way to confirm that.
AI already makes cool art -- e.g. the Google Deep Dream videos.
Now look at it.
In some ways you're not wrong. But only in some ways.
One use I can see is assist software for someone churning out commercial music. Maybe a standalone music generator (composition plus synth) would be useful for content generators avoiding copyright.
As both copyright violation policing gets easier and heavier and programatically generated music becomes easier I have to wonder how those two things will collide.
People also want a chunk of that Bob Dylan/Beatles fame without necessarily having the talent or the luck to compose songs like they did.
Let the AI do the work and then only be there for the fun part.
For example, algorithmic music is the subject area of generating music with algorithmic approaches, not necessarily using a computer. The Wikipedia page seems to be a bit poor in detail, but it lists a number of different approaches, most of which are not machine learning approaches:
I'm by no means an expert, but that's the point. When a non-expert reads an article like the one above, I fear they may get the impression that neural nets are the first approach ever to generate music, or that they are the best approach ever to generate music, or come away with some other misunderstanding that is natural to draw from incomplete information.
The thing to try to keep in mind is that computer scientists, and other scientists and creative people, were able to do amazing things with the tools they had at their disposal long before the advent of deep neural nets, and that there are many such tools that are not deep neural nets. Somehow these amazing things flew under the radar of techies, until deep neural nets came along and suddenly everyone is amazed that "wow, neural nets can do X!" Well, what else can do X? That's something worth trying to find out.
I've always found the resistance to the idea of AI-generated music a bit odd, but I think it stems from a more philosophical idea about where beauty comes from. For example, it's tempting to imagine that the beauty of a Beethoven symphony comes from Beethoven himself, thus his music facilitates an intense personal emotional connection across time and space. The idea of AI music challenges this notion a bit; if AI "composed" something beautiful, and its decisions were not founded on emotions, where is the beauty coming from? Of course, I'd say the beauty is in the natural phenomenon of music itself; the beauty is in the human brain's ability to have a sense of "music" at all. In this way, a connection to Beethoven through his music is a shared recognition of the beauty of certain natural musical phenomenon. The beauty did not come from Beethoven, rather it was "captured" and shared by Beethoven.
As far as the business side goes, I've definitely found that there's interest in at least AI-assisted music composition, both from a melody-generating app I used to sell some years ago (limited as it was), and just collecting emails for a new web app I'm working on at https://www.tunesage.com/ ... I know there are already other services out there too for AI music, and I expect the space will continue to grow. I think it's gonna be awesome. :)
And then, in this world overwhelmed with perfection, we will fall back on that untraceable aspect of human creation that keeps us in awe. Of course we will be much more comfortable in that advanced world in the sense of fulfilled basic needs, but self-actualisation (from the pyramid of needs), I guess, will never be achieved by AI :)
It doesn't stop at music. You see it everywhere: fashion (hand-made), food (famous chefs), music (live performances), coffee (moka pots), sex (orgasm stimulated by other humans), etc. It isn't a pursuance of the best (or perfection), but of what has more soul in it. And unfortunately AI music lacks any soul.
True, but I think this is subjective guessing rather than "knowledge". After all, if I don't find a piece of art beautiful or even interesting, I don't care how much time and effort and intelligence went into it (at least in so far as an assessment of beauty goes); as an artist, I can only wish that such things were a guarantee of beauty.
There was a section in 1984 about how the proles are fed machine-generated music. I think a bit of the resistance comes from this.
Text-generating models suffer from the same problem. E.g., GPT-3 will generate two to three hundred tokens or so and then it will begin to spout nonsense. That it still works so well is because it's trained on a massive corpus and because its tokens are richer. It operates on the word level, while most music composition models operate on the note level, which is more analogous to characters than words. If someone were to figure out how to represent multi-instrumental compositions using "music vectors" (analogous to word vectors), that would probably lead to a major breakthrough.
Music is so full of cliche I don't doubt that someone will write software (probably already has) to assist a composer, generate drum parts or harmony, etc.
I like the idea of a Rhythm changes generator. With enough monkeys the Flintstone Theme eventually pops out.
You might say that that level of detail is unnecessary for most music. For a 12-bar blues in E you might only need five tokens, "blues 12 bars key E", to represent the whole song. But what if you want to add some cool pentatonic riffs to it and maybe a key change in the middle? Either you add token types for "key change in the middle" and "cool pentatonic riffs" or you will need many more tokens to represent your song.
One can think of it as compression. English text is trivial to compress because you just match each word to a word index in a lookup table. Compositions are way harder to compress.
So counterpoint is just out the window, is it?
Music perception and appreciation is a deeply human feature (is there a "purpose" to it or is it actually a piece of redundant brain code? https://brianjump.net/2020/11/02/why-does-music-exist/) that has both a biological and cultural basis. There is a feedback loop of contemporary musical sounds a young brain is exposed to and the perception machinery and associated innate pleasurable responses.
Think of the countless cultural innovations that underpin human music: tunings, scales, rhythms, genres and the deep link to voice and song. None of those has any (ex-ante) algorithmic feel (to the chagrin of Pythagoreans and other ex-post numerologists). Coming up with music feels literally like plucking beautiful sounds from thin air.
In any case, it's a fascinating research domain...
A related question is: how many human composers manage to do this? It seems that we are fairly OK with a large body of composers "effectively recycling a particular musical corpus".
But this misses the point: some creators do occasionally hit the magic sweet spot and invent something that makes the rest worthwhile. Heck, some creators invent new genres.
A DL type algorithm will not produce a new "classic" except by a "thousand monkeys hitting keyboards" type accident... at least not until we dig far deeper into what makes some music stand-out. And an algorithm as conventionally defined is unlikely to ever produce a new musical style.
The true opportunity (as always with algorithms) is the augmentation of creative talent.
Studies of the human creative process that I've read don't provide a lot of support for the idea that there is a magic step here. You probably know the quote: "good artists borrow, great artists steal".
Throughout the composing, arranging, recording, mixing, and mastering process, there are thousands of choices to be made, and the correctness of each choice is entirely linked back to that goal: Does the choice help to convey the emotion, or does it detract from it?
To that end, there is no correct choice, no correct or optimal harmony, no correct note, no correct rhythm, no correct timbre. It's all contextual in relation to conveying the desired emotion.
I'm really not sure how you could ever train a NN to make choices in that regard without first trying to teach them how to understand the impacts of their choices on the emotions conveyed.
At best, you may be able to train a NN to reproduce emotionally-void works in a particular style, and perhaps assign some emotion through the timbres selected (ambient music comes to mind here). Still, this isn't much of an achievement. You could easily codify the rules taught in Music 101 about harmonization and melody composition to a computer and have it spit out bland but pleasant excerpts, no deep learning required.
The big three in music are pitch, rhythm and TIMBRE, guys, TIMBRE (the "sound" or "tone" of the piece).
Why is Beethoven's Ode To Joy somewhat annoying on the recorder, and can drive a room of hardened Germans to tears when played by a top-notch symphony orchestra? Timbre!
In the construction of classical musical pieces, this is controlled with orchestration. It's why a Hans Zimmer score sounds punchy or ethereal or whatever is called for by the script.
I basically stopped there, because, well, if you don't care how your music _sounds_ it isn't going to sound very good.
If a song has a sax solo that really touches your heart, and you replaced it with a trombone playing the exact same solo instead, the timbre would be completely different, but the emotions that it would evoke would likely be almost identical. The timbre would mainly affect how well one or the other might blend with the rest of the orchestration. If someone played Ode To Joy on a recorder with proper dynamics, I can guarantee they could make it sound good. All my personal opinion of course.
The big three are normally described as melody, harmony and rhythm, and yes, it should be four because timbre is clearly central to the construction and experience of music.
Check out Differential Digital Signal Processing (DDSP) for state-of-the-art data-driven synthesis in the audio domain.
See my comment elsewhere: https://news.ycombinator.com/item?id=28355582
Does that make any sense?
I see a lot more utility in sculpting sound and helping streamline the production side of things than the uses of NNs on the composition side.
There already are tools for automatically composing melodies and harmonies, and some of them are actually quite good, but they're hardly used.