Hacker News
My approach to automatic musical composition (flujoo.github.io)
224 points by hackoo on Feb 13, 2022 | 39 comments



Most of this writeup is just a reinvention of Schenkerian analysis, and suffers from the same problem, in that you exercise a lot of editorial judgement in deciding which parts are the core/structural ideas and which parts are embellishment. That undermines the whole idea that this is automatic composition, because you are deciding a heck of a lot upfront. https://en.wikipedia.org/wiki/Schenkerian_analysis

The worked example also doesn't follow the stylistic rules that would make it satisfyingly authentic as a variation in the style of Beethoven. To calibrate: if you were to submit it as an exercise for music A-Level in the UK (age 16 pre-university) I don't think you would get a passing mark unless your teacher was feeling particularly generous.

But on the other hand, Classical style is incredibly refined and specific. They could have had much more success producing something following the example of Messiaen, who wrote a specific set of harmonic and rhythmic rules he was going to use for all his compositions which would be relatively easy to encode in a program. ("The technique of my musical language" is the 2-volume book I'm talking about and it's completely amazing btw. He really was an incredibly extraordinary person).

https://www.scribd.com/document/355450046/Messiaen-Olivier-T...


Schenker is pretty questionable anyway. He was a mediocre composer (at best), and it's perfectly possible to create multiple credible analyses of the same piece of music.

So while it's true that classical style(s) are very specific, it's also true they're all different. Bach, Mozart, Beethoven, and Chopin are not speaking the same language. They don't even speak the same language across their careers.

So while there are common techniques - like the idea of elaboration - they're really just the most obvious surface features, and implementing them is not nearly enough to produce decent pastiche.

AI/music research is littered with the corpses of projects that have attempted to do this. It's an incredibly hard problem - because unlike a board game, the "rules" change all the time, and there is no clear definition of "winning."

You can certainly mass-produce musical content in a fairly simple style - say minimal techno - and then cherry pick the best output. But even that turns out to be harder than it seems, and it's not real AI unless you can formally define why some output is better than other output. Beyond some fairly minimal perceptual and psychoacoustic basics, there's almost no research into that.

Really, we know much less about music than we think we do. Theory, Schenker included, is a very basic introduction. The interesting bits happen elsewhere, and we really don't understand how they work.


Thank you for the link to Messiaen's book!


This is brilliant: nicely written and a great introduction to a neat approach for generating music. The resulting piece is a very convincing piece of classical music too! The simplicity of the Python framework developed for this kind of music generation is inspiring; I like the functional approach with well-defined core elements.

Reverse engineering music is hard: music has patterns, but it also has patterns of breaking patterns. I think the heuristics here for composable elements that repeat, reduce, and elaborate are a neat approach, very fractal-like in a way.

I'm looking forward to reading the other articles submitted by you!


You may also find this interesting, about the composer David Cope, the programs he created, and the way they were received:

https://psmag.com/social-justice/triumph-of-the-cyborg-compo...


I've only skimmed that so far, but it's really interesting. It seems to me that it takes nothing away from Bach that Cope managed to duplicate his rules in software. Bach's genius was that he invented those rules.


I agree. It has been a while since I read the article through, but as far as I can recall, no-one is arguing against Bach's genius.


Oops! I see Cope was arguing that the great composers were "just clever mathematical manipulators of notes", at least according to the author.


What a beautiful write-up and description. In naive pursuit of related ideas, with less of a grounding in music theory, I use something called a "fractal sequencer" in my modular rack, which is like a normal linear sequencer you would find on a synth, but with mutation, recursion, and iteration. The idea of superficial and deep structure the author talks about, and a fractal tree with trunks and branches, are closely related, and it maps very well to this description.

I wanted to see if I could use fractals to generate consonant melodies that were indistinguishable from pop music, using patterns called 1/f noise. The idea was to see whether just adding musical entropy in the form of 1/f noise to a repetitive pattern would make it read as persuasively "new" and "interesting" to our brains. My own result is just a rough live take from messing around with it ( https://soundcloud.com/n-gram-music/beatrice ), but if you want to know what a consonant autogenerated pop fractal sounds like, the upper solo line over the three-note melody is produced by a Qu-Bit Bloom sequencer iterating over a bunch of 4ths and 5ths and driving its own oscillator. Similarly, I recorded one with a more early-80s electronica tone that is less fun, but more atmospheric, iterating through the space of possible melodies ( https://soundcloud.com/n-gram-music/decade ).
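
As a rough illustration of the 1/f-noise idea only (this is not what the Bloom sequencer actually does), here is a small Python sketch: pink-ish noise from the Voss-McCartney trick picks scale degrees from stacked 4ths/5ths over a fixed repeating figure.

  import random

  def voss_1f(n_steps, n_sources=4):
      # Approximate 1/f ("pink") noise with the Voss-McCartney trick:
      # several random sources, each re-rolled half as often as the previous one.
      sources = [random.random() for _ in range(n_sources)]
      values = []
      for i in range(n_steps):
          for k in range(n_sources):
              if i % (2 ** k) == 0:  # source k changes every 2**k steps
                  sources[k] = random.random()
          values.append(sum(sources) / n_sources)
      return values

  # A fixed low ostinato plus a solo line whose scale degrees are driven
  # by the 1/f noise, using a scale built from stacked 4ths/5ths.
  scale = [0, 5, 7, 12, 17, 19]  # intervals above the root, in semitones
  ostinato = [48, 55, 60]        # repeating three-note figure (MIDI numbers)

  noise = voss_1f(32)
  solo = [60 + scale[int(v * len(scale))] for v in noise]

  for step, pitch in enumerate(solo):
      print(ostinato[step % 3], pitch)  # bass note and solo note per step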

This ch0p1n Python library looks supremely interesting. When we listen to music, we're really hearing structures and shapes expressed in consonance and dissonance with each other, each elaborating facets of the others. These are, if not functions, at least algorithms composed over types. The author's description of these is at just the right level for deriving and applying a logical architecture without diving into some bonkers numerological gematria. The post is a beautiful way of thinking about these forms. I look forward to revisiting it and playing with the library.


The analysis was interesting but the end result was pretty terrible. Of course music in the past was often considered terrible by the following generation, a trend that still exists today.

What I would really like to see is an attempt to write multipart contrapuntal works like Bach's with some kind of AI. The rules are fairly well understood, but Bach knew how to adapt and even violate them and still wind up with amazingly pleasant music.


This comment should not be disregarded so easily. The reason why deep sequence learning has the best results in generating complex, highly contrapuntal music (it's more like noodling or improvisation than an actual compositional process, but it is generally compelling at its best) is precisely because of the loosely grammar-like structure mentioned in OP. The algorithmic operations they play with are not very well defined but the background theory is sound, and closely reflects what music theorists and composers in general have written about the subject in the 500 years or more it has been seriously studied.

As for deep learning models which create good contrapuntal music, see e.g. 'Biaxial RNN' https://github.com/danieldjohnson/biaxial-rnn-music-composit... by Daniel D. Johnson, who is now at Google Brain but wrote this as an independent(!) researcher. (Note that the existing code requires Python 2.x. It would be interesting to forward-port it so it can work with Python 3.x and a maintained version of Theano. Replicating the model using TensorFlow would also be quite worthwhile.)

If you're interested in Bach's work specifically, the "BachBot" and "DeepBach" projects are also interesting but less accessible.

Example output for all of these models can be found on the Internet, just look around for it. The proprietary system AIVA is also worth mentioning because even though it's so proprietary and secretive, the compelling and "serendipitous" music it manages to come up with is a tell-tale sign that it's actually doing well-founded deep learning stuff behind the scenes, much like the aforementioned open systems. Note that much of the released output has been orchestrated (AFAICT) manually by humans, but at some point I was able to find some piano-format reductions that are most likely very close to what the AI actually created, somewhere on the official site.


I was inspired by your comment and took a look at Daniel's work.

Forked it and spent the day upgrading it to Aesara (the maintained Theano fork) and Python 3.

Haven't fully trained/tested the model, but feel free to take a look: https://github.com/kpister/biaxial-rnn-music-composition


Whoa, thanks! You could try issuing a pull request once the work is complete. Most likely the author will accept it, or else it can linger as an "open" request on github and others might find it.

Also, the old version could use a special incantation:

  theano.config.experimental.unpickle_gpu_on_cpu = True
to run CPU-based inference on a GPU-trained model, even without a supported hardware GPU. It would be nice to check how this stuff works with the newer packages, and document it properly. Most likely nothing much has changed.
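
For reference, a minimal sketch of how that incantation would typically be used, assuming the parameters were saved with pickle (the filename here is hypothetical, not necessarily what the repo uses):

  import pickle
  import theano

  # Allow a model pickled on a GPU machine to be loaded on a CPU-only one
  theano.config.experimental.unpickle_gpu_on_cpu = True

  with open("params.p", "rb") as f:  # hypothetical filename
      params = pickle.load(f)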


Deep learning needs a lot of training data. There’s not enough MIDI-encoded music to train something like a GPT-3 model to generate MIDI sequences. A better way is to train on and generate raw audio, see OpenAI JukeBox. Unfortunately it’s extremely compute-intensive (even compared to GPT-3), so it will probably be a few more years until they (or some other big player) release JukeBox-2.


The amount of data you need depends on the model architecture you're using. A generic model like GPT-3 is neither here nor there, but something specifically intended for music can make do with very little data.


What do you mean “neither here nor there”?


The structure of something like GPT-3 is far too weak and general to achieve good results for something as structurally complex as music. It's designed to generate text, and mostly natural-language text at that. Music is very different, as OP hints in the linked post.


I thought you wanted to use deep learning. In DL, transformers are the best we've got currently. Both JukeBox and MuseNet use transformers. What makes you think they are not up to the task?


"Transformers" is a general technique, comparable to "LSTM" or "attention". The details of just how much prior information about the domain you reflect in the model architecture are just as relevant, and it's not clear just how well MuseNet addresses this compared to the earlier work I mentioned above. As for JukeBox, it's trying to solve audio generation at the raw samples level, which is quite literally a complexity increase of several orders of magnitude.


A transformer architecture does not change whether you use word tokens, musical chord encodings, audio waveform samples, image patches, or video frames. You can change the number of layers or attention heads, but the architecture remains the same; it does not care about the nature of the sequences it encodes/decodes. That's kind of the point: to have a universal algorithm. The human neocortex does not care which sensory receptors the signals are coming from; there's evidence that it processes all of them in the same manner.

Any prior information about the domain can be encoded in the way you prepare the input samples, and this would be equally useful and relevant regardless of the DL architecture you decide to use (RNNs, autoencoders, GANs, transformers).
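
To make that concrete, here is a hedged sketch (the encoding scheme and names are made up, not MuseNet's or JukeBox's): the sequence model stays generic, and the musical knowledge lives entirely in how notes are serialized into tokens.

  from dataclasses import dataclass

  @dataclass
  class Note:
      pitch: int     # MIDI note number
      duration: int  # length in 16th-note steps

  def encode(notes):
      # Domain knowledge lives in this preparation step, not in the model:
      # each note becomes a pitch token and a duration token.
      tokens = []
      for n in notes:
          tokens.append(f"PITCH_{n.pitch}")
          tokens.append(f"DUR_{n.duration}")
      tokens.append("EOS")
      return tokens

  melody = [Note(60, 4), Note(62, 2), Note(64, 2), Note(67, 8)]
  print(encode(melody))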

Using raw audio samples has two advantages. First, you provide all available training data to the model and trust that it will learn to extract what it needs. Second, unlike with any other music representation format, you have huge amounts of training data. So essentially all you need is a huge transformer and many thousands of GPUs to train it on, using all the mp3s you can get your hands on, and you can reasonably expect a level of quality similar to that of the text generated by GPT-3. What do you think?


> Any prior information about the domain can be encoded in the way you prepare the input samples

Adding features to provide further information about the input data can be helpful, but not nearly as much as ensuring that strong priors are reflected in the actual model architecture, whenever relevant to the domain. Johnson's paper about biaxial RNN provides one example of how to do this in the context of music.


The reason the end result is pretty terrible is because classical harmony and melody are closely related, and you can't job-lot-replace one without ruining the other.

This project is quite similar to something I've been working on, but I realised early on that you can't split up features like this and get credible results - credible meaning "appropriate for the style grammar."

It's a bit - only a bit, but let's go with it - like trying to generate sentences by swapping out nouns and adverbs. You end up with something that is grammatically correct in theory but makes no sense in practice.

Classical music particularly is fundamentally integrated in a way that textbook analysis doesn't fully explore.


See e.g. https://github.com/feynmanliang/bachbot. The companion site is no longer available, but here are some results on soundcloud: https://soundcloud.com/bachbot

Here is another very good one: https://openai.com/blog/musenet/

I'm a trained musician myself, interested in automatic music composition and following the progress for the last thirty years, but only recent work (like the projects referenced) produces convincing results (besides Cope's work of course, which required manual selection and editing).

You might also be interested in this survey paper: https://arxiv.org/abs/1709.01620


Is "the end result" the Beethoven's sonata in Japanese scale?

If so, the most important reason it sounds terrible is that the audio is rendered with MuseScore, without adjusting dynamics, tempos, etc. Actually, Beethoven's original sonata in this blog post is also rendered with MuseScore, and it doesn't sound great even with dynamics added.

However, this is why I agree that deep learning is more promising than this manual approach: there are too many variables you need to adjust to make music sound good rather than merely syntactically correct.


Meh. There’s plenty of deep learning music generation stuff out there; this is still a really cool approach.

I think it would be cool to combine the two. Instead of generating raw midi, your GAN or reinforcement learning agent or whatever could try to generate sequences of transformations to melodic fragments. Neural program synthesis type stuff.
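
A rough sketch of what that could look like, with made-up operation names (not the ch0p1n API): the generator would emit a sequence of op tokens, and a small interpreter applies them to a seed fragment.

  # Hypothetical melodic-transformation "program": a model emits op names
  # instead of notes, and an interpreter applies them to a seed fragment.
  seed = [60, 64, 67, 72]  # C major arpeggio, as MIDI note numbers

  ops = {
      "repeat":         lambda m: m + m,
      "transpose_up_2": lambda m: [p + 2 for p in m],
      "invert":         lambda m: [2 * m[0] - p for p in m],  # mirror around the first note
      "retrograde":     lambda m: list(reversed(m)),
  }

  def run_program(fragment, program):
      for name in program:
          fragment = ops[name](fragment)
      return fragment

  # A "program" a generator might output:
  print(run_program(seed, ["repeat", "transpose_up_2", "retrograde"]))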

Or maybe one could build an automatic music analysis tool that starts from a score and tries to infer the program that generated it. (Is that a thing already?)


"Pretty terrible" is awfully strong. I've heard a lot of terrible music and this doesn't make that cut. What you're reading is a nice writeup of some theory, with a worked example to showcase how to utilize some features of a library which is a work in progress. The author even notes that the music is simplified for the example. It isn't meant to be Bethoven. It isn't even AI. If you want to see AI, other people have done that already.

What's cool here isn't the quality of the (MIDI, retch) output -- what's cool is the actual Python library behind it, and the writeup, which are both super easy to read and follow.


I don’t agree at all, and it’s not even experimental.

And I think that is not a very nice thing to say about other people’s work.


I thought it sounded really good!


Sid Meier had software like that, which he created for the 3DO.


I don't think fully automatic musical composition is possible, any more than fully automatic novel writing. A story has to start somewhere and go to some other place, and a musical piece is a story.

Art needs intent.

Actually, art IS intent. (Contemporary art is pure intent: art without artifacts.)

With music, what works (for me) is a mix of algorithmic generation and then selection / arrangement.

Here's a piece I made following that approach: automatic generation of ideas, then me doing the selection, ordering and interpretation.

https://open.spotify.com/track/5TxVfIf9JUAhCEL3O5cWXT


The author has given a very thorough look at the almost purely algorithmic melodic variations in a classical sonata, although I would think starting with Beethoven (and Chopin) may be a bit ambitious. For music of the classical era, the meaning of the music depends on not just melody and harmony, but also articulation and dynamics. In the Beethoven example, the ascending arpeggio is staccato which contrasts against the descending figure which is slurred. The sforzando in bars 5 and 6 creates an intensification in comparison to the first four bars which is all in piano; the harmony is the same as before but moves twice as fast and culminates in a rolled fortissimo chord that dies away to a half close. Without the articulation and dynamics, the meaning of the music is changed and its clarity is weakened considerably, which is why I think separating out the melodic aspect is a risky endeavour.

The music of earlier composers, Bach especially, may be more robust when put under this type of algorithmic manipulation since much less sense is lost in Bach even if you only have the melody.


https://flujoo.github.io/en/my-approach-to-automatic-musical...

Here are my responses to some comments on Hacker News.

Limitations

To be completely automatic, ch0p1n should be able to

1. analyze 2. generate 3. manipulate 4. select

musical materials to generate music.

Specifically, it should be able to analyze musical structures, generate core musical materials, manipulate these materials to produce more, and select musically good materials to make semantically rather than syntactically correct music.

For now, ch0p1n can at best provide only a framework for manipulating given materials to generate music.
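
As a rough sketch of how the generate, manipulate, and select steps might chain together, here is a toy pipeline with entirely hypothetical functions (this is not ch0p1n's actual API):

  # Every function below is hypothetical; it only illustrates the intended
  # generate -> manipulate -> select pipeline, not ch0p1n itself.
  def generate_core():
      return [60, 62, 64, 67]  # pretend core material: a short motif

  def manipulate(motif):
      # produce candidate variants, here by simple transposition
      return [[p + shift for p in motif] for shift in range(-5, 6)]

  def select(candidates):
      # toy musical criterion: keep variants that stay in a comfortable register
      return [c for c in candidates if all(55 <= p <= 79 for p in c)]

  print(select(manipulate(generate_core())))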

Deep Learning

I have only very general knowledge of deep learning, but I think it is more promising than any manual approach to automatic composition. I will spend time studying it.

Terrible Result?

Some comments say the final generated music sounds terrible, with which I can only partially agree.

All the music pieces in this blog post are rendered with MuseScore. To make the music sound less mechanical, or less terrible, you need to carefully adjust dynamics, tempos, pedals, etc. Even so, the music may still sound unsatisfactory: Beethoven's original sonata, rendered with MuseScore, sounds bad even with dynamics and articulations added.

However, ch0p1n can only deal with the pitch and durational aspects of music for now, while to make music sound good you need to adjust a lot of variables. This is also why I said deep learning is more promising.

Some comments suggest the terribleness results from the fact that ch0p1n can only generate syntactically correct music, which may be musically meaningless. While I agree that ch0p1n has this limitation, I do believe that in most cases syntactically correct music is good enough to sound good.

I will generate more convincing music with ch0p1n in the future.


<off topic> It just dawned on me: today you could generate Mozart's avatar with a GAN, his voice with TTS, replies with GPT-3, and music composition on demand with transformers. Or you could ask for its opinion on other music, or have it teach you piano. That would be a nice app idea, a MozartBot. Bonus if it can guide you to appreciate the composer's music.

Would this be an early version of digital upload?


No.

You're not replicating Mozart's mind. You're replicating your interpretation of some painters' interpretations of what he looked like; a selection of word sequences bearing some statistical similarity to what he wrote down; a fantasy of your own devising as to what his voice sounded like, and you don't have rules for producing Mozart's music -- as this article demonstrates, pretending that you can produce satisfying music from a set of rules is not convincing, in the same way that GPT-3 output cannot convincingly pass a Turing Test as soon as you ask questions about meaning.


Reminds me of the KeyKit algorithmic MIDI composition language (first version, 1985): https://nosuchtim.com/tjt/particles/talks/keykit.pdf


I'd like tools that would do these transformations in a piano roll, a la FL Studio


I’m working on an IDE for music composition. Launching soon https://ngrid.io


I looked at the home page and it sounds interesting! In what key ways will your software be different than existing ones?


It will actually understand music theory.



