
Generating music with expressive timing and dynamics - iansimon
https://magenta.tensorflow.org/performance-rnn
======
contingo
It's refreshing to hear generated piano music that isn't either strictly
metrical or entirely freeform, but has patches where you get a somewhat
natural sense of rubato and sensitive dynamic shaping. It's sort of
convincingly improvisatory. The constantly shifting harmonic idiom is
disorienting in a not very pleasant way – the worst kind of Chopin + Ligeti
mashup – especially when you raise the temperature. It would be interesting to
use period- or style-specific training sets.

To my ears the 5:00 clip does have a larger structure: there are clearly
extended passages of building up to and ebbing away from large climaxes, where
you get a real sense of sustained intensification. Of course, if you follow
the detail, everything is built up from lots of fleeting and unrelated ideas.

~~~
iheartmemcache
> "It's sort of convincingly improvisatory."

I'm not sure if it was the dynamics specifically, but it was clear to me that
A was human. Within 30 seconds I was so sure, I hit pause and loaded the
answer to see if I was right (I was, and I'm likely the worst pianist on these
forums and only a casual fan of music that falls into the 'classical' genre.)

Here's[0] a fabulous physics paper that analyses the 16th notes played by a
studio drummer widely considered one of the best in his field. IIRC, the paper
mentions he couldn't record with a click because it'd throw him off. That
being said, the quality of the recording didn't suffer (his 2nd take of the
track was more than good enough for the rest of the musicians to record
against), so his own 'internal metronome' was more than good enough. The
interesting thing wasn't that his syncopation was incompatible with a click
track, but rather that the timing skew which evolved throughout each phrase
fit a mathematical model well. The study compared his recording against a
corpus of user submissions of the same track, and all of these drummers
_universally_ followed a similar set of dynamics. So presumably _all humans_
(or at least, all western drummers who elected to submit their recordings)
have that _same skew_ intrinsically.
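Out of curiosity, here's one way to fake that kind of correlated skew -- a
minimal numpy sketch assuming the deviations follow roughly 1/f-shaped noise
(my assumption for illustration, not the paper's exact model):

```python
import numpy as np

def timing_deviations(n_notes, exponent=1.0, scale_ms=5.0, seed=0):
    # Long-range-correlated jitter via 1/f spectral shaping -- a toy model
    # of the slow drift the paper fits, not the paper's actual method.
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n_notes)
    freqs[0] = freqs[1]  # avoid division by zero at DC
    amp = freqs ** (-exponent / 2)
    spectrum = amp * (rng.standard_normal(len(freqs))
                      + 1j * rng.standard_normal(len(freqs)))
    dev = np.fft.irfft(spectrum, n=n_notes)
    return scale_ms * dev / dev.std()  # deviations in milliseconds

# Humanize a nominal 16th-note grid at 120 bpm (125 ms per 16th note):
grid = np.arange(64) * 125.0
humanized = grid + timing_deviations(64)
```

Because the noise is low-frequency dominated, neighboring notes drift
together rather than jittering independently -- which is the point.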

It would be interesting to know whether it's a byproduct of culture (like an
accent) or a feature intrinsic to humans. In fact, that itself would be an
interesting study -- compare the patterns of a traditionally schooled western
jazz drummer vs an African drummer vs an Indian tabla player. The end of the
paper suggests additional avenues to explore, but who knows, maybe soon
drumming will be 'solved'?

I'm totally with you on seeing how it would do training against a specific set
of recordings from a specific region and/or era. The results would be terribly
interesting! Or training just against some particular virtuoso, like Gould on
Bach or Horowitz on Chopin.

As I understand it, there are basically just a handful of songwriters out
there (Shane McAnally is a prime example) who write songs for the major
country-pop artists. If you have a listen to this[1], you can really hear how
similar the songs are. (This isn't exclusive to country music - the 90s pop I
grew up loving is pretty much the same, as demonstrated by Rob Paravonian[2].)
There's probably a lot of money in automated songwriting for Katy Perry & her
entourage. Startup idea for any of you kids.

IIRC, there's a startup which is already using pinterest, tumblr, and more
obscure sites like lookbook to analyze and generate trends for clothing and
interior design, which design houses can pay a semi-nominal fee to access.
H&M is great at pumping out high-street fashion copies within a season, but
imagine being able to actually beat Tom Ford to market.

There are also interesting sociological implications for this. The culture of
chess changed with Deep Blue. When I first read about AlphaGo I was floored.
(I mean really. I had previously thought Go would be intractable within my
lifetime due to the huge configuration space.) As these 'good enough' models
emerge, they will have wide implications for human culture as a whole.

I wonder how it will affect the value of artists (in any genre). An ex of mine
who hated basketball (this was during the Kobe/Paul Pierce days) still managed
to recognize the genius when I showed her some Michael Jordan clips. Certainly
an artist in his craft. I'm not a fan of Lady Gaga[3], but when I saw this
performance I could immediately see a significant amount of talent. Walter
Murch is an absolutely amazing film editor; will he be reduced to a Final Cut
Pro plugin? If I manage to get my hands on the all-22 recordings (for every
NFL game, there's an overhead camera which records the whole field to let
coaches analyze their opponents) of every American football team, can I out-
tactic Bill Belichick?

==

[0] https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127902
(Seriously, it's a fantastic paper.)

[1] https://www.youtube.com/watch?v=FY8SwIvxj8o

[2] https://www.youtube.com/watch?v=JdxkVQy7QLM

[3] https://www.youtube.com/watch?v=oP8SrlbpJ5A

~~~
fenomas
> but it was clear to me that A was human

I believe you're thinking of a different project/article that was on the HN
front page a week or two ago (but which I can't seem to find).

This article doesn't have any "which is human" bits (and the results here are
a lot more impressive than the one you're thinking of).

------
henearkr
It seems that this model does not have any notion of "cadence" (the
punctuation in musical grammar, given by harmony and tonality). The
"expressivity" must be correlated with the harmonic grammar, otherwise it does
not make sense. Unfortunately the samples in the article do not sound very
good to me, and I am pretty sure that this is why.

------
kastnerkyle
This is stunning! Great stuff.

Since the input and prediction is a single sequence, did you experiment with
beamsearch/stochastic beamsearch decoding (maybe with additional diversity
criteria)?

I found that even simple models (markov chains) got a big diversity boost with
a stochastic beamsearch - it might avoid the problems with low temperature
repetition that could happen in a standard beamsearch. However, my music
models are much, much, (much) worse than this, so my relative improvement
might be related to that.
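For concreteness, a minimal sketch of what I mean by stochastic beam search,
over a toy first-order Markov chain (the transition table and all names are
made up for illustration):

```python
import math
import random

def stochastic_beam_search(transitions, start, steps, beam_width=3, seed=0):
    # Instead of keeping the top-k continuations deterministically, sample
    # k of them weighted by probability. (Hypothetical sketch, not the
    # Magenta decoder.)
    rng = random.Random(seed)
    beams = [([start], 0.0)]  # (sequence, log-probability)
    for _ in range(steps):
        candidates = []
        for seq, logp in beams:
            for nxt, p in transitions[seq[-1]].items():
                candidates.append((seq + [nxt], logp + math.log(p)))
        weights = [math.exp(lp) for _, lp in candidates]
        beams = rng.choices(candidates, weights=weights, k=beam_width)
    return max(beams, key=lambda b: b[1])  # best surviving beam

# A three-note toy "language model":
transitions = {"C": {"E": 0.6, "G": 0.4},
               "E": {"G": 0.7, "C": 0.3},
               "G": {"C": 0.5, "E": 0.5}}
best_seq, best_logp = stochastic_beam_search(transitions, "C", steps=8)
```

The sampling step is what keeps the beam from collapsing onto near-identical
high-probability continuations, which is where the diversity boost comes from.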

Similarly, I am finding really nice results in text (RNN-VAE) with scheduled
sampling, it might be worth experimenting with.
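The scheduled sampling trick itself is tiny -- something like this
(hypothetical helper, not my actual training loop):

```python
import random

def scheduled_sampling_inputs(targets, model_preds, epsilon, seed=0):
    # At each step, feed the ground-truth token with probability epsilon,
    # otherwise the model's own previous prediction. epsilon is typically
    # decayed toward 0 over the course of training.
    rng = random.Random(seed)
    return [t if rng.random() < epsilon else p
            for t, p in zip(targets, model_preds)]
```

Early in training the model mostly sees ground truth; late in training it
mostly sees its own outputs, which matches the conditions it faces at
generation time.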

I am amazed at how good this next-step sampled output is. The above ideas
might just hurt the result, I am having a hard time imagining how it could be
better.

What soundfont/midi rendering package is used for this? The piano sound is
really rich.

Looking forward to hearing what creative things users will do with this model.

~~~
iansimon
Hey Kyle, we didn't try anything more advanced than next-step sampling. You
probably have a better sense than I do how much improvement such techniques
are likely to yield. My unfounded suspicion is that we're close to the limit
of generation quality from this dataset, and so I'm most interested in trying
to gather 10-100x more skilled performances, one way or another.

There's also no consensus on whether the high- or low-temperature samples
sound better. I've heard both opinions from several people.
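For anyone unfamiliar with the temperature knob, here's a toy next-step
sampler illustrating it (a sketch for illustration, not the actual Magenta
code):

```python
import math
import random

def sample_with_temperature(logits, temperature, seed=0):
    # Divide logits by temperature before the softmax: low T sharpens the
    # distribution (safer, more repetitive), high T flattens it (wilder).
    rng = random.Random(seed)
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    probs = [e / sum(exps) for e in exps]
    choice = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return choice, probs

logits = [2.0, 1.0, 0.5]
_, cold = sample_with_temperature(logits, temperature=0.5)
_, hot = sample_with_temperature(logits, temperature=2.0)
# cold puts more mass on the argmax than hot does.
```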

Sageev did the final rendering, not sure what he used but I'm pretty sure it
was nothing too fancy.

~~~
kastnerkyle
A bigger dataset of MIDI with velocity information and performance timing
would be really, really great.

High temperature versus low is tough to compare - I find that sometimes low
temperature seems better, then I change the random seed and my opinion flips.

Same for stochastic versus deterministic beam search, length/diversity
scoring, and so on. I have been meaning to blog on this, will send it your way
when I get it posted.

For character text, stochastic seems nicer broadly (maybe due to limited size
of markov space, see [0] deterministic vs. [1] stochastic) but for music it
depends on the representation I use. However at least in this cherrypicked
example, I find the repetition of the deterministic beamsearch hilarious even
though it is "worse".

Interesting, I will have to ask him what it was. With that render, at least my
bad samples will sound prettier.

Great job on the model again!

[0] https://badsamples.tumblr.com/post/160767248407/a-markov-argument

[1] https://badsamples.tumblr.com/post/160777871547/stochastic-sleazy-shakespeare

------
DomreiRoam
Could it mean that you could generate music for games that would follow the
action and help build up tension?

~~~
pasta
This is already done in a lot of games. But those are precomposed parts that
are dynamically morphed into each other when the action changes.

~~~
DomreiRoam
Thanks, it makes sense to do so. Do you know if they use this approach only in
games with a scenario, or also in multi-user settings? Because in one case
they can use static triggers, whereas in the other they would need to measure
the user's sensation or stress. I mean the music could be more intense if you
are in a battle and your armor and health are quite low.

------
the_cat_kittles
that first example is jaw dropping. it's just like what good musicians do when
they are noodling. damn. well done! probably the best results i've ever heard
for this type of effort.

~~~
3131s
I'm curious how many tries it took to get that. I've tried chopping up samples
from piano music using onset detection and then recombining samples
programmatically. The results were more interesting musically to me actually,
but also not as reminiscent of a traditional classical / romantic piano piece.
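Roughly what I mean, as a toy sketch (a crude energy-jump chopper standing in
for a real onset detector like librosa's):

```python
import random
import numpy as np

def chop_and_shuffle(signal, frame=512, threshold=2.0, seed=0):
    # Cut wherever a frame's energy jumps to more than `threshold` times
    # the previous frame's, then shuffle the resulting segments.
    energy = np.array([np.sum(signal[i:i + frame] ** 2)
                       for i in range(0, len(signal) - frame, frame)])
    onsets = [0] + [i * frame for i in range(1, len(energy))
                    if energy[i] > threshold * (energy[i - 1] + 1e-9)]
    bounds = onsets + [len(signal)]
    segments = [signal[a:b] for a, b in zip(bounds, bounds[1:])]
    random.Random(seed).shuffle(segments)
    return np.concatenate(segments)
```

Every sample of the input survives the recombination; only the ordering of
the onset-delimited segments changes.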

So, this is probably the best RNN generated music that I've heard too but
overall I'm still not extremely impressed.

~~~
divenorth
I'm passionate about this subject, so here's my take on it. Until neural
networks can create music better than humans, it will be nothing more than
table talk.

Usually for something to gain any real traction it needs to solve a problem or
do it better than current solutions. AI generated music does neither.

From a music nerd who loves programming and neural networks I find this stuff
very interesting. But I feel that neural networks could be much more useful to
composers in other ways.

~~~
3131s
Yeah, I make electronic music and I'm really interested in algorithmic
techniques as an aid to the composer too.

I have python scripts that just generate many minutes' worth of music in an
instant, and then I comb through the result and cut out the interesting parts
for further processing. It's a really productive technique, and you hit on
melodies and rhythms that a human normally wouldn't.
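The workflow is roughly this (a hypothetical sketch of the generate-then-curate
idea, not my actual scripts):

```python
import random

SCALE = [60, 62, 64, 65, 67, 69, 71, 72]  # C major, as MIDI note numbers

def generate_phrases(n_phrases, length=8, seed=0):
    # Cheaply spit out many random-walk phrases over a scale; a human then
    # combs through the output and keeps the interesting ones.
    rng = random.Random(seed)
    phrases = []
    for _ in range(n_phrases):
        idx = rng.randrange(len(SCALE))
        phrase = []
        for _ in range(length):
            idx = max(0, min(len(SCALE) - 1, idx + rng.choice([-2, -1, 1, 2])))
            phrase.append(SCALE[idx])
        phrases.append(phrase)
    return phrases

phrases = generate_phrases(100)  # minutes of material in an instant
```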

~~~
divenorth
If AI were used as a tool to help composers become more productive, it would
be a massive hit. The ability to create new interesting stuff in less time is
valuable.

------
hakcermani
This can generate elevator music that will never repeat. I am up for that!
(Just getting into ML with Udacity and Coursera courses. This is just
fascinating.)

~~~
divenorth
You don't need a neural network for that:
https://www.youtube.com/watch?v=esRdmKYucIw

