
Deepjazz: AI-generated 'jazz' - mattdennewitz
https://github.com/jisungk/deepjazz
======
rryan
This is really neat! But I think it's a stretch to call it AI-generated jazz
music.

As I understand it, the author has trained an LSTM on a single MIDI file --
"And Then I Knew" by Pat Metheny. The network is then asked to generate MIDI
notes in sequence.

What this network has been asked to do is to produce an output stream that is
statistically similar to the single MIDI input file it has been trained on. It
would be more accurate to call this an "And Then I Knew" generator. Its "cost
function" -- the function the network is trying to minimize during training --
is exactly how well it reproduces the target song.

Neural networks are "universal function approximators". It's not surprising
that given a single input, a network can produce outputs that are
statistically similar to it.

A network that could compose novel MIDI jazz would look like this:

* Train a network on a corpus of thousands to hundreds of thousands of MIDI jazz files.

* Add significant regularization and model capacity limits to prevent the network from "memorizing" its inputs.

* Generate music somehow -- the char-RNN approach described here is fine. There are other methods.

You want the network to build representations that capture the patterns of
jazz music well enough to pastiche them, but not representations so high-level
that the network is simply humming "And Then I Knew" back at you. This is such
a pervasive problem that any paper presenting a novel result in generative
modeling pretty much must include a section presenting evidence that the model
is not memorizing its inputs.

I can hum a few classic jazz tunes from memory, but that mental process is not
jazz composition -- it's reproducing something from memory. If we're going to
call a model's output "AI-generated jazz", we need some way to tell the
network not to hum a tune it knows and instead compose a new tune with the
principles/patterns it knows. Since we can't speak to our models and tell them
to think one way and not the other, part of the trick in this field is to come
up with models that can only do one thing and not the other.
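
For the curious, the generate-notes-in-sequence loop looks roughly like this. This is a toy sketch with a bigram frequency table standing in for the LSTM (deepjazz itself uses a real recurrent network, and the note names and training "song" here are made up); the point is that with a single training input, most contexts have only one plausible continuation, so sampling mostly replays the original:

```python
import random
from collections import defaultdict, Counter

def train_bigram(notes):
    """Count next-note frequencies -- a crude stand-in for the
    distribution an LSTM would learn."""
    model = defaultdict(Counter)
    for a, b in zip(notes, notes[1:]):
        model[a][b] += 1
    return model

def generate(model, seed, length, rng=random):
    """Emit one note at a time, conditioning on the previous note."""
    out = [seed]
    for _ in range(length - 1):
        choices = model.get(out[-1])
        if not choices:          # no known continuation: stop early
            break
        notes, weights = zip(*choices.items())
        out.append(rng.choices(notes, weights=weights)[0])
    return out

# Train on a single (invented) "song". With only one input, generation
# is close to verbatim reproduction of the training sequence.
song = ["C4", "E4", "G4", "E4", "C4", "E4", "G4", "C5"]
model = train_bigram(song)
print(generate(model, "C4", 8))
```

With a corpus of thousands of songs, the same loop would be sampling from a distribution blended across all of them, which is where regularization and capacity limits come in.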

~~~
aczerepinski
Collective improvisation is the core of jazz's identity, more so than any of
its other defining traits (swing, syncopation, blues-derived harmony, etc).

Generating random patterns that sound jazz-ish is interesting, but until
multiple generators can react to what the other is doing in real time (or to a
human participant), it isn't exactly jazz.

I'd equate it to a basketball playing robot. Teaching it to shoot free throws
is interesting, but doesn't really take a step towards approximating what
basketball is. Can it call for picks, lead passes to cutting teammates, box
out for rebounds, force bad shots, etc?

~~~
Balgair
Well, given enough time and resources, then yes, the b-ball-bot could, and
probably better than a human could. I know this is a cop-out answer, but look
at the DeepMind Go games. The computer beat a top-100 (I don't know the
rankings, actually) Go player, something that was thought to be nearly
impossible this decade.

The most interesting thing was the commentary on the matches. The announcers
were mystified by the computer's moves; 'alien' came up a lot in describing
the play-style. We humans can't play Go by evaluating each stone in the game.
We have to 'chunk' the game. E.g.: these 3 stones are a 'wall' or a 'platoon',
this stone is 'hot' and can take your stones, this stone is 'down' and will be
used in 3 turns, etc. The computer doesn't have to do that chunking; each
stone is evaluated individually. As such, the play-style was totally foreign
to people. It did things no player had tried or, importantly, could have
thought of, given that our brains are limited to 'chunking' the information.

I would predict that a b-ball-bot would play the same way, in totally strange
ways that a human can't think of. E.g.: calculating a reasonably high
probability that the ball will bounce off your nose and into the left hand of
its team-mate, throwing the ball as hard as it can at its own head to make a
shot, targeting not just 1 opponent but the entire team's right thighs 57
seconds from now, etc.

Similarly with jazz: the computer is a dumb machine that will just do strange
things because humans have to 'chunk'. In music, we play in chords and notes
and with rhythm and timing. The computer can evaluate the whole song, and
every other song at the same time, and can borrow from all of those. You and I
can pull in the feeling of losing a child, or the joy of strawberry ice-cream
bars in a Memphis summer -- things a computer never will. But we cannot pull
in the obscure Tuvan throat-singing techno remixes on YouTube, the Afro-Thai
heavy metal Vimeo channels, or the terrible pre-teen angst poems set to crappy
guitar, all at once. It can only see what you feed it, but you can feed it the
life-output-into-music of billions of humans, with live updates. The computer
will know more.

But music is emotional and about feelings. The feel of music is most important
to us. And I think that a human songwriter is therefore essential, one that
cares and puts effort into the work. It connects us, and that is what is
important, not the sounds.

~~~
cdr
> But music is emotional and about feelings. The feel of music is most
> important to us. And I think that a human songwriter is therefore essential,
> one that cares and puts effort into the work. It connects us, and that is
> what is important, not the sounds.

Children can play music very emotionally (or rather, in a way that adults
associate with emotion) without having any experience or real comprehension of
the emotions. Imitation and training are sufficient to be convincing. A
program doesn't need to experience emotion, only to know that certain
characteristics of the sound are associated with certain emotions.

------
JamilD
This sounds to me like the "uncanny valley" of music. It's close to being
pleasant, but it's very discordant and hard to listen to…

~~~
Moshe_Silnorin
That's jazz for you.

~~~
erikpukinskis
Not really.

~~~
dbcurtis
I think he is confusing jazz with Bartok.

~~~
erikpukinskis
God I love Bartok. When I played piano in high school I basically only played
Bartok and Brubeck.

I think people find unfamiliar music difficult to listen to. I don't think
it's really about genre or artist.

I suppose some genres are trying to be difficult on some level (rock and roll,
punk, metal, and rap each took up that mantle) but all of those were meant to
be easy to listen to for a target audience.

Bartok never struck me as super combative. Brainy, perhaps.

~~~
dbcurtis
Yes, I agree with you about people not liking the unfamiliar. My comment was
tongue-in-cheek. Though, you must admit, badly played Bartok is gruesome. My
daughter plays violin, and discovered his 44 violin duets and also the
Hungarian Dances suite. Luckily she is good enough that it is fun to listen
to. But my standard joke is: "All teenagers seek out music that will drive
their parents crazy. Mine found Bartok."

~~~
erikpukinskis
Truth. The combo of amateur violin and Bartok must be another level of
torture.

------
neurobuddha
Coming from an avid Jazz listener, this is awful. Not even close.

I don't mean this as a slight at all, but definitely raise the bar on your
experiments.

~~~
Uehreka
I thought it was neat for a few seconds, but then it got stale really quickly.

But then I listened to the original (the track used to train the network) and
realized the problem: the network only knows how to write one song. What you
hear on SoundCloud is the equivalent of giving someone a 5 paragraph essay,
and then telling them to write a 10,000 word paper using only sentences
contained in that essay.

Supposing that this program can accept more than 1 song in its training data,
I expect it could produce really interesting stuff.

~~~
mpdehaan2
Yeah, it's kind of like pixelating an existing song. What I really want to do
is teach a program to jam and know what sounds good -- which is a lot harder.

But there's a part of music where the human soul needs to be, and that is
interesting too. Some of the expressive stuff is harder to do in MIDI land --
you can modulate a filter cutoff or velocity or something -- but compared to a
live player there is a LOT of work to do.

------
daviddaviddavid
One of the central features of jazz (or any music) is rhythm. In swing-based
jazz, including bebop, the upbeats of 2 and 4 are emphasized. It's the
opposite of rock. The Metheny track here has a typical rock beat, so it's a
very odd target.

Also, unless I missed something the clips just play the network's attempt at
duplicating the "head" of the track; not the soloing.

As a jazz musician I find this cool but I also feel safe that it won't be
stealing gigs from me anytime soon.

~~~
aczerepinski
To clarify your first paragraph: rock and jazz both emphasize 2 and 4. Swing
is about the relative duration and weight of the first and second eighth notes
within a single beat.

In fast-tempo bebop they tend to have relatively equal durations, and in other
jazz styles they trend closer to 2/3 + 1/3 of the beat respectively.

~~~
daviddaviddavid
That's not correct.

In a typical jazz swing drum beat the high-hat is closing on 2 and 4 (the
upbeats).

In a typical rock drum beat the bass drum is on 1 and the snare drum is on 3
(the downbeats). There's almost never emphasis on the upbeats.

The two styles are almost completely opposite in feel and that Metheny track
is using the rock style.

~~~
duderific
Huh? In typical rock, the bass drum is on 1 and 3, and the snare drum is on 2
and 4. Think "boom...bap...boom...bap". 1 and 3 are the downbeats and 2 and 4
are the upbeats.

Maybe you are thinking about counting eighth notes on the high hat -- in that
case the bass drum would be on the first high hat hit, and the snare would be
on the third. However the counting should always be on the quarter note, i.e.,
two high hat hits per count -- "One and two and three and four and."

~~~
daviddaviddavid
No, I'm talking quarter notes in 4/4 time here. Bass drum on one, snare drum
on 3 and timekeeping hand playing on all four quarter notes.

In your "boom...bap...boom...bap", the ellipses are quarter-note rests. Listen
to any simple rock tune, say, AC/DC's Back in Black. With BD == Bass Drum, SD
== Snare Drum, HH == High Hat, what you get is:

    Note | 1    2    3    4
    -----|-----------------
    HH   | x    x    x    x
    BD   | x
    SD   |           x
~~~
6581
That's only half a measure. What you transcribed as 1, 2, 3 and 4 are eighth
notes.

------
devin
As a card-carrying jazz nerd, I am impressed. If there were more dynamics,
some of these soundcloud examples would sound significantly better.

ETA: The default midi sound font doesn't do it any favors, either. I have some
software instruments I could throw at this that would make it sound a whole
lot better.

------
brandonmenc
Anyone interested in algorithmic jazz should check out Al Biles:

[http://igm.rit.edu/~jabics/](http://igm.rit.edu/~jabics/)

~~~
kevinmgranger
More specifically, GenJam:

"GenJam (short for Genetic Jammer) is an interactive genetic algorithm that
learns to improvise jazz."

[http://igm.rit.edu/~jabics/GenJam.html](http://igm.rit.edu/~jabics/GenJam.html)

------
newobj
The best part is that the resultant "jazz" sounds more like vaporwave[1].

[1]
[https://www.youtube.com/watch?v=PdpP0mXOlWM](https://www.youtube.com/watch?v=PdpP0mXOlWM)

------
alexc05
That's funny, I was just researching this last week.

I stumbled across some music generators. A downloadable one:
[http://duion.com/link/cgmusic-computer-generated-music](http://duion.com/link/cgmusic-computer-generated-music)

And [http://www.abundant-music.com/](http://www.abundant-music.com/)

Both are "procedurally generated music", so I'm not sure where that falls on
the AI spectrum.

I found the quality interesting and there was some potential there, but at
least in these cases there were some issues with the quality of the MIDI
instruments, and the song structure was very "same-y".

Anyway, looking forward to poking around in the DeepJazz code.

------
mpdehaan2
Always good to see more computer music projects.

I started one recently -- and need to do more work on it -- that does things
in a bit more of an object-oriented way, trying to model music theory concepts
(like scales) as objects: not so much analyzing existing files as building the
primitives you might need for a sequencer (and eventually some generative
stuff).

If people are interested check out:

[https://github.com/mpdehaan/camp](https://github.com/mpdehaan/camp) (in the
README, there is mailing list info).

The next thing for me is to make an ASCII sequencer so it's a program that can
also be used by people who can't code, and then I'll get back more into the
generative parts.

------
shams93
George Lewis wrote a realtime improv AI in Forth back in the 90s. It used
MIDI, so the sounds were like General MIDI of the time, but the interplay
between human trombone and the machine listening to his playing on the fly was
amazing, given the limitations of the machines back then. To be AI jazz, it
has to be able to jam with humans or other machines.
[https://en.wikipedia.org/wiki/George_Lewis_(trombonist)](https://en.wikipedia.org/wiki/George_Lewis_\(trombonist\))

~~~
shams93
[https://muse.jhu.edu/article/20320](https://muse.jhu.edu/article/20320)

------
ARothfusz
I'd be more impressed if they had trained it on Pat Metheny and then given it
"Mary Had a Little Lamb" and said "jazz this up"

~~~
tjr
I'd be more impressed if they had trained it on Kenny G and Pat Metheny liked
the results.

------
trsohmers
Serious question: who is the copyright holder on generated works? The program?
The person who wrote it? Do you have to give any sort of authorship credit to
those who created the works in the mined data set? Copyright law in the 21st
century is just getting more and more complicated...

~~~
mbrock
I read an interesting argument related to this topic in a Jehovah's Witness
pamphlet. There was an article about how human inventions mimic God's
creation, and the silliness of our squabbles over copyrights and patents.

~~~
6stringmerc
Quite interesting. If I ever get in a situation where one believer would like
to engage in a dialog with me, that sounds like a good subject to discuss.

------
twic
There's an enjoyable summary of some other efforts in neural network music
synthesis here:

[https://highnoongmt.wordpress.com/2015/08/11/deep-learning-f...](https://highnoongmt.wordpress.com/2015/08/11/deep-learning-for-assisting-the-process-of-music-composition-part-1/)

The same author's Endless Traditional Music Session supplies all the Irish
session music you could ever need, by mechanical means:

[http://www.eecs.qmul.ac.uk/~sturm/research/RNNIrishTrad/inde...](http://www.eecs.qmul.ac.uk/~sturm/research/RNNIrishTrad/index.html)

------
phatbyte
Awesome work, and this is quite interesting -- something worth exploring in
more depth than a hackathon can provide.

Having said that, and as a jazz fan, the generated music is horrible. Keep
feeding it more jazz tunes :P

------
gluelogic
One thing that comes to mind: to me, it sounds like all of the notes'
velocities are equal. It would sound a lot more natural if volume differences
were incorporated.

------
granttimmerman
I built a very similar project for classical music using Theano and MusicXML
for a Sound Capstone Project at UW.

Blogpost + music: [https://medium.com/@granttimmerman/algo-rhythm-music-composi...](https://medium.com/@granttimmerman/algo-rhythm-music-composition-using-neural-networks-f89897ff2df7)

GitHub: [https://github.com/grant/algo-rhythm](https://github.com/grant/algo-rhythm)

------
desireco42
I respect the criticism of people who love and listen to jazz quite a bit.

As someone whose taste in jazz is maybe not as sophisticated, this sounds good
enough to me. It could easily pass as elevator music.

On the other hand, it would be more valuable if more than a single file were
used for seeding. As it is, the result is a listenable theme, but it will
always have the style of its seed.

I intend to play with it and see if I can get more interesting melodies.

------
imaginenore
It's rendered with some really shitty sounding instruments. Run it through
Ableton Live at least. Or even better, a specialized piano engine.

------
pjdorrell
When human composers attempt to compose original music, they have immediate
access to their own subjective judgement of the quality of the music.

Until such time as we discover an algorithm that replicates human taste in
music, any AI-based approach to composing music will fail because it will not
have any feedback about the quality of the music.

------
return0
It sounds like with a few epochs it captured some rhythmicity. The notes still
sound random, but overall it's promising. This is only a hackathon project;
I'm pretty sure we'll see more elaborate networks in the future that make
acceptable jazz. It's going to be a bit more difficult for other kinds of
music, I guess.

------
I_HALF_CATS
Can someone explain to me the difference between this and the
computer-generated music of David Cope in the early 1990s?
[https://youtu.be/yFImmDsNGdE?t=44s](https://youtu.be/yFImmDsNGdE?t=44s)

It seems like the word 'AI' is getting thrown around.

------
jbmorgado
An improvement that should be quite straightforward and take no more than a
couple of hours is to use sampled sounds for rendering the playback.

It would massively improve the quality of the output and make it sound more
_"humane"_, IMO.

You can use the samples from www.freesound.org, for instance.

------
ryanmarsh
Was expecting to hear some Blue Note, got frantic muzak. Humans are safe...
for now it seems.

~~~
imaginenore
Not really. AI has written some compelling classical music, and did so years ago:

[https://www.youtube.com/watch?v=QEjdiE0AoCU](https://www.youtube.com/watch?v=QEjdiE0AoCU)

[https://www.youtube.com/watch?v=2kuY3BrmTfQ](https://www.youtube.com/watch?v=2kuY3BrmTfQ)

[https://www.youtube.com/watch?v=mnBUxG-wSVg&t=13s](https://www.youtube.com/watch?v=mnBUxG-wSVg&t=13s)

~~~
soundwave106
That's better than the jazz works (which is fine, considering the jazz was a
hackathon project). But I wouldn't call these masterpieces.

From my perspective, AI-generated music at present often falls short in two
areas. The first is instrumentation and dynamics. AI music often sounds
"robotic". Better soundsets would probably help some AI examples, but beyond
that, I find a lot of AI music "overly quantized" sounding. Humans often don't
play the music exactly as written (see:
[https://en.wikipedia.org/wiki/Expressive_timing](https://en.wikipedia.org/wiki/Expressive_timing));
this "non-perfect timing" is a large part of many works' expressive element.

The second problem, to me, is that AI music often falls short on coherent
overall musical themes. A lot of AI pieces tend to sound "structureless", with
no real direction, no thematic elements, nothing that could be called a motif
or hook, etc. There are definitely established "rules and patterns" for music,
so some of this could presumably be fed into the AI. The best composers,
however, bend and play with convention a bit.

------
genolilie
[https://www.youtube.com/watch?v=Fq6lypuUPeg](https://www.youtube.com/watch?v=Fq6lypuUPeg)

------
squeaky-clean
Even if it is a very limited model and the tracks get boring quickly like
everyone is saying, this is still extremely cool. I really need to buy a new
GPU that I can run Theano on.

------
KON_Air
Knowing next to nothing about musical terms I couldn't figure out the workflow
of the AI. Does it generate note after note trying to follow the learned
"structure"?

------
sengork
This reminds me of AWK Music: [http://kmkeen.com/awk-music/](http://kmkeen.com/awk-music/)

------
fiatjaf
I like this because I don't like jazz.

------
DonHopkins
Hook it up to a speech synthesizer, to make Deep Scat!

I played around with looping different speech synthesizers back into different
speech recognizers -- kind of like audio or video feedback, but with chaotic
noise injected by quirks of the synthesizer, the voice, the speech speed and
pitch, and the audio environment around the microphone (you could talk over it
to interfere with the words it was speaking and lay down new words in the
loop), all working against the lawful pattern matching and error correction
behavior of the speech recognizer and the HMM language model it was trained
with.

It was a lot like beat poetry, in that it tended to rhyme and have the same
number of syllables and use plausible sounding sequences of words that didn't
actually make any sense, like Sarah Palin.

You can start it out with a sensible sentence, and it will play the telephone
game, distorting it again and again. If you slow down the speech rate, words
will split into more words or syllables, and if you speed it up, words will
collapse into fewer words or syllables, or you can tune the speech rate to
maintain the same number of syllables. It's analogous to zooming the video
camera in and out with video feedback.

It would wander aimlessly around poetic landscapes, sometimes falling into
strange attractors in the speech recognizer's hidden Markov model and
repeating itself with little or no variation.

At any time you can join in with your own voice and add words during the pause
at the end of the loop, or talk over its voice, much the way you can hold
things in front of the camera during video feedback to mix them in.

Different speech recognizers are better at recognizing different vocabularies,
and therefore like to babble about different topics, depending on which data
they were trained on -- which we could guess by attempting to psychoanalyze
their incoherent babbling.

IBM's ViaVoice was apparently trained on a lot of newspaper articles about the
Watergate hearings, as it was quite paranoid, but businesslike, as if it were
dictating a memo, and would start chanting and fixating on phrases like
"congressional investigation," "burglary and wiretapping," and "convicted of
conspiracy".

Microsoft's speech recognizer had obviously been trained on newspaper articles
about the Clinton Lewinsky scandal, since it was quite obsessed with
repeatedly chanting about blow jobs (just like the news of the time), and
whenever you mentioned Clinton this or Clinton that, it would rapidly converge
on Clinton Lewinsky, Clinton presidency, Clinton impeachment, etc.

What I'd love to have would be a speech recognizer that returns a pitch
envelope and timing that you could apply back on the synthesized words, then
it could sing to you!

------
aaronlevin
If you're interested in making deep-jazz more discoverable, consider applying
to our Search team! :)

[https://soundcloud.com/jobs/2016-02-19-search-engineer-berli...](https://soundcloud.com/jobs/2016-02-19-search-engineer-berlin-germany)

~~~
dang
Job posts aren't allowed in regular threads on HN.

~~~
reitanqild
Sorry, honestly didn't remember that. Thought it was just usually frowned
upon.

------
SubiculumCode
Sorry. Not impressed.

~~~
kafkaesq
People need to stop knee-jerkedly downvoting stuff. The above comment might
not sound very civil or friendly in response to posting about an AI project --
but it's a perfectly reasonable gut-level reaction to have to an (alleged)
piece of _music_. Particularly this "music".

And it happens to be mine, also, in regard to the SoundCloud samples. Sure,
the project behind it might be mathematically interesting and all... but
really now, this ain't _music_, let alone jazz. In fact, if I came across
those samples whilst flipping between radio stations, I would probably hover
for at most a second or two before giving the dial another turn... or turning
the damn thing off.

Absolutely unlistenable, in other words.

~~~
mrspeaker
No one downvoted the OP because they thought the music was good - they
downvoted because the comment was garbage and didn't express any reasoning. It
was a useless "-1" reply.

Your comment on the other hand (aside the complaining about downvoting, which
is discouraged in the HN guidelines) was fair and interesting - and I
personally upvoted it.

