
Music and Machine Learning - zappau
http://ai.sensilab.monash.edu/2018/08/23/Neural-Music/
======
fenomas
> The music generated by wavenet clearly sounds like a piano, but lacks
> compositional structure that most people might be able to follow. I suspect
> a significant architectural change will be needed for music for reasons
> discussed in this article.

As someone who's worked a lot on procedural music, I think this is definitely
true. I'm always surprised to see ML-based approaches where someone has just
trained a system on a bunch of songs, and then hopes the system will produce
music with a recognizable structure - even though all the training songs will
have had (in general) different chord progressions, different numbers of
voices or melodic lines at any given time, etc. Such approaches strike me as
akin to training a system on a bunch of short stories and then hoping it will
produce a new story with a recognizable plot.

It seems like it would make a lot more sense to remove these hidden
dimensionalities, e.g. by annotating the source data with chord or structural
information, or by training on lots of different melodies that all share the
same chord progression, etc. It's hard to imagine that, given enough layers,
the network will eventually grok all these hidden details on its own.

~~~
lwansbrough
I believe the first company to create a very successful cloud based DAW will
have the greatest opportunity to vacuum up this data. If producers were
pushing real original music source data into your neural network, you
basically eliminate the need to do any sort of waveform analysis. Everything
turns into discrete numbers, which neural networks are really good at managing
vs. the destructive noise of common music files. (120BPM music could
conceivably be 2 inputs per second vs. 44100)

Edit: As a matter of fact, you don't even need a whole DAW, really. You just
need to be able to read existing DAW files and give users a reason to upload
them.
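
A rough worked version of the data-rate comparison above. The one-event-per-beat figure comes from the comment; the sixteenth-note case and the function name are assumptions added for illustration, and real note data carries more per event than a single number.

```python
# Rough data-rate comparison: symbolic events vs. raw audio samples.
# Assumes a fixed number of events per beat; real note data is less regular.
BPM = 120
SAMPLE_RATE_HZ = 44_100  # CD-quality audio, one channel

def events_per_second(bpm: float, events_per_beat: float = 1.0) -> float:
    """Symbolic events per second at the given tempo."""
    return bpm / 60.0 * events_per_beat

beat_rate = events_per_second(BPM)                           # one event per beat
sixteenth_rate = events_per_second(BPM, events_per_beat=4)   # sixteenth notes

print(beat_rate)                        # 2.0
print(SAMPLE_RATE_HZ / beat_rate)       # 22050.0 samples per beat-level event
print(SAMPLE_RATE_HZ / sixteenth_rate)  # 5512.5 samples per sixteenth-note event
```

Even at sixteenth-note resolution the symbolic stream is thousands of times sparser than the waveform.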

~~~
jerrre
I think a lot of DAW work is still done with wave-data not note data...

~~~
igotsideas
Are you talking about the creation of Music or gathering the data of the Music
for ML purposes?

------
8bitsrule
This is a set of valuable, experienced reflections on the task anyone faces
who wants to make music (let alone get a machine to do it).

Much to digest here, but I especially like this thought: "There are no words
in music." A whole lot of people think that music is nothing more than a
carrier they have to modulate with some important message.

------
bhrgunatha
Enjoyed the article very much, especially the part about music not being a
language and the dive into syntax and semantics, but this...

> The only way to recognise which pop song (from one popular training set)
> this score comes from is the lyrics. The melodic line has little similarity
> to the original recording.

...strikes me as plain nonsense. I can play that line on my guitar or keyboard
and know immediately which tune it is even without the lyrics.

Am I misunderstanding something?

~~~
tommymachine
The notes and rhythm in that particular score aren't accurate to the recording;
only the lyrics are. If you played it as written there, it
wouldn't be so recognizable. Personally, I don't much fancy calling something
so inaccurate a 'score'. More like, a 'rough suggestion' or something. A score
is supposed to be like a thing that someone like Hans Zimmer would write for
an orchestra. Pop songs typically don't have scores involved in their
production anyways, which may be partially responsible for the inaccuracy. But
the real responsible party is the person who transcribed it wrong!

~~~
sacado2
Well, take any jazz standard. As you say, scores in jazz are at most a rough
guide, not a transcription of how the music should be played. Yet, if you take a
score and make a MIDI player play it as it is written, any jazz fan will
recognize the tune instantly.

For instance, take

[http://i17.tinypic.com/4uneoft.jpg](http://i17.tinypic.com/4uneoft.jpg)

Remove the title, add a mistake or two in the score, and give it to some jazz
fan who can read music: he'll recognize the tune instantly, even though nobody
ever played it this way, so blandly.

~~~
tommymachine
They'll recognize the chords instantly, without even having to glance at the
printed notes. In practice, these things are usually called "charts" or lead
sheets (if they have a melody) because they chart out the changes, which you
solo over. They usually are found in "fake books", where it's assumed that the
melody transcription is relatively awful, but it's close enough that a decent
musician can quickly figure out the actual tune from it (assuming they've
heard it before).

You'll almost never hear any seasoned jazz players refer to a chart like this
as a "score", unless they have a very very classical background, or are sort
of making a joke about the quality of a particular lead sheet. Scores are for
orchestras and films and things like that. It's strange to see people keep
referring to these sh*tty transcriptions as 'scores'.

It's like calling Kraft Mac 'n' cheese "pasta". Like, ok maybe technically it
could qualify as a pasta, but you don't really refer to it as that in
practice.

------
montalbano
> Music is not a language

I think almost anyone who's spent significant time jamming/improvising with
other musicians will disagree.

The statement is an oversimplification. It is an oversimplification that may
be necessary for music to become tractable to machine learning algorithms, but
it does more to highlight the limitations of current understanding of AI than
anything else.

Otherwise a very interesting read!

------
toolslive
> Most people can’t tell you why they like the music that they like. Not with
> enough resolution to accurately predict their preference for new tracks

Won't most people be able to tell you with good resolution why they don't
like the music they don't like?

~~~
nerdponx
Sometimes. They might be able to identify which aspect of the music they don't
like, but I don't know if they are able to explain "why" they don't like it.
But I feel the same is true on the positive side. Maybe it's worth trying to
figure out what part of the music people like ("John Denver has a great
voice"), rather than some nebulous reason why they like it.

------
okonomi
Not machine learning but this is an extremely cool generative approach for
emulating sounds on a synthesizer
[https://fo.am/midimutant/](https://fo.am/midimutant/)

------
BillBohan
These are some interesting observations regarding music and machine learning.
It has been my experience that the majority of the output of ML music
generators falls into the category I would classify as noise.

I briefly experimented with procedural music generation many years ago and
will relate my experience in the hope that some may find it interesting or
take inspiration from it.

I had read the Byte magazine article called "A Travesty Generator for Micros",
which works with text files, and realized that Markov chains could be applied
either to whole words or to individual letters. Sufficiently long chains of
letters almost always produce actual words. Sufficiently long chains of words
generally produce complete (although nonsensical) sentences. Excessively long
chains copy the input to the output. See [1] and [2]
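
For anyone who wants to try the idea, here is a minimal sketch of an order-n Markov chain that works on either letters or words. It is not the Byte program itself; `build_table` and `generate` are names made up for this example, and the sample sentence is invented. `order` must be at least 1.

```python
import random
from collections import defaultdict

def build_table(tokens, order):
    """Map each run of `order` tokens to the list of tokens that followed it."""
    table = defaultdict(list)
    for i in range(len(tokens) - order):
        table[tuple(tokens[i:i + order])].append(tokens[i + order])
    return table

def generate(tokens, order, length, rng=None):
    """Emit up to `length` tokens, starting from the input's opening context."""
    rng = rng or random.Random()
    table = build_table(tokens, order)
    out = list(tokens[:order])
    while len(out) < length:
        followers = table.get(tuple(out[-order:]))
        if not followers:  # context only occurred at the very end of the input
            break
        out.append(rng.choice(followers))
    return out[:length]

text = "the cat sat on the mat and the cat ran off the mat"
print("".join(generate(list(text), order=3, length=60)))     # letter-level chain
print(" ".join(generate(text.split(), order=1, length=12)))  # word-level chain
```

With this sample sentence, `order=9` at the letter level makes every context unique, so the output is just the input again, which is the "excessively long chains" case.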

At the time I was playing LOTRO [3][4] which uses ABC files [5] which are a
text representation of music. I used the .abc files as input to the travesty
program and got very interesting output. I used the rescan method which reads
the input file for each note to output. It is slow but uses far less memory
than the array method which reads the input once and generates a complete
table of all transitions.
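
A sketch of what the rescan approach might look like, under my own assumptions: the whole input is scanned for the current context each time a note is needed, and one follower is picked without storing a transition table. The function names and the toy note list are hypothetical, and parsing real .abc files (headers, durations, bar lines) is skipped entirely.

```python
import random

def next_token_by_rescan(tokens, context, rng):
    """Scan the whole input for `context`; return one of its followers at random."""
    n = len(context)
    chosen, seen = None, 0
    for i in range(len(tokens) - n):
        if tokens[i:i + n] == context:
            seen += 1
            # Reservoir sampling: keep this follower with probability 1/seen,
            # so no list of candidates has to be held in memory.
            if rng.randrange(seen) == 0:
                chosen = tokens[i + n]
    return chosen

def generate_by_rescan(tokens, order, length, rng=None):
    """Emit up to `length` tokens, rescanning the input for every new token."""
    rng = rng or random.Random()
    tokens = list(tokens)
    out = tokens[:order]
    while len(out) < length:
        nxt = next_token_by_rescan(tokens, out[-order:], rng)
        if nxt is None:  # context never appears with a follower
            break
        out.append(nxt)
    return out[:length]

# Toy stand-in for note tokens pulled from an .abc tune body.
notes = "C D E C C D E C E F G E F G".split()
print(" ".join(generate_by_rescan(notes, order=2, length=20)))
```

Each output note costs a full pass over the input, which is the speed-for-memory trade described above.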

Running travesty on a single .abc file produces an output which is very
similar to the input and only mildly interesting. Chaining together 2 or more
input files is when it gets more interesting. It did not work well unless the
input files had the same key signature.

I considered the possibility of transposing all input files to a common key
signature but did not implement it. Nearly all music representation is an
abstraction of the music. Music is generally quantized into notes of the
equal-tempered 12-note scale. The tune is recognizable regardless of the instrument
it is played on. I wondered whether there were further abstractions which
could be used similar to the way that either letters or words could be used
for text but am not sufficiently musical that I could discover them.
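
The transposition step that was never implemented could be sketched roughly like this. It assumes the tunes have already been converted to MIDI pitch numbers and that each tune's key is known; the helper names are made up, and the mode (major vs. minor) is ignored.

```python
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def tonic_pitch_class(key):
    """Pitch class of a key name such as 'G', 'Bb' or 'F#' (mode is ignored)."""
    pc = PITCH_CLASS[key[0].upper()]
    for accidental in key[1:]:
        pc += {"#": 1, "b": -1}[accidental]
    return pc % 12

def transpose_to(pitches, from_key, to_key="C"):
    """Shift MIDI pitches by the smallest interval mapping from_key onto to_key."""
    shift = (tonic_pitch_class(to_key) - tonic_pitch_class(from_key)) % 12
    if shift > 6:  # e.g. prefer down 5 semitones over up 7
        shift -= 12
    return [p + shift for p in pitches]

g_major_phrase = [67, 71, 74, 79]          # G4 B4 D5 G5
print(transpose_to(g_major_phrase, "G"))   # [72, 76, 79, 84]
```

Once every input tune sits in the same key, the combined files share one note vocabulary, which is what the mixed-key inputs were missing.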

If you try this I think you will quickly get results which encourage you to
continue.

[1]
[https://en.wikipedia.org/wiki/Parody_generator](https://en.wikipedia.org/wiki/Parody_generator)

[2] [http://runme.org/project/+travesty/](http://runme.org/project/+travesty/)

[3]
[https://en.wikipedia.org/wiki/The_Lord_of_the_Rings_Online](https://en.wikipedia.org/wiki/The_Lord_of_the_Rings_Online)

[4] [http://www.lotro.com/en](http://www.lotro.com/en)

[5] [http://abcnotation.com/](http://abcnotation.com/)

