
What makes Bach sound like Bach? New dataset teaches algorithms classical music - huan9huan
http://www.washington.edu/news/2016/11/30/what-makes-bach-sound-like-bach-new-dataset-teaches-algorithms-classical-music/
======
savanaly
The question of what makes Bach sound like Bach is, needless to say, not
addressed.

The actual thing they're reporting is:

'“You need to be able to say from 3 seconds and 50 milliseconds to 78
milliseconds, this instrument is playing an A. But that’s impractical or
impossible for even an expert musician to track with that degree of accuracy.”

The UW research team overcame that challenge by applying a technique called
dynamic time warping — which aligns similar content happening at different
speeds — to classical music performances. This allowed them to synch a real
performance, such as Beethoven’s ‘Serioso’ string quartet, to a synthesized
version of the same piece that already contained the desired musical notations
and scoring in digital form.

Time warping and mapping that digital scoring back onto the original
performance yields the precise timing and details of individual notes that
make it easier for machine learning algorithms to learn from musical data.'
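
To make the quoted description concrete, here is a minimal sketch of dynamic time warping followed by label transfer. It is not the UW team's pipeline: it assumes each recording has already been reduced to one number per frame (a toy stand-in for real audio features), and the names `dtw_path` and `transfer_labels` are made up for illustration.

```python
def dtw_path(a, b):
    """Align numeric sequences a and b; return (total cost, list of (i, j) pairs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    # Walk back from the end to recover which frames were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def transfer_labels(path, labels, target_len):
    """Copy per-frame labels from the labeled sequence onto the aligned one."""
    out = [None] * target_len
    for i, j in path:
        out[j] = labels[i]
    return out


# Toy data: the "synthesized" version has known labels, the "performance"
# lingers on the second note for an extra frame.
synth = [1, 2, 3]
synth_labels = ["C", "D", "E"]
performance = [1, 2, 2, 3]

total, path = dtw_path(synth, performance)
print(transfer_labels(path, synth_labels, len(performance)))
```

The point of the toy example is that the performance can run slower or faster than the synthesized version, and the labels still land on the right frames.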

It also mentions that they attempted to apply existing deep learning
algorithms designed for speech recognition to their new dataset, hoping to be
able to accomplish a task such as predict a single missing note from a long
string of notes. It does not say whether this worked.

~~~
jthickstun
Hi savanaly,

When we talk about "what makes Bach sound like Bach," the technical concept we
have in mind is the recent work in computer vision on style transfer. For
example,

[https://arxiv.org/abs/1508.06576](https://arxiv.org/abs/1508.06576)

We are excited to work on adapting these models to the musical domain!

As for note prediction, you can see our results in our paper:

[https://arxiv.org/abs/1611.09827](https://arxiv.org/abs/1611.09827)

Our results are for simple (2-layer, not very "deep") models; we were
interested in understanding the low-level "features" of music rather than
building a model that maximizes performance. Nevertheless, the results are
quite promising; I'm confident that someone using our dataset with a deep
network and a lot of gpus could blow our numbers out of the water! :)

Tutorials on how to set up and evaluate this task are available on our
website:

[http://homes.cs.washington.edu/~thickstn/start.html](http://homes.cs.washington.edu/~thickstn/start.html)

~~~
dharma1
I'm an (ex-)musician playing with machine learning, this is very interesting,
will check it out! Kudos for curating the dataset. So your goal initially is
to basically build polyphonic transcription with CNNs?

I am starting to record my own dataset for solo jazz piano - all midi though.
Monophonic melodies, and matching chord voicings and voice leading from one
chord to the next. With the goal of learning to generate a good sounding jazz
piano arrangement to a given melody with nothing except monophonic input.

Style transfer is good at essentially texture transfer - I suspect it won't
work that well for understanding music theory (or text), especially with long
time series dependencies, but will be very curious to see what emerges.

I'd like to hear more generative music samples from DeepMind's WaveNet too,
the piano samples they published sounded very good, but it was unclear what
the model had learned or generalised - and how much was semi-randomised
recall. I haven't seen the open source implementations of WaveNet produce as
good results yet - probably because it's computationally very expensive to
train and run, and that limits experimentation. I saw Aäron give a talk on it
a couple of weeks ago which helped me understand the stacked dilated
convolutions - but would still like to hear more music examples :)

~~~
jthickstun
Yes, we're starting with the transcription task. CNNs for local prediction are
interesting, and we're also curious about capturing the temporal structure of
music with something recurrent. It seems like a time series model that
understands something about western music should help with music transcription
just like language models help with speech transcription.

The style transfer stuff comes later and as you observe, we'll probably need
some new ideas to make that work well. I haven't thought about this deeply
yet, but my intuition is that maybe instrumental timbre is an audio analog of
visual texture, so maybe a reasonably direct "port" of style-transfer to the
audio domain would let us construct demos that, for example, rewrite a cello
recording to sound like trombone.

Let us know when your dataset is complete! I love jazz.

------
ktRolster
Albert Schweitzer pointed out (in
[https://www.amazon.com/dp/0486216314](https://www.amazon.com/dp/0486216314) )
that in many cases it's hard to understand Bach's music without understanding
the lyrics. The mood of the music will change from cheerful to somber (or
whatever) seemingly randomly, but if you understand the lyrics, it's not
random.

~~~
gumby
Schweitzer may have been a thoughtful guy but I think this case is a reach;
after all the vast majority of Bach's work had no libretto at all (including
for example all the organ fugues, and the Goldberg variations and Brandenburg
concertos).

However a counterexample from my own life: I only learned German starting in
my late 20s and when, 10 years later I heard the Matthäus-Passion and could
understand the lyrics I wept...and I don't even know much about christianity.

~~~
Bud
Professional Bach singer here: member of American Bach Soloists, Philharmonia
Baroque Orchestra, Bach Collegium San Diego, Carmel Bach Festival.

It's actually not true that the "vast majority" of Bach's work had no texts.
Bach wrote over 200 cantatas (with around 5-10 movements and separate texts
each) plus an assortment of masses and 4 extremely large choral/orchestral
works: the St. Matthew and St. John Passions, the Christmas Oratorio, and the
B Minor Mass.

Looking at the catalog of all of Bach's works, which I have here, BWV (Bach
Werke-Verzeichnis) numbers 1 thru 1071, you get all the way to BWV 525 before
you even get out of the vocal works. Numbers 1 thru 524 are all cantatas,
masses, oratorios, lieder, the many chorales of course, secular and comic
cantatas, etc. And of course many/most of these are far larger than the
individual organ works.

(Bach actually wrote a lot more cantatas than this; but only around 224 of
them survived. Another hundred or so were lost.)

~~~
gumby
Thanks for correcting my hyperbole, which I suppose is due to my bias for his
keyboard works which I like to play (at home -- I doubt anyone could suffer
through my playing). I do like the masses and passions though, so you are
spurring me to listen to more choral work!

~~~
Bud
Totally check out the cantatas; many of us feel that the cantatas are the real
heart of Bach's work. Bach was fundamentally a church musician.

~~~
sumpmonster
If you have an iPhone or iPad I would recommend the Bach cantatas app to learn
about Bach's cantatas:

[http://www.cantatasapp.com](http://www.cantatasapp.com) and
[https://appsto.re/de/HecH5.i](https://appsto.re/de/HecH5.i)

------
haberman
This is really interesting, but it confused me at first. Given the title and
the problem posed in the article's intro, I figured this would be a dataset of
sheet music, ie. the notes and durations specified in some printed music.
However, reading more it appears to be focused on recordings (ie. audio) and
annotating those recordings with information about where each note starts and
stops.
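
As a rough illustration of what "where each note starts and stops" could look like as data, here is a sketch. The `NoteLabel` fields and both helper functions are hypothetical, not the dataset's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteLabel:
    start: float       # onset time in seconds
    end: float         # offset time in seconds
    instrument: str
    pitch: int         # MIDI note number (69 = A above middle C)


def active_notes(labels, t):
    """Labels sounding at time t (start inclusive, end exclusive)."""
    return [n for n in labels if n.start <= t < n.end]


def frame_targets(labels, duration, hop):
    """Set of active pitches sampled every `hop` seconds: one target per frame."""
    n_frames = int(duration / hop)
    return [{n.pitch for n in active_notes(labels, k * hop)}
            for k in range(n_frames)]


labels = [
    NoteLabel(0.0, 0.5, "violin", 69),
    NoteLabel(0.25, 1.0, "cello", 57),
]
print(frame_targets(labels, duration=1.0, hop=0.25))
```

Per-frame pitch sets like these are the kind of target a transcription model would be trained to predict from the audio.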

So to me this seems more directly applicable to transcription (ie. taking
audio and turning it into sheet music) or synthesis (taking sheet music and
turning it into audio of a human-sounding performance) than it does to
composition or finishing unfinished works by famous composers. The output of
the compositional process is generally sheet music, not audio, so it seems to
make more sense that problems around composition would be trained and learned
in the sheet music domain.

I'm not a machine learning researcher though! This is just my impression as a
musician.

~~~
jthickstun
Hi haberman,

I'm one of the authors on this paper. You're right that the most direct
applications of this dataset are transcription and synthesis. One of the cool
aspects of end-to-end learning models is that they discover a "representation"
of data that can be useful when applied to other tasks. We speculate about
some tasks like recommendation and composition on our website:

[http://homes.cs.washington.edu/~thickstn/musicnet.html](http://homes.cs.washington.edu/~thickstn/musicnet.html)

We're also interested in music like jazz and pop, for which good scores are
often unavailable. Classical music is nice for training models because we can
use sheet music as labels to learn a representation. Many aspects of this
representation, such as rhythm and harmony, may transfer to other musical
genres. Learning about classical recordings could bootstrap learning for other
kinds of musical audio.

So while you're right that it's probably easier to learn a model to complete
Bach using symbolic sheet music, we feel that addressing complex tasks
directly from raw audio is worthwhile!

------
haberman
This news reminded me of this gem, from back in the day (I think 1996-ish):

[http://www.markheadrick.com/midi/absmfaq.txt](http://www.markheadrick.com/midi/absmfaq.txt)

In section 1.4 they very emphatically state that "with current technology, IT
CAN'T BE DONE."

They conclude: "Think of it this way: If you don't mind spending more than the
US national debt on computer equipment and waiting a few years for the job to
complete, you can have a system that MIGHT accurately convert the digital
waveform data of a 5 minute song into a small, compact MIDI file.

Otherwise, you can blow a couple of thousand dollars hiring a professional
band of studio musicians and engineers who can probably give you what you want
in about one day."

It is humorous for its emphatic-ness, but also educational for being a picture
into how we've historically thought about this problem.

------
pierrec
This announces a new dataset where recorded performances are precisely
synchronized to MIDI transcriptions. Obviously the article doesn't seem to get
the implications quite right (it's very useful for performance-related
research, not so much for AI composition).

As a composer, the coolest potential I see here is training a model to create
realistic mockups from MIDI compositions. For that purpose, though, it would
be better to start with a fully monophonic/solo-instrument dataset, which
would simplify the learning. Also, MIDI data is not entirely sufficient:
annotations on dynamics and playing technique would be necessary to make a
good mockup tool, since this is the kind of information one might even give to
human performers.

Anyways, it would be tough for such a tool to catch up with current state-of-
the-art, sample-based mockup tools, which are already baffling in their
realism, although they usually require a lot of work to get good results. But
one can always dream of a "Stokowski" or "Karajan" neural network that
interprets your MIDI composition with emotion and sensibility!

------
mrcactu5
I ran into a few issues trying to study classical music with a computer. The
first is merely getting some representation of the musical score into a file.
This was accomplished by MIDI, but I am hoping for a more standard way that
looks more like the notes of a score.

Another problem is that once you have the music there's a tremendous amount of
"interpretation" that a musician does. The notes may each read 1/8 but a
musician might add or subtract 1/64 as he/she feels is good.

Other times the change is more mathematical: 1/8+1/8+1/8 might actually have
to be read as 1/12+1/12+1/12 = 1/4, but that is much easier to fit into a
computer.
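
The arithmetic above (three written eighths squeezed into one quarter, i.e. a triplet) can be sketched with exact fractions. The helper name `scale_tuplet` is made up for illustration.

```python
from fractions import Fraction


def scale_tuplet(written, span):
    """Rescale written durations so that together they fill exactly `span`."""
    total = sum(written)
    return [d * span / total for d in written]


written = [Fraction(1, 8)] * 3                  # three notes notated as eighths
played = scale_tuplet(written, Fraction(1, 4))  # squeezed into one quarter note
print(played, sum(played))

# The "add or subtract 1/64" kind of interpretation is just as easy to write
# down, but choosing when to apply it is the hard part.
stretched = Fraction(1, 8) + Fraction(1, 64)
print(stretched)
```

Using `Fraction` instead of floats keeps durations like 1/12 exact, so the three notes sum to exactly 1/4.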

I have said nothing of dynamics (loud/quiet), articulation (staccato,
slurring etc).

Scores are available on IMSLP and other sources, but are computer files
available as well?

~~~
ktRolster
You might look at GNU LilyPond, which might be the type of representation you
are looking for, since it can be made to look like the notes of a score.

Here are some collections of music in that format:

[http://www.mutopiaproject.org/](http://www.mutopiaproject.org/)

[https://github.com/trending/lilypond](https://github.com/trending/lilypond)

------
gattilorenz
It is interesting to note that in 1990 there was an expert system composed of
a myriad of handmade rules that could produce Bach-like harmonizations.

[http://www.global-supercomputing.com/people/kemal.ebcioglu/p...](http://www.global-supercomputing.com/people/kemal.ebcioglu/pdf/Ebcioglu-JLP90.pdf)

Unfortunately I can't seem to find the samples now, but to my (untrained) ear
they sounded as Bach as the real thing.

------
Gaussian
Professor David Cope of UCSC has done extensive work in this space, starting
with his EMI algorithm. His algos + DBs have created some incredible music
in the Bach style.

------
lalos
Relevant project to train a model that generates Bach music

[http://bachbot.com/](http://bachbot.com/)

