
Making a Neural Synthesizer Instrument - jessejengel
https://magenta.tensorflow.org/nsynth-instrument
======
svantana
This is interesting, although the results are underwhelming IMO. I actually
had the same idea - finding the latent space of instrument sounds and using
that for synthesis - a couple of years back. After countless hours of research
I managed to turn it into a commercial software instrument called "GalaXynth"
[1]. For me at least, it turned out that the "automatic" latent space (i.e.
discovered by autoencoding) isn't that interesting musically, so I turned to
hand-designing the latent space instead - a gargantuan task that I'm not sure
I would take on again knowing how hard it would be. Anyway, if
anyone is interested in this type of thing you should get in touch!
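
(For anyone wondering what "using the latent space for synthesis" means
concretely: you encode sounds to compact codes, interpolate the codes, and
decode. Here's a toy numpy sketch - the decoder is just a fixed random
network, a stand-in for GalaXynth's or NSynth's actual models - showing why a
latent morph is not the same thing as a plain audio crossfade:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy decoder: a fixed random two-layer net mapping a 4-D latent code to a
# 64-sample waveform. A stand-in for a trained decoder, not a real model.
W1, W2 = rng.normal(size=(4, 32)), rng.normal(size=(32, 64))
decode = lambda z: np.tanh(z @ W1) @ W2

z_a, z_b = rng.normal(size=4), rng.normal(size=4)  # codes for two "instruments"

latent_morph = decode(0.5 * (z_a + z_b))       # interpolate codes, then decode
audio_mix = 0.5 * (decode(z_a) + decode(z_b))  # plain crossfade of the outputs

# Because the decoder is nonlinear, the two differ: the latent morph can be a
# genuinely new timbre rather than two sounds layered on top of each other.
difference = np.max(np.abs(latent_morph - audio_mix))
```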

[1]
[https://heartofnoise.com/products/galaxynth/](https://heartofnoise.com/products/galaxynth/)

~~~
tchaffee
No Linux version? :-(

~~~
svantana
It's on my todo list! But as I understand it from other devs, the market for
commercial audio software on Linux is pretty small.

------
dharma1
I mean, it's a cool project, but I'm not sure the latent space between
statistically learned sound representations is that interesting for music...
For speech synthesis I can see the applications.

For musical sound generation I think there is still room for new discovery
with physically modeling sounds, like these guys are doing -
[http://www.ness.music.ed.ac.uk/](http://www.ness.music.ed.ac.uk/). You can
morph between physically modeled instrument sounds too, and it's all realtime.
Complex physical models tend to have too many parameters for human performers
to control so maybe neural networks could be used there to learn to control
the params and their interaction in a meaningful way for musical sound
production.

Also, WaveNet is computationally intensive: training and reconstructing
sounds sample by sample, even at 16kHz/8-bit, is pretty heavy going - let
alone 44.1kHz/16-bit. So to make it realtime, this implementation is
basically interpolating in a grid of pre-rendered wavetables to morph
between instruments, right?
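
To illustrate what that could look like (a guess at the mechanism, not the
actual NSynth implementation): blend two pre-rendered single-cycle tables and
play the result with a phase accumulator.

```python
import numpy as np

def morph_oscillator(table_a, table_b, mix, freq, duration, sr=44100):
    """Play a single-cycle wavetable that is a linear blend of two
    pre-rendered tables; mix=0 gives table_a, mix=1 gives table_b."""
    table = (1.0 - mix) * table_a + mix * table_b  # blend the tables once
    n = int(duration * sr)
    # Phase accumulator: fractional read position into the blended table.
    phase = (np.arange(n) * freq * len(table) / sr) % len(table)
    idx = phase.astype(int)
    frac = phase - idx
    nxt = (idx + 1) % len(table)
    # Linear interpolation between adjacent table samples.
    return (1.0 - frac) * table[idx] + frac * table[nxt]

# Two illustrative stand-in tables (not actual NSynth decoder output):
size = 2048
sine = np.sin(np.linspace(0, 2 * np.pi, size, endpoint=False))  # pure tone
saw = 2 * (np.arange(size) / size) - 1                          # bright tone
halfway = morph_oscillator(sine, saw, 0.5, 440.0, 0.1)  # 100ms, halfway blend
```

In the real system the grid presumably holds decoder-rendered audio for each
latent point rather than hand-made tables, but the interpolation step would be
the cheap part either way.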

~~~
MrBuddyCasino
I think Yamaha made a physics-based synthesizer. Only a few were produced
because they were expensive and hard to master. Can't remember the name
though.

------
imbusy111
The idea is cool, but the execution is lacking. My expectation was that if
you turned the slider fully to one side or the other, you would get a
perfectly clear representation of that instrument. In reality, you get this
dirty synthetic sound no matter what, and the result of blending them
together is always "dirty" - they all sound similar in the end.

I understand that it is hard to encode the sound into these parameters and get
near perfect decoding, but maybe that's the next step?

~~~
pishpash
I think they just compressed too much due to computational constraints. They
say that themselves. However, there is always a question of rate control in
these autoencoder methods, and also the error function. In the original paper
they don't seem to use a very good perceptual error function.

------
smlacy
Biggest missing part of the article is the "Music" promised in the title.

In other words: I was hoping to hear a musical composition or performance
using samples generated by WaveNet.

I think a better, more realistic title for the article would be more like
"Generating Experimental Sounds Using WaveNets" or similar. Even the authors'
inclusion of animal sounds in the generation made this more of a novelty and
not really useful for actual musical sound generation.

Would be interesting to compare this approach with other "Less neural-net-y"
approaches, like combining samples in the frequency domain and mixing samples
over time.

~~~
sctb
We reverted the submitted title “Make music with WaveNets” to that of the
article.

~~~
smlacy
Wow, thanks

------
anigbrowl
'Neural audio synthesis' really misses the point. This is just the same
technique applied to emulating real instruments, which is already a massively
well solved problem, and unlikely to result in any significant improvement.

Music synthesis, on the other hand, is an unsolved hard problem. For example,
producing harmonic accompaniment to a hummed melody, or rendering a given
chord sequence in different musical styles, or making a drummer that can pick
up where someone's beatboxing left off... any of these have the potential to
reach a huge audience.

Synthesizing real instruments basically says 'we know nothing about this space
and are going to ignore all the work that's already been done in it.'

~~~
jay-anderson
> This is just the same technique applied to emulating real instruments, which
> is already a massively well solved problem

I disagree. Most instrument synthesis techniques that I've heard don't come
close to real physical instruments. I say let them try applying these
techniques to an old problem and see if they can get any improvements.

Automated musical expression and improvisation are also active areas of
research. (One interesting example:
[https://www.youtube.com/watch?v=qy02lwvGv3U](https://www.youtube.com/watch?v=qy02lwvGv3U)
\- though I wonder how much it is listening and responding vs how much is pre-
programmed.)

~~~
anigbrowl
Inability to get a particular sound you want in a well-equipped audio
production studio is not a problem in 2017.

~~~
jay-anderson
I have never heard emulation of a physical instrument that matched the real
thing. For instance movie studios hire musicians to fill in full orchestras
for that reason. If they could get the same result without the musician I'm
sure they would.

------
Mizza
Seems like a neat-ish idea, but the resulting samples are very meh. Not high
fidelity sound.

If you really want to see the interesting stuff going on with software
synthesizers, I'd strongly suggest checking out Steve Duda's Serum[1]
wavetable synthesizer, or Sonic Charge's Synplant[2], a "genetic" synthesizer.

Steve Duda's elite master class is also super fascinating, if you've got the
time! [3]

1:
[https://www.xferrecords.com/products/serum](https://www.xferrecords.com/products/serum)

2: [https://soniccharge.com/synplant](https://soniccharge.com/synplant)

3:
[https://www.youtube.com/watch?v=MOUkI5hH2HY](https://www.youtube.com/watch?v=MOUkI5hH2HY)

------
disantlor
this is awesome. this is the kind of thing that spawns entire genres of music
that didn't exist before. a lot of the demos sound terrible but as the
modulation and drum examples allude to, it's going to be super weird and
interesting when more disparate sources are interpolated. I can't wait to
figure out how to "break" this.

~~~
anigbrowl
If you think hard enough you can probably come up with 10 music tech projects
that had similar awesome promise and never delivered on it. Really, it's not
like we're short of ways to make new weird timbres or timbres that are oddly
redolent of others..but if weird is all you need, you can just buy a modular
and be half-way to outer space already. As you know, there's typically a
vast waste of sonically uninteresting space in between the sweet spots - one
reason I've become suspicious of synths whose primary claim is the broadness
of the sound palette, because that promises endless tweaking for ultimately
unsatisfying results.

~~~
disantlor
i think what is appealing to me is the idea of playing unexpected combinations
of sounds off one another. and it's the drum example in particular that really
caught me.

you can certainly make weird sounds with existing synths, but interpolating
rhythmic sound with a harmonic sound is different to me in that the resulting
thing is more rooted in a musical context and can work with other non-neural
elements more easily.

for example, once you get some sort of intuition for how sounds might meld,
you could compose a "beat" made up of samples (maybe drum sounds, maybe not)
in the "left" side that is tailored to interact in certain ways against the
"right" (i'm referencing the UI in the Ableton video).

people might trade their "seed" sounds, or they might keep them close to the
vest!

probably you could use max msp to do stuff like this already, but i'm
imagining the "left" sound itself being thought of as an intuitive signal
processing algorithm.

it's like second order sampling. you can find pieces of audio and, rather than
use them directly, as today, you can create a third sound that probably can't
be deconstructed back to the original.

might not birth a top-level genre like sampling did with hip hop, but i think
once someone puts it together the right way, and once processing power allows
them to go beyond some of the limitations described, it will really open some
new avenues

~~~
anigbrowl
But you can do this already using tools like Tassman or one of the many
spectral convolution/resynthesis tools. And if you have sufficient money to
throw at analog, or sufficient computing power to run a large digital modular
like Reaktor (or any number of others), you can do so many wild things
with bandpass filters and envelope followers as modulation sources. I have a
Nord Modular sitting next to me and realistically that and a sampler offer
more timbral possibilities than can be explored in a single human lifetime.

I don't want to just piss on this, of course any new technology is
interesting. But synthesizing novel timbres is just not a big deal in 2017.
'Just imagine what's possible' is still a great marketing line, but anyone
waiting on some new technology to make sounds that nobody has ever heard
before is suffering from a failure of imagination rather than a limitation of
technology.

Think of it this way: I could probably hop over to my local biohacking lab and
find some way to map audio data onto DNA, modify it, and read it back out
again using CRISPR. It would definitely be possible to encode audio
information in DNA form. You know perfectly well it won't automatically give
you more 'organic' or 'natural' sound despite the novel fact of doing the
computation on a biological substrate, and you also know perfectly well that
it would be marketed that way, just like almost every other synth is marketed
on the basis of its wild creative possibilities.

It's like showing off your new graphic manipulation software with a picture
resembling the Mona Lisa. You're selling some basic tools, but people are
buying into the idea that having the tools will endow them with increased
artistic ability. In reality everyone likes the new tool or filter you've come
up with, it spreads rapidly to the point of over-familiarity, and then becomes
fairly standard in future toolkits after the novelty has worn off.

------
jay-anderson
Synthesized musical instruments never sound quite right (the best I've heard
are the vienna symphonic library:
[https://www.vsl.co.at/en](https://www.vsl.co.at/en)). While that doesn't
appear to be the goal of this specific work, some of the wavenet approaches
seem like they could be used towards that end. Even if this requires rendering
the audio for an instrument slower than real time it would be a nice
achievement if it can improve the quality. (Studio musician jobs I think are
safe for quite a while still.)

~~~
TheOtherHobbes
Most instruments can make a wide range of different sounds, and players can
move smoothly between the different sounds by playing the instrument in
different ways.

This is really a kind of morphing. You can capture examples of each kind of
sound with sampling, but you can't capture the performance morphing. Even if
you could, there's no good way to perform the morphing with a typical synth
keyboard, which only allows for velocity and maybe aftertouch - possibly poly
AT for a handful of models.

So these huge sample sets have started using rule-based systems to try to add
the morphing, or at least to make sample choices, in a context-sensitive way.
This kind of works, up to a point, but it's not as good as the real thing.
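
For a sense of what those rule-based systems are approximating: the simplest
form of performance morphing is just an equal-power crossfade between two
sampled layers, driven by a controller. A minimal numpy sketch (the two
"layers" here are synthetic stand-ins, not real samples):

```python
import numpy as np

def layer_morph(sample_soft, sample_loud, control):
    """Equal-power crossfade between two sampled articulations, driven by a
    per-sample controller signal in [0, 1] (e.g. mod wheel or aftertouch)."""
    theta = control * np.pi / 2
    # cos^2 + sin^2 = 1, so total power stays constant through the morph.
    return np.cos(theta) * sample_soft + np.sin(theta) * sample_loud

n = 1000
phase = 2 * np.pi * 220 * np.arange(n) / 44100
soft = np.sin(phase)           # mellow layer (stand-in for a soft sample)
loud = np.sign(np.sin(phase))  # bright layer (stand-in for a loud sample)
sweep = np.linspace(0, 1, n)   # controller ramps from soft to loud
out = layer_morph(soft, loud, sweep)
```

The hard part - which rules should drive `control`, and the fact that a real
player changes the sound itself rather than mixing two static snapshots - is
exactly what this kind of crossfade can't capture.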

As a side effect, sampling has driven jobbing composers, especially in
Hollywood, towards an industry standard mechanical and repetitive orchestral
sound.

It sounds orchestra-like, but it's a narrow and compressed version of all the
colours an orchestra is capable of. If you compare it to the work of master
orchestrators - Ravel, Stravinsky, Puccini - it's not hard to hear just how
flat and colourless these scores are.

A good ML model of an orchestral instrument would be a very useful thing,
because it would make it possible to think about breaking out of the sampling
box. But there aren't enough people with enough of a background in both ML and
music to make this likely.

Sadly, I think it's more likely we'll get even more compressed and narrow
representations, with even more of the subtlety and expressive range removed.

~~~
pierrec
Modern virtual instruments are capable of much more than what standard
Hollywood soundtracks might make you think.

1) Performance morphing. We have moved from straightforward sampling to hybrid
sampled/synthesized approaches. It will never be as good as the real thing,
but it already allows for richer performances than what a boring player would
do. Here is an example of a virtual clarinet (Sample Modeling Clarinet). I
sequenced many variables separately to demonstrate: vibrato depth, vibrato
speed, legato and portamento speed, growl, pressure and accent on the attack.

[http://007ee821dfb24ea1133d-f5304285da51469c5fdbbb05c1bdfa60...](http://007ee821dfb24ea1133d-f5304285da51469c5fdbbb05c1bdfa60.r16.cf2.rackcdn.com/SWAM%20clarinet.mp3)

2) Extended techniques. Competition has encouraged virtual instrument
publishers to go for the unusual stuff, and fill whatever niche hasn't been
filled yet. For example I recently acquired a library specialized in extended
cello technique (Jeremiah Pena Mystic). I used it in the soundtrack of a no-
budget short film, here's an excerpt of the cello part:

[http://007ee821dfb24ea1133d-f5304285da51469c5fdbbb05c1bdfa60...](http://007ee821dfb24ea1133d-f5304285da51469c5fdbbb05c1bdfa60.r16.cf2.rackcdn.com/part2_mcell.mp3)

Anyways, I agree that Hollywood soundtracks have been converging to
standardized styles, and sampling may be to blame historically, but it is
hardly a limiting factor anymore. If anything, it should now encourage
creativity as it partly removes the fear of wasting massive resources when
your experimental score ends up sounding like crap at the recording session.

~~~
buzzybee
It's not purely the sampling that's doing it to Hollywood: In some fashion,
it's the heavily-derisked blockbuster formula to blame, and technology comes
along for the ride.

Tony Zhou's Every Frame A Painting [0] offered a take on how the tendency is
to work very closely to a temp track and then ask for something identical,
which of course can only get you increasingly similar sounds. Dan Golding
responded to this by adding some nuance, noting that temp tracks have always
been in use, so the answer has to be a little more complicated, and he points
back to the technology. [1] I would say that the technology is just a piece of
the puzzle; you can order in a different type of sound and get it, whether or
not you're using a computer-heavy approach. That's aptly demonstrated by the
variety seen in indie games, for example. This is a problem that movies have
made for themselves by being focused on fitting everything to a formula. The
occasional film does slip through with a great score that draws on something
bigger than other films (for one example: Scott Pilgrim vs the
World).

[0]
[https://www.youtube.com/watch?v=7vfqkvwW2fs](https://www.youtube.com/watch?v=7vfqkvwW2fs)
[1]
[https://www.youtube.com/watch?v=UcXsH88XlKM](https://www.youtube.com/watch?v=UcXsH88XlKM)

------
projectorlochsa
They should try to incorporate ideas for sound generation from samples that
were established long ago, patented, and then completely forgotten. The
patents have since expired.

[https://en.wikipedia.org/wiki/Karplus%E2%80%93Strong_string_...](https://en.wikipedia.org/wiki/Karplus%E2%80%93Strong_string_synthesis)

This algorithm works wonderfully for guitars (there are improvements to it
too). There was a synth 20 years ago, if not more, I believe, that mimicked
all of the wind instruments extremely well (it had a pipe you could blow
through and a keyboard to choose the sounds).
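
For reference, the Karplus-Strong loop is tiny: a noise burst recirculating
through a delay line, with a lowpass (a two-sample average) in the feedback
path. A minimal sketch in Python/numpy:

```python
import numpy as np

def karplus_strong(freq, duration, sample_rate=44100, decay=0.996):
    """Pluck a string: white noise fed through a delay line whose length sets
    the pitch, with averaging in the loop to damp the high partials."""
    n_samples = int(duration * sample_rate)
    delay_len = int(sample_rate / freq)        # delay length determines pitch
    buf = np.random.uniform(-1, 1, delay_len)  # initial excitation: noise
    out = np.empty(n_samples)
    for i in range(n_samples):
        j = i % delay_len
        out[i] = buf[j]
        # Averaging adjacent samples lowpasses the loop, so high partials die
        # off faster -- the characteristic plucked-string decay.
        buf[j] = decay * 0.5 * (buf[j] + buf[(j + 1) % delay_len])
    return out

pluck = karplus_strong(220.0, 1.0)  # one second of an A3 pluck
```

The whole "instrument" is a delay buffer and one multiply-add per sample,
which is why it ran fine on 1980s hardware.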

Unfortunately the whole area seems to be abandoned due to patents on
algorithms.

It's wonderful that WaveNets are producing something this good, although the
sound still needs improvement.

~~~
pierrec
The list of software synths using and expanding on this algorithm piles up
pretty high and is still rising. Perhaps you don't hear the name a lot because
most of these plugins don't explicitly say so in their descriptions and
marketing (although many do). The more common term is physical modeling, which
basically implies Karplus-Strong and more advanced delay lines, waveguides,
etc. as underlying algorithms. For example, Applied Acoustics Systems has been
researching and publishing this kind of software synth for maybe 20 years.
Native Instruments has also made a bunch of stuff clearly using Karplus-
Strong, including patches for their popular Reaktor synth. Heck, I've made
plugins using this algorithm myself.

I find pure physical modeling has been stagnating though; hybrid approaches
with sample-based synthesis seem more promising right now. This is what Sample
Modeling has been doing, and their results are impressive.

~~~
projectorlochsa
How does the performance of these algorithms hold up? I'm seeing all these old
reports of how computationally difficult it is, yet can't find any performance
reviews on new hardware. Not to mention FPGAs or whatever else.

------
anotheryou
If you take synth sounds as inputs, you can just mix the parameters without
any neural net and get the same result...

Sadly, it does not sound new at all.

------
artilect
Check out the artist TCF. He claims his songs are all algorithmically
generated using neural nets. Wicked sounds.

------
snissn
Very disappointed I can't blend cats and dogs

------
funkdubious
This is a little over my head. Could you please explain how to use it?

