> Because the WaveNet decoder is computationally expensive, we had to do some clever tricks to make this experience run in real-time on a laptop. Rather than generating sounds on demand, we curated a set of original sounds ahead of time. We then synthesized all of their interpolated z-representations. To smooth out the transitions, we additionally mix the audio in real-time from the nearest sound on the grid. This is a classic case of trading off computation and memory.
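For illustration, the precompute-then-crossfade scheme described in the quote might look roughly like this (the grid layout, shapes, and "decoded" sounds here are made up for the sketch, not NSynth's actual code):

```python
import numpy as np

GRID = 11            # grid resolution along one interpolation axis
SR, DUR = 16000, 1   # sample rate (Hz) and clip length (seconds)

# Offline step: pretend each grid point holds audio decoded from an
# interpolated z. Here we fake the decoded audio with sine tones.
grid_audio = [np.sin(2 * np.pi * (220 + 20 * i) *
                     np.arange(SR * DUR) / SR) for i in range(GRID)]

def play_position(x):
    """Real-time step: mix the two nearest pre-rendered grid sounds
    for a continuous position x in [0, 1]."""
    pos = x * (GRID - 1)
    lo = int(np.floor(pos))
    hi = min(lo + 1, GRID - 1)
    t = pos - lo   # crossfade weight toward the upper neighbour
    return (1 - t) * grid_audio[lo] + t * grid_audio[hi]

out = play_position(0.35)   # a sound 35% of the way along the axis
```

All the expensive decoding happens offline; playback is just a weighted sum of two cached buffers, which is the memory-for-computation trade the quote describes.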
The synth performer in me is longing for a traditional ADSR-type interface, so that I can emphasize the attack and decay of a bowed-string instrument, sustain like a reed, and release like a horn.
Perhaps that's achievable with Galaxynth?
For musical sound generation I think there is still room for new discovery in physically modeled sound, like the work these guys are doing: http://www.ness.music.ed.ac.uk/. You can morph between physically modeled instrument sounds too, and it's all real-time. Complex physical models tend to have too many parameters for a human performer to control, so maybe neural networks could be used there to learn to control the parameters and their interactions in a way that's meaningful for musical sound production.
Also, since WaveNet is so computationally intensive, training and reconstructing sounds sample by sample at 16 kHz and 8 bits turns out to be pretty heavy going, let alone 44.1 kHz/16-bit. So to make it real-time, this implementation is basically interpolating in a grid of pre-rendered wavetables to morph between instruments, right?
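For reference, the 8-bit representation WaveNet predicts over isn't linear PCM but μ-law companded audio (256 levels), which keeps the output softmax tractable at the cost of some fidelity. A sketch of that quantization:

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """Compand a signal in [-1, 1], then quantize to mu+1 = 256 levels."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int32)

def mu_law_decode(q, mu=255):
    """Invert the companding: map 0..255 back to a value in [-1, 1]."""
    y = 2 * (q.astype(np.float64) / mu) - 1
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu
```

Round-tripping through these 256 levels gives much finer resolution near zero than near full scale, which matches how we hear, but it is one reason the raw 16 kHz/8-bit output sounds noticeably lo-fi.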
I understand that it is hard to encode the sound into these parameters and get near perfect decoding, but maybe that's the next step?
In other words: I was hoping to hear a musical composition or performance using samples generated by WaveNet.
I think a better, more realistic title for the article would be something like "Generating Experimental Sounds Using WaveNets". Even the authors' inclusion of animal sounds in the generation made this more of a novelty and not really useful for actual musical sound generation.
Would be interesting to compare this approach with less "neural-net-y" approaches, like combining samples in the frequency domain and mixing samples over time.
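A minimal sketch of the frequency-domain approach mentioned here, blending the magnitude spectra of two sounds (single frame only; a real version would work frame by frame with an STFT and overlap-add):

```python
import numpy as np

def spectral_mix(a, b, alpha=0.5):
    """Crossfade the magnitude spectra of two same-length buffers and
    resynthesize using the first sound's phase."""
    A, B = np.fft.rfft(a), np.fft.rfft(b)
    mag = (1 - alpha) * np.abs(A) + alpha * np.abs(B)  # blend magnitudes
    phase = np.angle(A)                                # borrow phase from a
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(a))

sr = 16000
t = np.arange(2048) / sr
tone_a = np.sin(2 * np.pi * 440 * t)
tone_b = np.sin(2 * np.pi * 660 * t)
halfway = spectral_mix(tone_a, tone_b, 0.5)  # both partials, half strength
```

Unlike the z-space interpolation in the article, this produces a literal spectral average, i.e. both sources sounding at once rather than a new in-between timbre; that difference is arguably the interesting part of the comparison.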
Music synthesis, on the other hand, is a hard, unsolved problem. For example: producing harmonic accompaniment to a hummed melody, rendering a given chord sequence in different musical styles, or making a drummer that can pick up where someone's beatboxing left off. Any of these have the potential to reach a huge audience.
Synthesizing real instruments basically says "we know nothing about this space and are going to ignore all the work that's already been done in it."
I disagree. Most instrument synthesis techniques that I've heard don't yet approach real physical instruments. I say let them try applying these techniques to an old problem and see if they can get any improvements.
Automated musical expression and improvisation are also active areas of research. (One interesting example: https://www.youtube.com/watch?v=qy02lwvGv3U - though I wonder how much it is listening and responding vs how much is pre-programmed.)
Shameless plug for my RNN trained piano music generator which can do exactly that:
It can also generate novel chord sequences and compose to those.
If you really want to see the interesting stuff going on with software synthesizers, I'd strongly suggest checking out Steve Duda's Serum wavetable synthesizer, or Sonic Charge's Synplant, a "genetic" synthesizer.
Steve Duda's elite master class is also super fascinating, if you've got the time! 
you can certainly make weird sounds with existing synths, but interpolating rhythmic sound with a harmonic sound is different to me in that the resulting thing is more rooted in a musical context and can work with other non-neural elements more easily.
for example, once you get some sort of intuition for how sounds might meld, you could compose a "beat" made up of samples (maybe drum sounds, maybe not) in the "left" side that is tailored to interact in certain ways against the "right" (i'm referencing the UI in the Ableton video).
people might trade their "seed" sounds, or they might keep them close to the vest!
probably you could use Max/MSP to do stuff like this already, but i'm imagining the "left" sound itself being thought of as an intuitive signal-processing algorithm.
it's like second-order sampling. you can find pieces of audio and, rather than using them directly as you would today, create a third sound that probably can't be deconstructed back to the originals.
might not birth a top-level genre the way sampling did for hip-hop, but i think once someone puts it together the right way, and once processing power allows them to go beyond some of the limitations described, it will really open some new avenues
I don't want to just piss on this, of course any new technology is interesting. But synthesizing novel timbres is just not a big deal in 2017. 'Just imagine what's possible' is still a great marketing line, but anyone waiting on some new technology to make sounds that nobody has ever heard before is suffering from a failure of imagination rather than a limitation of technology.
Think of it this way: I could probably hop over to my local biohacking lab and find some way to map audio data onto DNA, modify it, and read it back out again using CRISPR. It would definitely be possible to encode audio information in DNA form. You know perfectly well it won't automatically give you more "organic" or "natural" sound despite the novel fact of doing the computation on a biological substrate, and you also know perfectly well that it would be marketed that way, just like almost every other synth is marketed on the basis of its wild creative possibilities.
It's like showing off your new graphic manipulation software with a picture resembling the Mona Lisa. You're selling some basic tools, but people are buying into the idea that having the tools will endow them with increased artistic ability. In reality everyone likes the new tool or filter you've come up with, it spreads rapidly to the point of over-familiarity, and then becomes fairly standard in future toolkits after the novelty has worn off.
This is really a kind of morphing. You can capture examples of each kind of sound with sampling, but you can't capture the performance morphing. Even if you could, there's no good way to perform the morphing with a typical synth keyboard, which only allows for velocity and maybe aftertouch - possibly poly AT for a handful of models.
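Even binding a morph to the one continuous control most keyboards do send (channel aftertouch) only buys you a single axis. A toy sketch of what that binding could look like, with made-up buffers standing in for the two timbres:

```python
import numpy as np

def morph(a, b, aftertouch):
    """Equal-power crossfade between two same-length timbre buffers,
    driven by MIDI channel aftertouch (0-127)."""
    t = aftertouch / 127.0
    ga = np.cos(t * np.pi / 2)  # gain for sound a
    gb = np.sin(t * np.pi / 2)  # gain for sound b; ga**2 + gb**2 == 1
    return ga * a + gb * b
```

No pressure gives only the first timbre, full pressure only the second, and the equal-power curve keeps perceived loudness roughly constant in between. It's a crude stand-in for real performance morphing, which is exactly the point being made here.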
So these huge sample sets have started using rule-based systems to try to add the morphing, or at least to make sample choices, in a context-sensitive way. This kind of works, up to a point, but it's not as good as the real thing.
As a side effect, sampling has driven jobbing composers, especially in Hollywood, towards an industry standard mechanical and repetitive orchestral sound.
It sounds orchestra-like, but it's a narrow and compressed version of all the colours an orchestra is capable of. If you compare it to the work of master orchestrators - Ravel, Stravinsky, Puccini - it's not hard to hear just how flat and colourless these scores are.
A good ML model of an orchestral instrument would be a very useful thing, because it would make it possible to think about breaking out of the sampling box. But there aren't enough people with enough of a background in both ML and music to make this likely.
Sadly, I think it's more likely we'll get even more compressed and narrow representations, with even more of the subtlety and expressive range removed.
1) Performance morphing. We have moved from straightforward sampling to hybrid sampled/synthesized approaches. It will never be as good as the real thing, but it already allows for richer performances than what a boring player would do. Here is an example of a virtual clarinet (Sample Modeling Clarinet). I sequenced many variables separately to demonstrate: vibrato depth, vibrato speed, legato and portamento speed, growl, pressure and accent on the attack.
2) Extended techniques. Competition has encouraged virtual instrument publishers to go for the unusual stuff, and fill whatever niche hasn't been filled yet. For example I recently acquired a library specialized in extended cello technique (Jeremiah Pena Mystic). I used it in the soundtrack of a no-budget short film, here's an excerpt of the cello part:
Anyways, I agree that Hollywood soundtracks have been converging to standardized styles, and sampling may be to blame historically, but it is hardly a limiting factor anymore. If anything, it should now encourage creativity as it partly removes the fear of wasting massive resources when your experimental score ends up sounding like crap at the recording session.
Tony Zhou's Every Frame A Painting offered a take on how the tendency is to work very closely to a temp track and then ask for something identical, which of course can only get you increasingly similar sounds. Dan Golding responded to this by adding some nuance, noting that temp tracks have always been in use, so the answer has to be a little more complicated, and he points back to the technology. I would say that the technology is just a piece of the puzzle; you can order in a different type of sound and get it, whether or not you're using a computer-heavy approach. That's aptly demonstrated by the variety seen in indie games, for example. This is a problem that movies have made for themselves by being focused on fitting everything to a formula. The occasional film does slip through with a great score that draws on something bigger than other films (for one example: Scott Pilgrim vs the World).
This algorithm works wonderfully for guitars (and there are improvements to it too). There was a synth two decades ago, if not more, I believe, that mimicked all of the wind instruments extremely well (it had a pipe through which you could blow and a keyboard to choose the sounds).
Unfortunately the whole area seems to be abandoned due to patents on algorithms.
It's wonderful that WaveNets are producing something this good, although the sound still needs improvement.
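The guitar algorithm alluded to above is presumably Karplus-Strong plucked-string synthesis (the "improvements" being the extended Karplus-Strong and digital waveguide family). A minimal sketch:

```python
import numpy as np

def karplus_strong(freq, dur, sr=16000, decay=0.996):
    """Plucked-string tone: a noise burst circulating in a delay line
    with a damping low-pass filter in the feedback loop."""
    n = int(sr / freq)                     # delay-line length sets the pitch
    rng = np.random.default_rng(0)
    buf = rng.uniform(-1, 1, n)            # burst of noise = the "pluck"
    out = np.empty(int(sr * dur))
    for i in range(len(out)):
        out[i] = buf[i % n]
        # two-point average low-passes the loop, damping high partials
        # faster than low ones, which is what makes it sound string-like
        buf[i % n] = decay * 0.5 * (buf[i % n] + buf[(i + 1) % n])
    return out

note = karplus_strong(220.0, 1.0)          # one second of a plucked A3-ish tone
```

The entire "model" is a few lines and runs far faster than real time, which is the contrast being drawn with sample-by-sample neural decoding.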
I find pure physical modeling has been stagnating, though; hybrid approaches with sample-based synthesis seem more promising right now. This is what Sample Modeling has been doing, and their results are impressive.
The wind synth was Yamaha VL1 - quite expensive at the time but very expressive. Not sure what happened to the patents on that, but there have been other physically modeled synths since then.
Stefan Bilbao's group at University of Edinburgh is carrying the flag on next-level physical modeling, though I think the funding for the project has come to an end unfortunately.
There was cool work on physically modeled instruments done in Finland by Vesa Välimäki et al, not sure what the state of that is now.
Yes, it was VL1.
Sadly does not sound new at all.