Open Source Neural Network Synthesizer (withgoogle.com)
95 points by burningion 6 months ago | 23 comments



I'd be more excited if this weren't so tame. The Nord Modular had genetic algorithm patch mutation well over a decade ago, details starting on page 99.

http://www.nordkeyboards.com/sites/default/files/files/downl...

The Hartmann Neuron took a similar approach with neural networks in 2003: https://www.soundonsound.com/reviews/hartmann-neuron

I mean, well done and everything, it's a good project, but "Synthesize Brand New Sounds In Ways Never Before Possible!!!" is a pitch that synth users hear year after year (pun intended). It turns out that musicians don't like black-box patching all that much, but prefer morphing things in parameter space because, being musicians, they want to interact with their instruments, whether that's timbrally, melodically, or harmonically.

Electronic musicians in particular don't need More Sounds or even More Oscillators and More Filters and More FX - sure, those are interesting, but honestly people are already spoiled for choice. What people like most is an instrument whose timbral range may be limited but which has a strong center - secondary characteristics remain largely consistent as primary variables are manipulated, so oscillators don't thin out at higher or lower ranges, filter Q (negative feedback gain) isn't damped so aggressively that it changes the gain structure, and so forth. The nicest thing an electronic musician can say about an instrument is not 'it can make so many sounds' but 'you can't get a bad sound out of it.'


I agree with most things you say, but "you can't get a bad sound out of it" is pretty subjective, especially when it comes to synthesis. So how would one define the training set of "good sounds" for supervised training?

One thing I thought would be pretty cool - I've got a friend who did a PhD on physical modeling of sound in Edinburgh - http://www.ness.music.ed.ac.uk/. Often those physical models have hundreds of parameters and it's quite difficult to tune them to make meaningful/good sounds. Perhaps a neural network would be useful for tuning the parameters - you could use a real musician playing a real instrument with sensors, and use the generated sensor data and audio for your training set. And then the neural network could learn to match the physical model parameters to that.
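
One rough way that could be posed (just a sketch, not what the NESS project actually does; the feature size, parameter count, and data below are hypothetical placeholders): render the physical model for lots of random parameter settings, train a small network to map audio features back to parameters, then feed it features of the real musician's recordings.

    # Sketch only: learn an inverse mapping from audio features to
    # physical-model parameters. All names/sizes/data are placeholders.
    import torch
    import torch.nn as nn

    N_FEATURES = 128   # e.g. averaged spectral features per recorded note
    N_PARAMS = 200     # physical-model parameters to recover

    # Stand-ins for the real training set: random parameter settings and
    # the features of the audio the physical model renders for them.
    params = torch.rand(5000, N_PARAMS)
    features = torch.randn(5000, N_FEATURES)

    model = nn.Sequential(
        nn.Linear(N_FEATURES, 512), nn.ReLU(),
        nn.Linear(512, 512), nn.ReLU(),
        nn.Linear(512, N_PARAMS),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for epoch in range(200):
        opt.zero_grad()
        loss = loss_fn(model(features), params)  # features -> parameters
        loss.backward()
        opt.step()

    # Then: estimated_params = model(real_recording_features)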


It's absolutely subjective, e.g. as a simple example I love the 303 and you may hate it but adore some other model. But I generally understand a comment like that to mean both that the person likes the timbrality of a particular instrument, and that one won't 'lose' the sound of a patch while tweaking.

I'm not sure how to articulate this. I have synths that I can program in my head because I know the architecture really well, and when I imagine a sound I can walk over and dial it in from the front panel and get more or less what I expected, plus I can then tweak the results with abandon and for musical satisfaction. Then I have others (sometimes from the same manufacturer) which emit all manner of nice sounds but are far harder to program and easily veer off into sonic mush - technically impressive but not really fun to play.


Yeah, I know what you mean. The "sweet spot" on some synths. I just don't know how to pose that as a supervised learning problem, or what it would do (limit the range of parameters?). I'm not sure the number of parameters in a normal subtractive synth+filter+mod architecture is large enough for very interesting results that you couldn't achieve otherwise, or how you would generate a training set that can produce a meaningful mapping.

I also think WaveNet sample-by-sample generation and interpolation between latent features doesn't sound that exciting, as cool as it technically is.

We'll find some place to use machine learning in music/audio eventually :) I think perhaps more natural-sounding pitch shifting could be one area (since you could learn the structure of the sound of different instruments at various pitches), reverb removal, denoising, polyphonic audio to MIDI - things like that, where you have obvious training data.


You should check out Magenta's other projects; they did something like you describe with piano using 1400 training pieces. Now they only need user feedback to find out which outputs of the network are garbage ... oh wait, maybe they are already doing that, who knows.

Well, they are probably not tuning 100 instrument parameters but only the play characteristics like accent, velocity and so on.


Neat! Like a DIY Kaoss pad-style 2D sample crossfader running on an rpi3.

This is two parts: a high-end computer that analyses (with ML and neural magic!) some source waves and outputs blended samples that you can put on a 2D grid, and a simple sample player (made with openFrameworks) running on an rpi3 that mixes these generated waves depending on your xy position.
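
The mixing side is conceptually just a bilinear crossfade over the nearest grid sounds. A minimal sketch of what that xy blend might look like (assuming four pre-generated corner waves of equal length; this is not the actual openFrameworks code):

    import numpy as np

    def xy_mix(nw, ne, sw, se, x, y):
        """Bilinear crossfade of four equal-length waves for a 2D pad.
        nw/ne/sw/se: numpy arrays holding the four corner samples.
        x, y: pad position in [0, 1], with (0, 0) at the south-west corner."""
        top = (1 - x) * nw + x * ne
        bottom = (1 - x) * sw + x * se
        return (1 - y) * bottom + y * top

    # e.g. an equal blend of all four generated waves:
    # out = xy_mix(wave_a, wave_b, wave_c, wave_d, 0.5, 0.5)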

However, it doesn't sound interesting or good in what they show; they probably need a better demo without any Roland classics. Their bass/piano mix sounds mushy and is essentially the most boring, average synth sound I could imagine. The most interesting thing is the flute/snare crossover, which is buried in the overladen promotional fluff video.

Would be nice to hear a demo that really puts out the 'new' neural sounds.

edit: the essential 15 seconds of the video here: https://www.youtube.com/watch?time_continue=100&v=iTXU9Z0NYo...


They missed the boat on being first with NN synthesis by about fifteen years.

http://www.vintagesynth.com/misc/neuron.php

(IMO it's depressing that Google don't appear to know this.)

Musically the sound is a lot less interesting than the engineering is. In fact it's a perfect demonstration of why you can't just throw NNs at a problem and expect to get something useful out.

Musical sounds - even synthesised musical sounds - tend to cluster around certain perceptual parameter sets. If you don't know what those parameter sets are - and they're not just frequency distributions, or envelope shapes, or waveform sequences - your model will tend to generate sounds that are perceived as musically trivial and/or uninteresting.

By a strange coincidence, this was the problem with the Hartmann Neuron. There was some very clever technology inside the box, but the sounds had none of the quality that made it a must-have for musical production. It shipped a few hundred units and then disappeared.

That quality is a very elusive thing. Some synth companies, like Roland, have been very good at capturing it. But if you ask their designers what they're aiming for, it's unlikely they'll be able to tell you. Even more strangely, that quality sometimes appeared in products apparently by accident, when they were abused to make sounds that were a twist on their original design.

...Which would be a convincing argument for cultural preference if it weren't for the fact that many of the classic products that were abused in this way were made by Roland.

All we know is that musicians respond to that quality when they hear it. Unfortunately for engineers, sounds that have that quality can have very little in common with each other. So there's unlikely to be a statistical process that can engineer "good" sounds with a high hit rate.


"You will also need the following Open NSynth Super-specific items" super specific indeed.

If someone is interested in machine learning and music, I'd send them to http://wekinator.org, which is actually a research project rather than a one-off marketing campaign, and can be set up, run, and played with in a matter of minutes.


If you want to actually play with it, Magenta (another Google group, I think), who provided the actual musical sauce, has released a plugin for Ableton and for MSG, plus there's the browser experiment.

https://experiments.withgoogle.com/ai/sound-maker

https://magenta.tensorflow.org/nsynth-instrument


I wish I were a little more excited; the results honestly sound rather like what a Yamaha TG-33 outputs, or any other wavetable synth where you can crossfade between two sounds.

I love the idea of using neural networks to find new sounds and possibilities, but for some reason the NSynth project just doesn't hit it for me. Would love to be convinced otherwise.


Neat. It really needs someone to go ahead and mass-produce it. I assume Google realized the market is too small for them to worry about, but if someone could build them in bulk I'm sure they would find an audience of people willing to pay a decent price.

My guess is there are not a lot of people who could both a) build this in a short amount of time and b) find practical uses for it.


> I assume Google realized the market is too small for them to worry about, but if someone could build them in bulk I'm sure they would find an audience of people willing to pay a decent price.

Definitely underestimating the market if that is the case. My gear acquisition syndrome is already triggered.


I seem to recall that NSynth and WaveNet only operate at 16 kHz mono, or perhaps it was even lower. Are we now able to generate full 44.1 kHz sound?



Aphex Twin did it better:

https://fo.am/midimutant/

Not that I mind having this sort of technology being promoted by the likes of Google (casts glance at two 19" racks full of synthesisers), but I think I'd prefer to go with what Mr. D. James comes up with over the corporate bread-maker path ...


The two projects are in completely different playgrounds. The midimutant is a device that controls synths via MIDI, while NSynth is a synth (surprise!). And yes, it would be fun to use them with each other, though it would certainly take a bit of work on both sides (and is currently impossible because the midimutant source hasn't been released).


In what way is Aphex Twin's project better?


He cares because we do, while Google is a faceless uncaring corporation.


That's a very subjective way of looking at it.


It was a joke, referencing an Aphex Twin album: https://en.wikipedia.org/wiki/...I_Care_Because_You_Do


Distortion was "discovered" by misuse of technology. I'm sure it would never have been invented / discovered on purpose. Digital technology can't be abused the same way analog technology can.


> Digital technology can't be abused the same way analog technology can.

I assume that is sarcasm.


Too lazy to log into GitHub: where is the nsynth-generate that is referenced in the repo in audio/readme.md? It's not in the repo in any case, and there's no link either ... but a hyperlink to tmux is given. Mixed-up priorities!



