
Deep Learning for Guitar Effect Emulation - teddykoker
https://teddykoker.com/2020/05/deep-learning-for-guitar-effect-emulation/
======
fab1an
Pretty cool, though I wonder what the latency of this would be if used as a
plugin?

The author says it works in real-time, but to non-music/audio folks this could
mean '100 ms latency is real-time enough, right?'

Generally, I think the audio VST business is a really fun space to be in for a
lifestyle business, as it is way too small to be attractive for VCs. It seems
like a space that provides many niches for lots of small players to thrive in.

As an aside, it's really quite interesting that a lot of cutting-edge tech is
now used to emulate the hardware-based tech of yesteryear. Think film filters
for Photoshop, and the roughly 90% of audio plugins that emulate high-end
hardware: compressors, pedals, etc.

~~~
whiddershins
Do solo or small-shop VST plugin developers make any money?

I’m curious if anyone has any direct knowledge about that.

There are so many professional activities similar to that where no one makes
any money and people really just do it for the love, and then there are
seemingly similar things like that where people make surprisingly large
amounts of money.

~~~
abaga129
I'm fairly new to the game, but I'm a solo developer. Currently I don't make
enough to quit my day job, but it is a nice supplementary income, and it's
nice to get paid a bit for something I truly enjoy.

There are also several solo/small shop developers that do make a living from
selling plug-ins. Here are a few that I can think of off the top of my head.

Auburn Sounds: [https://www.auburnsounds.com/](https://www.auburnsounds.com/)

Valhalla DSP: [https://valhalladsp.com/](https://valhalladsp.com/)

Kilohearts: [https://kilohearts.com/](https://kilohearts.com/)

~~~
fab1an
what's your link?

~~~
abaga129
Cut Through Recordings:
[https://cutthroughrecordings.com/home](https://cutthroughrecordings.com/home)

------
svantana
End-to-end modelling is very enticing for the lazy engineer. Unfortunately,
parameter control (knobs) is an important feature of most audio effects, and
sampling enough of the parameter space becomes prohibitive for more complex
effects. That's why the traditional approach is divide-and-conquer.
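
To put a rough number on that (my own back-of-envelope, not from the article): even coarse sampling of a few knobs multiplies quickly.

```python
# Hypothetical pedal with 4 knobs, each sampled at only 10 positions:
# without conditioning the model on knob values, every combination
# needs its own capture and training run.
knobs = 4
positions_per_knob = 10
runs = positions_per_knob ** knobs
print(runs)  # 10000
```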

Also, I don't think this approach will work well with time-varying effects
such as chorus, although I'm happy to be proven wrong.

~~~
fxtentacle
Sadly, I believe you will be proven correct.

What that neural network learns is basically an approximation of a static
impulse response. So while it can simulate linear time-invariant effects such
as reverb quite nicely, it'll surely have issues with chorus.

~~~
sk0g
Reverb is time-invariant? You can set custom decay time, rate, etc., so one
note can be heard for, say, 10 seconds if you want to go full Devin Townsend.
I'd think chorus would work better.

I wanted to do a very similar project, but with an overdrive. Let's see if I
get time anytime soon!

~~~
InitialLastName
>Reverb is time invariant?

You might want to familiarize yourself with [0]. Time-invariance is a specific
property of a system, where the output (for any given input) has no dependency
on whether the input signal happens now, 1 second from now, or 100 years from
now (except for the corresponding delay). Most reverb models are, to a first
approximation, time-invariant, because the effect will have the same sound for
the same guitar line, no matter when you play the line.

Chorus, on the other hand, has a (perhaps subtle) modulator to get that warbly
(scientific word!) sound. It doesn't feel like a time-based effect, but it
certainly is, and that makes it quite a lot more difficult to mimic with a
system that (as others have noted) boils down to an impulse response.

[0] [https://en.wikipedia.org/wiki/Time-invariant_system](https://en.wikipedia.org/wiki/Time-invariant_system)
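
To make that concrete, here's a toy sketch (mine, not from the thread) of a chorus as an LFO-modulated delay line, plus a numerical check that it fails the time-invariance property described above:

```python
import numpy as np

def chorus(x, sr=44100, rate_hz=0.5, base_ms=20.0, depth_ms=5.0):
    """Toy chorus: mix the dry signal with a copy read through an
    LFO-modulated delay line (linear interpolation)."""
    n = np.arange(len(x))
    delay = (base_ms + depth_ms * np.sin(2 * np.pi * rate_hz * n / sr)) * sr / 1000.0
    idx = n - delay
    i0 = np.clip(np.floor(idx).astype(int), 0, len(x) - 1)
    i1 = np.clip(i0 + 1, 0, len(x) - 1)
    frac = idx - np.floor(idx)
    wet = (1.0 - frac) * x[i0] + frac * x[i1]
    return 0.5 * (x + wet)

sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)

# Time-invariance check: delaying the input should only delay the
# output. Because the LFO runs on absolute time, it does not.
shift = sr // 4
y_then_shift = np.roll(chorus(tone), shift)
shift_then_y = chorus(np.roll(tone, shift))
print(np.max(np.abs(y_then_shift - shift_then_y)))  # far from zero
```

A static delay (depth_ms=0) would pass this check up to edge effects; it's the modulation that breaks it.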

~~~
TheOtherHobbes
Impulse response reverbs (Bricasti, etc) are based on time-invariant
convolution.

Studio reverbs famously aren't, and some of the most popular models (notably
Lexicon) have included time-variant algorithms since the late 70s. The
processing power to handle IR convolution didn't exist, and it turned out some
time variation added lushness and density to the sound that simpler models
couldn't capture.

Modelling a chorus or time-variant reverb with any form of convolution -
including any convolution-based neural net - is a complete waste of time,
because most chorus algos are trivial and convolution is completely the wrong
tool for the job.

It's literally about as useful as taking a still picture of a 90 minute movie.

------
mrob
This isn't bad, but the note decays sound noticeably different. My guess is
that the NN doesn't know that human ears have a non-linear response that makes
them more sensitive to errors in the decay than in the attack, so it treats
them equivalently. If this is the case, it might be fixable by using
logarithmic-scale audio samples instead of linear.

The non-linearity of the ear is frequency-dependent[0], but in practice I
suspect it would be sufficient to pre-process the linear PCM data with
x=sqrt(x) and undo it before playback with x=x^2.

[0] [https://en.wikipedia.org/wiki/Equal-loudness_contour](https://en.wikipedia.org/wiki/Equal-loudness_contour)
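
If anyone wants to try it, note that the mapping has to preserve sign, since PCM samples are bipolar. A sketch (my interpretation of the x=sqrt(x) idea):

```python
import numpy as np

def compress(x):
    # Sign-preserving square-root companding: expands small values,
    # so errors in quiet decays weigh more heavily in a training loss.
    return np.sign(x) * np.sqrt(np.abs(x))

def expand(y):
    # Inverse mapping, applied before playback.
    return np.sign(y) * y ** 2

x = np.linspace(-1.0, 1.0, 5)
print(np.allclose(expand(compress(x)), x))  # True: round-trip is exact
```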

~~~
rubatuga
Why square root and not log?

~~~
mrob
It's a cheap and fast calculation. I don't actually know what the best mapping
is, so I'd start with this.

------
munificent
I'm not an expert on machine learning or DSP, but I do know just enough of
each to suspect this isn't anywhere near as impressive as it seems.

A distortion pedal is essentially just a waveshaper [1]. Think of audio in
digital terms as just a series of numbers. A waveshaper is just a simple
mathematical function. To apply it, you literally just apply the function to
each value in the input stream and there's your output stream. There's no
memory or interesting algorithms going on. It's the audio equivalent to
calling map() on your list of samples with some lambda to produce a new list
of samples.
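
As a sketch of that map() idea (my illustration; a real Tube Screamer's curve is different):

```python
import numpy as np

def waveshape(samples, gain=5.0):
    # A memoryless waveshaper: each output sample depends only on the
    # corresponding input sample. tanh gives soft, symmetric clipping.
    return np.tanh(gain * samples)

t = np.arange(44100) / 44100
clean = 0.5 * np.sin(2 * np.pi * 440.0 * t)
driven = waveshape(clean)            # same length, sample-for-sample
print(driven.min(), driven.max())    # bounded near +/- tanh(2.5)
```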

Of course distortion pedals do that in the analogue domain using circuitry,
which has some additional complexity because transistors and diodes and
friends don't behave exactly like mathematical functions. There's "sag" and
some other physical effects that cause the output to also somewhat depend on
previous input.

Even so, that can generally be modelled using a simple convolution. Each
output sample is calculated by taking some finite number of previous input
samples, multiplying each of them by a weight factor, and then summing the
results.

Does that sound like a neural net? It is. That's what we call them
_convolutional_ neural networks. Convolution is bread and butter in DSP. You
can easily generate one that produces the same effect as some piece of
hardware or acoustic environment by running an impulse (a single 1.0 sample
surrounded by silence) through the system and then recording the result. That
"impulse response" essentially _is_ your set of convolution weights.
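
Here's that recipe on a toy linear system (my example, with a made-up 3-tap filter); note it only works because the box is linear:

```python
import numpy as np

# A linear, time-invariant "black box": a 3-tap FIR filter.
def black_box(x):
    return np.convolve(x, [0.5, 0.3, 0.2])[:len(x)]

# Capture the impulse response: feed in a single 1.0 among zeros.
impulse = np.zeros(64)
impulse[0] = 1.0
ir = black_box(impulse)

# The recorded IR now reproduces the system on any input by convolution.
x = np.random.default_rng(0).standard_normal(64)
y_direct = black_box(x)
y_via_ir = np.convolve(x, ir)[:len(x)]
print(np.allclose(y_direct, y_via_ir))  # True for an LTI system
```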

So using a _deep_ neural network and then _training_ it sounds a lot like
overkill to me. You could accomplish much the same with a "depth-1 network"
by running an impulse through the system.

Caveat, though: I am just a novice here, so there could very well be a lot of
subtlety I'm missing out on.

[1]:
[https://en.wikipedia.org/wiki/Waveshaper](https://en.wikipedia.org/wiki/Waveshaper)

~~~
dontreact
I believe you are vastly oversimplifying this.

An impulse response will characterize only a system that is

* linear

* time-invariant

Many effects are not linear (especially distortion: the crunchiness comes from
the nonlinearity). f(a) + f(b) != f(a+b)

And many effects are time varying, for example phasers and choruses which have
low frequency oscillators controlling how the sound is shaped depending on
when it comes in. Chorus for example will vary the pitch up and down.
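
That superposition failure is easy to check numerically (my sketch, using hard clipping as the nonlinearity):

```python
import numpy as np

def clip(x):
    # Hard clipping: the crudest distortion model.
    return np.clip(x, -1.0, 1.0)

a = np.array([0.8, -0.6, 0.9])
b = np.array([0.7, -0.9, 0.5])

print(clip(a) + clip(b))   # [ 1.5 -1.5  1.4]
print(clip(a + b))         # [ 1.0 -1.0  1.0] -- superposition fails
```

Since f(a) + f(b) != f(a+b), no single impulse response can characterize the system.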

~~~
aerospace_guy
Yup! This covers the basics of control theory; a simple concept that most
don't understand.

------
jelling
> We find that the model is able to reproduce a sound nearly indistinguishable
> from the real analog pedal.

Maybe for the average person, or buried in the mix, but as a guitarist I found
the audio samples easy to distinguish. The NN sample's unnatural decay was a
dead giveaway.

~~~
jefftk
Not really a guitarist, but listening to them I couldn't hear a specific
difference. Yet I still liked one of them more, and when I clicked "reveal",
it turned out that one was the real one.

~~~
zuppy
the real one has longer fading tones, the one generated by machine learning
cuts the sound abruptly.

it seems easy for me to differentiate them and I’m a beginner with guitars (~1
month, so I’m your average Joe). it’s pretty good though, I’m sure it can be
improved greatly.

------
Tade0
Sounds great and I had to listen to both of the samples to guess correctly.

That being said the Tube Screamer is a somewhat simple effect: it's just a
distortion with the clipping diodes moved to the feedback loop.

How possible would it be to get the famous A/B class amplifier voltage sag and
associated changes in parameters of the whole amplifier, or in other words
"will it chug"?

~~~
cesaref
I think this would be very possible - there was quite a bit of discussion of
using NN techniques for modelling fx discussed at DAFx2019
([http://dafx2019.bcu.ac.uk/](http://dafx2019.bcu.ac.uk/)). There are a number
of papers discussing different techniques in the paper archive.

Many of the techniques discussed were variations on image processing -
transforming the input to the frequency domain, converting that to an image,
applying standard techniques to transform the image, then going back to the
time domain. There are many compromises with this approach (losing phase
information, for example), but with a suitable overlap/add the results were
better than I expected, and there's certainly room for further investigation
to see if there's useful stuff in there.

Another time-domain approach, applicable to your amplifier-model question, was
an attempt to determine hidden variables in a circuit. Basically, the circuit
under test is examined, and rather than building a SPICE model (which can be
laborious), the technique was to expose the internal voltages following
components with memory (capacitors, for example). These outputs were included
in the NN training data, so in effect the normally hidden internal state was
exposed, allowing for a very good approximation.

Here's the paper:

[http://dafx2019.bcu.ac.uk/papers/DAFx2019_paper_42.pdf](http://dafx2019.bcu.ac.uk/papers/DAFx2019_paper_42.pdf)

~~~
Tade0
Thank you very much.

Do you know if there will be a DAFx2020? That would make it the first
conference in years that I would really want to attend.

~~~
cesaref
Unfortunately not, it's been delayed. DAFx2020 was due to be in Vienna, and
I'm assuming they are still planning on being there, but it's now scheduled
for 2021.

It's a great conference, well worth attending. It's heavy on the maths, but
that's DSP for you!

------
wintermutestwin
"many purists argue that the sound of analog pedals can not be replaced by
their digital counterparts."

Truly effective modelling of analog pedals, tube amps, and guitar cabs has
been around for years, and it's far more cost-effective for everyone from
bedroom players to touring bands.

The "purists" are hipsters who value the rarity of some pedals, massive
pedalboards and their tube amps. I'm not knocking them - I understand why
there is a nostalgia factor and tweaking dials is cool. As a computer guy
though, I much prefer the ability to make things like this in my bedroom:
[https://i.imgur.com/OqMoBxz.png](https://i.imgur.com/OqMoBxz.png) And when I
want to tweak a dial, I program an expression foot controller to tweak any
parameter (or multiple).

All that said, great to be looking at modelling techniques...

~~~
selykg
I am by no means a musician or an experienced one at that. I tinker and enjoy
playing and learning. But I have limited experience overall.

My personal experience with electronic tools is the lack of feel. Can I make
music with digital tools like AxeFX and similar? Absofreakinglutely. No doubt
about it.

But those digital tools feel VERY different to me than the real thing. I'm not
just talking about a speaker moving air, though that's certainly part of it.
My tube amp simply responds differently than any digital model of a similar
amp.

I find tools like the Kemper to be amazing, but they're just a snapshot of an
amp in a particular configuration in a particular room.

From a technical standpoint, all this modeling stuff is super cool. But it
doesn't feel the same at the end of the day and this is a personal opinion and
preference on my part.

I look forward to the day that I can get an amp in a pedal (like the Strymon
Iridium) and it behaves the same as the real amp. I think Fender's Deluxe
Reverb (Tonemaster model) is as close as it has ever gotten, but it very
specifically emulates a single amp and does so within a real amp cabinet
rather than pushing it out to an audio interface.

Anyway, anything that gets people playing guitar is, in my opinion, a great
thing. We live in a golden age of guitar equipment. I don't think it can
honestly get much better than it is right now. It's an amazing time to be a
guitar player and incredible options are available at amazing prices.

~~~
renaudg
>Anyway, anything that gets people playing guitar is, in my opinion, a great
thing. We live in a golden age of guitar equipment. I don't think it can
honestly get much better than it is right now. It's an amazing time to be a
guitar player and incredible options are available at amazing prices.

It sure is a great time for guitar equipment, as the digital revolution has
made its way there too.

But being a guitar player is also increasingly lonely:
[https://www.washingtonpost.com/graphics/2017/lifestyle/the-s...](https://www.washingtonpost.com/graphics/2017/lifestyle/the-slow-secret-death-of-the-electric-guitar/)

And it's arguably an opportunity cost for a kid to be pouring so much effort
today into learning the iconic (but tired) instrument of the boomer
generation, when they could be breaking new musical ground instead, mastering
Ableton's Push for instance. But to each their own, of course.

~~~
wintermutestwin
In the late 80s it sure seemed like the guitar was doomed. On one hand there
was new wave with its synths and on the other were the guitar "gods" who
wanked on with amazing technical precision making amazingly pedantic music.
Then in the 90s the guitar and rock was reborn and suddenly cool again. It
will come back and it won't be tired anymore. There are so many things that
the guitar has barely hinted at in the past that will resurface as innovation.
In the meantime, you can learn the guitar AND new tech. Besides, there will
always be the draw of impressing a member of the opposite sex at a party by
picking up a guitar. You just don't have that with "check out my latest drum
programming", etc.

------
317070
That is very cool. Though, part of the pedal is of course the knobs. You'd
need to condition the wavenet on the knobs. Did that work well (I assume you
tried that already)?

Also, what is the inference latency on your model? A nice thing about analog
guitar effects is that they are blazingly fast.

------
ericfrederich
So this seems similar to an IR (impulse response) where you get a snapshot of
an amp mic'd up in a room with knobs fixed at a particular position. In the
end, you don't get knobs to fiddle with.

Awesome, I'd love to hear Josh from JHS Pedal's opinion on this.

~~~
ratww
This is even more impressive given that regular IRs can't duplicate the
distortion effect itself, only the frequency response.

~~~
munificent
What is the difference between "distortion itself" and "only the frequency
response"? Are you saying the phase response is important?

~~~
ratww
Impulse responses can only represent linear time-invariant systems. Like
delays, reverbs, equalization curves.

Distortion is non-linear, it is something like a _max(-1, min(1, input))_
function (a waveshaper, like you said), and it produces harmonics when applied
to audio signals.

However, guitar pedals also have some additional circuitry to "sweeten" the
distortion, removing the extra harmonics added by the clipping diodes. Tube
Screamers are notable for cutting bass and enhancing mids. An IR is able to
capture this part. It's important for guitar pedals, and one reason so many
of them exist.

If you capture the impulse response of an overdrive pedal, you'll be capturing
only the frequency response of a distorted impulse. If you process clean
guitar through this, you'll simulate the frequency response but not the
distortion itself, so it will just be a clean guitar with a tinny, shrill
sound, not an overdriven guitar sound.

One way around it (other than the idea in this article!) is doing multiple
passes of impulse-response capture at different amplitudes, which captures the
non-linearity. This is supposedly how a Kemper Profiler works.
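
A toy illustration of why one amplitude isn't enough (my own sketch of the principle, not Kemper's actual algorithm): probe a nonlinear "pedal" with impulses of different heights, and the normalised responses disagree.

```python
import numpy as np

# Toy nonlinear "pedal": a soft clipper followed by a 2-tap filter.
def pedal(x):
    return np.convolve(np.tanh(3.0 * x), [0.8, 0.2])[:len(x)]

def capture_ir(amplitude, n=32):
    # Probe with an impulse of the given height, then normalise;
    # for a linear system the result would not depend on amplitude.
    impulse = np.zeros(n)
    impulse[0] = amplitude
    return pedal(impulse) / amplitude

for a in (0.1, 0.5, 1.0):
    print(a, capture_ir(a)[:2])  # effective gain shrinks as level rises
```

A profiler could then blend between such level-dependent IRs at run time based on the input's instantaneous level, which is roughly what "dynamic convolution" means.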

------
sdenton4
It has been said that if we achieve the ability to fully simulate the universe
from initial conditions, the first application will be creating a perfect
recreation of Marvin Gaye's Roland 808 drum machine in a 1982 performance.

------
dharma1
Here is the original paper from 2019 by Eero-Pekka Damskägg:
[https://research.aalto.fi/en/publications/realtime-modeling-...](https://research.aalto.fi/en/publications/realtime-modeling-of-audio-distortion-circuits-with-deep-learning\(5f5abe8a-9875-40a0-8e27-39109077f4e3\).html)

It was also published as a realtime JUCE project, which might be more useful
for actual (realtime VST/AU) use:

[https://github.com/damskaggep/WaveNetVA](https://github.com/damskaggep/WaveNetVA)

Alec Wright has done more work on this since then, using it for amplifiers:

[https://www.aalto.fi/en/news/deep-learning-can-fool-listener...](https://www.aalto.fi/en/news/deep-learning-can-fool-listeners-by-imitating-any-guitar-amplifier)

And time variant effects:

[https://github.com/Alec-Wright/NeuralTimeVaryFx](https://github.com/Alec-Wright/NeuralTimeVaryFx)

------
TrackerFF
Isn't this essentially just the case of learning one function, with set
parameters?

I.e, if you want to build a complete model of the tubescreamer, you'd
essentially have to train a model for each possible setting on the pedal - or
in other words, every combination of the knobs.

Sounds like a real chore, if you were to actually do that physically - and in
the end, don't you just want to learn the impulse response of the circuit?

I know some tools - like the Kemper modelling gear, are made for that exact
purpose, and with extremely convincing results.

~~~
Scene_Cast2
Not quite. As long as the knobs make consistent changes, just feed the model a
large number of examples and it should generalize (smartly interpolate) to the
rest.

What I do have a problem with is that if the pedal is already implemented
digitally, then all the human interpretability, along with the classic DSP
machinery, is thrown out the window. A better approach would be to build the
pedal via a differentiable programming language and then try to gradient
descent toward some analog "can't get this juicy tube sound digitally"
variant.

~~~
ben7799
The knobs actually don't behave linearly on a tube screamer. Even the "tone"
knob (EQ) doesn't behave at all linearly like you might expect out of consumer
audio gear. Tube Screamers have an S-curve potentiometer in use for that knob.

That would be part of the problem with this approach.

Also with this approach you pretty much have to train the model with a near
infinite collection of guitars in front of the model and a near infinite
number of other effects turned on and off in front of the model.

~~~
Scene_Cast2
The knobs don't have to be linear at all, just differentiable - that's the
beauty of ML.

As for the collection of guitars and samples - not necessarily, it would
depend on how you set up the training.

------
baylessj
Excellent writeup, I love seeing real engineering applied to guitar pedals
rather than black magic tone chasing.

I'd be really curious to see if the model could be expressed as a transfer
function and compared to the schematic for the pedal. The Tubescreamer is a
fairly simple circuit but the mystery surrounding it indicates that there are
some weird variables at play with the component properties that would lead to
additional factors in the transfer function. Wonder if those variables could
be identified somehow.

~~~
hashkb
The "weird variables" may have to do with the various changes in manufacturing
over the years. "Tube screamer" refers to at least 10 different units. Maxon,
Ibanez, TS9, TS808, and zillions of clones.

------
willis936
A neat approach for sure. I am more interested in SPICE style modeled VSTs
though. There's no need to throw ML at a simple math problem to get a bad
approximation. I have not found many VSTs that seem like they're doing proper
simulation of analog circuits. The VST space is filled with people claiming
awesome results, but never revealing the sauce. If you're making a convincing
sounding zener limiter, what are you actually doing? There are a dozen
different levels of approximations you could make. Shouldn't a VST that is
really simulating the analog circuit advertise that? On paper it should be
easy, right? I've sat down with pen and paper to try to write out a simple
input/output equation for a zener limiter circuit and I decided it was
probably more worth my time to just plop a zener SPICE model into some
language that could evaluate expressions and compile to VST (or use a systems
of equations solver).

And then there's the real holy grail of analog simulation: the tube amplifier.
I'm not sure SPICE models really capture the limiting behavior of tubes very
well. You might need to implement the spec sheet in code. All fun sounding
problems, and I'm not sure anyone has even done them yet.

~~~
dsharlet
Funny you mention SPICE to VST compilation... It was on my list for this (my)
side project but I never got around to it:
[http://livespice.org/](http://livespice.org/)

edit: And a Tubescreamer is one of the examples!

------
fallingfrog
I think that because most guitar effects are really very simple (gain and
clipping, or delaying the signal and adding it back in), this approach will
probably work just fine.

But, it is sort of using a sledgehammer where a tap from a spoon will do- the
original tube screamer is just an op amp and a couple diodes, plus a bit of
eq! Not much to it.

Plus, your real problems are going to be noise level (Tube Screamers in
particular are noisy, but a discrete transistor distortion can be made very,
very quiet), your A/D converter, your power requirements (comparable analog
distortion effects use a few milliwatts), and cost.

Edit: But that said, this is a super cool project! Good job! Sorry I just
realized that what I wrote was kind of negative.

------
exabrial
Pretty cool! Is this how Kemper amplifiers work when they do a capture?

~~~
ratww
AFAIK Kemper performs multiple passes of impulse-response capture, all at
multiple signal levels in order to model non-linearities (like distortion).
This is called dynamic convolution. [1] [2]

There are other ways to do that, like Volterra Series, used by Nebula plugins
[3]

[1]
[https://www.uaudio.com/webzine/2004/july/text/content2.html](https://www.uaudio.com/webzine/2004/july/text/content2.html)

[2]
[http://www.sintefex.com/docs/appnotes/dynaconv.PDF](http://www.sintefex.com/docs/appnotes/dynaconv.PDF)

[3]
[https://en.wikipedia.org/wiki/Volterra_series](https://en.wikipedia.org/wiki/Volterra_series)

------
ZoomZoomZoom
For anyone planning to try this, don't forget about impedance matching and use
a transformer/active reamper. Some pedals may react very differently.

------
mgamache
It would be interesting to see how this responds to dynamics. For example, a
favorite guitar sound is a fuzz cranked, but with the guitar volume turned
down. This results in a compressed dirty sound that can overdrive into
distortion if you hit the strings harder (attack).

------
ben7799
I play guitar and own a tube amp & a tube screamer.

All of this sounds horrible. It doesn't even sound like his input is an actual
guitar; it sounds like he's using a synth guitar sound or something. There's
no dynamics, almost no sustain, no articulations. The outputs barely even
sound distinguishable as a guitar through a Tube Screamer, even his actual
Tube Screamer samples. (Possibly because his interface is terrible?)

The conclusion is ridiculous given how simplistic everything is.

You can't use two tiny little clips to justify your model being high quality.

The true test has to even allow a bunch of guitarists to move all the knobs,
plug the model into different amp & guitar combinations, put other effects in
front of and behind it, etc..

The Tube Screamer is called a Tube Screamer because its intended use case is
to make the tubes in a tube amp "scream". Using it with all the knobs at noon
is not consistent with this; it usually gets used with a tube amp that is
already on the verge of distortion, with the TS volume turned up a lot (3/4 to
max) and the gain quite low. This might be part of why it sounds so bad to me.

There are actually two different trains of thought on guitar effect modeling:

\- Model it based on input & output waveforms like he's doing

\- Actually model the circuit as an electrical simulation and then pass the
signal through that.

I have personally found the second approach to be way more realistic and
satisfying. The Yamaha THR amps work this way and they're really amazing.

One of the tricks here is that a listener might not be able to tell the
difference, but the guitar player picks up on a perceived change in how the
guitar feels with these effects. A Tube Screamer has a lot of compression
built into it, for example. It causes everything you play to sound a little
dirtier for the same amount of picking energy you put into the guitar. It will
cause the player to play a little more lightly than they would without the
effect. This is the kind of thing that makes a player reject the model and
want to stick with the real thing, whereas the person in the lab building the
model thinks it's great because they're not even playing an actual guitar
through it. Once a skilled player tries it, the "feel" is a dead giveaway
which is which.

It's easy for some of this stuff to get lost on the electronics crowd if the
background is electronic music. An actual acoustic piano is the only keyboard
based instrument that has anywhere near the nuance that a guitar has, and a
guitar still has way more weird stuff going on with dynamics and articulation.
The range of inputs you have to feed into any kind of computer model to
simulate guitar well is huge.

------
EamonnMR
Add the ability to train on arbitrary effects as inputs and this will be a
best-selling VST for whoever can make it first.

------
saadalem
This is actually impressive. I'm wondering if we could use AI to make a
smartphone mic sound like a high-quality one.

------
veenkar
B-but a simple convolution would do the same. Or, for faster operation, a
transfer function obtained using the least-squares method. An NN is kind of
overkill for this, but it's a cool POC anyway ;)

~~~
ssalazar
Nope - the Tube Screamer is non-linear, so a simple transfer function won't do it.

------
hashkb
Trey Anastasio of Phish famously uses 2 stacked tube screamers. (And so do
many of us phans). He deserves to be mentioned because more notes have hit
audience ears through his screamers than anyone else's.

Also, the modern TS9 isn't exactly right. I'd love to see this work applied to
vintage vs current TS vs modded units.

