
Single Tone Detection with the Goertzel Algorithm (2012) - jacquesm
https://www.embedded.com/single-tone-detection-with-the-goertzel-algorithm/
======
kragen
I've been playing around with single tone detection using the Minsky circle
algorithm. Without roundoff, they are equivalent, but Minsky tolerates
roundoff better, especially at low frequencies — you can think of Goertzel as
representing the phase or quadrature information as the difference _s_ [ _n_ ]
- _s_ [ _n_ -1], which obviously gets less bits allocated to it than the _s_ [
_i_ ] themselves, particularly near the peaks and at low frequencies.

Precisely, you can derive Minsky’s algorithm (modified for frequency detection
by adding the input samples into the register you produce the output from)
from applying Goertzel’s algorithm to the first backward differences of the
input signal.

The derivation is in Dercuano in notes/goertzel-minsky-prefix-sums.html. The
latest, and probably the penultimate, version of Dercuano is at
[http://canonical.org/~kragen/dercuano-20191215.tar.gz](http://canonical.org/~kragen/dercuano-20191215.tar.gz).

There are some tricks in there for windowing the computation over different
timescales, too.

~~~
jacquesm
Interesting, thank you, I will look into that.

The problem I'm working on requires something that will work well in the
presence of chords and even though I've made really good progress I'm still
about 4% away from SOTA, but 100's of times faster. Not quite ready to give
up.

edit: found a HN discussion on that:
[https://news.ycombinator.com/item?id=15266331](https://news.ycombinator.com/item?id=15266331)

~~~
kragen
Anything linear should work well in the presence of chords, I would think?
(Although maybe PLLs would work _better_ , depending on what you're doing,
despite being nonlinear.)

The HN discussion has some good information, but it also has some mathematical
errors.

Presumably you don't mean the School of the Arts or Summits on the Air (though
both involve detecting frequencies), so what is SOTA?

~~~
rjeli
State of the art

------
MobileVet
We made the first guitar tuner on a mobile phone in 2006 using the algorithm.
It ran like a champ on the Motorola Razr and other ‘dumb phones’ of that time.
We charged $25 for it and people loved it as it performed way better than the
$25 stand alone tuners. My testing showed it was equivalent to a ~$75 model at
the time.

I had a background in signals and controls and was familiar with the DFT.
Goertzel was a new variant to me but ran much better on the limited resources.

Good memories.

~~~
pierrec
I suppose that made sense because of the very limited resources, but note that
it's not ideal because it doesn't estimate pitch at all, it only tells you if
there's any content inside a frequency bin you provide. A guitar tuner is
expected to have an arrow pointing up/down to tell you which way you should
adjust, and this algorithm can't give you that. Getting the direction of that
arrow accurately can be challenging because of harmonic content (which guitars
have a lot of).

These days it would make more sense to use an algorithm that takes harmonics
into account and gives you an estimated fundamental frequency, like Miller
Puckette's pitch detection algorithm.

~~~
kragen
This is true. Without reaching for state-of-the-art psychoacoustics research,
though, even the second autocorrelation peak will take harmonics into account.
And you can compute it without the STFT latency, which matters here.

It _is_ possible to estimate where a signal is inside the frequency bin;
that's how phase vocoders work.

~~~
CamperBob2
Couple of ways to do it. If you don't want to deal with the latency required
to take the block-to-block phase difference (which somehow ended up with the
$50 nomenclature "phase vocoder"), you can just do a parabolic fit on the
magnitudes of the central bin and its adjacent pair. 99% of the time, the
latter is sufficient. With only three points the math is super trivial [1].

For sharp/flat indication, I'd be inclined to look at the adjacent-bin
magnitudes. But it's been explained to me numerous times that tuning an
instrument isn't just a matter of measuring its frequency, and while I can't
argue with that, I don't think I'm ever going to truly grok it.

1:
[https://ccrma.stanford.edu/~jos/sasp/Quadratic_Interpolation...](https://ccrma.stanford.edu/~jos/sasp/Quadratic_Interpolation_Spectral_Peaks.html)

~~~
jacquesm
> But it's been explained to me numerous times that tuning an instrument isn't
> just a matter of measuring its frequency, and while I can't argue with that,
> I don't think I'm ever going to truly grok it.

It depends on the instrument. If the instrument has only a few , single
physical elements per tone generated and the instrument is frequently tuned
then you are correct (say a guitar), there is nothing to be done beyond
setting the center frequency correctly depending on how you want to tune.

But tuning a piano or harpsichord is a different matter. First of all, there
are _many_ different ways of tuning, for instance there is 'stretch' tuning
which deals with inharmonicity (sp?) in the higher regions.

It gets more complex when you have multiple physical resonating elements (such
as strings), for instance depending on where you are on the piano keyboard you
might have one, two or three strings per note called a 'unison' (sound
together).

The strings in a single unison not only need to sound good right after tuning,
they need to continue to sound good together afterwards and that means that
all of them need to degrade gracefully over time as well and that in turn
means that they need to lower in tension in a similar way. Good tuners are
able to ensure that the string tension is distributed equally between the
'sounding' part of the string (the middle part) and the non-sounding parts
(the ends). The amount of tension on the strings and the friction of the pin
block and the pins (stick slip) can make this a lot harder than you might
think.

Finally, just turning the pins until the frequency matches is going to result
in a piano that de-tunes very quickly, the best approach is to go a bit too
high and then to slowly lower the pitch while hitting the keys quite hard so
that the tension of the string gets distributed evenly.

Added complexity is that the combined 18,000 KG of tension on the frame will
cause the tuning of one section of the strings to affect all the others. If a
piano is off very far from being properly tuned you may have to make multiple
passes to get the job done.

~~~
kragen
Even a guitar string has inharmonicity, so the pitch you measure by measuring
the lowest resonant frequency is somewhat too low: a sine wave at that
frequency will sound lower than the string does. I thought that was CamperBob2
meant. It's certainly what pierrec was talking about, and it's an advantage of
ACF-based pitch estimation algorithms.

~~~
jacquesm
Do you stretch tune guitars too then?

~~~
kragen
That's an interesting question; you'd have to do it either by fret placement
or pitch bending with finger pressure. In the latter case it would only happen
when you were playing chords with far-apart notes.

------
sevagh
Bizarre that people keep mentioning their "guitar tuners" that use Goertzel.
Are you looping over every single possible frequency? You should be using real
fundmental frequency estimation/pitch detection algorithms ([1], [2]) instead.

1\. [https://github.com/sevagh/pitch-
detection](https://github.com/sevagh/pitch-detection)

2\. [https://github.com/sevagh/pitcha](https://github.com/sevagh/pitcha)

~~~
MobileVet
This was 2006... on a dumb phone. It wasn’t ideal, but it worked quite well.

~~~
sevagh
So did you indeed loop over every possible frequency?

I know I attempted a Goertzel-based pitch detector once, where I looped over
frequencies and picked the one with the largest magnitude. I picked 25Hz and
4200Hz as my limits (from a piano).

------
jacquesm
This is a neat trick if you need to detect only a few tones out of the
possible range offered by an FFT, it essentially allows you to compute only a
single bucket with minimal overhead. As the article states as long as you run
it < log2(n) times where 'n' would be the size of the FFT required you should
come out ahead.

Unfortunately for my application it did not give an improvement but I thought
it was interesting anyway, and on embedded platforms something like this may
be your only option.

~~~
neonate
What is your application?

~~~
jacquesm
Automatic music transcription.

I've blogged about it here:

[https://jacquesmattheij.com/mp3-to-midi/](https://jacquesmattheij.com/mp3-to-
midi/)

That's almost two months ago, a lot has happened since then but it gives you a
general idea of what I'm up to.

~~~
sevagh
Interesting project. What do you think of
[https://github.com/sevagh/transcribe](https://github.com/sevagh/transcribe) ?
It seems like in your project the word transcription isn't referring to sheet
music transcription?

~~~
jacquesm
Well, you can go to sheet music if you want to but there are plenty of
packages out there that will display midi files as sheet music so that would
seem to be an unnecessary complication. The core problem is polyphonic music
-> midi and that is hard enough!

I will take a look at your software, it definitely sounds interesting, the
question at the end of
[https://github.com/sevagh/transcribe/blob/master/doc/snac.md](https://github.com/sevagh/transcribe/blob/master/doc/snac.md)
can be answered in the affirmative, the Onsets and Frames package mentioned
elsewhere in this thread does just that and works very well.

By the way, the linked paper is unavailable.

~~~
sevagh
OK, fair enough. A MIDI file would contain the musical information necessary
to generate any notation. Thanks for the reply.

I see indeed the link to the Mcleod pitch method paper is dead. I'll see about
hosting it in a git repo.

------
ajnin
I used this algorithm to detect the signal sent by heart belt sensors a few
years ago. Those belts send a quite weakly powered 5kHz "chirp" through a coil
for each heart beat. You're suppsed to detect those chirps via another coil
magnetically coupled and tuned with an LC circuit to the 5kHz resonance
frequency. Through a combination of a lot of analog amplification and this
algorithm I was able to implement detection on an AVR microcontroller, which
would never have been possible using FFT or similar techniques.

~~~
jcims
Did the ADC come with the microcontroller? That’s a pretty high sample rate.

~~~
jacquesm
[http://maxembedded.com/2011/06/the-adc-of-the-
avr/](http://maxembedded.com/2011/06/the-adc-of-the-avr/)

Has the built in ADC of the AVR at 15 K samples/second which should be enough
to process a 5 KHz signal.

~~~
jcims
Whoa. The capability in these little chips blows my mind. Nice to be able to
eliminate the LC passives with a bit of code. If the chirps are of adequate
length could probably sleep or sample other inputs a good bit of the time too.

------
riggsd
For many simple use cases, it is vastly easier to count zero crossings per
unit time. One period of a sinusoid contains a single positive-to-negative and
a single negative-to-positive transition. No DSP required, though you may
first bandpass the audio if you've got a target frequency range. Assumptions:
the fundamental is the strongest harmonic present, negligible DC offset.

Simple example: For a 1.0 second audio sample, if the sign changes from +/\-
or -/\+ 880 times, the frequency is 440Hz.

~~~
jacquesm
How would you adapt that method to more complex waveforms such as composites
resulting from interference and harmonics?

~~~
riggsd
Strong harmonics which overpower the fundamental will fool a zero-cross
analysis, resulting in, say, a double or triple frequency output if the 2nd or
3rd harmonic is strongest. The strongest signal present will overwhelm the
amplitude sign of the combined signal; so it still works even in the presence
of some noise or other signal.

High amplitude noise will also skew results resulting in a higher frequency
output. Low-pass or band-pass filter can work wonders.

Again, zero-cross analysis is suitable for simple use cases where you are
directly sampling a signal.

~~~
jacquesm
I wonder if you could take just the sign changes of the input signal and use
that in lieu of the zero crossings and get something useful out of it, for
instance, a finger print for chords or harmonics.

~~~
riggsd
Nope.

~~~
jacquesm
5 chars save me day of experimenting. Thank you!

Any suggestions on what might be fruitful avenues?

------
yters
DSP question: how do I detect signals, such as certain words, in a series? Do
I convolve a reference signal? And how do I remove such a signal from the
series automatically? Context is I want to remove eye blinks and other such
artifacts from an EEG.

~~~
stagger87
I don't know about EEG, but a standard way to perform this type of action is
to correlate your signal with a known signal (signature of a blink in your
case) and if the correlation result exceeds some arbitrary threshold you set
you have a match.

~~~
yters
Is it possible to whiten the series to remove the signal but retain other
signals occurring at the same time?

~~~
jacquesm
That is getting quite close to the way my software does chord detection one
tone at the time, so yes, that works. What I do is subtract the match from the
signal and then re-run the analyzer. This works well enough for signals that
are orthogonal, less well for signals that have a lot in common (such as
octaves or other close multiples of the original). Correlation is super
expensive numerically if you don't know the input phase by the way, you'd have
to hunt for the best match, if you can take it out of the time domain then it
will get a lot better.

~~~
yters
A bit of research suggests to me the best way to do what I want is convolve
the signal to remove with the series, and subtract the signal at the peak of
the convolution, if it passes the threshold.

Also found out that convolution can be calculated much more quickly than N^2
(my original concern) by using the Discrete Fourier Transform. The DFT of
convolved series is the same as pointwise multiplying the DFT of each series.
So, taking the inverse DFT of the pointwise multiplied series' DFTs will give
me the convolution with a cost of N log N instead of N^2, if I use the Fast
Fourier Transform.

I'll try this out and report back!

~~~
jacquesm
Ok, much good luck, I'm really curious how well it will work. btw, my email is
jacques@modularcompany.com, in case you want to take it out-of-band.

------
neuralzen
Tone detection has an interesting history to say the least.
[https://www.damninteresting.com/on-the-origin-of-
circuits/](https://www.damninteresting.com/on-the-origin-of-circuits/)

~~~
jacquesm
That's not exactly the genesis of the field but it _is_ an interesting story
about genetic algorithms and the bit that always got me was how there were
parts in there used in ways that no human designer would ever use them and yet
they were critical to its functioning.

"Dr. Thompson peered inside his perfect offspring to gain insight into its
methods, but what he found inside was baffling. The plucky chip was utilizing
only thirty-seven of its one hundred logic gates, and most of them were
arranged in a curious collection of feedback loops. Five individual logic
cells were functionally disconnected from the rest— with no pathways that
would allow them to influence the output— yet when the researcher disabled any
one of them the chip lost its ability to discriminate the tones. Furthermore,
the final program did not work reliably when it was loaded onto other FPGAs of
the same type."

Still amazing to re-read, and taking this into account it will be a long long
time before we can finally say that we have cracked the DNA code and all its
gory little details.

~~~
perl4ever
I always wanted to know what you would get if you developed the same thing,
but forced it to deal with different FPGAs during its evolution. Would it
still come up with something surprisingly inhuman?

~~~
jacquesm
That's an experiment worth doing.

------
xchip
FYI: I am using here the Goertzel Algorithm to help tune ANY musical
instrument.

[https://aguaviva.github.io/GuitarTuner/GuitarTuner.html](https://aguaviva.github.io/GuitarTuner/GuitarTuner.html)

------
cushychicken
What's the bandwidth of a Goertzel bin?

What's the 3dB bandwidth of the bin?

~~~
kragen
Anything you like. I mean, it precisely calculates the inner product of a
constant-amplitude complex exponential with the signal, so the tradeoff is the
same as with the FFT, except that you have the option of using a fractional
number of cycles.

In the Dercuano note I mentioned in my comment above, I go into some detail
about alternative ways to window.

~~~
cushychicken
So the bin bandwidth is about the same as an FFT bin of the same resolution?

Will check out that note.

~~~
kragen
No, not _about_ the same, _precisely_ the same, except for rounding error
(which is worse for Goertzel.) The numerical result you will get is precisely
the same. If bin 27 of your FFT over some segment of some signal is 4.557 +
6.288j, then Goertzel for that frequency and that segment will also give you
4.557 + 6.288j.

------
p5a0u9l
This seems rather niche. Most real world applications don’t provide the luxury
of knowing your desired response ahead of time.

Also, consider implementation in an FPGA where there are decades worth of
validated IP to select from vs. this.

~~~
vardump
> Also, consider implementation in an FPGA where there are decades worth of
> validated IP to select from vs. this.

Yeah, encrypted "intellectual property" that acts like a black box. Pass.

~~~
kragen
Unfortunately the electronics industry has adopted the propaganda term
“intellectual property” to mean “designs”, so that's what the parent is
referring to: decades worth of validated designs.

