
How Shazam Works (2015) - dayve
http://coding-geek.com/how-shazam-works/
======
nateguchi
For those interested in audio fingerprinting, there was a company called Echo
Nest that open-sourced a full audio fingerprinting stack (server + client)
called Echoprint [1], but it seems they've been bought by Spotify and there's
been no development on Echoprint since.

I used it back in the day and I recall it working very well.

There's also a paper on the technology [2]

[1]
[https://github.com/search?utf8=%E2%9C%93&q=echoprint&type=](https://github.com/search?utf8=%E2%9C%93&q=echoprint&type=)

[2]
[http://mediatechnology.leiden.edu/images/uploads/docs/wt2015...](http://mediatechnology.leiden.edu/images/uploads/docs/wt2015_echoprint.pdf)

~~~
gourou
Sounds like a killer tool for identifying similar songs. I'm sure it'll become
a big part of their recommendation engine.

------
nakedrobot2
Shazam is, for me, the most magical app on my phone (maybe Maps is a close
second), the _thing from the future_ that I would never have imagined being
real if you had told the Me from 20 years ago. It is sometimes so fast, under 5
seconds. Just wonderfully great. It makes me feel superhuman. It really is one
of those extrasensory powers that we have recently gained with these
universal gadgets in our pockets.

~~~
parkaboy
It really is a marvel in its simplicity and reliance on "older" concepts
(frequency transforms, hashing, etc.) as opposed to any crazy ML (AFAIK) --
surprisingly not _that_ cutting edge. And I'm not knocking that / there's
nothing wrong with the age of a technique. If it's useful, it's useful. It's
incredible. It's a perfect example of bringing together several previously
somewhat disparate domains of math with just the right application and an
Internet-connected smartphone. In that regard, it's a bit crazy to think that
the math to enable Shazam has probably been around for...several decades? Just
took a stroke of ingenuity to bring it all together.
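
To make the "frequency transforms + hashing" combination concrete, here is a toy sketch. It is not Shazam's actual implementation: the frame size, the one-peak-per-frame rule, the `fan_out` value and every function name are assumptions chosen for illustration. Each frame's loudest frequency bin becomes a peak, pairs of nearby peaks are hashed together with their time gap, and a clip is located by voting on the time offset that most hashes agree on.

```python
import cmath
import math
from collections import Counter, defaultdict

FRAME = 64  # samples per analysis frame (assumed value for this toy)


def strongest_bin(frame):
    """Index of the loudest DFT bin below Nyquist (DC excluded), via a naive DFT."""
    n = len(frame)

    def magnitude(k):
        return abs(sum(frame[t] * cmath.exp(-2j * math.pi * ((k * t) % n) / n)
                       for t in range(n)))

    return max(range(1, n // 2), key=magnitude)


def peaks(samples):
    """One peak (loudest bin) per non-overlapping frame."""
    return [strongest_bin(samples[i:i + FRAME])
            for i in range(0, len(samples) - FRAME + 1, FRAME)]


def fingerprints(samples, fan_out=3):
    """Yield ((bin_a, bin_b, frame_gap), anchor_frame) for nearby peak pairs."""
    p = peaks(samples)
    for i in range(len(p)):
        for j in range(i + 1, min(i + 1 + fan_out, len(p))):
            yield (p[i], p[j], j - i), i


def build_index(samples):
    """Map each hash key to the anchor frames where it occurs in the track."""
    index = defaultdict(list)
    for key, t in fingerprints(samples):
        index[key].append(t)
    return index


def best_offset(index, query):
    """Return (offset_in_frames, votes) for the best-supported alignment."""
    votes = Counter()
    for key, tq in fingerprints(query):
        for ts in index.get(key, []):
            votes[ts - tq] += 1
    return votes.most_common(1)[0] if votes else None


# Synthetic "song": one pure tone per frame, each centred on a DFT bin.
bins = [5, 9, 12, 7, 15, 20, 11, 6, 18, 14, 8, 22]
song = [math.sin(2 * math.pi * b * t / FRAME) for b in bins for t in range(FRAME)]
index = build_index(song)

clip = song[4 * FRAME:10 * FRAME]  # excerpt starting at frame 4
offset, votes = best_offset(index, clip)
print(offset, votes)
```

With this synthetic input every hash from the clip votes for the same offset, so the excerpt is placed at frame 4. Real recordings would need overlapping windows, several peaks per frame and tolerance to noise, none of which this sketch attempts.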

------
dang
Discussed at the time:
[https://news.ycombinator.com/item?id=9870408](https://news.ycombinator.com/item?id=9870408).

~~~
fergie
I just followed that link, which led to a page containing another "previous
discussion" link, and then a couple more, and ended up on a Slashdot article
from Monday August 28, 2000!!

Shazam has always been a pretty amazing product, especially when you consider
that it was first released nearly 20 years ago. It has also been fairly poorly
understood, so I am glad that people are taking the time to revisit and review
the technology behind it.

------
noipv4
Question for Sound Engineers / DSP engineers: Is it a good idea to use
Cepstrum signature and then machine learning for music recognition?

~~~
parkaboy
"Good idea" is a tough subjective question to answer. Could you do better than
FFT/DCT/frequency-based transform? Maybe/possibly given that music tends to be
quite "harmonic." Taking a step back, when trying to classify, the best
features are ones that separate your classes out better. Certain
representations / changes thereof can lead to better separations depending on
the properties of your input source.

The cepstrum is good for providing energy-compact/sparse representations of
harmonic features. This is why it's used (/was used) a lot in speech
recognition. Speech sounds tend to have harmonic properties (see: formants).
Frequency-based transforms tell you how much of a frequency
(repeating/periodic signal) is present. If you have harmonics, those can
sooort of be thought of as repeating patterns in the frequency domain. So
taking a frequency transform of a frequency transform (which is super loosely
a cepstrum) gets you a nice compact (separable) representation of inputs that
tend to have harmonic features.
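
A minimal sketch of that "transform of a transform" idea, using the usual real-cepstrum definition (inverse DFT of the log-magnitude spectrum). The test tone, sample rate, pitch search range and all function names are assumptions made for illustration, and the naive O(N^2) DFT is only there to keep the example dependency-free.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * ((k * t) % n) / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(signal, eps=1e-6):
    """Inverse DFT of the log-magnitude spectrum; eps avoids log(0)."""
    log_mag = [math.log(abs(v) + eps) for v in dft(signal)]
    return [v.real for v in dft(log_mag, inverse=True)]


def estimate_pitch(signal, sample_rate, f_min=150.0, f_max=1000.0):
    """Pick the strongest cepstral peak in the quefrency range for f_min..f_max."""
    ceps = real_cepstrum(signal)
    q_lo = int(sample_rate / f_max)  # shortest period considered, in samples
    q_hi = int(sample_rate / f_min)  # longest period considered, in samples
    best_q = max(range(q_lo, q_hi + 1), key=lambda q: ceps[q])
    return sample_rate / best_q


# Synthetic harmonic tone: 250 Hz fundamental plus nine overtones.
SAMPLE_RATE = 8000
N = 256
F0 = 250.0
tone = [sum(math.sin(2 * math.pi * F0 * h * t / SAMPLE_RATE) / h
            for h in range(1, 11))
        for t in range(N)]

print(estimate_pitch(tone, SAMPLE_RATE))  # 250.0 for this synthetic tone
```

The ten evenly spaced harmonics form a comb in the spectrum, and the cepstrum collapses that comb into a single peak at a quefrency of 32 samples (8000 / 250), which is the compactness being described.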

Most music tends to be pretty damn harmonic... so maybe?

Also an argument to be made that if you have a big enough network of the right
kind that's just taking in windowed time domain data (almost surely involving
recurrence), it might not be surprising that you could find some cepstral-like
stuff naturally pop out.

------
IshKebab
I assume they use some kind of deep learning these days. Does anyone know?

~~~
ariaghora
I'm not sure. Though, IMO, using deep learning is too much if we can achieve
desirable results just by using a shallower/simpler model.

~~~
mindhash
Agreed. But it would be interesting to apply an autoencoder or generate
note vectors.

------
andrewmcwatters
I'm wondering if one could create an audio fingerprint of a song but using a
sound file that doesn't sound much like the original.

