
Shazam: not magic after all - mariorz
http://blog.revolution-computing.com/2009/10/shazam-not-magic-after-all.html
======
tptacek
Don't I feel vindicated:

[http://news.slashdot.org/comments.pl?sid=7310&cid=823710](http://news.slashdot.org/comments.pl?sid=7310&cid=823710)

(from 2000).

I remember this because I got in a big argument with someone about whether
this could possibly work. Of course, I never got off my ass and implemented
it, which I guess makes me a huge loser.

~~~
axod
Quite funny anyone would argue it wouldn't work. It's pretty similar to speech
recognition algorithms which have been around for a while now.

------
allenbrunson
quote from the article:

"Unfortunately, there's no indication in the paper of what software was used
to develop the process (although the scatterplots in the paper do look
decidedly R-like)."

I used to work with Avery Wang, the guy who devised the algorithm. He used
Matlab.

~~~
las3rjock
Here's an independent MATLAB implementation of the Shazam algorithm:
<http://labrosa.ee.columbia.edu/matlab/fingerprint/>

------
transmit101
The article and comments seem to suggest that the use of Matlab or R is a
prerequisite for performing calculations such as this. However, MIR (Music
Information Retrieval) libraries exist for a number of languages, including
Java[1] and ANSI C[2], amongst others. A good dynamic language for
experimenting with this sort of thing is SuperCollider[3].

By the way, the psychoacoustic spectral measurements referred to in the
article are called MFCCs[4]: basically an FFT reading weighted according to
the sensitivity of our ears. They are often used in both music and
(especially) speech recognition because they tend to accurately sum up the
timbre we perceive in a given sound. Timbre is much easier to extract from a
digital audio file than pitch or vocal information, which is why it tends to be
successful in applications such as this.
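
A rough sketch of the MFCC idea described above, in plain Python: take the power spectrum of one frame, pool it through triangular filters spaced on the mel scale, take logs, then apply a DCT. The function names, the 20-filter/13-coefficient defaults and the naive DFT are all assumptions made for illustration; real implementations add steps such as pre-emphasis and use an FFT, and this is not a claim about what Shazam itself computes.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive O(n^2) DFT of a Hamming-windowed frame (fine for a short demo)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [hz * n_fft / sample_rate for hz in edges_hz]  # fractional bin positions
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_energies(frame, sample_rate, n_filters=20):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for empty bands.
    return [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    energies = log_mel_energies(frame, sample_rate, n_filters)
    # DCT-II of the log mel energies; keep the first n_coeffs terms.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(energies))
            for c in range(n_coeffs)]
```

Calling `mfcc(frame, 8000)` on a 256-sample frame returns 13 numbers that summarise the spectral envelope of that frame; comparing such vectors between sounds is one simple way to compare timbre.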

Shazam is still pretty cool too.

[1] <http://jmir.sourceforge.net/>

[2] <http://libxtract.sourceforge.net/>

[3] <http://supercollider.sourceforge.net/>

[4] <http://en.wikipedia.org/wiki/Mel-frequency_cepstrum>

------
mhansen
This is exactly how you identify chemical compounds using X-Ray
Crystallography. You shine x-rays of different frequencies onto a compound,
measure the magnitude of the reflections, noting down the 3 highest peaks.

Then, you look up those peaks in a book, which has compounds ordered by the
wavelength of the highest peak.

It takes minutes to do it by hand, so I'm not surprised computers can do it
better.
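
The lookup described above can be sketched in a few lines of Python: pick the three strongest peaks from a scan, then search a table sorted by the position of the strongest peak. Everything here is hypothetical: the `REFERENCE` entries are made-up numbers in arbitrary units rather than real crystallographic data, and `top_peaks`, `identify` and the tolerance are names chosen for the example.

```python
from bisect import bisect_left

# Hypothetical reference "book": (positions of the three strongest peaks, name),
# sorted by the position of the strongest peak. All values are made up.
REFERENCE = sorted([
    ((1.20, 2.40, 3.10), "compound A"),
    ((2.00, 3.50, 1.20), "compound B"),
    ((2.00, 1.50, 3.90), "compound C"),
])
STRONGEST = [key[0] for key, _ in REFERENCE]


def top_peaks(scan, n=3):
    """scan: list of (position, intensity) pairs ordered by position.
    Returns the positions of the n strongest local maxima, strongest first."""
    peaks = [
        (scan[i][1], scan[i][0])
        for i in range(1, len(scan) - 1)
        if scan[i - 1][1] < scan[i][1] > scan[i + 1][1]
    ]
    return tuple(pos for _, pos in sorted(peaks, reverse=True)[:n])


def identify(peaks, tol=0.02):
    """Binary-search to the strongest peak, then check the other two."""
    start = bisect_left(STRONGEST, peaks[0] - tol)
    for key, name in REFERENCE[start:]:
        if key[0] > peaks[0] + tol:
            break  # past every entry whose strongest peak could match
        if all(abs(a - b) <= tol for a, b in zip(key, peaks)):
            return name
    return None


scan = [(1.0, 5), (1.2, 30), (1.4, 4), (2.0, 100), (2.2, 6), (3.5, 60), (3.7, 3)]
print(identify(top_peaks(scan)))  # compound B
```

Sorting by the strongest peak is what makes the book (or the binary search) fast: only the handful of entries sharing that peak need a closer look.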

~~~
aarongough
That's very interesting! I have a friend who worked on a project that was
using this process but I never knew the name of it!

------
joezydeco
Aww man, when I read the headline I was expecting a SpinVox-like scandal. Like
a room full of idiot-savants in Bangalore that knew every pop hit for the last
50 years or something.

~~~
jyothi
heh. It is actually fascinating how strong crowd sourcing can be. You would
logically opt for the manual route surprisingly often, especially for cost and
simplicity of the solution.

5-6 hrs of a good developer == 1 month of 3 ops in India or elsewhere. Except
that getting ops to work itself can be painstaking.

------
jyothi
An interesting app.

Airtel, a leading telecom provider in India, had a SongCatcher service long
back (3 yrs ago):
[http://www.techtree.com/India/News/Catch_a_Catchy_Song_with_...](http://www.techtree.com/India/News/Catch_a_Catchy_Song_with_Airtel/551-77435-663.html)
I never tried it - maybe this one worked for a predefined set of songs.

~~~
niyazpk
Here is what they say on the website:

 _"Specifically, a fixed length of audio is converted to audio DNA; this
conversion process extracts certain features from the signal based on the
psycho acoustic considerations. The system has two components, one that
enables the extraction of Audio DNA from a few seconds of recording, and the
other is an efficient search engine that finds the exact match for the DNA.

The audio DNA is based on extracting 64 sub DNAs every 3 seconds. The sub DNAs
are generated by looking at the energy differences along the frequency and
time axes. These 64 sub DNA form the chromosomes of the system, which enables
the system to uniquely identify the chosen song."_

Looks to me like pretty much the same technology. This is again not a
surprise. Most implementations of this idea will be using similar techniques.
What I am amazed at is that somebody thought that all this was feasible.
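
One common way to turn "energy differences along the frequency and time axes" into a compact fingerprint is to keep only the sign of each difference, one bit per band pair per frame, and match by Hamming distance. The sketch below assumes that approach; it is not the vendor's actual method, and `sub_fingerprints`, `best_match` and the toy band energies are invented for illustration.

```python
def sub_fingerprints(band_energies):
    """band_energies: list of frames, each a list of per-band energies.
    Returns one integer per frame after the first. A bit is set when the
    energy difference between adjacent bands grew since the previous frame."""
    prints = []
    for prev, cur in zip(band_energies, band_energies[1:]):
        bits = 0
        for b in range(len(cur) - 1):
            delta = (cur[b] - cur[b + 1]) - (prev[b] - prev[b + 1])
            bits = (bits << 1) | (1 if delta > 0 else 0)
        prints.append(bits)
    return prints


def hamming_distance(a, b):
    """Number of differing bits between two equal-length fingerprint runs."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def best_match(query, database):
    """Slide the query over every song; return (distance, name, offset)."""
    best = None
    for name, song in database.items():
        for off in range(len(song) - len(query) + 1):
            d = hamming_distance(query, song[off:off + len(query)])
            if best is None or d < best[0]:
                best = (d, name, off)
    return best


# Toy data: 3 bands per frame, so 2 bits per sub-fingerprint.
song_a = [[5, 3, 1], [6, 2, 4], [1, 7, 2], [4, 4, 9], [8, 1, 3]]
song_b = [[1, 1, 1], [1, 2, 1], [3, 1, 1], [1, 1, 4]]
database = {"song_a": sub_fingerprints(song_a), "song_b": sub_fingerprints(song_b)}

# A quieter recording of the tail of song_a still yields the same bits.
query = sub_fingerprints([[e * 0.5 for e in frame] for frame in song_a[2:]])
print(best_match(query, database))  # (0, 'song_a', 2)
```

Because only signs are kept, a uniform change in volume leaves the bits untouched, which is part of why this family of fingerprints tolerates noisy phone recordings.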

~~~
axod
It's just a variation on speech recognition, which has been around for a
while.

------
eric_t
It would be much better if you could hum or whistle a tune, and it would
recognize it. I saw a PhD thesis once about this, with an actual
implementation that worked pretty well. The only problem was that the database
of songs was very small. It's probably hard to scale this type of search.

~~~
dimbo
Most people don't have a good enough ear for music to repeat melodies
correctly :(

~~~
eric_t
Exactly. But this algorithm was fuzzy, so it gave you a list of the songs
which most closely resembled the one you tried to sing.
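
A minimal sketch of how such a fuzzy ranking could work, assuming the hum has already been turned into a sequence of pitches: reduce each melody to its up/down/repeat contour, then rank songs by the edit distance between the hummed contour and the closest stretch of each song. This is not the thesis's algorithm (which isn't described here); `contour`, `fuzzy_distance` and `rank` are hypothetical names, and the notes are MIDI numbers for the opening phrases of three familiar tunes.

```python
def contour(notes):
    """Up/Down/Repeat string for consecutive pitches; ignores exact intervals."""
    return "".join(
        "U" if b > a else "D" if b < a else "R"
        for a, b in zip(notes, notes[1:])
    )


def fuzzy_distance(query, target):
    """Minimum edit distance between query and any substring of target."""
    prev = [0] * (len(target) + 1)  # a match may start anywhere in target
    for i, q in enumerate(query, 1):
        cur = [i]
        for j, t in enumerate(target, 1):
            cur.append(min(prev[j] + 1,              # skip a query symbol
                           cur[j - 1] + 1,           # skip a target symbol
                           prev[j - 1] + (q != t)))  # match or substitute
        prev = cur
    return min(prev)  # a match may end anywhere in target


def rank(hummed_notes, songs):
    """Return (distance, title) pairs, closest song first."""
    q = contour(hummed_notes)
    return sorted((fuzzy_distance(q, contour(n)), title) for title, n in songs.items())


SONGS = {
    "Twinkle Twinkle Little Star": [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62, 60, 60, 62, 64, 64, 62, 62],
    "Happy Birthday": [67, 67, 69, 67, 72, 71, 67, 67, 69, 67, 74, 72],
}

# An off-key hum: the intervals are wrong but the ups and downs are right.
hum = [58, 58, 66, 66, 67, 67, 66]
ranked = rank(hum, SONGS)
print(ranked[0])  # (0, 'Twinkle Twinkle Little Star')
```

Throwing away the exact intervals is what makes this forgiving of bad singers, but it also means many songs share similar contours, which hints at why scaling to a large database is hard.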

~~~
dimbo
I tried several "query by humming" services, but the percentage of false
positives (when a service produced a list of melodies and none of them matched
yours) was really huge. And even if the melody was in the list and you tried
to find it again with the same service (by humming the same tune) the
probability of getting it in the list again was pretty low.

Anyway, I think the idea of query by humming is not a dead end. However, such
a hypothetical service should somehow collect and use a database of different
"hums".

------
thomasswift
I was working on a little side startup that used crowdsourcing to help ID
songs. I was just getting into researching how programs like Shazam and Midomi
worked when I killed the project. His paper and the way it works are quite
nice, but it's not perfect for rarer music and for songs without elements
that really stand out (frequencies or otherwise), like house music. Thanks for
the link!

------
bockris
Reminds me a little bit of

<http://astrometry.net/>

here is an overview:

[http://cosmo.nyu.edu/hogg/research/2006/09/28/astrometry_goo...](http://cosmo.nyu.edu/hogg/research/2006/09/28/astrometry_google.pdf)

It's fun to read about but way over my head mathematically.

------
headShrinker
I have often sat in coffee shops wondering what method of data extrapolation
Shazam used to parse audio to be able to search its music db. I would think
about how I would do it. I use Shazam all the time so it's nice to finally
know the basic idea.

------
Anon84
<http://news.ycombinator.com/item?id=908201>

------
JoeAltmaier
Now recognize people! Or cars, or engine problems, or birds...Rats! I missed
the yc deadline by 1 day!

------
anonjon
I disagree, any type of programming is magic.

I'm tired of these software 'engineering' types who insist that computers are
run by using 'maths' and 'numbers' (whatever those are).

Clearly, computers are run by aphasic tonally-separated spinning disks. These
disks fire puffs of air out the sides of the computer, creating little tiny
tornados, which summon air spirits to call the fire spirits, which causes the
screen to light up and the keys to make tappy-tap noises.

anyway. to be clear: not statistics. not math. not regulated pulses of
electrons. MAGIC!

~~~
amohr
In my experience, the people who are most informed about how a computer
actually works are those most convinced that it runs on magic. All my compE
friends insist that cpus are maintained and operated by tiny gnomes.

While less technical people don't understand, they have 'faith' that there's a
logical, scientific explanation for how computers work.

