

Implementing Shazam with Java in a weekend - haasted
http://www.redcode.nl/blog/2010/06/creating-shazam-in-java/

======
gkoberger
A bit of background on this post:

[http://yro.slashdot.org/story/10/07/08/2311225/Open-Source-M...](http://yro.slashdot.org/story/10/07/08/2311225/Open-Source-Music-Fingerprinter-Gets-Patent-Nastygram)

Basically, Shazam sent the author a cease and desist. The post was removed for
a bit, but it's now back up.

~~~
gigantor
Interesting that they sent a C&D considering the algorithm is likely patent
pending at best. Aren't there other services, like Midomi, that likely use the
same 'obvious' algorithm of storing a database of frequency points per song
and comparing it to the sample?
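
A minimal sketch of that idea in Java, not Shazam's actual method: each song is stored as a sequence of per-frame hashes, and a sample is matched by voting for the (song, time offset) pair that lines up the most hashes. The class name `FingerprintIndex`, the one-hash-per-frame layout, and the voting scheme are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory index: hash -> list of {songId, frame} occurrences.
class FingerprintIndex {
    private final Map<Long, List<int[]>> index = new HashMap<>();

    /** Stores a song as per-frame hashes; hashes[i] belongs to frame i. */
    void addSong(int songId, long[] hashes) {
        for (int frame = 0; frame < hashes.length; frame++) {
            index.computeIfAbsent(hashes[frame], k -> new ArrayList<>())
                 .add(new int[] {songId, frame});
        }
    }

    /** Returns the id of the best-matching song, or -1 if no hash matched. */
    int bestMatch(long[] sampleHashes) {
        // Votes are keyed by (songId, offset) packed into one long.
        Map<Long, Integer> votes = new HashMap<>();
        int bestSong = -1;
        int bestVotes = 0;
        for (int frame = 0; frame < sampleHashes.length; frame++) {
            List<int[]> hits = index.get(sampleHashes[frame]);
            if (hits == null) {
                continue;
            }
            for (int[] hit : hits) {
                // Hashes from the same recording agree on one offset,
                // no matter where in the song the sample starts.
                int offset = hit[1] - frame;
                long key = ((long) hit[0] << 32) | (offset & 0xFFFFFFFFL);
                int count = votes.merge(key, 1, Integer::sum);
                if (count > bestVotes) {
                    bestVotes = count;
                    bestSong = hit[0];
                }
            }
        }
        return bestSong;
    }
}
```

Voting on the offset rather than just counting shared hashes means a hash that happens to appear in two songs only helps the one where the surrounding hashes also line up in time.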

~~~
j-g-faustus
Perhaps that's why the post is back up - I don't believe the bits and pieces
described in the article are patentable. (Fourier has been used in signal
processing for decades. Adding a nearest-neighbor lookup - also known for
decades - is a trivial extension.)

I did a master's degree on pattern matching on text in 1999 (same year Shazam
started), and it was obvious then that many of the same concepts (many of them
from textbooks stemming from the 1970s) could be used for pattern matching on
music. (We even discussed implementing music matching, but couldn't see a
market for it.)

There may very well be other parts of music fingerprint algorithms that are
patentable, but I have a hard time believing that the parts described in the
article could be.

~~~
iansimon
One nontrivial part is transforming the spectrogram into some representation
that is robust to the things that can affect the query audio, like background
noise. Another nontrivial part is figuring out, given this representation, how
to quickly match the query with a song or database of songs.

------
StavrosK
This post only touches on the actual way that Shazam recognises music. The
meat of the algorithm is in the feature detection, which must be much more
robust than what the author did. His way is very sensitive to noise and maybe
other manipulations (e.g. dilation), though I didn't look at it too closely.

Basically, to get good features you need to find a set of candidate features
(points in the image that stand out) and then filter them according to some
desirable properties (in standard image recognition, for example, you want
them to be resistant to rotation, translation, stretching, etc). There exist
very good algorithms for finding these features, which is what Shazam uses. Of
course, after finding the features, you need to hash them in a way that is
able to match any time in the song, resist noise, etc etc, so it's a very
interesting application.
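
The candidate-then-hash pipeline can be sketched in Java as below. This is only a toy version, closer to the simple per-band peak picking the article describes than to anything robust: the band edges, the `FUZZ` quantization step, and the class name `PeakHasher` are hypothetical values chosen for illustration, and the input is assumed to be one frame of FFT magnitudes with at least 300 bins.

```java
// Toy feature extraction: strongest bin per frequency band, then a coarse hash.
class PeakHasher {
    // Hypothetical band edges as FFT bin indices; not taken from any real system.
    static final int[] BAND_EDGES = {40, 80, 120, 180, 300};
    // Quantization step: peaks falling in the same FUZZ-wide bucket hash alike.
    static final int FUZZ = 2;

    /** Returns the index of the strongest bin in each band of one spectrum frame. */
    static int[] strongestBins(double[] magnitudes) {
        int[] peaks = new int[BAND_EDGES.length - 1];
        for (int b = 0; b < peaks.length; b++) {
            int best = BAND_EDGES[b];
            for (int bin = BAND_EDGES[b] + 1; bin < BAND_EDGES[b + 1]; bin++) {
                if (magnitudes[bin] > magnitudes[best]) {
                    best = bin;
                }
            }
            peaks[b] = best;
        }
        return peaks;
    }

    /** Combines the quantized peak positions of one frame into a single hash. */
    static long hash(int[] peaks) {
        long h = 0;
        for (int p : peaks) {
            // Bin indices stay below 300, so p / FUZZ always fits in three digits.
            h = h * 1000 + (p / FUZZ);
        }
        return h;
    }
}
```

The quantization gives a little tolerance to a peak shifting by one bin, but it does nothing about the stronger distortions mentioned above, which is where the more careful feature selection comes in.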

I was very surprised to learn that the best way to recognise a song was to
convert it into a picture and then try to recognise that picture. It seems
like a roundabout way to do it, but it works very well.

You can find more information in this paper: Viola and Jones, "Rapid object
detection using a boosted cascade of simple features"

~~~
apu
While the paper cited by 'StavrosK is indeed a very important one in computer
vision (it's the basis for almost all modern face detection algorithms), it's
not the one most relevant for Shazam.

I don't know the details of what they're doing, but most likely they are using
something related to SIFT:
[http://en.wikipedia.org/wiki/Scale-invariant_feature_transfo...](http://en.wikipedia.org/wiki/Scale-invariant_feature_transform)

This is another seminal work in computer vision, which solves both problems
that 'StavrosK mentioned:

    
    
      1. Find candidate feature points that stand out, and are reliably and repeatedly
         detectable despite image variations.
      2. Get a "hash" of each point that can be used to do searches fairly quickly.
    

Lots of work in detecting and recognizing objects now uses some variant of
SIFT, and it's finding usage in lots of other areas of vision as well. I
wouldn't be surprised if as many as 10% of papers at the top vision
conferences use techniques based on some variant of SIFT.

~~~
StavrosK
Ah, yes, I forgot SIFT. last.fm does recognition through the algorithm I
mentioned, which is why that came to mind, but I think SIFT would work better,
thanks for the info!

------
_harry
I never realized how similar Shazam was to the first Matlab assignment we had
to do for an EE class in discrete time linear processing. The only difference
is we had to match whale calls to a type of whale. Spectrum analysis and
everything. Never thought to identify music.

Sometimes, it's way too easy to overlook the obvious.

~~~
Tichy
Still, the whale matching app space seems to be wide open :-)

~~~
_harry
I will call it "Shamu"

------
jacquesm
Extremely cool, and the speed is more than impressive.

He also seems to be suffering from 'death by HN': the server is more 'down'
than 'up', but keep retrying and it will come up.

That Aphex Twin face is meant to be viewed on a log scale, not a linear one; on
a log scale it looks much better.
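
To make the difference concrete, here is a small Java sketch of how a frequency maps to a pixel row on each kind of axis. The class name `FrequencyAxis` and the 20 Hz to 20480 Hz range used in the usage note are assumptions picked so the arithmetic comes out cleanly; they are not taken from the article.

```java
// Maps a frequency to a row of a spectrogram image (row 0 = lowest frequency).
class FrequencyAxis {
    /** Logarithmic axis: every octave gets the same number of rows. */
    static int logRow(double freqHz, double minHz, double maxHz, int height) {
        double t = Math.log(freqHz / minHz) / Math.log(maxHz / minHz);
        return (int) Math.round(t * (height - 1));
    }

    /** Linear axis: every hertz gets the same share of the image height. */
    static int linearRow(double freqHz, double minHz, double maxHz, int height) {
        double t = (freqHz - minHz) / (maxHz - minHz);
        return (int) Math.round(t * (height - 1));
    }
}
```

With an 11-row image spanning 20 Hz to 20480 Hz (ten octaves), 640 Hz lands in the middle row on the log axis but stays in the bottom row on the linear axis, so anything drawn in the lower octaves gets squashed when plotted linearly.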

------
mishmax
Would this algorithm work with music that is hummed or sung live? Shazam
doesn't work for that, but SoundHound does. What do you think the difference
between the algorithms is?

~~~
_delirium
It's died down a bit, but for a while in the late-90s / early-2000s, there was
a whole research subfield dedicated to retrieving audio via humming, which has
some interesting stuff in it:
[http://scholar.google.com/scholar?hl=en&q=%22query+by+hu...](http://scholar.google.com/scholar?hl=en&q=%22query+by+humming%22)
It's got a nice mixture of signal processing, sequence alignment, and
approximate pattern matching problems all rolled together.

------
djhworld
I really enjoyed that article. It was very honest, and the guy obviously enjoys
what he does, which really shines through in his writing.

It's also nice to see the thinking process he went through when developing his
solution to this.

Yes, the algorithm might not be perfect, and it isn't a Shazam clone, but it does
demonstrate that within 48 hours he created something that could recognise
music. Now that's a good effort in my eyes!

------
andrus
The technology implemented by Shazam reminds me of an older art project that
was never released. Scrambledhackz:

<http://www.popmodernism.org/scrambledhackz/index.html>

Has anyone heard anything on this?

------
kola
Great read, thanks.

------
zackattack
this post is just AWESOME, and i would love to see more things like this on
HN.

~~~
chopsueyar
I would love to see this technology used for article submissions on HN.

;-)

