Shazam: not magic after all (revolution-computing.com)
77 points by mariorz on Oct 29, 2009 | 28 comments



Don't I feel vindicated:

http://news.slashdot.org/comments.pl?sid=7310&cid=823710

(from 2000).

I remember this because I got in a big argument with someone about whether this could possibly work. Of course, I never got off my ass and implemented it, which I guess makes me a huge loser.


Quite funny anyone would argue it wouldn't work. It's pretty similar to speech recognition algorithms which have been around for a while now.


quote from the article:

"Unfortunately, there's no indication in the paper of what software was used to develop the process (although the scatterplots in the paper do look decidedly R-like)."

I used to work with Avery Wang, the guy who devised the algorithm. He used Matlab.


Here's an independent MATLAB implementation of the Shazam algorithm: http://labrosa.ee.columbia.edu/matlab/fingerprint/


The article and comments seem to suggest that the use of Matlab or R is a prerequisite for performing calculations such as this. However, MIR (Music Information Retrieval) libraries exist for a number of languages, including Java[1] and ANSI C[2], amongst others. A good dynamic language for experimenting with this sort of thing is SuperCollider[3].

By the way, the psychoacoustic spectral measurements referred to in the article are called MFCCs[4] - basically an FFT reading weighted according to the sensitivity of our ears. They are often used in both music and (especially) speech recognition because they tend to accurately sum up the timbre we perceive in a given sound. Timbre is much easier to extract from a digital audio file than pitch or vocal information, which is why it tends to be successful in applications such as this.
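To make the "weighted according to the sensitivity of our ears" part a bit more concrete: MFCC pipelines usually warp the FFT's frequency axis onto the mel scale before summing energy into bands. Here is a minimal sketch of just that warping step, assuming one commonly used mel formula (2595 * log10(1 + f/700)). The function names are made up for illustration, and this is not a full MFCC computation (no filterbank, log, or DCT).

```python
import math

def hz_to_mel(f_hz):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    # Band edges equally spaced in mel, converted back to Hz.
    # Returns n_bands + 2 values: lower edge, band centres, upper edge.
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

if __name__ == "__main__":
    for edge in mel_band_edges(0.0, 8000.0, 10):
        print(round(edge, 1))
```

The edges come out narrow at low frequencies and wide at high ones, which is roughly how our hearing resolves pitch.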

Shazam is still pretty cool too

[1] http://jmir.sourceforge.net/

[2] http://libxtract.sourceforge.net/

[3] http://supercollider.sourceforge.net/

[4] http://en.wikipedia.org/wiki/Mel-frequency_cepstrum


This is exactly how you identify chemical compounds using X-Ray Crystallography. You shine x-rays of different frequencies onto a compound, measure the magnitude of the reflections, and note down the 3 highest peaks.

Then, you look up those peaks in a book, which has compounds ordered by the wavelength of the highest peak.

It takes minutes to do it by hand, so I'm not surprised computers can do it better.


That's very interesting! I have a friend who worked on a project that was using this process but I never knew the name of it!


Aww man, when I read the headline I was expecting a SpinVox-like scandal. Like a room full of idiot-savants in Bangalore that knew every pop hit for the last 50 years or something.


Heh. It is actually fascinating how strong crowdsourcing can be. Surprisingly often you would logically opt for the manual route, especially for the cost and simplicity of the solution.

5-6 hrs of a good developer == 1 month of 3 ops in India or elsewhere. Except that getting ops to work itself can be painstaking.


why the hell would Bangalore come in here?


An interesting app.

Airtel, a leading telecom provider in India, had a SongCatcher service a while back (3 years ago): http://www.techtree.com/India/News/Catch_a_Catchy_Song_with_... I never tried it; maybe it only worked for a predefined set of songs.


Here is what they say on the website:

"Specifically, a fixed length of audio is converted to audio DNA; this conversion process extracts certain features from the signal based on the psycho acoustic considerations. The system has two components, one that enables the extraction of Audio DNA from a few seconds of recording, and the other is an efficient search engine that finds the exact match for the DNA.

The audio DNA is based on extracting 64 sub DNAs every 3 seconds. The sub DNAs are generated by looking at the energy differences along the frequency and time axes. These 64 sub DNA form the chromosomes of the system, which enables the system to uniquely identify the chosen song."

Looks to me like pretty much the same technology. This is again not a surprise. Most implementations of this idea will be using similar techniques. What I am amazed at is that somebody thought that all this was feasible.
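For what "energy differences along the frequency and time axes" could look like in code, here is a rough sketch. The quoted description does not give the actual rule, so the bit definition below is an assumption in the spirit of energy-difference fingerprints, and all names are hypothetical. The input is a list of frames, each a list of per-band energies that you would compute elsewhere.

```python
def sub_fingerprint(prev_energies, cur_energies):
    # One bit per pair of adjacent bands: 1 if the band-to-band energy
    # difference increased relative to the previous frame, else 0.
    # (Assumed rule; the quoted text does not specify one.)
    bits = []
    for b in range(len(cur_energies) - 1):
        diff_now = cur_energies[b] - cur_energies[b + 1]
        diff_prev = prev_energies[b] - prev_energies[b + 1]
        bits.append(1 if diff_now - diff_prev > 0 else 0)
    return bits

def fingerprint(frames):
    # One sub-fingerprint per frame after the first.
    return [sub_fingerprint(frames[i - 1], frames[i])
            for i in range(1, len(frames))]

def hamming(fp_a, fp_b):
    # Number of differing bits between two equally shaped fingerprints.
    return sum(x != y
               for row_a, row_b in zip(fp_a, fp_b)
               for x, y in zip(row_a, row_b))

if __name__ == "__main__":
    frames = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [3.0, 2.0, 1.0]]
    print(fingerprint(frames))
```

Matching then becomes a search for the stored fingerprint with the smallest bit-difference count, which tolerates a few flipped bits from noise.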


It's just a variation on speech recognition, which has been around for a while.


It would be much better if you could hum or whistle a tune, and it would recognize it. I saw a PhD thesis once about this, with an actual implementation that worked pretty well. The only problem was that the database of songs was very small. It's probably hard to scale this type of search.


I remember once seeing a "dictionary" of songs. Each one was indexed by whether notes were higher, lower or the same as the previous note. Using D for down, S for same and U for up, and using # for the first note, here's the Star-Spangled Banner ...

    #DDUUUUDDDUUSSUDDDDUUSDDD
Many, many tunes can be separated with the first 20 symbols.

Anyone have a reference? I'd like to acquire a copy ...
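This kind of up/down/same indexing is commonly known as Parsons code. A minimal sketch of the encoding, using the same symbols as above (# for the first note, then U, D, S). Representing the melody as MIDI note numbers is my assumption, and the function name is made up.

```python
def contour(pitches):
    # Encode a melody as '#' followed by U (up), D (down) or S (same)
    # for each note relative to the one before it.
    if not pitches:
        return ""
    symbols = ["#"]
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            symbols.append("U")
        elif cur < prev:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)

if __name__ == "__main__":
    # Opening of the Star-Spangled Banner as MIDI notes: G4 E4 C4 E4 G4 C5
    print(contour([67, 64, 60, 64, 67, 72]))  # prints "#DDUUU"
```

Because only the direction of each step is kept, the code is the same in any key and survives fairly badly tuned singing.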


SongTapper - surprisingly accurate - I think it works because it outsources a lot of the frequency/beat detection (to your brain/sense of rhythm)

http://songtapper.com


The majority of people don't have a good enough ear for music to repeat melodies correctly :(


Exactly. But this algorithm was fuzzy, so it gave you a list of the songs which most closely resembled the one you tried to sing.
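The thesis's actual matching method isn't described here, but as a stand-in, one simple way to get that kind of "closest songs" list is to rank stored melody contours by edit distance to the hummed one. A small sketch under that assumption (the catalog contents and names are hypothetical):

```python
def edit_distance(a, b):
    # Classic Levenshtein distance: minimum number of insertions,
    # deletions and substitutions needed to turn string a into string b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def rank_matches(query, catalog, top_n=3):
    # catalog maps song title -> contour string. Returns the top_n
    # titles, closest first; ties are broken alphabetically by title.
    scored = sorted(catalog.items(),
                    key=lambda item: (edit_distance(query, item[1]), item[0]))
    return [title for title, _ in scored[:top_n]]

if __name__ == "__main__":
    catalog = {"song a": "#DDUUU", "song b": "#UUDDS", "song c": "#SSSSS"}
    print(rank_matches("#DDUUD", catalog, top_n=2))
```

A linear scan like this is fine for a small database; it is the part that would need real indexing to scale.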


I tried to use different "query by humming" services but the percentage of false positives (when a service produced a list of melodies and none of them matched yours) was really huge. And even if the melody was in the list and you tried to find it again with the same service (by humming the same tune) the probability of getting it in the list again was pretty low.

Anyway, I think the idea of query by humming is not a dead end. However, such a hypothetical service should somehow collect and use a database of different "hums".


I was working on a little side startup that used crowdsourcing to help ID songs, and I was just getting into researching how programs like Shazam and Midomi worked when I killed the project. His paper and the way the algorithm works are quite nice, but it's not perfect for rare music or for songs without elements that really stand out (in frequency or otherwise), like house music. Thanks for the link!


Reminds me a little bit of

http://astrometry.net/

here is an overview:

http://cosmo.nyu.edu/hogg/research/2006/09/28/astrometry_goo...

It's fun to read about but way over my head mathematically.


I have often sat in coffee shops wondering what method of data extrapolation Shazam used to parse audio to be able to search its music db. I would think about how I would do it. I use Shazam all the time so it's nice to finally know the basic idea.



Now recognize people! Or cars, or engine problems, or birds...Rats! I missed the yc deadline by 1 day!


I disagree, any type of programming is magic.

I'm tired of these software 'engineering' types who insist that computers are run by using 'maths' and 'numbers' (whatever those are).

Clearly, computers are run by aphasic tonally-separated spinning disks. These disks fire puffs of air out the sides of the computer, creating little tiny tornados, which summon air spirits to call the fire spirits, which causes the screen to light up and the keys to make tappy-tap noises.

anyway. to be clear: not statistics. not math. not regulated pulses of electrons. MAGIC!


In my experience, the people who are most informed about how a computer actually works are those most convinced that it runs on magic. All my compE friends insist that cpus are maintained and operated by tiny gnomes.

While less technical people don't understand, they have 'faith' that there's a logical, scientific explanation for how computers work.


A good hack is indistinguishable from magic. (Yes I know that the original quote is "Any sufficiently advanced technology is indistinguishable from magic")


Dad?! Is that you?!



