How Shazam works (coding-geek.com)
446 points by billconan on July 11, 2015 | 48 comments




Some time ago I did a 'sonification' of what Shazam's reduction of your music would sound like [1]. Perhaps it adds something to the article. You can actually Shazam it; it still works.

[1] https://soundcloud.com/sample_noise/shazuffle-ii-shazam-me


Hi, I'm the author of the article. In fact, I did the same thing when I built my Shazam prototype. When I wrote the article, I considered adding a subsection to the Shazam chapter where I would have put a well-known song and its fingerprinted version so that everyone could hear what it sounds like, but I didn't do it because I feared a copyright lawsuit.


> I didn’t do it because I feared copyright lawsuit

It's sad that we live in this sort of legal climate.


Would that be covered under fair use, or not, because the article wasn't about those songs themselves (i.e., the author could have picked any song, not necessarily a commercial one)?


This is a textbook example of fair use! A 5-second clip would have sufficed. It wouldn't have reproduced a large part of the work, it certainly wouldn't have affected the market for that work, and it was for educational or criticism purposes, etc.

Just see the headings here:

https://en.wikipedia.org/wiki/Fair_use

OTOH, I can see why the author would have wanted to steer a million miles clear of reproducing ANYTHING (including so much as mentioning the title of any work, which obviously isn't copyright infringement).

In this case it's not so much about copyright infringement as about steering very, very clear of any reference to anything.


Fair use would seem to apply under most interpretations regardless (excerpts and quotations for the purposes of illustration are generally held to be covered), but you really can't say that for sure until the courts decide. There's nothing you can do to keep a rights holder from dragging you through court.


Nine Inch Nails released a couple of albums under Creative Commons.

https://en.wikipedia.org/wiki/The_Slip_%28album%29


I thought they used Parsons code, since it is a space-efficient fingerprinting technique, sends less data over the wire for a partial fingerprint, and handles tempo drift. In addition, I know they were becoming CPU-bound and then moved matching to GPUs, which greatly helped them.
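For readers who haven't met Parsons code: it reduces a melody to its contour, noting only whether each note goes up, down, or repeats relative to the previous one. Below is a minimal Python sketch, assuming the melody has already been transcribed into pitch numbers (e.g. MIDI note numbers), which is the genuinely hard part for a noisy phone recording. It only illustrates the notation; whether Shazam ever used it is the commenter's recollection, and the 2003 paper discussed in this thread describes hashing pairs of spectrogram peaks instead.

```python
# Sketch only: assumes the melody is already transcribed to pitch numbers
# (e.g. MIDI notes); `parsons_code` is a hypothetical helper name.
def parsons_code(pitches):
    """Reduce a melody to its contour: '*' for the first note, then
    'U' (up), 'D' (down) or 'R' (repeat) relative to the previous note."""
    if not pitches:
        return ""
    symbols = ["*"]
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            symbols.append("U")
        elif cur < prev:
            symbols.append("D")
        else:
            symbols.append("R")
    return "".join(symbols)


# Opening of "Ode to Joy" as MIDI note numbers (E E F G G F E D C C D E E D D).
ode_to_joy = [64, 64, 65, 67, 67, 65, 64, 62, 60, 60, 62, 64, 64, 62, 62]
print(parsons_code(ode_to_joy))                    # *RUURDDDDRUURDR
print(parsons_code([p + 5 for p in ode_to_joy]))   # same code after transposing
```

Because durations and exact intervals are thrown away, the code is unaffected by tempo changes and transposition, but it is also far less discriminative than a spectrogram-based fingerprint.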


When I started this side project in 2012, I looked for reliable public information (especially theses or research papers), and the only useful source I found was the paper by Shazam's co-founder.

Since this paper was written in 2003, I wouldn't be surprised if they have changed their algorithms since then.

But from my understanding, the 2003 paper describes a highly scalable architecture and a noise-tolerant and "time-efficient" algorithm (which can be tuned using thresholds), so it could still work in 2015 with a few optimizations. Still, I'm not working at Shazam and I'm not a researcher, so I could be wrong.
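To illustrate the "scalable and noise-tolerant" part: as the 2003 paper describes it, spectrogram peaks are paired into hashes of (anchor frequency, target frequency, time gap), each stored with the anchor's time. A real match then produces many hash hits that share the same time offset between recording and song, while noise and random collisions scatter across offsets. The Python below is a minimal in-memory sketch of that idea, not Shazam's implementation; the function names, `FAN_OUT`, `MAX_DT`, and the toy peak lists are assumptions for illustration, and a production system would pack each hash into an integer and keep the index in a database.

```python
from collections import Counter, defaultdict

# Sketch only: the names, FAN_OUT and MAX_DT are illustrative assumptions;
# peaks are (time_frame, freq_bin) tuples from a spectrogram peak picker.
FAN_OUT = 5    # how many later peaks each anchor is paired with
MAX_DT = 64    # maximum time gap (in frames) between paired peaks


def peak_pair_hashes(peaks):
    """Pair each anchor peak with nearby later peaks -> ((f1, f2, dt), t1)."""
    peaks = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt == 0:
                continue            # same frame: carries no time information
            if dt > MAX_DT or paired >= FAN_OUT:
                break               # peaks are time-sorted, so stop early
            out.append(((f1, f2, dt), t1))
            paired += 1
    return out


def build_index(songs):
    """songs: {song_id: [(t, f), ...]} -> {hash: [(song_id, t_song), ...]}"""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for h, t in peak_pair_hashes(peaks):
            index[h].append((song_id, t))
    return index


def best_match(index, sample_peaks):
    """Vote on (song, t_song - t_sample); a real match piles up on one offset."""
    votes = Counter()
    for h, t_sample in peak_pair_hashes(sample_peaks):
        for song_id, t_song in index.get(h, []):
            votes[(song_id, t_song - t_sample)] += 1
    if not votes:
        return None
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, offset, count


songs = {
    "song_a": [(0, 10), (3, 20), (5, 15), (9, 30), (12, 25)],
    "song_b": [(0, 40), (2, 41), (4, 42), (6, 43)],
}
index = build_index(songs)
# A "recording" of song_a starting 3 frames in, plus one spurious noise peak.
sample = [(0, 20), (1, 99), (2, 15), (6, 30), (9, 25)]
print(best_match(index, sample))   # ('song_a', 3, 6)
```

The spurious peak at frequency 99 only generates hashes that match nothing, so the six votes for offset 3 still win. This offset-voting step is what lets a short clip recorded from anywhere in the song still be identified.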


I have wanted to know how Shazam works for a long time. Thanks for this article, man!


Can we have the program to do that transformation, so we can run it for ourselves?


Overly concise Matlab version. If you get Dan Ellis' ispecgram [1] you can do something like:

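    [x, sr] = audioread('rick.wav');     % wavread() in older MATLAB releases
    x_mono = mean(x, 2);                 % mix down to mono
    nfft = 1024;                         % FFT size (example value, tune to taste)
    mask_size = [15 15];                 % peak neighbourhood [bins frames] (example value)
    X = specgram(x_mono, nfft, sr);      % complex spectrogram
    S = abs(X);                          % find peaks on the magnitude, not the complex values
    Y = double(S >= imdilate(S, ones(mask_size), 'same'));  % 1 at local maxima, 0 elsewhere
    % Y = X .* Y;                        % optional: keep the peaks' original magnitude and phase
    y = ispecgram(Y, nfft, sr);
    % audiowrite('roll.wav', y, sr);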

[1] http://github.com/tbertinmahieux/MSongsDB/blob/master/Matlab...
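For anyone without MATLAB or the Image Processing Toolbox, here is a minimal pure-Python sketch of just the peak-picking step (the `imdilate` line). It assumes the spectrogram magnitude is already available as a list of rows and that mask sizes are odd; computing the STFT and resynthesizing audio are left out. `local_maxima` and its `floor` threshold are hypothetical and not part of Dan Ellis's code.

```python
# Sketch only: `local_maxima` and its `floor` threshold are hypothetical;
# `mag` is a spectrogram magnitude as a list of rows (freq bins x frames).
def local_maxima(mag, mask_size=(3, 3), floor=0.0):
    """Return (row, col) of cells >= every value in the surrounding window.

    Same test as `S >= imdilate(S, ones(mask_size), 'same')` for odd mask
    sizes; cells <= floor are skipped so silent regions don't all count.
    """
    if not mag:
        return []
    rows, cols = len(mag), len(mag[0])
    hr, hc = mask_size[0] // 2, mask_size[1] // 2
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = mag[r][c]
            if value <= floor:
                continue
            # Window is clipped at the borders, like dilation's -Inf padding.
            neighbourhood = max(
                mag[rr][cc]
                for rr in range(max(0, r - hr), min(rows, r + hr + 1))
                for cc in range(max(0, c - hc), min(cols, c + hc + 1))
            )
            if value >= neighbourhood:
                peaks.append((r, c))
    return peaks


spectrogram = [
    [0, 1, 0, 0],
    [1, 5, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 0, 1],
]
print(local_maxima(spectrogram))   # [(1, 1), (2, 3)]
```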


It also works with SoundHound. Thanks for that, I didn't expect that song, that really made me laugh.


Never Gonna Give You Up -- did I guess right?


The detailed diagrams, code samples, and demos are greatly appreciated. I think it shows that you put a lot of work into this. Thanks!

However, Shazam corporate doesn't seem very nice:

http://www.royvanrijn.com/blog/2010/07/patent-infringement/

With such a nice write-up, you probably already know this. :/


Hi, I'm the author of the article, and I already know that. The big difference between Roy van Rijn and me is that I only published algorithms, whereas he published ready-to-use Java code. On paper I should be bulletproof against any lawsuit, since this article is nothing more than a very detailed version of the Shazam co-founder's paper (plus some algorithms the paper doesn't explain).


It's really unpleasant how they do not list the patents they think have been infringed. They just make a vague claim covering everything, knowing that most people don't have the time, energy, or money to challenge these broad claims in court, so most people just back down.


Nice article, but please do be careful in your descriptions of sampling and digitization. They are not quite right.

The video here from Monty Montgomery of xiph.org does a nice job of explaining things in a way that reduces the confusion.

https://www.youtube.com/watch?v=cIQ9IXSUzuM


Yeah, this article got me confused because I had Monty in my head telling me different things than what I read here.


The keyword here is "Music Information Retrieval". If you're interested in knowing more, there are a few high-quality tutorials out there. Don't be afraid of "research-looking" papers; they're often pretty accessible. For instance:

http://www.cs.uu.nl/groups/AA/multimedia/publications/pdf/is... (A SURVEY OF MUSIC INFORMATION RETRIEVAL SYSTEMS)

Or a longer PDF:

http://www.nowpublishers.com/article/Download/INR-002


Forgive the comment hijack, but for those interested in the basics of music information retrieval (MIR), I started to compile some notes on MIR as IPython notebooks here: http://musicinformationretrieval.com

It's clearly incomplete, but perhaps it might help someone. Pull requests are welcome.


I think this has been discussed a few times on HN. Here are some links. I know I'm missing 1 or 2 more:

https://news.ycombinator.com/item?id=1702975 https://news.ycombinator.com/item?id=909263

To a lesser extent: https://news.ycombinator.com/item?id=6683866


I really liked the way you put the article together, starting with the literal basics and working your way toward the solution. It's really inspiring to me: I tend to write articles that assume a lot of knowledge or gloss over the details of any basic research I've done, but seeing your article makes me want to write up something similar for other concepts. Thank you!


Nice article. I really wish every scientific paper had such a nice explanation. (I still don't understand why, e.g., Google Scholar doesn't allow people to add their interpretations, questions, remarks, etc. to any paper.)


Not as detailed an article, but if you are interested in this kind of thing, here is how the audio fingerprinting algorithm used by AcoustID works:

https://oxygene.sk/2011/01/how-does-chromaprint-work/


This is a pretty comprehensive article that hits all the important points and provides decent depth without turning into the monster that DSP can become as you plunge deeper into things like Hamming windows or Fourier algorithm implementations. Kudos.
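Since Hamming windows come up: before each FFT, the frame is multiplied by a window that tapers down to 0.08 at its edges, which reduces the spectral leakage caused by cutting the signal abruptly. A minimal Python sketch of the standard symmetric Hamming definition; `hamming` and `windowed_frame` are illustrative names (NumPy ships the same formula as `numpy.hamming`).

```python
import math

# Sketch only: `hamming` and `windowed_frame` are illustrative helpers.
def hamming(n_points):
    """Symmetric Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def windowed_frame(samples, start, size):
    """Taper one frame before its FFT to reduce leakage from the hard edges."""
    window = hamming(size)
    return [s * w for s, w in zip(samples[start:start + size], window)]


print([round(w, 2) for w in hamming(5)])   # [0.08, 0.54, 1.0, 0.54, 0.08]
```

Compared with simply cutting the frame (a rectangular window), the taper trades a slightly wider main lobe for much lower side lobes, which makes the strongest peaks in each frame easier to pick out.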


Agreed that signal processing can be a big topic. I thought the article did well to hit on the Nyquist frequency to show why certain sampling rates are used. Acoustics and audio processing are pretty good applications of some of this theory.
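To make the Nyquist point concrete: sampling at a rate f_s can only represent frequencies up to f_s/2, which is why CD-quality 44.1 kHz covers the roughly 20 kHz upper limit of human hearing. A tone above f_s/2 does not disappear; it folds back ("aliases") to a lower frequency. A minimal Python sketch, assuming ideal sampling of a single pure tone with no anti-aliasing filter; the function names are illustrative.

```python
# Sketch only: assumes ideal sampling of one pure tone with no
# anti-aliasing filter; the function names are illustrative.
def nyquist(sample_rate_hz):
    """Highest frequency the sample rate can represent: fs / 2."""
    return sample_rate_hz / 2


def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency a pure tone appears at after sampling, folded into [0, fs/2]."""
    f = tone_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)


print(nyquist(44100))                 # 22050.0
print(alias_frequency(1000, 44100))   # 1000  (below Nyquist: unchanged)
print(alias_frequency(30000, 44100))  # 14100 (above Nyquist: folds back)
```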


Mind boggling!

Also, how does Shazam make money?


There was an article on here a few months ago that talked about Shazam's ability to predict hits months in advance.

http://www.theatlantic.com/magazine/archive/2014/12/the-shaz...


Exactly the article I was thinking of, and if you think of the way music is produced today, that's probably pretty useful and valuable data.


As far as I know, Siri uses Shazam to find music. Example from my life: "Siri, what song is playing now?" "It's xxx". Then I got home, opened iTunes, found that song in my latest searches and bought the album. So clearly that's a moneymaker for iTunes and the artist, and they'll pay for that.

Maybe Shazam has direct relationships with major music brands who'll pay for each successful search.


These algorithms are trivial to implement for big companies. Google (in YouTube) and Microsoft (in Cortana) have similar things, it's likely Apple uses a custom-made version as well.

To answer the previous question, I'd assume part of their revenue comes from big music labels: knowing which songs are about to go big (before anybody else realizes it), where their popularity originates, and a lot more is invaluable to them.


> it's likely Apple uses a custom-made version as well.

Siri uses Shazam to identify songs (as of iOS 8) [1]

[1] http://appleinsider.com/articles/14/09/19/siri-partners-with...


Apple definitely uses Shazam, because there's a Shazam icon next to the result: http://i.imgur.com/xTKUyyH.jpg


Have you ever noticed that the app has links where you can purchase the song that it found for you?


There is a premium version of the app and they also have partnerships.


They seem to have a value-add thing for TV adverts; I notice it a lot when I'm at my mum's and the TV is on. You see a pulsing Shazam logo, and presumably you Shazam the advert and they redirect you to something from the advertiser. I assume they get paid for that, maybe pretty well if they're really adding to the value of expensive TV ads.


Here is one article about partnership with WMG:

http://www.billboard.com/biz/articles/news/digital-and-mobil...


Think about the value of the data: they can predict trends long, long before anyone else can. After all, using Shazam almost equates to a "like". The music industry can figure out who likes what in practically realtime.


From its product - which is the users, who else?


Does Siri use Wolfram|Alpha for any kind of complex requests?


Yes, for a lot of stuff, especially for unit conversions and things like that.


Most of the libraries seem small and compact. Is there already one written in golang?



The GitHub links for this are missing, though.


I really like this kind of article.


Interesting how something can explain how something works without giving a hint as to what it does.



