As I understand it, Shazam sold their patent to Landmark Digital Services, which is part of BMI, the performing rights organization. Shazam kept an exclusive license to make Shazam-like software for phones.
You can imagine BMI wanting to make money from the way a service such as YouTube fingerprints uploads and detects copyright infringement...
And it was this BMI company that was trying to get this blog post explaining the patented algorithm removed from the internet.
One message from the BMI lawyers to Roy in the Netherlands was particularly broad in its bullying:
> Mr. Van Rijn,
> The two example patent numbers that I sent you are U.S. patents, but each of these patents has also been filed as patent applications in the Netherlands. Also, as I'm sure you are aware, your blogpost may be viewed internationally. As a result, you may contribute to someone infringing our patents in any part of the world.
> While we trust your good intentions, yes, we would like you to refrain from releasing the code at all and to remove the blogpost explaining the algorithm.
I think I heard that Shazam recently got the patent back. I speculate that BMI found no one willing to license their fingerprinting tech for detecting copyright infringement.
That's actually a telling sign of a dysfunctional patent system. Companies want to use patents to prevent everybody else from doing something similar, and in this case, even from just talking about it (which is obviously ridiculous).
Patents used to be a framework for sharing technological progress without giving up ownership, i.e. making it easier for everybody else to build on others' progress. That's long gone.
This is very cool: a minimal, clear implementation of the algorithm that replicates the effect of Shazam. It's refreshing to see a blog post with an actual code sample get voted up instead of all the press releases.
I reimplemented this a while ago since the full source isn't available. It was not nearly as successful as the blogger portrays. For example, if I used a high-quality mono WAV file to create a fingerprint, it would have a hard time identifying the same track as an MP3. It seems the maxima actually get shifted and merged by compression. In other words, there's a reason Shazam uses entropy-based anchor points to help it pick hashing values.
I'm wondering if they bound the fingerprint search to human-audible frequencies. MP3, as a lossy codec, works by discarding information in the input signal that listeners are unlikely to hear, such as inaudible frequencies. I believe this could be mirrored in the implementation by running the frequency-domain peak-pick algorithm only over specific bin ranges.
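To make the "specific bin ranges" idea concrete, here is a minimal sketch, not the blog's or Shazam's code. The band edges in `BANDS` are hypothetical, the DFT is a naive one to keep the example dependency-free, and the test window is synthetic.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(n^2) DFT; returns the magnitudes of bins 0 .. n/2 - 1."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]

# Hypothetical bin ranges (lower inclusive, upper exclusive), not taken from the paper.
BANDS = [(4, 8), (8, 16), (16, 32)]

def band_peaks(magnitudes, bands=BANDS):
    """Return the index of the strongest bin inside each band, ignoring all other bins."""
    return [max(range(lo, hi), key=lambda k: magnitudes[k]) for lo, hi in bands]

# One synthetic 64-sample window containing pure tones at bins 5, 12 and 20.
window = [
    math.sin(2 * math.pi * 5 * t / 64)
    + 0.5 * math.sin(2 * math.pi * 12 * t / 64)
    + 0.8 * math.sin(2 * math.pi * 20 * t / 64)
    for t in range(64)
]
peaks = band_peaks(dft_magnitudes(window))
```

Anything below bin 4 is never considered here, which is the point: content outside the chosen ranges cannot influence the fingerprint.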
I don't recall if the paper specifies the frequency ranges used, but my implementation was bound to audible frequencies. I was going to use a hill-climbing search to find optimal frequency ranges, but came to the conclusion that my implementation was too flawed regardless. If I looked at the two graphs side by side (compressed vs. uncompressed), they looked nothing alike. For example, the peak might be in the same region, but it would be shifted.
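One common way to tolerate a peak that lands in the same region but a bin or two off is to quantize peak positions before hashing. This is only a sketch of that idea under my own assumptions: `TOLERANCE` and the peak values are made up, and nothing here is claimed to be what Shazam does.

```python
# Hypothetical tolerance in FFT bins; not a value from the paper or from Shazam.
TOLERANCE = 4

def quantize(peak_bin, tolerance=TOLERANCE):
    """Round a peak bin down to the nearest multiple of the tolerance."""
    return peak_bin - (peak_bin % tolerance)

def fingerprint(peak_bins, tolerance=TOLERANCE):
    """Combine the quantized peaks of one time window into a hashable key."""
    return tuple(quantize(b, tolerance) for b in peak_bins)

uncompressed = [41, 86, 122]  # made-up peak bins from a WAV window
compressed = [42, 85, 123]    # same window after lossy compression, each peak off by one bin
same_key = fingerprint(uncompressed) == fingerprint(compressed)
```

The weakness is the bucket boundary: peaks at bins 43 and 44 are one bin apart but quantize to different values, so this reduces mismatches rather than eliminating them.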
After using Shazam, I was kind of hoping there was more to it than just a time-windowed, frequency-domain peak-pick algorithm. The algorithm itself is pretty basic from a signal processing perspective, but I think the key insight here was that the results are unique enough to store and compare other samples against at some later point in time.
Yeah, the magic (if there is any) is doing the match across a silly amount of songs in a relatively short time. Not groundbreaking exactly, but operationally quite interesting.
This type of analysis is commonly used in tons of things, like communications systems, image processing, radar, etc. I used a similar technique when trying to identify an underutilized wifi channel in the vicinity of my apartment.
Well, I'm sure they must be using a few tricks in their implementation. I've always been interested in knowing how Shazam actually works, and had in mind that they must somehow split a song into intervals and "hash" every interval, then store the hashes in some kind of indexed database for fast retrieval.
Seems I was not too far off :)
Yeah, this is the obvious implementation. As he said in his follow-up post:
> And second, I’d like to know which patents are in play. Because I just couldn’t think that something this easy (music-fingerprint is a hash, and we do a lookup) can be patented.. Maybe in the States, but in Europe?
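For anyone wondering what "a hash, and we do a lookup" looks like in practice, here is a toy sketch. The per-interval hash values and song names are invented, and the step of voting on the time-offset difference is my own addition for illustration, not something stated in this thread.

```python
from collections import Counter, defaultdict

def build_index(songs):
    """Map each fingerprint hash to the (song, time offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, hashes in songs.items():
        for offset, h in enumerate(hashes):
            index[h].append((name, offset))
    return index

def best_match(index, sample_hashes):
    """Vote per (song, offset difference); hits from the right song line up in time."""
    votes = Counter()
    for sample_offset, h in enumerate(sample_hashes):
        for name, song_offset in index.get(h, []):
            votes[(name, song_offset - sample_offset)] += 1
    if not votes:
        return None
    (name, _delta), _count = votes.most_common(1)[0]
    return name

# Invented per-interval hashes; a real system would derive them from spectral peaks.
songs = {
    "song_a": [11, 42, 7, 93, 58, 21, 64],
    "song_b": [58, 7, 30, 11, 76, 42, 19],
}
index = build_index(songs)
match = best_match(index, [7, 93, 58])  # a three-interval excerpt of song_a
```

Both songs contain hashes 7 and 58, but only in `song_a` do the hits share one consistent offset, so it wins the vote. Each lookup is a dictionary access, which is why this scales to a large catalog.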
I find it absurd how they try to threaten people with their legal teams. They had nothing on you but didn't want you to release the code so they threw their legal team at you and made them come up with some bullshit. That's just ridiculous! Good for you for standing up against that crap.
I wonder how the work is split between client/server in (actual) Shazam. (I suppose only the key points are sent to the server, but I may be wrong - Siri for example sends the server a compressed audio file of the recorded sound)
The first time I used Shazam, I was so amazed. I had to download the original paper, but still couldn't understand how it worked well enough to code it. Now let's get to work on it.
One possible way to solve the legal troubles is to just remove any references to the product name 'Shazam'. You could title the blog post "Algorithm in Java that identifies music similar to other commercial products" (too long.. but use your imagination)
That wouldn't do a thing. Patents cover the code, not the names. (Well, the "embodiment", but since all there is here is code, it is clearly covering the code.) That would only help a trademark infringement, and there isn't one here.
I haven't had a chance to google for a source, so take this as anecdotal, but I vaguely remember reading an interview with the people behind it (when Shazam first launched in the UK). In it they said they were ripping thousands of CDs a day/week (can't remember which) and running each track through their algorithm. I can't remember if they bought the CDs or had some deal in place with the record labels.
We had a 'build your own Shazam' lab in Berkeley's Intro. Signals & Systems class this semester. Super cool to see it working, and quite an interesting application of Signals & Systems.