Creating Shazam in Java (2010) (royvanrijn.com)
213 points by headalgorithm on Aug 24, 2022 | 36 comments



A more interesting story is behind the blog post linked in the article to someone else's implementation (which I also think contains a bit more detail than this one):

[0] https://www.royvanrijn.com/blog/2010/06/creating-shazam-in-j...

[1] https://www.royvanrijn.com/blog/2010/06/music-matching-part-...

Turns out he was contacted and threatened by some patent lawyers from Shazam for writing the above blog posts:

[2] https://www.royvanrijn.com/blog/2010/07/patent-infringement/

[3] https://www.royvanrijn.com/blog/2010/07/patent-publicity/

[4] https://www.royvanrijn.com/blog/2010/11/patent-infrigement-p...

And of course, the original HN post:

[5] https://news.ycombinator.com/item?id=1496683

It seems he eventually just told them to go pound sound (!!). Good for him!


Ok, we've changed to that (first) link from https://www.toptal.com/algorithms/shazam-it-music-processing.... Thanks!

Related:

Patent infringement claim re: “Creating Shazam in Java” blogpost (2010) - https://news.ycombinator.com/item?id=9594480 - May 2015 (18 comments)

Source code example of the Shazam algorithm - https://news.ycombinator.com/item?id=5724442 - May 2013 (16 comments)

Creating Shazam in Java - https://news.ycombinator.com/item?id=5723863 - May 2013 (42 comments)

Implementing Shazam with Java in a weekend - https://news.ycombinator.com/item?id=1702975 - Sept 2010 (22 comments)

Told to remove blog posts describing patented algorithm - https://news.ycombinator.com/item?id=1496683 - July 2010 (147 comments)



I wonder if they have such a patent in the EU or the Netherlands. The US tends to patent stuff that wouldn't fly in Europe.


It's more complicated[1] than that, but essentially there are no software patents in the EU or Europe. Sadly the EU Commission has been trying to get them into legislation for over a decade[2]. It fits perfectly with their highly protectionist, authoritarian, centralised control and regulation agenda.

Having worked in companies, it's immaterial anyway, as big companies with a USA-based vehicle will still be patent trolled if they do business there. Seen it happen.

[1] https://en.m.wikipedia.org/wiki/Software_patents_under_the_E...

[2] http://www.europarl.europa.eu/sides/getDoc.do?pubRef=-//EP//...


This was all making sense until the hashing part. To me it seemed that a slight change in recorded volume would change the magnitude and, as a result, the hash. Perhaps normalization over the recorded sample would help, but I didn't see that. Still, I'm surprised the hash is so simple and still works.


Around the time ML/AI was starting to take off again (I want to say 2011/2012-ish), I was doing research into applying OCR methods to recognise engineering symbols on drawings (both hand-drawn as a graphical rep and CAD as a vector rep). OCR uses locality-sensitive hashing techniques, where minute changes, like a couple of pixels' difference here and there, in theory result in similar hashes. What you really need to do is look at your windowing and overlaps and tune those to get something that actually gives you localities that are useful.

This worked quite well for me. After running a "training-set" of sorts I created a small tool that ran over a quarter-million engineering drawings to get counts of each symbol from the set. (I'm going to hand-wave some implementation complexity here, but essentially) if an exact match couldn't be found, the item being searched would show with thumbnails of its nearest neighbours off to the side, and you could select whether it was essentially the same as one of those. (Sort of like "is this, this person" in Google Photos and the Apple equivalent.)

The next step after this POC was to expand to discover symbols used most next to other symbols to use as contextual menu items in other CAD software to speed up drawing production, since a lot of time was spent placing a symbol, then stopping to text search the item you were placing next.

Unfortunately I was retrenched soon after and didn't get to progress. A couple of years later I took this a dimension further and was prepping for a PhD that would look at this for 3D objects and models using naively generated voxel representations. Almost by accident I found that a group at/backed by DARPA had a patent pending on a similar method. In retrospect I should have just gone all in on a photogrammetry-based method, since that kind of won out as a superior method, but it was still early days.

The moral of the story is that everything is pattern recognition and simple methods from before this last decade of ML/AI could do some cool stuff too.
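For anyone unfamiliar with the locality-sensitive hashing mentioned above, here is a minimal sketch of one common variant, random-hyperplane (SimHash-style) hashing. This is not necessarily the method used in that project; the class name, the 32-bit signature size and the seed are all assumptions made for illustration. Each output bit records which side of a random hyperplane the input vector falls on, so nearby vectors tend to share most bits.

```java
import java.util.Random;

final class SimHashSketch {
    static final int BITS = 32; // assumed signature length

    // One random hyperplane per output bit; the seed only makes runs repeatable.
    static double[][] hyperplanes(int dims, long seed) {
        Random rng = new Random(seed);
        double[][] planes = new double[BITS][dims];
        for (double[] plane : planes) {
            for (int d = 0; d < dims; d++) {
                plane[d] = rng.nextGaussian();
            }
        }
        return planes;
    }

    // Bit b is set when x lies on the non-negative side of hyperplane b.
    static int signature(double[] x, double[][] planes) {
        int sig = 0;
        for (int b = 0; b < BITS; b++) {
            double dot = 0.0;
            for (int d = 0; d < x.length; d++) {
                dot += planes[b][d] * x[d];
            }
            if (dot >= 0) {
                sig |= 1 << b;
            }
        }
        return sig;
    }

    // Number of differing bits; small values suggest similar inputs.
    static int hamming(int a, int b) {
        return Integer.bitCount(a ^ b);
    }
}
```

Comparing signatures by Hamming distance is what allows a "nearest neighbours" view when no exact match exists; how well it works depends on the feature vectors fed in, which is where the windowing and overlap tuning comes in.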


that's basically what it does, it finds peaks relative to total power.

you can think of it as like identifying continents by constellations of mountain tops which is robust to whatever chaos might be happening at sea level.
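A minimal sketch of why a peak-based hash is insensitive to volume: it hashes where the strongest bin in each band sits, not how strong it is. The band edges, the class name and the way the bins are combined are assumptions for illustration, not the blog's exact code.

```java
import java.util.Arrays;

final class PeakHashSketch {
    // Hypothetical band edges (FFT bin indices), chosen for illustration only.
    static final int[] BAND_EDGES = {40, 80, 120, 180, 300};

    // For each band, return the index of the bin with the largest magnitude.
    static int[] peakBins(double[] magnitudes) {
        int[] peaks = new int[BAND_EDGES.length - 1];
        for (int b = 0; b < peaks.length; b++) {
            int best = BAND_EDGES[b];
            for (int bin = BAND_EDGES[b] + 1; bin < BAND_EDGES[b + 1]; bin++) {
                if (magnitudes[bin] > magnitudes[best]) {
                    best = bin;
                }
            }
            peaks[b] = best;
        }
        return peaks;
    }

    // Combine the peak positions (never their magnitudes) into one number.
    static long hash(double[] magnitudes) {
        long h = 0;
        for (int p : peakBins(magnitudes)) {
            h = h * 1000 + p;
        }
        return h;
    }

    public static void main(String[] args) {
        double[] spectrum = new double[300];
        spectrum[50] = 9.0;
        spectrum[100] = 7.0;
        spectrum[150] = 5.0;
        spectrum[200] = 3.0;
        // Same spectrum recorded at a tenth of the volume.
        double[] quieter = Arrays.stream(spectrum).map(m -> m * 0.1).toArray();
        System.out.println(hash(spectrum) == hash(quieter)); // prints "true"
    }
}
```

Scaling every magnitude by the same factor leaves the position of each band's maximum unchanged, so the hash is unchanged too. Noise that moves a peak to a different bin would still change it.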


The patent infringement part 2 update was in 2010. Wonder if they ever got back to him.


> The Shazam patent holders lawyers are sending me emails to stop me from releasing the code and removing this blogpost

The fact that the justification for the patent system is to ensure knowledge about inventions is disseminated and yet it generates the above situation where they are literally threatening to sue him for disseminating the knowledge of how it works, tells you a lot about how badly the patent system is working / abused.


IMO software patents should be entirely abolished. If you look at patents from before computers entered the picture, a patent covered the idea + implementation.

Someone could create a better bucket design and patent that design, what one could not do is patent the idea of "a container that holds liquid".

But that seems to be exactly what has happened with software patents. With software, the idea is very much divorced from the implementation (I think Paul Graham or someone else said a while back that an idea by itself is worthless, it's all about the execution of the idea). It seems that with software patents it is less "this is my invention, I want to protect it" and more "hey I had this cool idea, now no one can have the same idea again".


It means that, instead of incremental improvements on an idea, these can now be limited to 20 years per step.


Which is an excellent point because it may have been reasonable when the patent system was created and for hard objects because there were natural limits on how fast innovation could happen.

But software is literally only limited by our minds. Slowing down innovation to a 20 year cycle is devastating to the pace at which it can happen.


The justification of the patent system is to protect the original creator / innovator from being copied so they can better commercialise their invention (legal protection which can be practically enforced better than a trade secret, which can be replicated).

Since when is the idea of a patent to disseminate knowledge of how something works?

This seems to be patents working as-intended (ie Shazam wouldn’t want an open source competitor)


The purpose of a patent is to encourage global innovation by granting exclusive rights for commercialising technology for a limited time in exchange for sharing how it works.

Roughly, if you invent PageRank and keep it internal to your search company, you can exploit that idea but if someone else comes up with it they can exploit it too. If you invent PageRank and patent it, the whole world knows how it works but can’t use it for 20 years without paying you.

Generally in case of a violation you’d want to sue for damages, whether that be licensing costs not paid, or missed revenue on your own side.

This probably then ends up in a bit of a grey area. Shazam would be within their rights to assert the patent, but the damages are likely so minimal as to be outweighed by the cost of litigating the case, which means that the C&D is a little too chilling for my liking. Basically, any penalty would be lawyers' fees: the author hasn’t tried to commercialise it, and it doesn’t reduce Shazam’s revenue at all. Are there any damages at all?

And then, few would argue (I think) that simply implementing a patent for educational purposes constitutes infringement. My uni course contained an implementation of PageRank. Is publishing the course notes then infringement? Arguably it’s an anti-goal to discourage actually spending time learning about the innovation beyond just reading the patent doc, otherwise the motivations of the system break down. So, just how different is this? Presumably on a conceptual basis you want your universities to be able to teach about things invented in the last 20 years without licensing patents?


The idea of disclosure though isn’t to teach others how to replicate your technology - it’s to provide enough information about your technology so that you lay a claim to it.

Without disclosure it would be impossible for others to know if they are infringing a patent - but the purpose of disclosure isn’t to teach others how to innovate on the back of your patent.


The etymology is from "letters patent", i.e. a public description of how the invention works. The social contract here is that the inventor gives up the details to the public, in exchange for temporary exclusive use rights granted by the public.


Why do people comment on things like patents when they don't know anything about them?


That phenomenon isn't limited to patents. That just happens to be an area you're knowledgeable enough in to recognize obvious falsehoods. Imagine the other topics you (we) read on this site that we aren't savvy in.


I generally don't comment on things if I haven't informed myself about it. I'm here for takes by experts, not people who are making guesses based on ignorance disguised as informed opinion.


I know how a patent works - I just mean the purpose isn’t to teach others how it works for the benefit of others, disclosure is the mechanism to protect others from implementing it.


I used to work for a spin-off of Philips Electronics that did audio and video watermarking and fingerprinting. The audio fingerprinting algorithm is described in this paper[0], it's a very interesting read if you're curious.

[0] http://ismir2002.ismir.net/proceedings/02-FP04-2.pdf


On a related note, a friend created some frequency-domain visualizations of interesting sounds using the Welch power spectral density estimation algorithm and Fast Fourier Transform, the same algorithm described in this blog post.

A few examples:

1. Dialtone using dual-tone multi-frequency signaling and 56K dial-up modem connection: https://www.youtube.com/watch?v=FomWraKuDFg&list=PLn67ccdhCs...

2. Deluxe Multitone Car Alarm: https://www.youtube.com/watch?v=A4uKcvZL7HM&list=PLn67ccdhCs...

3. Composition using only sounds from Windows 98 and XP https://www.youtube.com/watch?v=6lT-jr9sS6Y&list=PLn67ccdhCs...

4. Piano Music (Ballade Pour Adeline): https://www.youtube.com/watch?v=RnAfrEk429w&list=PLn67ccdhCs...

5. Electronic Music Demo: https://www.youtube.com/watch?v=MllJLIX1glg&list=PLn67ccdhCs...


I think this old whitepaper[0] is a better description of how Shazam works.

[0] https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf


The white paper has more information on why a certain choice of identifier for matching was made. The post on the other hand just tells you what Shazam does. Both are good for different use cases. From a pure readability perspective the post is marginally better, in my opinion. It has a bit too much of definition of common operations for my taste, but someone else may feel it is appropriate. On the whole it is more accessible.


Let's keep in mind that the article is from 2010 and the blog post is still up.


I find that this is among the many impressive things related to this story!


Are the comments in the `AudioFormat getFormat()` example useful to anyone?

The most useful one to me is the `channels = 1; // mono`; it gives a second way that is often used to describe the value and saved me maybe half a second.

All of the words in the signed comment are useless except "unsigned", but I think that is a clear opposite of `signed = true`. The fact that the setting pertains to data in SOME way is super obvious.

The endian comment adds in the keywords little-endian and order, but if you don't already know what endian means, googling "bigEndian" would already get you useful results.

The `format = getFormat();` comment is the most useless to me because it actually confused me. I thought that because there was a comment, it couldn't be referring to the code above. Additionally, the only additional info it gives me is the word "settings".
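For reference, a sketch of roughly what such a `getFormat()` looks like with only the one comment kept that earned its place. The sample rate and sample size are assumed values for illustration, and `FormatSketch` is a made-up class name; the five-argument `AudioFormat` constructor is the standard `javax.sound.sampled` one.

```java
import javax.sound.sampled.AudioFormat;

final class FormatSketch {
    static AudioFormat getFormat() {
        float sampleRate = 44100f;   // assumed value, for illustration
        int sampleSizeInBits = 8;    // assumed value, for illustration
        int channels = 1;            // mono
        boolean signed = true;
        boolean bigEndian = true;
        return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
    }
}
```

With descriptive variable names, most of the settings read fine without any comment at all.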


The blog post was super interesting, but the TRUE gold is in the link at the bottom of the article.


He also did a couple of recorded talks on YouTube about how it all works, including a live demo:

https://youtu.be/T4PJoAh4X1g?t=1977

It even works to align/overlay music, which is really cool:

https://youtu.be/T4PJoAh4X1g?t=2136


> His biggest hit for example, Windowlicker has a spectrogram image in it.

I had no idea about this! Crazy cool !


Any databases of movies so that an image or short clip can be searched? Shazam for film/TV?


I thought they use Cepstrum followed by machine learning.


windows fingerprinted by distances and angles between spectral peaks. robust to noise but not warping in time or frequency domains.

fingerprints stored in a scalable lsh style lookup table. (iirc) with the goal of a fast, scalable lookup table for all the 30s windows in all recorded music.
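A rough sketch of the pair-based idea, under stated assumptions: peaks are given as `{timeFrame, frequencyBin}` sorted by time, the two frequencies plus the time delta stand in for the "distance and angle" between peaks, and a plain exact-match `HashMap` stands in for the scalable lookup table. `PairIndexSketch`, `FAN_OUT` and the key layout are hypothetical names and choices, not Shazam's.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PairIndexSketch {
    static final int FAN_OUT = 3; // assumed: later peaks paired with each anchor

    // Pack (f1, f2, dt) into one key; assumes f2 and dt each fit in 16 bits.
    static long pairKey(int f1, int f2, int dt) {
        return ((long) f1 << 32) | ((long) f2 << 16) | dt;
    }

    // Maps each pair key to the anchor times at which that pair occurs.
    static Map<Long, List<Integer>> index(int[][] peaks) {
        Map<Long, List<Integer>> table = new HashMap<>();
        for (int i = 0; i < peaks.length; i++) {
            for (int j = i + 1; j <= i + FAN_OUT && j < peaks.length; j++) {
                long key = pairKey(peaks[i][1], peaks[j][1], peaks[j][0] - peaks[i][0]);
                table.computeIfAbsent(key, k -> new ArrayList<>()).add(peaks[i][0]);
            }
        }
        return table;
    }

    // Score: size of the largest group of matching pairs sharing one time offset.
    static int bestAlignedMatches(Map<Long, List<Integer>> song, int[][] queryPeaks) {
        Map<Integer, Integer> offsetVotes = new HashMap<>();
        int best = 0;
        for (Map.Entry<Long, List<Integer>> q : index(queryPeaks).entrySet()) {
            List<Integer> songTimes = song.get(q.getKey());
            if (songTimes == null) {
                continue;
            }
            for (int queryTime : q.getValue()) {
                for (int songTime : songTimes) {
                    int votes = offsetVotes.merge(songTime - queryTime, 1, Integer::sum);
                    best = Math.max(best, votes);
                }
            }
        }
        return best;
    }
}
```

Voting on the time offset is what makes a short clip match anywhere in the track: a true match produces many pairs that agree on the same offset, while chance collisions scatter across offsets. Because the key includes the exact time delta and frequency bins, any time stretch or pitch shift changes the keys, which is the lack of robustness to warping noted above.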


> robust to noise but not warping in time or frequency domains.

Could you use something like a dynamic time warping algorithm for this? (I'm not super acquainted with the technique and not sure if you could get away with it in the frequency domain used for the matching.)
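For context, this is a minimal sketch of textbook dynamic time warping on two 1-D sequences, with absolute difference as the local cost. It only illustrates the technique being asked about; `DtwSketch` is a made-up name and nothing here is taken from Shazam.

```java
import java.util.Arrays;

final class DtwSketch {
    // Returns the minimum cumulative cost of aligning a with b, where each
    // element may be matched to one or more consecutive elements of the other.
    static double distance(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double local = Math.abs(a[i - 1] - b[j - 1]);
                double bestPrev = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = local + bestPrev;
            }
        }
        return cost[n][m];
    }
}
```

A sequence and a time-stretched copy of it come out at distance 0, which is the appeal. The catch is that DTW compares two sequences at a time in O(n·m), so it would more plausibly be a re-ranking step on a few candidates than a replacement for the hash lookup.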


Fantastic job. I would love to scrounge through it.



