
Show HN: Shazam-like acoustic fingerprinting of continuous audio streams - dest
https://github.com/dest4/stream-audio-fingerprint
======
dest
OP here.

This lib is a building block of an ad blocker for radio broadcasts that I have
been developing for a while and am progressively open-sourcing.

~~~
retox
Did you consider that this could be used by record labels to detect
unauthorized use of copyrighted music, etc?

~~~
justinjlynn
As a reason for not releasing it? If so, that's not a terribly good reason. I
mean, clearly they already do that -- this simply permits those with lesser
means to employ the technology. In general, I'm not fond of "but someone could
misuse it" as a reason for not releasing a technology -- especially if it
already exists in another form with limited accessibility.

------
hammock
Google Pixel 2 phones are doing this now as an out-of-the-box feature. It's
continuously listening and the song name appears on your lock screen.

[https://venturebeat.com/2017/10/19/how-googles-pixel-2-now-playing-song-identification-works/](https://venturebeat.com/2017/10/19/how-googles-pixel-2-now-playing-song-identification-works/)

~~~
dest
I wonder how much battery it drains.

~~~
goldenkey
There's a whole bunch of crap Android phones do in the background now, like the
"Ok/Hey Google" assistant voice shortcuts. I turn it all off. But supposedly
these passive listening features only use a special low-power chip. Not sure
about music recognition though -- it seems like it would involve a decent
amount of memory even if it's just performing a convolution; depends on how
much buffer time.

~~~
0x00000000
A new one for me the other day was "Google Nearby". Enabled by default, and
some company in the airport was using it to push ads to your notifications.
Disgusting, and maybe the final nail in the coffin for Android for me. As a
long-time diehard Android user, the iPhone sounds better and better every day.

~~~
KitDuncan
Why not just use a ROM without all the bloatware, and maybe even without
Google services altogether?

~~~
j_s
[https://lineage.microg.org](https://lineage.microg.org)

discussion:
[https://news.ycombinator.com/item?id=15619416](https://news.ycombinator.com/item?id=15619416)

supported devices:
[https://wiki.lineageos.org/devices](https://wiki.lineageos.org/devices)

------
throwmenow_0140
Very cool stuff! It seems that all those solutions are based on analyzing
visual representations of spectrograms. Is this common, or could you just use
2d arrays that encode the same information -- would that be more performant?

Nice blog post about this stuff: [http://willdrevo.com/fingerprinting-and-audio-recognition-with-python/](http://willdrevo.com/fingerprinting-and-audio-recognition-with-python/) and
[https://github.com/worldveil/dejavu](https://github.com/worldveil/dejavu)

~~~
dest
You mean 2d arrays containing the raw audio signal? No, that would not work,
because you do not know the phase along the y dimension when you want to
compare against another signal.

Another method to detect an audio pattern is cross-correlation on the raw
audio signal, but it is very expensive in compute and memory.

With fingerprinting, the longest operation is often the associated DB query.
Lots of work to do there. In that space, Will Drevo's work is really good. I
will share my DB implementation later.
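To make the cost difference concrete, here is a minimal pure-Python sketch of
naive cross-correlation (illustrative only, not code from the lib; the
function name and data are made up):

```python
def xcorr_peak(signal, pattern):
    """Naive sliding cross-correlation: O(len(signal) * len(pattern))
    multiply-adds over the raw samples. This per-sample cost is what makes
    raw-signal matching expensive compared to looking up a handful of
    fingerprint hashes per second."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(signal) - len(pattern) + 1):
        score = sum(signal[lag + i] * pattern[i] for i in range(len(pattern)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# e.g. xcorr_peak([0]*10 + [1,2,3,2,1] + [0]*10, [1,2,3,2,1]) → 10
```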

~~~
throwmenow_0140
I meant the spectrogram encoded as a 2d array, but I guess there isn't a big
difference when the DB query is the most expensive part.

I've always wondered: Is there a way to compare fingerprints with humming
sounds or live recordings?

Those fingerprinting techniques don't seem to be suitable for those tasks, do
you know of any methods to accomplish this?

~~~
dest
There are special fingerprint algorithms suited to sound modifications like
pitch shifts
[https://biblio.ugent.be/publication/5754913](https://biblio.ugent.be/publication/5754913)
but they're not going to work with humming or live audio. I don't know if such
a thing exists.

If you want to do some research, here is a short review paper on the topic
[http://www.cs.toronto.edu/~dross/ChandrasekharSharifiRoss_ISMIR2011.pdf](http://www.cs.toronto.edu/~dross/ChandrasekharSharifiRoss_ISMIR2011.pdf)

As for the 2d array spectrogram, it is not needed in my lib (except when
plotting is activated). I only care about the maxima in the spectrum of each
data window. In other words, 1d spectra are enough.
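A minimal sketch of that per-window peak picking (pure Python with a naive
DFT for clarity; names are illustrative, not the lib's actual code, and real
implementations would use an FFT):

```python
import cmath
import math

def spectrum(window):
    """Magnitude of a naive DFT over one window of samples.
    O(n^2), fine for a demo; real code uses an FFT."""
    n = len(window)
    return [abs(sum(window[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def top_peaks(window, k=3):
    """Indices of the k strongest frequency bins in one window --
    the per-window 1d-spectrum maxima a fingerprint can be built from."""
    mags = spectrum(window)
    return sorted(range(len(mags)), key=lambda i: mags[i], reverse=True)[:k]

# A pure sine at bin 5 of a 64-sample window yields top_peaks(window, 1) → [5]
```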

------
joren-
Another implementation of this algorithm can be found at [1]. It also includes
several other algorithms for acoustic fingerprinting that can serve as a
baseline. See [2] for a paper on one of the other implemented algorithms and a
comparison.

[1] [https://github.com/JorenSix/Panako](https://github.com/JorenSix/Panako)

[2]
[http://www.terasoft.com.tw/conf/ismir2014/proceedings/T048_122_Paper.pdf](http://www.terasoft.com.tw/conf/ismir2014/proceedings/T048_122_Paper.pdf)

~~~
dest
Thank you for having released Panako. Note that I gave the link to the
relevant paper in a previous comment

[https://news.ycombinator.com/item?id=15811221](https://news.ycombinator.com/item?id=15811221)

~~~
joren-
Ah, I did not see that. Good to know that it is findable.

~~~
dest
I did not actually know that there was a Github for this, I only had the
paper.

------
StavrosK
I hacked something together in an hour once: a program that would recognize
the song that was playing and play the video clip of that song from YouTube
in sync:

[https://www.youtube.com/watch?v=K6FxfZH_ZK4](https://www.youtube.com/watch?v=K6FxfZH_ZK4)

The phone in that video is just playing a song, it doesn't have any connection
to the computer at all.

~~~
dest
Nice. How did you recognize the song? Cross correlation or fingerprinting? How
big was your song database?

~~~
StavrosK
Unfortunately I didn't write my own code for that, I just used a pre-existing
fingerprinting API.

~~~
uitgewis
Not to discredit, but that's a lot less significant.

~~~
StavrosK
Yes, hence the "hacked together in an hour" part.

------
peterburkimsher
That's great! I was just thinking about rewriting Shazam as a machine learning
project.

I'm wondering how to use my Chord Progression data to make a different audio
fingerprinting algorithm.

[https://peterburk.github.io/chordProgressions/index.html](https://peterburk.github.io/chordProgressions/index.html)

------
Xeoncross
Thanks for sharing. Processing PCM audio signals is something that is actually
useful for more things than people realize.

~~~
dest
Hope it will be useful!

This lib is a building block of an ad blocker for radio broadcasts that I have
been developing for a while and am progressively open-sourcing.

~~~
vitovito
How closely can it correlate audio broadcasts of the same audio that were
captured at different offsets?

e.g. two independent streams, identifying the same 30-second commercial, but
the audio streams are offset from each other by half a sample length?

~~~
dest
It correlates quite well.

Maybe some fingerprints will be present in only one of the two streams, but
most of them will be present in both.
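One way to see why landmark-style fingerprinting tolerates capture offsets: a
hash encodes a frequency pair plus the time *delta* between the two peaks
(offset-invariant), and matching votes on the database-minus-query time
offset. An illustrative Python sketch (not the lib's code; data and names are
made up):

```python
from collections import Counter

def hashes(peaks):
    """Pair each (time, freq) peak with the next few peaks in its target
    zone. The hash (f1, f2, dt) does not depend on absolute time; the
    anchor time t1 is kept alongside for offset voting."""
    out = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 4]:
            out.append(((f1, f2, t2 - t1), t1))
    return out

def best_offset(db_hashes, query_hashes):
    """Histogram of (db_time - query_time) over matching hashes; a true
    match shows up as one dominant offset bin."""
    index = {}
    for h, t in db_hashes:
        index.setdefault(h, []).append(t)
    votes = Counter()
    for h, t in query_hashes:
        for t_db in index.get(h, []):
            votes[t_db - t] += 1
    return votes.most_common(1)[0] if votes else None
```

A query cut from the middle of the reference and shifted in time still lines
up, because every surviving hash votes for the same offset.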

------
toomuchtodo
Thanks so much for your work on this. Interested in running it against the
Internet Archive’s audio collection.

------
durkie
do you think this could be useful for detecting changes in songs? like if i'm
listening to a big mix of songs and they don't have timestamps of when the
song changes, but that is info i would like to have...

~~~
dest
Yes it could be. You need a song database to detect changes, and that is hard
and/or expensive to gather.

Commercial services are available in that field. ACRCloud was mentioned in
another comment.

------
maephisto
Awesome share!

~~~
dest
thank you!

------
megamindbrian2
Can it fingerprint other streams?

~~~
dest
You mean audio streams? Of course. Just change the URL next to curl and that's
it.

------
ww520
Isn't Shazam patented?

~~~
dest
Maybe, but I don't know.

I'm in France and this lib is software-only, so Shazam's patents are probably
not enforceable here.

Anyway, IANAL, and cheers to the Shazam people

