

Show HN: Discover what songs were used in YouTube videos - tk42
http://www.mooma.sh

======
tk42
The user can paste a YouTube URL, which is then analysed, fingerprinted
and matched against a database of 7+ million audio fingerprints. It does not
just identify a single song: it can identify multiple songs contained
in a single file or video, and it generates a timeline listing which tracks it
contains at which time.
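
As a rough illustration of the timeline idea (not the actual mooma.sh code, which is written in Perl), here is a minimal Python sketch. It assumes the video has already been cut into fixed-length windows and that each window was matched to a track name or to nothing. `Segment`, `build_timeline`, the 30-second window length and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    track: str
    start: int  # seconds from the start of the video
    end: int    # seconds from the start of the video


def build_timeline(
    window_matches: List[Tuple[int, Optional[str]]],
    window_len: int = 30,  # assumed window length, in seconds
) -> List[Segment]:
    """Merge adjacent windows that matched the same track into segments.

    window_matches holds (window start in seconds, matched track or None).
    """
    timeline: List[Segment] = []
    for start, track in sorted(window_matches, key=lambda m: m[0]):
        if track is None:
            continue  # nothing matched in this window
        last = timeline[-1] if timeline else None
        if last and last.track == track and last.end >= start:
            # Same track continues: extend the current segment.
            last.end = max(last.end, start + window_len)
        else:
            timeline.append(Segment(track, start, start + window_len))
    return timeline


# Made-up per-window results for a 3-minute video.
matches = [
    (0, "Track A"), (30, "Track A"), (60, None),
    (90, "Track B"), (120, "Track B"), (150, "Track A"),
]
timeline = build_timeline(matches)
for seg in timeline:
    print(f"{seg.start:>4}s - {seg.end:>4}s  {seg.track}")
```

A track that returns later in the video gets its own segment, which is probably what you want for mixes.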

Our matching algorithm is based on the open source echoprint-codegen
fingerprinting method, which we have built our own stack around:

\- Replaced Solr/Tokyo Tyrant with Elasticsearch

\- Reimplemented matching-logic

\- Crawlers search multiple sources for audio files to be indexed (MP3s aren't
stored long term, only fingerprinted then deleted)

\- Indexing about 1 new track per second

\- Found a method to verify unreliable ID3 tags (in progress, current database
also includes unverified tags)

\- mogilefs as primary data store for fingerprints

\- perl everything

We also provide a free music identification API.

Any feedback would be much appreciated!

~~~
unltd
I've worked on the echoprint-codegen algorithm for my current project
(trak.rocks) and I'm curious about how you reimplemented the matching logic.

Do you plan to document/open-source your work?

~~~
mo0
First we rewrote Echo Nest's truescore logic in Perl, then altered it slightly
and implemented some extra checks to further try to exclude false positives.
We also believe what they used in the late song/identify API might have been
different from what is open sourced in
[https://github.com/echonest/echoprint-server](https://github.com/echonest/echoprint-server)

Also, we pack each individual hash before storing it in Elasticsearch and saved
at least 50% of the storage space this way.
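
What "packing" means exactly isn't spelled out here. One plausible reading, sketched below in Python purely as an illustration (the real stack is Perl), is turning each decimal hash into a fixed-width binary value and encoding that as a short token. The assumptions are mine: hashes fit into an unsigned 32-bit integer, and unpadded URL-safe base64 is an acceptable token format. `pack_hash` and `unpack_hash` are hypothetical names, and the sketch does not try to reproduce the 50% figure.

```python
import base64
import struct


def pack_hash(h: int) -> str:
    """Encode one hash (assumed to fit in 32 bits) as a 6-character token."""
    raw = struct.pack(">I", h)  # 4 bytes, big-endian unsigned int
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def unpack_hash(token: str) -> int:
    """Reverse pack_hash: restore the base64 padding and decode."""
    padded = token + "=" * (-len(token) % 4)
    return struct.unpack(">I", base64.urlsafe_b64decode(padded))[0]


hashes = [1048575, 524288, 73012, 4294967295]  # made-up values
tokens = [pack_hash(h) for h in hashes]
for h, t in zip(hashes, tokens):
    print(f"{h:>10} -> {t}")
```

Every token has the same length regardless of the hash value, so the gain over decimal text depends on how large the hashes typically are.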

Our fingerprint data is quite different from theirs (unreliable ID3 tags, N
versions of the same track), which is why we needed some tweaks. So far the
matching is still far from perfect...

Whether we will open source the whole thing at some point we don't know yet.

~~~
caractacus
When you say the matching is far from perfect, is that at your end or on the
part of the echoprint / echonest code? You made tweaks because you found
issues with what they were doing....?

~~~
tk42
The reason for it being far from perfect is likely a combination of both. If
the correct song is indexed, there is a high probability that we find the
right match. However, if it's not, with a bit of bad luck a false positive can
happen easily with the default solution (and ours too). Also, when analysing a
YouTube video it can happen that in a 30 sec snippet only 10 secs are a
matching song and 20 are unrelated, or 15 are one matching song and the other
15 match a different one, in which case 2 tracks or multiple versions of 2
different tracks will have relatively OK scores. Deciding what to consider a
match (or whether to try different queries for the same or a slightly altered
timespan prior to deciding) is not trivial in these cases. Our changes mostly
concern when a match is considered a match: we altered the thresholds and how
matching truescores are looked at in relation to other fingerprints'
truescores. Due to issues like these, specifying a timeframe for
analysis will often produce better results.
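
To make the decision problem concrete, here is a minimal Python sketch of one way to accept or reject a best candidate. It is not the actual mooma.sh or Echo Nest logic. The function name `pick_match`, the `track_key` used to group versions of the same track, and both threshold values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def pick_match(
    candidates: List[Tuple[str, str, float]],
    min_score: float = 50.0,   # hypothetical absolute threshold
    min_margin: float = 1.5,   # hypothetical ratio over the runner-up
) -> Optional[str]:
    """Return the winning track_key, or None if the result is weak or ambiguous.

    candidates holds (fingerprint_id, track_key, score). Versions of the
    same track share a track_key, so they do not compete with each other.
    """
    best_per_track: Dict[str, float] = {}
    for _fp_id, track_key, score in candidates:
        best_per_track[track_key] = max(score, best_per_track.get(track_key, 0.0))

    ranked = sorted(best_per_track.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < min_score:
        return None  # nothing scored high enough
    if len(ranked) > 1 and ranked[0][1] < min_margin * ranked[1][1]:
        return None  # two different tracks score similarly: ambiguous
    return ranked[0][0]


# Two versions of one song clearly beat an unrelated track.
clear = [("fp1", "song-a", 120.0), ("fp2", "song-a", 110.0), ("fp3", "song-b", 30.0)]
# A snippet split between two songs: both score OK, neither dominates.
split = [("fp1", "song-a", 70.0), ("fp4", "song-c", 65.0)]

print(pick_match(clear))
print(pick_match(split))
```

A `None` result is where a caller could retry with a shorter or shifted timespan before giving up.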

[http://static.echonest.com/echoprint_ismir.pdf](http://static.echonest.com/echoprint_ismir.pdf)

------
wingerlang
For anyone often wondering about music in songs I will recommend Shazam's OS X
app. It sits in the menu bar and listens to music, and if it recognises
something it will send a notification and add it to a list [0].

Whether you're watching YouTube videos or movies, or just having someone else
play something, it usually finds it without problems.

It's a different use case than OP's app though, which is more on demand I
guess.

[0] [http://i.imgur.com/0At4lJ6.png](http://i.imgur.com/0At4lJ6.png)

~~~
Hates_
Does it still have to listen through a mic or can it detect songs from
internal audio now? I don't have any speakers and just use headphones.

~~~
unltd
The Shazam Mac OS X app is quite powerful. It doesn't only listen to your mic,
so it will detect the song even when the sound is off. Also, it often detects a
song playing in somebody else's headphones at our office. Kinda creepy sometimes.

------
DanBC
I love this. Thank you.

On a slight tangent: I'd love a client that could identify my MP3 collection,
and rename it and retag it (under some kind of supervision). Ideally it'd do
the identification in batch mode when it got Internet connectivity (but this
is perhaps an unreasonable requirement). And to make it perfect it would let
me listen to and delete tracks.

I have a huge unwieldy collection of MP3s and I can't bring myself to just
delete gigabytes of music.

~~~
IanCal
Have you tried Picard?
[https://picard.musicbrainz.org/](https://picard.musicbrainz.org/)

I used it to tag a massive amount of partially labelled and mostly metadata-
free music files some time ago and it worked a treat.

~~~
vidyesh
This is nice. Anything to tag TV Shows and Movies?

------
Tunecrew
This is very interesting - I see this as an ideal concept to be paired with
one of the existing commercial solutions, e.g. Shazam.

If you're indexing all the random and free stuff out there, you're picking up
a lot of material that may have never been commercially released or has not
been re-released digitally. At the same time, Shazam, YouTube ContentID,
Apple's iTunes Match, etc. have access to an extremely large set of references
which (more than you could have) contain 99% accurate metadata. ContentID
definitely picks up multiple songs in mixes, as well as pitch changes, with a
high degree of accuracy (assuming the master sound recording has been
submitted to YouTube).

A submission system would be great too, or some way for people to tag stuff
themselves à la Discogs, etc.

------
amelius
I think what we need most is a community-backed source of fingerprints,
because the authority-based approach only works well for popular songs (at
least, that is my impression, based on frequently using commercial recognition
apps).

~~~
mlinksva
[http://acousticbrainz.org/](http://acousticbrainz.org/) ?

------
volker48
YouTube already does this on some videos. Does anyone know how this technique
differs?

~~~
mo0
They display audio info if tracks from the AudioSwap library were used:
[https://support.google.com/youtube/answer/94316?hl=en](https://support.google.com/youtube/answer/94316?hl=en)

------
unltd
Great job!

Could you describe the use cases? Is it for mixtapes uploaded to YouTube by
DJs, or over-the-air recognition in music festival videos?

Because music (single tracks) uploaded to YouTube is usually already
identified, so it can be found.

~~~
scrapcode
One of the most common comments I see on YT videos is people asking for the
name of the song used in a video.

~~~
pvaldes
Yes, it is one of the archetypal creatures of the internet. I bet that 99% of
those people are lawyers... or maybe the author of the song.

------
huhtenberg
Tried with a couple of videos, went out to grab lunch and 30 minutes later it
is still stuck (with the progress bar extending to the first t in
"http://"). Sorry :-/

~~~
tk42
Hug of death occurred faster than expected :) scaling now

------
oxplot
OK, I'm not sure if this occurred to anyone else: how about a Shazam-like app
that can search "the Internet" by listening for a few seconds on your mic?

------
DevFactor
That's awesome. Now could you figure out why my intro #2 video:
[https://www.youtube.com/watch?v=dosy8zOooUU&list=PLP6PvXLevG...](https://www.youtube.com/watch?v=dosy8zOooUU&list=PLP6PvXLevG9J57vObsvi66uweSdfX87TV&index=2)

Gets flagged as copyrighted music even though there is no music?

So many YouTubers would thank you for a service that did this.

