

Echoprint: Open-source music fingerprinting - chl
http://echoprint.me/

======
a3_nm
I'm a bit confused by the database license. Why do you want people to
contribute additional data specifically back to you, rather than requiring
them to release it under a compatible license which would allow you to
incorporate it if you wish? This is substantially different from usual
copyleft licenses.

As an example, notice that Wikipedia does not require people to send modified
articles back to the Wikimedia Foundation, or to allow them to use the data as
they see fit (this is clause D. 3. d. in your license). They just require
people to contribute under a copyleft license, and they can thus incorporate
derivative versions published elsewhere if they want. This is nice because it
ensures that the Wikipedia content can be useful even if the Wikimedia
Foundation disappears.

Anyway, awesome work, congrats!

~~~
jws
_Why do you want people to contribute additional data specifically back to
you_ …

One possibility is so that the database can be created on the backs of the users,
then the database owner can slam the door and turn it proprietary, like the
CDDB did.

Their stated goals don't lean that way, but I don't see a clause that either
permits or forbids the database owner this action. Lawyers will be required to
figure that out. I suspect there is an implicit ability for the grantor to
terminate the license, but that is what lawyers are for.

(I did notice that the termination effects reference clauses that don't
exist.)

------
chl
Given how threat-, or, let's say, notification-happy Landmark was just a while
ago [1], does anyone have an idea regarding the patent situation? Is this
implementation different enough to be considered (reasonably) "safe"?

[1] <http://www.redcode.nl/blog/2010/07/patent-infringement/>

~~~
brianwhitman
We invented everything about Echoprint from scratch, working with some awesome
scientists and audio guys. I'm not a lawyer and won't comment on legal stuff
here though.

~~~
speckledjim
FWIW, inventing something from scratch, as far as I understand things, will
only save you from copyrighted code etc. It won't save you from overly
general and trivial patents.

Good luck though :/

~~~
dgreensp
Exactly. Be prepared to be bullied by Shazam -- at least, that's what happened
the last time someone posted audio fingerprinting code online. And inventing
something yourself doesn't save you from patents. I'll be watching from the
sidelines to see how this plays out.

------
megamark16
This is really amazing, and I can't wait to see all of the possibilities it
opens up now that people can create their own databases. I'm tempted to set up
an app to fingerprint and dedupe all of the music spread out throughout the
network here at work.

~~~
VMG
This. Also proper tagging for once.

~~~
jokermatt999
I highly recommend MusicBrainz Picard for autotagging. I went through tagging
somewhere around 70 gigs of music with it, and the entire experience was
rather painless. It will also rename and organize the files for you as a
bonus.

It looks up the data (as it should be), so you rarely will have to enter
anything yourself. It sometimes helps to add data for 1 track for horribly
mistagged stuff, but after that you can usually drag and drop the rest of the
album.

~~~
18pfsmt
For others who, like me, are unfamiliar yet very interested:
MusicBrainz[1] is written in Python, distributed under the GPL, and I will be
trying it ASAP.

[1] <https://secure.wikimedia.org/wikipedia/en/wiki/MusicBrainz_Picard>

~~~
andrewcooke
MusicBrainz is written in Perl (and SQL). MB Picard is Python. MB is a
database and associated server. MB Picard is a client.

MB is a bit of a monster. If you're tagging a lot of music you may want to
install it locally. My experience with this is documented at
<http://acooke.org/cute/Installing3.html>

Personally, I was not too impressed with MB itself - ended up using LastFM's
API instead. But this was for generating playlists, not tagging.

------
brianwhitman
if you have any questions, let me know. we're very excited about this!

~~~
Aissen
About your data dumps: you're about to get hammered, so please share them as a
torrent! This would be much better for the thousands of people wanting to
bootstrap.

Also, I understand json is very easy to use, etc. But those big dumps cry for
a binary format. Or at least add zlib/lzma compression so people don't waste
bandwidth on uuencoded binary data in json.

~~~
brianwhitman
the dumps are in a format that other code understands (fastingest.py)

The code data is compressed using zlib (and then base64'd.) It's all on s3--
in our experience, big data dumps like this get relatively little traffic
after the original hype dies down and a torrent doesn't make sense. We're
pretty sure amazon can handle the load :)
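
For readers wondering what "compressed using zlib (and then base64'd)" looks like in practice, here is a minimal Python sketch of the round trip. It assumes standard base64 over a plain zlib stream. The record layout, the `track_id` and `code` field names, and the helper names `encode_code` / `decode_code` are illustrative assumptions, not the actual dump schema or anything taken from fastingest.py.

```python
import base64
import json
import zlib


def encode_code(raw: bytes) -> str:
    """Compress with zlib, then base64-encode so the result is JSON-safe text."""
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_code(encoded: str) -> bytes:
    """Reverse of encode_code: base64-decode, then zlib-decompress."""
    return zlib.decompress(base64.b64decode(encoded))


# Hypothetical record; the real dump's field names and payload may differ.
payload = b"0123456789abcdef" * 64
record = json.dumps({"track_id": "TR_EXAMPLE", "code": encode_code(payload)})

# Read the record back and recover the original bytes.
restored = decode_code(json.loads(record)["code"])
assert restored == payload
print(len(payload), "raw bytes ->", len(json.loads(record)["code"]), "encoded characters")
```

The base64 step adds roughly a third to the size of the compressed bytes, which is the overhead the parent comment is pointing at; the zlib step typically more than makes up for it on repetitive data.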

~~~
JonnieCache
S3 has bittorrent support built in, it's trivial to enable. But you probably
knew that.

------
denimboy
I think the fingerprinting part is similar to pHash: <http://www.phash.org/>
but echoprint is more focused on music and they are building a database of
fingerprints.

I think pHash also has functions for fingerprinting music but might not be as
precise since pHash is not strictly focused on music.

------
JonnieCache
Echonest are some cool people. Their earlier APIs enabled the illustrious
although sadly now defunct <http://www.donkdj.com> which was done by a
classmate of mine as a project for a Generative Creativity course we did at
uni.

Looks like their research has taken them a lot further!

------
natch
I'd love to see a project where the data is MIT licensed too, not just the
code.

------
regomodo
A very interesting project. I've whipped up a little test program
(<https://github.com/regomodo/handy_scripts/blob/master/echoprint-codegen-builder.py>)
in Python and found either the codegen or echonest to be a little buggy. Daft
Punk fingerprints come back with some very unusual results.
<http://pastebin.com/8Tfvd0SZ>

~~~
brianwhitman
can you file an issue on echoprint-codegen or write us at the google group so
we don't lose that? That definitely shouldn't be happening, there's an issue
somewhere for sure.

~~~
regomodo
I've raised an issue over at github
<https://github.com/echonest/echoprint-codegen/issues/10>

------
caf
Why aren't the fingerprints in the database covered by the recording copyright
on the song that they were derived from?

~~~
roel_v
Maybe they are; to the best of my knowledge there's no definite answer that
they aren't. I find it unintuitive, but I do think that rationally there's a
case to be made that they are, in fact, 'derived works'.

~~~
roc
Objectively, they're meta-data, on the order of box scores from a baseball
game or the cast credits from a movie. So there's no particularly solid
argument for them to infringe on the copyright of the original work.

~~~
roel_v
No:

- game: a baseball game is not a "work" under the Berne Convention

- cast credits: they are descriptive, but not _derived_ from the movie.

One of the criteria to assess "fair use" is "was anything creative added in
the process of making the derivative", which is by definition not true in the
checksum or fingerprint case. I don't find your argument compelling - it's not
an open-and-shut case.

------
highace
Woah. This is going to be massive. The number of things that could potentially
be built on top of this is scaring me.

------
paulnelligan
How is this different from Shazam?

~~~
Stuk
Because it's open source.

------
stevenp
This is huge. I've been mulling some ideas for a while that would require music
fingerprinting, but I've always been too overwhelmed by the available options,
from a licensing and implementation standpoint. I can't wait to play with
this!! :D

------
jbrennan
Looks incredible! One note though, the page seems to partially break for me in
Safari on the Mac (the sidebar overlaps onto the content as I scroll
horizontally).

But the tech looks incredible. Good work releasing this!

~~~
brianwhitman
oops! as you can tell, we're pretty good with music data and not so much on
the web site design. i'll try to fix it :)

------
brianwhitman
for more on the whys, here is the EN blog post:
<http://blog.echonest.com/post/6824753703/announcing-echoprint>

------
paisible
Holy balls you guys are awesome for releasing this.

