MusicBrainz: an open music encyclopedia (musicbrainz.org)
432 points by hernantz 174 days ago | 90 comments

To drive home how awesome the crazy music nerds who together create MusicBrainz are, please have a look at these two examples:

1. Number of releases per album individually tagged: https://musicbrainz.org/release-group/f5093c06-23e3-404f-aea...

2. The amount of metadata for an album: https://musicbrainz.org/release/b84ee12a-09ef-421b-82de-0441...

When you get used to this kind of high-quality metadata, it's just so, so sad to see how companies like Spotify treat metadata. As an example, look up Bob Marley & The Wailers on Spotify and try to find original releases, and then compare that to the list found here:


...and the sad part is that the metadata is freely available, with a permissive license.

The amount of data is amazing – but I find that it's both a blessing and a curse. It absolutely excels at the use case of tagging audio files (as a lot of people here are noting), and as an encyclopedic reference (its purpose). For other software integration use cases, where there is any ambiguity involved whatsoever, a huge portion of code needs to be dedicated to deciding which recording/release/etc. is the likely intended or "canonical" entity. I find that the rankings from the search API are not nearly good enough for this.

Consider a recording search for "smells like teen spirit". Any human with cursory knowledge of pop music would point you at the Nirvana single from the 1991 release of Nevermind (in this particular case, it's likely even true in every locale). But MusicBrainz has no notion of popularity, common sense, or the real-world context of any of its entities, so the recording from Nevermind isn't even on the first page of results. Heck, the first result isn't even a Nirvana recording. The second result is from an obscure live bootleg album. In my opinion this should be considered a bug. This stuff matters!

This is an area I've dedicated a lot of time to when integrating MusicBrainz with my project, and it strikes me as something that MetaBrainz could spend time on to make the platform more accessible. Answering simple questions about music is currently quite difficult for newcomers on account of the overwhelming amount of data. Consider a world where it's possible to stream every recording from every release in the MusicBrainz database: it should be easier to make "Alexa, play Dark Side of the Moon" work without it needing to ask whether I mean the 1994 Netherlands CD release.

(FWIW, it's totally possible to build these heuristics on top of MusicBrainz today, but having better built-in support for determining this stuff would be nice. Spotify is absolutely amazing at figuring out what song in its entire catalog should be the top result even when I've only typed a few characters.)

"Canonical" can't apply to music releases in an objective and definitive way.

Context certainly matters (first, modified, compilation, remaster, remix, audiophile pressing, and so on) but you can't even nail "canonical" to first release, especially for singles, because there may be early promo mixes, radio mixes, vinyl mixes, iTunes mixes, and so on - all mastered differently.

Most people's idea of "canonical" is really "The version I want to hear without having to specify other details". But that's subjective and likely to be significantly different for some non-trivial percentage of users, especially in different territories.

Spotify probably just makes an informed stab at "most popular" - which is a good heuristic and will work most of the time, but is hard to calculate when you don't have Spotify's stats.

We may never have the stats Spotify has, but we are trying to get listening information via the in-development ListenBrainz: https://listenbrainz.org

I'm not sure when/if we'll be able to tie it in with MusicBrainz directly, but for someone like exogen, ListenBrainz may be a good basis to figure out relative popularity of various Recordings/Tracks regardless.

You've described the issue pretty well, and I understand (and agree with!) all of that – like I said, I've devoted a LOT of time to solving this.

> Most people's idea of "canonical" is really "The version I want to hear without having to specify other details". But that's subjective and likely to be significantly different for some non-trivial percentage of users, especially in different territories.

Yup! You are describing the problem literally any search engine faces. And yet, Google/Bing/etc. provide pretty smart results. So, do you think the "Smells Like Teen Spirit" recording by Francis Drake is the BEST first result, as MusicBrainz says it is? Is a live bootleg recording the BEST second result? In any locale? MusicBrainz is NOT primarily a search engine, but all that data has very little value if people (and other software) can't actually find it! This absolutely harms adoption.

OK, so we might not need to nail down a "canonical" version when we live in a world with search ranking scores. I totally realize "canonical" is a bad word choice on my part – but it's really how people think of these things!

> Spotify probably just makes an informed stab at "most popular" - which is a good heuristic and will work most of the time, but is hard to calculate when you don't have Spotify's stats.

I bet they do it that way too, but I think you're throwing in the towel way too early here. :) I have a system that works amazingly well and nearly always chooses the most likely intended recording without any listen count data. MusicBrainz has a LOT of data available to it; what types of heuristics might make sense here? I use a ranking system that takes the following factors into account and, like Lucene, assigns a score:

• Number of releases & release groups the recording appears on (the most well-known recording is more likely to appear on additional albums like compilations, and more likely to be widely released in lots of countries).

• How old the release is relative to the other search results (earlier matches are more likely to be the original).

• Whether the recording is from a release with a "single from" relation to another album (the target LP is more likely to hold the recording we want).

• Whether it's from a release that's an Album or EP (positive weighting), or Live (negative weighting), whether the recording ONLY appears on Compilation albums (negative weighting), whether it's any other type of release like Bootleg (strong negative weighting).

• Whether the recording has ISRCs entered for it (more well-known recordings are more likely to have ISRCs in the first place, and also more likely for people to have entered them into MusicBrainz).

• Whether MusicBrainz users have entered any tags and ratings for it (weak but positive correlation with how popular it is).

• Domain-specific string similarity metrics; essentially, query expansion that makes sense specifically for song titles & artist names. This lets certain matches remain equivalent when it makes sense (e.g. "mambo number 5", "mambo no. 5", "mambo #5", and "mambo number five" should all be exactly equivalent in terms of string matching). Lucene does some of this already, of course, but not nearly enough – I have a query expander with hundreds of examples where Lucene does a worse job.
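To make the query-expansion idea concrete, here is a minimal sketch of a title normalizer that collapses the "mambo number 5" variants into one comparison key. It is not the commenter's expander: the rule tables (`NUMBER_WORDS`, `NUMBER_MARKERS`) and the function name are hypothetical and cover only a handful of cases.

```python
import re
import unicodedata

# A deliberately tiny, hypothetical rule set. A real expander needs far more
# rules; this only shows the shape of the idea.
NUMBER_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}
NUMBER_MARKERS = {"number", "no", "nr", "#"}


def normalize_title(title: str) -> str:
    """Reduce a song title to a key so equivalent spellings compare equal."""
    # Fold case and strip accents ("Café" -> "cafe").
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    text = text.replace("&", " and ").replace("#", " # ")
    # Drop remaining punctuation, keeping letters, digits and '#'.
    text = re.sub(r"[^\w#\s]", " ", text)
    tokens = [NUMBER_WORDS.get(tok, tok) for tok in text.split()]
    kept = []
    for i, tok in enumerate(tokens):
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        # "number 5", "no. 5" and "#5" all collapse to plain "5".
        if tok in NUMBER_MARKERS and next_tok.isdigit():
            continue
        kept.append(tok)
    return " ".join(kept)
```

The same function would be applied to both the query and the indexed titles, so rewriting "one" to "1" is harmless for equality checks even though it would be wrong for display.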

I can think of more too, that my system doesn't currently use. All that's without relying on any external data source! But if you want to go one better, it's also possible to correlate results with other APIs like WikiData, DBpedia, Spotify, YouTube…

In most cases, I've found that there's enough of a delta between the top score and the second-best score to determine which one is "correct". (Yes, that word, I know…)
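The factors above, plus the "enough of a delta" check, can be sketched as a weighted score. This is only an illustration under stated assumptions: the `Candidate` fields, every weight, and the `min_margin` threshold are made up, and the features would first have to be derived from MusicBrainz data (release counts, release-group types, ISRCs, tags, "single from" relations).

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Candidate:
    """Features for one recording search result (hypothetical field names)."""
    recording_id: str
    release_count: int            # releases the recording appears on
    release_group_count: int      # distinct release groups it appears on
    earliest_year: int            # year of the earliest release containing it
    release_types: FrozenSet[str] = field(default_factory=frozenset)
    on_album_targeted_by_single: bool = False  # album is the target of a "single from" relation
    has_isrc: bool = False
    has_tags_or_ratings: bool = False


# Made-up weights for illustration only; they would need tuning on real queries.
TYPE_WEIGHTS = {"Album": 2.0, "EP": 1.0, "Compilation": 0.0, "Live": -2.0}
OTHER_TYPE_WEIGHT = -5.0  # anything else (bootlegs etc.) is penalised heavily


def score(c: Candidate, oldest_year: int) -> float:
    s = 0.5 * c.release_count + 1.0 * c.release_group_count
    # Later than the oldest match: less likely to be the original.
    s -= 0.2 * (c.earliest_year - oldest_year)
    if c.on_album_targeted_by_single:
        s += 3.0
    if c.release_types and c.release_types <= {"Compilation"}:
        s -= 2.0  # only ever appears on compilations
    s += sum(TYPE_WEIGHTS.get(t, OTHER_TYPE_WEIGHT) for t in c.release_types)
    if c.has_isrc:
        s += 1.5
    if c.has_tags_or_ratings:
        s += 0.5
    return s


def pick_best(candidates: List[Candidate], min_margin: float = 2.0) -> Optional[str]:
    """Return the top recording ID, or None if the winner is not clearly ahead."""
    if not candidates:
        return None
    oldest = min(c.earliest_year for c in candidates)
    scored = sorted(((score(c, oldest), c.recording_id) for c in candidates), reverse=True)
    if len(scored) == 1 or scored[0][0] - scored[1][0] >= min_margin:
        return scored[0][1]
    return None
```

Returning `None` when the margin is small is a design choice: the caller can then fall back to asking the user instead of silently guessing.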

Ideally MusicBrainz would be on par with a human expert in determining which recording you most likely meant, and I believe that it CAN do this today, but it doesn't.

Note that our current search server software is in "minimal maintenance" mode. We're working on a replacement which will hopefully allow for a lot of improvements to search rankings etc., but a lot of other things have higher priority (like actually being able to serve requests in spite of getting hammered by bots and spammers).

Of course, MusicBrainz is an open source endeavour. The old search server maintainer was a volunteer from the community. If you believe you can do a better job at running our search server, please join us in #metabrainz at Freenode and introduce yourself.

The more hands we are, the more we can lift. :)

Also, note: in theory MusicBrainz already has metrics for the number of clicks, views, lookups, and edits certain entities get through their site and API. I bet these are strongly correlated with listens/popularity.

>In theory MusicBrainz already has metrics

What does "in theory" mean here? Do those tables exist in whole or some part? Is this a matter of indexing an existing data set or hoping some data was acquired by accidental consequence?

I'm assuming it already exists due to the existence of pages like https://musicbrainz.org/tops/mb_top_stuff.html and https://stats.metabrainz.org, I'm just not positive of it. :)

Even if it's not collected though, it's data that they at least already have the ability to collect by simply flipping a switch, as opposed to spinning up a whole new ListenBrainz service and hoping it gains traction.

Oh, and the mb_top_stuff.html page was created less than a week ago.

You're absolutely right that we could, in theory, have that data, but we do not currently. Not in any usable form (for this purpose) anyway.

Note that ListenBrainz may be used for getting "popularity" metrics, but that is not its intended goal.

I agree, musicbrainz is absolutely awesome.

I've been using it daily for a couple of years now (to clean up the tags in my collection; Quod Libet integrates nicely with MusicBrainz for that), and very rarely am I playing something that's not there.

When this is the case I try to add/edit the metadata, but most of the time, they're way ahead of me.

I believe Spotify got sued and paid for it, and now they've bought an Ethereum startup to fix the problem of paying royalties.


Hashtags don't work on HN, sorry.

I've been contributing data and code to MB and its sibling projects for over two years now and the community has been great from day one!

Just to name a few of the other projects, there's AcousticBrainz [1] collecting acoustic information which may be pretty useful for machine learning, CritiqueBrainz [2] for collecting user reviews of songs, albums and more, ListenBrainz [3], an open scrobbling service a group of people including former last.fm employees initially hacked together in a weekend, and finally BookBrainz [4], which tries to be what MB is but for books.

During the last year the people running MB have worked on getting companies using the data to support the project resulting in a quite impressive list of supporters [5] including big names like Google, Spotify and the BBC.

MB has also collaborated with our fellow data nerds over at the Internet Archive to create the Cover Art Archive. [6]

In general the project is run by people who equally love both data and hacking. Feel free to stop by on the IRC channels #musicbrainz and #metabrainz on freenode!

[1]: https://acousticbrainz.org/
[2]: https://critiquebrainz.org/
[3]: https://listenbrainz.org/
[4]: https://bookbrainz.org/
[5]: https://metabrainz.org/supporters
[6]: https://coverartarchive.org/

Cover Art Archive is an amazing database and I try to contribute as often as I can. I have been able to find entire sleeves including lyrics, short anecdotes etc. for a lot of obscure music I have.

Wow. This brings back memories. At uni in the early 2000's I hacked up a geeky "last.fm" inspired music stat service. The idea was to be able to reliably track music being played without needing a plugin for winamp/foobar2000/other media player and without needing the mp3 file to have meta data.

I lightly modified a version of the Filemon driver from Sysinternals and wrote a little C program that used the driver to monitor for mp3s being played and then grab the perceptual audio hash of the file using trm.exe from Musicbrainz. It then sent the resulting fingerprint off to my website (written in glorious PHP3 no less!) and you could login with an account to see stats on the music you'd been listening to (done with meta data pulled from Musicbrainz).

Surprisingly, it worked reasonably well... though I'm very sure that if I looked at the code now I'd run away screaming.

Really cool to see they're still going strong after all these years!

FWIW, you may be interested in ListenBrainz[1], an open alternative to Last.FM. We're just about to launch the beta which should be the milestone for when we'll promise to keep submitted listens around Forever™. :)

[1]: https://listenbrainz.org/

Great work! Could you comment on the difference between your offering with ListenBrainz and LibreFM?

Libre.FM sets out to be a ~1:1 (open) "clone" of Last.FM (or at least the AudioScrobbler part of Last.FM), while ListenBrainz aims to improve on Last.FM/AudioScrobbler. E.g., the AudioScrobbler protocol only allows a fixed subset of metadata items to be submitted, while ListenBrainz's native API allows you to submit basically all the data you have on the file.

Compare the seven item-specific metadata fields of http://www.last.fm/api/show/track.scrobble (artist, track, album, trackNumber, mbid, albumArtist, duration) to https://listenbrainz.readthedocs.io/en/latest/dev/json.html#... As ListenBrainz is part of the MetaBrainz "umbrella", one of our own main highlights is that we can now submit all MBIDs associated with a file, not just the Recording MBID (i.e., Artist MBID(s), Release MBID, Release Group MBID, Track MBID, Work MBID(s), possibly Label MBID(s), etc.), but also things like language, performers, AcoustIDs, ...
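For a feel of what that richer submission looks like, here is a minimal sketch that builds a single-listen body. The top-level shape (`listen_type`, `payload`, `listened_at`, `track_metadata`, `additional_info`) and the MBID key names follow the ListenBrainz JSON docs as I understand them; double-check them against the linked page. The helper name is hypothetical, and the function only builds the dict, it does not send anything.

```python
import time
from typing import Optional


def build_single_listen(
    artist_name: str,
    track_name: str,
    release_name: Optional[str] = None,
    listened_at: Optional[int] = None,
    additional_info: Optional[dict] = None,
) -> dict:
    """Build a ListenBrainz-style "single" listen body (hypothetical helper)."""
    track_metadata = {"artist_name": artist_name, "track_name": track_name}
    if release_name:
        track_metadata["release_name"] = release_name
    if additional_info:
        # Drop empty values so we only submit what we actually know,
        # e.g. recording_mbid, release_mbid, artist_mbids, work_mbids.
        track_metadata["additional_info"] = {k: v for k, v in additional_info.items() if v}
    return {
        "listen_type": "single",
        "payload": [{
            # Unix timestamp of when the track was played.
            "listened_at": int(time.time()) if listened_at is None else listened_at,
            "track_metadata": track_metadata,
        }],
    }
```

The body would then be POSTed as JSON to the `submit-listens` endpoint with the user's token in an `Authorization: Token ...` header; consult the ListenBrainz API docs for the exact URL.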

Also, ListenBrainz is linked up with MessyBrainz[1], which should act as a buffer so that even listens submitted without MusicBrainz identifiers can eventually be linked up to the MusicBrainz database.

[1] https://messybrainz.org/

Hah, I did my own stat service too, inspired in early 2000's by AudioScrobbler and Last.fm before they joined forces. I was disappointed by the completeness and variety of statistics they offered, so the only option was to create my own.

My solution was to fill in full metadata for all my tracks, though, a pretty big task in itself that I also had to half-automate to achieve. I did consider how to release the stats service for others, too, but realized the number of people with impeccable metadata would be way too low.

My system is actually still running in the background, listening for MPRIS D-Bus track change events and writing them to a simple text file, occasionally flushing the changes from the file to a database for the stats display service -- written also with PHP like most web interfaces of the time.
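The "append to a text file, occasionally flush to a database" pattern can be sketched with the standard library alone. The D-Bus/MPRIS listener is left out (it needs a third-party binding), so assume some hypothetical handler calls `append_event` on each track change; SQLite stands in for whatever database the commenter uses, and the table layout is made up.

```python
import sqlite3
import time
from pathlib import Path
from typing import Optional


def append_event(log_path: Path, artist: str, album: str, title: str,
                 played_at: Optional[int] = None) -> None:
    """Append one track-change event to the log as a tab-separated line."""
    ts = int(time.time()) if played_at is None else played_at
    # Tabs or newlines inside tags would break the line format, so flatten them.
    fields = [str(ts)] + [f.replace("\t", " ").replace("\n", " ") for f in (artist, album, title)]
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write("\t".join(fields) + "\n")


def flush_to_db(log_path: Path, db: sqlite3.Connection) -> int:
    """Move buffered events from the log into SQLite; return the number of rows moved."""
    if not log_path.exists():
        return 0
    # Rename first so events written during the flush land in a fresh log file.
    pending = log_path.with_name(log_path.name + ".flushing")
    log_path.replace(pending)
    rows = [line.split("\t") for line in pending.read_text(encoding="utf-8").splitlines()]
    rows = [r for r in rows if len(r) == 4]  # skip malformed lines
    db.execute("CREATE TABLE IF NOT EXISTS plays "
               "(played_at INTEGER, artist TEXT, album TEXT, title TEXT)")
    with db:  # commits on success, rolls back on error
        db.executemany("INSERT INTO plays VALUES (?, ?, ?, ?)", rows)
    pending.unlink()
    return len(rows)
```

Keeping the hot path as a plain file append means the listener never blocks on the database. A crash between the rename and the unlink would leave a `.flushing` file that this sketch does not recover.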

What ends up on the first page (and when) never ceases to surprise me. I've used this for I don't know how long. Could it be 15 years? Their official tagging client (Picard) is OK, but I prefer tagging using Mp3tag and the MusicBrainz database.

I fear now an entire generation (Gen Z) grows up with services like Youtube, Musical.ly, Spotify and iCloud/GoogleCloud for their photos.

They never interact with image and audio files; they don't know about metadata at all. They don't use a notebook or a PC. They are vendor-locked to iOS or Android. They are not dumb, but fewer and fewer have the initiative to search around and try out new things outside the box. They stay inside their apps; they don't know the vast web outside that can be searched with Google. (It depends on parents and schools to inspire them to try out more.)

Aww piffle. :)

It's not like the previous generation of people all explored the vast web. They didn't. They didn't even use it until relatively recently.

The percentage of those that are intrigued by technology and have the wanderlust to explore the digital landscape are probably exactly the same (perhaps more now) as the previous generation. Those people that you refer to as "vendor locked" now would never have even used computers in the past generations, or used them only for Office apps.

Seconded. Mp3tag and the MusicBrainz database are marvelous. I used to have a music collection of 25,000+ tracks, and once I discovered the aforementioned tools it was an absolute blessing, especially for tracks that had poor (or non-existent!) tags.

I'm fully into Spotify now, minus the 1000 or so tracks I couldn't match, but damn, does talking about this take me back. Lugging around my 160GB iPod Classic and still not being able to fit all my music. UT2004 instagib & Counter-Strike LAN parties and swapping entire media libraries, movies and series included. It was a fun time :)

I upvote many things that I already know of to let others become aware of them. It's a bit like xkcd #1053 [1]: when I encourage others to read about something, they might then write comments about that thing offering new insights etc. that are of value to me. Additionally, having become aware of that thing might lead them to care about it and to do something related to it that is of value to me in the future. But really, my primary motivation is simply that I think the things I like deserve my upvote so that others can enjoy them too, regardless of any future utility to me.

[1]: https://xkcd.com/1053/

Try Jaikoz (http://www.jthink.net/jaikoz/). It's Shareware but very capable. Uses MusicBrainz and Discogs, fingerprinting and whatever.

I'd just like to reiterate how utterly amazing MusicBrainz is. It's so extremely useful that I decided to make it the backbone of a new playlist format I developed[1], one which (roughly) uses MusicBrainz IDs instead of filenames for playlists.

This makes playlists resistant to filename changes, moves, or even losing all the actual audio tracks and having to buy them again, all because MusicBrainz provides such accurate metadata.
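The core of an ID-based playlist is a lookup from recording MBID to whatever file currently holds that recording. The sketch below is not the UPL format itself (see the linked site for that); the function names are hypothetical and the library is assumed to have already been scanned into `(path, recording_mbid)` pairs, for example from tags written by Picard.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


@dataclass
class Resolved:
    found: List[Path]   # files to play, in playlist order
    missing: List[str]  # MBIDs with no matching file in the library


def build_index(library: Iterable[Tuple[Path, str]]) -> Dict[str, Path]:
    """Map recording MBID -> file path, keeping the first copy of duplicates."""
    index: Dict[str, Path] = {}
    for path, mbid in library:
        index.setdefault(mbid.strip().lower(), path)
    return index


def resolve_playlist(mbids: Iterable[str], index: Dict[str, Path]) -> Resolved:
    """Turn a playlist of recording MBIDs into file paths for the current library."""
    found: List[Path] = []
    missing: List[str] = []
    for mbid in mbids:
        path = index.get(mbid.strip().lower())
        if path is None:
            missing.append(mbid)
        else:
            found.append(path)
    return Resolved(found, missing)
```

Because the playlist stores only IDs, moving or re-encoding files just means rebuilding the index; the `missing` list tells you what you would need to re-acquire.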

[1]: http://universalplaylist.stavros.io/

I really like the idea of your universal playlist format, are there any players that support it?

None yet, I'm afraid, although I'm writing a beets plugin to convert from pls to UPL and back. The problem with the plugin is that it's just not that useful unless you have a whole playlist manager to go along with it, which beets doesn't do very well right now. I'd be very happy if there was a player with good playlist management functionality that would support it, or that I could write support for, but I don't personally use any...

I love MusicBrainz and have been using it for a project of mine for the past few years. In the course of developing that project, I ended up making a GraphQL interface to the MusicBrainz API: https://github.com/exogen/graphbrainz

You should try out the demo queries linked from that README if you want to get a sense of the depth of information available in their database.

I've been using MusicBrainz' Picard to tag my music files (that I acquired 100% legitimately, I assure you.) for a few months now.

They seem to have everything I throw at them, except for: 1) Extremely new releases (on the order of a-few-hours-after-release) 2) Some niche songs that haven't been officially released (soundtracks for some Korean television shows)

I also use it to organize music. Once it identifies something, you can have it save to an artist/album/00-name type pattern (the path is configurable). Before, I had issues like track listing problems, artists or albums with 'and' vs '&', capitalization inconsistencies, etc. This resolves most things like that.
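Picard does this with its own configurable file-naming script; the snippet below is not Picard's code, just a Python sketch of the same artist/album/00-name idea. The function names are hypothetical, and the only cleanup it does is replacing characters that common filesystems reject; the consistency ('and' vs '&', capitalization) comes from using MusicBrainz's names as input.

```python
import re
from pathlib import PurePosixPath

# Characters that are invalid in file names on Windows and/or POSIX systems.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe(component: str) -> str:
    """Make one path component safe; trailing dots are stripped for Windows."""
    cleaned = _UNSAFE.sub("_", component).strip().rstrip(".")
    return cleaned or "_"


def target_path(album_artist: str, album: str, track_number: int,
                title: str, ext: str) -> PurePosixPath:
    """Build artist/album/NN-title.ext, zero-padding the track number."""
    filename = f"{track_number:02d}-{safe(title)}.{ext.lstrip('.')}"
    return PurePosixPath(safe(album_artist), safe(album), filename)
```

For example, `target_path("AC/DC", "Back in Black", 1, "Hells Bells", "flac")` gives `AC_DC/Back in Black/01-Hells Bells.flac`, since the slash in the artist name would otherwise create an extra directory.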

How does that work? How do you use MusicBrainz to tag music files?

> MusicBrainz Picard

is a metadata editor that does audio fingerprinting and, where that is not available, manual tagging (while fetching data from MusicBrainz): https://picard.musicbrainz.org/

It's good that MusicBrainz exists as an open data project and continues to stand up against Sony America & Sony DADC's de facto monopoly on audio+video metadata and digital supply for the media industry.

MusicBrainz is the third project of its kind. The two older projects were bought by companies in the media industry (Sony and Magix). Such a database becomes useless if it doesn't receive updates.

First there was CDDB (short for Compact Disc Database), a database for software applications to look up audio CD (compact disc) information over the Internet. This is performed by a client which calculates a (nearly) unique disc ID and then queries the database. As a result, the client is able to display the artist name, CD title, track list and some additional information. CDDB was invented by Ti Kan around late 1993 as a local database that was delivered with his popular xmcd music player application. CDDB is a licensed trademark of Gracenote. In March 2001, CDDB, now owned by Gracenote, banned all unlicensed applications from accessing their database. As of June 2, 2008, Sony Corp. of America completed acquisition (full ownership) of Gracenote. https://en.wikipedia.org/wiki/CDDB

Then there was freedb. freedb is a database of compact disc track listings, where all the content is under the GNU General Public License. To look up CD information over the Internet, a client program calculates a hash function from the CD table of contents and uses it as a disc ID to query the database. If the disc is in the database, the client is able to retrieve and display the artist, album title, track list and some additional information. It was originally based on the now-proprietary CDDB (Compact Disc DataBase). On October 4, 2006, freedb owner Michael Kaiser announced that Magix had acquired freedb. On June 25, 2007, MusicBrainz – a project with similar goals – officially released their freedb gateway. The latter allows users to harvest information from the MusicBrainz database rather than freedb. https://en.wikipedia.org/wiki/Freedb

Heh. I was on a team of Amazon engineers in Edinburgh back in 2007 who were tasked with building "another IMDB that we can sell ads on", and we ended up using a MusicBrainz dump to start up a music encyclopedia website. The idea was to take the raw data but organise it in a more user friendly way, add easy click-to-edit user participation and gamification, etc.

I remember seeing Robert Kaye wandering around the office when he visited us to talk licensing terms, although as the most junior employee I didn't get to talk to him myself. We also chatted to Col Needham, the founder of IMDB, and asked him "so, how do you become a massive media-encyclopedia site?"; his answer was "it's easy, just start 17 years ago."

Really we had no idea what we were doing, and although we got some surprisingly dedicated users (we sent T-shirts to a couple who'd contributed hundreds of thousands of edits!), the site folded after a few years.

I'm very glad to see that MusicBrainz outlived us and continued to thrive :)


Heh. It's about 17 years since MusicBrainz started now. :)

I discovered MusicBrainz Picard about a year ago and it handled my collection pretty flawlessly.

Ever since then, I've wanted to know whether there are other maintained/curated music databases.

I also didn't realize at first that they offer a public API. The Picard client was decent, but I'd be interested in a command-line solution. Does anyone know if this exists?

I will wholeheartedly echo the sibling's suggestion for beets (http://beets.io/).

No music makes it into my collection unless it's been imported via beets. It has a powerful import/query/alter API, sufficient config options, and a nice plugin system.

I am not entirely sure this is what you are looking for, but Beets [0] is a really cool command-line music organiser which uses the MusicBrainz database.

[0] http://beets.io/

Full props to Robert Kaye, the founder. He's been raising this child for 15 years now.

I don't think I know anyone who loves open data and transparency more than Robert. MusicBrainz is open source to the core and all the development as well as most of the business stuff happens on public IRC channels and our JIRA.

It's been a crazy time lurking on IRC and seeing him juggle both huge technical problems that are worth more than a few blog posts as well as most of the business work for the last two years (until he recently hired someone to handle most of the latter).

Of course one shouldn't make this too much of a personality cult and there's always been an amazing team working with him but I can't imagine anyone other than Robert taking MusicBrainz to what it's become today.

They had a pretty close call at some point and nearly died. All is well that ends well.

There also was freedb (http://www.freedb.org/), not sure if that is still kept up-to-date.

It seems like MAGIX Software is now running FreeDB.

As far as I understand it, its data quality really isn't great, and according to their statistics [1], they only had just over 7,000 album requests last week.

[1]: http://www.freedb.org/en/statistics__album_requests.14.html

I used the MusicBrainz API a while back for a side project that got me sued for some reason (http://tcrn.ch/2rEox3h).

As I recall, it was pleasant to work with and did what I needed it to quite nicely, aside from a feature that my code had depended on being removed — anonymous/unauthenticated search — at which point the project was already basically dead and not worth trying to fix (that was just the last nail in the coffin). In any case, nice to see that it's still active.

Unauthenticated search is still possible and allowed, just as long as you use a proper User Agent string. See https://musicbrainz.org/doc/Development/XML_Web_Service/Vers... "Do I need an API key?" and "Do I need to provide authentication?" (In fact, I don't think there is a way to do authenticated searches.)
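As a sketch of such an unauthenticated search, the request below targets the `/ws/2/recording` search endpoint with a Lucene-style query, asks for JSON via `fmt=json`, and sets a descriptive User-Agent of the form `AppName/version ( contact )`. The application name and contact address are placeholders, and the function is a hypothetical helper; it only builds the request.

```python
import urllib.parse
import urllib.request

WS_ROOT = "https://musicbrainz.org/ws/2"


def recording_search_request(query: str, app: str, version: str, contact: str,
                             limit: int = 10) -> urllib.request.Request:
    """Build (but do not send) a MusicBrainz recording search request."""
    params = urllib.parse.urlencode({"query": query, "fmt": "json", "limit": limit})
    return urllib.request.Request(
        f"{WS_ROOT}/recording?{params}",
        headers={
            # Identify your application so MusicBrainz can contact you if needed.
            "User-Agent": f"{app}/{version} ( {contact} )",
            "Accept": "application/json",
        },
    )
```

Sending it is then `urllib.request.urlopen(req)` plus `json.load` on the response. MusicBrainz also asks clients to rate-limit themselves (on the order of one request per second); check the current web service docs for the exact rules.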

So, given that you had already predicted you'd get sued, who sued you and what for?

Well, I hadn't exactly predicted it — the quoted joke was from the top of the site's FAQ page, meant to imply that users of the service would get in trouble for being complicit in copyright infringement / piracy — but it also wasn't all that surprising when it happened. I was sued by the current Napster (https://en.wikipedia.org/wiki/Napster_(streaming_music_servi...) for alleged trademark infringement.

In the end we settled before it ever went to trial, so unfortunately that quote never found its way into a court transcript.

The code that runs it is open too: https://github.com/metabrainz/

Always nice to see something of such utility pop up. MusicBrainz has most assuredly been around for what seems like forever now, and there's a reason for that: their tag database is second to none as far as I'm concerned. Unfortunately for me, the only music I keep locally is my own music that I've made, and I can almost guarantee that that wouldn't be on there. Plus, I tag all my music properly to a point that might seem religious and obsessive because I hate music files without metadata (which is why I export in mp3 as well as wav; wav for higher quality, and mp3 for labeling purposes; I could probably just use flac but compressed audio like mp3 also has the benefit of being less space intensive).

Either way, nice to see it.

If you want your own music to be properly tagged with MBIDs you could just add it to the database yourself!

Unlike Wikipedia, there is truly nothing too niche to be part of MB's database. My favourite is a CD with the music that plays while you're on hold calling Lidl Spain. [1]

[1]: https://musicbrainz.org/release/c9a6a6e5-4bb8-4738-a1d6-411f...

> which is why I export in mp3 as well as wav; wav for higher quality, and mp3 for labeling purposes; I could probably just use flac but compressed audio like mp3 also has the benefit of being less space intensive

You could replace the WAVs with FLACs and save a bunch of space on that account. Maybe enough that you realise you don't need to keep both lossless (WAV/FLAC) and lossy (MP3) copies around. Since FLAC is lossless, it will have the exact same audio quality as the WAV files they'll be sourced from. (And, as you mention, FLAC supports tags directly.)

If all you're using MP3 for is the space saving and metadata, you might like ogg/opus. It's the Opus codec (the state of the art in audio codecs right now) in an Ogg wrapper.

You'll find a portable player capable of handling vinyl before you find one that plays ogg/opus, but desktop software is mostly fine.

Not quite, you can install Rockbox on pretty much any portable player. It supports Opus and FLAC among many other formats.


VLC on Android supports Opus. Unfortunately, most other players don't (I think this is actually a bug in the Android media scanning library and not in the players themselves, though).

If an open source music format isn't a requirement for you, AAC is as efficient as Opus and playback support is universal – no version of iOS or Android has ever shipped without AAC decode support, for example.

Mp3tag also handles MP4/M4A files wonderfully.

AAC when well encoded sounds great, but there are too many sub-par encoder implementations that just don't sound good - sometimes worse than MP3.

If you're encoding AAC, use either Apple's encoder or one of Fraunhofer's: FhG-AAC, available as a paid plug-in or free with Winamp, or the open-source FDK AAC implementation, originally for Android but now available cross-platform.

> …there are too many sub-par encoder implementations that just don't sound good - sometimes worse than MP3.

Whoa, I'd never imagined that could be the case and appreciate the heads-up. I've only had experience with the Apple and Fraunhofer encoders so far.

Mind calling out the offender(s)? This summer I'm releasing a best practices guide (and hopefully tools) to help podcasters move to AAC, and that would be useful to mention.

The FFMpeg wiki has some information, including calling out its own "vorbis" encoder as sub-par and not recommended (but apparently they also support a different "libvorbis" encoder that does not suffer from problems?) See https://trac.ffmpeg.org/wiki/Encode/HighQualityAudio

Also for AAC in particular there are a few options https://trac.ffmpeg.org/wiki/Encode/AAC

One workaround is to put your Opus audio in either a webm or mka container.

Offtopic question: Is there a similar tool for managing or tagging metadata for movies and television?


Very powerful file renaming tool that also leverages TheTVDB, AniDB, TheMovieDB, etc for TV and movies. Both GUI and command-line versions available, I always use the (free) command-line version even though I paid for the GUI.

It also accepts scripts, has smart/fuzzy matching, and is an all-around good renaming tool for other things too.

For television, you might find http://thetvdb.com/ does what you want.

If your filenames are already decent and you just want to be able to better browse/manage your content, I fully recommend Plex. It'll pull down names, artwork, descriptions, and everything else.

tvnamer uses thetvdb.com's API in a similar way to picard / musicbrainz.

It's a great tool. I had one problem / wishlist that I couldn't work out, and got a response from the author a few minutes after filing an issue, explaining how to solve my problem.

There's some risk that thetvdb.com's change of API later this year may be a problem, but I suspect the risk is low.

https://github.com/dbr/tvnamer , with packages available for most distros AFAIK.

I hope these guys don't flip like Gracenote did...Gracenote was all crowdsourced user contributions (for a long time at least) but then they closed off the data and sold it to Sony for $250+ million.

MusicBrainz was created partly as a reaction to Gracenote's handling of CDDB; see https://musicbrainz.org/doc/About and https://musicbrainz.org/doc/About/History

It would be completely against the spirit of the project to close in on itself, and as Leo_Verto mentioned, also pretty hard. With the core data available as CC0 and all the source code needed to run the servers, anyone could legally take all the data and set up a "LibreMusicBrainz" in a matter of hours in the unlikely event that the MetaBrainz Foundation (the organisation created to support MusicBrainz and the other *Brainz projects) should ever flip.

All MB data is licensed as either CC0 (for core data) or CC BY-NC-SA for user-related data. [1] As far as I understand it, this can't be changed unless every user who ever edited data agreed to that.

[1]: https://musicbrainz.org/doc/About/Data_License

Whilst it is great to be able to tag music files with masses of MB metadata, I have a feeling that the true value of the MB database has yet to be realised.

Because of the underlying design and relationships between albums and recordings and musical pieces (or works), once it reaches some level of critical mass you can start to mine the data for things like:

Who has recorded versions of "Spring" from Vivaldi's Four Seasons in London?

Which artists have recorded both Grieg's Piano Concerto and Chopsticks?

Who has recorded "A Day in the Life" other than by the Beatles?
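The kind of query described above can be sketched over a toy in-memory model. This is a simplification under stated assumptions, not the MusicBrainz schema or API: each recording is reduced to an artist, the work it performs, and an optional recording place. The artist names other than The Beatles are hypothetical placeholders, not real data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Recording:
    artist: str
    work: str                    # the composition this recording performs
    place: Optional[str] = None  # where it was recorded, if known


def artists_for(recs: Iterable[Recording], work: str) -> Set[str]:
    """All artists with at least one recording of `work`."""
    return {r.artist for r in recs if r.work == work}


def covered_by_others(recs, work, original_artist):
    """Artists who recorded `work`, excluding `original_artist`."""
    return artists_for(recs, work) - {original_artist}


def artists_with_both(recs, work_a, work_b):
    """Artists who recorded both works (set intersection)."""
    recs = list(recs)  # allow single-pass iterables to be scanned twice
    return artists_for(recs, work_a) & artists_for(recs, work_b)


def recorded_at(recs, work, place):
    """Artists with a recording of `work` made at `place`."""
    return {r.artist for r in recs if r.work == work and r.place == place}


# Hypothetical placeholder data, for illustration only.
TOY = [
    Recording("The Beatles", "A Day in the Life"),
    Recording("Artist A", "A Day in the Life"),
    Recording("Artist B", "Piano Concerto in A minor"),
    Recording("Artist B", "Chopsticks"),
    Recording("Artist C", "Chopsticks"),
    Recording("Orchestra D", "Spring (The Four Seasons)", place="London"),
    Recording("Orchestra E", "Spring (The Four Seasons)", place="Vienna"),
]
```

With `TOY`, `covered_by_others(TOY, "A Day in the Life", "The Beatles")` gives `{"Artist A"}`. The point is that once recordings are linked to works (and places), these questions become simple set operations.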

A related site - freesound.org

Both MusicBrainz and Freesound are truly international in scope. They cover metadata and sound for Indian classical music and such genres too. The CompMusic research team publishes to both of these.

Edit: CompMusic url - http://compmusic.upf.edu/

One of my side projects is a music recommendation system. MusicBrainz has been great for this, tying together all the music services out there. The biggest perk is that you can run a slave (replica) of their database and have it replicate on an interval.

Are you aware of the AcousticBrainz[1] and (upcoming/in-development) ListenBrainz[2] projects? We at MetaBrainz (the organisation behind MusicBrainz and the other *Brainz projects) really hope that the combination of data from MusicBrainz, AcousticBrainz, and ListenBrainz will enable powering a lot of open recommendation engines. :)

[1]: https://acousticbrainz.org/ [2]: https://listenbrainz.org/

No, I wasn't. Thank you for sharing :)

Any more info on your recommendation system? Anything we can check out?

It's dead in the water right now. The code base I wrote is atrocious; it was my first Python learning project. I was initially working on a system that added stuff to MPD, then pivoted to generating Spotify playlists. Then last week something changed, and adding songs from Spotify in my Mopidy stopped working.

I guess it's important to start with my listening style. It goes something like David Guetta -> Frank Sinatra -> Depeche Mode -> Slipknot -> Rocky Horror Picture Show. This is a very nomadic pattern; I essentially wanted a really random music system. I had a lot of thoughts on this, and sort of wish I had the time to work on it almost full time.

* Built-in positive weighting of the song. If you try to increase the volume, re-queue that song; it's done well.
* Built-in negative weighting of the song/selection. If you skip within the first minute of the song, negatively weight that genre and song.
* Smart shuffling. None of this random integer crap. I want it to see that there are X genres in my playlist; the next song will be of a different genre, year, and country.
* Proper geographical distribution. If you're skewing towards just the US, then stop adding artists from the US. My system went on a South African kick for whatever reason, and <3.
* Ease of discovery: new artists pop up all the time. This was the web interface I was working off of; I have a Surface Pro at my desk with the frontend, giving easy access to history, lyrics, etc.
* Analyze the song's BPM, key, wave pattern, etc.
* IBM tone analysis of lyrics to determine whether you want happy, sad, or whatever style of music.
* The frontend tied into YouTube, providing a link to the official music video if present.
* Show a tree of how the artists were added. My absolute favorite was 32 levels deep, going from Dropkick Murphys to Blondie.
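The "smart shuffling" and skip-penalty ideas in that list can be sketched roughly as follows. This is a hypothetical illustration, not the commenter's code: the `Track` fields, the 1.5 boost, the 0.5 and 0.8 penalty factors, and the class and method names are all assumptions; only the 60-second threshold comes from "within the first minute".

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    title: str
    genre: str
    year: int
    country: str


class SmartShuffler:
    """Hypothetical shuffler: weighted random picks that avoid repeating genre, year, or country."""

    def __init__(self, tracks, seed=None):
        self.tracks = list(tracks)
        self.track_weight = {t: 1.0 for t in self.tracks}  # per-track multiplier
        self.genre_weight = {}  # per-genre multiplier; missing genres count as 1.0
        self.rng = random.Random(seed)

    def volume_up(self, track):
        # Positive signal: make this track more likely to come back.
        self.track_weight[track] *= 1.5

    def skipped(self, track, seconds_played):
        # Negative signal only when the skip happens within the first minute.
        if seconds_played < 60:
            self.track_weight[track] *= 0.5
            self.genre_weight[track.genre] = self.genre_weight.get(track.genre, 1.0) * 0.8

    def _weight(self, track):
        return self.track_weight[track] * self.genre_weight.get(track.genre, 1.0)

    def next_track(self, current):
        others = [t for t in self.tracks if t != current]
        if not others:
            return None
        # Prefer tracks that differ from the current one in genre, year AND country;
        # fall back to any other track if none qualify.
        diverse = [
            t for t in others
            if t.genre != current.genre and t.year != current.year and t.country != current.country
        ]
        pool = diverse or others
        return self.rng.choices(pool, weights=[self._weight(t) for t in pool], k=1)[0]
```

Weighted random choice keeps some serendipity while the feedback signals gradually shift the odds; a hard "different genre, year, and country" filter is the simplest reading of the smart-shuffle bullet.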

I'm a heavy Spotify user. Discover Weekly usually sticks to EBM for me and rarely gives me anything new. I want to listen to stuff from Korea, India, Africa, the 20s, the 50s, and now. None of the systems I've seen work like that. I had a couple of friends use it, and they really liked it, barring the MPD requirement.


TouchaTouchaTouchMe and HarmonyGen

This was done once before. It was called CDDB.[1] That went from open to limited access to totally proprietary. Fortunately, this new one is under GPLv3, which makes it tough to pull that one again.

There's FreeDB (http://www.freedb.org) which does roughly the same thing, starting from the old CDDB database before Gracenote, and then Sony, bought it. Their database dump is supposedly available.

[1] https://en.wikipedia.org/wiki/CDDB

GPLv3 is a software license, i.e. not appropriate for data. I suspected the folks behind MusicBrainz would know that, so I did some looking around and found this:

"The MusicBrainz Database is split into two components for licensing purposes.

"Core data

"The core data of the database is licensed under the CC0, which is effectively placing the data into the Public Domain. This means that anyone can download and use the core data in any way they see fit. No restrictions, no worries!

"Supplementary data

"The remaining portions of the database are released under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license. This allows for non-commercial use of the data as long as MusicBrainz is given credit and that derivative works (works based on the CC licensed data) are also made available under the same license."

Source: https://musicbrainz.org/doc/About/Data_License

I've been trying to figure out which music metadata database is worth my time "improving", since three are commonly used: MusicBrainz, Discogs, and Rate Your Music. I currently use Discogs because you can expect high-quality metadata, and I use that data in a foobar2000 plugin to tag my music correctly.

It's the same constant questioning I do for wiki sites, since there are multiple for most subjects. Am I alone in this struggle? I wouldn't mind being talked out of using Discogs for the sake of creating and managing metadata that will be the most useful.

I tried to play with the API, but I almost always get "The MusicBrainz web server is currently busy. Please try again later.".

Is there a more reliable way to query this data without running a full server on your own?

MB is a victim primarily of its own success - with a lot of Kodi clients requesting data as well as MB's own tagger, Picard - but also of spammers leeching bandwidth.

This is not the first occasion when demand has exceeded capacity, but any capacity added soon gets swallowed up.

Suffice it to say that the MB team is urgently looking into how to stop this happening.
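If you do query the public web service rather than a mirror, a client that throttles itself and backs off on HTTP 503 at least avoids making the congestion worse. The sketch below assumes MusicBrainz's published guidance of roughly one request per second per client and a descriptive User-Agent header (verify both against the current API documentation); the app name and contact address are hypothetical placeholders, and the `/ws/2/<entity>?query=...&fmt=json` URL shape is the standard search endpoint.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

API_ROOT = "https://musicbrainz.org/ws/2"
# Hypothetical placeholder: replace with your own application name and contact.
USER_AGENT = "ExampleApp/0.1 ( contact@example.com )"


class PoliteClient:
    """Rate-limited JSON client that retries with exponential backoff on HTTP 503."""

    def __init__(self, min_interval=1.0, max_retries=5,
                 opener=urllib.request.urlopen, sleep=time.sleep, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between requests (assumed 1 req/s)
        self.max_retries = max_retries
        self.opener, self.sleep, self.clock = opener, sleep, clock
        self._last = None  # time of the previous request

    def _wait_turn(self):
        # Sleep just long enough to keep at least min_interval between requests.
        if self._last is not None:
            delay = self.min_interval - (self.clock() - self._last)
            if delay > 0:
                self.sleep(delay)
        self._last = self.clock()

    def get(self, entity, **params):
        params["fmt"] = "json"
        url = f"{API_ROOT}/{entity}?{urllib.parse.urlencode(params)}"
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        backoff = self.min_interval
        for attempt in range(self.max_retries):
            self._wait_turn()
            try:
                with self.opener(req) as resp:
                    return json.loads(resp.read().decode("utf-8"))
            except urllib.error.HTTPError as err:
                # Only "server busy" (503) is retried; give up on the last attempt.
                if err.code != 503 or attempt == self.max_retries - 1:
                    raise
                self.sleep(backoff)  # extra pause on top of the normal rate limit
                backoff *= 2
        raise RuntimeError("max_retries must be at least 1")
```

For example, `PoliteClient().get("recording", query="smells like teen spirit")` would issue one search request. The `opener`, `sleep`, and `clock` parameters exist only so the retry logic can be exercised without network access.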

What about http://linkedbrainz.org ? Is it a dead project?

LinkedBrainz is not something that the MetaBrainz team is directly involved with, however, according to themselves, «[they are] back, if basic for now.»

(Note that there's also a "GraphBrainz" project (also not something MetaBrainz is involved with), posted about earlier here: https://news.ycombinator.com/item?id=14479031 )

Looks like GraphBrainz stands for "MusicBrainz + GraphQL". Unfortunately, this has nothing to do with RDF.

mappable.com uses MusicBrainz data to let users build a hierarchical, Voronoi-based artist/album/song exploration tool.

Datomic, Rich Hickey's majestic Datalog-driven, time-traveling database, uses MusicBrainz data for its tutorials. Check it out if you want to play with this data in a really novel way.

Link? I tried looking around in http://www.datomic.com/support.html but I couldn't find any references to MusicBrainz (or music data of any kind).

Thanks! Looks like it's a fairly old post (and thus working with an old version of the schema), but still interesting to see how it's being used "out there". :)
