
The music classifying nightmare - ux
http://blog.pkh.me/p/15-the-music-classifying-nightmare.html
======
redbad
You're making the problem a lot harder than it needs, or ought, to be. The
most telling example, IMO, is the Aphex Twin one...

    
    
        > Oh, and his real name is Richard David James. What are you 
        > supposed to use for the file system directories and files 
        > name? His name? The most common nickname? Both? One file 
        > system solution is to have symbolic links (do you link 
        > Richard David James to Aphex Twin, or vice versa?). For 
        > tags, if you don't want to lose information, this is 
        > another story...
    

You should—obviously—not attempt to "link" AFX to Aphex Twin to GAK to Blue
Calx to etc. etc. The artist behind all of the monikers has made a deliberate
decision to release work under different names. Organize accordingly.

Many of your nightmare scenarios appear to be a result of the same kind of
over-thinking, or invention of nonsensical requirements. How are you supposed
to deal with Japanese artist names? It literally doesn't matter—pick a scheme
you can understand, and be consistent. How are you supposed to deal with
multiple artists? List them, separated by commas, in the artist field. If they
appear on an album released by a different group or person, use the "album
artist" ID3 field. And since (if?) you use the ID3 fields to store your
metadata, and presumably navigate your collection through an interface over
that metadata, all of your questions regarding how to store files physically
on disk are totally irrelevant, as long as you pick some scheme which doesn't
generate conflicts. The default iTunes structure (Artist/Album/01 - Song
Title.mp3) seems to work fine, for example.

~~~
ux
Well, it's the same person behind all of them. Even if the artist tried to
give a different "personality" to his music over time, following all his
creations is actually a good listening experience. The "hey I just want to
listen to the musical evolution of this guy" feeling isn't rare IMO.
Grabbing all his nicknames is kind of a problem.

Of course, you have a point: I'm making the problem harder than it needs to
be, and you can just not care about most of these issues. Hey, that's actually
the conclusion I reached. But still, I believe the current solutions are not
optimal if you want to match N songs, store all the related "context"
information, or just keep a consistent way of storing them.

~~~
mquander
But that's not a problem, either. In my music collection I have, e.g. tags
ARTIST=AFX and FILED_UNDER=Aphex Twin. Then when you make a playlist with your
favorite music player, you can just sort things by FILED_UNDER, if you prefer.
What's so hard about that?
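
A minimal sketch of that idea, assuming the tags have already been read into plain dicts. The `FILED_UNDER` tag is the custom one described above; the records and titles are made up for illustration.

```python
# Hypothetical tag records; a real player would read these from the files.
tracks = [
    {"ARTIST": "AFX", "FILED_UNDER": "Aphex Twin", "TITLE": "track one"},
    {"ARTIST": "Some Other Band", "TITLE": "track two"},
    {"ARTIST": "Aphex Twin", "TITLE": "track three"},
]


def filing_key(track):
    # Prefer the custom FILED_UNDER tag; fall back to ARTIST when it is absent.
    return track.get("FILED_UNDER", track["ARTIST"]).casefold()


# sorted() is stable, so tracks with the same key keep their original order.
playlist = sorted(tracks, key=filing_key)
titles = [t["TITLE"] for t in playlist]
```

The AFX track ends up next to the Aphex Twin one, while ARTIST still records the name the release actually carried.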

~~~
ux
Interesting, so you have a way of making the relationship at the end. Now I
could start nitpicking about how you decide to make the link to Aphex Twin and
not his name (or another nick), and how would you make that decision for any
similar cases.

Note that I don't consider this issue the main problem, it's just one I hit a
few times, and I wasn't able to select a correct solution.

~~~
adamzochowski
Discogs tracks the alternative nicks that artists use, and can help you link
songs together. Here is the Aphex Twin example:

<http://www.discogs.com/artist/Aphex+Twin>

Also, Discogs will track which albums were released under different artist
tags. The Foobar + Discogs tagger combination lets you pick whether you want
to use the most common general name (so all tracks are Aphex Twin), or the
alternative name that songs were released under (aka: AFX songs stay as AFX).

\---

This is the same issue as the book "The Running Man" (yes, followed by a movie
with Arnold). Was it written by Richard Bachman, or Stephen King? How does
your public library list the book?

~~~
dangravell
Same for MusicBrainz: 'aliases' <http://musicbrainz.org/doc/Artist#Alias>

------
saucerful
Have you tried Quodlibet? <https://code.google.com/p/quodlibet/>

It comes with a MusicBrainz plugin which allows you to select an album in your
library and search for it on MusicBrainz (e.g. by artist name and year or
number of tracks-- so you really don't need to have much info). Then you can
select a match (there are usually many releases of a given album and it will
actually differentiate between them) and tag your album with the
MusicBrainz tags.

In fact you don't even need to use quodlibet to get this feature. It has a
separate tagging component called "ex falso" which you can run standalone and
then use the player of your choice.

But I would strongly recommend Quodlibet for its organizational capabilities
as well. For example it uses special internal tags (not id3 but stored in a
separate db) that allow you to associate "people" and "performers" to a track
so that the track will appear when you search for any of those people.

Also there are sort tags that allow you to customize where stuff shows up,
e.g. I can have a track with Artist tag "London Symphony Orchestra", composer
tag "Ludwig van Beethoven" (the proper ID3 tags) BUT I give it the artistsort
tag "Beethoven" so it shows up under B. Perfect!

See <https://code.google.com/p/quodlibet/wiki/AudioTags> for more info.

Lastly it has regex search! And you can make "saved searches" e.g. playlists.
And it's lightweight and has an uncluttered (but highly customizable)
interface. And it's very easy to write plugins for it.

~~~
ux
Nice, I'm sure a lot of people would be interested in trying this.

But speaking for myself, I'm disgusted with all of this, so I'm just
maintaining my mess in its current state for now.

Still, I like to see such solutions, and I'd be really interested in a counter
article to what I wrote, dealing with each issue. Even if, in the end, I will
likely not use the given solution.

About the regex search, I'm not sure that's really the solution to the
"textual problem". As mentioned in the article, the music content retrieval
system is in my opinion the future. Echonest and similar services are trying
to achieve something like this. Looking for one artist isn't really what you
actually want most of the time. It's likely you are looking for good music,
and just want to listen to things that sound "like this".

~~~
saucerful
Sorry-- I was unclear. The regex is not for organization. It's just an easy
and flexible way to browse subsets of your library, like smart playlists in
iTunes, but more powerful.

~~~
ux
I was actually talking about browsing. Musical content analysis aims to
provide new ways of representing your music and browsing it.

------
cletus
I only go so far as sanitizing and standardizing my music collection through
Tag&Rename (and I haven't found a good OSX equivalent to this yet sadly). It
gets the data from Amazon in 98% of cases, adds the album art (which I like
having on my player), etc. Then I store the files in:

Artist/Album/Track# - Song name [- Artist name]

The last is only there for soundtracks and other "Various Artists" type
collections.

This is Good Enough [tm] for me. I can sync this across hard drives (backup),
minimize duplicates (although I end up with these through compilations of
various sorts), etc.

Unfortunately the ID3 tag system is All Wrong [tm] for this in many ways
stated (in this post and elsewhere). For example:

\- Albums don't really have an artist; songs do;

\- Programs for automating this that get info from Amazon and elsewhere tend
to use what year the particular CD was released rather than when it was
_originally_ released, which is far more useful and relevant (eg if you want
the Beatles White Album you don't care that the CD was released in 1998, it
should come up under 1968);

\- Albums don't really have years either. Or at least they have publication
years. The songs have years. Normal studio albums have a common year.
Compilations and soundtracks do not;

\- Genres are coarse-grained, arbitrary and (IMHO) mostly useless;

\- What I like is greatly influenced by the circumstances around the song,
sometimes more than the song itself. I might like a song because it reminds me
of a particular person, place or event. Or even _mood_. Sometimes it's the
lyrics. Sometimes it's the sound. No recommendation engine is going to capture
this sort of angle.

This goes beyond music: people just aren't interested in classifying, well,
pretty much anything. Playlists seem to be about as far as most people are
willing to go. Playlists are a fairly convenient way of coming up with
event-specific music, eg for working out, for relaxing, for dancing, for a
party, etc.

Efforts at far stronger and more accurate metadata, classification and
organization speak more about one's fastidiousness--even anal-retentiveness--
than any real need or better outcome (IMHO). It's just rabbit-holing really.

~~~
wrekkuh
I'm not sure how much of this is conjecture. My thinking differs from yours
with respect to the following -

"Albums don't really have years either. Or at least they have publication
years. The songs have years. Normal studio albums have a common year.
Compilations and soundtracks do not."

An album that is released by a record label does have a year, as upon release
it becomes a publication. By this train of thinking i would also say a
soundtrack does have a year, too, as it is published as a collection timely to
the context of its release. The same can be said of compilations, as they are
a contemporary release.

~~~
qbrass
Yes, but you're usually more interested in when the song came out than when
the album came out.

If you sort your playlist by date so you can pick songs from the 90's, you
probably wanted to hear something like Nirvana instead of The Best of The Who:
Volume 3.
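
To make that concrete, here is a rough sketch of filtering by the song's own year rather than the release year. The tag names `originaldate` and `date` are assumptions (use whatever your tagger actually writes), and the records are made up.

```python
def song_year(tags):
    # Prefer the original release date when present, otherwise the album date.
    value = tags.get("originaldate") or tags.get("date") or ""
    return int(value[:4]) if value[:4].isdigit() else None


def from_decade(tracks, decade_start):
    wanted = range(decade_start, decade_start + 10)
    # Tracks with no usable year yield None, which is never in the range.
    return [t for t in tracks if song_year(t) in wanted]


# Hypothetical records: an older song reissued on a 90s compilation,
# a song first released in the 90s, and an untagged file.
tracks = [
    {"title": "reissued oldie", "date": "1997", "originaldate": "1971-08-14"},
    {"title": "actual 90s song", "date": "1991-09-24"},
    {"title": "untagged"},
]
nineties = [t["title"] for t in from_decade(tracks, 1990)]
```

This only helps if the original date is tagged in the first place, which is the hard part.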

~~~
adamzochowski
I agree that year is screwed up, but some taggers (like the picard-lastfm
tagger) also read decade-feel tags. So music that sounds like the 90s is
tagged as 90s music, even if it wasn't recorded in the 90s.

For example, "La Roux -- In for the Kill" will get the song tagged as
sounding 80s-like:

<http://www.last.fm/music/La+Roux/_/In+for+the+Kill/+tags>

------
nwienert
Somewhat related but as I've been building a rails project I've been meaning
to open source the song parser I've been building alongside it. It scans an
mp3 and pulls out the artists along with the type of role they played on the
song. Here's a quick gist I pulled from my model:

<https://gist.github.com/3680949>

Some examples:

Drake - The Motto (Jon Bellion Cover)

=> [["Jon Bellion", :cover], ["Drake", :original]]

David Byrne and Brian Eno - Strange Overtones

=> [["David Byrne", :original], ["Brian Eno", :original]]

Cheri Coke, MELO-X - Free

=> [["Cheri Coke", :original], ["MELO-X", :original]]

Avicii - Street Dancer (Whelan & Discala Remix)

=> [["Whelan", :remixer], ["Discala", :remixer], ["Avicii", :original]]

RAC - Hollywood featuring Penguin Prison (The Magician Remix)

=> [["Penguin Prison", :featured], ["The Magician", :remixer], ["RAC",
:original]]

And a ridiculous example:

Eight, Nine & Ten (Eleven cover - Song name feat. One, Two & Three (Prod. by
Four) (Five & Six remix) (Seven cover)

=> [["One", :featured], ["Three", :featured], ["Two", :featured], ["Four",
:producer], ["Five", :remixer], ["Six", :remixer], ["Seven", :cover],
["Eight", :original], ["Nine", :original], ["Ten", :original], ["Eleven",
:cover]]

If there's any interest, I'd love to turn it into a proper github repo and
accept some pull requests.. it's far from perfect (both code-wise and
generally) but works well for most cases.
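
For readers who want a feel for the approach without opening the gist, here is a much smaller Python sketch. It is not the Ruby code from the gist; the function names and regexes are my own assumptions, and it only handles the simple "Artist - Title (X Remix/Cover)" and "featuring X" shapes.

```python
import re

# Separators between several names: comma, ampersand, or the word "and".
SPLIT = re.compile(r"\s*(?:,|&|\band\b)\s*")
# "(Somebody Remix)" or "(Somebody Cover)" inside the title.
PAREN = re.compile(r"\(([^()]*?)\s+(remix|cover)\)", re.IGNORECASE)
# "featuring X", "feat. X" or "ft. X", up to the next parenthesis.
FEAT = re.compile(r"\b(?:featuring|feat\.?|ft\.?)\s+([^()]+)", re.IGNORECASE)


def split_names(text):
    return [name.strip() for name in SPLIT.split(text) if name.strip()]


def parse_roles(filename):
    # Everything before the first " - " is treated as the original artist(s).
    artists, _, title = filename.partition(" - ")
    roles = []
    for names in FEAT.findall(title):
        roles += [(n, "featured") for n in split_names(names)]
    for names, kind in PAREN.findall(title):
        role = "remixer" if kind.lower() == "remix" else "cover"
        roles += [(n, role) for n in split_names(names)]
    roles += [(n, "original") for n in split_names(artists)]
    return roles
```

Splitting on "&" and "and" will wrongly break up any act whose name contains them, so a real parser needs a list of known names or a lookup service on top of this.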

~~~
notJim
This looks awesome. You should consider sticking it somewhere and giving it a
name, so that I might find it 6 months from now when I'm motivated to fix my
shitty Python script [1].

Also, a side question: is it normal/considered a best practice in ruby to
monkey-patch built-in libraries like that? I know ruby _has_ open classes, but
I'm just curious how using them in this way is regarded.

[1]: <http://news.ycombinator.com/item?id=4494437>

~~~
nwienert
I put the link to the gist with the code for now, when I get a couple hours
free sometime (soon I promise) I will do just that and put it on my github, so
feel free to watch my account there.

As for patching string.rb, I'm not 100% sure, but I believe that's the point
of being fully OO: I can make a patch like that where it makes sense to do
it. I was using those functions in multiple places, and given they are meant
for strings it made sense. I can see the potential pitfalls when it comes to
sharing things like this, but again, Ruby makes it easy for anyone to make the
same patch for a reason.

------
dj2stein9
Use hierarchical directories. They work. I sort my mp3's into _two_ distinct
music collections:

    
    
      /Albums/
      /Singles/
    

Then I sort by genre:

    
    
      /Albums/%GENRE
      /Singles/%GENRE
    

In Albums, I then sort by artist, then by album:

    
    
      /Albums/%GENRE/%ARTIST/%ALBUM/# - %SONG.mp3
    

Whereas in Singles, I organize by:

    
    
      /Singles/%GENRE/%ARTIST - %SONG.mp3
    

This system scales very well. I have a 100GB collection and can nail down any
song or album in my collection in a few clicks.

~~~
adamzochowski
I can see a few issues with this that wouldn't work for me:

1) How do you deal with songs that have multiple genres?

It is a standard part of the TCON (content type / genre tag) to separate
multiple values with either a comma or a slash.

For example this allows you to select songs that are tagged as both 'Ambient'
and 'Rock'.
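
A quick sketch of that selection, assuming the comma/slash convention described above (other taggers may use a different separator, so treat the split pattern as an assumption). The track names and genre strings are invented.

```python
import re


def genres(tcon):
    # Split the raw genre string on commas or slashes and normalise case.
    return {g.strip().casefold() for g in re.split(r"[,/]", tcon) if g.strip()}


# Hypothetical raw TCON values keyed by an arbitrary track name.
tracks = {
    "a": "Ambient / Rock",
    "b": "Rock",
    "c": "ambient, rock, 80s",
}

wanted = {"ambient", "rock"}
# Keep only tracks whose genre set contains every wanted genre.
matches = sorted(name for name, tcon in tracks.items() if wanted <= genres(tcon))
```

A directory-per-genre layout cannot express this kind of query without copying files, which is the point of the question.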

2) How do you deal with an album that has songs of varying styles?

For example: "The Prodigy - Music for the Jilted Generation" has a dance track
like "Voodoo People", and a chillout track "3 Kilos".

<http://www.last.fm/music/The+Prodigy/_/Voodoo+People/+tags>

<http://www.last.fm/music/The+Prodigy/_/The+Narcotic+Suite:+3+Kilos/+tags>

3) Compilations

Let's say we have a compilation made by a group called 'Air'. This compilation
is called 'Deck Safari'. Do you use Album Artist instead of artist if it is a
compilation (aka: if the TCMP flag exists, use TPE2 instead of TPE1)?

Check out the album:

<http://www.discogs.com/AIR-Deck-Safari-Part-1/release/215068>

4) I don't understand how you handle singles.

Let's say you have a single "Wamdue Project – King Of My Castle" released
in 1999 by Urban (563 891-2). Your formatting doesn't store album at all?

<http://www.discogs.com/Wamdue-Project-King-Of-My-Castle/release/44719>

And then, how do you combine it with the fact that a year later a different
mix was released by Avex Asia Ltd. (AVTCDS-235) with a different set of
songs?

<http://www.discogs.com/Wamdue-Project-King-Of-My-Castle/release/1352484>

To me it looks like your singles folder is intended for random songs, not to
handle actual CDS/CDM style singles.

~~~
dj2stein9
1) How do you deal with songs that have multiple genres?

In "Singles" I sometimes copy individual MP3's into both Rock and Metal. It
comes up less often than you'd think.

2) how do you deal with album that has songs of varying styles?

I just pick the one genre that makes the most sense. As long as I know, or can
quickly find which genre an artist is in, that's what matters. How long does
it really take to check if I put an AC/DC album in Rock, or Metal?

3) Compilations

I either place it in /Albums/Soundtracks (if it's a soundtrack), or make a
"Various Artists" directory in /Albums/Electronic or whichever genre makes
sense. Almost all of my compilations are dance or electronic music.

4) I don't understand how you handle singles.... To me it looks like your
singles folder is intended for random songs, not to handle actual CDS/CDM
style singles

I actually removed most of the EP's and CDS rips from my MP3 collection
because I found I never listened to them. I would take the individual MP3's
that I really liked and placed those into my /Singles/ directories. Otherwise
I could treat them just like I do my Albums.

~~~
adamzochowski
I guess I enjoy putting more tags into my collection. I try to fit a few from
last.fm, and a few that came with the tagging. So a single track can have
multiple tags (I think I average 10 tags per track). It is normal to have
things like 'Symphonic' and 'Rock' and '80s' and 'Krautrock' as separate
tags on one song. Or in the case of some electronica (like the ones found at
<http://ektoplasm.com> ): 'IDM', 'downtempo', 'goa', 'psy-dub'.

For compilations I try to keep track of who made the compilation, especially
if it is a mix by a DJ. I do have many that are just VAs too.

In the end, I have three types of music plays:

\- everything by an artist -- this is why I keep CDS / CDM / EP to have
remixes, and to track them with covers, etc

\- specific album -- this works great for movie scores, or albums that have
songs tailored to go one into another (Jean Michel Jarre's work from the 70s
and 80s is awesome at this)

\- party mode -- something like pandora -- last.fm is queried about tracks
similar to current one (and past one), and sets up playlist of similar songs
to play. See <http://forum.xbmc.org/showthread.php?tid=83915>

------
przemoc
What I lack (and I doubt I'm the only one) is a well-thought-out tag system
and more advanced players.

All textual entries (title, album, author, ...) should be stored in the
original language using the original alphabet. The player could transliterate
them if the user doesn't know the alphabet (e.g. doing romanization of
hiragana, katakana and kanji using the Hepburn system for Japanese music).
Such an entry should also be able to store a translated text, usually at
least an English one for non-English stuff.

That would also solve another problem the blog post author mentions, the
first name and last name ordering issue. Quoting Wikipedia:

"In Hungary, along with China, Korea, Japan and in many other East Asian
countries, the family name is placed before a person's given name."

Thus in original language the order would be original, but in English one,
Western-style, i.e. placing last name after the first name (and of course
transliterated already).

It would be up to a user to choose what player should show her or him:
original names, transliterated names, translated names.

But AFAIK ID3v2 and Vorbis don't support such stuff (well, you can try going
with custom keys, but non-standard means mess) and I haven't heard about
player that would do any transliteration either.

\---

As for filenames, I think the best is the Latin alphabet, with the most widely
used romanizations of non-Latin alphabets and simplifications of extended
Latin alphabets (like removing diacritic marks etc.). Clean visible ASCII!
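
The diacritics part is easy to sketch with Python's standard library. The function name is my own, and this does no romanization at all: anything without an ASCII base letter is simply dropped, so non-Latin names would need a separate transliteration step.

```python
import unicodedata


def ascii_filename(name):
    # Decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Discard whatever still is not ASCII (e.g. kana, kanji, Cyrillic).
    return stripped.encode("ascii", "ignore").decode("ascii")
```

For example, `ascii_filename("Björk")` gives `"Bjork"`, while a name written entirely in kanji comes back as an empty string, which is exactly why romanization has to happen first.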

I know that Asian people would mostly disagree on such file naming rule. :)

------
andrewcooke
doesn't the musicbrainz schema cover most/some of this?
<http://musicbrainz.org/doc/MusicBrainz_Database/Schema>

also, given the ubiquity of UTF-8, why the need for ASCII?

~~~
ux
It would be interesting to make an analysis on how MusicBrainz deals with all
these problems, I admit I didn't look much into this. But AFAICT it wouldn't
really solve the file system problem, except if you decide to name your files
with a hash. Also, you might still want at some point to keep extra
information MusicBrainz wouldn't handle, even if you have a MusicBrainz ID
stored in the file to identify the music.

About the ASCII, my point was just all about the fact that you can't actually
keep only the international name and you need to store the name in different
language version. Obviously, I don't have any problem with using UTF-8.

PS: note that UTF-8 won't be able to represent properly mathematical
formula... :)

~~~
andrewcooke
yeah, sorry, was in a rush to go eat. didn't really mean that your worries
would all go away, only that musicbrainz might be a good place to look for
more info.

------
webjunkie
The single most annoying thing I encountered was the inability of ID3 to
handle multiple albums. Every artist sooner or later releases the exact same
piece of music on another album. WHY DIDN'T THEY THINK OF THAT?

~~~
andreasvc
That's irrelevant, it's either ripped from one or the other, and that's the
information that should be there. Looking up which albums a song has
appeared on should be a database / wikipedia lookup. Think about it, is it
relevant on which christmas compilations "so this is christmas" has appeared?

~~~
baddox
It's not irrelevant. If the exact same recording of a song appears on two
albums, an ideal categorization system would allow it to appear as such
without storing multiple (redundant) copies of the song. An easy way to think
of it is a normalized relational database. If the song has an id of 5 and is
on two albums with id 2 and 3, then there would be an AlbumSong join table
with schema (album_id, song_id) and two rows: (2, 5) and (3, 5). It should be
irrelevant which album you ripped or downloaded first.
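
A small sketch of that schema using an in-memory SQLite database. The table and column names follow the description above, with lowercase names chosen for illustration; the titles are placeholders.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE song  (id INTEGER PRIMARY KEY, title TEXT);
    -- Join table: one row per (album, song) pairing, no duplicated song data.
    CREATE TABLE album_song (
        album_id INTEGER REFERENCES album(id),
        song_id  INTEGER REFERENCES song(id),
        PRIMARY KEY (album_id, song_id)
    );
    INSERT INTO album VALUES (2, 'Studio album'), (3, 'Compilation');
    INSERT INTO song VALUES (5, 'Some song');
    INSERT INTO album_song VALUES (2, 5), (3, 5);
""")

# Which albums does song 5 appear on?
rows = db.execute(
    """
    SELECT album.title FROM album
    JOIN album_song ON album_song.album_id = album.id
    WHERE album_song.song_id = ?
    ORDER BY album.id
    """,
    (5,),
).fetchall()
```

The song row exists once, and each extra album it appears on costs one small row in the join table.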

~~~
redbad

        >  If the exact same recording of a song appears on two 
        >  albums, an ideal categorization system would allow it to 
        >  appear as such without storing multiple (redundant) copies 
        >  of the song.
    

I don't agree. The song was released twice; if you have both albums, you ought
to have two copies of it.

~~~
baddox
Yeah, it's not going to be a big storage hit in reality (and even if it were,
you could just rely on file system level compression/deduplication). I, like
the author, just enjoy musing about the "perfect" way to organize a music
library.

~~~
adamzochowski
Some artists re-record or alter their song on a secondary release. Or it may
be a live performance.

Deduplication should be done by an algorithm that detects that files are the
same binary copy -- not through tags.
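
One rough way to do that is to group files by a hash of their bytes. This sketch hashes the whole file, embedded tags included, so two rips with identical audio but different tags will not match; hashing only the audio frames would need a format-aware library. The function names are my own.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    # Hash the file in chunks so large files do not have to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    # Map each digest to every file under root that has those exact bytes.
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a list of paths with byte-identical content, regardless of what their names or directories say.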

------
buro9
There seems to be confusion between the filesystem and the metadata.

The filesystem is for storage... it's only important to be able to group
tracks together in small batches (releases - albums/singles/EPs) to be able to
manage the files

The metadata is for searching, grouping and locating in your music player.

With that in mind, a lot of the problems he's cited vanish.

I have ~84,000 tracks from over 6,000 albums. The result of a 5 year ripping
spree after a decade working in the music industry.

Every track, with no exception, has been scanned by MusicBrainz Picard, had
its PUID generated and metadata normalised.

I've allowed my definition of genre to be influenced by general opinion... I
simply learned how the masses tag things.

The result is that I can find everything in my interface (Squeezebox) within
seconds.

The file storage I only need to care about for management of the files, and
copying to my portable player or deleting old releases (which I do rarely, but
it does happen).

The file storage starts with file type:

MP3|FLAC|FLAC_9624

Below those are folders for high level type of content:

Artists|Classical|Spoken Word|Compilations|Soundtracks

Within Artists I use:

Artist/Release Name[ - Catalogue Number]/[Volume/][Media/]Track Number - Track Name

I only fill in catalogue number if I have two versions of the same titled
release... i.e. Quadrophenia by The Who I have a couple of versions of.

I only fill out Volume if this is a multi-volume release.

I only fill out Media if this is a release that spans multiple CDs or DVDs.

So the short version of that might be:

FLAC/Artists/The Who/A Quick One/02 - Boris the Spider.flac

And a long version might look like:

FLAC/Artists/The Who/Quadrophenia - Polydor 2777840/CD2/01 - 5.15.flac

I have no problem at all storing and finding tracks, and I've no problem at
all searching for tracks.

One of the good things about Squeezebox is that when you search it searches
all metadata and the full file path. So a search for "quadro poly" would turn
up the Polydor version of Quadrophenia.
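
For anyone who wants to script a layout like this, here is a rough sketch of the path rule. The function and parameter names are made up, and replacing "/" with "_" in names is my own assumption, not part of the scheme above.

```python
from pathlib import PurePosixPath


def safe(name):
    # A slash inside a name would otherwise create an extra directory level.
    return name.replace("/", "_")


def release_path(file_type, artist, release, track_no, title, ext,
                 catalogue=None, volume=None, media=None, category="Artists"):
    # The catalogue number is only appended when it was supplied.
    folder = release if catalogue is None else f"{release} - {catalogue}"
    # Volume and media become directory levels only when present.
    optional = [safe(part) for part in (volume, media) if part]
    filename = f"{track_no:02d} - {safe(title)}.{ext}"
    return PurePosixPath(file_type, category, safe(artist), safe(folder),
                         *optional, filename)


short = release_path("FLAC", "The Who", "A Quick One", 2,
                     "Boris the Spider", "flac")
long = release_path("FLAC", "The Who", "Quadrophenia", 1, "5.15", "flac",
                    catalogue="Polydor 2777840", media="CD2")
```

`str(short)` and `str(long)` reproduce the two example paths given above.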

~~~
mylittlepony
What is Squeezebox exactly? I found many programs with that name, and the one
I think you are talking about seems to have been discontinued now. Do you
have a link or something?

~~~
danieldk
I think he is referring to the Logitech Squeezebox:

<http://en.wikipedia.org/wiki/Squeezebox_(network_music_player)>

~~~
buro9
That's the one.

I use Squeezebox server on a QNAP NAS storage thing.

Then I use a Squeezebox Transporter for the superlative DAC that it has.

Squeezebox server allows me to listen remotely too, so for more than a decade
I've had the ability to listen to my home music collection from wherever I am
in the world (assuming I have an internet connection). I use DNSMadeEasy for
their dyndns stuff.

------
mjw
Having worked with it for a few years, modelling music metadata is indeed an
absolute nightmare.

There are some efforts to standardise this stuff though, see
<http://www.ddex.net> which a lot of the digital supply chain is starting to
adopt. It's something of a set of scary great kitchen-sink XML schemas
(schemata?) but might be of interest to those who get massively nerdy about
this sort of thing.

------
Figs
This sounds like exactly the kind of problem that relational databases were
designed to solve. You can organize it with an entity-relationship model
fairly easily. Once you have stored your information in a database, the
filename doesn't really matter as long as each mp3, ogg, etc. gets a unique
name; you can look up the file by querying the database for files that have
the properties you care about.

~~~
adamzochowski
This isn't something that a relational database fixes. It is about schema and
data tagging. It can all still sit in ID3v2 tags, as long as people use all
tags properly.

The problem is that most tags are badly written, and most software that
displays songs only displays bare minimum. Typically tags I miss seeing: Style
/ Moods / Genres / Composers / Performers. Even Album Artist or Compilation
tags are bad. Same with some misspelled tags, or inconsistent (is it 'D&B' or
'DnB' or 'Drum-n-Base', etc).

~~~
fusiongyro
I think Figs is onto something though: the OP is clearly worrying about how to
stuff all this information into the filesystem, and the plain fact is that
while all of the information needs to be accessible beyond the database,
either in the hierarchy or in the tag embedded in the file itself, a lot of
that information is not structured in a manner conducive to access via those
two rather limited methods.

For the sake of argument, let's be platonic idealists for a second and say
that a song is a concept and the files are simply recordings of
performances—they're not "songs" at all, per se. Say I play "Black Betty" by
Lead Belly. I get curious about the song, so I want to hear other versions of
it. There's no way to express this data in ID3 or through a naming
convention—it's just a reference to a concept that has no existence in a pile
of MP3s—and it's hard to imagine a linking regime that would be maintainable.
To do this, you need a place to put metadata that isn't attached to the files
themselves, which can point to them. Note that this kind of scenario is not
outside the OP's scope: they say in the article they are interested in
providing all music to everyone, so these kinds of relationships would need to
be documented and searchable.

I personally think a relational database is a fine tool for this problem, but
lately I've been looking at RDF for things and I think RDF might be an even
better choice, since it's really designed for metadata description, ontologies
(like music categorization) and graphs. I think it would turn out to be a ton
of work, but I think it would satisfy many of the requirements.

------
detox
Oh god, I tried renaming files on my own (and I'm kind of a beginner
programmer). Then I found out about unicode problems and then came 2 problems.

problem #1 was apparently I set the encoding wrong while renaming the ID3 for
the music files so the foreign languages turned into question marks. I thought
scratch that, time to use someone else's tool. foobar2000 solved everything I
had a problem with except problem #2.

for problem #2, I have no idea if it'll ever be solved. The question marks or
blank boxes ending up on my computer deal with the encoding for the OS itself
(or something). And even with multi-language support, windows doesn't let me
fix that. There's obviously something wrong when my blank box issue disappears
when I restart my computer from time to time...

~~~
notJim
This is the shitty python script I use to organize music, which seems to work
pretty well except in one case[1]: <https://gist.github.com/3681165>.

For the love of god, try it on a small sample of your music or it will wreak
all kinds of havoc and you will yell and curse my name.

What it does: takes a directory of folders with MP3s, and makes them into
a directory of folders like yourdir/[ARTIST]/[ALBUM]/[TRACK No.] - [Track
name].mp3. More importantly though, I _believe_ it handles unicode correctly,
and it's just over 100 lines of code, so you could adapt it to whatever system
you prefer.

Find the python library called Mutagen that deals with parsing ID3 tags and
put it in the same directory as this horrible script.

[1]: The one case it handles spectacularly poorly is when you have an album
with multiple artists. This could be a soundtrack, a compilation, or even that
thing that happens especially often in rap where the artist is like "[Some
dude] ft. [Some other dude]". Honestly, I just handle those cases manually,
but it might not be that hard to make the script do it right.

By the way, I've really only run this script on Windows. There are undoubtedly
some tweaks that should be made to make it work correctly on Unix-y systems.

~~~
Radix_
That's nice, thanks for sharing. I used exfalso, for which mutagen was
written, to build a similar directory structure for my music. Mine was
"Audio/[ALBUM_ARTIST]else[ARTIST]/[ALBUM]/[TRACK 00]. - [TITLE].mp3". The one
thing it did that your script doesn't is that it will use TPE2 if present and
fall back to TPE1. Similar rules can be added for a disc prefix on multi-disc
albums. (Since that anachronism is still relevant.) The only problem was that
I needed to go back through and delete a bunch of empty directories.

------
teyc
Rob Pike said it best when he quoted his friend:

    
    
       My late friend Alain Fournier once told 
       me that he considered the lowest form of 
       academic work to be taxonomy. 
       And you know what? Type hierarchies are 
       just taxonomy. You need to decide what 
       piece goes in what box, every type's parent, 
       whether A inherits from B or B from A.  
       Is a sortable array an array that sorts or a sorter 
       represented by an array? 
       If you believe that types address 
       all design issues you must make that decision.
    

Reference: <http://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html>

------
samps
It looks like the author has come to an understanding with his/her
disorganized music collection, but beets makes it pretty easy to bring sanity
back to a messy music library: <http://beets.radbox.org>

~~~
petepete
I second beets, it handles big libraries better than any tool I've come
across. My once messy library is now pretty much perfectly tagged (thanks to
MusicBrainz) and organised.

------
niklaslogren
I feel your pain. I have also stopped caring about music classifying, for much
the same reasons you listed. Nowadays I use only Spotify, and I trust in their
tagging abilities, and in Last.FM's auto-correcting ability.

My main concern back when I used foobar was how to handle multiple artists
on the same album, which always resulted in the album being split up when
displayed in a list. I was unbelievably happy when I discovered the "Album
artist" tag, which unites all songs in an album under the same banner while
still preserving (and scrobbling) the original artist name.
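
A rough sketch of why the album artist tag stops the splitting, assuming each track is a plain dict with hypothetical keys "artist", "album", "title" and an optional "album_artist". This is an illustration, not how foobar groups tracks internally.

```python
from collections import defaultdict


def group_by_album(tracks: list) -> dict:
    """Group tracks into albums keyed by (album artist or artist, album)."""
    albums = defaultdict(list)
    for track in tracks:
        # The grouping key uses the album artist when present; the per-track
        # "artist" value is left untouched, so it can still be scrobbled.
        owner = track.get("album_artist") or track["artist"]
        albums[(owner, track["album"])].append(track)
    return dict(albums)
```

Two tracks by different artists on the same compilation form two groups when only "artist" is set, and one group once both carry the same "album_artist".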

~~~
anonymoushn
Last.fm and Spotify generally give me terrible recommendations and no
discovery at all. I think this is because they use machine learning on the
person <-> track graph to determine which tracks are related, so they're very
good at finding you DragonForce or IOSYS, but very bad at finding you Hammers
of Misfortune, Gallowbraid, Umbrella, or ｍ’ｓ. Of course, if you listen to
music that Last.fm thinks is adjacent to IOSYS or DragonForce, you've probably
already heard of IOSYS or DragonForce.

Pandora doesn't have this issue because they use human analysts to determine
the properties of the tracks and give you similar tracks rather than tracks
that are listened to by similar people. Unfortunately their music library is
pretty limited. Google Music's automatic playlist feature does something
similar without the human analysts, but this only works if you already have
all the music you want and it fits onto your Google Music account. They don't
seem to provide an option to pay for extra space.
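
A toy sketch of the difference, with made-up data and no claim about what Last.fm, Spotify, Pandora or Google Music actually run. The same set-overlap measure is applied once to listener sets (who played a track) and once to hand-assigned attribute sets (what a track sounds like); the track names, users and attributes are all hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(seed: str, sets_by_track: dict) -> str:
    """Return the track whose set overlaps most with the seed track's set."""
    candidates = [t for t in sets_by_track if t != seed]
    return max(candidates, key=lambda t: jaccard(sets_by_track[seed], sets_by_track[t]))


# Who listened to what: the obscure track shares no listeners with the seed.
LISTENERS = {
    "seed": {"u1", "u2", "u3"},
    "popular_track": {"u1", "u2", "u3", "u4"},
    "obscure_track": {"u5"},
}

# Hand-labelled properties: the obscure track sounds just like the seed.
ATTRIBUTES = {
    "seed": {"folk metal", "acoustic", "long-form"},
    "popular_track": {"power metal", "fast"},
    "obscure_track": {"folk metal", "acoustic", "long-form"},
}
```

On this data the listener-based comparison picks the popular track and the attribute-based one picks the obscure track, which is the effect described above in miniature.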

------
ANTSANTS
I don't think there is a perfect way to organize a music collection in a
hierarchical manner, so I don't even bother. Good tag metadata and foobar's
search do all the work for me.

To me, the purpose of a filesystem is not to implement fine-grained
categorization, but to provide basic grouping of related files so that I can
easily operate on them all at once. To this end, my music collection mostly
consists of one folder per album in a root music directory. Folders are
usually named "Artist/Group Name - Album Title". That naming scheme doesn't
always fit (albums featuring various artists, soundtracks where I'm more
likely to care about the title of the work than the artist who composed it,
etc.), but I don't try to separate soundtracks from regular albums or anything
like that; I just throw them in the same root directory. With this scheme,
each album is contained within a single directory, so it's easy to
delete/share/transcode, it's convenient for people I share with, and I don't
waste any time obsessing over something that I rarely need to see.

Some people have advocated a more database or metadata-oriented approach where
you strip all metadata from the filename and folder hierarchy and stuff all
your files in one directory. It's an interesting idea, for sure, more closely
resembling the way web services like YouTube store their content. It makes one
begin to imagine a desktop operating system that featured a metadata database
as the primary filesystem organization scheme in place of the traditional
hierarchical filesystem.
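
To make the metadata-oriented approach concrete, here is a small sketch under stated assumptions: files get an opaque name derived from a SHA-256 hash of their contents, and all human-meaningful information lives in an index, here an in-memory SQLite table. The table layout and function names are hypothetical, and the sketch only records the name; it does not move any files.

```python
import hashlib
import sqlite3


def open_index() -> sqlite3.Connection:
    """Create an in-memory index mapping opaque names to metadata."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tracks ("
        "stored_name TEXT PRIMARY KEY, artist TEXT, album TEXT, title TEXT)"
    )
    return db


def add_track(db: sqlite3.Connection, audio: bytes,
              artist: str, album: str, title: str) -> str:
    """Register a track and return the meaningless name it would be stored as."""
    stored_name = hashlib.sha256(audio).hexdigest() + ".mp3"
    db.execute(
        "INSERT OR REPLACE INTO tracks VALUES (?, ?, ?, ?)",
        (stored_name, artist, album, title),
    )
    return stored_name


def find_by_artist(db: sqlite3.Connection, artist: str) -> list:
    """Look tracks up by metadata instead of by browsing directories."""
    rows = db.execute(
        "SELECT stored_name FROM tracks WHERE artist = ? ORDER BY album, title",
        (artist,),
    )
    return [row[0] for row in rows]
```

Everything lives in one flat directory and nothing about the stored name says what the file is, so every lookup has to go through the index.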

With our currently available tools, however, having some kind of useful
metadata in the filename and/or filesystem hierarchy, even if it is redundant,
is incredibly useful when performing manual file manipulation, especially the
aforementioned sharing of files. You'd need ubiquitous categorization metadata
in files (that is, not just ID3 and company for music files) and ubiquitous
support for parsing this metadata in everyday applications before we could
ever entirely transition from having meaningful filenames to having
meaningless hashes, timestamps, or garbage as the primary identifiers of
files. That is to say, when beginning a download of a song or a document, your
web browser would show you the relevant metadata and hide the filename, if it
exists; when opening a file, one would be greeted with a search box instead of
a traditional hierarchy dialog.

------
michaelhoffman
Some of these problems (like creators with the same name) were solved by
librarians years ago.

~~~
ux
You mean by having a guy to ask where to find something because you're lost in
the store and can't find shit?

~~~
michaelhoffman
No, there is a whole field that deals with issues like this. It's called
library science. Specifically, dealing with this problem is called [authority
control](<http://en.wikipedia.org/wiki/Authority_control>).

Some of the other problems are also solved by cataloging experts.
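
A very reduced sketch of the idea behind authority control, assuming a hand-written list of records: each creator gets one record with a stable identifier, one authorized form of the name, and any number of variant names. The record IDs, the "John Smith" entries and the field names are hypothetical, and real authority files carry far more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityRecord:
    record_id: str          # stable identifier, independent of any name
    authorized_name: str    # the one form used for filing
    variants: tuple = ()    # other names that should lead to this record


def build_lookup(records: list) -> dict:
    """Index every authorized and variant name, ignoring case."""
    lookup = {}
    for record in records:
        for name in (record.authorized_name, *record.variants):
            lookup.setdefault(name.casefold(), []).append(record)
    return lookup


def resolve(lookup: dict, name: str) -> list:
    """Return all records a name could refer to (possibly several or none)."""
    return lookup.get(name.casefold(), [])


RECORDS = [
    AuthorityRecord("auth:0001", "Aphex Twin", ("AFX", "Richard David James")),
    AuthorityRecord("auth:0002", "Smith, John (guitarist)", ("John Smith",)),
    AuthorityRecord("auth:0003", "Smith, John (pianist)", ("John Smith",)),
]
```

Variant names collapse onto one record, while two creators sharing a name stay apart because each has its own identifier.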

------
rishonik
I have two classifications for my digital collection: Old Music and New Music.
Old Music is everything recorded before I was born. New Music is everything
recorded after I was born. It cuts down on putting too much time into it all.

------
Zakharov
Another annoyance the author didn't mention is that unzippers frequently
mangle the filename and/or metadata if it uses Unicode. Archive Manager is the
worst at this.
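
One common way this happens, offered as an assumption rather than a diagnosis of Archive Manager: the archive stores the filename as UTF-8 bytes without marking it as such, and the unzipper decodes those bytes as CP437, the legacy default for zip entry names. When that is the cause, the damage is reversible. A minimal sketch, where repair_zip_name is a hypothetical helper:

```python
def repair_zip_name(name: str) -> str:
    """Undo a UTF-8 filename that was wrongly decoded as CP437.

    Names that were not mangled this way are returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them as intended.
        return name.encode("cp437").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in CP437, or the bytes are not valid UTF-8:
        # assume the name was fine to begin with.
        return name


# Simulate the mangling: UTF-8 bytes read back with the wrong codec.
original = "Björk - Jóga.mp3"
mangled = original.encode("utf-8").decode("cp437")
repaired = repair_zip_name(mangled)
```

Plain ASCII names pass through untouched. The check is a heuristic, so a legitimate CP437 name whose bytes happen to form valid UTF-8 would be misread, though that is rare in practice.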

