
This looks at the first so many lyrics in each rapper's career. Aesop Rock came out with some weird stuff right off the bat. I wonder if some of these other rappers became more sophisticated over time. Maybe an average per song, or the average number of unique words per total words, would be better.
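For reference, a minimal sketch of the "first N words of a career" metric as I understand it, assuming songs are given in release order and using a hypothetical regex tokenizer (lowercase, letters and apostrophes only). The real analysis may tokenize differently.

```python
import re

def tokenize(lyrics: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", lyrics.lower())

def unique_in_first_n(songs: list[str], n: int) -> int:
    """Count distinct words among the first n words of a career.

    Assumes `songs` is already sorted by release date.
    """
    words: list[str] = []
    for song in songs:
        words.extend(tokenize(song))
        if len(words) >= n:  # enough words collected; stop early
            break
    return len(set(words[:n]))

career = ["I like cats", "I like dogs and cats"]
print(unique_in_first_n(career, 5))  # first 5 words: i like cats i like -> 3
```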

The problem with average per song is that you "use up" words in every new song, so, all things being equal, each marginal song has progressively fewer new words.
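The "use up" effect is easy to see if you count only words not seen in any earlier song. A rough sketch, assuming plain whitespace splitting as a stand-in for real tokenization:

```python
def new_words_per_song(songs: list[str]) -> list[int]:
    """For each song, in order, count words that never appeared in an earlier song."""
    seen: set[str] = set()
    counts: list[int] = []
    for song in songs:
        words = set(song.lower().split())  # simplistic tokenization (assumption)
        counts.append(len(words - seen))   # words this song adds to the vocabulary
        seen |= words
    return counts

songs = ["I like cats", "I like dogs", "I like cats and dogs"]
print(new_words_per_song(songs))  # [3, 1, 1]
```

Even with a fixed writing style, the later entries tend to shrink because the shared vocabulary has already been counted.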

I bet you could get something insightful from plotting "unique words" versus "total words". That might give a good idea of the amount of repetition over time, the length or quantity of output, and the total vocabulary.
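One way to get that plot's data is to walk the lyrics word by word and record (total words so far, unique words so far). A small sketch, assuming the words are already tokenized and in release order; the resulting pairs could be handed to any plotting library:

```python
def vocabulary_curve(words: list[str]) -> list[tuple[int, int]]:
    """Return (total words so far, unique words so far) after each word."""
    seen: set[str] = set()
    curve: list[tuple[int, int]] = []
    for total, word in enumerate(words, start=1):
        seen.add(word)
        curve.append((total, len(seen)))
    return curve

print(vocabulary_curve("i like cats i like dogs".split()))
# [(1, 1), (2, 2), (3, 3), (4, 3), (5, 3), (6, 4)]
```

Flat stretches mean repetition; the x-extent shows total output; the final y-value is the total vocabulary.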

Here's what this looks like. Ugly as sin and useless for comparing rappers.


Love your other ideas; hopefully I can do them later.

Strange comment. You realise that's not an inherent truth of language? Unique words per song is trivial to calculate.

If a rapper released one song using n distinct words their score would be n/1, and if they released a second song using the same set of words their score would halve, to n/2, despite the fact their demonstrated vocabulary is still n words.

In fact, if their first song used n distinct words and their second used a completely distinct set of words, but the second song was shorter than the first, their score would drop.
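A quick sketch of the "distinct words divided by number of songs" score described above, using whitespace splitting as a simplifying assumption, reproduces both cases:

```python
def distinct_per_song(songs: list[str]) -> float:
    """Total distinct words across all songs, divided by the number of songs."""
    vocabulary: set[str] = set()
    for song in songs:
        vocabulary |= set(song.lower().split())
    return len(vocabulary) / len(songs)

first = "one two three four"
print(distinct_per_song([first]))               # 4.0
print(distinct_per_song([first, first]))        # 2.0: same vocabulary, score halves
print(distinct_per_song([first, "five six"]))   # 3.0: more words known, score still drops
```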

That would be unusual behaviour for a measure of vocabulary.

I don't think that's what the poster meant. By "average unique words per song" I take it to mean that within each song words are only counted once, but across songs words can be counted multiple times. So if song A had the words "I like cats" and song B had the words "I like dogs", then the average unique word count would be ((3 + 3) / 2) = 3, not ((3 + 1) / 2) = 2.
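Under that reading, a minimal sketch (whitespace tokenization assumed) just deduplicates inside each song and averages the per-song counts:

```python
def average_unique_per_song(songs: list[str]) -> float:
    """Deduplicate words within each song, then average the per-song counts."""
    counts = [len(set(song.lower().split())) for song in songs]
    return sum(counts) / len(counts)

print(average_unique_per_song(["I like cats", "I like dogs"]))  # 3.0
```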

That's definitely one solution, but it still wouldn't quite capture it. As an extreme example, if rapper A produced 100 songs, each with exactly the same lyrics, they should surely be penalized compared with rapper B producing 100 songs with no shared words, even if rapper A's average unique-words-per-song is higher than rapper B's.
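The extreme example is easy to construct. In this sketch the word lists are made up purely for illustration: rapper A repeats one 10-word song 100 times, rapper B writes 100 five-word songs with no overlap.

```python
def average_unique_per_song(songs: list[str]) -> float:
    return sum(len(set(s.split())) for s in songs) / len(songs)

def total_vocabulary(songs: list[str]) -> int:
    # Union of every song's words across the whole catalogue.
    return len(set().union(*(s.split() for s in songs)))

# Synthetic data: placeholder tokens, not real lyrics.
rapper_a = [" ".join(f"w{i}" for i in range(10))] * 100
rapper_b = [" ".join(f"s{song}w{i}" for i in range(5)) for song in range(100)]

print(average_unique_per_song(rapper_a), total_vocabulary(rapper_a))  # 10.0 10
print(average_unique_per_song(rapper_b), total_vocabulary(rapper_b))  # 5.0 500
```

The per-song average ranks A above B, while total vocabulary ranks B far above A.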

I agree; perhaps the 35,000 most recent words would be better.
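Switching to the most recent words is a small change: take the window from the end instead of the start. A sketch, assuming songs are in release order and whitespace tokenization:

```python
def unique_in_last_n(songs: list[str], n: int) -> int:
    """Distinct words among the last n words of a career (songs in release order)."""
    words: list[str] = []
    for song in songs:
        words.extend(song.lower().split())
    # Slice explicitly so n == 0 yields an empty window (words[-0:] would be everything).
    return len(set(words[max(0, len(words) - n):]))

career = ["I like cats", "I like dogs"]
print(unique_in_last_n(career, 3))  # last 3 words: i like dogs -> 3
```

In practice n would be something like 35000, and artists with fewer total words would need to be excluded or flagged.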

OP here: the challenge is that most artists' best work is in their earlier years. I'd rather have Jay-Z's first album than his last, ya know?

Would sorting by popularity, or critical acclaim, or something along those lines be a possibility?
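Mechanically that would just change the order words are drawn in. A sketch, assuming each song comes with a hypothetical numeric popularity score (play counts, chart position, review score, whatever is available):

```python
def unique_in_top_n(songs: list[tuple[str, float]], n: int) -> int:
    """Distinct words in the first n words, drawing from the most popular songs first.

    `songs` is a list of (lyrics, popularity) pairs; the popularity score is hypothetical.
    """
    ranked = sorted(songs, key=lambda pair: pair[1], reverse=True)
    words: list[str] = []
    for lyrics, _ in ranked:
        words.extend(lyrics.lower().split())
    return len(set(words[:n]))

songs = [("la la la la", 5), ("we ride at dawn", 90)]
print(unique_in_top_n(songs, 4))  # most popular song first: we ride at dawn -> 4
```

In release order the same 4-word window would contain only "la", so the choice of ordering can swing the result a lot.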
