
Repetitiveness and compressibility analysis in song lyrics - iheredia
https://pudding.cool/2017/05/song-repetition/
======
6stringmerc
Great chart, unfortunate conclusion with some erroneous allusions. Bear with
me here: Starting with ABBA, and fully blossomed in the form of Max Martin,
Top 40 Pop has been dominated by Swedish composition techniques.

Taylor Swift, Britney Spears, Kelly Clarkson, NSYNC, Bieber, Katy Perry, Demi
Lovato...Max Martin's fingerprints are all over the hits. He has a defined
style as well. Balanced lines. It's brilliant. Thus, it's not about the
performers if you want to study the composition - you have to go to the actual
composer(s).

Just saying that this is a sound technique and approach, but it looks at the
data set to the exclusion of pertinent considerations. Revised, it would make
for an interesting story.

~~~
derefr
You could actually use this analysis technique to _fingerprint_ composers
who've ghostwritten for bands, no? Bands pumping out music that is exactly as
compressible are likely using the same songwriter, whether they acknowledge
that or not.

~~~
anigbrowl
Composers' identities are rarely a secret, because there are separate royalty
streams for performance and composition. OK, pop fans may lean towards the
assumption that their idols' songs are always deep personal confessions but
there isn't some grand conspiracy to conceal the facts of authorship from the
public. Most people just don't think about it very much.

~~~
derefr
Rarely a _secret_ , yes. But it's not like there's a public database of
composer:song mappings anywhere†, for you to easily feed your ML algorithm;
you'd have to do a bunch of original research to build that dataset. If
there's some statistical way to infer the data with nearly the same quality,
so that you don't need to go to the effort of building the dataset yourself,
that'd be nice.

† An assumption on my part—is there, in fact, a place where you can look up
who gets each portion of the royalties for a given song?

~~~
anigbrowl
I like to fact-check my assertions _prior_ to making them, unless I'm being
metaphysical. Not snark; it's just a huge time saver.

There are a bunch of such datasets, some commercial but accessible for a small
fee (like ASCAP), others freely available. I personally like the Discogs one
best but some of the commercial offerings are better curated.

[https://en.wikipedia.org/wiki/List_of_online_music_databases](https://en.wikipedia.org/wiki/List_of_online_music_databases)

------
gwern
Looks like interesting work, but the main chart doesn't work for me in Firefox
or Chromium - it seems to be yoked to your scroll position (why?!) so by the
time you've scrolled down to the '2014' paragraph, which makes it chart the
full time-series, you can't even see the graph in the first place... Data viz
run amok.

~~~
danielsf
I worked on this project. Can you share your screen size, device, and browser
version?

~~~
gwern
FF 53.0.2 (64-bit), Chromium Version 58.0.3029.96 Built on Ubuntu, running on
Ubuntu 16.04 (64-bit); using a big Dell in portrait mode. Screenshots:
[https://imgur.com/a/vFhNc](https://imgur.com/a/vFhNc) By the time I've
scrolled down far enough to activate the animation, the graph has disappeared.

~~~
danielsf
ah I think it's due to the high viewport height that we didn't account for
(it's rare, but in your case, it broke the code). thanks!

~~~
jiaweihli
Like @Asdfbla, I'm on 2560x1440 and have the same issue. The viewport is
likely the issue =)

------
geluso
Where'd they get the lyric data for this analysis? In my experience this data
in bulk is all incredibly locked down!

------
IgorPartola
Next: lossy lyrics compression where using words that sound the same could
yield higher ratios! I wonder how well that would work for Sting where nobody
can understand anything anyways.

------
stuffedBelly
Interesting analysis and great visual presentation. I would also be interested
to see an analysis of the repetitiveness of intervals and rhythmic patterns
used in popular songs. On many occasions people tend not to care much about
lyrics in the presence of addictive grooves/riffs, "Get Lucky" by Daft Punk
being a good example.

------
woliveirajr
I like the use of compression to find out about repetitions.

There is some theory out there called Kolmogorov Complexity [0]. It says that
something is as complex as the amount of information you need to express it
(formally, the length of the shortest program that outputs it). In your case,
lyrics are as complex as the number of symbols (letters? words? bytes?) you
need to represent them.

And one good way to estimate it is what you did: compress it. If you're using
the same compression method for all the lyrics, you'll find that the simpler
(and more repetitive) ones show the greatest reduction in size. In that case,
the choice of compression method is somewhat irrelevant. Had you used Bzip,
PPMD, etc., the results would probably be similar.
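
A minimal sketch of that idea, assuming Python's standard zlib, bz2 and lzma
modules as stand-ins for whatever compressor the article actually used. The
helper name compression_ratio and both toy inputs are made up for
illustration; the point is only that the repetitive text shrinks far more
than the varied one under each method.

```python
import bz2
import lzma
import random
import zlib


def compression_ratio(text: str, compress=zlib.compress) -> float:
    """Compressed size / original size; lower means more repetitive."""
    raw = text.encode("utf-8")
    return len(compress(raw)) / len(raw)


# Toy inputs (hypothetical, not from the article's data set).
repetitive = "around the world, around the world\n" * 40
rng = random.Random(0)  # fixed seed so the run is reproducible
varied = "".join(
    rng.choice("abcdefghijklmnopqrstuvwxyz ") for _ in range(len(repetitive))
)

# Compare the same two texts under three different compressors.
for name, compress in [
    ("zlib", zlib.compress),
    ("bz2", bz2.compress),
    ("lzma", lzma.compress),
]:
    r = compression_ratio(repetitive, compress)
    v = compression_ratio(varied, compress)
    print(f"{name:5s} repetitive={r:.2f} varied={v:.2f}")
```

The absolute ratios differ between compressors (each has its own fixed
overhead), so it is the ordering of the songs that should be compared, not
the raw numbers.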

In case you want to extend your research, for example, as 6stringmerc said,
you might consider that the composer matters more than the actual artist.

And, for that, you can use Normalized Compression Distance (NCD) [1]. That way
you can measure how similar two lyrics are. Basically, you compress those
lyrics together. When they are similar, clues from one are used by the
compressor to also compress the second one, so similar lyrics get more
compression than lyrics that aren't related.
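
A rough sketch of NCD, using the usual definition
NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the
compressed size. It assumes zlib as the compressor; the three toy lyrics are
invented for illustration, with song_a and song_b imagined as coming from the
same writer. Values near 0 mean "very similar", values near 1 mean
"unrelated".

```python
import zlib


def csize(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data."""
    return len(zlib.compress(data, 9))


def ncd(a: str, b: str) -> float:
    """Normalized Compression Distance between two texts."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Invented toy lyrics (assumption: a and b share a writer, c does not).
song_a = (
    "we dance all night under neon lights\n"
    "hold me close and never let me go\n"
    "we dance all night till the morning comes\n"
    "hold me close, baby, don't let go\n"
)
song_b = (
    "we dance all night under city lights\n"
    "hold me close and never let me go\n"
    "we dance all night till the sun comes up\n"
    "hold me close, baby, never let go\n"
)
song_c = (
    "rusted trucks on a gravel road\n"
    "whiskey sky and a heavy load\n"
    "my grandfather's fiddle in a pawnshop case\n"
    "quiet rain on an empty place\n"
)

print("a vs b:", round(ncd(song_a, song_b), 2))
print("a vs c:", round(ncd(song_a, song_c), 2))
```

On texts this short the compressor's fixed overhead adds noise, so real
lyrics (and probably whole catalogues per artist) would give a more
trustworthy signal than single verses.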

And by doing that you can even discover who was the composer of the songs,
i.e., the authorship of the lyrics, since each person usually has the same
writing style... [2]

[0]
[https://en.wikipedia.org/wiki/Kolmogorov_complexity](https://en.wikipedia.org/wiki/Kolmogorov_complexity)

[1]
[https://en.wikipedia.org/wiki/Normalized_compression_distanc...](https://en.wikipedia.org/wiki/Normalized_compression_distance)

[2]
[https://link.springer.com/chapter/10.1007%2F978-3-642-34475-...](https://link.springer.com/chapter/10.1007%2F978-3-642-34475-6_76)

------
marzell
The visualization animations as you scroll through this article are fantastic,
and a great way to implement storytelling with data. The content is pretty
interesting too; I like the emphasis on aligning data metrics with intuition
to really make the point.

------
twiss
According to the main chart, songs in the top 10 are more likely to be
repetitive, and that discrepancy has been growing. That raises the question,
is there a causal link between being repetitive and reaching the top 10? If
so, the answer to "Who's responsible for this madness?" is: the listeners.

~~~
marzell
I think this is partly true. However, there is a significant phenomenon where
there is financial/political pressure to play songs (pay to play) which then
creates demand for songs since they are then familiar to listeners. So
companies/firms can pay stations to play otherwise unremarkable music (perhaps
simple songs or those with the highest profit margins/potential available to
stakeholders) and in a general sense this influences listeners to want to keep
hearing those now familiar tunes. The whole pop-ephemerality of music-as-a-
commodity feels like the new opiate of the masses - in general people are more
content as long as they have a meaningless tune as the background soundtrack
for their day-to-day repetitive tasks.

I wonder what the correlation is between acceptance of repetitive music and
the repetitiveness of a listener's daily tasks.

------
rayuela
What was used to make these visualizations? These are beautiful.

~~~
danielsf
D3!

------
sn9
I would love to see the source code for how this post was created! (Or at
least pointers for resources on how to create something similar.)

~~~
mynewtb
Ctrl-u

~~~
sn9
Oh right. Thanks.

------
ashark
Some data messiness in that final chart. Maroon 5 versus Maroon5, Surfin'
U.S.A but also Surfin' U.s.a
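
One possible way to catch duplicates like these, sketched under the
assumption that collapsing case, spacing and punctuation is acceptable for
this data set. The helpers normalize_key and group_variants are hypothetical
names, not anything from the article's code.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_key(name: str) -> str:
    """Collapse case, accents, spacing and punctuation into a comparison key.

    Limitation: anything outside a-z and 0-9 is dropped, so names in
    non-Latin scripts would need a different approach.
    """
    folded = unicodedata.normalize("NFKD", name).casefold()
    return re.sub(r"[^a-z0-9]+", "", folded)


def group_variants(names):
    """Map each key to the distinct spellings seen for it, keeping only clashes."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize_key(name)].add(name)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}


names = ["Maroon 5", "Maroon5", "Surfin' U.S.A", "Surfin' U.s.a", "Daft Punk"]
for key, variants in group_variants(names).items():
    print(key, "->", variants)
```

A key this aggressive can also merge names that really are different, so the
grouped variants are best treated as candidates for manual review.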

------
cttet
Music itself is about repetition; without repetition it is just noise.

------
marzell
Any idea how to do the downres transition at the top? It's really cool.

