
Heavy Metal and Natural Language Processing - colinprince
http://www.degeneratestate.org/posts/2016/Apr/20/heavy-metal-and-natural-language-processing-part-1/
======
SwellJoe
The "least metal words" almost seems like a challenge. I now feel strangely
compelled to write metal songs with those words, and I haven't written a metal
song in 20 years. I may not be particularly metal, anymore.

~~~
jonathankoren
It surprisingly holds together.

[The] academic chairman noted [the] secretary['s] considerable measurements. /

University employee[']s attorney indicated [the] administrative committee. /

Literary agencies particularly fiscal. /

~~~
insin
Which Bad Religion album is this off?

------
amelius
> In the face of this complexity, it is not surprising that understanding
> natural language, in the same way humans do, with computers is still a
> unsolved problem.

I have the feeling it is not just an unsolved problem, but also an undefined
problem.

------
Bahamut
I noticed there were song data errors in places; for example, in this chart
([http://www.degeneratestate.org/static/metal_lyrics/clusters....](http://www.degeneratestate.org/static/metal_lyrics/clusters.html)),
in the most representative songs for Tiamat, it lists White Pearl, Black
Oceans. That is a song by Sonata Arctica. It also lists 3 versions of the same
song for Nightwish in most representative songs (Elan). Similar mistakes are
made for various bands in that chart (Symphony X, Hammerfall, Therion,
Sabaton, Stratovarius, Helloween, Within Temptation, etc.).

This was interesting to read in general though, especially as someone who
listens to metal quite frequently (mostly of the power metal variety).

~~~
simonbw
He mentioned that in the article:

> What's interesting is that while the most
> representative songs for each band are mostly their own songs, occasionally
> other bands songs creep in. For example, "Wrathchild", is an Iron Maiden song,
> not a Diamondhead song.

I don't think it's a problem with the song data. When the algorithm picks a
song it considers most lyrically representative of a band, it chooses from
_all_ the songs in the dataset, and it doesn't always pick a song by that
band.
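Here is a minimal sketch of why that can happen. It assumes (this is not necessarily the article's method) that each song is a bag-of-words vector, that a band's profile is the average of its own songs' vectors, and that every song in the corpus is ranked by cosine similarity to that profile. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def band_centroid(songs, band):
    """Average bag-of-words vector over one band's songs.

    songs: list of (band, title, lyrics) tuples (hypothetical format).
    """
    total = Counter()
    n = 0
    for b, _, lyrics in songs:
        if b == band:
            total.update(lyrics.lower().split())
            n += 1
    return Counter({w: c / n for w, c in total.items()}) if n else Counter()


def most_representative(songs, band, k=3):
    """Rank EVERY song in the corpus (any band) by similarity to the band's centroid."""
    centroid = band_centroid(songs, band)
    scored = [
        (cosine(Counter(lyrics.lower().split()), centroid), b, title)
        for b, title, lyrics in songs  # no filter on band: other bands can win
    ]
    scored.sort(reverse=True)
    return [(b, title) for _, b, title in scored[:k]]


songs = [
    ("A", "a1", "fire steel fire"),
    ("A", "a2", "steel night"),
    ("B", "b1", "fire steel night fire"),
    ("B", "b2", "love sunshine"),
]
print(most_representative(songs, "A", k=2))
```

With this toy data, `most_representative(songs, "A", k=2)` returns `[("B", "b1"), ("A", "a1")]`: band B's song uses all three of band A's words, so it lands closer to A's averaged profile than either of A's own songs.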

------
vonnik
People interested in ML and heavy metal may appreciate:

LSTMetallica: Generation drum tracks by learning the drum tracks of 60
Metallica songs
[https://keunwoochoi.wordpress.com/2016/02/23/lstmetallica/](https://keunwoochoi.wordpress.com/2016/02/23/lstmetallica/)

Keunwoo is also the translator of Deeplearning4j's Korean site:
[http://deeplearning4j.org/kr-index](http://deeplearning4j.org/kr-index)

------
abecedarius
If I didn't know where that list of most and least metal words came from, I'd
have guessed it was from the 19th-century Romantic reaction to the
Enlightenment (except for a few anachronisms like 'gonna'). How metal was
Edgar Allan Poe?

~~~
bigger_cheese
Pretty metal. I know at least three bands who have set his words to music:
Arcturus and Green Carnation (both did 'Alone') and Carpathian Forest ('The
Raven').

~~~
tragic
There's also Ahab, who based a fantastic album on Poe's The Narrative of
Arthur Gordon Pym[0]. It's sort of halfway between funeral doom and your
Isis/Neurosis-style stuff.

[0]
[https://en.m.wikipedia.org/wiki/The_Giant_(Ahab_album)](https://en.m.wikipedia.org/wiki/The_Giant_\(Ahab_album\))

------
Drup
Regarding the non-metalness of long words, you just aren't trying hard
enough:
[http://www.invisibleoranges.com/death-metal-english/](http://www.invisibleoranges.com/death-metal-english/)

------
mkozakov
Song structure matters here. Most songs have a chorus that is repeated
throughout the song, and choruses generally score higher on 'readability'
because they are meant to be memorable. If you don't pre-process your corpus
by removing repeated paragraphs, your comparisons to the Brown corpus are less
valid, since the Brown corpus doesn't have that kind of repetition.
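A minimal pre-processing sketch along those lines. It assumes stanzas are separated by blank lines and that a repeated chorus is an exact repeat once case and whitespace are normalized (real lyric pages may vary). `dedupe_stanzas` is a hypothetical helper, not part of the article's code.

```python
import re


def dedupe_stanzas(lyrics: str) -> str:
    """Keep only the first occurrence of each stanza (blank-line-separated block)."""
    seen = set()
    kept = []
    for stanza in re.split(r"\n\s*\n", lyrics.strip()):
        # Normalize case and whitespace so trivially different repeats still match
        key = " ".join(stanza.lower().split())
        if key and key not in seen:
            seen.add(key)
            kept.append(stanza.strip())
    return "\n\n".join(kept)


song = "Verse one\nline\n\nChorus here\n\nVerse two\n\nchorus  HERE\n"
print(dedupe_stanzas(song))
```

Readability scores computed on the output then count each chorus once, which is closer to how a non-repetitive reference corpus would be measured.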

~~~
jessemoeller
I believe that the convention on darklyrics is to either elide or replace with
"[chorus]" all repetitions of the chorus after the first. A few spot checks
seem to confirm this, but I can't find an official policy on it.

------
lucb1e
This made me wonder how "metal" a given text is, so I wrote a small toy:
[https://lucb1e.com/rp/crapware/metalness.htm](https://lucb1e.com/rp/crapware/metalness.htm)

Given a text, it splits it into words, adds up their metalness scores, and
divides the total by the number of words.

~~~
qbrass
You need to strip punctuation (, . ? !) from the text; otherwise it won't
recognize words that have a punctuation mark at the end.
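A sketch of the averaging described above with the punctuation fix folded in. The `METALNESS` scores are made-up placeholders, and treating unknown words as 0 while still counting them in the denominator is an assumption; the toy itself may handle them differently.

```python
import re

# Hypothetical per-word scores; the real toy's values are not reproduced here.
METALNESS = {"steel": 2.0, "fire": 1.5, "burn": 1.8, "cuddle": -2.0}


def tokenize(text: str) -> list[str]:
    # Lowercase and keep only runs of letters/apostrophes,
    # so "fire!" and "fire," both become "fire"
    return re.findall(r"[a-z']+", text.lower())


def metalness(text: str, scores: dict[str, float] = METALNESS) -> float:
    words = tokenize(text)
    if not words:
        return 0.0
    # Assumption: unknown words score 0 but still count toward the average
    return sum(scores.get(w, 0.0) for w in words) / len(words)


print(metalness("Steel, fire!"))
```

Here `metalness("Steel, fire!")` gives 1.75; the trailing comma and exclamation mark no longer prevent a match.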

------
xefer
My favorite example of "least metal words" is Iron Maiden's use of the word
"cuddle" in their song "Drifter"

------
4e1a
This is really cool! Is this on GitHub or anywhere? My friends and I are
collecting lyrics from different genres to compare against this.

~~~
zo1
The author specifically mentions that he is not releasing the lyrics dataset.

------
1ris
I'm surprised no difference between American and European metal was found. I
feel like they are quite different, but this data doesn't seem to support
that.

I found it amusing that Venom and Running Wild are grouped together in the
first step, but judging by the lyrics, it fits. The rest matches my
expectations surprisingly well.

------
NautilusWave
I originally assumed this was about how heavy metal contamination affects
natural language processing.

~~~
hrnnnnnn
In a roundabout way it is :)

------
ryuuchin
Reminds me of the Power Metal Lyric Generator[1], although that was obviously
done more as a joke than as an actual analysis of metal lyrics.

[1] [https://youtu.be/wpe8eNdpAiM?t=307](https://youtu.be/wpe8eNdpAiM?t=307)

------
kgc
Lol. "Least Metal Words"

~~~
sevenless
Reminds me of Dethklok's song about tax receipts...
[https://www.youtube.com/watch?v=2zMQtXP7F5k](https://www.youtube.com/watch?v=2zMQtXP7F5k)

------
bionsuba
Fantastic article.

I wonder if the clustering method can be used/is used by apps like Spotify to
create a list of "related bands", as the graph at the end was fairly accurate.

~~~
4e1a
I thought the same thing, but this method seems like it actually works. Could
it be expanded to analyse not just the words but also the frequency makeup?

------
foobarbecue
Enjoying the article so far, but the graphs are unreadable. You should either
embed dynamic ones or use bigger fonts.

