Heavy Metal and Natural Language Processing (degeneratestate.org)
247 points by colinprince on July 3, 2016 | 42 comments



The "least metal words" almost seems like a challenge. I now feel strangely compelled to write metal songs with those words, and I haven't written a metal song in 20 years. I may not be particularly metal, anymore.


It surprisingly holds together.

[The] academic chairman noted [the] secretary['s] considerable measurements. /

University employee[']s attorney indicated [the] administrative committee. /

Literary agencies particularly fiscal. /


Which Bad Religion album is this off?


It almost seems like something happened and it was some kind of text about administrative work that accidentally slipped in.


I could see this as an introductory voiceover, with a faint guitar riff in the background.

drums come in

rhythm guitar and bass come in

main riff begins

begin song


Good use of "particularly" right there -- #1 on the nonmetal chart.


Motorhead managed to fit the word "parallelogram" into their title song, so I'd say nothing is impossible.


I always thought of Motorhead more in the spectrum of "thrash punk" than metal.


Not that "parallelogram" is a particularly punk word, either.


Nothing's impossible, re: Death Metal Pizza https://www.youtube.com/watch?v=G3gTBpLHzLY


Turbonegro did it in 1998: https://www.youtube.com/watch?v=YtCvz6-7D7g

Carnivore (Pete Steele's band before Type O Negative) also did it in 1987 (although it's probably not going to be classified as a metal song unless your name is Marcel Duchamp or Tracey Emin): https://www.youtube.com/watch?v=eGDTrWw_APM


Do you folks like coffee?


You could try to sell the songs to System of a Down.

(which may not be particularly metal, but they have an interesting contrast between their sound and their lyrics)


According to this list, "I Ejaculate Fire" by Dethklok is not a very metal song.


As long as you growl them, they could totally be Death lyrics


The lyrics for the first couple of Carcass albums were pulled from medical texts, so I think lyrical content is pretty flexible. But, the nonmetal words almost sound like a very boring BBC program about people in an office in the 19th century; like Eastenders with stuffy accountants. It'd be an interesting challenge.


There's also Die Eier Von Satan by Tool:

https://www.youtube.com/watch?v=VPST2nE0KIo

The lyrics are a German recipe for making cookies without eggs.


hashish-laced donut holes, but your point still stands.


> In the face of this complexity, it is not surprising that understanding natural language, in the same way humans do, with computers is still a unsolved problem.

I have the feeling it is not just an unsolved problem, but also an undefined problem.


I noticed there were song data errors in places. For example, in this chart (http://www.degeneratestate.org/static/metal_lyrics/clusters....), the most representative songs for Tiamat include White Pearl, Black Oceans, which is a song by Sonata Arctica. The chart also lists 3 versions of the same song (Elan) among the most representative songs for Nightwish. Similar mistakes appear for various other bands in that chart (Symphony X, Hammerfall, Therion, Sabaton, Stratovarius, Helloween, Within Temptation, etc.).

This was interesting to read in general though, especially as someone who listens to metal quite frequently (mostly of the power metal variety).


He mentioned that in the article:

> What's interesting is that while the most representative songs for each band are mostly their own songs, occasionally other bands songs creep in. For example, "Wrathchild", is an Iron Maiden song, not a Diamondhead song.

I don't think it's a problem with the song data. When the algorithm picks a song it considers most lyrically representative of a band, it chooses from all the songs in the dataset, and it doesn't always pick a song by that band.
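To make that concrete, here is a minimal sketch of how such a pick could work. It assumes each song and each band is already represented as a numeric lyric vector and that similarity is cosine similarity; both are my assumptions, not something the article confirms. The song titles, band names, vectors, and the `most_representative` helper are all made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_representative(band_vector, songs):
    """Pick the song closest to the band's profile.

    The search runs over ALL songs, not only the band's own,
    so another band's song can win.
    """
    return max(songs, key=lambda song: cosine(band_vector, song[2]))

# Hypothetical data: (title, artist, lyric vector).
songs = [
    ("Song A", "Band X", [1.0, 0.0, 0.2]),
    ("Song B", "Band X", [0.1, 1.0, 0.0]),
    ("Song C", "Band Y", [0.9, 0.1, 0.9]),
]
band_x_profile = [0.8, 0.3, 0.8]  # made-up profile for Band X

best = most_representative(band_x_profile, songs)
print(best[0], "by", best[1])
```

With these made-up numbers, the closest song to Band X's profile belongs to Band Y, which is the same effect as "Wrathchild" showing up under Diamondhead. Restricting `songs` to the band's own catalogue before calling `max` would avoid it.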


People interested in ML and heavy metal may appreciate:

LSTMetallica: Generation drum tracks by learning the drum tracks of 60 Metallica songs https://keunwoochoi.wordpress.com/2016/02/23/lstmetallica/

Keunwoo is also the translator of Deeplearning4j's Korean site: http://deeplearning4j.org/kr-index


That list of most and least metal words, if I didn't know where they were from, I'd have guessed was the 19th-C. Romantic reaction to the Enlightenment (except for a few anachronisms like 'gonna'). How metal was Edgar Allan Poe?


Pretty metal. I know at least 3 bands who have set his words to music: Arcturus and Green Carnation (both did 'Alone'), and Carpathian Forest ('The Raven').


There's also Ahab, who based a fantastic album on the narrative of Arthur Gordon Pym[0]. It's sort of halfway between funeral doom and your Isis/Neurosis style stuff.

[0] https://en.m.wikipedia.org/wiki/The_Giant_(Ahab_album)


Regarding the non-metalness of long words, it's just that you are not trying hard enough: http://www.invisibleoranges.com/death-metal-english/


There is something to be said about song structure. Most songs will have a chorus, which is repeated throughout the song. Choruses will generally have higher 'readability' because they are meant to be more memorable. If you don't pre-process your corpus by removing repeating paragraphs, then your comparisons to the Brown corpus are not as valid, since Brown doesn't use repetition.
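A rough sketch of that pre-processing step, assuming stanzas are separated by blank lines and that a repeat means the same text after lowercasing and collapsing whitespace. The `drop_repeated_stanzas` name and the sample lyrics are made up; this is not what the article's author did.

```python
def drop_repeated_stanzas(lyrics):
    """Keep only the first occurrence of each stanza.

    Stanzas are assumed to be separated by blank lines. Two stanzas
    count as the same if they match after lowercasing and collapsing
    whitespace.
    """
    seen = set()
    kept = []
    for stanza in lyrics.split("\n\n"):
        key = " ".join(stanza.lower().split())
        if not key or key in seen:
            continue  # skip empty stanzas and repeats
        seen.add(key)
        kept.append(stanza.strip())
    return "\n\n".join(kept)

# Made-up lyrics: the chorus appears twice with different casing.
lyrics = (
    "Verse one line\n\n"
    "Chorus line\nsing along\n\n"
    "Verse two line\n\n"
    "Chorus line\nSing along"
)
cleaned = drop_repeated_stanzas(lyrics)
print(cleaned)
```

This only catches exact repeats. A chorus with one word changed on the last pass would survive, so a fuzzier comparison might be needed on real lyrics.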


I believe that the convention on darklyrics is to either elide or replace with "[chorus]" all repetitions of the chorus after the first. A few spot checks seem to confirm this, but I can't find an official policy on it.


This made me wonder how "metal" a given text is. Wrote a small toy here: https://lucb1e.com/rp/crapware/metalness.htm

Given a text it will split each word, add up the metalness score and divide it by the number of words.
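In Python terms the idea is roughly the following. This is a sketch, not the code behind the page: the `METALNESS` table holds made-up scores, and treating unknown words as 0 is an assumption.

```python
# Hypothetical per-word scores; the real values come from the article's word lists.
METALNESS = {"burn": 3.0, "sky": 1.5, "committee": -4.0}

def mean_metalness(text, scores=METALNESS):
    """Average metalness over all whitespace-separated words.

    Words missing from the table count as 0.0 (an assumption).
    """
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(scores.get(word, 0.0) for word in words) / len(words)

score = mean_metalness("burn the sky")
print(score)
```

Dividing by the total word count means unknown words pull the average toward zero. Dividing by the number of recognized words instead would be another reasonable choice.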


You need to strip |,.?!| from the text because it won't recognize words with a punctuation mark at the end.
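Something along these lines would do it, shown in Python rather than whatever the page itself uses. The `tokenize` name is made up. It strips punctuation from both ends of each word and leaves inner apostrophes alone.

```python
import string

def tokenize(text):
    """Lowercase, split on whitespace, and strip punctuation from word edges."""
    tokens = (word.strip(string.punctuation) for word in text.lower().split())
    return [token for token in tokens if token]  # drop tokens that were only punctuation

tokens = tokenize("Burn, burn the sky!")
print(tokens)
```

Stripping only at the edges keeps words like "don't" intact, while a bare "--" disappears entirely.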


My favorite example of "least metal words" is Iron Maiden's use of the word "cuddle" in their song "Drifter"


This is really cool! Is this on github or anything? I'm having my friends and myself search for lyrics of different genres to compare this to.


The author specifically mentions that he is not releasing the lyrics dataset.


I'm surprised no difference between American and European metal was found. I feel like they are quite different, but it seems this is not supported by this data.

I found it amusing that Venom and Running Wild are grouped together in the first step. But well, by the lyrics it fits. The rest matches my expectations surprisingly well.


I originally assumed this was about how heavy metal contamination affects natural language processing.


In a roundabout way it is :)


Reminds me of the Power Metal Lyric Generator[1]. Although obviously that was done more as a joke than as an actual analysis of metal lyrics.

[1] https://youtu.be/wpe8eNdpAiM?t=307


Lol. "Least Metal Words"


Reminds me of Dethklok's song about tax receipts... https://www.youtube.com/watch?v=2zMQtXP7F5k


Fantastic article.

I wonder if the clustering method can be used/is used by apps like Spotify to create a list of "related bands", as the graph at the end was fairly accurate.


I thought the same thing, but this method seems like it actually works. Could this be expanded to not just analyse words but frequency makeup also?


Enjoying the article so far, but the graphs are unreadable. You should either embed dynamic ones or use bigger fonts.



