Evolution of words frequencies in porn (sexualitics.org)
106 points by mazr on Jan 30, 2014 | 48 comments

To speak of evolution when your time frame is 2008-2012 is somewhat far-fetched. But I believe I see a reassuring trend here: http://porngram.sexualitics.org/?q=erlang%2C+clojure%2C+Ada%...

Still, when performance matters, C blows all the others away by a large margin:


Disclaimer: all puns intended

Omg, look at C. It outperforms everything... http://porngram.sexualitics.org/?q=erlang%2Cclojure%2CAda%2C...

As someone pointed out below, this just looks at substrings, so searching for "C" means searching for any word that has the letter "C" in it.

I would be terrified if "brainfuck" had a nonzero frequency.

I wouldn't bet hard money that it doesn't exist; there is always Rule #34.


And yet, somehow very surprised.

And Ruby is way ahead of Python:


This comment made my day :D

1. It doesn't count word frequencies, but sub-string frequencies. Moreover, if a sub-string appears more than once per title, it is counted more than once. I drew this conclusion from submitting "a,b,c", and from their paper [1]:

   our algorithm strips out dashes and catches any 
   occurrence of the query in the title, for example, 
   'blow' catches 'blowing', 'blowjobs'
This explains the results of these queries: "ada,erlang", "tea,beer". As an alternative they could have used a stemmer [2].

2. The "slow,fast" and "love,hardcore" trends illustrate an interesting trend. Perhaps towards women or mainstream viewers.

[1] http://sexualitics.org/wp-content/uploads/2014/01/PORNSTUDIE...

[2] http://nlp.stanford.edu/IR-book/html/htmledition/stemming-an...
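To make point 1 concrete, here is a minimal Python sketch contrasting the substring matching described in the quoted paper (strip dashes, count any occurrence of the query in the title) with whole-word matching. The function names are hypothetical, and the lowercasing and non-overlapping matching are my assumptions; the site's actual code isn't published.

```python
import re


def substring_count(title, query):
    """Count occurrences of `query` anywhere in the title, as the quoted paper
    describes: dashes are stripped and partial-word hits count.
    Lowercasing is an assumption, not something the paper states."""
    text = title.lower().replace("-", "")
    return text.count(query.lower())  # str.count finds non-overlapping matches


def word_count(title, query):
    """Count only whole-word matches, for comparison."""
    words = re.findall(r"[a-z0-9]+", title.lower().replace("-", ""))
    return words.count(query.lower())


for title in ["Steamy night in Canada", "Tea for two"]:
    for query in ["tea", "ada", "a"]:
        print(f"{title!r} {query!r}: substring={substring_count(title, query)}, "
              f"word={word_count(title, query)}")
```

Under substring counting "Steamy" matches "tea" and "Canada" matches "ada", which is probably what skews the "tea,beer" and "ada,erlang" queries. A stemmer would sit in between: it maps "blowing" to "blow" without letting "Canada" match "ada".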

> 2. The "slow,fast" and "love,hardcore" trends illustrate an interesting trend. Perhaps towards women or mainstream viewers.

I don't think so [1]

[1] https://www.google.nl/search?q=teen+loves+to&oq=teen+loves+t...

In my first 2 weeks of working at an adult company (as a dev, yes, it's sad), one of my tasks was to watch/scan 200+ videos and describe them. It's true, you run out of inspiration fast. Also, a hint: the "love" in the titles is probably explained by "love(s) to <insert profanity>". I don't think I ever used "hardcore" in a title.

I used to work with a guy who once worked as a dev at an adult company. He said it was the most soul-sucking experience of his career, and that the owners knew absolutely nothing about technology and treated their tech staff terribly.

Based on the fact that you had to spend your first two weeks doing data entry, it sounds like his experience wasn't unique.

I never quite learned whether this was an industry-wide thing; I never met devs from other companies in the industry. He painted a perfect picture.

Just talking about it makes me unhappy :)

Traditional professions are still on top:


I sense a business opportunity there.


Teaching and prostitution clearly dominate.

> Traditional professions are still on top:

Time to change positions, then!

Quite fun!

Next: provide the porn industry with a simple Markov chain script to generate probabilistic porn movie titles, and save them all those incredibly tiresome brainstorming sessions they must have to come up with new titles :)
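For the curious, a word-level Markov chain title generator really is only a few lines. This is a hypothetical Python sketch: `build_chain` and `generate` are made-up names, and the sample titles are invented, not taken from the dataset.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # sentinel tokens marking title boundaries


def build_chain(titles, order=1):
    """Map each run of `order` consecutive words to every word seen right after it."""
    chain = defaultdict(list)
    for title in titles:
        words = [START] * order + title.split() + [END]
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, order=1, rng=None, max_words=12):
    """Random walk from the start state; followers seen more often are picked more often."""
    rng = rng or random.Random()
    state = (START,) * order
    out = []
    for _ in range(max_words):
        nxt = rng.choice(chain[state])
        if nxt == END:
            break
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)


# Usage with made-up titles (not from the dataset):
titles = ["The cable guy fixes the sink", "The plumber fixes the cable"]
chain = build_chain(titles)
print(generate(chain, rng=random.Random(0)))
```

Storing duplicate followers in plain lists is the simplest way to get frequency-weighted sampling; a larger `order` gives more coherent but less novel titles.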

This reminds me of the first time I implemented a Markov chain text generator. We were at a LAN party, so we looked on the public network shares to find a corpus of text files to use as input.

The first thing we found was a copy of the Bible, and the second thing we found was someone's collection of porn stories.

The start of the output was "He slipped his tongue into the lord..."

"... and Lord came unto him"?

Standard missionary work.

I think the interesting part of all this is how it changes over time. I have the impression that sex is, and always has been, a weird reflection of society at large. I would be curious, for example, to see a much longer-term graph comparing the frequencies of the two words 'hardcore' and 'love'.

I find the graph shapes amusing given the data source :)

I wonder how we survived before 2009 without hipster porn.

Not sure that is so strange when you consider that tea might appear more often due to a fairly well-known n-gram including its delivery container...

It counts sub-string frequency, not word frequency. Try "a,b,c".

I wonder if we'll soon see an Upworthy for Porn: "You will never believe what this girl did to pay her rent"

"He was just there to fix the cable. What happened next will shock you"

Very interesting to see the dataset being made available. Whenever I want to do this kind of analysis, I always get stuck at 'how do I get the data?'. Their paper mentions that "We created a dedicated computer program to carry out the navigation and data collection tasks required to gather the metadata for all available videos...". I would love to see this program. More broadly, can anyone point me to good resources (preferably Python) for learning to crawl/scrape this type of information?

Hi, I didn't release the crawler's code for two reasons: first, it was not well-crafted enough to be released (quick and dirty linear programming), and second, any change in the site you crawl means reworking your code.

I used Python, sometimes with BeautifulSoup and sometimes with lxml; both are very good for crawling. I would say BS is easier and lxml cleaner.
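A minimal sketch of the parsing half of such a crawler, assuming the listing pages mark each video title with a hypothetical `class="title"` attribute. It uses the standard library's `html.parser` instead of BeautifulSoup or lxml so it runs without extra dependencies; either of those libraries would make the extraction shorter. `TitleParser`, `extract_titles`, `fetch` and the User-Agent string are all placeholders of mine, and a real crawler should also honour robots.txt and rate-limit its requests.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collect the text of every element whose class list contains `target_class`."""

    def __init__(self, target_class="title"):
        super().__init__()
        self.target_class = target_class
        self.titles = []
        self._tag = None   # tag name of the matching element we are inside, if any
        self._depth = 0    # nesting depth of that same tag name
        self._buf = []     # text fragments of the current title

    def handle_starttag(self, tag, attrs):
        if self._tag is None:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self._tag, self._depth, self._buf = tag, 1, []
        elif tag == self._tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace left over from the markup.
                self.titles.append(" ".join("".join(self._buf).split()))
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)


def extract_titles(html, target_class="title"):
    parser = TitleParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.titles


def fetch(url, user_agent="example-research-crawler"):
    """Download one page (placeholder User-Agent; add robots.txt checks and delays in real use)."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

You would call `extract_titles(fetch(url))` for each listing page. As noted above, the markup assumption (`target_class`) is exactly the part that breaks whenever the site changes.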

words frequency

Is the title a British thing? Like maths vs. math?

As a native Brit, looks like a mistake to me.

Probably just an error: the guys behind it are French http://sexualitics.org/#theteam

So I did gay vs. lesbian and was confused about why there was a big spike for gay in 2010 that has since dropped off. Is this an anomaly in their sampling?

Also Obama's numbers have really dropped compared to Bush: http://porngram.sexualitics.org/?q=bush%2Cobama

I suppose we may as well do the obvious ones



I dunno, the less obvious ones are more interesting.


Kind of interesting, but it really needs statistical error bounds.
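One simple way to add such error bounds, assuming each yearly point is the share of titles containing the term, is a Wilson score interval for a binomial proportion. This is my choice of method, not anything the site claims to do; `hits` is the number of titles matching the term in a given year and `n` the total number of titles that year.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for the proportion hits/n; z=1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= hits <= n:
        raise ValueError("hits must be between 0 and n")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

If the intervals for two years (say, the 2010 spike mentioned above versus a later year) clearly don't overlap, sampling noise alone is an unlikely explanation. It still says nothing about changes in how or what the site was crawled.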

"sister" :-)
