
Evolution of words frequencies in porn - mazr
http://porngram.sexualitics.org/
======
route66
To speak of evolution when your time frame is 2008-2012 is somewhat
far-fetched. But I believe I see a reassuring trend here:
[http://porngram.sexualitics.org/?q=erlang%2C+clojure%2C+Ada%...](http://porngram.sexualitics.org/?q=erlang%2C+clojure%2C+Ada%2C+scala%2C+rust)

~~~
mappum
I would be terrified if "brainfuck" had a nonzero frequency.

~~~
zorlem
I wouldn't bet hard money that it doesn't exist; there is always Rule #34.

[http://xkcd.com/305/](http://xkcd.com/305/)

------
sushirain
1\. It doesn't count word frequencies, but sub-string frequencies. Moreover,
if a sub-string appears more than once in a title, it is counted more than
once. I drew this conclusion by submitting "a,b,c", and from their paper [1]:

    
    
       our algorithm strips out dashes and catches any 
       occurrence of the query in the title, for example, 
       'blow' catches 'blowing', 'blowjobs'
    

This explains the results of these queries: "ada,erlang", "tea,beer". As an
alternative they could have used a stemmer [2].

2\. The "slow,fast" and "love,hardcore" trends illustrate an interesting
trend. Perhaps towards women or mainstream viewers.

[1] [http://sexualitics.org/wp-content/uploads/2014/01/PORNSTUDIE...](http://sexualitics.org/wp-content/uploads/2014/01/PORNSTUDIES_preprint.pdf)

[2] [http://nlp.stanford.edu/IR-book/html/htmledition/stemming-an...](http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html)
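
To make point 1 concrete, here is a rough sketch of the difference between sub-string counting and whole-word counting. The function names and the sample titles are made up for illustration, and the dash stripping only approximates what the quoted passage describes; it is not the site's actual code.

```python
import re

def substring_count(query, titles):
    """Count every non-overlapping occurrence of query inside each title.

    Dashes are stripped first, as the quoted passage from the paper describes.
    """
    q = query.lower()
    return sum(title.lower().replace("-", "").count(q) for title in titles)

def word_count(query, titles):
    """Count only tokens that equal the query exactly."""
    q = query.lower()
    total = 0
    for title in titles:
        for token in re.findall(r"[a-z0-9]+", title.lower()):
            if token == q:
                total += 1
    return total

# Hypothetical titles, chosen so that "tea" also hides inside other words.
titles = ["Tea party teaser", "Steam room", "Green tea"]

print(substring_count("tea", titles))  # 4: tea, teaser, steam, tea
print(word_count("tea", titles))       # 2: only the standalone "tea" tokens
```

A stemmer would sit between the two: it would still map "blowing" to "blow", but it would not match "tea" inside "steam".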

~~~
easy_rider
> 2\. The "slow,fast" and "love,hardcore" trends illustrate an interesting
> trend. Perhaps towards women or mainstream viewers.

I don't think so [1]

[1]
[https://www.google.nl/search?q=teen+loves+to&oq=teen+loves+t...](https://www.google.nl/search?q=teen+loves+to&oq=teen+loves+to+fuck)

------
easy_rider
In my first 2 weeks of working at an adult company (as a dev, yes, it's sad)
one of my tasks was to watch/scan 200+ videos and describe them. It's true,
you run out of inspiration fast. Also, a hint: the "love" in the
titles is probably explained by "love(s) to <insert profanity> ". I don't
think I ever used hardcore in a title.

~~~
mratzloff
I used to work with a guy who once worked as a dev at an adult company. He
said it was the most soul-sucking experience of his career, and that the
owners knew absolutely nothing about technology and treated their tech staff
terribly.

Based on the fact that you had to spend your first two weeks doing data entry,
it sounds like his experience wasn't unique.

~~~
easy_rider
I didn't quite learn if this was an industry thing. Never met other devs in
the industry from other companies. He painted a perfect picture.

Just talking about it makes me unhappy :)

------
probably_wrong
Traditional professions are still on top:

[http://porngram.sexualitics.org/?q=pizza%2Cdelivery%2Cplumbe...](http://porngram.sexualitics.org/?q=pizza%2Cdelivery%2Cplumber%2C+programmer)

I sense a business opportunity there.

~~~
jeltz
[http://porngram.sexualitics.org/?q=pizza%2Cdelivery%2Cplumbe...](http://porngram.sexualitics.org/?q=pizza%2Cdelivery%2Cplumber%2Cprogrammer%2Cteacher%2Cwhore)

Teaching and prostitution clearly dominate.

------
ozh
Quite fun!

Next: provide the porn industry a simple Markov chain script to generate
probabilistic porn movie titles, and save them all those incredibly tiresome
brainstorming sessions they must have to create new titles :)
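
For what it's worth, such a script is only a few lines. Below is a minimal sketch of a first-order, word-level Markov chain; the corpus is a handful of invented, harmless titles and every name in it (`build_chain`, `generate_title`, the `START`/`END` markers) is hypothetical.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # sentinel tokens marking title boundaries

def build_chain(titles):
    """Map each word to the list of words that followed it in the corpus."""
    chain = defaultdict(list)
    for title in titles:
        words = [START] + title.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def generate_title(chain, rng, max_words=12):
    """Walk the chain from START until END or max_words is reached."""
    word, output = START, []
    while len(output) < max_words:
        # Repeated followers appear several times in the list, so a uniform
        # choice is already weighted by observed frequency.
        word = rng.choice(chain[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)

# Invented sample corpus; a real one would be the scraped titles.
corpus = [
    "the plumber fixes the sink",
    "the teacher stays after class",
    "pizza delivery after class",
]
chain = build_chain(corpus)
print(generate_title(chain, random.Random(0)))
```

With a corpus this small the output mostly recombines the inputs; the results only get interesting with thousands of titles.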

~~~
Negitivefrags
This reminds me of the first time I implemented a Markov chain text generator.
We were at a LAN party so we looked on the public network shares to find a
corpus of text files to use as input.

The first thing we found was a copy of the Bible and the second thing we found
was someone's collection of porn stories.

The start of the output was "He slipped his tongue into the lord..."

~~~
Ygg2
"... and Lord came unto him"?

~~~
angersock
Standard missionary work.

------
endriju
[http://porngram.sexualitics.org/?q=btc%2Cusd](http://porngram.sexualitics.org/?q=btc%2Cusd)

------
uloweb
[http://porngram.sexualitics.org/?q=iphone%2Candroid](http://porngram.sexualitics.org/?q=iphone%2Candroid)

~~~
viraptor
I find the graph shapes amusing given the data source :)

------
mapleoin
[http://porngram.sexualitics.org/?q=hipster](http://porngram.sexualitics.org/?q=hipster)

~~~
duiker101
I wonder how we survived before 2009 without hipster porn.

------
edoloughlin
Strange:
[http://porngram.sexualitics.org/?q=tea,wine,beer,coffee](http://porngram.sexualitics.org/?q=tea,wine,beer,coffee)

~~~
RyanMcGreal
In a similar vein:

[http://porngram.sexualitics.org/?q=breakfast%2Clunch%2Cdinne...](http://porngram.sexualitics.org/?q=breakfast%2Clunch%2Cdinner%2Csnack)

------
thinker
I wonder if we'll soon see an Upworthy for Porn: "You will never believe what
this girl did to pay her rent"

~~~
keammo1
"He was just there to fix the cable. What happened next will shock you"

------
guybrushT
Very interesting to see the dataset being made available. Whenever I want to
do this kind of analysis, I always stumble at 'how to get the data?'. In their
paper, it is mentioned that "We created a dedicated computer program to carry
out the navigation and data collection tasks required to gather the metadata
for all available videos...". I would love to see this program. More broadly,
can anyone help me with best resources (pref python) where one can learn to
crawl/scrape this type of information?

~~~
mazr
Hi, I didn't release the code of the crawler, first, because it was not well-
crafted enough to be released (quick and dirty linear programming), and
second, because any change in the site you crawl calls for recrafting your
code.

I used Python, sometimes with BeautifulSoup and sometimes with lxml; both are
very good for crawling. I would say BeautifulSoup is easier, and lxml cleaner.
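
To give an idea of the parsing step, here is a small sketch that pulls titles out of a listing page. It is not the crawler used for the dataset: it uses only the standard library's `html.parser` as a stand-in so it runs without installing anything, and the HTML snippet, the `title` class name and the `extract_titles` helper are all assumptions. BeautifulSoup or lxml would do the same thing in fewer lines.

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text of every <a> element whose class list contains "title"."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "title" in classes:
            self._in_title = True
            self.titles.append("")

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_title = False

# Hypothetical markup; a real site will differ and will change over time.
SAMPLE_PAGE = """
<ul>
  <li><a class="title" href="/v/1">First video title</a> <span>2009</span></li>
  <li><a class="title" href="/v/2">Second video title</a> <span>2011</span></li>
  <li><a class="other" href="/about">About</a></li>
</ul>
"""

def extract_titles(html):
    """Return the stripped text of all title links found in html."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]

print(extract_titles(SAMPLE_PAGE))
```

The fetching side (requesting pages, following pagination, rate limiting) is left out; that is the part that breaks whenever the site changes.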

------
stcredzero
_words frequency_

Is the title a British thing? Like maths vs. math?

~~~
polymatter
As a native Brit, looks like a mistake to me.

------
graylights
So I did gay vs lesbian and I was confused why there was a big spike in 2010
for gay that has since dropped off. Is this an anomaly in their sampling?

Also Obama's numbers have really dropped compared to Bush:
[http://porngram.sexualitics.org/?q=bush%2Cobama](http://porngram.sexualitics.org/?q=bush%2Cobama)

------
6d0debc071
I suppose we may as well do the obvious ones

[http://porngram.sexualitics.org/?q=BDSM%2Ctorture%2Cpain%2Cr...](http://porngram.sexualitics.org/?q=BDSM%2Ctorture%2Cpain%2Crape)

:/

~~~
catFishery
I dunno, the less obvious ones are more interesting.

[http://porngram.sexualitics.org/?q=twat%2Ccunt%2Cfist%2Cprol...](http://porngram.sexualitics.org/?q=twat%2Ccunt%2Cfist%2Cprolapse%2Cscrote%2Cbeast%2Cschlong)

------
misnome
Kind of interesting, but really needs statistical error bounds

------
himal
[http://porngram.sexualitics.org/?q=HIV%2CAIDS](http://porngram.sexualitics.org/?q=HIV%2CAIDS)

------
jpswade
[http://porngram.sexualitics.org/?q=cup](http://porngram.sexualitics.org/?q=cup)

------
bobowzki
"sister" :-)

