
The Real ‘Stuff White People Like’ - dfreidin
http://blog.okcupid.com/index.php/the-real-stuff-white-people-like/
======
powrtoch
On the proper use of this data:

Perhaps the easiest way to think of it is that the phrases are predictors for
the race/sex, not the other way around. For example, you shouldn't expect
every white male you meet to like Van Halen. However if someone says to you "I
have a friend who's a big Van Halen fan", you're pretty safe in assuming that
the friend is a white male.

Likewise, it might be that only 10% of blacks like soul food. But if almost no
other demographics like it, it will still show up high on their list. So "is
black" does not strongly imply "loves soul food", but "loves soul food" does
strongly imply "is black".

In other words, <http://en.wikipedia.org/wiki/Bayes_theorem>
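
A rough worked example of that asymmetry, as a sketch only. The 10% figure is the hypothetical from above; the 12% population share and the 0.5% rate among everyone else are made-up assumptions chosen purely for illustration.

```python
def posterior(prior, p_given_group, p_given_others):
    """Bayes' theorem: P(group | trait) from P(trait | group) and the prior."""
    evidence = p_given_group * prior + p_given_others * (1 - prior)
    return p_given_group * prior / evidence

# Assumed numbers, not from the OkCupid data:
prior = 0.12            # share of the population in the group
p_given_group = 0.10    # P(loves soul food | in group)
p_given_others = 0.005  # P(loves soul food | not in group)

p = posterior(prior, p_given_group, p_given_others)
print(f"P(trait | group) = {p_given_group:.2f}")
print(f"P(group | trait) = {p:.2f}")  # about 0.73 under these assumptions
```

So a trait held by only a tenth of a group can still point to that group most of the time, provided it is rare everywhere else.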

~~~
moultano
For that reason, I'd be happier if they sorted by KL Divergence instead of
just log-odds. That'd give a much better tradeoff between commonality and
predictive power.

~~~
lazyjeff
You have a valid point but I'm guessing they avoided kl-divergence because
it's 1) harder to explain 2) it's not symmetric, e.g. kl(asian, white) !=
kl(white, asian) 3) it needs a smoothing function for comparing distributions
where not every element is in both distributions.
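
Points 2 and 3 are easy to see in a small sketch. The phrase counts below are invented, and add-one (Laplace) smoothing is just one possible choice of smoothing function, not necessarily what anyone at OkCupid would use.

```python
import math

def smooth(counts, vocab, alpha=1.0):
    """Add-alpha smoothing so every phrase in vocab has nonzero probability."""
    total = sum(counts.get(w, 0) for w in vocab) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}

def kl(p, q):
    """KL(p || q) in nats; assumes q[w] > 0 wherever p[w] > 0."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

# Hypothetical phrase counts for two groups.
white = {"van halen": 30, "guitar": 50, "sushi": 20}
asian = {"sushi": 60, "guitar": 30, "taiwan": 10}

vocab = set(white) | set(asian)
p, q = smooth(white, vocab), smooth(asian, vocab)

print(kl(p, q), kl(q, p))  # the two directions give different values
```

Without the smoothing step, "taiwan" would have probability zero in one distribution and the log ratio would blow up.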

------
patio11
There's just so much that they execute well on that I hate to pick any bit of
it, but one thing everybody with linkbait should probably do is create
something spiritually similar to the bar which pops up when you're done with
the article. It is a force multiplier for all pillar content you write, it
increases the viral factor, and the way it grabs someone's attention _just_
when their brain is known to be vacant is sixteen flavors of brilliant.

I did something _very_ similar for a client today, and after I get a little
better at manipulating code to do it, I'm probably going to try something
similar for getting trial signups. ("Looks like you're done reading about it.
Feeling confused about what to do next? WHAM, signup box.")

~~~
tuesdays
Interesting, I was thinking the exact opposite, since those bars are all super
annoying.

~~~
xiongchiamiov
Oh wow, that is annoying.

I do like the pop-out bars that the NYT has at the end of their articles,
though. See, for example,
[http://www.nytimes.com/2010/09/07/business/economy/07jobs.ht...](http://www.nytimes.com/2010/09/07/business/economy/07jobs.html?pagewanted=2&_r=4&hp)

~~~
bcaulf
Interesting. I also was reminded of the NYT popout bars, but I hate them.
Pop-anything that I didn't ask for really bothers me.

------
yalurker
Interesting read, though a couple of things come to mind. How does "the people
who have OkCupid profiles" compare with "the general population"? I suspect
several of the results are skewed by that population bias.

Also, they say most of their users are urban, but I'm curious if people aren't
prone to list themselves as the nearest big city rather than where they really
live. For instance, I suspect everyone within 45 minutes of Des Moines is
listing themselves as living there, rather than the tiny farm town / suburb
they really live in.

~~~
steveklabnik
Generally, OKC's demographics are perceived as being college age, liberal,
middle class, and alt friendly.

I'm not sure they've ever directly posted demographic data, but that's their
perceived demographic, at least.

------
cullenking
I found this, like most of the other blog posts by the OK Cupid team, pretty
genius. I am glad the people who sit on top of this goldmine of social
information have a good sense of humor.

It would be cool to see a statistician guest post. The OK Cupid people are
great at coming up with ideas for analysis, but I'd love to see some solid
stats behind some of their analyses.

------
dnautics
It's interesting: I am Asian and like soul food, but it's not something it
would occur to me to put on my profile. Similarly, I would write sashimi
(if I ate it anymore) versus sushi, and I suspect that non-Asians like sashimi
just fine but wouldn't know to put it on... So the statistics point to
self-cultural broadcasting, I think, more than preferences.

~~~
oiuygtfrtghyju
On the other hand this is a data point.

Knowing that the stuff you like with fish is called "sashimi" and the rice
stuff is called "sushi" is a good indicator that you are asian.

Just like knowing that different types of cheese should actually taste
different is a good indicator that you are european.

~~~
Natsu
> Knowing that the stuff you like with fish is called "sashimi" and the rice
> stuff is called "sushi" is a good indicator that you are asian.

Probably true statistically, but there are plenty of non-Asians like me who
know the difference (sashimi is the raw fish, sushi is that which has sushi
rice). But I'll never show up on their tests because I don't have an OKCupid
profile and neither food is a favorite (they're okay, but definitely not
favorites).

------
reader5000
Those results are highly odd, but I don't think okcupid is mainstream enough
to really glean any insight into racial psychology (if such a thing exists)
from their data. Furthermore, they don't provide enough info on their analysis
method, but I would be interested in seeing the results of a null-run:
randomly assigning profiles to groups (rather than by race) and seeing what
"statistically distinct" phrases arise (if the analysis is valid no phrases
should arise).

It would also be interesting to see them do the same analysis for other
features such as height, income, photo attractiveness, etc.

Similar analysis for craigslist personals by city:
<http://blog.kiwitobes.com/?p=42>
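
A minimal sketch of such a null run, under assumptions: each profile is reduced to a set of phrases, "frequency" means the fraction of profiles containing a phrase, and the toy data is invented. Since a top-k ranking always returns something, the useful output is how large the best ratio gets when the labels are random.

```python
import random
from collections import Counter

def top_ratio(profiles, labels, group):
    """Highest (frequency in group / frequency overall) across all phrases."""
    overall = Counter(p for prof in profiles for p in prof)
    in_group = Counter(
        p for prof, lab in zip(profiles, labels) if lab == group for p in prof
    )
    n = len(profiles)
    n_group = sum(1 for lab in labels if lab == group)
    return max((in_group[p] / n_group) / (overall[p] / n) for p in overall)

def null_run(profiles, labels, group, trials=200, seed=0):
    """Shuffle the group labels repeatedly and record the best ratio each time."""
    rng = random.Random(seed)
    shuffled = list(labels)
    results = []
    for _ in range(trials):
        rng.shuffle(shuffled)
        results.append(top_ratio(profiles, shuffled, group))
    return results

# Toy data: group "a" always says "x", group "b" always says "y".
profiles = [{"x", "common"}] * 5 + [{"y", "common"}] * 15
labels = ["a"] * 5 + ["b"] * 15

observed = top_ratio(profiles, labels, "a")
null = null_run(profiles, labels, "a")
share = sum(r >= observed for r in null) / len(null)
print(observed, share)  # share of shuffles that match or beat the real ratio
```

If the real ratios are not clearly larger than what the shuffled labels produce, the ranked phrases are probably noise.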

~~~
narkee
They do have an enormous sample size. Much, much larger than any rigorous
scientific studies based on getting undergrads to fill out surveys.

They've done similar analyses for things like self-reported heights, and
photos:

[http://blog.okcupid.com/index.php/the-biggest-lies-in-online...](http://blog.okcupid.com/index.php/the-biggest-lies-in-online-dating/)

<http://blog.okcupid.com/index.php/dont-be-ugly-by-accident/>

~~~
reader5000
Not really sure why this is getting upvoted and mine down, but it doesn't
matter how large their sample size is since it's restricted to people who
voluntarily sign up for an online dating website, specifically okcupid. As okc
becomes more mainstream the data is more valid for the general population. But
sample size is irrelevant here.

~~~
narkee
Fair enough. Although the data is presented in a non-rigorous, light-hearted
manner, I guess a more accurate title would be "Stuff single, web savvy white
people like"

------
ttdi
I'm really surprised by how they didn't mention one of the most striking
results from the data: Latinos on OKCupid are much more likely to have the
word "stationed" in their profile than other demographics. Based on this, it
looks like the military contains a large proportion of Latinos ("stationed in
[location]"). What are the demographics of the military versus the general
population?

~~~
oiuygtfrtghyju
According to
[http://www.armyg1.army.mil/HR/docs/demographics/FY05%20Army%...](http://www.armyg1.army.mil/HR/docs/demographics/FY05%20Army%20Profile.pdf)

Army: White 63.9%, Black 19.0%, Hispanic 10.3%, Asian 3.8%

USA: White 75%, Black 12%, Hispanic 15%, Asian 4%

So Hispanics are underrepresented in the Army and blacks are overrepresented.
Of course, you would have to adjust these for each group's age profile and
also account for US overseas territories whose residents can join the Army.
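
The comparison is just a ratio of the two shares. A quick sketch using only the percentages quoted above, with none of the age or territory adjustments applied:

```python
# Percentages as quoted in this comment (Army FY05 profile vs. US population).
army = {"White": 63.9, "Black": 19.0, "Hispanic": 10.3, "Asian": 3.8}
usa = {"White": 75.0, "Black": 12.0, "Hispanic": 15.0, "Asian": 4.0}

# Ratio > 1 means overrepresented in the Army, < 1 means underrepresented.
ratios = {group: army[group] / usa[group] for group in army}

for group, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{group:9s} {ratio:.2f}")
```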

~~~
haroldp
Maybe we looked at the wrong service, since the latino profile also mentioned
"marines":

<http://en.wikipedia.org/wiki/File:HispanicMilitary.jpg>

~~~
sesqu
According to this 2007 report on marines:
<http://www.cna.org/documents/D0016910.A1.pdf>

Young blacks became underrepresented after the Iraq wars, but high NCO ranks
are still heavily overrepresented. Conversely, marine accessions are on the
rise for young hispanics, and they are now slightly overrepresented. High NCO
ranks are underrepresented, but diminishingly so.

So this is definitely a trend, but not a major one. Those mentions may have
more to do with what contributes to the trend than be caused by it.

------
acon
Really interesting, but isn't this more about what white people want other
people to think that they like, rather than what they actually like?

It would be interesting to compare this to what they actually like, but I have
no idea how to get that data.

~~~
josefresco
Facebook maybe? And I don't mean pulling information from a profile but rather
gleaning clues from status updates and comments. I would love to see a public
vs private list of words these people would assign to themselves.

------
reader5000
Eh their analysis method is not too hot. From the comments section:

"The phrases included in the black boxes are the top 50 phrases most
statistically correlated to that group. We calculated this as follows:

1. We calculated the frequency of every 1, 2, and 3 word phrase for the whole
population.
2. We calculated those same frequencies within each race/gender pair.
3. For each phrase, we divided #2 by #1.
4. This is the propensity of a given group to use a given phrase.
5. The list you see is the phrases with the 50 highest ratios of #2/#1."

So even if a group uses a phrase 1.001x more than the population average, it
might still be listed, _if there are no actual phrase-usage differences_
(i.e., all phrase ratios will be small, and the top 50 will be arbitrary).
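
Here is how I read the five quoted steps, as a sketch and not OkCupid's actual code. Assumptions: "frequency" is the fraction of profiles containing the phrase, tokenization is a plain whitespace split, and the `min_ratio` cutoff is my own addition to show how the noise concern could be handled.

```python
from collections import Counter

def ngrams(text, max_n=3):
    """Set of all 1-, 2-, and 3-word phrases in a profile."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }

def top_phrases(profiles, group, k=50, min_ratio=1.0):
    """profiles: list of (group_label, text). Returns [(phrase, ratio), ...]."""
    overall, within = Counter(), Counter()
    n_group = 0
    for label, text in profiles:
        grams = ngrams(text)
        overall.update(grams)        # step 1: whole population
        if label == group:
            within.update(grams)     # step 2: within the group
            n_group += 1
    n = len(profiles)
    # Steps 3-4: group frequency divided by population frequency.
    ratios = {p: (within[p] / n_group) / (overall[p] / n) for p in within}
    ranked = sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
    # Step 5, plus a hypothetical floor so near-1.0 ratios are never listed.
    return [(p, r) for p, r in ranked if r >= min_ratio][:k]

sample = [
    ("a", "soul food"),
    ("a", "soul food and movies"),
    ("b", "movies"),
    ("b", "movies and pizza"),
]
print(top_phrases(sample, "a", k=5, min_ratio=1.5))
```

Without the floor, the top 50 is always populated, whether or not any of the ratios mean anything.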

~~~
byrneseyeview
They're talking about the top fifty such phrases. It seems unlikely that there
would be a demographic group that is proportionately less likely to use _any
arbitrary phrase_. The only possibility is for a group's phrase usage to be
statistically indistinguishable from the average.

Fortunately, we can perform a sanity check: read some of the phrases to
someone, and ask that person which group they think the phrases came from. I
bet people will guess with high enough accuracy to establish that it's
nonrandom.

~~~
reader5000
I'm not saying the results were random. I'm saying they're not really allowing
for a "confidence interval" in how many more times a certain group uses a
phrase than average. For example if black men use "soul food" 30x greater than
average that seems like a solid result. But if it's only 1.01x more than
average that seems like noise.

~~~
mshron
Max Shron, OkCupid Data Scientist here.

In an older version of the post, we did have the actual numbers, but they
didn't seem to add much. Black women use "soul food" 20 times (!) more
frequently than the site-wide average; for black men, "soul food" is 11 times
as frequent.

AFAIK, nothing we put up for this article is less than twice as frequent for
that group as it is for the general (OkC) population.

~~~
moultano
Just for completeness, you guys should compute g-test statistics for this so
that the statisticians see something they're used to.

<http://en.wikipedia.org/wiki/G-test>

~~~
mshron
I'll check it out. Thanks!

~~~
moultano
I think it should actually give you better results. It's monotonic in kl
divergence, and does a much better job of taking into account how common the
feature is rather than just how different it is. You no longer need to do
things like throwing out phrases that appear less than x times if you use it.
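
For anyone who wants to try it, a minimal sketch of the G statistic on a 2x2 table (group vs. everyone else, uses phrase vs. doesn't). The counts are invented; both examples have the same 20x rate ratio, and only the phrase's commonness differs.

```python
import math

def g_statistic(k_group, n_group, k_rest, n_rest):
    """G = 2 * sum(O * ln(O / E)) over a 2x2 table of profile counts.

    k_* = profiles using the phrase, n_* = total profiles in that slice.
    """
    observed = [
        [k_group, n_group - k_group],
        [k_rest, n_rest - k_rest],
    ]
    total = n_group + n_rest
    row = [n_group, n_rest]
    col = [k_group + k_rest, total - k_group - k_rest]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                e = row[i] * col[j] / total
                g += o * math.log(o / e)
    return 2 * g

rare = g_statistic(2, 1000, 1, 10000)        # 0.2% vs 0.01%
common = g_statistic(200, 1000, 100, 10000)  # 20% vs 1%
print(rare, common)  # same ratio, but the common phrase scores far higher
```

That is the property I mean: a phrase seen in a handful of profiles sinks on its own, without a hand-picked minimum count.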

------
superk
I thought it was interesting how the largest countries aren't the most
nationalistic - no Brazil, Mexico, China, Japan, etc. I also came away with
an identity crisis - #1 good food (Soul) and seeing Mos Def, Lupe Fiasco and
Talib Kweli in the top "stuff"... dammit I might be black.

~~~
Lukeas14
As a black man it was refreshing to see a "description" of my demographic that
I actually related to, as opposed to a stereotype-based one that you might
find on BET.

------
alexophile
I doubt I'm alone here, but when confronted with the "insert fucking theory" I
promptly went through the list of what white people like, inserting "fucking"
anywhere I could...

"Groundhog Fucking Day" kind of left a bad taste in my mouth.

------
colinprince
Re: "I am cool" being in the top phrases. Could be used in a sentence, like "I
am cool with that".

Just saying.

------
sjtgraham
I am white and like none of those things other than guitar and software. I
suspect the case is similar for most of the HN readership.

This should be called "what the lowest common denominator likes".

------
dasil003
It struck me as sort of funny that the minorities each had a common
self-description (cool, funny, simple), but the closest thing white guys have
is "I'm a country boy".

------
mkyc
<http://stats.grok.se/en/201009/Soul_food>

------
proemeth
Judging by the stats (The Red Sox), we have strong US East Coast bias (due to
user base).

~~~
Lukeas14
I noticed this as well and came to the same conclusion considering I've never
met a single Red Sox fan on the left coast. Although, it could be that east
coasters all rally around one team, whereas we're split between the Dodgers
and the Giants, which would present a false bias.

~~~
lanstein
You have now.

------
awongh
seventh word for asian males: software developer....

~~~
ohashi
Yeah I found that a bit strange... mechanical engineer was there too... Also,
no Japanese? Many of the other major asian countries were broadcasting.

~~~
ohashi
a software engineer #3 for indians.

------
jbooth
Yeah Sox!!

I'm taking this as scientific evidence that liking the Red Sox makes people
more attractive.

~~~
bmelton
Considering this is based on profiles from users at a dating site, wouldn't
the inverse be more likely?

~~~
steveklabnik
While normally, I'd agree with you, OKC has a high poly and hookup population,
so normal relationship scarcity constraints do not necessarily apply.

------
aristus
That last stat about reading level bothers me. "Ok, before anyone gets
offended about reading level vs race, let's show you a stat that confirms
another stereotype: religious people are stupid! And atheists are smartest of
all! Scientifically proven with a reading test based on the lengths of words,
and metrics I just made up. And don't worry that almost half of the data
points belie my analysis, ha ha ha, it confirms your prejudices, so it's ok!"

~~~
untamedmedley
If the goal of black and latino OKC members is to attract members of the
opposite sex within their race (which is most likely the case), then it would
make sense that they would communicate in their own vernacular.

That is, a black person who is otherwise educated might use a phrase that
makes perfect sense to other black people (e.g. "where dey do dat at?"; often
used to express confusion at someone's ridiculous behavior) but that isn't
clear to the mainstream. Latinos may pepper their profiles with Spanish words.

Asians might do this too if there wasn't so much ethnic/linguistic diversity
among the Asian American population. As such, they likely use "safe"
mainstream wording.

All this is to say that there are reasons that have nothing to do with
intelligence that could have caused this sort of result.

~~~
xiongchiamiov
My technical writing professor always told us to try and reduce the complexity
of our writing (I think she used Flesch-Kincaid). Perhaps more of us white
people have had similar teachers, and took their advice to heart.*

* Unlikely, but hey, I might as well bring it up, since we're talking about flaws in the analysis.
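
For reference, a rough sketch of the Flesch-Kincaid grade level. The constants are the standard published ones; the syllable counter is a crude vowel-group heuristic of my own, and I don't know which readability metric the OkCupid post actually used.

```python
import re

def count_syllables(word):
    """Crude estimate: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    """Flesch-Kincaid grade: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )

plain = "The cat sat. The dog ran."
dense = "Statistical methodologies frequently necessitate considerable deliberation."
print(fk_grade(plain), fk_grade(dense))
```

Short sentences and short words push the score down, so deliberately simple writing reads as a lower "grade" no matter how educated the writer is.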

