

Heat map of homophobic and racist tweets - washedup
http://users.humboldt.edu/mstephens/hate/hate_map.html#

======
deltaqueue
Nearly all questions answered in the "about map" link:

=========================================

The data behind this map is based on every geocoded tweet in the United States
from June 2012 - April 2013 containing one of the 'hate words'. This equated
to over 150,000 tweets and was drawn from the DOLLY project based at the
University of Kentucky. Because algorithmic sentiment analysis would
automatically classify any tweet containing 'hate words' as "negative," this
project relied upon HSU students to read the entirety of each tweet and
classify it as positive, neutral, or negative based on a predefined rubric.
Only those tweets that were identified by human readers as negative were used
in this analysis.

To produce the map all tweets containing each 'hate word' were aggregated to
the county level and normalized by the total twitter traffic in each county.
Counties were reduced to their centroids and assigned a weight derived from
this normalization process. This was used to generate a heat map that
demonstrates the variability in the frequency of hateful tweets relative to
all tweets over space. Where there is a larger proportion of negative tweets
referencing a particular 'hate word', the region appears red on the map;
where the proportion is moderate (though still above the national average),
the region appears pale blue. Areas without shading indicate places that have
a lower proportion of negative tweets relative to the national average.

The numbers that appear in the map during a mouse hover indicate the total
number of hateful tweets and number of unique users sending them in each
county.

==========================================
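The normalization step described above is simple enough to sketch. This is
only an illustration of the idea, not the project's actual code; the county
names and counts below are made up:

```python
# Hypothetical per-county counts: hateful tweets vs. all geocoded tweets.
county_hate = {"Orange, CA": 120, "Harris, TX": 95, "Scott, IA": 40}
county_total = {"Orange, CA": 510_000, "Harris, TX": 430_000, "Scott, IA": 9_000}

# National baseline: hateful tweets as a share of all geocoded tweets.
national_rate = sum(county_hate.values()) / sum(county_total.values())

# Each county's weight is its hate-tweet rate relative to the national rate;
# values above 1.0 would shade toward red, values below 1.0 stay unshaded.
weights = {
    county: (county_hate[county] / county_total[county]) / national_rate
    for county in county_hate
}
```

These weights would then be attached to the county centroids to drive the
heat-map rendering.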

EDIT: The mouse overs don't appear to work very well in Chrome or Firefox,
but from the one or two times I was able to see some numbers, it appears that
each red circle may represent a dozen or fewer tweets. Also, the hot zones
dissipate significantly the further you zoom in, so without any statistics or
numbers it's difficult to draw conclusions.

A very interesting experiment, but given that the data is only normalized by
Twitter traffic (non-response bias), this is in no way indicative of the
actual distribution of racism.

~~~
benjamincburns
> Because algorithmic sentiment analysis would automatically classify any
> tweet containing 'hate words' as "negative," this project relied upon the
> HSU students to read the entirety of each tweet and classify it as positive,
> neutral or negative based on a predefined rubric. Only those tweets that
> were identified by human readers as negative were used in this analysis.

I wonder how well a Bayesian classifier would work if this was used as a
training set. If it worked relatively well, there's no reason why you
couldn't create a live version of the map.

Something like <http://aworldoftweets.frogdesign.com/> maybe?
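The idea is straightforward to sketch with a hand-rolled multinomial naive
Bayes classifier. The training tweets below are invented placeholders
standing in for the 150k human-labeled tweets:

```python
import math
from collections import Counter, defaultdict

# Toy stand-in for the human-labeled training set.
train = [
    ("what a hateful slur filled rant", "negative"),
    ("reclaiming the word with pride today", "positive"),
    ("the word appears in this song lyric", "neutral"),
    ("another angry slur filled tweet", "negative"),
]

# Per-class word frequencies for a multinomial naive Bayes model.
class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    for word in text.split():
        word_counts[label][word] += 1
        vocab.add(word)

def classify(text):
    """Pick the class with the highest log-probability (Laplace smoothing)."""
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

On this toy data, `classify("a slur filled reply")` lands in the negative
class. A live map would just run new geocoded tweets through `classify` as
they arrive.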

~~~
bravura
Not very well. Twitter sentiment is a difficult problem.

Consider using millions of training examples (vs. thousands). This was done
as part of the "distant supervision" Twitter sentiment technique: tweets with
positive emoticons were labeled as positive sentiment, tweets with negative
emoticons as negative sentiment, and the emoticons were stripped before
training. This system got 80% accuracy.

[http://cs.wmich.edu/~tllake/fileshare/TwitterDistantSupervis...](http://cs.wmich.edu/~tllake/fileshare/TwitterDistantSupervision09.pdf)
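The distant-supervision labeling step itself is trivial; the classifier does
the hard part. A sketch of the labeling, with abbreviated emoticon lists and
the simplifying assumption that emoticons are whitespace-separated tokens:

```python
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}

def distant_label(tweet):
    """Label a tweet by its emoticons, stripping them before training.

    Returns (cleaned_text, label), or None when the tweet has no emoticon
    or carries conflicting signals, in which case it is discarded.
    """
    tokens = tweet.split()
    has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
    has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
    if has_pos == has_neg:  # neither signal, or both
        return None
    cleaned = " ".join(t for t in tokens
                       if t not in POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS)
    return cleaned, ("positive" if has_pos else "negative")
```

Stripping the emoticons matters: otherwise the classifier just learns to spot
the emoticon instead of the surrounding language.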

------
mxfh
There's a xkcd for that: <http://xkcd.com/1138/>

"Pet Peeve #208: Geographic profile maps which are basically just population
maps."

~~~
martythemaniak
None of the major population centres are well-represented in that map.

~~~
Millennium
Come again? It looks to me like the map tags almost every one of the major
population centers. Southern California seems abnormally low, but that's just
about it.

~~~
j2kun
Chicago?

------
JoeAltmaier
Would like to see per-capita numbers, or an age profile. One hot spot is
centered on eastern Iowa/western Illinois. I live there, and am utterly
surprised by this. It coincides with a cluster of college campuses; perhaps
it's careless language by college students?

~~~
benjamincburns
Total wild ass guess, but I'd bet the total number of tweets for a region is
roughly proportional to its total population. If that's true, the normalized
results you see here will be very similar to the per-capita results. It'd be
interesting to verify this, however. It would tell you "this region's twitter
users are X% [more|less] 'hateful' than the region's general population."

~~~
mtowle
There's no way Montanans tweet just as often as Californians.

~~~
benjamincburns
It'd be interesting to find out. There are a lot of very diverse populations
in California.

[Edit: I should place extra emphasis on the word "roughly," in my OP as well.]

------
rgejman
A fundamental problem with this strategy is that it may bias the results
towards prolific haters, i.e. people who are frequently hateful on Twitter.
These people may live in places where social norms permit hateful speech
online without repercussions. This strategy will miss people who are haters
but must be more subdued in their expressions of hatred online.

It would be interesting to identify people who have posted just a few hateful
messages, perhaps few enough that they can get away with it in their local
social context. This may be more sensitive to occult haters.

Something like this:

1. Identify individual twitter users as being hateful in a particular
category. For instance, user A uses the word "chink" 3x and "fag" 5x in 100
tweets, so he gets added to the "chink" and "fag" categories. Play with these
thresholds to see what makes sense.

2. Divide the # of hateful users in each category by the total # of users in
that location. Allocation of users to location can be done proportional to
the # of tweets they make from each location.
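A rough sketch of those two steps for a single category (all users, counties,
tweets, and the threshold below are invented for illustration):

```python
from collections import Counter, defaultdict

# Hypothetical stream of (user, county, tweet_text) records.
tweets = [
    ("userA", "County1", "some slur here"),
    ("userA", "County1", "ordinary tweet"),
    ("userB", "County1", "ordinary tweet"),
    ("userC", "County2", "some slur here"),
]

HATE_WORDS = {"slur"}
THRESHOLD = 1  # tune: minimum hateful tweets before a user counts as hateful

# Step 1: flag users whose hateful-tweet count meets the threshold.
hate_counts = Counter()
user_counties = defaultdict(Counter)
for user, county, text in tweets:
    user_counties[user][county] += 1
    if HATE_WORDS & set(text.split()):
        hate_counts[user] += 1
hateful_users = {u for u, n in hate_counts.items() if n >= THRESHOLD}

# Step 2: hateful users / total users per county, allocating each user to
# counties in proportion to how often they tweet from each one.
county_users = defaultdict(float)
county_hateful = defaultdict(float)
for user, counties in user_counties.items():
    total = sum(counties.values())
    for county, n in counties.items():
        share = n / total
        county_users[county] += share
        if user in hateful_users:
            county_hateful[county] += share

rates = {c: county_hateful[c] / county_users[c] for c in county_users}
```

Running one such pass per category (with per-category word lists and
thresholds) would give the per-category rates described above.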

Cool project :).

------
showerst
I don't see a scale anywhere; am I just missing it?

It makes a big difference if red is 4-5 tweets (over what time period?) or
4-5 thousand.

~~~
benjamincburns
The data is normalized by the number of tweets for a particular region
(county, I believe). So what you're looking at is the proportion of each
region's total tweets that are racist/homophobic/disability-biased.

Put differently: if county A has 100 tweets over the time period and 1 of
them is racist, it will appear identical to county B, which has 1000 tweets
of which 10 are racist over the same period.

------
dkarl
You can't zoom in very far without the heat map degenerating into points for
individual counties. I suspect in many areas they represent individual tweets
or tweeters, but I'm not seeing any hover-over data, so I'm not sure. I'm
surprised -- 150k tweets sounds like a lot, but there isn't enough data for a
heat map effect unless you're zoomed out to the national level.

~~~
astine
There are 3000+ counties and county equivalents in the US, so that makes about
50 tweets per county. That could easily be output by one or two people per
county. Seeing as a lot of southern cities seem clean but with one or two
really bright points nearby, I suspect that your thesis is correct and that
most of these points are caused by particularly vociferous individual racists.

------
0xdeadbeefbabe
It's funny (or annoying) how charts invite criticism and feature creep.

Thanks for an interesting chart.

What did you do to cut down on lying human classifiers? Did you give them an
incentive? Did you have them vote as a group on the classification of a
sentiment?

Whites aren't allowed to say the N word, and Anita can say the S word
[http://www.youtube.com/watch?v=Qy6wo2wpT2k&t=0m45s](http://www.youtube.com/watch?v=Qy6wo2wpT2k&t=0m45s).
But maybe your human classifiers can handle this problem too? I hope they got
a good grade.

------
downandout
I have always disliked the term "homophobic". "Phobic" means that someone has
a fear of something. Given that the people using these words in tweets are
doing so in a very public fashion, they are not very likely to be afraid of
gay people. They are perhaps hateful and/or prejudiced, but phobic is just an
entirely inaccurate term in this and most other instances where that term is
used.

~~~
lotsofcows
Bullies are only brave when targeting a minority. They're scared of things
they don't understand but will only make a noise about it if they feel in a
strong position.

A better complaint about the word homophobic is that its meaning is dependent
on one understanding "homo" to mean homosexual. Otherwise it means one who
fears sameness. Most homophobes actually, obviously, fear difference.

~~~
downandout
But again, that makes a tremendous number of assumptions as to why the person
is saying those words. I get that it is a term meant to degrade and shame
those who use these terms, but I just don't think it's an accurate description
in the majority of cases.

~~~
glomph
Words don't have to mean the same as their etymology though. This one has
clearly moved away from that.

------
batbomb
In Idaho, New Mexico, and Wyoming, the counties that look really hot are very,
very low population counties. For example, I'm assuming the giant red spot in
New Mexico is Sierra County, population 11,988. The southernmost county in
Idaho is Oneida County, population 2,900 (my high school was nearly that big).

------
wtvanhest
Another reason it may seem off to the viewer (e.g. Mississippi appearing less
racist than other areas) may have to do with the populations in particular
areas.

If 50% of a state's population belongs to the racial group the sampled racist
tweets target, you should see fewer racist tweets by %.

------
jiggy2011
Some of these words also have other meanings. For example, "fag" is English
slang for cigarette, and "chink" obviously has other meanings. I also know
gay people who prefer the term 'queer' to describe themselves. I guess you
could also get a lot of "nigger" from retweeted hip hop lyrics.

~~~
fuzzybassoon
From the 'Details about this map' popup:

Because algorithmic sentiment analysis would automatically classify any tweet
containing 'hate words' as "negative," this project relied upon HSU students
to read the entirety of each tweet and classify it as positive, neutral, or
negative based on a predefined rubric. Only those tweets that were identified
by human readers as negative were used in this analysis.

------
benlower
Interesting idea, but this map needs work: you have to zoom in to see what's
really happening. When zoomed out, it looks like practically the entire USofA
is full of hate.

------
dbg31415
Should also include, at minimum, the percentage of Tweets that were racist /
sexist / whatever. We need to know the total volume of tweets before we can
tell if it's higher or lower than just "average" racism.

~~~
KC8ZKF
"To produce the map all tweets containing each 'hate word' were aggregated to
the county level and _normalized by the total twitter traffic in each
county_."

Emphasis mine.

------
youngerdryas
Holy map fails batman. Zoom out and feel the hate.

