
Open-source Xenophobia Classifier for Tweets - starostaabe
https://medium.com/sculpt/xenophobic-tweets-78a9b316635
======
bifrost
This is pretty cool!

I'd be curious to see if this could be used against Facebook groups as well.
I'm in a bunch of "locals" and "special interest" groups and they're
constantly full of "send the techies home" or "we don't want outsiders here"
or "we don't want $X_VARIETY_PERSON in our $GROUP because $TERRIBLE_REASONS".

~~~
starostaabe
Yeah, for sure we could build a model for that. We need to protect the techies
:p Facebook data is hard to get, so it would be hard for anyone to use that
model even if we built it. Do you see that on other social media sites, for
example Twitter?

~~~
bifrost
I see much less serious Xenophobia on Twitter, it's not really a community.

On community-oriented sites I see lots of issues with race/national origin,
cultural practices, religious intolerance. Complaints about gentrification are
also cryptoxenophobia and there's a lot of nuance there.

~~~
starostaabe
That's really interesting. If you know how to find a dataset from those
communities, that could be a great model to build in the next few weeks.

~~~
bifrost
It looks like you can use the "Facebook Explorer" API to pull out some of the
text contents of groups you're a member of. I could send you some names of
groups to join where there's definite, outright xenophobia. I've also noticed
that there is a correlation with attempts to "revise" history, which seems
weird at first, but then you realize it's being used to silence a group of
people.

~~~
starostaabe
Awesome, yes please share the names of those groups and we can look into it.

~~~
bifrost
This is my current favorite:

[https://www.facebook.com/groups/SFCurrentEvents/](https://www.facebook.com/groups/SFCurrentEvents/)

There's a lot of good people in this group, but there's also a lot of insane
xenophobia in it.

------
starostaabe
We just open-sourced an NLP model to detect Xenophobia on Twitter. You can
just copy-paste the code in this blog to try it out!

~~~
bifrost
It's a good start, but this is maybe 5-10% of Xenophobia. Border
protectionism is not good, but there's a huge amount of systemized xenophobia
that's unrelated to borders.

~~~
starostaabe
Thank you, and yes, we're definitely missing many different types of
xenophobia; it's a complex issue. Border protectionism seems to be on the rise
now given the political situation, and it's probably the most common type
today. So hopefully it'll be interesting to use.

~~~
bifrost
One thing I hadn't considered - with Twitter's limited post size, you can do
better statistical analysis of word usage. If someone mentions race or racial
characteristics an unusual number of times - you can probably count the tweet
as being racist or "racialist".
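
A rough sketch of that word-frequency idea in Python, assuming a hypothetical
term list and threshold (neither comes from the model in the blog post):

```python
import re

# Hypothetical watch list, for illustration only; a real list would need
# careful curation and is not taken from the linked blog post.
IDENTITY_TERMS = {
    "race", "racial", "ethnicity", "ethnic", "nationality",
    "foreigner", "foreigners", "immigrant", "immigrants",
}


def identity_term_rate(text, terms=IDENTITY_TERMS):
    """Return the fraction of word tokens in `text` that are in `terms`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in terms)
    return hits / len(tokens)


def flag_for_review(text, threshold=0.15):
    """Mark a tweet for a closer look when the rate is unusually high.

    The 0.15 threshold is an arbitrary assumption, not a tuned value.
    """
    return identity_term_rate(text) >= threshold
```

A high rate only marks a tweet for a closer look; it says nothing about
whether the mentions are actually relevant to the topic.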

There was a recent HN thread that I was commenting on where another commenter
pointed out that the author of the article under discussion mentioned race and
national origin a certain number of times. While it was relevant to that
story, I could also see where it might not have been for another type of
article. You'd basically need to apply some NLP and some sort of geopolitical
filter...

~~~
starostaabe
Yes, I followed until the part about how the geopolitical filter would help. I
might not be understanding well since I don't know what the story was about.
Could you please elaborate on that?

~~~
bifrost
It was an American article discussing how an Iranian sourced/proxied attack
had XYZ effects.

The author mentioned the attack was traced/lost in Iran. Talking about Iran
was called a cultural attack; however, since this had more to do with Iran's
lack of cooperation with American law enforcement, it was actually somewhat
relevant.

If the article was talking about Iranians as being bad people simply for being
Iranian, that would have been a cultural attack.

------
emh26
Love the model, thanks! Will let you know what I build with it

