
Gender and verbs across 100K stories: a tidy analysis - var_explained
http://varianceexplained.org/r/tidytext-gender-plots/
======
tomp
> I think this paints a somewhat dark picture of gender roles within typical
> story plots. Women are more likely to be in the role of victims- “she
> screams”, “she cries”, or “she pleads.” Men tend to be the aggressor: “he
> kidnaps” or “he beats”. Not all male-oriented terms are negative- many, like
> “he saves”/“he rescues” are distinctly positive- but almost all are active
> rather than receptive.

Good mathematical analysis, but terrible semantic interpretation.

There are only 3 really negative verbs for men ("murdered", "kills",
"kidnaps"), and 3 distinctly positive ("saves", "defeats", "rescues"). "Beat"
is ambiguous; personally, I would more likely interpret it as "defeats" rather
than "hit"/"punch". This is in no way a "dark" picture, and the only relevant
conclusion is the one he briefly mentions in passing: women are described as
more passive, men as more active.

~~~
belorn
The term aggressor is misplaced. The data shows how men as a general rule
always play the antagonist and the disposable henchmen. The mustache-twirling,
eye-rolling, leering, cackling, and hand-rubbing villain is by definition
male, and on the rare occasion that a plot switches the gender of the
antagonist, it also makes other changes, like poison and stabbing instead of
guns and punching.

~~~
chongli
_mustache-twirling, eye-rolling, leering, cackling, and hand-rubbing villain
is by definition male_

Apart from the mustache (or maybe not) this is an apt description of the
wicked witch trope of female villains. So I'm going to say, no?

~~~
belorn
The wicked witch trope is named after the wicked witch of the west character,
who doesn't do much leering or hand-rubbing. There is a lot of cackling,
possibly some eye-rolling, but that's about it. Another famous witch villain
is the white witch (Narnia), who has none of those attributes.

But let's look at a famous villain that has been portrayed in both female and
male roles and see how they differ. Moriarty is male in the TV show Sherlock
([https://www.youtube.com/watch?v=YN7DYPJLXkc](https://www.youtube.com/watch?v=YN7DYPJLXkc)),
and female in Elementary
([https://www.youtube.com/watch?v=lY7i8cQontg](https://www.youtube.com/watch?v=lY7i8cQontg)).
Which character is shown more as the typical villain, and which is portrayed as
more complex, with redeeming values? If they kept the script identical, with the
exact same lines, what would happen if the genders were reversed for the two
shows?

~~~
chongli
_The wicked witch trope is named after the wicked witch of the west character_

Yes, but the actual details of the trope go back much farther in time. The
Fates of Norse and Greek mythology, the Three Witches in Macbeth, Baba Yaga
from Slavic folklore. There is a long, long tradition of female characters
with supernatural power, unsettling appearance and behaviour, and a tendency
toward transgressing taboos such as cannibalism.

As for your Moriarty example, you're butting up against the TV industry's
insistence on casting beautiful actresses in almost every role. This is a
problem! If you replaced Natalie Dormer with a woman in her late 70s who's
never had plastic surgery, you'd have a different comparison to think about.

~~~
belorn
The villains as plotted in current shows share roots in the older mystic
stories, but they also mirror current cultural values rather than past cultural
values.

If, for example, we take Hansel and Gretel, which is full of old symbolism
about past cultural views of older women, it would be completely impossible to
gender swap the antagonist. Interpreting a male antagonist in that story in
current culture would invoke feelings of pedophilia and not cannibalism,
changing the story in a very radical way.

With current cultural values, stories generally cast female antagonists with a
strong theme of betrayal. Even in the few cases of female henchmen like that
one specific James Bond movie with female henchmen, we still see that arc.
Male henchmen in other movies don't get such arcs, and are generally simply
thrown in as obstacles to be disposed of. Female henchmen are also generally
not killed, which further limits the possibilities.

Last, the Sherlock version would not work with a woman in her late 70s if they
kept all other aspects of the character intact. Listen to the clip and try to
imagine an old woman saying it. The character explicitly says they want to be
seen as the classic villain, and as such it does not work unless it's male. Even
the wicked witch does not go around saying to the protagonist, _"I am the
villain and you should hate me"_.

------
hasenj
People who have a poor understanding of human nature will consider this
evidence of sexism.

Think for a minute about why "Fifty shades of grey" is popular among female
readers.

If many stories are filled with males competing over resources and
women (in the form of violent criminals kidnapping women and brave heroes
rescuing them) then might it be that these kinds of stories are what people
are looking for?

Try to imagine a story about a weak man who cannot defend himself against a
gang of three women who kidnap him, only to get saved by his brave girlfriend,
who upon rescuing him promises to stay by his side forever.

Just try it. Does it sound like an interesting story?

~~~
paulddraper
Unfortunately, every single difference of any significance is often taken to be
sexist and (of course) wrongfully sexist.

Why are the top 100 chess players men? Sexism. Why are there more male than
female programmers? Sexism. In non-romance movies, why are there more male
leads? Sexism.

There is legitimate wrongful sexism, but it isn't usually found in first
world countries.

It's as if, were we ever to agree there was a significant innate difference
between the sexes (or races for that matter), the world would come crashing
down and put us back in the Stone Age.

~~~
zepto
Right - there's no wrongful sexism in first world countries and there is no
difference in the messages males and females are given during their upbringing
about how they should behave in society.

Also, family courts make perfectly fair decisions.

/s

~~~
paulddraper
> Right - there's no wrongful sexism in first world countries

I don't think you read my comment, because it disagrees with that statement.

~~~
zepto
You said:

"There is legitimate wrongful sexism, but it isn't usually found in first
world countries."

------
openasocket
The reference at the end talking about comparing changes over time reminded me
of a problem I've been kicking around. For this particular system that's not
too hard, since you're putting each word on a one-dimensional
femininity/masculinity scale, so you could plot a word on a line graph or
something. But what do you do if you want to evaluate the changes of more
complex relationships over time? Not just mapping words to a constant vector
space, but modeling the relationships between words, such as clusters or
word2vec representations. With something like word2vec you can take a bunch of
words and project the vector space onto a plane so you can see the relative
distances, but how do you express changes over time? You could show a bunch of
planar projections for different instants in time, but it's hard to look at
that and capture the changes.

So how do you visualize changes to these more complex interactions between
data points, and also how do you mathematically quantify some of these
changes? I'd really appreciate any advice on this. And sorry that this is kind
of off topic for the article :)

~~~
swf
I did something like this in a project for my former work at a research group,
and it was very difficult to visualize due to the number of variables we
needed to communicate in the visualization.

What we were trying to do was use word2vec to model changes in relationships
between 700 proteins and one in particular, which is related to cancer/tumor
growth. We created multiple word2vec models based on year windows of medical
journals (so the 2003 model had input journals from 2000 - 2003).

To visualize the models we used a D3 force graph where the nodes were the 700
proteins and the edges were known discoveries of relationships between
proteins that had years associated with them - as in X protein was discovered
to be related to Y protein in 2007. The relationship data was curated by
people, independently of the word2vec models. The size of each node was
determined by that year's word2vec model: specifically, its similarity score
between that protein and the cancer-related protein we were interested in.

To see the changes between years, we used a year slider which the force graph
would respond to by animating the sizes of the nodes in line with the
particular year model's similarity score for each protein. In addition, the
color of the nodes represented changes in the models' similarities between the
proteins - more green meant that the word2vec model's similarity score had
increased compared to the previous year's model, and red meant it had
decreased.

The visualization is useful, but it is a bit of a mess considering there are
700 nodes. I'll message you the link, and if anyone else wants the link I can
send it, but I'd rather not post it here since it's hosted on my college's CS
department server and it's not equipped to deal with a lot of traffic.

Also if anyone has an idea of how else we might have done it I'd be interested
to hear it.

Edit: Didn't realize HN doesn't have a PM feature - if there's another way to
send it to you and you're interested let me know.

~~~
openasocket
Thanks, that's helpful! And you can email me at <my HN username> at gmail.com.

I'm also curious if there are any ways to quantify, mathematically, the
changes over time. There's the simple sum of the squares of the changes in
distance to get a sense of the "kinetic energy" of the system, but I'm
wondering if there are some more clever analyses, especially something that
can quantify localized changes versus global changes.
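
As a rough sketch of that "kinetic energy" idea, in Python: compare pairwise distances at two time steps rather than raw coordinates, and keep the per-word totals so a localized change stands out from a global one. The function names and the toy input format (lists of coordinate tuples, same word order at both times) are assumptions for illustration only.

```python
import math

def pairwise_distances(points):
    """Euclidean distance between every pair of points (coordinate tuples)."""
    return [[math.dist(p, q) for q in points] for p in points]

def change_energy(before, after):
    """Return (total, per_point) squared change in pairwise distances.

    `before` and `after` are hypothetical embeddings of the same words at two
    time steps, listed in the same order. Only distances are compared, so the
    two embeddings do not need to share an orientation.
    """
    d0 = pairwise_distances(before)
    d1 = pairwise_distances(after)
    n = len(before)
    per_point = [
        sum((d1[i][j] - d0[i][j]) ** 2 for j in range(n)) for i in range(n)
    ]
    total = sum(per_point) / 2  # every pair was counted from both ends
    return total, per_point

# Toy data: only the third point moves between the two time steps.
total, per_point = change_energy(
    [(0, 0), (1, 0), (0, 1)],
    [(0, 0), (1, 0), (0, 3)],
)
print(total, per_point)
```

A word whose `per_point` value dwarfs the rest has moved relative to everything else, while similar values across all words point to a more global shift.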

Edit: so are you running a separate word2vec thing for each year's dataset? If
so, how do you map between them, because the orientation the word2vec mapping
generates will be random, and I worry that trying to rotate the mappings to
some common axis could obscure some of the data.

~~~
swf
Sent! Yeah, we made a model for each year's dataset. In our case, we were only
interested in the similarity between our target protein and the others, so we
used the model's similarity measure between those in order to avoid problems
with varying orientations between models.
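
Why a within-model similarity sidesteps the orientation problem can be sketched in a few lines of Python: the cosine similarity between two vectors of the same model is unchanged if the whole space is rotated. The vectors below are made up, and a pure rotation is an idealization; separately trained models differ in other ways as well.

```python
import math

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def rotate(v, theta):
    """Rotate a 2-D vector by `theta` radians."""
    x, y = v
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

# Hypothetical vectors for the target protein and one other protein.
target, other = (1.0, 2.0), (3.0, 0.5)

# Pretend a second model learned the same geometry in a different orientation.
target_b, other_b = rotate(target, 1.0), rotate(other, 1.0)

print(cosine(target, other), cosine(target_b, other_b))
```

The coordinates in the two "models" are different, but the two printed similarities agree up to floating-point error, so they can be compared across years without aligning the spaces first.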

------
zakk
Gender is strongly correlated with biological sex. Biological sex is, in
turn, strongly correlated with physical strength, aggressiveness and other
traits.

I enjoyed the technical description, but the results didn't exactly shock me!

------
wyldfire
> what verbs are used after “he” and “she”, and therefore what roles male and
> female characters tend to have within stories.

It might've been better had the author (and the Jane Austen article's author)
used some NLP processing to see whether the pronoun was actually the subject
of those verbs. But I'll grant that it's usually the case.

Also interesting: gender and the object of those verbs.

EDIT: after some research (that I should've done before posting), it's a
remarkably effective technique and it seems only the most contorted sentences
might get tripped up. English is nearly always Subject-Verb-Object.

~~~
dahart
Do you need NLP for that?

The forms he, she and they are used when a pronoun is the subject of a
sentence. The forms him, her and them are used when a pronoun is the object of
a sentence.

[https://www.englishgrammar.org/words-heshe-himher-hishers/](https://www.englishgrammar.org/words-heshe-himher-hishers/)

I would think looking at the verb that comes after he/she also doubles as a
subject/object filter.
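
A minimal Python sketch of that filter, for illustration (this is not the article's R code, and the function name and sample text are made up): match only the subject forms "he"/"she" followed by a word, so the object forms "him"/"her" are never counted.

```python
import re
from collections import Counter

# Subject pronoun, whitespace, then the next word. "her" and "him" cannot
# match because the pronoun must be followed directly by whitespace.
PATTERN = re.compile(r"\b(he|she)\s+([a-z]+)", re.IGNORECASE)

def pronoun_bigrams(text):
    """Count (pronoun, next word) pairs for "he" and "she"."""
    counts = Counter()
    for pronoun, word in PATTERN.findall(text):
        counts[(pronoun.lower(), word.lower())] += 1
    return counts

sample = "She screams. He kidnaps her, but she pleads with him. Then he saves them."
for pair, n in pronoun_bigrams(sample).items():
    print(pair, n)
```

Note that nothing here checks that the second word is actually a verb, so "he then escapes" would be counted as ("he", "then").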

~~~
coroxout
I agree that the nominative case endings and usual SVO order of English should
prevent (nearly all) false positives.

(The only one I can think of is that comparatives are meant to take the
nominative case, e.g. pedantically it's supposed to be "he" rather than "him"
in "A man smarter than he would decline". However, this rather convoluted
sentence structure is rare, even more unlikely in a plot summary rather than
the full novel text, and almost always followed by the subjunctive rather than
an indicative verb.)

My bigger concern was what doesn't get caught because it appears after a noun
rather than a pronoun, or has an adverb in the way. I thought major dramatic
plot points might be more likely to use the character's name rather than a
pronoun, and so we might see fewer words like "murders", "defeats", etc. From
the results it seems like those words are still present in large quantities,
so perhaps I'm wrong.

I'd like to see it done with "they" too, partly as a control case and partly
to see if any verbs are more common for individuals rather than groups
(although the rise of impersonal "they" may hinder that aspect of the
analysis).

~~~
wyldfire
> My bigger concern was what doesn't get caught because it appears after a
> noun rather than a pronoun,

Yeah, some clever gender/name lookup would be a good idea; many subjects in
plots are the characters' names. Maybe even more subject names than pronouns.

That said, it seems unlikely that there's a pronoun-substitution gender bias,
so it would probably just yield more samples of the same trend.

~~~
dahart
What doesn't get caught is a super interesting question. (So are object
genders.) How often "he" occurs vs "she" is important, as is how long he is
talked about vs how long she is talked about. I'm not certain about screenplay
conventions, but usually the first sentence would name the character and some
number of subsequent sentences might use "she" rather than repeat her name.

So yeah, there's plenty more room for making this analysis scientific, and no
reason to assume it's unbiased.

------
empath75
It's interesting how thoroughly ingrained sexist concepts are in the language.
Even a verb that's fairly active like 'resist' assumes a power relation in
which the one resisting is in a worse position.

I'd like to see this done by country or year or language or genre.

~~~
artursapek
Is it sexist if it's reflective of reality? Men commit more aggressive crimes
than women by far.

~~~
shouldbworking
Shhhhhh! There's no room for reality in here, we're pretending that gender is a
made up sociological concept

~~~
ue_
To my knowledge, gender is thought of as a complex of at least two aspects:
those differences between what is thought of as masculine and feminine, and
the roles assigned to genders. In these cases, gender pertains to the ideas of
gender identity and further gender expression. In other cases, gender is also
thought of as including or being inextricably related to biological sex.

Gender isn't "made up" in the same way that words aren't "made up"; it is an
aspect of society, but unlike words, gender bears some relation to
observable physical biological expression, usually termed 'sex'. But the
concept of gender as I have seen it used certainly depends on society,
rather than physical sex.

Variations within sex (such as intersex) are another question and are unrelated
to gender, as far as I know.

~~~
tomp
The problem with "gender" as a concept is that it hijacked existing words
("man" and "woman") that were previously used for "sex". This can be clearly
seen by trying to define these words _without_ reference to sex - you can't
except by defining them recursively ("men are people who identify as men"), in
which case, I think a better choice would be to invent new words for this new
concept.

~~~
ue_
Sex is to male and female as gender is to masculine and feminine.

------
stefanwlb
The author failed to mention the context of the data chosen; specifically the
dates. For example, the choice of "rob" vs. "steal" would change depending on
the century the work was created.

~~~
swf
The author mentions in the second paragraph that the data comes from scraping
Wikipedia articles' plot descriptions. So the plots might be old, but the
descriptions (and language) were all written recently.

~~~
SerLava
Hm. Wouldn't that have little bearing on the result? Can't really say "he
poisons" when that wasn't the plot of the story.

~~~
lmkg
But that's the point. "She poisons" is more likely to be the plot of the story
than "he poisons."

~~~
davrosthedalek
There could be a bias in the summary writers too, in that they prefer "he
murders" and "she poisons" for the same method of killing.

------
_nalply
Direct link to a scatter plot (x axis: quantity, y axis: gender):

[http://varianceexplained.org/figs/2017-04-27-tidytext-gender...](http://varianceexplained.org/figs/2017-04-27-tidytext-gender-plots/total_log_ratio_scatter-1.png)

------
unignorant
We did a similar analysis of gender stereotypes in the Wattpad online writing
community last year:
[https://www.aaai.org/ocs/index.php/ICWSM/ICWSM16/paper/view/...](https://www.aaai.org/ocs/index.php/ICWSM/ICWSM16/paper/view/13112)

The conformity with existing stereotypes was pretty depressing, and they're
perpetuated more or less equally by male and female authors.

------
coldtea
> _I think this paints a somewhat dark picture of gender roles within typical
> story plots. Women are more likely to be in the role of victims- “she
> screams”, “she cries”, or “she pleads.” Men tend to be the aggressor: “he
> kidnaps” or “he beats”._

How's that different from actual real life beatings, murders and kidnappings?

Aren't men usually the aggressors?

~~~
kaybe
> _It’s interesting to compare this to an analysis from the Washington Post of
> real murders in America. Based on this text analysis, a fictional murderer
> is about 2.5X as likely to be male than female, but in America (and likely
> elsewhere) murderers are about 9X more likely to be male than female. This
> means female murderers may be overrepresented in fiction relative to
> reality._
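
To put those two ratios on the same footing, here is a small Python sketch that turns a male-to-female ratio into a female share. It assumes "N times as likely" means N male murderers per female one, which is one reading of the quote rather than something the article spells out; the function name is made up.

```python
def female_share(male_to_female_ratio):
    """Fraction of murderers who are female, given N male murderers per female one."""
    return 1 / (1 + male_to_female_ratio)

fiction = female_share(2.5)  # ratio quoted for plot summaries
reality = female_share(9)    # ratio quoted for real murders in America

print(f"fiction: {fiction:.1%}, reality: {reality:.1%}")
```

Under that reading, women are roughly 29% of fictional murderers against 10% of real ones.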

------
darkerside
I'd wager that much of this could be explained by the "masculine" verbs being
used in relation to a protagonist more often, whereas the "feminine" verbs
describe reactions to the actions of a protagonist.

------
merricksb
Related post from a week or two ago:

[https://news.ycombinator.com/item?id=14156079](https://news.ycombinator.com/item?id=14156079)

------
hanoz
If you're finding some dissonance between 100K stories and your own
interpretation of reality, you've really got to ask yourself where the problem
might lie.

~~~
empath75
Because I should adjust my perception of reality to match fiction?

~~~
hasenj
The kind of fiction that gets popular reflects _something_ about reality.

~~~
tptacek
Probably a lot of what's been most popular is stuff that very few people on HN
would appreciate.

------
jameskegel
HN : Social Issues :: general conversation : a vegan

~~~
Neliquat
That cuts so many ways... But more than ever, I see a great diversity of
social outlooks on HN. Might be recent politics, but the result is the same. I
love the sub arguments on lab meat, tech ethics, gender equality and
representation, and vi vs emacs. Even when I do not agree, I find coherent and
consistent alternate points of view useful in seeing the world.

