
The Erdős paradox: When a mathematical number and Wikipedia collide - skybrian
https://blog.wikimedia.org/2018/02/27/erdos-paradox/
======
contravariant
Just playing devil's advocate here, but with the numbers given in the essay
(10 out of 250 male entries tagged for deletion vs 2 out of 12 female entries)
I got a p-value of approximately 10% [1]. By most criteria this wouldn't
qualify as statistically significant. This doesn't immediately disqualify the
entire essay, but the claim that women are 'over four times more likely to be
deleted' is somewhat dubious.

[1]:
[http://www.wolframalpha.com/input/?i=1+-+(1+-+12%2F262)%5E12...](http://www.wolframalpha.com/input/?i=1+-+\(1+-+12%2F262\)%5E12+-12+\(12%2F262\)\(1+-+12%2F262\)%5E11)
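
The figure in [1] can be reproduced without WolframAlpha. The following is a minimal Python sketch, assuming the same model as the linked expression: each of the 12 female entries is tagged independently at the pooled rate of 12/262, and the p-value is the chance of seeing 2 or more tags. The function names are hypothetical, and the hypergeometric variant (sampling without replacement) is an added cross-check under a different assumption, not something taken from the essay.

```python
from math import comb

# Counts quoted in the comment above (taken from the essay).
MALE_TOTAL, MALE_TAGGED = 250, 10
FEMALE_TOTAL, FEMALE_TAGGED = 12, 2


def binomial_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def hypergeom_tail(population, successes, draws, k_min):
    """P(X >= k_min) when drawing `draws` items without replacement
    from a population containing `successes` marked items."""
    k_max = min(successes, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(k_min, k_max + 1)
    )
    return favourable / comb(population, draws)


total = MALE_TOTAL + FEMALE_TOTAL      # 262 entries in the cohort
tagged = MALE_TAGGED + FEMALE_TAGGED   # 12 entries tagged overall
pooled_rate = tagged / total           # rate assumed under the null hypothesis

# Ratio of the observed tagging rates (women vs. men).
rate_ratio = (FEMALE_TAGGED / FEMALE_TOTAL) / (MALE_TAGGED / MALE_TOTAL)

# Same quantity as the WolframAlpha expression in [1].
p_binomial = binomial_tail(FEMALE_TOTAL, pooled_rate, FEMALE_TAGGED)

# Cross-check that conditions on exactly 12 tagged pages in total.
p_hypergeom = hypergeom_tail(total, tagged, FEMALE_TOTAL, FEMALE_TAGGED)

print(f"rate ratio:          {rate_ratio:.2f}")
print(f"binomial tail:       {p_binomial:.3f}")
print(f"hypergeometric tail: {p_hypergeom:.3f}")
```

The binomial tail should land near the roughly 10% quoted above. The hypergeometric tail is essentially a one-sided Fisher exact test on the same 2x2 table, so it is a reasonable sanity check on whether the conclusion depends on the choice of model.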

~~~
stareatgoats
There are many things that are not statistically significant, but still
interesting when it seems likely* that a larger sample would give a similar
result.

*Only 17.47% (as of 26 February 2018) of the English Wikipedia's biographies
are about women:
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Women_in...](https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Women_in_Red)

EDIT: I rechecked the quote from the article about the likelihood of deletion
and found it to be " _in this cohort_ of Erdős Ones, the pages for women were
over four times more likely than men’s to be nominated for speedy deletion."
(my italics). FYI.

~~~
dahdum
That % isn't really surprising considering the patriarchal history of the
world. You couldn't possibly get to 50% unless you over-represented women in
the near present or adjusted notability standards.

However if it were 17.47% for current times, that would be a significant bias
that needs to be addressed. It still might not hit 50/50 given reduced
opportunities for women still today, but it should be moving ever closer.

~~~
shele
The fraction of women with Erdős number one is already very small; there is no
reason to assume that they are less important than the corresponding men with
the same Erdős number.

------
matt4077
This is the essay the interview references:
[https://medium.com/q-e-d/whos-important-a-tale-from-wikipedi...](https://medium.com/q-e-d/whos-important-a-tale-from-wikipedia-a370dc6ef078)

------
thriftwy
> “were over four times more likely to have their importance questioned” and
> their pages marked for speedy deletion

Deletionists of Wikipedia are just destined to fall on their face, and
embarrass the whole project.

Deletionism is the single big thing that makes Wikipedia somewhat sour in my
view.

~~~
thriftwy
Here, I have a Venn diagram to illustrate a point:

[https://www.meta-chart.com/share/how-wikipedia-deletionism-w...](https://www.meta-chart.com/share/how-wikipedia-deletionism-works)

Two important takeaways are: A lot of stuff on Wikipedia just isn't
encyclopedic. For example, there's a lot of sports trivia about obscure teams
and local leagues. Such data deserves structured storage and not Wikipedia,
and I just don't feel "2013 Richmond Raiders season" is more important than
any single mathematician who worked with Erdős.

Second takeaway, a whole world of topics is just not ever noted by Wikipedia.
For example, you can have whole branches of local music scenes ineligible for
note because every single artist fails to "be mentioned in a paper made of
paper" (in the 2010s, seriously) or "sell 3k discs made of plastic and foil"
(again, this is in the 2010s).

If there's some kind of consistent policy I definitely don't see the value of
it.

~~~
ZeroGravitas
The justification for the policy I have heard is that people should be able
to easily verify the information.

If it's in the New York Times you can probably believe that it's not been
created to game Wikipedia; the same can't be said for a random blog.

I regularly see people complain about this without mentioning how they would
solve this problem.

~~~
thriftwy
Most papers made of paper are absolutely not available for verification.
Ditto for research articles behind a paywall or just cited as paper sources.
You could invent dozens of such credible sources with no chance for anybody to
ever check.

I can understand "we could not verify this so we are deleting this article",
but not "we speedy deleted this article without checking anything". The latter
has no excuses except for obvious commercial spam.

~~~
ghaff
The correlation between notability and verifiability certainly starts to break
down quickly once you get past easily accessible, well-known, and well-
regarded sources. No one is going to chase down paper copies of obscure
magazines, newspapers, or out-of-print books to verify some statement,
especially if it's not highly questionable for some reason. The system does
kinda work because most people aren't outright inventing fake references to
game the system.

~~~
thriftwy
What would be your criteria for the system failing to work?

What can go visibly wrong if Wikipedia covered a different subset of obscure
topics than it does currently?

We're not talking about faking facts in well-visited articles, since a) that's
not what is being discussed, and b) that has been done anyway, getting into
news multiple times.

~~~
posterboy
Why does HN have a downvote button (for prolific users with karma>500,
anyway)? (And why doesn't Facebook?)

~~~
syrrim
For sorting comments on usefulness. Of 10 toplevel replies to a post, which
should be displayed first? The highest-voted/most-recent one. Also spam.

~~~
posterboy
Spam is exactly the problem?

------
tomtimtall
This is not good for the women in science campaign.

You are not notable just because you had at minimum one collaboration with a
famously notable person. She seems to be dissatisfied that she can’t just use
the Erdos number as an argument alone to add thin bios for every female
collaborator he had. And then goes on to complain that there perhaps exist
equally unimportant women who would not fit into this category “just because
they didn’t collaborate with Erdos” but are just as non-notable as the
non-notable collaborators...

Stop judging notability by Erdos number. If that is your singular claim to
fame, then the correct action is speedy deletion, independent of sex.

------
astrodev
The article has it backward. As a staunch deletionist, I can attest that
articles about non-notable women are much harder to get deleted than articles
about non-notable men.

~~~
olliej
Cool, so have you gone through and deleted all the articles about individual
software engineers? Because I know of numerous.

What is your definition of "notable"? I assume if you can produce stats you
have objective criteria?

Have you deleted "local hero" articles?

What are the criteria for relevance? I recall there being numerous studies in
the past demonstrating articles about important events and people in
non-Western environments being deleted.

~~~
astrodev
The notability criteria are site-wide, not personal and depend somewhat on the
source of notability of a given subject (e.g. different criteria for academics
than for most other living persons). Check out [1] if you are genuinely
interested.

Actually, I think there is a significant bias against things that lack sources
in English.

[1]
[https://en.wikipedia.org/wiki/Wikipedia:Notability](https://en.wikipedia.org/wiki/Wikipedia:Notability)

------
carlmr
I'd like to see a bit more analysis past the Erdős number. What could other
(measurable) notability indicators be? Number of publications, length of
scientific career, number of citations, number of people they taught, ...?

Her analysis here seems very shallow and already operating from the
preconceived mindset that this must be misogyny. I'm not saying it's not, but
the argument is weak.

~~~
olliej
Ok, so the fundamental issue the blog references is "women who worked with
Erdos are significantly more likely to be tagged for early deletion", that
statement is backed up by the actual stats.

So the question is, what is the cause for this discrepancy?

It is possible that the women who worked with Erdos were much more likely to
be irrelevant than the men -- by a factor of 4 -- in which case this would be
reasonable.

The alternative (her claim) is that they were marked for deletion because
other editors default to assume that women are less relevant than men, unless
proved otherwise.

My question is: are the people who are tagging these articles for deletion
field experts -- mathematicians or whatever -- who can actually understand the
relative importance of individuals, or are they running through a checklist?
If they're just running through a checklist does their checklist just
reinforce old stereotypes? For example, old newspapers would not bother to
interview women engineers, scientists, etc., and even if they were, historically
women were absolutely kept out of those fields, so even if interviews were
random, statistics would ensure low public visibility. Hence relying on the
existence of published biographies is not sufficient, and is arguably less
relevant than the Erdos number.

Does it treat an Erdos number as being proof of importance in some cases, but
not in others? If your default view is men are real mathematicians, and women
are not, it is easy to interpret a low Erdos number as evidence of importance
(it confirms your belief of relevance), but if you start out with the converse
-- this person isn't relevant or important -- then it's just trivia.

~~~
carlmr
> My question is: are the people who are tagging these articles for deletion
> field experts -- mathematicians or whatever -- who can actually understand the
> relative importance of individuals, or are they running through a checklist?
> If they're just running through a checklist does their checklist just
> reinforce old stereotypes? For example, old newspapers would not bother to
> interview women engineers, scientists, etc., and even if they were, historically
> women were absolutely kept out of those fields, so even if interviews were
> random, statistics would ensure low public visibility. Hence relying on the
> existence of published biographies is not sufficient, and is arguably less
> relevant than the Erdos number.

Exactly, these are the things you need to analyze before making such a claim.
If for example we look at other confounding factors and find that these
explain most of the variance, then we can discuss why these other factors
might be influenced by gender (e.g. less interest from contemporary news
publications).

