Hacker News
How I Snuck Through Wikipedia’s Notability Test (medium.com)
97 points by steven on Mar 6, 2015 | 70 comments

The article could be titled "How I harnessed the cronyism that has paralyzed Wikipedia and scored a free bio out of it, and Why you'll never succeed if you try the same"

The key part of his article is a little more than halfway down: "I also realized that my well-sourced biography can be attributed, in part, to the fact that every word I’ve published is available online, and thus easily cited as a reference. In the eyes of Wikipedia editors, my being a digital native appears to be a distinct advantage: it’s much easier to link to existing online sources than to visit libraries and dig through decades-old physical media, such as books and newspapers."

I am a Wikipedian. He has that right. Wikipedia has a HUGE bias toward online sources, as a practical matter, rather than toward print sources. That's not official policy on Wikipedia. Wikipedia policy makes clear that it is perfectly okay to cite a book that is a reliable source on other grounds, even if the book is very hard for any other Wikipedian to find in a library or as a used book. But the practical thing that happens on Wikipedia is that most articles are based on sources that Google can find, and many articles that are otherwise well sourced will be doubted if editors can't find any confirmation of their content on Google.

Basically, Google's ability to serve up what humankind has put onto the publicly visible World Wide Web limits the development of Wikipedia. I learned to do research the old-fashioned way, at a university with a huge library, and began using online databases there as a student job back in the 1970s. Today I can look things up online with practiced facility, but I still find that looking around in a library full of dead-tree books can turn up all kinds of information that has never been online. It takes both kinds of research to build a good encyclopedia.

Just for fun, take a look at how Wikipedia describes what Wikipedians should be here to do as encyclopedia editors,


as well as its description of what Wikipedia is not.


> the minutiae of the universe (toilet paper orientation: 5,147 words;

I submitted that article to HN. It got a few upvotes but was then flag-killed. https://news.ycombinator.com/item?id=6047531

I submitted it because it's a pretty thorough article. It's not a stub of nonsense; they talk about some of the research behind the intense arguments that can result from hanging a roll "the wrong way".

I learnt to always leave a comment about why I submitted an article if I think it's going to be flagged.

I see the TP debate as an example of a more general phenomenon where people consider themselves experts on any subject they have had even a single direct experience with. ('Expert' is probably too harsh, though; it's more like "I want to share my knowledge, even if it's not actually useful.")

Edit: It may even be a biological instinct.

PS: This comment is probably a perfect example, now that I think of it.

I'm sure you all know the phrase "given enough eyeballs, all bugs are shallow". The same idea applies to Wikipedia and helps explain how the author slipped through: people only contribute to things that are in their sight. The author says this himself by noting that Wikipedia is human-driven. His article probably wasn't notability-checked after its initial creation because no editor saw it. You see the opposite in the GoT article, where every detail is brought under scrutiny.

He also probably got a pass because the article was created by an established editor rather than a brand new user. Given enough time though, I'm sure some editor might've seen it and would have put the notability template that exists on the page now.

Dig into the depths of Wikipedia and I can guarantee you'll find tons of non-notable articles that no one bothered to nominate for deletion but that are headlined with cleanup templates. Heck, just the other day I saw an article that tried to pass for notable by indiscriminately littering the page with citations that barely mentioned the subject's name. All it takes is one well-meaning editor to double-check any dubious claims by looking at the citations and making sure they are reliable.

It's a bit silly to make it sound like he gamed the system. He didn't pass the notability test; it just wasn't administered yet. The editors that worked on his behalf were well-versed in the rules and they worked within them, acknowledging that the article might not last. You either get a speedy deletion of something obvious in the new queue or you get discussions for deletion for things that aren't apparently notable to someone else.

Heck, just check out the "list of collaborative software" and "list of project management software" articles; they're there more as a quarantine zone/chew toy than anything. Every time someone makes a crappy productivity webapp they apparently feel justified in creating its wikipedia article. These often continue to exist for a year or two before an editor finally takes the time to throw it into the delete queue. At least these are indexed in one place, whereas biographies of obscure people have no such convenient referral page.

> It's a bit silly to make it sound like he gamed the system. He didn't pass the notability test; it just wasn't administered yet.

Precisely. The moment someone goes and checks the citations, it turns out they're mostly junk: https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...

This stuck around for the same reason hoaxes stick around: it looked vaguely legitimate so nobody bothered looking in-depth or nominating it for deletion.

> He also probably got a pass because the article was created by an established editor rather than a brand new user.

This is the biggest factor, and no one else seems to realize it. The rest is basically details. Not only did an established editor create the page, but the expertise in formatting, citing properly (or at least making it look proper), and writing with the kind of verbosity that imitates a very professional, 'notable' Wikipedia article is what let it fly under the radar. A quick skim of the article raises absolutely no suspicion in even a partially trained eye (I'm not some power editor on Wikipedia, but I do edit and am active).

One of the main problems with Wikipedia is that it is actually multiple products in one: an encyclopedia, a biography site, a news archive, a medical reference and so on. If the site were only an encyclopedia, McMillen would have just a short page or even nothing. He probably wouldn't be notable enough. If it were a biography site, say one specific to journalists, he would have a longer page, and if it were a news archive, he would have the kind of page seen here. The lack of a defined format leads to this kind of confusion for both the reader and the writer. The problem for many of the site's writers is that the "encyclopedic" stuff is done. There's only so much to say about dinosaurs. When any news comes up they flock to fill in the page with minutiae, with the result that pages on pop culture topics are becoming so long they are almost unreadable. See my blog post: http://newslines.org/blog/wikipedias-13-deadly-sins/

From your linked article:

> There are very few Web 2.0 features. Social features are primitive: readers cannot make an easy list of their favorites, nor can they share to Facebook or Twitter with a single click. It’s as if the past ten years of progress had never happened.

Thank god.

Wikipedia is like Web 1.0, but better. It has a sophisticated templating system, readable and consistent styling/formatting, more hyperlinking than the old web ever had, citations and fact-checking, it's freely editable by anyone, it's freely licensed, has full revision history, links never die...

Web 1.2?

Web 0.999...

>If the site was an encyclopedia, McMillen would just have a short page or even nothing. He probably wouldn't be notable enough.

I wouldn't put it like that. One of the nice things about wikipedia is that if a subject qualifies to have an article at all, it can have a nice detailed article.

I was thinking about this and there seem to be 4 main levels of notoriety: 1. no article, 2. name on a list, 3. article about a person, 4. article about someone with linked articles about individual accomplishments.

What about someone who is mentioned in one or more articles, but does not have their own. That seems like it's around 2.5 on your scale. I'm not sure how common it is, though. Might be very common (?).

(Primary known example is myself, BTW. I'm mentioned in 2 articles that I'm aware of.)

That's pretty much what I meant with 2. This guy won a short story award in 1955. No listing other than his name and his book's name. No links.

I guess if you're common enough and it's known that you're the same person in more than one place, a red link is a step up from plain text.

Wikipedia is an encyclopedia. The reason it's very detailed compared to traditional encyclopedias is that it has no space limitations, and anyone can contribute.

Definition of an encyclopedia from Wikipedia: An encyclopedia or encyclopaedia is a type of reference work or compendium holding a comprehensive summary of information from either all branches of knowledge or a particular branch of knowledge.

Note the word "summary". An encyclopedia entry is supposed to be a summary of the topic. It is not a news archive.

I don't think any wikipedia articles are exhaustive to the topic they cover. Summarization is a spectrum.

The whole article is about a minor journalist whose page is bigger than some of the world's most interesting people. That clearly shows that summarization is necessary.

An example of 'business as usual'...


I wrote the first draft of this page in 2004 as a result of an interest in Victorian lunar observers (long story). This minor Victorian astronomer produced an important moon map and contributed serious work on celestial mechanics that was rendered obsolete by Ernest Brown's major improvement in lunar ephemeris computation.

The subsequent history shows a purposeful improvement in the page and adds a couple of references. I think the current version of the page is better than the one I contributed.

I'm wondering if the wikipedia process sort of mostly works but gets derailed around notability/contentious issues?

He says, about a certain editor:

>> (He has since asked that I leave his Wikipedia username out of this story.)

But later:

>> That editor... had expanded my page significantly...

Linking directly to the page, with the editor's name on it, at that time just after said editor had made those significant changes.

So yes, I guess he did keep the name out of the article, but it was only one click to find it. Is this good? Unless the editor's reason for not wanting his name in the article was just to keep this article from showing up in search results for his username, and he doesn't actually mind people connecting his edits to the page with his personal interaction with the page's subject, this seems kind of unethical to me.

For what it's worth, I'm working on an entry for a media executive who I was surprised to find didn't have an entry (while you might not recognize the person by name, I'm all but certain you'd know the organization). Rejected as "non-notable".

This was despite NY Times coverage, citations, multiple prior incumbents listed, and at least one existing "red link" reference to the individual.

I've tossed in a few more buckets of links and noted various other bits as mentioned above. We'll see how it works out....

I had several rejections for the page of the former Finance Minister in my country, because they thought I was him and was self-promoting.

After writing on the editor's page and showing him enough news headlines, I convinced him it was legitimate.

Mine finally got through. Jarl Mohn, the recently selected CEO of NPR.

But then, weren't they all (8 in 8 years).


Adding another 8 references seemed to help. They actually trimmed that down by a couple in the publishing process:


It's amusing to see that McMillen's entry is now tagged with "may not meet notability guidelines." http://en.wikipedia.org/wiki/Andrew_McMillen

I suspect that this belated effort to quiet everything down will just make for an even noisier and messier debate.

Not really. Looking at its Articles for Deletion page, it seems like most of the citations were actually junk. This is just like how hoax articles have stuck around: people are suspicious of citation-free pages, but pages with poor citations are rarely checked.

By the sounds of things, this article didn't actually pass Wikipedia's notability test - it's just that nobody put it to that test by nominating it for deletion (i.e. nobody thought it was suspicious).


We're both right, TazeTSchnitzel. I agree with you 100% about junky citations and the article's inability to meet true notability standards. But now that it's become a publicized article, various pranksters and Wikipedia critics aren't likely to let it disappear quietly.

Ah, I see your point. Yeah, some people might want to keep it, I hadn't considered that.

If the information is verifiable, why can't wikipedia just let the page exist?

Wikipedia is not about "correct" or "true", it's about "verifiable".

This leads to some distortions and a lot of frustration. It might be why WP has a problem keeping technical editors around.

Well, it's also that it's not consistently applied across the board.

In the actual definition of notability, it is made clear that the policy is purely about verifiability: something is notable when we can say more than a couple verifiable things about it. However, because the policy is named notability, the AFD (articles for deletion) threads very often contain systematic mention of how noteworthy an effect is "within the [whatever] community".

I never really followed up on it, but I wanted to see whether I could game the system in the reverse way. Coming from a physics background, in college I wrote up some drafts about a fake physics concept, including some thought into using @cornell.edu emails to create a fake arxiv.org LaTeX article or two. The idea was to combine this with links to 404'ed pages, some fake blog articles across the Internet authored by me, and some

The actual concept was going to be something like "Tenacity (science)", which was going to explain that NASA researchers had discovered some unexplained and hard-to-causally-model (but extremely predictable) force that makes orbital rocketry more energy-expensive than "the calculations" would give you. Models would be proposed requiring, say, Kerr-metric general relativity solutions for Earth's gravity (hence trying to determine whether the core of the Earth was actually a spinning black hole) or strange drag forces going like v^1.5 or v^3 or something. The joke was that the actual genesis of the phrase was a 2005 Homestar Runner cartoon where a character named Strong Bad is sitting in a foil-lined cardboard box pretending that he is involved in a space program. "Got to... escape... Earth's... tenacity..." he says, as if straining under a heavy weight.

>In the actual definition of notability, it is made clear that the policy is purely about verifiability: something is notable when we can say more than a couple verifiable things about it.

Someone is doing things very wrong then, because I've seen a lot of over-specialized but completely verifiable information get removed from Wikipedia. The reasoning was entirely based on noteworthiness.

Well, actually, it is about "quotable" more than "verifiable". The latter term implies that truth is part of the issue, whereas they only care about quotations as proof. Possibly "relevant" or "trustworthy" quotations, but not much more than that.

Fair enough, I edited the comment.

If we must question whether content is written by shills or even relevant then Wikipedia is a lot less valuable. If the defined standards are important they must be enforced or the platform will evolve, for better or worse.

You can see this on HN. Three of the top five links on HN right now are companies using their blogs to advertise on HN. As marketing on HN becomes important for startups, the submission process stops being part of our defense for quality. Eventually we are democratically choosing our front page from submissions that bypassed our democracy, like a Hong Kong election for Chief Executive among candidates pre-selected by Beijing: a parody of democracy.

How does one know it's correct? If I do a research study and discover some new fact I think is "correct," should I be able to just put that on Wikipedia, without peer review?

One can verify the references: https://en.wikipedia.org/wiki/Andrew_McMillen#References

Pages without references get pruned pretty quickly.

In your scenario, you'd have to get published first, then put this fact in the appropriate page, then reference the published research.

Of course, some things fall through the cracks & editors of some pages are more vigilant than others.

there was a good post about this, a year or two ago, where somebody couldn't get Wikipedia to publish the truth on a topic because the inaccurate information was very widely cited. editors basically responded "we know this is false, but the true version isn't notable enough to mention."

(tried, but failed, to find the link just now.)

You may be referring to the Philip Roth incident. (There are a number of such incidents, but this is perhaps the most famous).

Essentially, the article on Roth's book The Human Stain (an excellent book by the way) cited a New York Times critic as saying that the main character may have been inspired by a particular person. Roth's agent attempted to remove this (Roth says that he was not inspired by this person). However, there was no published source that indicated that Roth felt this way, just Roth's agent's communication with Wikipedia volunteers. So Roth published an article in The New Yorker as a means of establishing a published source that could be used to refute the claim.

I think the ultimate resolution was to say something like "New York Times critic says Roth was inspired by... but Roth says..."

This is a case where both sides were partially right. It's perfectly encyclopedic to note that a major critic believes that an author was inspired by a particular person. However, there should have been a better mechanism for the author to establish his own version of the story.

But it gets into complicated areas. Should Wikipedia just take anything that an agent sends at face value? Isn't it possible that a person isn't the best source on their own biography? (Plenty of people have self-serving and inaccurate views of themselves.) Wikipedia's general philosophy is to look at what reliable sources say and summarize them rather than trying to judge who is right and wrong, and I think the resolution of citing both sources and letting readers decide was reasonable.

The one I'm thinking of was more straightforward than that. A person's incorrect age was taken from Wikipedia and published elsewhere. Those pages were then cited as references confirming the age, completing an unbreakable circle (by WP's rules, anyway).

There is a fellow writer and music journalist from Queensland by the name of Andrew McMillan. His entry is a mere 398 words. I got a bit nervous when I found it because he died in 2012: http://en.wikipedia.org/wiki/Andrew_McMillan_(writer)

Well, this whole situation just might be what makes him "notable" enough to have this wikipedia page.

He is, after all, the only Australian freelance journalist whose essay on how his own Wikipedia page bypassed the Wikipedia notability filters spurred a debate on his own notability. The delicious (and unique) recursive meta here is, IMO, more than enough to keep the page.

Seems like it would be bad to encourage people to do this by allowing the article to stay up.

Spite should be a valid reason for deleting articles.

> Spite should be a valid reason for deleting articles.

Should not, right?

turns out that doesn't qualify:


Reading the timeline on that is pretty sad, and reflects badly on wikipedia. Controversial article gets speedy deleted against policy and there's no recourse.


It is up, but I think instead it should go meta: There should be a new section describing how his own wikipedia article came about.

Yes, a delicious twist of irony.

A friend of mine once played an elaborate prank where I got a Wikipedia page. He made me a Faroese cricketer. It was pretty great, but eventually things spiraled out of control when I spontaneously combusted.


Sorry but that is not fun. We should have more respect towards volunteers time.

A lot of these editors are jaded because they spend so much time dealing with "hilarious bro pranks." I don't envy them.

Any way to view what the article looked like before it was deleted?

Wikipedia has a policy against this


The wayback machine just captured it once, and that was after the deletion.

For old articles probably not, but now there is: http://deletionpedia.org/en/Main_Page

Sadly I think it's disappeared into the ether. I also tried to find the Faroe Islands News Service blog that my friend put together, but it seems to be dead as well.

Here's an image of the last revision before it was deleted: https://i.imgur.com/mlip1D9.png

Ironically this sort of prank is probably worth an article on wikipedia...

I'm not sure I understand why he needed to make an article to watch something get deleted for notability "in real time". These decisions are made constantly and you can always watch the process by going to AfD.

I hate articles about Wikipedia.

I am regarded as Not Notable Enough For Wikipedia for the specific reason that I feel that no one should have to pay to read my essays and articles. Were someone to pay me to publish in dead tree form rather than on my own website, I would merit an article.

I've written mountains of highly regarded essays and articles, generally on the topics of mental illness - I have Bipolar-Type Schizoaffective Disorder - as well as software engineering.

My essay Living with Schizoaffective Disorder is on a reading list that the California State Department of Mental Health distributes to its county clinics.

"Will someone pay you to publish your work" is a much stronger filter than "do you put your work on your website".

(I write this as someone who has written a lot on my website for free and not published anything in a paid forum.)

But the issue here is that "will someone pay to publish" is true but unprovable. "has someone paid to publish" is a sloppy approximation at best.

There is no doubt whatsoever that my self-published work is notable; consider what I just stated, that Living with Schizoaffective Disorder is on a reading list that the State of California Mental Health Department distributes to its county clinics.

Despite that, the simple fact that I choose to self-publish - or rather, refuse to permit dead-tree publication - is regarded as making me non-notable.

There are lots of wikipedia articles about dead-tree writers whose work is arguably far less notable than my own.

Being published is not enough for WP notability, and someone who self-publishes can be notable enough for a Wikipedia article.


> I am regarded as Not Notable Enough For Wikipedia for the specific reason that I feel that no one should have to pay to read my essays and articles

That substantially misrepresents the reasoning given in the deletion discussion: http://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion...

It wasn't that your work was self-published that made you non-notable; it's that there wasn't substantial outside coverage in reliable secondary sources that would justify a biography.
