
Wikipedia Is Nearing Completion, in a Sense - kevbin
http://www.theatlantic.com/technology/archive/2012/10/surmounting-the-insurmountable-wikipedia-is-nearing-completion-in-a-sense/264111/
======
ChuckMcM
This is the other end of extrapolation. There is a lot of stuff in the world,
but it's finite stuff. And information, like lots of things, has its own
inverse power law. When building a search engine, folks say "Gee, you don't
have nearly the hardware that Google does, how can you ever hope to compete?"
and the answer is the inverse power law. Look at the queries served out of the
index vs those served off the long tail: the long-tail share drops off
dramatically.
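A minimal sketch of that head-vs-tail effect. It assumes, and this is not from the comment, that query popularity follows Zipf's law with exponent 1 over a hypothetical pool of one million distinct queries. `zipf_head_share` is a made-up helper name, and real query logs will have their own shape.

```python
def zipf_head_share(k: int, n: int, s: float = 1.0) -> float:
    """Fraction of all queries that land on the k most popular of n distinct
    queries, if frequency falls off as 1 / rank**s (Zipf's law)."""
    weights = [1.0 / rank ** s for rank in range(1, n + 1)]
    return sum(weights[:k]) / sum(weights)


# Hypothetical pool of one million distinct queries.
n = 1_000_000
for k in (1_000, 10_000, 100_000):
    share = zipf_head_share(k, n)
    print(f"top {k:>7,} queries ({k / n:.1%} of the pool): {share:.0%} of traffic")
```

Under these assumptions a small index covering the most popular queries serves most of the traffic, which is the point about not needing Google-scale hardware to compete on the head.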

How many companies are there in the world? 100M? Give each of them a megabyte
for pictures and information: that is 100 TB of data, 50 disk drives, or 150
if they are triply replicated. And what are the net new businesses in a given
year? At a 2% annual growth rate for the world's economies over the long
scale, that's maybe 2M new businesses a year, or 2 TB of data, or 1 (or 3) new
disks a year?

A billion people on Facebook? 1/7th of the world population? A megabyte for
each of them is a petabyte of data. The Backblaze guys can put a petabyte of
storage in a single cabinet.
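The back-of-envelope numbers above, written out. Assumptions: decimal units (1 MB = 10^6 bytes), 2 TB drives (inferred from "100 TB ... 50 disk drives", not stated outright), and 3x replication for "triply replicated". `storage_estimate` is a hypothetical helper.

```python
MB = 10**6
TB = 10**12
PB = 10**15


def storage_estimate(entities: int, bytes_each: int = MB,
                     drive_bytes: int = 2 * TB, replicas: int = 3):
    """Return (total bytes, drives needed, drives with replication)."""
    total = entities * bytes_each
    drives = -(-total // drive_bytes)  # ceiling division on integers
    return total, drives, drives * replicas


companies = 100_000_000
print(storage_estimate(companies))      # all companies
new_per_year = companies * 2 // 100     # 2% annual growth
print(storage_estimate(new_per_year))   # one year of new companies
print(1_000_000_000 * MB / PB, "PB")    # a billion Facebook users
```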

For a long time the Internet was playing 'catch-up'; now it is asymptotically
approaching 'caught up.'

That means different problems and different opportunities for folks who are
now basing their endeavors on the net.

~~~
tisme
Storing data and using data are two completely different things. You can store
all of FB and lots of other companies in a fairly small volume these days. But
as soon as you want to access that data to process or update it the game
changes rapidly. Suddenly that one cabinet explodes into a datacenter full of
cabinets, or even several data centers.

Storage is a solved problem for just about any amount that an ordinary company
might need. Getting that data to a CPU fast enough to be practically useful,
if you want to say something about all of it, is a completely separate
problem, and it moves the technology and funding required from easy to
extremely hard and beyond.
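To put rough numbers on the storage-vs-use distinction: a sketch of how long one full sequential pass over a petabyte takes. It assumes, and none of this comes from the thread, about 150 MB/s of sustained sequential read per drive and perfect parallelism across drives. Random access, network hops, and the actual CPU work only make it worse, so treat the result as a floor.

```python
def scan_hours(total_bytes: float, drives: int,
               drive_bytes_per_sec: float = 150e6) -> float:
    """Hours to read every byte once, assuming the data is spread evenly and
    each drive streams sequentially at the same (assumed) rate in parallel."""
    return total_bytes / (drives * drive_bytes_per_sec) / 3600


PB = 10**15
for drives in (1, 50, 500):
    print(f"{drives:>4} drives: {scan_hours(PB, drives):,.1f} hours per full pass")
```

The storage fits in one cabinet either way; how fast you can read it back is set by how many spindles (and machines) you are willing to pay for.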

~~~
ChuckMcM
_"Storing data and using data are two completely different things."_

That is so true. When people ask "How hard can it really be to write a search
engine these days?" I have been known to ask them to speculate on how they
might go about it and then point out the challenges of knowing what data you
have vs what data is asked for. Search is particularly interesting because the
more time you spend, the better your answer can be, and it's always challenging
to 'draw the line' between fast and relevant. But that is also what makes it
so fun :-)

------
ErrantX
As a highly active Wikipedian, I think this article is rather wrong: Wikipedia is far
from complete. It has a lot of articles, covering a huge range of topics (and
in that sense is perhaps nearing "completion"). But those articles are almost
always far from complete.

As someone who has written multiple peer reviewed "Good Articles" and one
"A-Class" (a step below "Featured Article") - as well as numerous other fairly
well-rounded articles - the act of taking material from a stub or short
overview to a complete encyclopaedic article is MASSIVE.

I wrote an article about Dudley Clarke; a British chap who was responsible for
military deception planning in North Africa during the Second World War. That
took nearly 3 months of research and writing, and cost me around £50 in
textbooks.

"but the bulk of the work, the actual writing and structuring of the articles,
has already been done"

No. The easy part, slapping up an article with some information in it, has
been done. The _bulk of the work_ is completing each article.

This article _does_ actually nail why Wikipedia editorship is declining: "With
the exciting work over, editors are losing interest." They just get the reason
wrong. This is why we are plagued by a massive community of vandalism
patrol/administrative types and those who treat it as a social network -
content creation is a minor facet of Wikipedia because of the massive
investment required to finish up an article.

------
kiba
Wikipedia may be getting more mature, but I think the barrier to entry is also
a cause of the lack of outsider edits. Also, being a wikipedian isn't like being a
professor. You don't get much outsider respect.

Due to Wikipedia's policies, you also see "forks" or more specialist wikis,
such as comixpedia or libregamewiki (a wiki that I actually found), for
subjects like webcomics or open source video games, whose articles keep
getting deleted.

Gwern, a veteran editor, got so tired of the wikipolitics (even though he's
really good at wikilawyering to protect his contributions when needed) that
he started <http://gwern.net>.

He benefits from the reputation and traffic that would otherwise go to
Wikipedia.

------
DanBC
There is less for people to do, but there is still plenty of "wiki gnoming" to
do. This is exactly the kind of thing that could attract some new editors. The
minor fixing of small details should be an excellent use of crowdsourcing.

I seem to have lousy experiences at Wikipedia whenever I try to edit anything,
even if it's a minor edit.

Last time I tried:

My IP had already been blocked because it had been used by a vandal. I thought
that was a bit odd (blocking a dynamic IP range), so I made an unblock
request.

The template is confusing, and doesn't tell users to include the reason for
being blocked as well as the reason for being unblocked, so I got an error
message.

While I was fixing my mistake, my unblock request was declined. (This
happened within a few minutes of my posting it.) So I fix the mistake, save
the page (working past the edit conflict), and find that my request has been
declined.

I read why it's declined.

I make another unblock request.

I get a friendly, polite, sympathetic reply. But it's not actually much use -
it doesn't help new users to understand WP. And the block remains.

I try to reply. Because the talk page includes links to templates, WP thinks
I'm entering external links and asks me to enter CAPTCHAs.

I enter the CAPTCHAs and reply. I say that blocking a troll by blocking some
dynamic IPs is odd - they just log off the Internet and log back on to get a
new IP address, which may be outside the blocked range. Determined trolls (the
kind who attract IP range blocks) will find this trivial to do. Newbies, the
kind of people who are being targeted for this retention programme, may not
find this quite so easy.
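As far as I know, MediaWiki range blocks are expressed as CIDR prefixes. A sketch of why a reconnect can escape one while a bystander on the same pool gets caught, using Python's standard `ipaddress` module. The blocked prefix and all three addresses are hypothetical, drawn from the documentation-only ranges of RFC 5737.

```python
import ipaddress

# Hypothetical rangeblock on a documentation-only prefix (RFC 5737).
BLOCKED = ipaddress.ip_network("198.51.100.0/24")


def is_blocked(addr: str, network=BLOCKED) -> bool:
    """True if the address falls inside the blocked CIDR range."""
    return ipaddress.ip_address(addr) in network


print(is_blocked("198.51.100.42"))   # vandal before reconnecting
print(is_blocked("203.0.113.7"))     # vandal after reconnecting to another pool
print(is_blocked("198.51.100.200"))  # a newcomer who happens to share the pool
```

Whether a reconnect lands outside the range depends entirely on how the ISP hands out addresses, which is why admins report that rangeblocks sometimes work and sometimes don't.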

What kind of person will willingly jump through these hoops? People who are
great at grammar and copy editing? Or cranks who want to put homeopathy in
every article, or who want to mention Armenian genocide in many articles?

~~~
ErrantX
Putting my admin hat on for a moment... rangeblocks are fairly rare for the
reason you cite. But when used they are deployed in situations where a vandal
has been noted to be using IP addresses from the same range.

Although you seem skeptical, it often works, as these are not the smartest
cookies in the crumble.

Unfortunately, the only way to fix your problem is to create an account on
another IP address and log in (or alternately ask someone to create it for
you, a service Wikipedia can provide).

 _Because the talk page includes links to templates WP thinks I'm entering
external links and asks me to enter capchas._

That's rather odd, I've never come across that before! I thought internal
Wikipedia links were excluded from that filter. I will raise that when I find
someone who is responsible for such things :)

------
benmanns
Here is the data on a linear, rather than logarithmic, scale [0]. The Y axis is
the number of articles and the X axis is "years since 2001". I copied the data
as best I could from the chart given by the author of this article. Wikipedia
itself has a page about its own size which shows a similar linear growth rate [1].

[0]
[http://www.wolframalpha.com/input/?i=plot+%7B20%2C15000%2C10...](http://www.wolframalpha.com/input/?i=plot+%7B20%2C15000%2C100000%2C200000%2C500000%2C900000%2C1500000%2C2000000%2C2500000%2C3000000%2C3500000%2C4000000%7D)

[1] <http://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia>
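The same transcribed numbers (taken from the WolframAlpha query above; they are approximate readings off the Atlantic chart, one per year) make the point without a plot. The yearly additions level off at a constant 500,000 while the per-year step in log10 keeps shrinking, which is exactly what reads as "saturation" on a log axis.

```python
import math

# benmanns' transcription from the WolframAlpha link: approximate article
# counts read off the Atlantic chart, one value per year starting in 2001.
counts = [20, 15000, 100000, 200000, 500000, 900000,
          1500000, 2000000, 2500000, 3000000, 3500000, 4000000]

added = [b - a for a, b in zip(counts, counts[1:])]
log_step = [math.log10(b / a) for a, b in zip(counts, counts[1:])]

for i, (n, d) in enumerate(zip(added, log_step), start=1):
    print(f"step {i:>2}: +{n:>9,} articles, log10 step {d:.3f}")
```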

~~~
ygra
That's what I wondered too. It's easy to "show" a certain "saturation" when
using a log scale, and that only means the growth is shrinking relative to the
current size.

------
binxbolling
I think this article makes a great point, but there might be other, more
unpopular reasons fewer people are editing. For example, in my opinion, the
barrier to entry has gotten much higher. Fanatical editors revert
aggressively, red tape holds up even great changes, and the rites & mores of
the Wikipedia community become ever more convoluted & impenetrable to
outsiders.

It's a myth, albeit popular, that one can just read an article and quickly
edit a mistake (whether major or minor). Perhaps this facade is crumbling a
bit, and more people are realizing that "contributing" is not actually all
that easy. I actually love Wikipedia, so don't take this as petty sniping,
it's just what I've experienced.

~~~
jarek
I'm actually rather curious as I see this argument repeatedly and it doesn't
match my personal experience (I've corrected plenty of mistakes both with a
user account and as an IP and I don't think I've been reverted once). Do you
mind sharing what kinds of articles or mistakes you've had this experience
with?

~~~
JustSomeAnon
Unfortunately I have to comment as anonymous.

My experience totally matches binxbolling. As a new contributor my entries
were either blanked by trolls (and left for dead for weeks on end) or
deleted by over-zealous editors whose only reference seems to be
their own opinion and the spouting of endless WP this and WP that rules. I was
appalled by the bullying and mob mentality I found in there.

~~~
jarek
Can you comment on the articles or fields you were editing in, or would that
jeopardize your anonymity?

~~~
JustSomeAnon
The field is, broadly speaking, Computer Science.

------
aprescott
It seems a little misleading to have a graph with log-y values suggesting that
growth has stopped because it's reaching some kind of peak. log(number of
articles) looking like log(t) means it might still be growing pretty linearly.

~~~
yk
Wikimedia has the numbers (some scrolling required):

<http://stats.wikimedia.org/EN/ChartsWikipediaEN.htm>

Recently it has been slightly below linear, but by no means as bad as the
article suggests. In fact (reading from the plot in the Atlantic), it took
Wikipedia five years (2001 - 2005) to reach 1M articles, two years for the
second million, and roughly a year and a half each for the next two million.
From the more detailed statistics at Wikimedia, new articles are now added at
roughly 1000 a day, down from 1600 a day in 2006.

[added] I just found the linear plot of en.wikipedia articles:
<https://en.wikipedia.org/wiki/File:EnwikipediaArt.PNG>
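Converting the daily rates above into the "years per million articles" units used earlier in the same comment, assuming (for illustration) that each daily rate holds constant for a whole 365-day year:

```python
def per_year(per_day: float) -> float:
    """Articles per year, assuming the daily rate holds all year."""
    return per_day * 365


def years_per_million(per_day: float) -> float:
    return 1_000_000 / per_year(per_day)


for rate in (1600, 1000):  # 2006 vs. now, per the comment above
    print(f"{rate}/day -> {per_year(rate):,.0f}/year, "
          f"{years_per_million(rate):.1f} years per million")
```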

------
britta
I found this response interesting from a person who works for Wikimedia
([http://branch.com/b/surmounting-the-insurmountable-wikipedia...](http://branch.com/b/surmounting-the-insurmountable-wikipedia-is-nearing-comple#was-3CT6Ooc)):

 _...at the Foundation, we tend to call this "the Gold Rush theory" - the idea
that Wikipedia's gold rush days are over._

 _My personal view is that the problem isn't that everything is done, but that
it's harder for willing new editors to find things to do. (For existing power
contributors, they have no hard time finding topics, which is why overall
article growth remains steady)._

 _The answer here is that the Foundation needs to work on ways to surface
interesting and useful stuff to write about. Wikipedia can't afford to hope
people stumble on these topics._

He works on making Wikipedia's UI better in ways that encourage useful
contributions, especially for new editors - see his blog posts at
[http://blog.wikimedia.org/2012/09/24/giving-new-wikipedians-...](http://blog.wikimedia.org/2012/09/24/giving-new-wikipedians-feedback-post-edit/)
and
[http://blog.wikimedia.org/2012/10/24/fix-this-broken-workflo...](http://blog.wikimedia.org/2012/10/24/fix-this-broken-workflow/)
for examples.

------
AustinGibbons
I did some research on the declining editor rates for a course project.
Without recreating our entire paper, I want to make this important point.
Wikipedia is not (necessarily) nearing the completion of all knowledge, but
rather the completion of what is feasible for its editors to contribute.
Indeed, there is much work to be done on "the history of sanitation and sewage
in ancient Carthage" and "topic sensitive page rank", which at least has a
page, to say nothing of "trust sensitive page rank", which seems to be missing
one. Our core piece of evidence was observing the _same_ trends across
different languages (in both # edits and # page views). If you are interested,
check out our final report here - it has its flaws, but I believe the three
pieces of evidence we explored hold some merit in their own right.
([http://www.stanford.edu/class/cs341/reports/09-GibbonsVetran...](http://www.stanford.edu/class/cs341/reports/09-GibbonsVetranoBiancaniCS341.pdf))

~~~
robryan
The more specific a topic is, the fewer people there are who could actually
write an accessible article on it. Paying specialized writers, or even paying
or asking for donations of other works they have done, could fill out these
areas.

------
jevinskie
"Jensen believes that there is a way out of this: "Wikipedia is now a mature
reference work with a stable organizational structure and a well-established
reputation. The problem is that it is not mature in a scholarly sense."
Wikipedia should devote more resources toward getting editors access to
higher-quality scholarship (in private databases like JSTOR), admission to
military-history conferences, and maybe even training in the field of
historiography, so that they could bring the articles up to a more polished,
professional standard."

I think that this is the most interesting part of the article. What if
Wikipedia was funded well enough to act as both a traditional encyclopedia
with full-time editors/writers and as a place where everyone can add/edit
content? We need a modern Andrew Carnegie to step up.

~~~
gojomo
They're not hurting for funding. The Wikimedia Foundation had a $28 million
budget for its current year and is projecting $46 million in revenues for
2012-2013:

[https://wikimediafoundation.org/wiki/2012-2013_Annual_Plan_Q...](https://wikimediafoundation.org/wiki/2012-2013_Annual_Plan_Questions_and_Answers)

Also, hiring editors/writers might crowd out the volunteer motivations,
costing the project far more in the loss of donated effort. I believe
Wikipedia's challenges are more strategic -- deciding the right things to do
and in what proportions -- than financial.

~~~
jessriedel
What makes you think $40 million is a lot? That seems like a very tiny amount
to me, especially considering that most (nearly all?) is spent on technical
requirements rather than editing.

~~~
gojomo
I believe $40 million per year is a lot because it could, just based on crude
rules-of-thumb, support a strong technical staff of well over 100 people
indefinitely. Indeed, according to the 2012-2013 Plan document, current Wikimedia
Foundation headcount was about 119 at the time of writing and expected to grow
to 174 next year.

I'm not sure what you mean by 'technical requirements', but if you
specifically mean hosting costs (other than staffing) and capital expenditures
on equipment, per the plan's page 69 that is forecast to be about $5.3MM of
the total $42MM budget -- around 13%.
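A quick check of the percentage, using only the two figures quoted above from the plan:

```python
hosting_and_capex_mm = 5.3  # $MM, as cited above from the plan's page 69
total_budget_mm = 42.0      # $MM, total budget as cited above
share = hosting_and_capex_mm / total_budget_mm
print(f"{share:.1%}")
```

That comes to about 12.6%, consistent with "around 13%".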

Staff is the major cost -- I believe it is chiefly technical, administrative,
legal, and community/chapter outreach roles. I also believe there's not a
single person with explicit 'editing' duties on their payroll, as a matter of
strategy and doctrine: Wikipedia is edited by unpaid volunteers, the
Foundation supports those volunteers. Adding any 'official' and compensated
editors would tamper with the formula/culture that's worked so far.

In one respect, crossing that threshold by adding just a single paid editor
would be 'cheap', but if that step makes the volunteers feel less
appreciated, or start thinking of things in terms of a salary (aka:
'motivation crowding'), the loss in productive contributions could be much
larger.

Why do you think $40MM-per-year is a "very tiny amount", and how big of an
editorial staff do you think they'd need under some new model including
professionalized editors?

~~~
jessriedel
There are 4 million articles on Wikipedia and 50k active editors. 100 people,
even working full time, would hardly make a dent. This is true even ignoring
the serious impact of tampering with the all-volunteer philosophy. I'd wager
you'd need at least 10 times the number of people/money to seriously improve
the content of wikipedia by hiring editors.

~~~
gojomo
The definition of 'active editor' that gives totals in the tens of thousands
is anything more than _5_ edits a month. That could just be minutes of time,
so I doubt you can draw any sensible conclusions about sizing a paid full-time
staff from '5+ edit' volunteer counts.

FWIW, the Wikipedia article on Encyclopedia Britannica suggests that in 2007,
Britannica had a credited staff of about 60, with another dozen editorial
advisors. (While thousands of other advisors contributed over the decades,
they're not required as continuing full-time staff.)

So your suggestion that the Foundation would need "at least 10 times the
number of people/money" -- 1000+ full-time paid editors, a $400 million/year+
budget?!? -- for serious improvements seems wildly extravagant. Is it
extrapolated from the known size/cost of any real-world professional/academic
editorial efforts?

~~~
jessriedel
I'm saying that if you added the staff of Encyclopedia Britannica to Wikipedia
you wouldn't make a dent in Wikipedia. Britannica is about 40 million words.
Wikipedia is 2 billion words, or _50 times_ larger. 2% is a blip.
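The ratio spelled out, using the word counts given in the comment (as the reply argues, raw word count may not be the right yardstick for what paid editors would do):

```python
britannica_words = 40_000_000
wikipedia_words = 2_000_000_000

ratio = wikipedia_words / britannica_words
print(ratio)               # how many Britannicas fit in Wikipedia
print(f"{1 / ratio:.0%}")  # Britannica as a share of Wikipedia's size
```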

~~~
gojomo
And Wikipedia got the 2 billion words with zero paid editors, and indeed a
culture that is suspicious of financial motivations, which is a big reason why
the Foundation/community doesn't seem to have any interest in paid editors.

But if they thought they could use them, it doesn't follow that they'd need a
staff of a thousand-plus. It depends on the (unstated, entirely-hypothetical)
strategy for using them. Superficial word-count-output extrapolations wouldn't
be part of such a strategy.

They've got plenty of money (provided each year's donation campaign meets its
goals). They've got plenty of words. There's just not a clear path where "if
they just had enough funding from some modern Andrew Carnegie" they could
throw paid editors at their mission and improve things. (That was the
particular suggestion that started this tangent about whether they need more
money or not.)

Other tangential evidence: while the Foundation reaches its fundraising target
each year, they often spend less than planned. See for example:

[https://wikimediafoundation.org/wiki/2012-2013_Annual_Plan_Q...](https://wikimediafoundation.org/wiki/2012-2013_Annual_Plan_Questions_and_Answers#Why_is_2011-12.27s_projected_spending_lower_than_plan.3F)

------
nevinera
That first chart really bugs me. You're showing us an apparent asymptote, a
totally reasonable thing to do to prove your point... but then I notice it's a
_log plot_. That isn't showing us anything, _number of seconds_ would make the
same shape of plot.

~~~
rhplus
Yes, the log version is a little misleading, but the linear version shows that
growth is slowing down too:

<http://en.wikipedia.org/wiki/File:EnwikipediaGom.PNG>

~~~
benmanns
Note the light green line under the blue growth rate line. The Size of
Wikipedia article[0] says that this was the modeled growth rate as of 2010,
which it only held to until 2011. This shows the danger of modeling existing
data and extrapolating to the future.

[0] <http://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia>

------
Jabbles
I suspect many articles are approaching a local maximum, and it will be very
difficult to do a significant reorganisation of an article in order to reach
"the global maximum".

It is possible to have two very dissimilar, balanced and detailed articles on
the same subject. Wikipedia makes an arbitrary choice and this is difficult to
undo. I can't think of any articles that I can point to and say "this was the
wrong choice", but it's certainly something to think about.

------
mattparlane
I can't help but think that presenting this graph on a log scale isn't the
best way to do it. I mean, if the number went up linearly year-over-year,
wouldn't it look like that on a log scale anyway?

~~~
rm999
Yeah the log scale hides his point. A log scale is useful when you want to
highlight the rate of growth, e.g. for something that grows exponentially. But
I'd expect Wikipedia to grow linearly, not exponentially - on a log scale it
is hard to distinguish anything that is subexponential (which includes linear
growth).
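A sketch of why that is: on a log axis only exponential growth advances by a constant step each year; anything subexponential, linear included, takes ever-smaller steps and so bends over like the article's chart. The series below are made up, and the 1.5x growth factor is arbitrary.

```python
import math


def log_steps(values):
    """Change in log10(value) from one year to the next."""
    return [math.log10(b / a) for a, b in zip(values, values[1:])]


years = range(1, 11)
series = {
    "exponential (x1.5 per year)": [1.5 ** t for t in years],
    "quadratic": [t ** 2 for t in years],
    "linear": [t for t in years],
}
for name, values in series.items():
    print(f"{name:>28}: {[round(s, 3) for s in log_steps(values)]}")
```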

~~~
wavesounds
Totally agree with you guys and came here to post the same thing, he may make
some good points but this seems somewhat deceptive to the average American
reader.

------
chris_wot
I think the best metric I'd use to see if Wikipedia is "complete" is to look
at the major articles and see what proportion have [citation needed] on them.

Of course, I'm a little biased, as that tag is the one I created, so it's really
my baby. But I designed it for obsolescence, so I suppose I can't complain :-)

~~~
Xcelerate
Whoa, hold on... you were the guy that invented that tag?!

~~~
chris_wot
Yup. You can find my greatest hits under ta bu shi da yu.

------
_feda_
It's always seemed weird to me that wikipedia has a problem with scaring away
potential editors. I for one love editing Wikipedia and never found it in
any way intimidating when I started editing in my teens, although I'm probably
part of a sizable edge case of humanity that enjoys the relative complexity of
the process of editing itself, wiki markup et al. I don't find the culture to be
off-putting either, because the culture gels nicely with my personality traits
and those of any person who would enjoy contributing to an encyclopedia.

Wikipedia has a learning curve, sure, but like learning to use a shell instead
of a GUI, the pain is certainly ultimately worth the gain. The barriers to
entry are nowhere near as high as those I imagine serious, academic editors
face in their careers.

~~~
mjn
As an academic I generally agree. I have mostly good experiences editing
Wikipedia, with a handful of bad ones. By and large it's much more reasonable
than some of the things I've run into in academia. My main wish on Wikipedia
is that I got more feedback of _any_ kind. Often I will write an article and
never hear from anyone about it. Maybe I pick too obscure topics, but it'd be
nice to hear some constructive feedback more often.

~~~
_feda_
It's regrettable you're not getting the feedback you should be getting, as
this is something that will really help you stick to it. On the other hand,
the very reason you aren't getting feedback is a good one: you're working on
obscure articles, which is precisely the area wikipedia needs work done on in
order to become the fully-fledged, echt encyclopedia (as I'm sure you know in
the case of proprietary equivalents professionals are hired to fulfill the
role you are fulfilling). Anyway, consider yourself a lonely pioneer ;)

------
tokenadult
I have been a Wikipedian with a registered account for a few years, and I have
reflected on these issues. The article asks, "But what if the decline in
engagement has little to do with culture or the design of the site? What if,
instead, it's that there's just less for new Wikipedians to do?"

It's the culture. There is plenty to do. Indeed, every time I read a Wikipedia
article, I find something in every article that turns on my inner editor mode.
But my log-in information is only stored on my office computer, not on my
wife's computer or my children's computer, so when I am reading along in
Wikipedia with my family and I see a mistake, I usually grumble rather than
Wikignome and fix the problem.

<http://en.wikipedia.org/wiki/Wikipedia:SOFIXIT>

I do Wikignome regularly when I'm reading Wikipedia for one reason or another
on my office computer. That gives me solely the satisfaction of fixing a
problem, and no satisfaction at all of supposing that that builds up my
reputation among other Wikipedians or that the fixes will even persist in
further updates to the pages.

Quite apart from the articles that have jump-right-out-at-you errors,
convenient to fix for anyone who knows grammar, spelling, or basic facts of
the world, there are many, many articles that are readily apparent examples of
factual mistakes from more subtle causes such as edit wars

<http://en.wikipedia.org/wiki/Wikipedia:Lamest_edit_wars>

that often persist even after action of the Arbitration Committee.

[http://en.wikipedia.org/wiki/Wikipedia:Arbitration/Requests/...](http://en.wikipedia.org/wiki/Wikipedia:Arbitration/Requests/Case/Race_and_intelligence#Case_amendments)

A user who is knowledgeable about SOURCES and who has professional editing
experience and academic editing experience (as I have) is hardly likely to
find Wikipedia welcoming. Wikipedia's culture was set in the days when it was
the "encyclopedia anyone can edit," that is, the encyclopedia where anyone can
make something up, and then as Wikipedia grew an administrative apparatus,
administering was used more for turf battles and point-of-view pushing than it
was used for editing to encyclopedic quality or fact-checking.

I think a philanthropist with a budget the size of Wikipedia's budget over a
few years (a large amount of money, but pocket change for a billionaire) could
build a Wikipedia competitor that could do quite a lot better. Maybe no one
will ever think that that is worthwhile. For sure, many people who are well
aware of how incomplete Wikipedia still is regret the off-putting culture of
Wikipedia, and will be dissuaded from spending much volunteer time to improve
it.

AFTER EDIT: A grandchild comment noted that "One perk of competing with
Wikipedia is that you can use Wikipedia articles as a base." And indeed, if a
philanthropist's goal for a competing project were improving the quality of
online encyclopedias, licensing the competing project's articles in the same
way Wikipedia's are licensed would mean that good articles could work their
way back into Wikipedia--a very disruptive strategy indeed for improving
content.

~~~
unavoidable
I'm not sure if an alternative competitor is necessarily possible or easy.
Google tried an alternative model presumably to address issues with turf wars
and credibility with Google Knol, but that burned out pretty quickly even
though the model was worth pursuing (giving expert editors incentives to
create their own pages in an area they have specialized domain knowledge).

~~~
tdoggette
One perk of competing with Wikipedia is that you can use Wikipedia articles as
a base.

~~~
mjn
A perk but also a logistical challenge. In the small number of attempts I've
observed so far, a big problem is that if you start with a small community but
all of Wikipedia as your starting content, you immediately get overrun by
spammers, because your small community can't police editing of 2m+ articles,
and you also lack all the anti-vandalism "immune system" bot infrastructure
that Wikipedians currently run.

One way to sidestep it would be if your experiment is with a community
structure that's much more restrictive than Wikipedia's. If, for example, only
approved editors can edit articles, then you avoid most of the spam problem.
But it's not clear to me that insufficient barriers to editing are the main
problem with Wikipedia.

------
treelovinhippie
Simple: remove the "must be noteworthy" rule. I remember many years ago I was
heavily into the Diggnation podcast and a few others coming out of Revision3
and Leo Laporte's network. Anyway, somehow I did a search on Wikipedia and
found that these podcasts were not on there at all (despite having many
viewers).

So long story short, I created pages for these podcasts, filled them with
content, and a few days later they were removed by mods for not being
"noteworthy" enough. I attempted to plead with them that these podcasts were
in fact the most popular tech podcasts on the web, but the mods didn't care.

I haven't edited Wikipedia since.

~~~
britta
The notability guideline (<http://en.wikipedia.org/wiki/Wikipedia:Notability>)
is closely related to the verifiability policy
(<http://en.wikipedia.org/wiki/Wikipedia:Verifiability>) — from Wikipedia's
perspective, if a subject has not been covered by multiple reasonably reliable
secondary sources (in other words, if it isn't "notable"), we can't write a
reasonably verifiable article about that subject. Every article has to include
secondary sources as references, so that editors and readers can quickly fact-
check.

There are plenty of thoughtful discussions elsewhere about why these
notability and verifiability rules are flawed, but I wanted to point out this
connection because it shows that the notability guideline is possibly not as
arbitrary as it seems at first. I wish I could figure out a way to help
Wikipedia explain this better. Maybe "notability" is the wrong name for this
guideline.

~~~
saurik
> ...if a subject has not been covered by multiple reasonably reliable
> secondary sources (in other words, if it isn't "notable"), we can't write a
> reasonably verifiable article about that subject...

It is nearly impossible to verify an article based on secondary sources. To
take a non-Wikipedia view of this for a second, I went and pulled the
descriptions of these sources from academic institutions. Princeton's
reference desk describes a secondary source as follows. After that, a similar
description from the UCSC library.

> A secondary source interprets and analyzes primary sources. These sources
> are one or more steps removed from the event. Secondary sources may have
> pictures, quotes or graphics of primary sources in them.

> The function of these is to interpret primary sources, and so can be
> described as at least one step removed from the event or phenomenon under
> review. Secondary source materials, then, interpret, assign value to,
> conjecture upon, and draw conclusions about the events reported in primary
> sources.

(At this point I also feel it important to point out that both of these
sources believe that encyclopedias are "secondary sources". If you do a search
on Google for "tertiary source", which Wikipedia adamantly believes is a third
category that includes encyclopedias, you get only 43k hits, half of which
mention "wikipedia". The few universities that mention "tertiary source" list
encyclopedias as being in both categories: only Wikipedia seems to believe
that encyclopedias are clearly and definitively a "tertiary source".)

If you are interested in opinions and analysis, you can happily refer to a
secondary source, but if you want to verify what actually happened, you cannot
be taken seriously unless you can show a clear trail of evidence that
terminates in a primary source.

(Certainly, if it is impossible to obtain primary sources, then one can use a
secondary source, but that is something that should only be used as a last
resort: if you have access to the primary sources that a secondary source
used, you should verify them yourself.)

The result of attempting to find truth from secondary sources is that you will
forever be plagued by horrible bias, both in terms of what things you can find
to be "notable" and in terms of the validity of the opinions you can find in
them.

Worse, as encyclopedias--which today pretty much means "Wikipedia" to a very
large percentage of the public--are used as reference material for the
construction of secondary sources, everything from newspaper articles to
books, claiming secondary sources to have anything to do with "validation"
just makes your chain of evidence circular.

And, in fact, there have been multiple published large-scale examples of such
circular information ending up in Wikipedia, as temporarily unsourced
information gets used in newspaper articles, which then reinforce the
information in Wikipedia when editors attempt to find sources.

Regardless, the entire notion is kind of preposterous anyway, as most
secondary sources simply print what they are told by the people they
interview for articles _without the citations required to verify the ultimate
source of the information_.

This means that if someone, whether it be the Roth example from a few days ago
or anyone else wanting to provide a paper-trail for information on Wikipedia,
wants to be able to get something into Wikipedia, they really just need to be
in the position to tell a reporter the information: they do not need to post
it themselves directly.

As an interesting contextual example of this, Wikipedia now actually does have
an article on Diggnation. In this article, it states quite firmly that "there
are an estimated 250,000 regular subscribers to the show" along with a sourced
citation to... the New York Times.

However, the article in the New York Times quite clearly was entirely itself
sourced by simply asking the people who worked for Diggnation and Revision3 a
bunch of questions; it even quite clearly states that that number came from
them: "Revision3 says it counts roughly 250,000 views each week".

In such a situation, the way the New York Times operates (and I know this
first hand, as there have been articles about the things that I do published
by them) is that they will happily publish whatever number you tell them, and
at best "fact check" it by calling you back to verify they heard you
correctly.

This does not make the number true; it barely even demonstrates "Diggnation
was important enough to have an article written about them", due to the "slow
news day" phenomenon. The idea that this source is somehow different from any
other source is just a fantasy, and one that, as far as I've been able to
tell, only Wikipedia believes.

Wikipedia, in fact, seems to take the exact opposite stance on all of this,
claiming that "Wikipedia articles should be based on reliable, published
secondary sources and, to a lesser extent, on tertiary sources", along with a
long list of rules about how primary sources (which are apparently even more
dangerous in their world-view than _tertiary sources_ ) can technically be
used, but only in highly limited and nearly useless circumstances.

<http://en.wikipedia.org/wiki/Wikipedia:No_original_research>

The result of this insanity is then numerous situations like the one that
treelovinhippie ran into. Re-telling him what the rules of Wikipedia are--when
it is quite clear that he got to experience them first-hand and was
sufficiently unimpressed as to ask for their removal--seems to be missing the
point: for this purpose, the rule really is "arbitrary", and couldn't possibly
have anything to do with "verifiability"; if anything, Wikipedia's rules on
"notability" simultaneously prevent numerous topics from being covered _and_
cause the information that does get published to be based on shaky
foundations and unreliable sources.

~~~
ErrantX
_Worse, as encyclopedias--which today pretty much means "Wikipedia" to a very
large percentage of the public--are used as reference material for the
construction of secondary sources, everything from newspaper articles to
books, claiming secondary sources to have anything to do with "validation"
just makes your chain of evidence circular._

This is a phenomenon Wikipedians are aware of. But what you refer to here are
often tertiary sources, and not necessarily the sort of material you'd want to
use.

 _The result of attempting to find truth from secondary sources is that you
will forever be plagued by horrible bias, both in terms of what things you can
find to be "notable" and in terms of the validity of the opinions you can find
in them._

You're arguing to swap that bias for your own? i.e. you look at primary source
material and decide which items are notable for their own article. What about
if another editor disagrees?

In academia secondary sources are an established way of processing primary
information; an expert reviews the primary material and, citing it, draws
conclusions. Other experts may do the same, and disagree with the first one.

Wikipedia is a tertiary source, which draws on secondary and, yes, primary
material to summarise the current status of a topic.

 _However, the article in the New York Times quite clearly was entirely itself
sourced by simply asking the people who worked for Diggnation and Revision3 a
bunch of questions; it even quite clearly states that that number came from
them: "Revision3 says it counts roughly 250,000 views each week"._

 _In such a situation, the way the New York Times operates (and I know this
first hand, as there have been articles about the things that I do published
by them) is that they will happily publish whatever number you tell them, and
at best "fact check" it by calling you back to verify they heard you
correctly._

This _is_ a problem, and one with no easy solution. The approach of requiring
sources to have their own review process (i.e. media has editors who
theoretically vet content) should fix this, but as you point out, in practice
it is not 100% reliable.

~~~
saurik
> You're arguing to swap that bias for your own? i.e. you look at primary
> source material and decide which items are notable for their own article.
> What about if another editor disagrees?

treelovinhippie's stated opinion was "remove the "must be noteworthy" rule"
(from which I am mentally rewriting his statement to "notability"). I
personally agree with that, as I do not believe that the notability
requirement is leading to increased article trustworthiness.

I would rather see the idea of "notability" replaced with an "article score"
based upon the history: there have been a few glorious visualizers for
Wikipedia designed to bring this information front and center, and I think any
of them would be a better global solution than "notability".

> In academia secondary sources are an established way of processing primary
> information; an expert reviews the primary material and, citing it, draws
> conclusions. Other experts may do the same, and disagree with the first one.

I imagine this is determined heavily by the field? I can't imagine in either
mathematics or algorithmic computer science there being any need to wait until
a review paper is published "drawing conclusions" (whatever that would mean)
from the information in the primary research.

However, I honestly feel like you are dragging me down an academic rathole in
these very specific cases, such as medicine: the kind of material that we are
explicitly discussing here, Diggnation, is not going to be covered in anything
remotely approaching a scholarly source.

Instead, what we are comparing are situations like 1) referencing a New York
Times article stating that "person Y said X" and 2) directly linking to the
blog or Twitter feed of person Y and demonstrating first-hand that they said
"X": in these situations, removing a layer of required trust.

In my experience discussing situations like this with Wikipedia editors, in
addition to my direct attempts to read through the rules that Wikipedia
publishes for how their website should be used, it is quite clear that
Wikipedia would prefer you to cite the NYT article rather than hotlink the
information.

> This is a problem, and one with no easy solution. The approach of requiring
> sources to have their own review process (i.e. media has editors who
> theoretically vet content) should fix this, but as you point out, in
> practice it is not 100% reliable.

So, Wikipedia specifically states that "mainstream newspapers" are an example
of "the most reliable sources". Can you provide me a reference to where
Wikipedia is claiming that you can only depend on sources that "have their own
review process"? (Or, was that just a potential idea for a fix?)

Personally, in this situation, I would much prefer a reality where Wikipedia
simply admits that secondary sources (in this field; again: not medicine, and
possibly not even history) are nothing more than opinions (in the general
case, this is how Wikipedia defines secondary sources: opinions and
interpretation).

If they did this, then they couldn't cite a New York Times article for a
sentence that simply stated a fact: they could only cite a New York Times
article and state "the New York Times states", for which the New York Times
should be a perfectly valid (and I will argue "authoritative") reference.

Yes: in this case that would probably make it impossible to just state
"Diggnation has 250,000 subscribers" without some kind of in-article
qualification; however, I don't think that that is inherently a bad thing: I
feel like if I were writing a scholarly article on this subject, I wouldn't be
able to say anything definitive either.

~~~
ErrantX
_I personally agree with that, as I do not believe that the notability
requirement is leading to increased article trustworthiness._

I struggle to follow this because Wikipedia doesn't consider "notability" to
relate to article trustworthiness. It's instead intended to act as a soft
margin for the topics that deserve a standalone article (remember,
noteworthiness is _only_ related to whether an article should exist or not).

 _I would rather see the idea of "notability" replaced with an "article score"
based upon the history_

Interesting idea, and I'd like to hear more of this idea! However, from this I
suspect you are mixing up the reliability of an article and the specific
phrase "notability" (about how deserving a topic is to have an article). When
deciding if an article is notable it will usually have little or no history!

 _I imagine this is determined heavily by the field?_

Yes, a point Wikipedia admits.

 _I can't imagine in either mathematics or algorithmic computer science there
being any need to wait until a review paper is published "drawing conclusions"
(whatever that would mean) from the information in the primary research._

Then you would, I am afraid, be wrong :) In such a field many papers are
published contending a theory, and then other experts review the contention
and submit responses/reviews/criticism.

 _the kind of material that we are explicitly discussing here, Diggnation, is
not going to be covered in anything remotely approaching a scholarly source._

Yes, which is the key to the problem.

 _Can you provide me a reference to where Wikipedia is claiming that you can
only depend on sources that "have their own review process"?_

Yes, the core sourcing policy requires a "reliable source"
(<http://en.wikipedia.org/wiki/Wikipedia:RS>). "Articles should be based on
reliable, third-party, published sources with a reputation for fact-checking
and accuracy".
[http://en.wikipedia.org/wiki/Wikipedia:SOURCES#Reliable_sour...](http://en.wikipedia.org/wiki/Wikipedia:SOURCES#Reliable_sources)
goes into more detail.

 _Instead, what we are comparing are situations like 1) referencing a New York
Times article stating that "person Y said X" and 2) directly linking to the
blog or Twitter feed of person Y and demonstrating first-hand that they said
"X": in these situations, removing a layer of required trust._

Is it? Now I agree that if the NYT has simply asked Diggnation for those
figures then Wikipedia should note that is where it comes from. However, citing
it to the NYT is intended to demonstrate that someone other than the editor
inserting the material trusts its veracity. It's difficult because we simply do
not know where the NYT got that info: if Diggnation showed them some real figures
(say, a screenshot) then the NYT article _is_ better than a tweet!

Which is why this is not a simple problem.

~~~
saurik
> I personally agree with that, as I do not believe that the notability
> requirement is leading to increased article trustworthiness.

> Interesting idea, and I'd like to hear more of this idea! However, from this
> I suspect you are mixing up the reliability of an article and the specific
> phrase "notability" (about how deserving a topic is to have an article).
> When deciding if an article is notable it will usually have little or no
> history!

The reason I am "mixing this up" is that the only argument I have ever heard
for why Wikipedia needs a notability policy at all is the one that was cited
by britta earlier in this thread: that without such a clause there would be
tons of articles that are hardly ever looked at by anyone, hardly ever edited
by anyone, and containing information that is difficult to verify. I believe
that there are numerous better solutions to this than attempting to use the
"notability" filter, as I maintain that "notability" does not actually lead to
"veracity".

To be very clear about this, I will repeat the context from britta that
started my involvement in this discussion:

> The notability guideline
> (<http://en.wikipedia.org/wiki/Wikipedia:Notability>) is closely related to
> the verifiability policy
> (<http://en.wikipedia.org/wiki/Wikipedia:Verifiability>) — from Wikipedia's
> perspective, if a subject has not been covered by multiple reasonably reliable
> secondary sources (in other words, if it isn't "notable"), we can't write a
> reasonably verifiable article about that subject. Every article has to include
> secondary sources as references, so that editors and readers can quickly fact-
> check.

Going back to your response:

> It's difficult because we simply do not know where NYT got that info: if
> Diggnation showed them some real figures (say, a screenshot) then the NYT
> article is better than a tweet!

I must apologize here, as I was intending to be using a different example, but
you took me to mean the Diggnation example: this is my fault, I should have
been more clear.

What was coming to mind with relation to the "Twitter post" example, is that
there is a ton of journalism on topics I directly care about that is based on
the Twitter feeds of people I work with: a lot of "person X said Y", which is
then translated into some article on what is or is not possible with a tool
that person X builds. The fact that a reporter read that statement and
repeated it doesn't make it more true, and in fact the opposite is quite
common: they are paying sufficiently little attention and have sufficiently
little background knowledge that they repeat it wrong.

Please understand: I am not talking about a situation of "interpretation" or
"research", but more like establishing dates on when things happened... I
understand that this feels pretty blurry (but it seems equally silly to go
into a detailed example of something where my example is itself biased; I am
happier sticking with the examples such as Diggnation and RSA); it is simply a
situation where "that dude at Wired that finds this stuff gets him readers" is
not somehow more trustworthy than the place he got the information from...
either the information shouldn't be published at all (I believe this is
probably a quite reasonable course of action), or "that dude at Wired" should
be skipped and the original source should be used.

That said, part of your comment doesn't rely on that misunderstanding I
caused: it is true that we don't know where the NYT got that information,
however that uncertainty really doesn't make it any more true; while it means
there is some possibility that the information was gathered in a way that we
should indirectly trust, as we can't see it we can't verify it, and it is
honestly not in any way better than if some random person on Wikipedia just
asserted it to us... reporters for major publications (such as the New York
Times and Washington Post, both of which I have first-party experience with
due to articles written about my work) really do trust that you are a reliable
source on things that you control, and really do attempt to fact check by
calling you back for verification.

~~~
ErrantX
Ok, fair enough. Britta touches on some of the issues, but not in its
entirety. Notability is about a) requiring there be at least one reliable
third party source (so that the article _has a chance_ of containing
verifiable information) and b) ensuring that there is some limit on the scope
of Wikipedia. It is this latter one that is the key facet.

Whilst Notability _is_ closely related to _Verifiability_, it is not quite in
the way britta cast it, but rather in requiring that the material used to
deem a subject notable (i.e. a significant claim to importance) be
verifiable. i.e. the relationship works the opposite way.

 _I was intending to be using a different example, but you took me to mean the
Diggnation example: this is my fault, I should have been more clear_

Ah, my apologies, I'm reading quickly as it is a busy day.

Please don't get me wrong; the issue you highlight is a major problem, one I
have raised a few times internally with the community. But there has been no
easy resolution.

It's worth noting that the reliable sources policy explicitly notes that
reliability hinges on not only the publisher but also the content and the
author. If an author is seen to lack the qualifications, or has a bad
reputation, these factor into consideration.

With that said, a lot of Wikipedians _don't know this_. It's a problem I run
into constantly when discussing sources ("Well, it was published by the NYT so
it doesn't matter what their reputation is"). It's not the policy at fault
there, but the lack of interest of our own community in the rules...

 _"that dude at Wired that finds this stuff gets him readers" is not somehow
more trustworthy than the place he got the information from..._

The _intent_ of the policies (and bear in mind what I say above as to how much
that holds up...) is that the secondary source is used to filter what in the
primary material is considered important to the wider community. To take an
example: when Microsoft released Windows 8 there was quite an extensive list
of new features. Simply recording that isn't what Wikipedia aims to do,
instead you would use secondary sources to highlight the new features that
were considered by "experts" to be important, groundbreaking or otherwise
worth a comment (of course, the full feature list would be linked to as well).

I'm not arguing this policy is perfect, nor that it doesn't break down in the
scenario you cite, but it does have a solid basis.

One other policy is that Wikipedia does not have firm rules (for this very
reason), and so you could say that making a convincing argument such as you
have should keep the material out. In principle this works; in practice it
doesn't, but only because of the community dynamics (a whole _other_ problem!).

 _while it means there is some possibility that the information was gathered
in a way that we should indirectly trust, as we can't see it we can't verify
it, and it is honestly not in any way better than if some random person on
Wikipedia just asserted it to us..._

To an extent it does. Because the reporter who you cite has his/her real name
attached to the article and has a public reputation to uphold.

~~~
saurik
> ...ensuring that there is some limit on the scope of Wikipedia. It is this
> latter one that is the key facet.

 _Right_... but in a world where I can store the entirety of Wikipedia on my
mobile phone, you have to transitively ask why there is a need for a "limit on
the scope of Wikipedia". I see no a priori reason why Wikipedia needs or even
should tolerate such limits, so one must examine the arguments used to defend
that policy.

So far, the only reasonable arguments I have heard (as in, discounting
technology problems that never existed: you can easily scale Wikipedia to have
a bunch of mostly-ignored articles) come down to "verifiability" through the
argument path I elaborated (and which britta seeded), and that is precisely
the path used by people defending "deletionism" on behalf of Wikipedia
editors.

~~~
ErrantX
There are a number of good arguments.

Where does the scope of Wikipedia end? Should there be an article about
"saurik"?

How do you actively police articles for e.g. defamation (note, we _already_
struggle to handle this problem and it is getting worse)?

How do you stop spam?

I like to come at this argument from the opposite direction: what need is
there for Wikipedia to give an article to every single trivial thing? Is what
the president had for breakfast in 2011 sufficiently interesting to the
reader?

Wikipedia is not a dump of knowledge, it is supposed to be a curated summary
of the sum of human knowledge. And as with an article where you make editorial
decisions about the level of detail to go into, so the entire Wiki is scoped
to a reasonable level of detail.

------
gbog
In the first comment on the linked page, a guy says rightly that the number of
missing articles is still enormous, but in parts of knowledge that are further
from that of the median Wikipedian.

In a comment here someone says the foundation has the goal of finding and
proposing interesting things for new editors to write on Wikipedia.

The sum of these is the interesting problem of crossing culture barriers. How
do you get the regular geek interested in the many kings and warlords of
China's Warring States?

------
davidw
Am I the only one who has memories of 'Raid on Bungeling Bay' triggered by the
words 'nearing completion'?

<http://en.wikipedia.org/wiki/Raid_on_Bungeling_Bay>

"Battleship nearing completion!" or something to that effect.

------
pixelcort
One big opportunity is the differences in content between different language
editions of Wikipedia. In many cases the content for an article varies widely
between languages, and I still run across articles that are not yet available
in English.

------
anuraj
Very myopic view restricted only to the English language. There are more than
100 major languages in the world. So now Wikipedia has to concentrate on other
languages.

------
pebb
Yup, from now on all edits done by outsiders shall be reverted with extreme
prejudice.

