The two-sides of Wikipedia (currybet.net)
31 points by jancona on June 8, 2010 | 29 comments



This guy is upset that Wikipedia deleted an article about a Swedish indie-rock band that appears to have been covered in depth by one alt-weekly in Vancouver and nobody else --- an article, incidentally, that was posted and deleted 3 times before that alt-weekly one-pager was published.

I don't know how we benefit from rehashing this over and over again, and since this is the #1 issue people appear to have with Wikipedia (and the issue most directly and frequently addressed by Wikipedia's voluminous site guidelines), I think it's fair to point that out. But, once again, with feeling:

Wikipedia is not an effort to organize all the world's information. That's Google. Wikipedia is an effort to build the world's best encyclopedia. The difference between an "encyclopedia" and "all the world's information" is that the information in an encyclopedia needs to be reliable. To ensure that the information in the encyclopedia has a chance at being reliable, the encyclopedia is constrained to information that can be written about notable topics covered in reliable secondary sources.

Virtually everybody who writes an article about a nonnotable topic ends up objecting, often loudly, when the article is deleted. That's understandable. Wikipedia could do a better job of warning people of the bar their topic needs to clear. But they can't make resurrection of deleted articles trivial to anybody, or they will spend all their time re-litigating deletions.

This is not helped by the fact that WP articles gain extremely favorable search engine position almost by default.

People seem to have a really hard time with the idea that Wikipedia imposes restrictions in order to make the lives of editors and maintainers easier. But --- and I say this as a WP-skeptic --- the community effort that built WP is monumental and unprecedented. They get the benefit of the doubt.

Regardless, the likelihood that the particular "speedy deletion" policy this article complains about will ever be resolved is epsilon. Speedy deletion, particularly of no-name bands, vanity books, websites, and tiny companies is almost the first line of defense against article-creep. Changing the policy would be an existential change to the way WP is managed.

Which doesn't matter, because you can resurrect speedy'd articles already; you just need to take the article to Deletion Review and make a case for it. Maybe WP needs an article on First Aid Kit. I like Fleet Foxes, too! (WP has excellent coverage of Fleet Foxes). But WP is run by human beings donating their time, and people make mistakes, and it is utterly disingenuous to pretend like First Aid Kit is an obvious "keep".


While I agree with you 100% about the tired rehashing of notability guidelines every time someone has their pet article deleted, I do want to nitpick one issue.

You write: "Wikipedia is not an effort to organize all the world's information. That's Google. Wikipedia is an effort to build the world's best encyclopedia."

Unfortunately, some of Wikipedia's own marketing has fostered the incorrect impression that they are open to any and all possible article topics. For example, in a press release celebrating their millionth article they reproduce one of Jimbo's quotes:

Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That's what we're doing. -- Jimmy Wales

Jimbo should know better. The guideline "What Wikipedia Is Not" is actually quite detailed and informative, but people rarely read that or the notability guideline before posting long rants decrying Wikipedia as run by a cabal of retrograde deletionists.

http://en.wikipedia.org/wiki/Wikipedia:NOT

http://en.wikipedia.org/wiki/Wikipedia:Notability


The problem I always had (as a WP editor with around 800 main space edits) was that the vast majority of editors applied those policies only when it suited them.

I recall several occasions where edit wars erupted because I cut a 150 word, non-notable, unsourced piece from, say, a Star Wars article.


Whoa! Are you saying Wikipedia is a project involving people!?


Virtually everybody who writes an article about a nonnotable topic ends up objecting, often loudly, when the article is deleted.

For further comments on this issue, I'd like anyone who has posted an article that has later been deleted to tell us all about the sources that were posted with that article.

http://en.wikipedia.org/wiki/Wikipedia:Verifiability

I wonder what sources people are finding before they post new articles.


Wikipedia does not even aim to be the best encyclopedia, just an encyclopedia.

I've not read the actual article, as it seems that the host can't cope with the traffic, but if it was speedily deleted, it probably deserved it - most speedy deletions are genuine and helpful. However, as an experienced wikipedia editor, I have to say that you are wrong.

A few years ago, I built up over 500 mainspace edits and received numerous barnstars. I worked to correct minor errors, source references for articles, rewrite articles to flow better, and even to reduce tensions amongst editors and promote the editing of wikipedia by experts. But in the end, the corrosive and bitter nature of the community drove me away, much as it drives away a lot of people who are both knowledgeable in their field and able to write about it.

"People seem to have a really hard time with the idea that Wikipedia imposes restrictions in order to make the lives of editors and maintainers easier."

This idea is trivial to disprove. Firstly, the majority of articles do not need constant attention, and the number of articles is not directly proportional to the level of vandalism. The majority of editors focus on a few articles, and certain high-interest topics get the most attention. Very few editors actually go and work on random articles, so stating that there is any sort of diversion of energy by the presence of minority interest articles is false. If anything, removing minority interest articles kicks away the editors that supported those articles, editors who might otherwise have contributed to vandalism patrolling and other more widely helpful activities. Article-creep is an entirely imagined threat that simply does not exist.

In fact, a lot of articles nominated for deletion for not being notable have been fairly stable, unvandalised, verifiable, and even maintained by their creators and related community. No one had an issue with the presence of these articles beyond not being notable enough - as if a purely electronic encyclopedia had some pressing page limit about to be reached.

You mention secondary sources, yet wikipedia editors as a whole can't understand how to use primary, secondary and tertiary sources correctly. It is a widely held (and enforced) belief that, when wanting to state that something was said in a book, only a secondary source can be cited. So they would prefer a newspaper's review to the actual book itself, for example. This is, for most cases, simply absurd.

Let's consider the deletion process. Officially, Articles For Deletion is about consensus. In practice, it's typically just a vote by the in-crowd, most of whom never explain their opinion beyond a brief comment. It doesn't matter how much you source an article during this process - it will normally get culled anyhow if they have taken a dislike. Indeed, there is a distinct dislike of trying to improve articles - instead you should just nominate them for deletion (never mind the fact that a quick google search can often give useful references to shore up an article)!

I could go on, but I'd be here all night.


As an administrator in my sixth year now, I thought I'd weigh in on one of the issues you brought up.

It is a widely held (and enforced) belief that, when wanting to state that something was said in a book, only a secondary source can be cited. So they would prefer a newspaper's review to the actual book itself, for example. This is, for most cases, simply absurd.

I think you're right, and I think that type of behavior is a misinterpretation. There's obviously no need to cite a secondary source about what a book literally says. In fact, in this case, rather than citing the primary source and paraphrasing, the best thing would be to directly quote the book.

The problem arises when "stat[ing] that something was said in a book" involves an implicit judgment, synthesis, interpretation, or analysis of the book. In this case, although you are trying your best to neutrally paraphrase a book's content, you are in some way introducing your own judgment into the process.

In most cases if the editor is acting in good faith then there is no big problem here. But on controversial subjects or articles working for featured status, it is better to avoid the possibility of "original research" (for those unfamiliar with this Wikipedia policy: http://en.wikipedia.org/wiki/Wikipedia:OR). Putting an editor's judgment about a book is original research, but putting a verifiable and attributed judgment from a secondary source isn't—secondary sources represent the 'state of the debate' about an article's topic. It's the job of tertiary sources (i.e. encyclopedias) to represent the state of the debate, not to write original assessments of the topic.

I agree with you that too strict an enforcement of this rule ends in an absurd, frustrating, and dispiriting situation for all involved. But on controversial articles there is often a good reason to insist on verifiable secondary sources.


Obviously, I agree with you on this.

As I said in reply to another comment, I believe that the reason for this absurdity is probably because it's the lowest common denominator solution.

I can't remember if we crossed paths when I was active: http://en.wikipedia.org/wiki/User:LinaMishima


Is a Wikipedia user allowed, in an article, to quote any arbitrary text that exists in some permanent location (with an attribution and datestamp), as long as it's not Wikipedia? Even if they themselves wrote it? Couldn't there just be developed a bookmarklet for Wikipedia edits that takes a text selection, posts it to the user's personal blog, and then turns the selection into a quote from the blog? What would be the difference?


The short answer is yes. In general, any editor is allowed to quote/cite relevant text, even if that text comes from a blog (regardless of who wrote the blog).

The long answer is that if the article's content is disputed such a scheme will probably not last very long under scrutiny. The personal blog of Paul Krugman or Bruce Schneier would obviously carry more weight than text from John Doe's blog created last week. Further, the published books or journal articles of scholars would obviously take precedence over blog posts.

So, while citing things is always good, you can't really count on citing Encyclopedia Dramatica entries to keep your additions from being deleted/reverted/edited/removed. The authority of the citation matters when your edit is disputed—simply slapping a footnote+link on your text isn't a magical shield.


> The authority of the citation matters when your edit is disputed

It'd be nice if there were some empirical rule for determining link authority—it would probably stop a lot of pointless debating if there were some evidence on the table either way. For example, what if every Internet-attribution only "proved" notability to the weight of its calculated PageRank at the time of inclusion? Then you could automatically discard attributions that user X gave for a link to user X's blog, but if Schneier decided to pull some Wiki content onto his blog and quote it, that would be notable, both under PageRank and the spirit of the rule.
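To make the idea concrete, here is a rough sketch of what such a rule could look like. Everything in it is made up for illustration: the `Citation` fields, the `authority` score (standing in for a PageRank-like value recorded when the link was added), and the threshold are all assumptions, not anything Wikipedia or Google actually exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One Internet attribution attached to an article (hypothetical model)."""
    url: str
    source_owner: str  # who controls the cited page (assumed to be known)
    added_by: str      # the Wikipedia user who added the citation
    authority: float   # PageRank-like score in [0, 1], recorded at time of inclusion


def notability_weight(citations):
    """Sum the authority scores, discarding links a user gave to their own site."""
    return sum(c.authority for c in citations if c.source_owner != c.added_by)


def is_notable(citations, threshold=1.0):
    """Treat the topic as notable once the combined weight reaches the threshold."""
    return notability_weight(citations) >= threshold
```

Under this sketch, user X citing user X's own blog contributes nothing regardless of its score, while the same text quoted on a high-authority third-party blog counts at that blog's weight. Determining who actually controls a page is the hard part and is simply assumed here.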


Couldn't there just be developed a bookmarklet for Wikipedia edits that takes a text selection, posts it to the user's personal blog, and then turns the selection into a quote from the blog? What would be the difference?

Blogs wouldn't be the sources I would want to see more of in Wikipedia. Dead trees, because of their expense, still prompt more fact-checking and editing.

I'm an editor of an article that is before the Wikipedia Arbitration Committee just now. (I happened to wander by, as a newbie Wikipedian, just as the article went from failed mediation to arbitration.) There is enough making stuff up already on Wikipedia. The last thing it needs is more ease in using blog posts (generally very unreliable) as sources.


With the amusing result that Wikipedia will cite newspaper articles that cite information from blogs, rather than directly citing the blog.

(Happened to me and my blog. And as a data point, out of some hundreds of newspaper etc. articles about a blog post, the total number that seemed to involve any kind of value-added fact checking was 2.)


I have seen cases where news articles are written using the wikipedia article without citing it, and then the information in the original wikipedia article is cited using the news article that got the information from wikipedia in the first place.


I didn't mean to suggest that it would be a good thing :) That's the point—it's a parasitic corner-case of the rules, and should probably be patched up somehow.


I don't think you and I have that much to argue about. I found the community corrosive, competitive, political, hidebound, and litigious as well. I know it sounds like I'm sticking up for WP. I'm just addressing a misperception people clearly have about it.

That said, I disagree with your view of how the encyclopedia works.

I, too, spent time a couple years ago accumulating mainspace edits, rewriting articles, and spending time on mop-and-broom tasks. Here's a link: http://j.mp/d0Ozh9. I left for similar reasons as you (my experiment with WP ended and I decided it was a failure for me.)

Article-creep is not an entirely imagined threat, as anyone who spends any time in AfD (where articles are nominated for "non"-speedy deletion) will attest. Articles are created for no-name bands with specious claims about regional influence. Articles are created for books that have not been written. Articles are created in constellations to document people's private role playing games. Articles are created to defame other people. Articles are created to copy company press releases onto Wikipedia.

All these things happen every day, with tens of articles going through the slow-path of AfD daily, each of which has to be scrutinized by 3-10 editors and adjudicated by an admin before it can be scrubbed off the site.

Indeed, it's intuitively obvious to me why there must be article creep: articles in Wikipedia almost invariably appear at the top of Google SERPs, which is something that people pay significant amounts of money to achieve without Wikipedia. Also, people cite Wikipedia coverage in press releases and bios as evidence of notability. There's tremendous incentive for abuse.

The harm this causes to the encyclopedia is straightforward. A subject that has not been covered in a reliable secondary source can't possibly be verifiable without relying on hearsay from editors. There's no mechanism in WP to distinguish between articles with "real" sourcing and articles that can be sourced only to some company's press release.


I see what you're saying, and broadly, I agree with you.

I think we have different definitions of article creep. The things you talk about are real problems. The problem for me is that when I have encountered the term "article creep", it has mostly been with respect to articles about episodes from highly-notable television series. Really, we have two different types of article creep:

Firstly, we have the pure vanity cruft, new articles created without any references which simply are impossible to verify, or have been created by sources too close to the origin. These genuinely do pollute wikipedia, since they typically do not attract new editors, and rarely get linked to (both internal and external to wikipedia).

Secondly, we have genuine quality writing that is well-researched, clearly verifiable, useful to a wide audience, and can be well-linked. This second category attracts editors, maintains itself, and is generally useful. Yet, for various reasons these can get accused of ruining wikipedia.

Part of the issue is that most wikipedia editors have been untrained, and seemingly have not even had and understood something akin to GCSE history. A significant number joined to promote their point of view, rather than to develop a quality resource. The easiest solution to the problem, given this, is as you support. The best, however, is a different matter.


Shouldn't I be able to counter this argument by asking you to cite a topic that Wikipedia covers poorly as a result of these policies?


Practically any notable serial format fictional work, for a start. The subject as a whole gets reasonable coverage, but the coverage of the material within, arguably the source of the notability, has typically been removed wholesale from wikipedia. The inclusion of this material could be said to be a matter of taste; however, I have yet to see anyone counter its inclusion with arguments which cannot be disproven.

A large number of topics about anything not angloamerican. I've seen before now the members of the national sports teams for middle eastern countries put up for deletion as "not notable".

Anything, absolutely anything, that is best verified via physical books and records. These inevitably end up declared as not verified, or non notable (since the internet doesn't know much about it).

Given it's very late over here, I'm not going to go and look up the rest of the list. But as a counter counter-argument, can you cite a topic where wikipedia's coverage is significantly enhanced as a result of these policies?


Yes. Companies. There are hundreds and hundreds of tiny technology companies created every year, and Wikipedia only covers the ones that have managed to get written about somewhere else. Wikipedia's business coverage would be a "who's who" mess of press releases without these policies.

Another one: industry jargon. When I was on AfD patrol, I was constantly having to knock back random three letter acronyms people had invented to promote or position their companies. If you find industry jargon in WP, there is at least a decent chance that three people in the world have spontaneously used the term before.

Another one: open source software. Obviously it would be bad if Wikipedia was Freshmeat, but I'll give you an even more concrete example: I helped maintain a couple pages that summarized a bunch of different software solutions (such as "Comparison of DNS Servers"). Those pages were dumping grounds for everyone's 0.01-alpha releases, each of which usually came with added columns on the page's table that were a "yes" for that NN piece of code and a "no" for everything else.

Note that in the latter two cases, NN articles are creating worse problems than simply littering the WP and abusing it for company promotion; they're also tainting WP's coverage of topics, confusing lay readers.

It is definitely a WP problem that "tracking down reliable sources" devolves to "search NEWS.GOOGLE.COM". On the other hand, the authors of pages can cite books in AfDs, and after doing almost 100 AfDs, I never saw such a citation get knocked down.


Wow, article constellations for private RPGs.. Makes me wonder to what extent this happens because of the widespread misconception that there is a singleton "the Wiki" (that one at the top of "the Google"). Do people add this stuff to Wikipedia because they don't understand they can have their own wikis?


They put it there because they want people to read it, and nobody is going to read it if they put it on their private wiki.


Ahead of vandalism this is the biggest problem Wikipedia faces. It’s not at all easy to decide, given limited resources, what is worthy of inclusion. Consequently that selection process turned into this huge monster in the German Wikipedia, with meters of rules and guidelines [1]. The battles fought to keep or delete an article are epic and entertaining but also the single biggest conflict in the community.

It’s certainly not easy to decide what can be maintained and what cannot. The English Wikipedia is quite a bit more inclusive than the German one and when I want to find something obscure I tend to search the English one first. My perception is also that in general, the German Wikipedia has higher quality or more complete articles, but that’s something I could just be imagining.

I certainly don’t know who picked the right compromise or if anyone did.

[1] http://de.wikipedia.org/wiki/Wikipedia%3ARK


WP doesn't (officially) want to decide what's "worthy" and "unworthy" of inclusion. Instead, it has rules about what's feasible to include, if the encyclopedia is to be reliable. It is infeasible to include subjects for which there are no reliable secondary sources that an unfamiliar reader could use to verify the content in the encyclopedia.

This is why it's OK for WP to have extensive coverage of Pokemon characters (extensive secondary sourcing), and not OK for WP to have coverage about an up-and-coming local band.


That’s, as far as I know, not how it works in the German Wikipedia. No extensive coverage of Pokemon there. They very much are willing to decide what’s worthy and what’s unworthy, based on the amount of work that can be done. I have no problem with that.


The Lady Gaga article was also deleted once (maybe more) in the beginning (in 2008), if that's saying something.

( http://deletionpedia.dbatley.com/w/index.php?title=Lady_gaga... )


!#@!@^% deletionists are ruining Wikipedia. They’ll be the first against the wall when the revolution comes.


The second to the wall will be people who think Wikipedia articles are worth killing for.


Don't panic, the revolution will be peaceful and well-read.

Perhaps we'll just scare the deletionists off to their own planet with (WP:V!) stories of a giant mutant space goat.



