Craig Newmark donates $500k to reduce harassment on Wikipedia (wikimedia.org)
152 points by The_ed17 on Jan 26, 2017 | 143 comments

I started editing Wikipedia in 2003. It was fun. Over the years it became less fun, I gave up on participating and quite a few articles I had started ended up being deleted.

I also often found myself looking for articles I knew had been on Wikipedia, but had been deleted.

In December 2013 I had enough of the deletionism. I spent a few hours to set up http://deletionpedia.org/ - to rescue articles from deletion.

It doesn't deal with harassment, but it's a useful resource if you want to recover something that was deleted from Wikipedia.

(The site had been set up before, but the original creator let it slip.)

I started updating pages related to my research topic - thinking it would be a public service. The pages were egregiously out of date. Every attempt to edit was reverted by a resident page troll. Total waste of my time and the pages remain totally wrong.

(Yes, I cited research, and no, the topic is not a controversial political one.)

I stopped wasting my time for a similar reason. I added some information to a page and it was reverted because I didn't properly source my change. OK, I re-created my change with a properly sourced reference, and someone who didn't like it reverted it, citing the guidelines on BLP. I made the change again and, in the note, quoted the BLP guideline, explaining that my change was consistent with the rules. My change was reverted again and I received an email warning about a possible suspension of my account for defacing pages...

With that, I was like "F you people". I don't think I have added any information since then.

Did you get on the IRC channel? I edit wikipedia and normally the only way to figure out why something should have been reset is to ask.

Personally, I've edited plenty of Wikipedia pages and I still occasionally do when I come across one I think I can help with, and I only have an issue around 20% of the time, and usually it's because I'm filling out a topic that's too obscure.

Ditto on the official trolls, so I've quit too. One troll had a degree, at least, but he wasn't reading current research. Participation continues to decline. If your interest is very narrow and you are IRC-wise and persistent, I don't doubt that you can eventually intimidate the trolls in that narrow topic and not have a problem thereafter. But this isn't my case or the common case, I suspect.

In hindsight, this is deliciously ironic advice considering the noteworthy cesspool of Dreg's creation. In 2012 I ventured into the Craigslist chatroom to determine why a posted item I had listed was 'ghosted'. I believe the answer was too many numerals in my description(RV: model year, mileage, doors, beds, etc.). After my attempt to satisfy the ambiguous criterion to no effect I returned and cited numerous examples other than mine that exceeded the reason given but were not ghosted. The result was every ad I had listed was then ghosted and every relist and new creation was ghosted by default. I ended up creating a new disposable, copy+pasted new listings verbatim without incident and sold most items. Listing a recently deceased family member's possessions was a chore, reaching out to Dregslist chat turned it into a nightmare.

Why isn't Dreg using these funds to clean up harassment, data farmers and scammers on 'his' site?

edit: on-topic, I too had edits on WP overturned in the mid-00's. I posted my defense in page notes to no avail and desisted any further attempts.

Why should one have to work out how to use irc to help out in the public interest?

What are some examples of articles you've written that you feel definitely shouldn't have been deleted?

A friend was a Commissioner of Public Works and prolific writer who authored or expanded stub articles about infrastructure (reservoirs, dams, etc).

He attracted the ire of some wackjob when he referenced printed materials. Given his somewhat unique position, he had some stuff digitized and posted, and returned to Wikipedia a few months later to find that almost everything he did was reverted.

I'm glad that people put up with the nonsense and contribute to Wikipedia... but what a shit experience.

> He attracted the ire of some wackjob when he referenced printed materials. Given his somewhat unique position, he had some stuff digitized and posted, ...

Were these published documents, which reviewers could independently obtain and verify?

> Were these published documents, which reviewers could independently obtain and verify?

That doesn't stop some of the people at WP from reverting everything. Which means OP's friend is left to either leave it reverted, or trawl through the various arcane dispute resolution / meta pages, arguing their case, building consensus, to eventually get people saying it should be left in. Or saying it should be left out, because those meta pages sometimes feel as random as tossing a coin.

Unsurprisingly this optimizes for people who tolerate vast amounts of meta bullshit, and not people who know what they're talking about and know what the good sources are.

There shouldn't be any need to build consensus – the behaviour described by the OP is plainly against Wikipedia policy.[1] That's not to say that it didn't happen, but it's important to be clear about what caused the editor's bad experience. Sometimes it's rogue editors and sometimes it's overly pedantic enforcement of policy, but edits are often reverted for good reasons.

[1] https://en.wikipedia.org/wiki/Wikipedia:Offline_sources

I get that.

But facts ("Here's the catalog number in the New York Public Library, and how to request it via interlibrary loan.") shouldn't require consensus.

That's true, but on Wikipedia they do.

Especially for cases like that - if someone has more 'sway' on Wikipedia than you, they can (and will often) just say something like "Thanks for the source - I'll verify and if it says what you think it says, I'll add it in to the article." Then do nothing, ever again. In fact, unreliable citations, or citations that don't actually say what the citer thinks they say, that can be easily checked online are far more acceptable on Wikipedia than citing a book.

Inertia like this leaves useless pages like "Oplomachi" (https://en.wikipedia.org/wiki/Oplomachi) live way after they should have been merged, as well as leading to the deletion of useful pages/sections/references.

The major articles, which are more likely to get attention, tend to be less dysfunctional than niche interests. That said, there are of course far more articles relating to niche interests than there are major articles.

It’s not about someone having more “sway”. It’s about someone having more free time to waste on an edit war.

Oh, believe me, there is sway - there is a hidden hierarchy. If you think you are equal in an edit war, and only need to be persistent, you're wrong. Can you lock an article or part of one?

So is there a Glassdoor equivalent for Wikipedia editors?

They don't until they do.

They were. Given his position, he was able to have some source material digitized and posted somewhere (either the agency's website or the archives for the county/city/whatever). I think at one point the wiki lords were ragging on about the absence of an article interpreting primary sources.

At that point it was too late... anything he did was on some tickler list and waging wiki-war was not worth the time investment.

For content that is very narrow niche (on the expert/author side, hopefully with a broader reader base to make it worthwhile) and thorough, Wikipedia might not be the place to write. You can try something like Wikia, where one person can create a topic they care about, and post their information (and welcome others to contribute).

Then someone can link from Wikipedia to Wikia.

Wikia isn't just for TV shows and games. For example, the text editor Vim has a Wikia section: vim.wikia.com

References are the basis of all material on Wikipedia. It's the only way to deal with bias at a global level. Your friend might be a decent person but there are lots of Commissioners of Public Works that aren't and have agendas that take a non trivial amount of effort and time to discover.

Printed references are references, and they shouldn't be rejected just because they aren't on the internet. But even when the author went to the trouble of putting those references on the internet, they were still deleted.

There's just no justification for that. Even if a policy required all references to be available via the internet, the author fulfilled that requirement.

sigh...there is justification. Which doesn't mean your friend and his content were treated shabbily.

If this were allowed, and Donald Trump put his documents on the internet and referenced them on his Wikipedia page, would you allow it? It can do a lot of damage, and people and organizations do this all the time, every day.

This is the kind of behavior the policy exists to prevent. It's well known that it isn't perfect. Please read: https://en.wikipedia.org/wiki/Wikipedia:Conflict_of_interest

If something has been officially published, and is available to the public, it's a reference. That's entirely distinct from someone citing their own stuff.

Edit: ... their own unpublished stuff.

He said that there were references, just print ones not available online. They are supposed to have equal status.


I'm not the parent, but I remember reading good articles - going back to them some time later and finding them deleted. Can't really recall the specifics but I bet they were for esoteric programming languages or something.

http://deletionpedia.org/en/List_of_Issuer_Identification_Nu... that page seems like a decent example though. That seems like a really useful resource if I were looking for that information. I can't figure out why it would have been deleted.

It's an interesting list, but I can understand Wikipedia not wanting it.

Somebody somewhere is maintaining the official list. If you start a new bank, you can just make up your own Issuer Identification Number and expect to interoperate with the rest of the financial industry. Since someone somewhere has an official list, Wikipedia should publish or link to that official list.

I suppose if the official list is "secret" or unpublished, then the question is trickier. Even in that case, I think the best option would be for Wikipedia to offer an external link to someone compiling or maintaining such a list.

Uh, too late to edit my comment above, but I meant to say, "If you start a new bank, you CAN'T just make up your own Issuer Identification Number and expect to interoperate with the rest of the financial industry."

There was a ridiculous shitshow over MMA articles that dragged on for what seemed like years, a crusade entirely driven by one or two editors who knew how to abuse the AfD process.

The reasons stated were essentially:

- Not comprehensive

- Not wholly accurate

- Difficult to keep updated

- Not all sourced/verifiable, or unsourced

Which is totally fair, except that would apply to a lot of articles...

The main reason given is that the article was original research. In other words, it was not just unsourced, but fundamentally unsourceable. There is nowhere an editor can go to reliably determine whether the entries in the list are correct (at least, not to the knowledge of anyone who participated in the deletion discussion). That puts it well outside the bounds of Wikipedia's mission as an encyclopedia, which is a tertiary information source that only summarizes information from other sources.

Is it reasoning or is it mere justification? Hard to tell.

This is the protective wall around the Wikipedia editor's kingdom.

> This list is essentially unsourced and because of the nature of these numbers being unpublished, will always be original research. Does not seem to be a notable list.

Seems like pretty good reasoning to me. The list is unsourced and Wikipedia is an encyclopedia. Plus it's not a notable topic, something I'd also agree with.

Perhaps there ought to be a "wiki facts" website where such material could be published and editors could attempt to ascertain its validity and publish any facts that can be confirmed, even if not with typical encyclopedic quality sources.

You miss my point. You offered a rationale. But how exactly is it being used? 2 options : (A) It is being used to arrive at a conclusion (B) It is being used to justify a conclusion.

Not the parent here, but I was pretty peeved to see the [Red Eclipse](http://deletionpedia.org/en/Red_Eclipse) page get deleted for [not being a notable enough game](https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...) despite being one of the most active open source games around, and a history of over a decade.

I just found it annoying because if you look through Wikipedia's [List of Open Source games](https://en.wikipedia.org/wiki/List_of_open-source_video_game...), it is pretty much full of less notable abandonware, and it seems like some people just have a chip on their shoulder about this game's article.

I feel very few will show up to link articles they've written; here's one that I also didn't write but had a lot of insanely useful info:


This is in no way encyclopedic and violates several policies and guidelines on lists and external links https://en.wikipedia.org/w/index.php?title=Microsoft_SQL_Ser...

Seems reasonable, but I almost wish there could be a tier 2 Wikipedia where people could share this type of time investment. Maybe that's what the various undeletion sites will wind up doing!

It's not just about deletes.

It's amazing how so many people have the same problems with WP and they are never addressed. Comments here are getting tons of up votes due to frustration.

My questions are:

- Does WP leadership really know the extent to which people have stopped editing due to frustration?

- Do they not fix it because they don't want to change, don't know a solution, or are bullied by their own editors?

- Are they aware that WP tends to attract people with this trait and mitigating these effects would make a massive difference?


I promise I'm not trying to be snarky or dismissive, but I literally have the opposite concern. I am amazed that a group of nerds on message boards can look at the most ambitious knowledge resource in the history of the Internet, and possibly in the history of the last 100 years, and come to the conclusion that it's urgently endangered by the fact that their edits about young startups and science fiction characters are rejected.

I feel like the Wikipedia project is one of the better run communities anywhere on the Internet, and that we don't perceive them that way mostly due to the absolutely spectacular scale that they operate at.

Note that I'm not saying they're the most pleasant community in the world to work with. They aren't! But that's because they're doing so much, working with so many people at so many different levels of engagement. Hacker News is a trifling community by comparison, and look how much work goes in to trying to maintain norms and civility here.

I feel like the Wikipedia community has bent over backwards to understand the concerns that completionist interest groups have about logging every possible startup, every open source developer of any note, every open source project, &c. They spend more time discussing their norms and processes about subject inclusion than any other meta-concern.

I do not feel like the tech community meets Wikipedia halfway. Every argument I see in HN threads about "deletionism" seems rooted in the idea that an Internet encyclopedia need not have strict standards for inclusion because the marginal cost of an additional article is zero. But that's simply a false statement; it's based on the false premise that the only cost for Wikipedia for articles is storage space. You can't read the Wikipedia project charter for 5 minutes without realizing that storage is their least important cost.

> I literally have the opposite concern. I am amazed that [people think Wikipedia] is urgently endangered by the fact that their edits about young startups and science fiction characters are rejected.

I think this mischaracterizes WhitneyLand's fear. The worry is not that the individual edits are crucially important, it's that without a welcoming community Wikipedia will eventually run out of editors willing to keep the site alive. While the content of Wikipedia won't disappear as a result, one fear is that over time the information will become static and eventually outdated. Perhaps this is acceptable, as older encyclopedias can be a snapshot of their time.

A greater worry that some have is that Wikipedia may change from being a relatively unbiased collection of information to cherry-picking and displaying only one side of the facts. I think it's a legitimate fear that some revisions are being refused because they don't match the desired narrative, and that the "rules" are being (selectively) applied as post-hoc justification. While this may not be an overwhelming problem yet, it's possible that a more inclusive community will prevent it from becoming a greater problem in the future.

nkurz, yes this is what I meant.

tptacek, I appreciate your passion, but actually I don't even feel strongly about the deletionist/completionist issue. I also don't have any pet subjects that I feel are unjustly excluded.

It is about needlessly driving away good contributions in situations where there is no net benefit.

I don't want to generalize too much, but there are people who care more about the letter of the Wikipedia process and debate than they do about the net effect on content quality.

Why not let editors build reputation based on customer service to their peers and to occasional contributors? If such a system was designed and motivated well enough, I believe it could be one way to help the situation. I actually journaled a while writing about ideas to discuss, but lost interest because I didn't know anyone who would care.

It's not so much they delete entire articles, it's more that certain people camp on articles and remove all your material.

> I started editing Wikipedia in 2003. It was fun. Over the years it became less fun

My experience is similar to yours. If the Wikimedia Foundation really cared about harassment they would kick out all the deletionists.

> I spent a few hours to set up http://deletionpedia.org/ - to rescue articles from deletion.

I hit random 5 times (it's slow), and got one empty article ('looks like it survived on WP') and four profiles (one company, three meatbags), all of which were only a couple of sentences and even calling them a 'stub article' would be generous. No tears lost over these poor-quality chaff articles.

"support the development of tools for volunteer editors and staff to reduce harassment on Wikipedia and block harassers."

The only harassment I've seen on Wikipedia is from Little Napoleon long-term admins who grind contributors down with petty bureaucracy.

You might want to review the debacle that was the Salim Mehajer article then.

Do you have a reference? I'm interested.

I think this is the incident in question: https://lists.wikimedia.org/pipermail/wikimedia-l/2016-May/0...

There's a summary here, but it rather understates the whole thing - read the above link first if you have time: http://motherboard.vice.com/read/wikipedia-editor-says-sites...


That's... wow. I assumed it was something about the subject of the article getting involved in harassment, but it doesn't seem to be.

I'm glad that the author got some help.

This has been my experience also.

Not a huge fan of this.

The editors on Wikipedia wield a large amount of power in shaping the site.

When they go and make arbitrary decisions about the content on the site, and users start calling out the editors for bias and bogus decisions, well now all of a sudden the crooked editor can just cry "harassment and cyberbullying!" and go a long way to shutting down rational criticism.

>the crooked editor can just cry "harassment and cyberbullying!" and go a long way to shutting down rational criticism.

This is becoming a standard way of dealing with anyone who disagrees with you online. Reddit has been ruined by it.

Someone sent you a PM? Harassment. Comment you don't agree with? Harassment. Someone not sustaining your narrative? Harassment.

Reddit has been ruined by a mob of barely-literate jocks peddling fringe conspiracy theories constantly threatening – but perpetually failing – to leave.

The more time passes, the more I think GamerGate and its adherents have ruined almost every community I take part in.

Besides this post, it looks like other posts that are getting to some of the real symptoms of Wikipedia's issues are getting down voted.

When you take a lot of flak, you know you're over the target...

Or, you're legitimately making a bad decision.

In 2012 I wrote my first Wikipedia article on the 50-person startup I was working for at the time. I didn't include anything overly self-promoting, just the basic facts and referenced some news articles. My article was immediately nominated for deletion and a number of community members accused me of being a "single purpose account", i.e. not interested in contributing, just advertising. Needless to say I did not go on to create/edit more articles after a welcome like that.

A couple of editors did come to my defence. I got the impression there was a lot of internal conflict about this sort of thing.

Edit to add the following: The article: https://en.wikipedia.org/wiki/Ecobee Deletion discussion: https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...

I think they have a point in that case. You didn't have any interest in contributing until you had a startup you needed to promote. Also, unless your startup is notable it's not supposed to have an article.

"If a topic has received significant coverage in reliable sources that are independent of the subject, it is presumed to be suitable for a stand-alone article or list... If a topic does not meet these criteria but still has some verifiable facts, it might be useful to discuss it within another article."


I agree with the above.

However, from my experience in CAD, Wikipedia's notability and importance ratings are strongly skewed towards open-source and against commercial systems.





No disrespect meant to the FreeCAD folks, but that is definitely back-to-front! The article on Solidworks lists 165,000 companies using the product as of 2013. How is that low-importance?

The skew tends to be even worse against enterprise class systems.

Those importance ratings are utterly unimportant. In the vast majority of cases, they are just the opinion of a single editor who looked at the article for 10 seconds, and they only affect how the article is listed in some automated report that nobody ever looks at.

> You didn't have any interest in contributing until you had a startup you needed to promote.

That's one explanation of the facts but it's not the only one. Any new user is likely to start out by contributing on topics close to them - places they've worked, technologies they've worked with, etc.

If employees aren't allowed to make articles on their employers then that's that. But treating a new account differently than an old one is just assuming bad faith.

I actually had been contributing small edits for grammar anonymously for a while, and was interested in getting more involved in Wikipedia. My intention wasn't to promote my employer, I just thought it met the notability requirements.

Additionally, "referenced some news articles" is not sufficient for notability because most startup coverage is PR [1]. Unfortunately, when you get genuine independent journalism, it is also not positive coverage (e.g. Theranos, Magic Leap).

[1] http://paulgraham.com/submarine.html

The community of Wikimedia and Wikipedia is pretty bad with this. I've seen articles locked, then marked for deletion, so you can't add more information to make them 'article-worthy'. They then say that it's not 'against the guidelines', as the ultimate excuse.

MediaWiki is supposed to also be a communicative platform on these problems, but it really fails its goal there with its talk pages, when these problems really stem from a lack of centralized community and of being able to easily talk about and resolve these issues. Typically, you'll get referred to IRC or another talk page, where your issue will not be resolved and will probably take forever to be responded to.

Overall, I've ditched working with Mediawiki or anything wikimedia. They don't show caring to actually invest in open platforms and software that others can use, they're just interested in making their own projects popular. Some of the core devs are actually really good at what they do, the problem is that the framework now needs a big revamp for it to be usable outside of the wikipedia environment properly, something wikimedia will not invest in.

If they want to show good faith in freedom of information, they would make their software into components and allow other projects to use them, especially the actual wiki markup processor. This would allow people to integrate wiki functionality into fundamentally better frameworks to maintain, like Drupal 8, or design their own frameworks that internally use the packages maintained in mediawiki.

You say they're "bad" at this, but another word you could use is "consistent", and you could follow that up by suggesting that there's a coherent set of criteria used for what's deleted, but that nerds have a really hard time understanding it.

To wit: Wikipedia is not at all concerned about storage space, but they are very much concerned about the amount of time they'll need to spend policing articles to make sure that things that wind up preferentially at the top of Google search result pages aren't full of advertising spam, lies, and cruft. Every article they add increases that burden. A reasonable line to draw in the sand is "we're only going to allow articles that make a clear statement of why their topic is notable, and for which an ordinary, disinterested editor could verify all the facts by following cited sources."

> things that wind up preferentially at the top of Google search result pages

I feel that this really should not be the worry of Wikipedia.

Well, it very much is.

> They don't show caring to actually invest in open platforms and software that others can use

You mean apart from Wikipedia and Mediawiki?

MediaWiki is made for Wikipedia; it is not software that can be used well outside of that environment. I know, because I've invested a lot of time into trying to make it so educators and communities could start their own wiki's, and what I've found is that it's way too complex to expect anyone other than a sysadmin to maintain, and way too hard for users to learn wiki markup and extensions to use it in any decent way in templates. Add on top of that the amount of work you need to do just to get Lua (an actual .php extension), Visual editor (a giant node.js project), or the math extension (another big node.js project) in, and you're looking at a massive amount of work that could break at any time. This is because its direction is not for individual installs; it is meant to be in an environment of sysadmins, consistent maintenance, and those who develop the framework.

Wikipedia is also not that 'open'. What you edit has to fit specific guidelines, and of course get past the moderators to be approved. It also misses the point of freedom of information, because there's tons of info out there that people want to put out, but doesn't fit the scheme of Wikipedia.

The Wikimedia vision is: "Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment.", and that's what I'm criticizing Wikimedia and its projects for not upholding.

> What you edit has to fit specific guidelines

If you want truly open communication, go to 4chan. It's a cesspool, which is fine for people that like that, but it's much further away from being 'information for everyone'. Claiming that WP isn't open because they want a basic level of quality is just grinding an axe - WP has informed a far greater number of humans to a far higher level of quality than all the ^chans put together.

> I know, because I've invested a lot of time into trying to make it so educators and communities could start their own wiki's

And yet MediaWiki is sprayed in wikis all over the internet. I've set it up as well. Yes, learning the full markup isn't trivial, but the basic stuff is. And I'm not sure how it's "not open" simply because it has a learning curve. Does this mean Vim is not open? Emacs? Apache? OpenBSD? The Vim GUI sucks, because it's outside its expected environment of a terminal - does that also mean it's "not open"?

If you want an example of open software that is designed specifically 'for the punters', look at Gnome 3... where you're pretty locked down and can't do much (cue complaints then about 'freedom'), but there's no learning curve. LibreOffice is made for general consumption and still gets complaints about being difficult to use from the punters - and even then, if you want to use the more advanced features, there's a learning curve.

Complex software has a learning curve, and hiding that learning curve is really difficult. Apple 'solved' this by simply removing functionality and configurability (again, cue complaints about losing freedom). If Apple made a wiki, you wouldn't even have the choice of adding that maths plugin. Hell, you probably wouldn't even be able to skin it.

> it is meant to be in an environment of sysadmins

It's a heavyweight engine that you're wanting to put lots of heavyweight stuff on. That's what they're designing for, and it has some warts, but it works. It's daft to complain that the engine primarily written by a non-profit for one of the top 5 websites isn't written as a one-click install feather-light application.

Basically you're holding WP to an impossible standard and complaining that they don't measure up.

>If you want truly open communication, go to 4chan.

It's not about open communication; it's the fact that information has to fit the criteria Wikipedia deems necessary, and a lot of information out there does not fit. For example, a group of professionals in building design want to create an educational site on how to get started digitally: what software to use, the theories and factors in play when creating a building, fueled by their real-life experience and education over time. That is not something you can put on Wikipedia. You can put some of the theory, but ultimately experience is lost in translation or deleted due to no sources.

>Claiming that WP isn't open because they want a basic level of quality is just grinding an axe

It's nowhere near as open as their vision states. It's open for sourced information and whatever various mods will allow. Which is fine if that's how they want it to be, but to claim it's an open platform for information is false.

>WP has informed a far greater number of humans to a far higher level of quality than all the ^chans put together.

I am not advocating that Wikipedia just be an open book to write whatever you want, but that its platform does not support much outside of sourced info, which is a category of information, not the sum of all information, and leaves out a lot of other information that doesn't fit its guidelines.

>Yes, learning the full markup isn't trivial, but the basic stuff is. And I'm not sure how it's "not open" simply because it has a learning curve.

Making a wiki has very little to do with the basic markup, and a lot more to do with designing templates and organizing how your data is formatted and presented, and that is what MediaWiki fails to do in a manner that is accessible. Difficulty does reduce accessibility, which in fact does reduce its openness. If it were simpler, well documented, and searchable, then there would be a lot more writers. A lot of the problems could be solved by having a markup language that also acts like a programming language, able to work with variables and inputs and do transforms on them, much like an actual templating language.

The learning curve of LibreOffice or other programs of that nature is a false equivalence. Adding a graph in LibreOffice takes a few clicks of the dropdowns, maybe a few tries at adding in info. Adding a graph to MediaWiki requires you to find an extension, install it, learn its syntax, and, god forbid you add it into a template dynamically, learn how to get data variables from wiki markup. It is significantly more work and requires more understanding of tech.

>Complex software has a learning curve, and hiding that learning curve is really difficult.

Yes, it is, but it is possible. If the software were designed more for use outside of the Wikipedia environment, similar to how frameworks like Drupal 8 or WordPress are, it would be much more maintainable and learnable. Understand that wikitext is just a small, small part of a MediaWiki environment, and even that's enough to bar entry for many people.

>It's a heavyweight engine that you're wanting to put lots of heavyweight stuff on. That's what they're designing for, and it has some warts, but it works.

It's an old engine that is very integrated into itself, with a lot of tech debt that hasn't been paid back. They are designing for that, not for creating a framework that best suits accessible, editable, presentable information.

>It's daft to complain that the engine primarily written by a non-profit for one of the top 5 websites isn't written as a one-click install feather-light application.

Wikimedia does not claim that MediaWiki is made for WP and shouldn't be used outside of it. I claim that, but that's not how it should be.

An application being 'heavy' has nothing to do with its maintainability or usability to the end user. Arguably, WordPress is much heavier, yet it has a built-in auto updater and a plugin and theme installer, and is quite easy to set up.

>Basically you're holding WP to an impossible standard and complaining that they don't measure up.

I'm holding /Wikimedia/ to the standard they've set for themselves, with expectations much lower than that, and still it doesn't hold up, because they are not actually doing what their vision is, they're just making their own product where information has to fit their guidelines. You can argue that Wikimedia is a non-profit, or the software is complex, or whatever you'd like, but the reality is that there is a significant amount of information that will never be passed into the internet space because good platforms for it don't exist yet.

How many 50-person companies do you think have existed, globally, since 2001?

Let's just make some very rough estimates. Sweden, with a population of 10 million, creates an average of 36,500 new companies per year. Let's say that 5% at some point reach 50 employees, which would result in about 180 Wikipedia articles per million population per year. There are 7.4 billion people in the world, so that is 180 * 1000 * 7.4 * 16, which would be a bit over 21 million Wikipedia articles covering only new 50-person and bigger companies (not including companies created before 2001).
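For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate above. Every input is the commenter's own rough guess, not measured data, and the function and variable names are made up for illustration.

```python
def articles_estimate(rate_per_million_per_year, world_population_millions, years):
    """Company articles implied by applying one per-capita rate to the whole world."""
    return rate_per_million_per_year * world_population_millions * years


# Inputs taken from the comment; all of them are guesses.
new_companies_per_year = 36_500     # Sweden
sweden_population_millions = 10
percent_reaching_50 = 5             # assumed share that ever reaches 50 employees

# 36,500 * 5% / 10 million people = 182.5 per million per year,
# which the comment rounds down to 180.
exact_rate = new_companies_per_year * percent_reaching_50 / 100 / sweden_population_millions

# 7.4 billion people = 7,400 million; 16 years from 2001 to 2017.
total = articles_estimate(180, 7_400, 16)

print(exact_rate)        # 182.5
print(f"{total:,}")      # 21,312,000
```

The result matches the "bit over 21 million" figure, so any disagreement is about the inputs rather than the multiplication.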

The people who accused you of being a "single purpose account" were of course at fault, since they should have assumed good faith, but I can't generally disagree with the notion that a 50-person startup might need more than employees to be notable enough for an encyclopedia.

The assumption that the world creates 50-person companies at the same rate as Sweden is unlikely to hold.

Company listings are actually often really useful on Wikipedia: there can be outside info that is far better than what the company itself has, and if there are suspect things about the company, Wikipedia can link to them as well.

Wikipedia at one point deleted the article on Atlassian because 'it wasn't notable'.

Unlike so many minor characters in Star Wars...

I said "very rough estimates" since indeed the world average is unlikely to be exactly the same as Sweden's. But even if we reduce the global rate to 10% of Sweden's (fair?), it is still 2 million articles, which would equal half the current size of English Wikipedia.

The usefulness of non-notable articles is often discussed on Wikipedia. One side generally argues that any article that is useful should be included. The idea that notability is the criterion, and not usefulness, is an interesting discussion, and part of the deletionism versus inclusionism controversy.

Considering approximately 2 billion people in the world are self-sustaining farmers, even 10% of Sweden's rate is an overestimate.

>enough for an encyclopedia.

I hate this argument. Wikipedia is not some storage-bound book shipped out to people. There should be no limit on articles as long as the content is verifiable by contributors.

You might say, "but the disambiguation page will get big." Well, that's a technical problem that can be fixed. I can search for something on Google and find it easily even though it contains far more than Wikipedia.

And I hate that argument too. There are long-term costs for maintaining any page on Wikipedia, because there are always people out there looking to insert troll content or provide biased information for a specifically targeted search term.

I'd estimate the typical page on Wikipedia has 0-1 people actively looking after it. And some of these articles are extremely popular (but noncontroversial) people/places/things. Wikipedia is full of articles which are "done" but still suck.

So you cannot look at a volunteer project and determine the storage costs are negligible, no problem, because that's very obviously not the main challenge.

> There are long-term costs for maintaining any page on wikipedia, because there's always people out there looking to insert troll content or provide biased information for a specifically-targeted search term.

But these costs do not scale on a per-page basis; rather, they scale based on the number of trolls. I don't think the number of pages meaningfully changes the amount of effort "trolls" put into "trolling"; meanwhile, automated tools like watchlists allow you to keep an eye on an unlimited number of pages.

It should be much easier to automate anti-"trolling" tools on fringe pages which get very few edits - e.g. automatically adding newly-created or rarely-edited pages to a watchlist.
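As a rough illustration of the idea above, here is a minimal sketch of picking out newly created or rarely edited pages for a watchlist. This is not MediaWiki's API: the page records, field names, and thresholds are all hypothetical.

```python
from datetime import datetime, timedelta


def pages_to_watch(pages, now, new_within=timedelta(days=30), max_edits_per_year=5):
    """Return titles of pages that are newly created or rarely edited.

    `pages` is an iterable of dicts with the hypothetical keys
    'title', 'created' (datetime) and 'edits_last_year' (int).
    """
    watch = []
    for page in pages:
        is_new = now - page["created"] <= new_within
        is_quiet = page["edits_last_year"] <= max_edits_per_year
        if is_new or is_quiet:
            watch.append(page["title"])
    return watch


now = datetime(2017, 1, 26)
pages = [
    {"title": "Busy old page", "created": datetime(2005, 1, 1), "edits_last_year": 400},
    {"title": "Fringe page", "created": datetime(2009, 6, 1), "edits_last_year": 1},
    {"title": "Brand new page", "created": datetime(2017, 1, 20), "edits_last_year": 12},
]
print(pages_to_watch(pages, now))  # ['Fringe page', 'Brand new page']
```

Someone still has to review what the watchlist surfaces, so this only reduces the cost of noticing edits, not the cost of judging them.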

Finally, it doesn't look like Wikipedia has a great editor retention policy, if the problem really is combating trolls. There seems to have been an assumption of bad faith toward GP. If they really are a "PR shill", then no skin off their back: if they're paid to do it, they'll keep trying, becoming a "troll". However, if they are a legitimate editor, blaming them for starting in their own topic of interest (even if it was self-promotion) doesn't seem like a good way to retain them as a long-term editor.

> they scale based on the number of trolls

Low-level trolling has almost zero cost on Wikipedia; you don't even need an account. That is especially true for articles that are largely politically uncontroversial and "done". So they probably catch most of the vandals, and use their process to stop maybe the top 20% of political kooks. But when some random adds dubious information to a long-tail article, it can hang around for years.

Moderators are voting to remove pages that have validated data. That's idiotic because it's more work than leaving the article be in a locked state.

I would rather have a bunch of articles that are locked rather than deleted by some moderator that thinks they are defending the glory of worthy human knowledge.

There are no moderators, you could comment too.

In Sweden, the current proportion of companies greater than 50 people is closer to 0.5%. Since many companies close before reaching this size, the rate attaining that size is much lower by another factor of 10. So, your numbers exaggerate the amount by at least 100x on this basis, not to mention the huge portion of the world population which is not living in advanced economies with high rates of corporate formation.
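To make the size of that correction concrete, here is a small sketch applying it to the earlier figure of a bit over 21 million (180 * 1000 * 7.4 * 16). The 0.5% share and the further factor of 10 are this comment's claims; the function name is made up for illustration.

```python
def corrected_estimate(original_articles, assumed_percent, observed_percent, attrition_factor):
    """Scale an estimate down by how far the assumed share overstated the observed one."""
    overstatement = (assumed_percent / observed_percent) * attrition_factor
    return overstatement, original_articles / overstatement


# 5% assumed vs 0.5% observed is a factor of 10; the claimed attrition adds another 10.
overstatement, articles = corrected_estimate(21_312_000, 5, 0.5, 10)
print(overstatement)  # 100.0
print(articles)       # 213120.0
```

On these numbers the worldwide total drops to roughly 213,000 articles, before accounting for lower rates of company formation outside advanced economies.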

It would be fun to estimate storage space for the articles.

Can you point us to the actual article you wrote? What was its title?

Tip: It would help to disclose in the user page too.

The anti-online harassment industry is becoming a toxic den of snake oil salespeople. Those offering "solutions" tend be politically-motivated cash grabbers who line their own pockets by manufacturing vague and amorphous problems to exploit society's genuine empathy.

Maybe so. But they do need a solution. What are the options?

> Blocking – making it more difficult for someone who is blocked from the site to return

If they're talking about IP bans on viewing Wikipedia here, this is a terrible idea. If some troll gets banned on a college campus, that will result in the inadvertent ban of thousands of others connected to the same network. This line strikes me as naive.

Perhaps the blocks aren't implemented as naively as you assume. It wouldn't be too difficult, as per your example, to distinguish between an IP that is shared by many devices and one with more limited use. There are also other ways to block besides just IP-based.

I do agree with you that the nature of the internet makes blocking people from free (non-paid) sites inherently difficult though.

The first proposed suggestion is to set a cookie when a blocked account logs in, and have it be sent if they log out and try to create new accounts. I personally doubt this will have a major effect on systems which have user accounts (like a college campus), but libraries could have problems.

The second is to limit the scope of an IP ban so that it only affects specific user-agent strings, trying to affect only the intended user.

The third suggestion is to create a cookie when a new account is created that "counts" the number of times a new account is created on a single machine.
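The three suggestions above can be sketched roughly as follows. This is a toy model, not Wikimedia's implementation: the `Browser` class, cookie names, and the account threshold are all hypothetical, and the cookie jar is just a dict.

```python
from dataclasses import dataclass, field

MAX_ACCOUNTS_PER_BROWSER = 3  # hypothetical threshold for suggestion 3


@dataclass
class Browser:
    ip: str
    user_agent: str
    cookies: dict = field(default_factory=dict)


def on_login(browser, account, blocked_accounts):
    """Suggestion 1: mark the browser when a blocked account logs in."""
    if account in blocked_accounts:
        browser.cookies["blocked"] = "1"


def on_account_created(browser):
    """Suggestion 3: count accounts created from this browser."""
    created = int(browser.cookies.get("accounts_created", "0"))
    browser.cookies["accounts_created"] = str(created + 1)


def may_create_account(browser, ip_ua_blocks):
    # Suggestion 1: the block cookie outlives the logged-in session.
    if browser.cookies.get("blocked") == "1":
        return False
    # Suggestion 2: an IP block only applies to a matching user-agent string.
    if (browser.ip, browser.user_agent) in ip_ua_blocks:
        return False
    # Suggestion 3: refuse once this browser has created too many accounts.
    created = int(browser.cookies.get("accounts_created", "0"))
    return created < MAX_ACCOUNTS_PER_BROWSER


shared_pc = Browser(ip="10.0.0.1", user_agent="UA-1")
on_login(shared_pc, "troll", blocked_accounts={"troll"})
print(may_create_account(shared_pc, set()))  # False

# Clearing cookies removes both the block marker and the counter.
shared_pc.cookies.clear()
print(may_create_account(shared_pc, set()))  # True
```

Note that the cookie checks follow the browser, not the person, which is why a shared library machine is affected while a user who clears cookies is not.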

Cookies as a security mechanism? Relying on people not knowing about Ctrl+Shift+Del or incognito mode or user-agent chooser extensions or just using different browsers? Who seriously thinks that's going to work?

These tools are never going to be perfect, just roadblocks. Many people don't know those tricks.

They've been dealing with this issue for years. If you're on a university campus you can access Wikipedia and see the recent changes; there's usually a weird mix of very obscure academic content getting added and undergraduates adding insults about their friends.

If there is a sudden spike in vandalism I believe they just require you to log in with an account usually.

There's a difference between editing and viewing. :-)

I've also found IP bans to be ineffective due to serious collateral damage. I think shadow-banning is one of the most effective methods.

How possible is that on Wikipedia, though, where you are expecting to see your edits? On Reddit people rarely go back and look at a thread they commented on from a logged-out computer.

Unless the user is trying to bypass the shadowban (by verifying anonymously), they should see "what they expect" whereas others won't see it.
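A minimal sketch of the visibility rule being described, assuming comments are simple (author, text) pairs and a logged-out viewer is represented as `None`. The names are hypothetical and this is not any site's actual implementation.

```python
def visible_comments(comments, viewer, shadowbanned):
    """Return the comments a given viewer should see.

    Shadowbanned authors still see their own comments; everyone else,
    including logged-out viewers (viewer=None), does not see them.
    """
    return [
        (author, text)
        for author, text in comments
        if author not in shadowbanned or author == viewer
    ]


comments = [("alice", "Good edit."), ("troll", "Insult."), ("bob", "Agreed.")]
banned = {"troll"}

print(len(visible_comments(comments, "troll", banned)))  # 3
print(len(visible_comments(comments, "alice", banned)))  # 2
print(len(visible_comments(comments, None, banned)))     # 2
```

The last line is the bypass mentioned above: checking the page while logged out reveals that the comment is missing.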

Given this was talking about harassment, I assumed it could cover comments in article discussions reasonably effectively.

If you shadowbanned an IP because of one user, and others on that IP could still view the site read-only, you would only inconvenience people who wanted to make edits, which would probably be rare.

We're still not sure shadowbanning doesn't create liability for fraud. So that is a risk to take all on its own.

I would expect browser fingerprinting and other similar techniques would be more likely to be used.

Wikimedia believes browser fingerprinting of users is too intrusive and against their privacy policy.

Toxicity, not harassment, has always been the glaring issue with the Wikipedia community. Of course there are the occasional miscreants who look to personally attack and harass people, but the entire site seems to be dedicated to finding the most lawyerly and acrimonious way to discourage contribution.

Some of that is certainly warranted. When political topics are the target of massive edit wars as each side seeks to enshrine their particular truth in the public record, you need rules and enforcers and arbitrators. But it can get extraordinarily toxic.

Will this money be put towards reducing harassment from the moderators? Wikipedia and its gaggle of moderators make it difficult to add/remove things.

Consider using an alternative such as:


"Starlords"? "Planetary Knowledge Core"? The "Seven Canons"? What is this, an encyclopedia, or bad sci-fi?

I'm curious: their thing seems to be to reject "published sources" in favour of "reliable truth". How does it expect to define what is "reliable truth" then, if not through citing reputable sources?

I clicked random page a few times to see what was available and a lot of the information seems to be lifted directly from Wikipedia anyway.

Also, it all feels pretty silly with all the 'galactic' titles; at least Wikipedia has an air of professionalism, despite the rampant bureaucracy.

It just has an air of professionalism because you're used to it.

Is this talking about harassment by moderators? Serious.

That's what I'm trying to determine myself. Geez, talk about burying the lede.

It's interesting that after 15 years of operation, Wikipedia does not apparently have decent tools to detect harassment /sarc.

There will be a lot of discussion about the symptoms here, but the cause is straightforward: Wikis are built through conflict, and much of that conflict involves harassment, doxxing etc. Ask anyone who has tried to edit any major page.

The real solutions to harassment are counterintuitive: enforce full anonymity, take measures to stop people and gangs "owning" pages, stop using a system that lets any user at any level veto other users' edits, have a proper editorial workflow, and many more. But none of these will ever happen, so the harassment will continue.

It should also be noted that the Wikimedia Foundation just raised millions of dollars in its latest fundraising drive, and has millions more in the bank, so it really doesn't need the money.

> Wikis are built through conflict, and much of that conflict involves harassment, doxxing etc. Ask anyone who has tried to edit any major page.

Wikipedia has tainted the well. Places like Meatball wiki were really good for a long time.

I wish he would donate $500K to filtering scams from Craigslist.

Instead of typing on Hacker News, why not become successful and do it yourself?

I'm reading the harassment report[1] and I'm having trouble understanding it.

Of those surveyed, 38% said they had experienced harassment. I understand that bit.

Then, those who had been harassed or witnessed harassment were asked to identify the type of harassment. Of these, one of the least common types is "hacking", with an average of 2.69 times.

I don't understand what that is saying. Each person who was harassed was hacked 2.69 times on average??! How can that be possible?

[1] https://upload.wikimedia.org/wikipedia/commons/5/52/Harassme...

What hasn't been said is that Wikipedia has adopted an explicitly narcissistic goal, of being embarrassed by being the only wrong source the fewest possible number of times (namely never) - as opposed to doing the best job of accurately informing more people, more of the time; even if the data is new, or uncommon. I've had the New York Times rejected as a source because it just wasn't prestigious enough, and deleted. Which is charming if there's a better contrary source, but there wasn't. This goal is not compatible with that of being a very widely sourced, and very current encyclopedia. It is quite compatible with ossification.

I can't believe a half a million dollars is necessary to reduce harassment. I'm all for reducing harassment online but that much money could make a serious difference applied in a different way. Look at Wikimedia's report on harassment (which this is in response to):


There's room for improvement but not a half a million dollars worth of room for improvement. I suppose people will donate to what's important to them, though.

It sounds like you think this is easy. I think you're underestimating the difficulty in protecting a large, diverse, fairly open online community. In the general case, this is an unsolved problem (see Twitter and Reddit).

I could easily see them spending half a million trying various things and not solving the problem. Or, more hopefully, they'll come up with some good ideas that will work elsewhere too.

> that much money could make a serious difference

To what, things you care more about? You could make the same argument about anything. It would be one thing if you were critiquing the administrators of Wikipedia deciding to allocate resources to this - since they are answerable to a variety of stakeholders - but when you critique someone for choosing to donate to alleviate a specific problem then I fail to see how that's any of your business.

Everyone is open to criticism for their actions. I'm free to disagree with this person just as much as you are free to disagree with me about it.

Sure, but I notice a distinct absence of any justification for why your concerns should hold a higher priority, which says to me that you consider harassment to be an issue of inherently low importance.

That's a real asinine assumption to make. I explicitly said in my comment that harassment is a worthy cause, but that a half a million dollars is excessive.

By what standard? How much do you think would be appropriate to spend on this issue?

A calculated sum based on an objective analysis of the needs of the situation to fit the financial requirements of specific action items.

Put a number on it, otherwise you're just blowing smoke. If you can't put a number on it, then you clearly haven't performed the analysis and so your claim of it not being a good use of money is untested.

How do you make an objective measure of the financing required to fix harassment problems to a reasonable degree?

The same way you actually fix the problems. By figuring out the steps you need to take, and estimating their cost. If you can't do that then how are you going to fix the problems even when you do have the money?

Having worked on anti-abuse/fraud work, I think $500k is light for a site the size of wikipedia. It takes people working full time to create and modify the systems to stop it.

Half a million is only 50k a year for 10 years. So no, it's not a lot of money.

> I can't believe a half a million dollars is necessary to reduce harassment.

I can't believe they think half a million dollars is enough.

Wikipedia is pretty toxic, even now after years of concerted effort to de-toxify it.

I predict someone's going to launch this expensive thing; there are going to be about 750,000 words written about it across different wiki meta pages; some policies are going to be changed (with yet more meta discussion); and then the money is going to run out.

I mean, just look at "Mexican–American War" versus "Mexican-American War" -- that was 20,000 words just on that meta page. (There are easily 500,000 words, some at Arbcom, about - vs – on Wikipedia).

It's not just toxic, it's deadly. It certainly brought me to the very edge. There are a number of people who have committed suicide after being harassed on Wikipedia.

That survey is from 2015. 2016 saw a surge in harassment (or, at the very least, a perceived surge in harassment). I'm sure this donation is somewhat of a preventative measure, given our political climate.

Seems like a fine donation to me. Feels like a world where facts are just opinions to some folks, so I think (perhaps and hopefully only cynically) it is going to get worse. If everyone doing good work on Wiki pages is abused into not wanting to do it anymore, that would be a sad and significant loss indeed.

>If everyone doing good work on Wiki pages is abused into not wanting to do it anymore, that would be a sad and significant loss indeed.

Agreed, but Wikimedia's own report (linked above) contradicts this hypothesis. fnovd suggests 2016 is different, but no survey was conducted so it's difficult to say.

I was glad to see a large percentage ignored it; only ~4% of those surveyed stopped editing in an old survey. But how long would you ignore being harassed? And the number that found it upsetting is higher than the number that quit editing. Seems like a preemptive measure.

Google hasn't solved harassment in YouTube comments. Twitter hasn't solved the problem of people tweeting images of lampshades at Jews.

Wikimedia's resources are quite a bit lower.

Maybe we really just can't have nice things.

This is a great donation by Craig. I fully support it. I also think there should be more legitimate alternatives to Wikipedia and some more competition, but there is not unfortunately.

I've tried to edit Everipedia, which bills itself as an alternative to Wikipedia like other alts such as RationalWiki etc. Although they have a long way to go to get their software and UX up to par, their premise is pretty cool. They want to have a live-updating branch of Wikipedia always on their site in real time, for their own community to edit and fork. I'd say the best alternatives so far are Everipedia, RationalWiki (if you can call this an alternative), and smaller niche projects like the Stanford Encyclopedia of Philosophy. There should be more legitimate ones in my opinion, though.

I wonder if there is a larger issue here. I see parallels with sites like StackOverflow: it was fun and productive in the beginning, but now there seems to be an army of users and moderators at the ready to shut down questions they don't like for whatever reason.

I was hoping that this was a donation intended to lessen Wikipedia's nagging for donations...


I mean, it's the single most ambitious volunteer resource on the Internet and almost certainly the greatest encyclopedia to have existed in the history of the world, and possibly among the greatest knowledge resources to have existed in the history of the world.

But, sure. The admins are obnoxious sometimes. I agree with you.

Worst site ever.

I didn't say "worst site ever", I said the community is insufferable. If you don't edit you don't have to deal with the monkeys there and thus the website is, as you've said, the greatest encyclopedia to have existed in the history of the world and blah blah. But don't you dare to register.

Wikipedia has almost as many trolls as social media.

In my experience the editors of Wikipedia definitely deserve to be harassed.

Is Wikipedia getting worse - No

Is Wikipedia losing editors - Yes, because it's 99% done. The work is not there.

Do people not like the fact that what they think is important gets rejected - Yes

Do they blame that on harassment - ?

I really think this is solving a problem that doesn't exist.

Wikipedia is winding down not ramping up.
