A Chinese woman wrote millions of words of fake Wikipedia history (sixthtone.com)
326 points by Zuider on June 29, 2022 | 164 comments



All the comments here are positive, the range is from slightly amused to comparing her to Borges (rofl).

I didn't like it the moment I heard this, but now contrarian mode has kicked in:

Why isn't this destruction of essential information infrastructure? Why isn't this a 'fuck you' to the millions of volunteer hours W is based on? Why isn't this potentially infecting millions of minds with lies?

Why isn't this absolutely deplorable?

(COI Statement: I have been a Wikipedia editor for 15+ years, and I am a member of my local Wikimedia chapter)


Completely agree. If she had released this in a different medium it would be a wonderfully fun achievement (which I think is what most of the comments are responding to), but what she actually did is quite literally a detriment to humanity. It goes against basic, fundamental principles of what it means to be a good person.


> Why isn't this destruction of essential information infrastructure?

Because it doesn't scale. Unlike spam / fake news, the effort to take down something like this is smaller than the effort of creating it.

Now sadly when people start using GPT-4 for the same thing, the balance changes, and we're not far from that. I'm much more worried about that.


That's an interesting take on balance. Although, the determination, not to speak of the will, would usually add to a sentence, the plain damage taken as base-line by some measure, whereas the individual prospect of future damage is a priori limited by the hope that measured judgement will deter, because you shan't condemn future crimes before they happen.


The problem with fake information on Wikipedia is that poorly sourced documents will source it, which will then be sourced by Wikipedia down the line until the misinformation is self-legitimized.


I'm also quite surprised that the impact this could have on how the Chinese view Russia and Russians in general is not mentioned. It's likely not major (I'm not sure how easy it is to access the Chinese Wiki in there), but I'd imagine it's non-negligible.


> Why isn't this destruction of essential information infrastructure

It may be deemed "vandalism" by seasoned editors, though Arb might choose more salient descriptors.

I don't recall right now what the legal opinion on destruction of data is, i.e. irrecoverable erasure in the most trivial case, and I think they wouldn't introduce specific new laws around IT systems if it were that easy. But if it is, then analogies from destruction to somewhat impeding property are well precedented (in .de) even if it's entirely reversible in theory.

1mil words isn't that much work though, so I'd consider this a fairly quick end and not exactly news.


Probably because it's not merely destruction but creation with volume. As a Wikipedia reader I guess I have the luxury to laugh about it, knowing stuff like this is eventually discovered and corrected.


It's of little help to you if it is corrected a day or decade after you read the article.

The thing to remember is to check the cited sources whenever the information is remotely important to you.

See https://www.theregister.com/2017/01/16/wikipedia_16_birthday... or this from Charles Seife:

"Wikipedia is like an old and eccentric uncle. He can be a lot of fun—over the years he's seen a lot, and he can tell a great story. He's also no dummy; he's accumulated a lot of information and has some strong opinions about what he's gathered. You can learn quite a bit from him. But take everything he says with a grain of salt. A lot of the things he thinks he knows for sure aren't quite right, or are taken out of context. And when it comes down to it, sometimes he believes things that are a little bit, well, nuts. If it ever matters to you whether something he said is real or fictional, it's crucial to check it out with a more reliable source."

—Charles Seife, Virtual Unreality, Appendix, "The top ten dicta of the internet skeptic", Dictum no. 1.


Charles Seife describes my feeling about every scientific or journalistic text I read.

What he does not describe is my inability to understand most topics well enough to find out if a source is reliable or not. That a paper got into a famous money-making journal certainly is not a guarantee of reliability.


I didn't think reliable was gradable. Conversely, singular a + more, usually commanding plural agreement, is two juxtaposed determiners selling one off as adverb.

Why not suggest instead to check more reliable sources. Would that sound too diminishing, and an indefinite determiner is added as buffer?

Well, if I tried learning linguistics only from Wikipedia I wouldn't ever find out.


Controversial, but I largely disagree. I don’t think I’ve ever read the wiki page on something I’m an expert on and found any inaccuracies that weren’t due to advances in the last couple of years.


Are you a mathematician?

I am only asking because I understand that most of the higher mathematics pages on Wikipedia are written by actual mathematicians – and pretty much unreadable to anyone who is not a mathematician and already fully conversant with the topics concerned.


I am not a mathematician but yeah the pages I’m talking about are all math-adjacent. Totally fair.


Math as structural science is tangential in [[almost-all]] cases.

Maths, inasmuch as WP can cover it, surely didn't make heaps of advances in the last decades, except perhaps in the pedagogy department (that I can source, but it is literally decades) and category theory to give the whole thing structure (also pretty much cut and dried as soon as Bourbaki, I believe, though reception may take a while).


I have a herd of elephants. I decide one day I get them all really really well fed, and then lead them to the marketplace where they then take a really really big dump on the stalls and the produce and the sellers and the customers. In the end, everything and everyone is just covered, and people are walking knee-deep in elephant poo.

That's not merely destruction but creation with volume.


Though my personal experience with Wikipedia editors has been imperfect, thank you for your commitment to providing accurate information, thank you for your service.


Even worse, what if it isn't a single actor on a single website but a whole swarm of army faking humanity?

E.g. The CCP recently declared that Hong Kong was never a colony of the UK, but "under ruling of a colonial government" [1]

1. https://www.bbc.co.uk/news/world-australia-61810263

Are we aware that in China, people mostly support Russia's invasion of Ukraine because of state level propaganda?


I don’t really see how that example is supposed to be that egregious?

“A colony of the UK” vs “under the ruling of a colonial government” are pretty similar.

Given that the UK’s colonization of Hong Kong was based on a treaty granting it control over the territory for 150 years, after which it was returned to China, the Chinese portrayal seems apt: it was a temporary, though long, occupation.


> The CCP recently declared that Hong Kong was never a colony of the UK, but "under ruling of a colonial government"

Is this supposed to be a difference of fact or a difference of perspective? Colonies are ruled by colonial governments.

edit: e.g. if I kidnap a woman and force her to marry me, her family may not want to refer to her as my ex-wife after they get her back.


> E.g. The CCP recently declared that Hong Kong was never a colony of the UK, but "under ruling of a colonial government" [1]

I don’t really understand the strenuous objection to this. This seems less like “1984-style rewriting of history” and more like completely routine nationalism that you see all around the world.


"You don't need to cite that the sky is blue" applies.

The thing is, in cz.WP they could do that if they want.

> Are we aware that in China, people mostly support Russia's invasion of Ukraine because of state level propaganda?

I cannot say that because of state propaganda I became aware of that. But I sense that you are low key shilling, no offence, so yeah


Yeah this problem affects all countries to varying degrees. Usually not as bad as the CCP. Most of the west have no opinion or don’t support Yemen against the oppressive Saudis who are backed by the US and more of the west. Same issues: propaganda/how News and media is handled.

To be clear: The Russian Govt invasion of Ukraine is awful. They are completely in the wrong. Putin needs to go and another strong man duplicate must not replace him. Don’t get me started on how bad the CCP is!


What always does my head in here is Wikipedia's claim or requirement to present a "Neutral Point of View".

https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_vie...

Theoretically, this would mean "representing fairly, proportionately, and, as far as possible, without editorial bias, all the significant views that have been published by reliable sources on a topic."

The way that Wikipedia justifies not presenting, say, Russian or Chinese views on global politics as prominently as Western views is that Russian and Chinese sources are simply not "reliable".

But of course, that merely begs the question.

In a way, it would be interesting to read a publication that really does present all the competing narratives, if only to learn what people elsewhere are told by their media.

The downside is that it would contain not just different viewpoints but also an even greater number of outright lies, as one would actively have to abandon any attempt to present the truth to the best of one's ability. :/


It would contain a greater number of everything if you resolve to put everyone's perspective in. Would it contain a greater proportion of lies? Maybe - there are a lot more lies about a thing than truths about a thing. And if there's a consensus truth against a 0.005% lie, that sums as 1 truth and 1 lie. But:

1) what it would completely eliminate, though, is lies that are presented with no opposing perspective. And, along with that...

2) it would have more truths. If you're the victim of a lying culture/government, there would likely be truth in one or more of the opposing perspectives, where normally there would be silence.


That is nicely put and actually a good way of looking at it. Thanks.


> Are we aware that in China, people mostly support Russia's invasion of Ukraine because of state level propaganda?

Are you aware that in USA people mostly supported Middle East invasion because of state-level propaganda? China isn't an outlier here.


> essential information infrastructure

lol


I completely agree. This almost feels like a state-sanctioned act. Isn't "Fake News" the theme of the last six years? It's undoing pieces of our society. This is just one more log on the fire.

Wikipedia, despite all odds, has proven the most effective distributor of facts of our time. Shame on those who undermine its credibility.


No, "Fake News" is a term coined to allow a blanket dismissal of anything made public that political leaders don't want to be heard. There certainly can be factual falsehoods in news, and it is worth evaluating how well news organizations are reporting facts... but "Fake News" was the opening salvo before following up with stating that fact-checking is insulting.

Let us not validate the concept by embracing the term. It is a weasel word for people who want to promote their own flavor of misinformation.

As far as Wikipedia goes, this shows a loophole that needs to be closed. I'm amused by it - I can also understand why others are not.


One nit:

"Fake News" is now exactly as you describe: "a term to allow a blanket dismissal of anything made public that political leaders don't want to be heard."

However, IIRC, the original coinage of the term was about disinformation and/or news organizations that consistently traded in misinformation or disinformation.

The speed with which it got co-opted by political leaders as a blanket dismissal was impressive. I'm not sure it even lasted months in its initial meaning.


> However, IIRC, the original coinage of the term was about disinformation and/or news organizations that consistently traded in misinformation or disinformation.

Correct. I recall it initially as a left-leaning term coined in response to the "alternative facts" incident.


The website reporting this is affiliated with the Chinese Communist Party.

https://en.wikipedia.org/wiki/Sixth_Tone

But the story is perfectly real: the link to the community discussion on Wikipedia checks out.

Chinese state media do enjoy reporting – often intelligently – on Wikipedia's foibles (I've been quoted by them a number of times). And I have sometimes wished Western media were equally diligent about digging up stories like this, rather than always reflexively singing Wikipedia's praises. Wikipedia would actually profit from the scrutiny, as I and some Signpost colleagues pointed out at the 2015 WikiConference:

https://commons.wikimedia.org/wiki/File:Journalism_and_the_o...

(The Signpost is the English Wikipedia's community newspaper, https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost )

But it is also clear that this Sixth Tone article is designed to support a political narrative. I would not completely exclude the possibility that it was a state-sponsored effort. The apology the user posted (in Chinese) on the English Wikipedia (someone linked it below) does read quite wooden (I had DeepL translate it). On the other hand, this may simply reflect cultural differences.


This is not serious because if you believe something you read online, you are a sucker.

I am not saying that this is funny or anything, but I feel like it's irrelevant. I assume many articles contain lies or are downright fabricated. This just confirms my assumption, and makes me think of how many things like this one are left to discover yet.


Everyone is a sucker?

I would not believe anyone who claimed they did not believe something they read online. What is the distinction between online and offline these days as a source of information?

It made sense when online was a bunch of random people with no businesses or editors behind it. But when all major organizations and institutions have an online presence, it seems meaningless to differentiate online and offline.


This false history is a literary achievement. Writing such a plausible pseudo-account and convincing so many for so long is impressive. I guess they should take it all down but I'm kind of sad that there aren't any comparable projects (note: I'm aware that there are wikis devoted to creating fictional worlds, but these worlds are rarely historically grounded or plausible)


I’m surprised that she doesn’t publish adapted novels based on her Wikipedia entries.

Might be wrong, but both the Twilight series and 50 Shades of Grey series were initially fan fiction pieces by housewives who published their work on blogs.

Alternate historical fiction is a very popular genre.


50 Shades of Grey was Twilight fanfic with the 'serial numbers filed off'. I don't think Twilight itself has fanfic in its origin.


I think Twilight was the first novel from the author and she pretty much wrote a manuscript, and got a book deal off the back of it. She had little previous experience in writing novels and was just very, very lucky that it happened to all fall into place... some first time novelists spend years trying to do what she did.


Not to take away from hard work and talent but breaking into the arts in a big way--after which it's at least somewhat easier--must be, what, 90% luck.


Forgot to add that Orson Welles started his Hollywood career pulling a much worse public deception.


If you are referring to "War of the Worlds", the radio broadcast indicates it is an audio drama. He couldn't have intended for people to think it is real just because the audio drama is in the form of a news broadcast.


The broadcast began with the usual announcement: "The Columbia Broadcasting System presents Orson Welles and his Mercury Theatre Players in a dramatization of H.G. Wells' novel, 'The War of the Worlds.'"

Listeners heard a dance band playing languid Spanish numbers in the "Park-Astoria Hotel". The music was interrupted suddenly by a flash. It was announced that a professor in a university observatory in the Southwest had noticed a series of gas explosions on the planet Mars.

They may have covered themselves legally, but it’s highly likely they intended to fool people

https://www.nydailynews.com/news/national/war-worlds-broadca...


I'm not sure how that quote is supposed to show it was a prank?

I think you are missing the nuance that radio was a young medium and the radio program was unrealistic and introduced as fiction, so it wouldn't have occurred to anyone what effect it would have on the audience.

The people who were panicking for the most part heard the part about it being a radio production of 'War of the Worlds' but their brain didn't process those words. Nobody could have expected that.

The audience would have also been less media savvy than they would be today, so would react more naively than the producers might have expected to an early experimental example of a "found footage" program.


They are preserved in Wikipedia and ranked according to their longevity:

https://en.wikipedia.org/wiki/Wikipedia:List_of_hoaxes_on_Wi...

There's more or less a competition for them.


I suppose the real way to win would be to add a fake hoax to that page...


Exactly my thoughts. This whole story is wild. Putting over 10 years of effort into this is mind bending.

Some people really love to write. She must have gotten incredibly lost in her fictional world. I can only imagine it must have been the case here, and at some point it just happened to be that the medium she used was Wikipedia because that's where it all started for unrelated reasons.


She made a statement that she should learn a craft, but she already has one. She could definitely start writing a novel in her fictional world.


It was the Chinese Wikipedia. It's not like it would be vetted as closely as the English version, so not that surprising it stayed up so long.


Dunno. In uni I vandalized Wikipedia (oops, I mean created a hoax) with large, believable edits to generic/major articles like "Tree". My edits stayed up until I grew up years later, felt bad, and took them down.

I don't think Wikipedia is as closely vetted as we assume. For one, it's just so much cheaper to create content than it is to verify it. It's pretty amazing that Wikipedia is generally as high quality as it is with this in mind. And one reason why is that I imagine these types of bad actors (vandals making convincing edits just to be a jerk) are relatively rare.

I reckon most of Wikipedia's bad edits come from low-effort vandals and people trying to game high-value articles that have lots of eyeballs.


> I don't think Wikipedia is as closely vetted as we assume.

I keep finding gross errors in random pages that I visit for other reasons. For example:

https://en.wikipedia.org/wiki/French_toast

All of this article's references to French toast being described in the Roman Empire are straight-up lies. Interestingly, this is already noted on the talk page, but that has had no effect on the text of the article.

https://en.wikipedia.org/wiki/Chen_Shimei_and_Qin_Xianglian

In the "History" section of this article, we see that the characters appeared in a book in 1595, and that they are based on a real-life official (and his wife) of the Qing dynasty, which began in 1644.


There's one reference to it on the French toast page and it wasn't made up by the Wikipedia writer. There's a very similar Roman dish, and the original Latin can be found here (recipe 3): https://la.wikisource.org/wiki/De_re_coquinaria/Liber_VII_-_...

This was translated by Joseph Dommers Vehling, whose translation can be found here (recipe 296): https://www.gutenberg.org/files/29728/29728-h/29728-h.htm

Vehling added "[and beaten eggs]", and equates the recipe with French toast, but makes it clear his addition is precisely that: it's bracketed and in lower case.


You seem to want to disagree with my comment, but I can't see where you're disagreeing.

I should note that there are indeed multiple references to the Roman Empire on the page; take a look at the sidebar.

This is the text in question from the History section:

> The earliest known reference to French toast is in the Apicius, a collection of Latin recipes dating to the 1st century CE, where it is described as simply aliter dulcia 'another sweet dish'.[8] The recipe says to "Break [slice] fine white bread, crust removed, into rather large pieces which soak in milk [and beaten eggs] fry in oil, cover with honey and serve".[9]

There are two sentences, and both of them are lies. There is no reference to French toast in the Apicius, and the quote given in the second sentence -- as you've already noted -- doesn't come from the Apicius. Wikipedia is supporting the claim that a 1st-century work refers to French toast by citing material originally written in the 20th century.

The idea that French toast appears in the Apicius is something the wikipedia author just made up, yes.


The Latin is: Aliter dulcia: siligineos rasos frangis, et buccellas maiores facies. in lacte infundis, frigis [et] in oleo, mel superfundis et inferes.

So there is no mention of the eggs, but the thing does have some resemblance to French toast.


> there is no mention of the eggs, but the thing does have some resemblance to French toast.

No, this sentence is self-contradictory. That would be like saying enchiladas date back to the Roman empire because they often combined bread with cheese.


If the bracketed note is simply a suggestion by the translator, at the very least it shouldn't be included in the quote in the Wikipedia article because it's misleading.


That would be a pretty strange change to make by itself. It would solve the problem that the article attributes a quote to the Apicius that doesn't actually appear anywhere in the Apicius. But it would create the much more immediately obvious problem that bread soaked in milk, fried, and served with honey cannot plausibly be described as French toast.


I mean yeah if eggs are essential for it to be french toast then there's no point in including the reference at all.

But an alternative for the wikipedia article I guess would be to just clarify that there was a dish of bread soaked in milk that has been compared to french toast, but that it didn't actually have eggs so it's debatable whether it really counts.


The problem with Wikipedia moderation is that it is all done by people who care about the subject. If you pick a subject that no one cares too much about, you can easily deface it with untrue information.

The other issue is the war cry "you are a sock puppet!" Many seem to use this to try to force their opinion to be heard. I saw this first hand once and it can get ugly.


> If you pick a subject that no one cares too much about, you can easily deface it with non true information.

I dunno, I tried a while back to see how it all works and made some edits on a few articles that I figured were pretty backwater. I wasn't defacing, I was trying to correct with valid info and attempting to follow the editing guidelines. It all just got quickly reverted by what I assume were bots, calling it 'vandalism'. I moved on.


Kozierok's Law accounts for much of this:

"The apparent accuracy of a Wikipedia article is inversely proportional to the depth of the reader's knowledge of the topic."


This sort of thing has happened on English Wikipedia as well. Remember the Bicholim conflict?

https://www.dailydot.com/unclick/wikipedia-bicholim-conflict...

Or the Amelia Bedelia hoax?

https://www.dailydot.com/unclick/amelia-bedelia-wikipedia-ho...

More here:

https://www.theregister.com/2017/01/16/wikipedia_16_birthday...


Some of this content was also in the English Wikipedia – it had been translated from the Chinese Wikipedia. See the deletion discussion here:

https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...

Admin noticeboard thread:

https://en.wikipedia.org/w/index.php?title=Wikipedia:Adminis...


> Writing such a plausible pseudo-account and convincing so many for so long is impressive.

I find that in practice, the scary part is that it is not so impressive and that the thousand eyes are largely a lie.

All too often have I seen such things go on for a long time without anyone noticing, or perhaps many noticing, but not being motivated to investigate and call out, or perhaps many even calling out, but their calls going unanswered.


And perhaps scarier: Wikipedia is one of the better policed places.

If you work in a commercial publisher, there's less oversight.


Well they don't have the same threat model. Teenagers and grifters can't edit as easily in the case of a commercial publisher. That's not to say that they don't have their flaws, or that they are superior to Wikipedia. Just that the analysis is off the mark.


I routinely find spam, vandalism and heavy POV-pushing in English WP. In general, I check the sources, and if they are absent or fail to support the article text, I correct them[0], or challenge them on the talk page.

Do you? When you say you see such things going on without anybody noticing, presumably you noticed?

[0] I don't generally edit articles on politically-sensitive topics. They are wargrounds, patrolled by tough gatekeepers. I take any information in WP about current affairs with a bushel of salt. But it's still better than, for example, mainstream media.


I often notice after the fact when it blows up.

Consider the situation with XScreenSaver in Debian, when it was found that the code had a timer in it that bugged people to update when using an unsupported version. Once the timer reached this point and many received this prompt, Debian immediately patched it out, as many found it annoying and it was a controversial move. But to me the most interesting part was that it was in the code, publicly; it was added at one point, and no one at Debian knew.

This timer that merely annoyed users into updating could just as easily have been serious malware that no one would have noticed, lying dormant to awaken at a set date.


The Assassin's Creed video game franchise weaves real history into its fictional story line. Playing the game or watching the game movies on YouTube can yield interesting Wikipedia rabbit holes.


While I understand your initial reaction, I implore you to think again about the potential consequences of her actions.

As a not-so-far-fetched example, remember that QAnon started out as a similar sort of 'blended reality fiction' (before it was likely overtaken by state-level propaganda agents). The only difference is that QAnon was based on giving a new interpretation to reputable third-party texts, while this goes one step further to actually fabricate seemingly original texts by a reputable source - I'd argue that's even worse.


"false history" -- is there any other kind? It is like the difference between a cult and established religions: the main difference is scale and time.


If you have trouble with the concept of 'truth', just go by what's useful.

That's, e.g., why financial newspapers often stick closer to the truth: their readers want to be able to play the markets, so they need something with a bit of a reality check, instead of just playing to their ideological preferences.


Importantly, this only applies to the subsets of financial media that traders pay money for. Bloomberg's free news is garbage opinion pieces, but the stuff you pay (a lot) for is generally decent.


> financial newspapers often stick closer to the truth

Are you sure?

From my limited experience, there is a huge amount of fictitious narrative in financial news. I just had a look at https://www.barrons.com/ - hard to say how much is just opinions of the journalist, but not much looked fact based.


Older articles are easier to check for correctness.


If a journalist made random predictions, how would you know?

If a journalist made a random prediction, and market participants believed the prediction, so the market then did what the journalist predicted, how would you know?


> If a journalist made random predictions, how would you know?

If they consistently randomly get lucky, that's fine. Just check multiple stories to get a sense of the distribution.

> If a journalist made a random prediction, and market participants believed the prediction, so the market then did what the journalist predicted, how would you know?

If all you care about is whether the prediction is accurate (so you can trade on it), you don't need to worry about the philosophical differences between an accurate prediction and a self-fulfilling prophecy.


Correct.

History in an ideal world tries to record what happened.

But realistically, history is a tool for politics.

But judging by the heavy downvotes of your comment, that fact doesn’t seem to be popular.


Even a biased account of history is a record of history, in some ways even more interesting when you have other biased accounts to compare with.

Goofy as his methods are, Herodotus is a very compelling read specifically because it's not one coherent narrative, but a collection of points-of-view (none of which may be entirely correct, but reflect what people claimed at least).


> that fact doesn’t seem to be popular.

No, it's because he questions the concept of truth itself. Literally everybody knows history writing is manipulated for many reasons.

However, in my view historiography is often far more complex than simply reflecting political desires.


I don't question the "concept of truth." Ironically, the truth is corrupted already.

To answer the question "is there any other kind" it would be enough to provide one [just one] example of a truthful history book (it is ok if it is imprecise in details as long as it is accurate overall--think physics theories within their application domain--we are far far away from history resembling hard sciences).

The example would demonstrate falsehoods you believe in.


> Zhemao published an apology letter on her English Wikipedia account, writing that her motivation was to learn about history. She also wrote that she is in fact a full-time housewife with only a high-school degree.

This is not the full story: she originally said the motive was to win online debates with Wikipedia references (created by her alias account).


Then this article is disingenuous, because it did not state it as part of her apology, which would have put some context to it.


What kind of debates?


Keyword: 贴游 or 文游

Scenario simulations based on eu4, hoi4 battles (possibly with mods). Both teams would set off from a historical event, and external events would also affect the progress.

It's a somewhat popular online sport & sub-culture in China. People go to extremes for historical "accuracy" or just for the sake of it. Tons of memes are generated without people noticing.

In this example, Zhemao can claim something like "I should have extra 2000 silver starting at my castle because check out this Wikipedia article". She surely has played many games in her favor.

Her apology: https://en.wikipedia.org/wiki/User:%E6%8A%98%E6%AF%9B


There was a similar scandal for the Scots wikipedia:

https://www.theguardian.com/uk-news/2020/aug/26/shock-an-aw-...


I don’t see any similarity between the scandals beyond the surface-level one of a single person generating a lot of content on Wikipedia that was later reversed. Your linked article is about some teenager putting up mistranslated garbage. That’s very boring vandalism, notable only due to its scale. TFA is about someone creating a vast collection of plausible-sounding history out of thin air, which is fascinating and arguably more damaging for Wikipedia.


Even worse (or better, depending on how you look at it), she interconnected her made-up history with historical places and persons - e.g. the principality of Tver did exist, so it's probably hard to tell which of the details about it are fact and which are fiction. That's almost Dan Brown-level stuff...


The Scots Wikipedia issue isn't straightforward at all. For one, considering the sheer effort spent in 170 000 edits over a decade, it's quite possible that the editor was acting in severely misguided good faith. More importantly, it raised questions about the utility and proceedings of Wikimedia sites with low traffic; a substantial discussion took place in an RFC[0] that included proposed audits to prevent future incidents, but nothing really went through other than increased attention to the long-existing Small-Wiki Monitoring Team. And the off-wiki effects...poisoned datasets and damage to the language itself!

Perhaps proposals from the RFC will be renewed in light of this, though it's not the same situation as the Chinese Wikipedia isn't really small. It's known for questionable circumstances regarding adminship and other user behaviour, though, and is generally quite insular. So, unsurprising that this story hasn't received much attention outside of Chinese Wikipedia or news. On-wiki, the only pages for it currently are that on zhwiki and a corresponding page on English Wikipedia with a brief summary.[1]

Also, enwiki's own newspaper has a more informed article on the Scots incident,[2] also with some discussion (there's also an HN post). By the way, I remember an article or Wikipedia page about how journalism about Wikipedia persistently lacks nuance or understanding of its customs (basically a community-wide Gell-Mann amnesia effect), but can't find it now...anyone happen to just know it?

[0] https://meta.wikimedia.org/wiki/Requests_for_comment/Large_s...

[1] https://en.wikipedia.org/wiki/Wikipedia:Fabricated_articles_...

[2] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2...


Media reporting on Wikipedia (with few exceptions, like Stephen Harrison on Slate) generally consists in regurgitating Wikimedia press releases. So issues like Wikimedia's dodgy fundraising practices that could do with investigative journalism for example often fall by the wayside:

https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2...

I and some others from the Signpost, the English Wikipedia's community newspaper, gave a presentation at the 2015 WikiConference about the lack of good Wikipedia journalism:

https://commons.wikimedia.org/wiki/File:Journalism_and_the_o...


> and damage to the language itself!

What could this mean? Sometimes people do talk about the health of a language, but that almost exclusively refers to the number of living speakers.


Being freely licensed, the Wikipedia corpus is widely used as input for AI tools such as machine translation.

https://www.theregister.com/2020/08/26/scots_wikipedia_fake/


How does that hurt the language itself? What would it mean for the language to be hurt?


Any people wanting to learn the language using these resources (e.g. learning more Scots from the Scots wikipedia) would mean that they learn the altered/false/different language instead of the actual language, and afterward that changes how the actual language is used in practice, propagating the misconceptions onwards and degrading the language.

In a similar way, if some people online are using they/their/they're interchangeably, and then other people (who are learning the language) learn that they are interchangeable and start using them this way, then English gets hurt by being altered in this undesired way.

Some changes to language are considered desirable (e.g. introduction of new terminology for new concepts, or restructuring either as a natural process or top-down reforms that makes it more clear and thus more useful for communication) and some are not (ones that increase confusion such as the they/their/they're example above), and the latter ones are considered to hurt the language.


> Some changes to language are considered desirable (e.g. introduction of new terminology for new concepts, or restructuring either as a natural process

The problem is that what you describe in your first paragraph:

> Any people wanting to learn the language using these resources (e.g. learning more Scots from the Scots wikipedia) would mean that they learn the altered/false/different language instead of the actual language, and afterward that changes how the actual language is used in practice

is just a description of how languages change as a natural process. It's not different in any way. Concluding that in this case it is "damage" and "bad" would require you to conclude that all natural language change is also "damage" and "bad", which is admittedly a popular viewpoint. But it's one you're trying to disavow.


The particular "changes" introduced by the teenager editing Scots wikipedia are very much unnatural, artificial change, and one that is done in a systematic way, "erasing" (by ignoring, simply due to the author not knowing the language) the actual Scots words and replacing them with calques from English.

If other people speaking Scots would start using them because they simply prefer to do so, I could consider that as part of natural change of language; however, if people who want to learn Scots are misled and instead get taught examples from effectively another unrelated language ("NotScots"? "StupidScots"?) that are falsely labeled as being Scots, then that has nothing to do with natural language change and is pure damage.


> The particular "changes" introduced by the teenager editing Scots wikipedia are very much unnatural, artificial change, and one that is done in a systematic way, "erasing" (by ignoring, simply due to the author not knowing the language) the actual Scots words and replacing them with calques from English.

You say that like that isn't a ubiquitous process. Copying the usage of someone who was unfamiliar with some other usage is... almost the entirety of language change. The rest of it is copying the usage of someone who was being deliberately weird.


I wouldn't say that what happened at Scots Wikipedia is a "natural process". It's one single person lacking language knowledge being given a truly inordinate amount of influence.


AI tools and machine translation would result in the propagation on the internet of new prose, purportedly in Scots, but in fact synthesised based on the constructed language of scowiki.

Real prose in Scots is scarce enough already; if scowiki is used as a corpus for training AI, that is certainly a threat to the language.


In both cases people pretended to be experts (history / Scots), it worked, and a large number of inappropriate articles stayed public for a long time.


> I don’t see any similarity between the scandals beyond the surface level one person generating a lot of content on Wikipedia that was later reversed.

That's the similarity.

It's like saying you don't see any similarity between Maradona and Messi, other than both of them being very successful Argentinian players.


It’s hard to not see this as Borges’ “Tlön, Uqbar, Orbis Tertius” come to life. For a hoax, it’s still an impressive feat of fictional world building.

cf. https://en.wikipedia.org/wiki/Tl%C3%B6n,_Uqbar,_Orbis_Tertiu...


Michael Scott: "Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you are getting the best possible information."


It might be worth noting that Sixth Tone is state media overseen by the Shanghai CCP, intended for English speaking audiences.


OK but what's the upshot wrt this story? Now that someone knows about its connection to the CCP, does it change how we ought to think about it in any way?


Yes. If this wasn't state-driven, it'd be a somewhat-humorous prank. If it is state-driven, it becomes a situation straight from 1984, where the protagonist's job is quite literally to rewrite history.


If you applied the same criteria to Western media, you'd conclude that the vast majority of them is state-driven too.

(Note that this is not whataboutism, but rather a proof that this behaviour is widespread and pretty much normal, which contrasts with the article's rhetoric)


I do, and I have. That said, the extremely-predictable "what-about-western-media" tu quoque that occurs anytime someone levels a criticism at any Second World nation is as tired a counter-propaganda strategy as the propaganda it's used to deflect attention from.

Edit: just noticed that you're a self-proclaimed 50 Cent Army member. It's all making sense now - you're an actual propagandist. For anyone curious: https://en.wikipedia.org/wiki/50_Cent_Party


Wow. Sometimes here on HN I've seen comments that totally sound like CCP garbage but didn't have proof that they were actually connected. Thanks


Fascinating how for some Americans everything that is not their own state propaganda sounds suspicious; just like in USSR in the fifties.


Doesn't the fact that it's reported in state-owned media suggest, although not prove, that it's not state driven? Why would they invite extra scrutiny to their disinformation campaign by publishing an article about it?


Not really. It's a common strategy called "controlling the narrative" and it has been in play since the early days of public relations and propaganda, largely considered to be pioneered by Ivy Lee[1].

Regardless, I only posed it being state-driven as a hypothetical anyways. But state media reporting on the matter does little-to-nothing towards disproving that hypothesis.

[1]: https://pracademy.co.uk/insights/ivy-lee-and-the-origins-of-...


Very specious of the CCP to think English speaking audiences have any interest in what they have to say.


Borges, Umberto Eco, Thomas Pynchon, and Danilo Kiš would be proud; what else is the world's historical narrative but an edit war?

Such a fantastical lie wrapped around such utter banality; surely this will warrant her own Wikipedia article.


I gave up on wikipedia after adding beautiful photos of relevant chemical reactions I performed during University and watching random accounts remove them for no reason and replace them with... nothing.



Doesn't really apply to pictures especially on wikicommons.


I hope these articles are just mirrored onto a separate wiki site where more people can jump in and elaborate on the alt-world history that Zhemao started. Unfortunately I can't read Chinese, but I do for some reason find fake but entirely real feeling historical accounts very compelling.


Imagine a traveler from an alternate universe getting fed up by editors constantly rolling back their edit of A to E in Berenstein, and just documenting everything they remember from their prior life.


Wikipedia has an article on Magyaráb people.

They do not exist. First a Hungarian aristocrat was hoodwinked by some locals for money in Egypt in the 1930s, then a Hungarian student spending two semesters in Cairo in the 1960s rewarmed the issue to prove his worth and successfully sold himself as an expert on all things Arabic in Hungary -- he never managed to get even a degree -- then finally in the nineties a far-right weekly dug up the story for nationalistic purposes.

Finally an expedition was sent in 2006-2007, which found with absolute certainty that these people do not exist. Their report is linked from Wikipedia.

I tried to correct the article, I tried to delete the article, it was refused saying it's notorious enough to have an article on Wikipedia ... ... ... seriously?

Similarly, Hungarian prehistory on Wikipedia is completely outdated. Most of the "primary sources" it lists are completely unreliable; they were not even written with the intent of being reliable history (the Annals of Fulda and the Annals of St-Bertin being remarkable exceptions). It notes "their reliability ... is suspect", ignoring Tamás Hölbling absolutely tearing them apart in 2010 in his massive two-volume source criticism book. They are much closer to a comic book today than a historical book. It completely omits all the remarkable archaeogenetic findings since 2008. It completely ignores an absolutely groundbreaking symposium held in 1999 (it brought together Indo-European and Uralic researchers, both archaeological and linguistic, which had never been done before); the second edition can be found at https://www.sgr.fi/sust/SUST242.pdf . Overall, the Wikipedia article reflects the scientific consensus of the 1970s or so.

I tried to fix https://en.wikipedia.org/wiki/Abu_Abdallah_Jayhani because Gayhani is incorrect; one spelling which could be used is Ğayhānī (e.g. https://www.degruyter.com/document/doi/10.1515/islm.1998.75....). It got reverted with "The other names in the lead are not supposed to have custom alphabets or whatever they're called." Whatever, eh? But hilariously enough, Wikipedia itself has an article on the romanization of Persian, and you can check that article and see for yourself that "G" by itself is never used to transliterate a Persian letter no matter which scheme you pick... but whatever!

I gave up on trying to fix Wikipedia.


Yeah, you should look into Arabic Numerals: the Hindu Nationalists took over the page, and it now presents them as originating from India, which they don't. They don't really originate from a single country at all; they come from the Silk Road. But anyways, they use "sources" which result in he-said-she-said arguments, and at one point they simply list an old elementary mathematics book as their source. The level of inaccuracy in many Wikipedia articles is frighteningly high, especially in those maintained by "official" channels.

You should read sources on how Jewish Americans helped Black Americans. The sources flat out state that during the period Jewish Americans did nothing to help Black Americans during segregation and at times actively encouraged it.


I'll add a grievance here too. The Timeline of Japanese History article really tends to skip over the less than nice parts of Imperial Japan, especially during WW2.

From ~1942 to 1945, there are no entries, for example. I know it's kinda covered in the Timeline of WW2 article, but the atrocities committed by the Imperial Japanese are not mentioned.

Whitewashing is real on Wikipedia. I know it's a tired meme, but you really do need to do your own research nowadays.

https://en.wikipedia.org/wiki/Timeline_of_Japanese_history#2...


Jewish support for the civil rights movement is well documented, including strong participation in the March on Washington.

Why would you bring that particular topic up out of the blue?


Which version of Wikipedia? Do you have a link? (Edit:) Ah, never mind:

https://en.wikipedia.org/wiki/Magyarab_people


It begs the question: how many people find themselves weaving webs of lies in places like Wikipedia to keep their personas/identities intact?


It makes no difference to me, because I don’t read Wikipedia articles. I do use Wikipedia to find references and public domain illustrations, though.


okay


Not sure that's relevant?

But those illustrations could be full of lies just the same?


Forget Wikipedia, how much of history in general is largely made up or fluffed.


So much of it is that when archaeologists confirm a historical account, it's news. I recall when King Richard's body was dug up under a parking lot, the historians were shocked to discover he really did have a seriously curved spine. They had thought that the accounts of his disability were just propaganda by his enemies.


No, it's not news most of the time. The situation you are referencing is an extreme case and that's why it made news, and it happens to be a very well known story with the evidence being literally in the capital city. Seems like you are suffering from an extreme case of survivorship bias.


It seems whenever I have in-depth knowledge of something, and see a journalist's reporting on it, it's all wrong.

For example, I was in a building once when a natural gas leak caused the roof to blow off. I recorded 3 local newscasts about it later that evening. They all got major facts completely wrong. They said it was a warehouse (it was an office building). They said the building was cleared before it blew (I was in it when it blew; the firemen never suggested I leave).

Another one is the 737MAX crashes. The popular press consistently misrepresents and misreports it. (I've posted about that here many times.)

And, quite obviously, the mainstream media in the US has been completely misrepresenting current events. Do you think that's a modern phenomenon? I don't. I suspect it has always been happening, it's just that the internet has exposed it.


"What people outside do not appreciate is that a newspaper is like a soufflé, prepared in a hurry for immediate consumption. This of course is why whenever you read a newspaper account of some event of which you have personal knowledge it is nearly always inadequate or inaccurate. Journalists are as aware as anyone of this defect; it is simply that if the information is to reach as many readers as possible, something less than perfection has often to be accepted." —David E. H. Jones, in New Scientist, Vol. 26

Relevant in this context inasmuch as many Wikipedia articles are based on press sources.


This person should get together with the random brony who messed with the Scottish language Wikipedia.

https://www.google.com/search?q=scots+Wikipedia+brony&oq=sco...


Launched on April 6 [2016], garnering some immediate attention from the Western China-watching community, Sixth Tone is a Chinese invention: a media start-up under party oversight that features a slick, attractive website and appealing headlines designed to entice Western readers.

It's working.

https://foreignpolicy.com/2016/06/03/china-explained-sixth-t...


How beautiful is her apology. From the article:

Zhemao said she made most of her fake entries to fill the gaps left by her first couple of entries she edited. “As the saying goes, in order to tell a lie, you must tell more lies. I was reluctant to delete the hundreds of thousands of words I wrote, but as a result, I wound up losing millions of words, and a circle of academic friends collapsed,” she wrote. “The trouble I’ve caused is hard to make up for, so maybe a permanent ban is the only option. My current knowledge is not enough to make a living, so in the future I will learn a craft, work honestly, and not do nebulous things like this any more.”


A "fictional Wikipedia" seems like a nice project. Something that could grow organically from contributors with an effort to cross-link and cross-reference.

Is there anything like this?


SCP Foundation comes to mind https://scp-wiki.wikidot.com/. An entire wiki of a loosely interconnected series of supernatural events and entities. It's very easy to get lost in it.


Are they now redirecting (from scp-wiki.net and scpwiki.com) to the Wikidot URL for SEO?


If you by chance understand German, there is https://www.stupidedia.org


A Sci-Fi version of this is Orion's Arm project: https://www.orionsarm.com/



Uncyclopedia and SCP come to mind.


"This Tweet links to a China state-affiliated media website. Find out more".

This is what I am seeing on Twitter on tweets linking the above article. Good additional information.


When I was a kid, I enjoyed making up fake history and fake world political maps just for fun lol. I also designed flags for each country I came up with. Great time.


all history is narrative, with a greater or lesser degree of fidelity to reality. This example should be preserved in some way to remind us of this important fact


I don't think it's too surprising that stuff like this can happen on a non-English Wikipedia. The English one has superb moderation and reliability, but other languages range in quality from pretty good to complete garbage. It's a shame because often the only comprehensive article written about something will be in another language.


>The English one has superb moderation

I don't think I've ever seen that claim before. It's... maybe true of some articles.


Yeah, the English Wikipedia definitely has similar problems, though they may not be on such a grand scale as this. For instance, I know there's one user who, for at least a decade, has basically been trying to breathe life into a new religious concept with a Wikipedia page for it. He's persistent enough, benefits from a relatively frequent typo for sourcing, and his corner is obscure enough that nothing can be done (given Wikipedia's culture and his persistence).


Sounds like a curious work of fiction. I hope they are not going to just delete that.


I wonder if you could extract all of her contributions as a patch and use one of the deep learning language models to fill in the gaps. Maybe there's a great work hidden in there.


It's worth noting that sixthtone is a Chinese state-affiliated website. But the link to the community investigation on Chinese Wikipedia bears them out.


While it is admittedly amusing, methinks it's not harmless at all. Vandalism of this sort undermines the legitimacy of information sources more broadly. It was in fact a deliberate tactic of (incidentally Russian) propaganda outlets like RT to mix actual facts with fairly obvious lies and the appearance of a "mainstream" news source. The resulting effect that it had on consumers was that they'd say things like 'yeah, some things there are lies, but there are lies on both sides, you can't really trust any of them'. Or remember QAnon, which also derived its original success from blending authentic information sources with fabrications.

I'm not implying that she intended anything of that sort (although some of the other comments suggest that she may have used the fabrications to her advantage in smaller settings), but that her actions could have ended up producing similar results.

This is decidedly different from deliberate world-building that takes inspiration from the real and mixes it with the fictional, like what Tolkien did. It would have been exponentially harder to get anyone to care about her stories if they hadn't come with the wikipedia veneer (somewhat sadly, it might be easier for her now that she got global media attention).


I love this. I wish it was in English. It sounds a lot better than the regular Wikipedia.


Wikipedia isn't perfect, but it's the best we've got and it can be improved further.


Does Wikipedia no longer require you to cite any sources?


I hope she gets her own wikipedia page about this


Which means this can happen to any form of Wikipedia.


Is there an Ig Nobel prize in Literature?


> some Wikipedia editors warned that the incident had “shaken the credibility of the current Chinese Wikipedia as a whole”

That might have been the goal all along.


Not surprising


What could she do with GPT-3 and DALL-E?


GPT-3 does not have the long-term memory required to create a convincing book-length narrative. Not that that's a guarantee that GPT-4 will be thus limited.


America has probably rewritten history more times than we'd like to acknowledge. There are some countries that will jail you for questioning the official narratives. I have a feeling America is headed in that direction, one ideology or another.. eventually if you question it you will be labeled some sort of name to elicit the worst reactions from people towards you.



