
Sweden just claimed the world’s second-largest Wikipedia - lelf
http://www.washingtonpost.com/news/the-intersect/wp/2014/08/01/sweden-of-all-places-just-claimed-the-worlds-second-largest-wikipedia/
======
_delirium
I'm not sure I'd say it's the 2nd-largest. Second-most articles, yes, which is
a different metric. Total _size_ of an encyclopedia has to include not only
the number of headings, but how much text is under them! For traditional paper
encyclopedias, something like word-count is typically used, though that is
tricky to use across languages, since languages have different notions of what
constitutes a word (and different information rates, if you want to get into
that). Something like "compressed size of the database" might be an
approximation for encyclopedia size that normalizes for different languages'
use of words & UTF-8 bytes.

~~~
fiskpinne
I agree. I'm a Swede myself, but I almost always use the English version
because the Swedish articles usually contain very little information.

~~~
chton
It's exactly the same thing for me in Dutch. There are a lot of articles,
sure, but the content is woeful. And there aren't even bots that artificially
inflate the numbers there.

~~~
kalleboo
I agree for general articles, but for an article that has regional
specificity, I'll look up the article in the local language and use an online
translator.

~~~
_delirium
Yeah, I do that with French and German especially. For very well-known people
and places, the English article is usually good, but if you want something
only moderately known, like a French writer who is historically important, but
not Voltaire-level famous, there's a good chance the other (in this case
French) article will be more complete. Also true for Japanese Wikipedia
articles on Japanese people and places. But Google-translated
Japanese->English is hard to read, so I unfortunately rarely consult those.

On the other hand, some of the smaller-language Wikipedias can be quite one-
sided and nationalist when it comes to local topics, especially disputed
history.

------
a3_nm
The number of articles is a terrible metric of the quantity of content of a
Wikipedia. A far better way to measure this is to look at the size of the data
dumps of [http://download.wikimedia.org](http://download.wikimedia.org)
(compressed so as to normalize differences in information density between
languages).

(Shameless plug: I tried to do that three years ago
[http://a3nm.net/blog/wikimedia_projects_by_size.html](http://a3nm.net/blog/wikimedia_projects_by_size.html)
-- it could probably be done again today, though the code would need to be
refreshed a bit.)
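A minimal sketch of the metric (not the code from the blog post above; `zlib` here just stands in for whatever compressor you would run over the real dumps, and the sample strings are made up for illustration):

```python
import zlib

def compressed_size(text: str) -> int:
    """Rough language-neutral size: compress the UTF-8 bytes.

    Compression squeezes out redundancy that varies between languages
    (word length, UTF-8 byte width, boilerplate phrasing), so the
    compressed size approximates how much the text actually says.
    """
    return len(zlib.compress(text.encode("utf-8"), 9))

# Toy illustration: a bot-style stub that repeats one template sentence
# compresses to far less than its raw byte count suggests.
stub = "X is a species of plant first described by Y. " * 50
prose = ("Each sentence in this string is different from the others, "
         "so there is little redundancy for the compressor to remove "
         "and the compressed size stays close to the raw size.")

print(len(stub.encode("utf-8")), compressed_size(stub))
print(len(prose.encode("utf-8")), compressed_size(prose))
```

Comparing per-language dump sizes this way penalizes templated stub farms automatically, which raw article counts never do.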

------
arketyp
Before Wikipedia there was a Swedish collaborative encyclopedia called
susning.nu [1]. It eventually shut down due to problems with vandalism, and in
the meantime sv.wikipedia had grown larger. I suspect a lot of articles were
migrated from susning to the wiki, or at the very least the community was.
Interestingly, susning roughly translates to "grasp", as in a brief
understanding.

"Because of this, Susning grew and became Sweden's biggest and the world's
next biggest wiki." [1]

[1]
[http://en.wikipedia.org/wiki/Susning.nu](http://en.wikipedia.org/wiki/Susning.nu)

~~~
iamtew
I remember we relied a lot on it in the early 2000s, as Wikipedia at that
time contained more general information, and susning.nu was useful for more
local (to Sweden, at least) information.

------
igravious
> And it certainly doesn’t know that there are only 9 million people around to
> read its output, versus the 75 million who speak French, or the 78 million
> who speak German.

Minor nitpick:
[https://en.wikipedia.org/wiki/List_of_languages_by_number_of...](https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers)

FR: 74m

DE: 89m

That's assuming Wiki P is correct-ish and that the page is not some
Teuto[+]-bot-generated nonsense.

[+]
[https://en.wiktionary.org/wiki/Appendix:English_nationality_...](https://en.wiktionary.org/wiki/Appendix:English_nationality_prefixes)

~~~
legulere
There are over 80m people living in Germany, which is already more than 78m,
and that is totally ignoring German speakers in Austria, Switzerland, Belgium,
Denmark, Italy and Poland. I don't know how they came up with those numbers.

~~~
barry-cotter
You forgot France, Liechtenstein and Namibia.

------
lucb1e
Why actually spend all this effort on localized Wikipedia articles? As
someone commented, many people read the English Wikipedia because it contains
more info. At least I do, I know most of my peers do, and a Swedish commenter
(fiskpinne) just mentioned it. Sure, local articles are good to have for many
basic subjects, but when you really want to get into something you probably
know English.

People shouldn't need to read articles like the one about cryptography (just
picking something technical at random) in a non-English language beyond what
the word means and a general overview. There are so many English resources,
and so many people able to read English, that if you're going to contribute
resources, encouraging people to learn English would do more good than
translating everything just because we can.

~~~
M2Ys4U
>Why actually spend all this effort on localized Wikipedia articles?

Because most people can't read English.

If the world's best encyclopaedia were only available in Swahili, would you
still be asking "Why actually spend all this effort on localised articles?",
assuming of course that you're not a Swahili speaker?

~~~
lucb1e
If almost everyone in the developed world spoke Swahili, it seems pretty
reasonable to say that.

~~~
emilsundberg
What? Wikipedia is for everyone. Most of the world's population does not
speak English.

As a Swede, I use both the Swedish and the English Wikipedia. It depends on
what I'm looking at. Swedish Wikipedia contains more in-depth content on a lot
of topics.

------
kylebgorman
Why is this attributed to "Sweden" rather than to "Swedes" or "Swedish
speakers"?

* Swedish is the first language of about 5% of the population of Finland

* Swedish is supposedly mutually intelligible with Norwegian

* Many minority languages are governmentally recognized and spoken by significant minorities in Sweden, including Saami and Finnish

There should be a name for this fallacy of equating language and nationality.

(edit: newlines)

------
asgard1024
This article poses a real question: whether computer-generated articles are
suitable for an encyclopedia consumed by humans.

Speaking of this, I would love to see Wikipedia in a machine-readable
language. Something like the Cyc project.

Edit: I know of Wikidata, but there should be an unstructured analog to that.

~~~
hav
As long as the content of those articles is relevant and accurate, I don't
see how that would impact the suitability of the articles in question.

Does it really matter whether a bot or a human created an article, as long as
it contains facts and not thorough analysis?

Disclaimer: I haven't read any of the bot generated articles (at least, I
don't think so).

~~~
asgard1024
I agree with your point. It's generally better to have some article than no
article. But Wikipedia is also supposed to be readable by humans, to provide
insight and understanding. I am not sure that a mere formatted collection of
data, as opposed to a real article, does that. And I believe there are many
Wikipedia editors who feel the same.

Though, wouldn't it be more efficient to put the data into some
machine-readable form, if it was already processed by machines anyway? This
would then allow anybody to create an algorithm to extract information
(meaning) from the data.

So I think there are also good reasons to keep the human- and machine-created
articles separate.

~~~
pilsetnieks
It's not so much a formatted collection of data as it is like an excerpt from
an encyclopedia. It's sparse, and the articles are pretty similar to each
other, but it has proper sources, and someone might actually find the
information useful. It could also serve as a starting point for some
enterprising Swedish biologist to improve on.

------
rasz_pl
Of course it is, because it is full of GARBAGE made by this guy

[http://online.wsj.com/articles/for-this-author-10-000-wikipe...](http://online.wsj.com/articles/for-this-author-10-000-wikipedia-articles-is-a-good-days-work-1405305001)

His stupid bot pollutes wiki with useless crap like this:

[https://sv.wikipedia.org/wiki/Lepechiniella_persica](https://sv.wikipedia.org/wiki/Lepechiniella_persica)

~~~
antocv
Hehe, that guy was my physics professor. He is a genius. Had a majestic beard
as well. His stories and teaching methods reminded me of Carl Sagan's style.

But his bot is quite annoying, yes.

------
krick
My first reaction is to say it's pretty cool or something, but actually it
isn't when I look at it more closely. You see, I don't really care about lots
of information gathered on a website with one domain. In fact, ideally I would
wish for exactly the opposite: if some information can be easily classified,
then it should be. What I mean is that adding every new article to Wikipedia
has its own drawbacks as well as value. Recently I was thinking about how
cool it would be to have Wikipedia accessible offline on some portable device
-- and it's nothing new, I remember edited versions of Wikipedia for mobile
devices before smartphones became popular -- but this sort of thing isn't
very useful in many ways. On the other hand, realistically I would assume
that all the content I will ever need to look up in my portable wiki easily
fits on modern storage cards, even with some media data. Even more than that:
many articles become useless to me if images are removed. The only real
problem is that I don't know what I _will_ look up, and it's very hard to
define what's useful.

Anyway, the perfect layout for animal articles is significantly different
from the perfect layout for math articles, so instead of populating Wikipedia
with rather obscure information not verified by humans, the _really_ useful
work would be making easy-to-use thematic resources, since the existing ones
made by academics are usually totally unusable. That's where structured
information should go in the first place.

However, while bots on Wikipedia often do some useful work, what really makes
Wikipedia useful is mindful human work. The MediaWiki engine isn't perfect
for structuring information and is useful specifically for free-form,
unstructured data populated by humans. I don't really see the use of
populating Wikipedia with thousands of auto-generated texts.

------
topbanana
What a dreadful chart

~~~
m_mueller
When a chart is harder to interpret than a simple table with the same
information, you know something went wrong. Or at least you should.

------
EarthLaunch
So they're fine with a bot filling in obscure information that no person
decided was worthwhile to add manually, but if you add something manually it
can easily be deemed too obscure to be useful? In a strange way, this policy
is a logical extension of the actual (not the claimed) nature of deletionism.

------
SEJeff
Likely exclusively due to this guy:

[https://news.ycombinator.com/item?id=8033600](https://news.ycombinator.com/item?id=8033600)

Who runs a bot that makes millions of shitty stub articles.

~~~
smackfu
Which is most of what this article is about...

~~~
SEJeff
I thought it was fitting to link to the HN discussion, but actually linked it
before reading TFA. Shame on me.

------
dschiptsov
Isn't it about quality, not quantity?)

------
ilaksh
I wonder if there is an alternative to Wikipedia that doesn't suffer from the
same insular type of "Self-Appointed Guardians of the Galaxy"?

I mean one with a more sophisticated system for decision making rather than
just bureaucracy, fiefdom, politics, and bribery?

~~~
sparkzilla
If you are looking for a crowdsourced news archive and biography information
(which accounts for around half of Wikipedia's pages) you can check out my
site, Newslines [1], which fixes many of Wikipedia's software, data
presentation, and policy issues. We do this by 1) having a simplified data
structure based on news events 2) having a non-combative editorial approval
system and 3) paying our writers to post.

[1] [http://newslines.org/](http://newslines.org/)

------
Shivetya
If they want to get past the English side, they could get a good start by
translating all the anime- and music-related pages. What good are three
million or more articles in English when a good number of them are about
individual songs or anime characters?

------
butler14
“Wikipedia is written by affluent white nerds, in languages only affluent
white nerds know"

So his solution to this problem was to expand the volume of content for a
language spoken almost exclusively by white, nerdy people.

------
edpichler
And the main page of Wikipedia is not updated, but:
[https://meta.wikimedia.org/wiki/List_of_Wikipedias](https://meta.wikimedia.org/wiki/List_of_Wikipedias)

~~~
pacofvf
It is now; that's what makes it so awesome!

