

The weirdest languages - adrinavarro
http://idibon.com/the-weirdest-languages/

======
WildUtah
Seems like OP should have considered a popularity weighting. Among the 25
weirdest we see Mandarin and Spanish, the world's two most popular languages
by native speaker count. That's a hint that what you're measuring isn't
exactly weirdness.

Meanwhile I see Hungarian, famous for being the hardest language to learn, is
fifth least weird. Even stranger, Cantonese, which is almost exactly the same
in writing as 'weird' Mandarin, is sixth least weird. How can two languages
that you write the same be so very far apart in feature set?

~~~
chewxy
I think 'weirdness' in this context should not be conflated with 'norm'.
Instead, as I read it, I think about it the same way you would think about the
regularity of a language (to put it into compsci terms).

I speak both Mandarin and Cantonese. I think a lot of the 'weirdness' factor
comes from actually saying the words. Also, there are some words in Cantonese
that are not in Mandarin. For example, in Cantonese there is 'mou' (the
negation of having), while in Mandarin it is pronounced as two words ('mei2
you3'). Spoken Cantonese is, in some regards, easier to pick up than spoken
Mandarin (definitely easier to cuss in).

~~~
vorg
If Mandarin-speakers say "meiyou" quickly, it sounds like "mou". If English-
speakers say "going to" quickly, it sounds like "gonna".

~~~
arethuza
I wouldn't say all English speakers - someone saying "gonna" sounds distinctly
American to me. Around here, it's something like "gonnae" (which sounds more
different than the small difference in spelling would suggest). Common usage:

"Gonnae no' dae that?"

------
yread
It would be cool if the author made the dataset available. It would be fun to
try other things with it: population weighting (as WildUtah mentions),
grouping the languages by family and calculating intra- and inter-group
weirdness (and distances), clustering the languages into new groups,
calculating weirdness as the mean distance in the 21-dimensional space to all
other languages, projecting the space itself onto a plane so that we can see it
better, ...
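
A rough sketch of the "mean distance to all other languages" idea, in Python.
Everything here is an assumption for illustration: the toy feature table is
made up (4 features rather than 21), and Hamming distance is just one
plausible choice for categorical features, not necessarily what the article
used.

```python
from statistics import mean

# Hypothetical toy data: each language maps to a tuple of categorical
# feature values (the real subset has 21 features; 4 are used here).
LANGUAGES = {
    "A": ("SVO", "prefix", "tonal", "no-gender"),
    "B": ("SVO", "suffix", "non-tonal", "no-gender"),
    "C": ("SOV", "suffix", "non-tonal", "gender"),
    "D": ("SVO", "suffix", "non-tonal", "gender"),
}


def hamming(a, b):
    """Fraction of features on which two languages differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def weirdness(languages):
    """Mean Hamming distance from each language to all the others."""
    return {
        name: mean(
            hamming(feats, other)
            for other_name, other in languages.items()
            if other_name != name
        )
        for name, feats in languages.items()
    }


scores = weirdness(LANGUAGES)
# Languages ordered from most to least "weird" under this measure.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under this measure a language is "weird" when it disagrees with most others on
most features, so a language sitting in the middle of a large family will tend
to score low.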

~~~
chewxy
You can download it yourself at
[http://wals.info/export](http://wals.info/export) . And given enough effort,
you can quite quickly reverse engineer which 21 features were used.

~~~
yread
That's the thing, he reduced the dataset from 2677 languages with 192 features
to 239 languages with 21 features. That's usually the hardest part of any
analysis, so it would be nice if he shared the results of his hard work with
us...
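
For a sense of what that reduction involves, here is a minimal sketch in
Python. It assumes one possible rule (keep well-covered features, then keep
only languages with a value for every kept feature); the table, the
`dense_subset` helper and the coverage threshold are all hypothetical, and the
author's actual selection procedure may well differ.

```python
# Hypothetical sparse table: language -> {feature: value}. A missing key
# means the feature is not recorded for that language.
RAW = {
    "A": {"f1": "x", "f2": "y", "f3": "z"},
    "B": {"f1": "x", "f2": "q"},
    "C": {"f1": "w", "f2": "y", "f3": "z"},
    "D": {"f1": "w"},
}


def dense_subset(raw, min_coverage):
    """Keep features recorded for at least min_coverage of the languages,
    then keep only languages that have a value for every kept feature."""
    n = len(raw)
    counts = {}
    for feats in raw.values():
        for f in feats:
            counts[f] = counts.get(f, 0) + 1
    kept = sorted(f for f, c in counts.items() if c / n >= min_coverage)
    complete = {
        lang: feats
        for lang, feats in raw.items()
        if all(f in feats for f in kept)
    }
    return kept, complete


features, languages = dense_subset(RAW, min_coverage=0.75)
```

Raising the threshold keeps fewer features but more languages, which is the
trade-off behind going from a large sparse table to a small dense one.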

~~~
TylerSinSF
(Now added to the bottom of the post.)

------
SeanLuke
I don't know where to start on this page's absurd notion that Cantonese is a
non-"weird" language, but that Mandarin is weird. This must be some academic
nonsense based on sounds and disregarding other critical language features.

Let's put aside the fact that Cantonese has among the most complex
pronunciation systems in the world, with seven tones and both long and short
versions of a number of sounds. This is a language which has four different
communication modes. Here's how weird Cantonese is.

1\. Cantonese speakers _speak_ in Cantonese.

2\. Cantonese speakers _read and write_ in a totally different language,
namely Modern Chinese, which for all intents and purposes is Mandarin.

3\. Cantonese speakers _read out loud_ in Modern Chinese (Mandarin), but
_pronounce_ each of the characters with radically different Cantonese sounds.

4\. For purposes of comic book dialogue, etc., it is possible to read and
write _some Cantonese_ using various co-opted Chinese characters. But you
can't pronounce _all_ of Cantonese this way: many words have no written form
whatsoever. This has resulted in a bizarre pidgin written form. For example,
one very common word ("di1" \-- "a few") is actually usually written as "D"
rather than as a character. Other characters are impossible to write in
current fonts, or are also used in Modern Chinese but for different words than
in Cantonese, and so you see Latin letters like "o" and "a" next to them to
suggest a different meaning.

Mandarin weird my foot.

------
lmm
I think we should have expected that Mandarin is weird but Cantonese is very
normal - in the same way that Japan has a weird primary writing system, but
its secondary writing system (Hiragana) is one of the most regular in the
world. In fact the same reasoning could apply to Hindi - it's (or was until
recently) a secondary language in India, with English as the language of
government. Do other countries with two languages follow the same pattern?
E.g. I would predict from this that Afrikaans would be a very non-weird
language.

~~~
mtts
If Afrikaans is a non-weird language that would be because it's a creole
(indigenous African and Dutch).

Don't know about Hindi, but since it's used as a lingua franca of sorts that
might also be the reason behind its normalness.

------
nemo1618
It'd be cool if they could include artificial languages like Esperanto and
Lojban. Given that one goal of both languages is to appeal to speakers of any
language, it would be interesting to see if they achieved their goal (i.e.
produced a very "non-weird" language).

------
tokenadult
Definitive comments about Esperanto:

[http://www.xibalba.demon.co.uk/jbr/ranto/](http://www.xibalba.demon.co.uk/jbr/ranto/)

I see this was posted overnight in my time zone. Several of the earlier
comments correctly point out that empirically, a language that has been
acquired by many second-language speakers (for example, English) must not
strike too many people as unlearnably "weird." Many widely spoken languages
have undergone a process that linguists call "koineization" (after the spread
of Koine Greek as a common language of the ancient eastern Mediterranean and
Near East)

[http://www.jstor.org/discover/10.2307/4167665?uid=3739736&ui...](http://www.jstor.org/discover/10.2307/4167665?uid=3739736&uid=371672041&uid=2&uid=3&uid=3739256&uid=60&sid=21102412849151)

[http://www.lancs.ac.uk/fss/linguistics/staff/kerswill/pkpubs...](http://www.lancs.ac.uk/fss/linguistics/staff/kerswill/pkpubs/Kerswill2002KoineAcc.pdf)

[http://en.wikipedia.org/wiki/Koin%C3%A9_language](http://en.wikipedia.org/wiki/Koin%C3%A9_language)

in which the language simplifies some grammatical (and possibly phonological)
features as it is spoken by more second-language speakers for trade or for use
as a language of national administration in a multilingual region.

The United States is largely an English-speaking country, but only about one-
fourth of Americans have ancestors who spoke English before arrival in North
America. (Indeed, only one of my four grandparents, all of whom were born in
the United States, grew up in an English-speaking household.) In other words,
General American English is a koine language of second-language learners of
English, so it is not surprising that it is spreading all over the world.

P.S. Feel free to visit my user profile here on HN to see more about my
background in linguistics and language learning and teaching.

AFTER EDIT: Cantonese versus Mandarin as "dialects" or "languages" was
mentioned in other comments. Cantonese is at least as different from Modern
Standard Chinese (Mandarin) as German is from English. How you might write the
conversation

"Does he know how to speak Mandarin?"

"No, he doesn't."

他會說普通話嗎？

他不會。

in Modern Standard Chinese characters contrasts with how you would write

"Does he know how to speak Cantonese?"

"No, he doesn't."

佢識唔識講廣東話？

佢唔識。

in the Chinese characters used to write Cantonese. As will readily appear even
to readers who don't know Chinese characters, many more words than "Mandarin"
and "Cantonese" differ between those sentences in Chinese characters.

~~~
mtdewcmu
I read something about how English was simplified as a result of the Viking
invasion of England in the Middle Ages. It sounds like koineization. English
might be weird, but it seems to be weird in a way that makes it highly
exportable, like a successful product.

~~~
squeed
That's right. Old English has grammatical gender and noun cases (like modern
German). Modern English retains case only for pronouns.

They were lost when the Danes invaded. The language was simplified to a
mutually-understandable subset.

------
A1kmm
There are a few potential problems which limit the significance of this:

1\. There is not a universal definition of what defines a language and what is
simply a dialect or a regional variation; this applies especially on large
continents where there can be greater variation in language features between
geographically remote locations, but no clean boundary at which you can say
people speak one or the other language.

2\. Languages evolve, diverge and sometimes borrow, and so a group of related
languages can share the same potentially idiosyncratic feature because of
common evolutionary roots rather than because the feature makes sense. This
could explain the result for Hindi - it is a standard language that 'averages'
a large number of other Indian languages.

~~~
Osmium
I would call these more "caveats" than "problems" \-- anything with a title
like "weirdest languages" is going to be incredibly subjective, and there will
no doubt be no shortage of people who disagree with specific choices, but as
long as the reasoning is clear it can still be interesting/useful.

> This could explain the result for Hindi - it is a standard language that
> 'averages' a large number of other Indian languages.

I think this is a little misleading; a lot of Indian languages come from a
completely different language family to Hindi (Hindi is Indo-European, but a
lot of Indian languages are Dravidian, e.g. Tamil and Urdu), though I'm not
qualified to speak about this.

~~~
mtts
Urdu, I'm told by someone who speaks the language, is Hindi written in Arabic
script.

~~~
Osmium
Could well be true! I'm not qualified to speak about that. Looking it up,
seems like I mis-spoke when I said Urdu was Dravidian. But there are a lot of
Indian languages that are (e.g. Kannada), so the overall point stands that the
"averaging" comment is a bit misleading.

Incidentally, your comment led me to this
[http://en.wikipedia.org/wiki/Hindi–Urdu_controversy](http://en.wikipedia.org/wiki/Hindi–Urdu_controversy)
which you may find interesting.

------
b6
This is great. I'm really happy to learn about WALS. I've been interested in
constructing a practical standard language for humans for a long time. This
kind of survey of what works seems essential.

~~~
mtts
Just use Indonesian. It's very non-weird, both according to the article and
according to my own experience.

------
pointernil
To what degree do spoken languages influence our programming languages?

If programming had been developed in Asia, what would the paradigm for
programming be?

Lines of code? Pictures? Left to right? Top to bottom? Objects and methods?

Any steampunk fantasies available regarding programming languages? How weird
would those be?

------
Ashuu
Hindi is the least weird language but still it is a secondary language in
India! Now that's weird!!

~~~
JoeAltmaier
Perhaps it's historical: many languages have Hindi roots, so are similar. That
makes it more central in the statistics.

------
Dewie
It's interesting to me how Norwegian is one of the top 25 strangest languages
in the world on that list, but Danish and Swedish aren't. Maybe it was on the
lower end of the top 25.

~~~
dagw
One possible theory is that it could be related to how the data set deals with
(or fails to deal with) Book-Norwegian vs New-Norwegian. Also, modern Swedish
and Danish grammar only use two genders while Norwegian still has three, so
that could weigh in.

