
The Great Language Game: How many languages can you distinguish between? - memset
http://greatlanguagegame.com/
======
lars512
Hi guys, I'm the author. Here's a related blog post I did introducing the
game:

[http://quietlyamused.org/blog/2013/09/01/introducing-the-gre...](http://quietlyamused.org/blog/2013/09/01/introducing-the-great-language-game/)

Feedback's much appreciated, and if you encounter any bugs, please report them
via email to lars+glg@yencken.org

If there's a language you'd like to see that's missing, consider helping out
by finding some good quality language samples or news podcasts in that
language and emailing me.

Enjoy!

~~~
radikal_shit
Hi, great game. A little suggestion: Serbian, Croatian and Bosnian are too
similar for non-native speakers to tell apart. Take a look at
[https://en.wikipedia.org/wiki/Serbo-Croatian](https://en.wikipedia.org/wiki/Serbo-Croatian).
Basically, the division is more political than linguistic.

~~~
mutation
(Disclaimer: I am a native Croatian speaker.) There's a significant difference
between the Croatian and Serbian languages: Croatian is composed of three
dialects and a standard language which are not found in Serbian. It was a
political movement to merge one of those dialects (Shtokavian) with the Serbian
language. So, Serbian is somewhat similar to Shtokavian but not to the other
dialects that compose the rest of the Croatian language.

~~~
bojan
I, as a native Serbian speaker, have a 100% understanding of what is being
said on Croatian national TV, and 100% understanding when reading websites in
Croatian.

While the difference exists, calling it significant is just... untrue. For
non-speakers, it is barely noticeable.

~~~
mutation
I probably have the same understanding of Serbian. I used to read a lot of
science fiction literature translated into it. Not to mention that I learnt
the Cyrillic script in elementary school in the 1980s, when the language was
still officially called "Serbo-Croatian". But there are significant
differences: for example, Serbian people don't understand Kajkavian and refuse
to. There are a lot of jokes on this theme in old Yugoslavian TV shows.
Kajkavian _is_ part of the Croatian language.

------
chrismorgan
One of the languages is labelled "Bangla", the transliterated form of the
native spelling বাংলা. Given that you've used the English names of most
languages there (I can't speak for all of them), I'd prefer the consistency of
"Bengali" there.

~~~
lars512
Thanks for the feedback. Many of the languages have multiple aliases, and I
can't say I've made a principled choice in every case. But updating to
"Bengali" is on the todo list.

------
jpatokal
850. It would be a bit nicer if the randomly selected other language choices
came from different language families: one of the three I got wrong was
Bosnian vs Croatian, even though the difference is arguably a political
construct (just don't say that to a Bosnian or Croat...).

[https://en.wikipedia.org/wiki/Comparison_of_standard_Bosnian...](https://en.wikipedia.org/wiki/Comparison_of_standard_Bosnian,_Croatian_and_Serbian#Morphology)

The difference between Punjabi and Hindi is also pretty subtle. Mistaking Thai
for Vietnamese, though, was inexcusable. =(

~~~
jpatokal
2nd attempt 1050. Had precisely the same sound sample twice in a row though?

Also, this serves as a pretty handy hint that when you're completely stumped
by something you've never, ever heard before, the answer is probably "Dinka":
[http://greatlanguagegame.com/stats/](http://greatlanguagegame.com/stats/)

~~~
joeyo
Indeed. The Dinka sample is recorded off the radio or phone and is very
difficult to hear as well (It's not like I really know what Dinka sounds like,
but if it were easier to hear I could at least do a better job of eliminating
the other options).

------
rdtsc
Ha! I expected computer languages. I might have a problem...

But that would be a cool game too for programmers. Here is a snippet of code:
what language is it in?

~~~
patrickg
[http://wtpl.heroku.com/](http://wtpl.heroku.com/)

~~~
rdtsc
Fantastic, I love it!

------
Twisol
Nice! I noticed that I tended to identify the languages by their
"fingerprint", certain phonemes and accents that characterize the language.
It's probably not surprising that all three languages I missed were African
languages. I've heard plenty of European and Asian languages, but rarely have
I heard an African language.

~~~
busterarm
I've been playing this game on my own now for decades...just trying to guess
what language people are speaking while eavesdropping on conversations and
listening to radio.

I'm easily able to break 1000 points but some of those African languages
(Dinka) are really hard!

I'm so thankful for this site. It's really really cool!

~~~
VLM
"listening to radio."

For me specifically, shortwave radio. I still have a R-75 hooked up although I
don't often use it anymore. I used to have a R-390 and its "mobile" cousin the
R-392.

I've found the audio quality has little or nothing to do with the appeal of
listening to foreign lands, so listening over the internet has had little if
any change in that hobby vs using a radio. Although I do get most of my non-
tech news from the BBC now, so I do appreciate the podcast feeds, something
you could never do with radio.

I think most HN readers would appreciate BBC Radio 4's "In Our Time" program.
Not every episode, especially not the more science/tech oriented, but most
episodes are entertaining / mind expanding. I get it as a podcast via an RSS
feed.

I would think all the startup opportunities relating to podcasting are long
since used up? There seems to be a way to monetize anything people spend
time/money/effort on, and at least some people do that with podcasts, so
logically there must be a way to make money off them...

~~~
lutusp
> I still have a R-75 hooked up although I don't often use it anymore.

I'm also a shortwave fan, and I also find it less interesting as time passes.
But there's a concrete reason -- there are fewer interesting shortwave
programs, mostly in response to the rise of the Internet, podcasting and
satellite radio.

Many shortwave bands, once teeming with interesting and useful content, have
been taken over by niche broadcasters like religious groups, or ordinary
broadcasts that have shifted to the shortwave bands from the AM broadcast band
in tropical areas to avoid excessive noise. Overall, less interesting
shortwave content.

Apropos of nothing, for years I believed that one of my favorite AM radio
stations for long-distance reception (KGO near San Francisco, 810 kHz) had its
frequency to itself -- was a "clear-channel" broadcaster. I recently
discovered that this hasn't been true for decades -- there are now 182 U.S.
domestic AM stations on that one frequency. I was shocked.

------
nowarninglabel
It's amazing how distinguishable accents are. I wonder how much voice
recognition software keys off of accent? I have a few South/East Asian friends
who never do well with spoken GPS recognition, but if it knew that their
native tongue was Gujarati or Thai then I imagine it could do a much better
job of analyzing their English. I was pleasantly surprised to make it to 800.

~~~
jsmeaton
I work in the call centre space which deals with voice recognition IVRS. If
there is no voice pack available for your specific country, a number of
'utterances' are collected to train the recogniser. So, yes, accents are used
in the voice recognition software that I'm familiar with.

------
bilalq
Hey, this is really cool. Awesome work!

One tricky thing I noticed was that there was one audio clip that was in Urdu,
but it was discussing something written in Arabic. Indeed, the clip had more
Arabic than Urdu. Fortunately, Arabic wasn't an option, so I got it right.

That's the kind of edge case that might appear in other clips as well, and
might be something to watch out for.

------
gamegoblin
One can do surprisingly well by knowing the geographic region where a language
is spoken, imagining the accent of people from there, and then matching it to
the accent of the person speaking.

~~~
prawn
If you cop Assyrian vs Turkish first up, you might be flipping a coin quite
early. I had two situations like that which chewed lives from the start! The
other was Macedonian vs Serbian!

------
moron4hire
I built a similar thing, with a physical representation, covering US accents:
[http://moron4hire.tumblr.com/post/60153233552/a-thing-i-buil...](http://moron4hire.tumblr.com/post/60153233552/a-thing-i-built-a-while-ago-you-can-see-it-in)

for this guy:
[https://en.wikipedia.org/wiki/William_Labov](https://en.wikipedia.org/wiki/William_Labov)

it ended up in this place: [http://www2.fi.edu/](http://www2.fi.edu/)

------
pdknsk
In the sound clip for Ukraine, the person actually says Ukraine (or very
similar) a few times. Also, the snippets should be normalised. Some are very
quiet.

~~~
lars512
Volume's been normalised using mp3gain. I manually screen all the snippets for
English, music or other noise, and obvious giveaways. Looks like a Ukraine
case slipped through -- it's in my todo list to remove. Thanks!

~~~
ziedaniel1
The Portuguese one also had a word sounding like "portuguese" in it that made
it trivial for me.

~~~
lars512
This is against my annotation guidelines, so it's a sneaky case that slipped
through. I'll make a note to review the Portuguese samples -- thanks!

------
blahedo
Got to 500 and had to guess (definitely Slavic, but had a choice between
Czech, Slovak, Macedonian... yikes!)...

and got a 522 Timed Out error. Thhbbbbt.

~~~
lars512
Awww, that sucks. I hope you refreshed and reposted your guess.

~~~
bluesmoon
same here. 522 error. reposting has no effect. still 522s or Method Not
Allowed errors.

~~~
lars512
Thanks guys, apparently my server is overloaded:
[https://support.cloudflare.com/entries/23670002-Error-522](https://support.cloudflare.com/entries/23670002-Error-522)

I'll see what I can do about it.

~~~
MasterScrat
Is loading an intermediate page to tell if the answer is correct really worth
it? I would do that client-side, save the loading time and bandwidth...

------
gpvos
There were several times when a fragment of which I had already guessed the
language correctly came up again two or three rounds later, which made it a
lot easier. I think it would be better to keep track of that, and have more
fragments per language, and round-robin between those. Maybe I just got lucky,
but I think 1200 for the first play is a good score. :)
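
A minimal sketch of the round-robin idea above, assuming clips are grouped per
language. `ClipPicker`, `next_clip` and the clip file names are hypothetical;
nothing here reflects how the game actually selects its clips.

```python
import random
from collections import deque


class ClipPicker:
    """Hypothetical picker: cycles through all of a language's clips
    before any clip of that language is played again."""

    def __init__(self, clips_by_language, seed=None):
        rng = random.Random(seed)
        self._queues = {}
        for language, clips in clips_by_language.items():
            shuffled = list(clips)
            rng.shuffle(shuffled)  # random order, fixed for the session
            self._queues[language] = deque(shuffled)

    def next_clip(self, language):
        queue = self._queues[language]
        clip = queue.popleft()
        queue.append(clip)  # back of the line: returns only after all others
        return clip


picker = ClipPicker({"Lao": ["lao_1.mp3", "lao_2.mp3", "lao_3.mp3"]}, seed=0)
print([picker.next_clip("Lao") for _ in range(4)])
```

With a single clip per language this still repeats, so it only helps once
there are several fragments per language.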

------
jdjb
I clicked the link thinking it would be code examples and I had to guess the
programming language...

~~~
royalghost
Exactly, that's what I thought too!

------
dbz
I'd love to see some stats on how others have done. Also, it would be nice if
you didn't have to press back fifty times (or long click select) to go back to
the last site you were on.

------
pawelwentpawel
I really like it. It would be cool if the score were based on how quickly
you answer too. Also, it would be very interesting to find a correlation
between how many and which languages users speak (native or learned) and how
many languages they can recognize. For example, it's funny how knowing just
one Slavic language helps you not only to recognize any other ones (or in some
easier cases understand a lot) but also spot the small differences in accents
or vocabulary.

------
patmcguire
I got a 650. Surprising how easy it was to pick out what was Indo-European or
Semitic, everything else was hard. Only got Indonesian because of The Act of
Killing.

------
tehwalrus
EDIT: first, great game. I'll be coming back to see if I can improve my score!
:)

Meh, only scored 350, although in my defense I am quite deaf at the moment
(temporarily).

I was also lucky as I got Mandarin twice (which I've studied) and German once
(same.)

Two of my three errors were picking the wrong one of the two plausibles once
the number to choose was 3 or 4.

With higher quality audio (and functioning ears) I think I could have got
higher (so I could hear more of the sounds being used.)

------
malkia
I was up to 800 points, and I got a server error... (edit). Retried, and
since that was my last life, I got it wrong. I was lucky though, as I got
Bulgarian twice, and I'm from Bulgaria :)

Played a second time, 650, and a third time, 950.

I got lucky again, since Dutch came up twice in a row, as did a few other
languages.

It's cool; it reminds me a bit of Google's find this place on earth, by
"driving" around until you see a sign, or some well-known place.

------
pbhjpbhj
Can someone verify the Turkish sample
([http://media.greatlanguagegame.com/samples/752c4c30d475748a3...](http://media.greatlanguagegame.com/samples/752c4c30d475748a342c68ebfba24d1e.mp3))?

I've only been there on holidays but it doesn't sound right to me, perhaps
it's just the low quality of the sample but the phonemes seem far more
'Eastern' to me.

~~~
i-blis
Definitely Turkish. And pretty standard dialect.

~~~
pbhjpbhj
Cool, thanks.

------
koliber
I think that the Czech and Slovak languages are incredibly similar to each
other. Oftentimes, when reading the instructions on some product in both
languages, they differ by one word here and there. I would be willing to guess
that no speaker of these languages would be able to differentiate one from the
other.

A cool variant on this game could be "guess where this English accent is
from."

~~~
unhammer
If anyone wants to do it with Norwegian dialects, there's an awesome set of
audio samples of the same text here:
[http://www.ling.hf.ntnu.no/nos/nos_kart.html](http://www.ling.hf.ntnu.no/nos/nos_kart.html)

("I would be willing to guess that no speaker of these languages would be able
to differentiate one from the other" – I highly doubt that. Dialects/languages
differ more and more the closer they are to your own; I can tell if someone
grew up a 15 minute drive away from me, but if the choice is between a place
that is e.g. >3 hrs north vs >4hrs north I have a really hard time placing
them.)

------
mahrz
Got to 1200, but am still surprised how easily Portuguese can be mistaken for
a Slavonic language. And the Bosnian/Croatian samples are really not that easy
to distinguish. It would really be cool if the scores somehow reflected this,
maybe depending on your native language... not sure what a good scoring system
would be here.

------
tokenadult
Fun! I got a score of 650. It got hard once there were five languages to
choose from, and I had studied none of them.

~~~
adamnemecek
Or had ever heard of some of them. I actually got the exact same score.

------
hayksaakian
This is really fun, except that now I'm getting server errors (probably due to
HN traffic) and I can't play...

------
emhs
700. Missed Tagalog, Slovenian, and Amharic. Also, dude, you could do with
another Lao clip, or a better clip-selection algo, 'cause I heard the Lao clip
twice. Would've been good if it had been a different clip, but giving me the
same clip is just a gimme. Great game, though.

~~~
jay_m
I had the same thing happen with Hebrew, 3 free wins with the same clip.

------
thinkersilver
I don't know if this was lag because the system was under load, but I had to
wait for the recording to finish before I could go to the next question. Would
you be able to make it go to the next question once I have answered, instead
of having to listen to the entire clip?

------
chrismorgan
I found it rather fun pressing the "next" button and then switching away from
it and just listening and forming an opinion thus before looking at the
options. Region of language was often pretty accurate thus if I wasn't able to
guess the precise language.

------
neil_s
Biggest bug for me is that if you click on one of the two languages, it still
waits till the entire clip finishes playing before progressing to tell me
'right' or 'wrong'.

UPDATE: never mind, it's not when the audio finishes playing, it's just slow
to load.

------
albeec13
I was up to 500 before missing two in a row, but then I encountered a bug
where the audio track seemed to not download properly, so hitting play caused
the progress bar to jump straight to the end without playing anything. Pretty
cool idea overall though.

~~~
ghayes
I got 550 on my first try. It's kind of fun, kind of impossible. Serbian
versus Croatian versus Slavic is just a shot in the dark.

~~~
hepek
Well, distinguishing standard Serbian, Bosnian and Croatian is impossible even
for native speakers, as it is in fact one language. The division is merely
a political one. One can only distinguish accent subtleties in certain words.

It would improve the fairness of the game and avoid a lot of confusion not to
allow these languages to appear in the answers together.

~~~
babuskov
You should either throw Bosnian out or not offer it together with Serbian.
It's a dialect of Serbian really and many people in south Serbia speak
that dialect (see for example Sanjak region:
[http://en.wikipedia.org/wiki/Sanjak_of_Novi_Pazar](http://en.wikipedia.org/wiki/Sanjak_of_Novi_Pazar)).
Bosnian language is just a political decision, because it's a sovereign
country now and needed to have its own language as Serbia was an aggressor in
the last war, so having the official language named Serbian would offend both
Bosniaks and Croats living in the country.

Also, Croatian and Serbian can only be distinguished by words. You can't
recognize the intonation because it's the same. If you don't know words
specific to these languages, you won't be able to tell the difference.

It would be best to have a filter that prevents them being in the same
question.
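
A rough sketch of such a filter, assuming the answer options are drawn at
random from the full language list. `CONFUSABLE_GROUPS`, `confusable_with` and
`pick_choices` are hypothetical names, and the groups only list languages this
thread calls too close to tell apart; it is not the game's actual code.

```python
import random

# Hypothetical grouping, based on the languages named in this thread.
CONFUSABLE_GROUPS = [
    {"Serbian", "Croatian", "Bosnian"},
    {"Czech", "Slovak"},
]


def confusable_with(language):
    """Return the languages that should not be offered next to `language`."""
    for group in CONFUSABLE_GROUPS:
        if language in group:
            return group - {language}
    return set()


def pick_choices(answer, all_languages, n_choices, rng=random):
    """Pick the answer plus distractors, skipping the answer's near-twins."""
    blocked = confusable_with(answer)
    pool = [lang for lang in all_languages
            if lang != answer and lang not in blocked]
    # If the pool is too small, fewer than n_choices options are returned.
    distractors = rng.sample(pool, min(n_choices - 1, len(pool)))
    choices = [answer] + distractors
    rng.shuffle(choices)
    return choices


languages = ["Serbian", "Croatian", "Bosnian", "Czech", "Slovak", "Thai", "Dinka"]
print(pick_choices("Serbian", languages, 4))
```

This only blocks near-twins of the correct answer. Two distractors from the
same group (say Czech and Slovak) can still appear together, which is harmless
when neither is the answer.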

------
lars512
There's some DevOps and some marketing lessons in here for me. But, long story
short, the server's back up and running smoothly today.

If it felt sluggish for you yesterday, please try the game again and enjoy it
as it's meant to be played!

------
andrelaszlo
The Slovakian clip actually has the word "slovak" in it... I think :)

~~~
yread
The Bulgarian one has something with Portugal :)

~~~
malkia
I'm Bulgarian, and was wondering how others perceive our language. But I can
always tell when someone speaks Portuguese, although I know very little
of it (I've worked with a couple of cool people from Brazil).

------
etfb
It's interesting how quickly that went from "duh, dead easy" to "no frakking
idea". I don't even know what Amharic is, let alone how it sounds...

------
jadyoyster
1150 on my second try (thanks ridiculously multicultural upbringing), great
game!

I got a few immediate repeats of the same language; maybe preventing those
would be a good idea?

~~~
lars512
That's a solid effort!

Pruning immediate repeats is in the todo list, thanks for the recommendation.

------
azernik
Just got a 502 error, but really like the custom 502 page.

------
pascalo
Great game. Had 900 pts and then got Cloudflare errors :(. Only thing you
need to work on is to exclude repetition of clips as the game progresses.

------
oblio
Can you provide more stats? I'm curious for the full list of language stats -
which ones are easy to guess, which ones are hard to guess.

------
tlb
Great. I'd also like a tutorial, which plays two languages with superficial
similarities and points out the differences.

------
skion
It'd be great if the multiple choices appeared a few seconds late, giving some
time to let your senses guide you.

------
tluyben2
Had everything correct until a guru meditation popped up. I think most Dutch
people should score very high on this.

------
staunch
I think it'd be more fun (and less difficult) to guess the accents of people
speaking English.

------
jmhain
Maybe it's just a weird coincidence but I've gotten Hebrew about half the
time.

~~~
aroman
It must be; Hebrew is my second language and I was really hoping it'd pop up
so I could boost my score, but sadly it never came :)

------
beyondcompute
I thought it would be about programming languages :) But it's awesome
nevertheless!

------
jameswyse
Nice, though I gotta admit I was disappointed it wasn't programming languages!

------
ianferrel
I got the same sound clip twice in 3 rounds. It was much easier the second
time :)

------
riffraff
first: this is fun! second: maybe you can use rel=prefetch to speed up
loading?

------
untitledwiz
One of the Norwegian samples is an interview about Varg Vikernes (aka Burzum)!

------
vivin
Very nice. I got 500! I second-guessed on all my wrong answers.

------
tuzemec
950 and I went twice against my initial hunch and lost :-)

------
clueless123
Great game

Big lag! I hope this is because lots of people are playing :)

------
jvanstry
Well I got a 404 Error.... but it was fun until then!

------
hrasyid
Is it really this slow, or is it only because of traffic from HN?

------
professorwimpy
That was a treat! Thanks for making it! :)

------
NKCSS
Cool, but the site is very slow.

------
ksrm
Extremely slow unfortunately.

------
jvanstry
404

------
artgon
800! w00t

------
edu
too slow to be fun :(

------
icecreampain
1400 but that's only because I'm interested in languages. I speak English,
Swedish, German, Romanian, can read the other Romance languages
(French/Italian/Portuguese/Spanish) and can read and write Farsi. I have no
problem differentiating between Cantonese, Japanese, Mandarin and Korean.

The quiz is most difficult when presenting me with choices between Slavic
combinations (Ukrainian/Polish/Russian) and far east weirdness: Tamil / Burmese
/ Malay.

~~~
netvarun
Could you elaborate a bit on the part where you mentioned "far east weirdness:
Tamil / Burmese / Malay".

What's so weird about those languages? All three languages you mentioned are
linguistically very different and have long histories (particularly Tamil).

I wouldn't call them far east either. These languages are predominantly spoken
in South and South-east Asia.

(I am from Singapore where Tamil and Malay are official languages)

~~~
icecreampain
They're "weird" because I have little experience with them, having barely ever
heard Malay until this quiz. I correctly guessed it because I heard the word
"muslim" or "islam" in the clip.

And they ARE far east, if you're European. :)

------
Dewie
Got through 8 questions and then 522 Connection Timed Out...


