

LangId Tells You The Language of Your Text - fogus
http://langid.net/

======
jerf
To help with all the postings expressing surprise at a misidentification:
how long was your sample? According to the link posted by fizx,
[http://alias-i.com/lingpipe/demos/tutorial/langid/read-me.ht...](http://alias-i.com/lingpipe/demos/tutorial/langid/read-me.html),
this is using letter-frequency statistics. It's very easy to fool such a
system by feeding it a very short sample. For a similar problem that can arise
when trying to auto-detect encodings, see:
[http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.a...](http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx)

Try feeding it a bit more than a single word or sentence.
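A minimal sketch of why sample length matters, assuming a character-trigram model (one common form of the letter-frequency statistics mentioned above). This is not LangId's or Google's actual implementation: the trigram order, the add-one smoothing, the `VOCAB` constant, the two tiny made-up training texts, and the helper names (`char_ngrams`, `train`, `identify`) are all hypothetical choices for illustration. Each language's score is the sum of smoothed log-probabilities of the input's trigrams, so a longer input contributes more evidence and usually a wider margin between the top two guesses.

```python
import math
from collections import Counter

N = 3            # trigram order (hypothetical choice)
ALPHA = 1.0      # add-one (Laplace) smoothing
VOCAB = 10_000   # assumed trigram vocabulary size, used only in the smoothing denominator

# Two tiny, made-up training texts; a real identifier needs far more data.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog "
          "sleeps in the sun while the fox watches from the hill",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y luego el "
          "perro duerme al sol mientras el zorro mira desde la colina",
}


def char_ngrams(text, n=N):
    """Lower-case, pad with spaces, and slice into overlapping n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples):
    """Count n-grams per language; returns {lang: (Counter, total count)}."""
    models = {}
    for lang, text in samples.items():
        counts = Counter(char_ngrams(text))
        models[lang] = (counts, sum(counts.values()))
    return models


def log_likelihood(model, text):
    """Sum of add-one-smoothed log-probabilities of the text's n-grams."""
    counts, total = model
    return sum(
        math.log((counts[g] + ALPHA) / (total + ALPHA * VOCAB))
        for g in char_ngrams(text)
    )


def identify(models, text):
    """Return (best language, margin over the runner-up in log units)."""
    ranked = sorted(
        ((log_likelihood(m, text), lang) for lang, m in models.items()),
        reverse=True,
    )
    (best_score, best_lang), (second_score, _) = ranked[0], ranked[1]
    return best_lang, best_score - second_score


models = train(TRAINING)
for sample in ["ni", "the fox and the dog sleep in the sun",
               "el perro duerme al sol"]:
    lang, margin = identify(models, sample)
    print(f"{sample!r:40} -> {lang} (margin {margin:.2f})")
```

A one-word input such as "ni" shares no trigrams with either toy corpus, so its "winner" is decided only by the slightly different smoothing denominators and the margin is close to zero, while a full sentence produces a much larger margin. That is one way a purely statistical identifier can land on an essentially arbitrary default for very short or unknown input.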

~~~
andreaolivato
Yeah, you got that right. The more you type, the more accurate the Google APIs
should be.

------
jpcx01
I love this tool. Been bookmarked.

There's something seriously wrong with the web (or my usage of it). Whenever I
need something like this, I can't find it. Go on Google and search for any kind of
tool: no matter how you word it, the results never get past 1999.

Yesterday I needed something to convert between a couple of time zones. Try finding
something via Google. Every relevant keyword match was a pile-of-shit tool that
belongs in 1995.

One idea is that the people who are good at making tools are the exact
opposite of the people who are good at SEO.

Or I could just be doing something wrong: not using Google right, or maybe I
should be using a different search engine. Help would be appreciated!

------
fizx
Semi-open source code and tutorial that you can use to understand what they're
doing.

[http://alias-i.com/lingpipe/demos/tutorial/langid/read-me.ht...](http://alias-i.com/lingpipe/demos/tutorial/langid/read-me.html)

------
defied
Nice implementation of the Google Ajax Language API:
[http://code.google.com/apis/ajaxlanguage/documentation/#Dete...](http://code.google.com/apis/ajaxlanguage/documentation/#Detect)

~~~
rwolf
It's funny: I was going to post that exact phrase and link.

------
hristov
I tried this obscure Eastern European language. It identifies polite phrases
correctly, but curse words get identified as Portuguese for some reason.

They need to bone up on their international cursing.

~~~
imp
I tried just the word "reddit" and it identified it as Portuguese as well.
Maybe that's a default for unknown words.

------
yannis
NLP is a hard nut to crack! Even with the Google API :)

Results range from impressive (correctly identifying Greek written in English
letters) to returning French and Czech (flag missing!) for gibberish!

Nice toy!

------
riahi
It also picked up transliterated Persian. That's impressive to me,
because Persian does not have a standard method of transliteration.

~~~
quant18
Transliterated Russian seems to get ID'd correctly, but only sometimes.
Anything with "zdrastvuite" in it, for example, works fine. But other simple
Russian 101 sentences, e.g. "U menya est odin karandash" ("I have a
pencil"), get ID'd as French. (Unless, of course, I'm spelling it wrong.)

But I haven't seen it correctly ID transliterated Chinese or Korean yet.

------
donw
Identified several common Japanese phrases as Vietnamese. True, the difference
is subtle, but...

------
commiebob
Would be even nicer if it provided a link to translate or just showed a
translation as well.

------
jhancock
I went with "ni hao". It told me it was Vietnamese. Possibly, but not only.

------
inglesp
Hm. It identified Bislama (pidgin English from Vanuatu) as Indonesian.

~~~
mahmud
Yes, "bislama" means "with peace" in various languages in the Muslim parts of
the world (to me it sounds very North African).

------
uninverted
Did anyone else expect this to be for programming languages?

------
jonsen
Close, but not really close: it identifies Faroese as Icelandic.

------
RiderOfGiraffes
An unfair test, but it "identified" Lojban as Italian.

------
amr
wtf? It identifies Arabic correctly but displays the Saudi flag. I like the
service, though, despite this minor annoyance.

~~~
mahmud
I saw that too :-/

They're confusing the Saudi flag with the Arab League flag; both are green with
white Arabic text.

[http://upload.wikimedia.org/wikipedia/commons/thumb/2/2b/Fla...](http://upload.wikimedia.org/wikipedia/commons/thumb/2/2b/Flag_of_the_Arab_League.svg/800px-Flag_of_the_Arab_League.svg.png)

------
andreaolivato
Added 'Identify by Link' feature!

