
Computer program deciphers a dead language that mystified linguists - ph0rque
http://m.io9.com/5576734/computer-program-deciphers-a-dead-language-that-mystified-linguists
======
willchang
The headline is technically correct, but quite misleading. The language in
question, Ugaritic, was deciphered in 1932. As it was discovered in 1929, it
can hardly be said that the decipherment was an arduous process. The
significance of the computer program is along the lines of:

> The key strength of our model lies in its ability to incorporate a range of
> linguistic intuitions in a statistical framework.

As someone working in NLP, I find this quite remarkable, and I don't mean to
detract from it; but it should be noted that the model (like the linguists)
relied on the fact that Ugaritic is closely related to Hebrew. It would be
impossible to use such a model to decipher Linear A, which is probably not
related to any known language. (But the model could perhaps be used to check
if linguists had overlooked any connection between Linear A and another known
language.)

~~~
anateus
Exactly, hardly a lost language.

I happen to have taken a Ugaritic class in college, and after a semester I
could read the texts without too much trouble. I am a native Hebrew speaker,
and since the roots are similar, it wasn't very hard.

It used a novel alphabet (likely the first ever) that was a phonetic abjad (no
vowels) derived from Akkadian cuneiform (in turn derived from Sumerian
cuneiform).

As far as "few tablets" goes, I happen to have used VERY thick books full of
transcriptions of Ugaritic tablets. One advantage of knowledge of Ugarit (the
city) being lost is that, until the recent rediscovery of its ruins, it lay
mostly undisturbed.

So, if we discover another language with close genetic ties to a known tongue
and an alphabet that's different but similar, we'll have just the thing :) Of
course, this is better than what we had before; my jest isn't meant to
detract!

Onwards with Linear A :>

------
Groxx
> _The computer program relies on a few basic assumptions in order to make
> intuitive guesses about the language's structure. Most importantly, the lost
> language has to be closely related to a known, deciphered language, which in
> the case of Ugaritic is Hebrew. Second, the alphabets of the two languages
> need to share some consistent correlations between the individual letters or
> symbols. There should also be recognizable cognates of words between the two
> languages, and words that have prefixes or suffixes in one language (like
> verbs that end in "-ing" or "-ed" in English) should show the same features
> in the other language._

So... it's a statistical decoder ring? Impressively effective, and a
_distinct_ possibility for accelerating the decoding of newly discovered
languages, but that doesn't sound like much more than a Markov chain attached
to a diff tool.
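To make the "statistical decoder ring" reading concrete, here is a toy Python sketch. It is not the model from the paper: it simply assumes the lost script is a one-to-one letter substitution of a related known language and pairs symbols by frequency rank, about the crudest possible version of the "consistent correlations between letters" assumption quoted above. The function names and data are hypothetical.

```python
from collections import Counter


def rank_symbols(text: str) -> list[str]:
    """Return the distinct non-space symbols of `text`, most frequent first.

    Ties are broken by sorting the symbols themselves, so the order is
    deterministic (but tied symbols may still be mis-paired later).
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    return sorted(counts, key=lambda ch: (-counts[ch], ch))


def guess_mapping(lost_text: str, known_text: str) -> dict[str, str]:
    """Pair each lost-script symbol with the known letter of the same frequency rank."""
    return dict(zip(rank_symbols(lost_text), rank_symbols(known_text)))


def decode(lost_text: str, mapping: dict[str, str]) -> str:
    """Apply a symbol mapping, keeping whitespace and marking unmapped symbols '?'."""
    return "".join(
        ch if ch.isspace() else mapping.get(ch, "?") for ch in lost_text
    )


if __name__ == "__main__":
    # Hypothetical toy data: the "lost" text is a pure letter substitution of
    # the known one, chosen so that every letter has a distinct frequency.
    known = "aaaa bbb cc d"
    lost = "XXXX YYY ZZ W"
    mapping = guess_mapping(lost, known)
    print(mapping)                # {'X': 'a', 'Y': 'b', 'Z': 'c', 'W': 'd'}
    print(decode(lost, mapping))  # aaaa bbb cc d
```

On real data, ties and the differing letter frequencies of two related but distinct languages break rank alignment quickly, which is presumably why the article says the program also leans on cognates and shared prefixes/suffixes rather than letter statistics alone.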

Also:

> _The lost language of Ugaritic was last spoken 3,500 years ago. It survives
> on just a few tablets, and linguists could only translate it with years of
> hard work and plenty of luck. A computer deciphered it in hours._

That's not "mystifying" linguists; that's a mildly tough nut to crack. It
mystified them for a while, certainly, but so do many languages with small
samples.

edit: I haven't read the "original paper"; this is all based on the article.

edit2: a brief skim of the original paper suggests it finds cognates through
statistically similar use of morphemes. It appears they've included their
whole algorithm, if anyone cares to investigate more deeply. So I wasn't
_too_ far off at least, and the article's writer did a decent job explaining
how it worked. Much better than newspapers typically manage :)
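The cognate step can be sketched in the same toy style. This is a hypothetical simplification, not the paper's algorithm: once lost-script words have been decoded with some guessed mapping, it pairs each one with the closest known-language word by Levenshtein edit distance and keeps only pairs within `max_distance` edits. It ignores morphology entirely, and the word lists are made-up consonant-only strings.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def match_cognates(
    lost_words: list[str], known_vocab: list[str], max_distance: int = 1
) -> dict[str, str]:
    """Map each decoded lost word to its nearest known word, if it is close enough."""
    matches = {}
    for word in lost_words:
        # Ties on distance are broken alphabetically for determinism.
        best = min(known_vocab, key=lambda k: (edit_distance(word, k), k))
        if edit_distance(word, best) <= max_distance:
            matches[word] = best
    return matches


if __name__ == "__main__":
    # Hypothetical consonant-only word lists (abjad-style, no vowels).
    lost_words = ["mlk", "bt", "xyzq"]
    known_vocab = ["mlk", "byt", "ym"]
    print(match_cognates(lost_words, known_vocab))  # {'mlk': 'mlk', 'bt': 'byt'}
```

A fixed edit-distance threshold is the simplest choice; a statistical model would instead weigh how plausible each letter correspondence is across the whole vocabulary.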

------
RiderOfGiraffes
Same story, different source, some comments already:

<http://news.ycombinator.com/item?id=1477122>

------
mahmud
This is a very tightly-knit gang of uber hackers. Their last project was a
program which generated articles _for_ Wikipedia:

<http://people.csail.mit.edu/csauper/?page_id=64>

------
petercooper
That gives me hope for Perl code from the 90s being intelligible even 1000
years into the future.

~~~
microtherion
It may be intelligible 1000 years from now, but probably no sooner ;-)

------
palish
So... what did the translations actually _say_?

~~~
Groxx
> _The surviving Ugaritic texts tell the stories of a Canaanite religion that
> is similar but not identical to that recorded in the Old Testament,
> providing Bible scholars a unique opportunity to examine how the Bible and
> ancient Israelite culture developed in relation to its neighbors._

I missed it the first time through too. No idea what was in the data set,
though; nothing mentions that.

------
shasta
Darmok and Jalad at Tanagra

Sokath, his eyes opened

------
kevbin
So what. I do this with "javac" every day.

