

Linguists identify 15,000-year-old ‘ultraconserved words’ - tokenadult
http://www.washingtonpost.com/national/health-science/linguists-identify-15000-year-old-ultraconserved-words/2013/05/06/a02e3a14-b427-11e2-9a98-4be1688d7d84_story.html

======
andolanra
I'm going to reserve judgment on the findings themselves until I read the
paper (although, having a background in historical linguistics, I've learned
to be _very_ skeptical of _any_ claim that posits historical superfamilies.)

However, I _will_ complain about the pop-science intro, because the thought
that a hunter-gatherer has some chance of understanding a sentence like the
one given is laughably preposterous. An Old English speaker would only barely
understand it (just as we would only barely understand _gief þæm menn ealdan
þisne fýr_ [1]), to say nothing of the many thousands of years of change
between us and the hunter-gatherers spoken of. There is a _vast_ difference
between 'two words have a statistically provable correspondence' and 'two
words are understood as the same word.'

[1]: My Old English is pretty weak, so I take full responsibility for
grammatical errors.

Edit: accidentally used 'th' instead of 'þ' in the Old English.

~~~
DanBC
What do you think about theories like Vilayanur Ramachandran's that some words
exist because they are sort of onomatopoeic?

(<http://www.bbc.co.uk/radio4/reith2003/lecture4.shtml>)

~~~
nijk
That is someone's theory? That's what 'onomatopoeia' means.

~~~
DanBC
I've really dumbed it down, and horribly mangled it.

Read the link I provided, he does a much better job there. If you don't want
to read all of it just search for "Now, remember I said the third thing you
have to do in science is show that this is not just some quirk.", and read
from the top of that section.

'slush' is onomatopoeic. Is 'axe'? How about 'milk' or 'mother'?

------
tokenadult
This is the citation to the underlying research study.

Mark Pagel, Quentin D. Atkinson, Andreea S. Calude, and Andrew Meade.
Ultraconserved words point to deep language ancestry across Eurasia. PNAS
2013; published ahead of print May 6, 2013,

doi:10.1073/pnas.1218726110

Abstract link:

<http://www.pnas.org/content/early/2013/05/01/1218726110>

Link to .PDF full text:

<http://www.pnas.org/content/early/2013/05/01/1218726110.full.pdf+html>

------
dmd
languagehat writes, over at metafilter:

Yeah, this is complete bullshit. Actual linguists weigh in in this LH thread [
<http://www.languagehat.com/archives/004994.php> ];

Marie-Lucie Tarpent writes:

The article announces "Eurasiatic" (a hypothetical supergroup originally
suggested by Joseph Greenberg of "Amerind" notoriety) as a major discovery by
scientists (not linguists) but seems somewhat confused as to its relation with
PIE. It also quotes a few words which the authors interpret as culturally too
important to have been lost or replaced, so that they have lasted 15,000 years
(one of these words is "bark" (of a tree) for which the authors propose an
explanation). Like Renfrew and the Proto-World people, the authors do not seem
to differentiate between the survival of a lexical item (although made
unrecognizable through millennia of phonological changes) and the survival of
the sounds that compose it (which are independent of the meaning of the whole
word) (eg Renfrew et al thought that words for 'nephew' had endured almost
unchanged for hundreds of years because of the importance of this concept, but
actually this longevity is due to the fact that the consonants in the word had
been more resistant to change than others, as shown by the behaviour of those
consonants in other words totally unrelated semantically to 'nephew'). In any
case, the article does not cite any actual forms, only meanings. Read it at
your own risk.

And Piotr Gąsiorowski says:

This is exactly the kind of approach which makes wishful thinking look like
science and gets it past reviewers. Even if the numerical methods are
basically sound, the data are garbage (obtained by the intuitive eyeballing of
reconstructions from the Tower of Babel database -- itself a highly
questionable source -- without any actual comparative analysis). No different
from ordinary "mass comparison", except perhaps for a tighter control of
semantic matches. [...] I realise that one has to start somewhere, but mass
comparison is at best of some help in formulating preliminary hypotheses, and
mass comparison based on unreliable data is no use at all.

~~~
kylebgorman
Very few reputable linguists believe in a hypothetical "Eurasiatic"
supergroup. The reason they are skeptical is that related languages diverge
fast enough that, after a few millennia, any remaining similarities are
indistinguishable from chance resemblance.
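
A rough way to see how quickly shared vocabulary thins out is a minimal sketch like the one below. It assumes each word is lost independently in each lineage at a constant rate; the 0.8 retention figure per millennium and the function name are illustrative assumptions, not numbers from the paper.

```python
def shared_cognate_fraction(millennia, retention_per_millennium=0.8):
    """Expected fraction of a word list still cognate in BOTH of two lineages
    that split `millennia` thousand years ago, assuming independent loss.

    retention_per_millennium is an illustrative assumption, not a measured value.
    """
    # Each lineage keeps a word with probability r**t; both must keep it.
    return retention_per_millennium ** (2 * millennia)


for t in (1, 5, 15):
    print(f"{t:>2} millennia: {shared_cognate_fraction(t):.4f}")
```

Under these assumptions fewer than 1% of words would still be shared after 15 millennia, which is the regime where real cognates have to be told apart from accidental lookalikes.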

------
jbattle
I don't get this. I know this is a monstrously crude approach, but I put the
sentence given in the article to Google translate

"You, hear me! Give this fire to that old man. Pull the black worm off the
bark and give it to the mother. And no spitting in the ashes!"

After translating it into a few different languages, almost none of the words
sound anywhere close to their English counterparts. French and Danish had a
fair bit of similarities but that's to be expected. Am I fundamentally
misunderstanding something here?

For reference I tried estonian, albanian, czech, finnish, and georgian

~~~
coldtea
> _Am I fundamentally misunderstanding something here?_

Perhaps.

For one, Google translate is not very accurate, even for simple stuff.

Second, the translations it picks might not be the form that has survived from
the old times. Newer words might have put the use of the older form in
disfavour, but it can still exist in the language. If something was the same
in a language from 15,000 years ago up to 500 years ago, it won't be picked up
in Google Translate, which will pick the current vernacular.

Third, they probably give an exact table of the equivalencies they found in
the actual paper. It would be easier to check that if they have it available
for free.

Fourth, they speak about how words sound, not how they look. So you have to take
pronunciation of each language into account, which often can be quite
different from what is written.

~~~
freehunter
Yes, saying that "thou" is a word that has survived for hundreds of centuries
is a bit odd to me. As far as I can tell, the only reason thou even exists as
a word in modern English (at least in the US) is because it's so heavily used
in the Bible (not exactly a modern book). The only times I hear someone using
thou is in context of the Bible or ye olde tyme joking.

~~~
VLM
Bible wasn't originally written in English of course. The more recent the
translation, the less Thou you get. We didn't have much alternative for 2nd
person singular pronouns until "You" wiped "Thou" off the map somewhat
recently and attempts at turning "Thou" into a formal religious version of
"You" seem to have flopped. Then again we did pretty well without a second
person plural until y'all invented y'all, plus or minus some "ye" anyway.

There are some idioms that'll never die even if no one understands the history.
"Holier than thou". Much like "Your name is Mudd" sometimes kinda horrifies
people who don't know the story of Abe Lincoln and Dr Mudd. I've met people
who claim to not know the history of "cotton pickin hands" or even acknowledge
its racial background, which seems hard to believe, then again cotton hasn't
been hand harvested in a couple generations locally so it's possible the kids
were telling the truth about not knowing what they're saying...

~~~
bcoates
Looking up "cotton pickin" in Google ngram viewer, it appears to be mostly a
product of the 1960s-1970s ruralsploitation fad: The Beverly Hillbillies, Hee
Haw and Foghorn Leghorn.

It's probably just an intentionally quaint pseudo-folk saying invented by
someone in Burbank.

~~~
coldtea
I don't think the Google ngram viewer is an appropriate tool for something
that mostly belongs to the spoken language, and even more so that of poor
illiterate blacks and southerners.

------
pwg
The original publication is linked to from here:
<https://news.ycombinator.com/item?id=5667610>

------
foob
_In fact, they calculated that words uttered at least 16 times per day by an
average speaker had the greatest chance of being cognates in at least three
language families. If chance had been the explanation, some rarely used words
would have ended up on the list. But they didn’t._

I don't think that this logic is valid. It assumes that if the languages had
had no common ancestry that any random word would be similarly likely to match
between the languages. I'm no linguist but in my experience the most
fundamental and commonly used words tend to be shorter, often composed of a
single syllable. It also makes sense that the more fundamental words would be
created early on and be given simple sounds while later words would be more
complicated by necessity. If this is the case then you would expect more
random coincidences within the cores of different languages than outside of
them. This is also consistent with the fact that most of the examples in the
article are very short: thou, give, hand, bark, spit, worm. I think that this
would need to be taken into account in the statistical analysis before
attempting to draw any conclusion.
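
To make the word-length point concrete, here is a minimal sketch under a toy phonology that is purely my assumption (nothing like it is stated in the article): every word is a string of consonant+vowel syllables, with each sound drawn uniformly and independently. The inventory sizes (20 consonants, 5 vowels) and the function name are hypothetical.

```python
from fractions import Fraction


def chance_match_probability(n_syllables, n_consonants=20, n_vowels=5):
    """Probability that two independently, uniformly drawn words made of
    n_syllables consonant+vowel syllables are identical.

    The inventory sizes are illustrative assumptions, not real language data.
    """
    # One syllable matches with probability 1 / (consonants * vowels).
    per_syllable = Fraction(1, n_consonants * n_vowels)
    # All syllables must match, and they are assumed independent.
    return per_syllable ** n_syllables


for n in (1, 2, 3):
    print(n, float(chance_match_probability(n)))
```

Under these assumptions a one-syllable word matches by chance 100 times more often than a two-syllable one, so a list dominated by short words would show more accidental matches even with no shared ancestry.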

------
juhanima
I wonder what kind of metrics could be achieved simply by measuring how many
mAhs it takes from google translate to translate from one language to another?

EDIT: Put simply, converting understandable sentences from one language to
another is a lot more than just counting word frequencies. Sorry for having
mentioned an actual company name there!

I just wrote a lengthy text in Finnish and had it translated by an on-line
service to English. Surprisingly, it wasn't the vocabulary but the structure
that mattered. The machine got 99% of the words right, but it lost the
meaning in 1/10 of the cases.

EDIT2: Not really sure who I am arguing with here, on a stale thread, but I just
have to finish the thought.

So it takes a measurable amount of work to analyze a sentence in a language
into an abstract syntax tree.

And another amount of work to synthesize that back into an understandable
sentence in another language.

The average sum of that work for all given sentences is the lingual distance
of the two languages. No?

Of course, not all language analyzers and synthesizers are equally good. What
caught my eye, however, was how poor the English was that _a network
translator_ managed to produce out of my Finnish. I'm pretty sure (although
the jury is still out on that) that the Russian it produced out of the same
origin was remarkably more understandable. The main reason being the liberal
word order.

I somehow assumed the English synthesizer would have been the top of the line.
Turned out it wasn't. Or else the language model was not deep enough, which
actually is a more plausible explanation.

Ok, thanks!

------
bane
Naive review of the interactive infographic

 _thou_

Altaic, Dravidian, Indo-European and Uralic sound similar enough

I can see how Chukchi-Kamchatkan and Kartvelian might end up where they are

And then we end up with Inuit-Yupik which sounds so dissimilar, the only thing
I hear in common with the others is that it was apparently made by a human

 _to give_

okay, I can kinda see maybe where Altaic, Inuit-Yupik and Uralic ended up
where they are, but they're definitely different sounds

Dravidian and Indo-European are a bit more similar, but don't sound like they
have anything in common with the others at all

 _hand_ okay, these are sorta similar, there's a clear m-vowel-n element in
most of them but Indo-European gets all "weird"

 _bark_

Altaic and Uralic may as well be slight accents

Inuit-Yupik diverges quite a bit

and Indo-European even more.

If I squint I can kind of see some similarity with a k-vowel then some other
stuff so...okay

 _to spit_ these are the most similar to my ear, it probably doesn't hurt that
these appear to be onomatopoeia as well. Indo-European just seems to have
reversed the bits somewhere along the way. If you squint "spit" in English
would fit right in with these.

I always come away from these kinds of linguistic analyses scratching my head.
_Especially_ once you hear the words in the different languages.

------
meric
It says Chinese wasn't included in the seven families of languages.

Yet I, an English and Cantonese speaker, couldn't help but notice these two
words which sound similar in both languages.

Mother is "ma" in English, and also "ma" in Cantonese.

Father is "pa" or "dad" in English, and "ba" or "deh" in Cantonese.

Remarkably similar, considering both languages are considered by linguists to
be practically completely unrelated.

~~~
mkl
The words are similar because these are some of the first sounds babies learn
to make.

------
ryusage
Interesting. Is there some objective way of determining if the words are
cognates or not? In particular, in the audio section, the Inuit-Yupik example
for "thou" sounds very out of place.

~~~
mzs
I would like someone to answer as well. See, I only know English, Polish, and
German pretty well. So take father, ojciec, and Vater. I don't see it.
Thinking about the languages I know, I would expect something like ta, da, pa
or (backward) at, ad, ap. Would they say the 'ojci' is sort of like an 'at'
sound and close enough?

Edit: Wait a minute, though matka and ojciec are the words for mother and
father in Polish, little kids might say more mama and tata. I might have
answered my own question here, but I too noticed some in the linked web page
did not sound particularly similar to me.

~~~
andolanra
Words are cognate if they stem from a common root. How can we tell? If we have
two sets of words S and T, then we need to postulate a third set R of 'proto-
words' as well as two mappings R → S and R → T where the mappings can (within
reasonable margins of error) predict the form of the words given the roots.

For example, take German and Latin: I assert the words _Vater_ and _pater_
(both 'father') are cognate, but to do so, I need to examine a whole _set_ of
corresponding words in both languages, so I take the sets {Vater, Fisch, Fuß}
and {pater, piscis, pes}. From these, I can see clear correspondences (e.g. a
German _f_ sound always corresponds to a Latin _p_ sound[1], the Latin _sc_
seems to correspond to the German _sch_ ) and some things that are unchanged.

Next, I come up with a set of roots like {*pater, *piš, *pes} and a set of
rules to apply like {p → f in German, š → sc in Latin, š → sch in German}, and
I can state with some confidence that these words descend from a common root.
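
As a minimal sketch of that procedure, the toy roots and rules above can be written down and applied mechanically. The names `ROOTS`, `RULES`, and `derive` are hypothetical, the roots are the toy ones from this comment rather than real reconstructions, and plain string replacement is a big simplification of how real sound changes work.

```python
# Toy proto-roots from the comment above (illustrative, not real reconstructions).
ROOTS = ["pater", "piš", "pes"]

# Sound-change rules per daughter language, applied in the order listed.
RULES = {
    "German": [("p", "f"), ("š", "sch")],
    "Latin": [("š", "sc")],
}


def derive(root, rules):
    """Apply each (old, new) sound change to the root, in order."""
    form = root
    for old, new in rules:
        form = form.replace(old, new)
    return form


# Predicted daughter-language forms for every root.
predicted = {
    lang: [derive(root, rules) for root in ROOTS]
    for lang, rules in RULES.items()
}

for lang, forms in predicted.items():
    print(lang, forms)
```

The output gets the initial consonants right (_f_ in German against _p_ in Latin) but not the vowels or endings: it gives 'fes' where German has _Fuß_ and 'pisc' where Latin has _piscis_. That gap is why a real analysis needs many more rules and many more words.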

In practice, you'd do this with _way_ more than three words (preferably you'd
use most of the language), and there is some wiggle room for related meanings or
exceptional changes, e.g. the German word _alt_ and English word _old_ seem
unrelated to the Latin word _senere_ 'to be old', because they actually stem
from words for 'to grow up', and are therefore cognate to the Latin _altus_
'high, tall'. Additionally, some rules are _more plausible_ than others based
on research and phonological considerations, so the change of s → h is well-
attested in actual languages and is consistent with similarities between the
two sounds, while the change of r → b is highly unlikely.

And while there is _some_ room for error, there is a limit e.g. one might
argue that English and Japanese are cognate because in Japanese, 'to eat' is
_taberu_ and 'to see' is _miru_ , and in English we eat off a table and see in
a mirror. This doesn't get borne out with more thorough phonological rules and
doesn't seem to reflect regular semantic processes, either.

tl;dr: Two words are cognate if they come from a common root; without direct
knowledge of the parent language (e.g. Latin for Italian and French), we
postulate one and show that regular processes can derive the child languages'
vocabulary.

[1]: If you don't speak German, know that the German _v_ is pronounced like an
_f_.

~~~
mzs
That was an excellent explanation, thank you. Also regarding ojciec, I found
this: <http://en.wiktionary.org/wiki/ojciec#Etymology> The changing over time
following rules was the missing part for me.

------
sinkasapa
Another negative response from a linguist
<http://languagelog.ldc.upenn.edu/nll/?p=4612>

------
Waiting4Hellban
Why couldn't the NYT have transitioned as eloquently as the Washington Post
onto the web?

------
graycat
So, suppose we want to know what 'bark' sounded like in the common language
15,000 years ago. Call that common language X.

Well, take languages A and B now where 'bark' is shared as a cognate.

Now, start with language A now and realize that as we go back in time 15,000
years ago to language X how 'bark' was pronounced changed year by year. Then
realize that as we go forward in time from language X 15,000 years ago to
language B now, how 'bark' was pronounced has changed year by year.

Then the change in how 'bark' is pronounced in languages now is smaller than
either the change from language A now to language X 15,000 years ago or from
language B now to language X 15,000 years ago.

So, how close is how 'bark' is pronounced now to how it was pronounced 15,000
years ago? Closer than to how 'bark' is pronounced between A and B now.

So, if 'bark' is pronounced nearly the same in languages A and B now, then it
was pronounced nearly the same 15,000 years ago. Thus we have an approximation
to how 'bark' sounded 15,000 years ago.

It may be that languages A and B have a common ancestor more recently than
15,000 years ago. So, for a better approximation want to pick for languages A
and B languages for which we have some hope that their most recent common
ancestor was X 15,000 years ago.

How to do this? For candidate pairs A and B, look at geographical and genetic
distances and pick the most distant pair.

~~~
Retric
Individual words often migrate between languages even if they don't share a
common ancestor.

~~~
graycat
Cute. And there are other issues with what I wrote. For a more careful
analysis, would have to do, say, a 'probabilistic' analysis, e.g., where in
principle can walk back to the beginning and start over but actually, e.g.,
with some common assumptions, in a space of dimension 3 or greater,
probabilistically keep getting farther from the starting point.

Your point should come out in the wash in estimating where there is a common
language and how far back if use more than one word, etc.

My guess, especially from many of the other comments on this thread, is that
the basic data is so bad, noisy, etc. and pronunciation changes so large that
the math I outlined, while basically correct, would mostly give such large
distances that we wouldn't learn much.

Maybe a better solution would be just to assume that the tree of languages
would follow the tree of genetic inheritance and, then, via genetic analysis
trace that tree.

But the article wanted to ask what one of those 'common' words would 'sound'
like 15,000 years ago, so I gave a way to get an approximation. The cute part
is while 'father' in English sounds a bit different from 'vater' in German,
the common ancestor should sound closer to either 'father' or 'vater' than
those two to each other, that is, closer rather than farther away. That is,
our intuition would be that the sound from 15,000 years ago should be much
different from either the current 'father' or 'vater', but with the math model
I gave that should be wrong.

------
ttrreeww
“I was really delighted to see ‘to give’ there,” Pagel said. “Human society is
characterized by a degree of cooperation and reciprocity that you simply don’t
see in any other animal. Verbs tend to change fairly quickly, but that one
hasn’t.”

~~~
nijk
This is the sort of unscientific wishfulness that undermines the analysis.

And is directly contradicted by known animal behavior.

~~~
ttrreeww
Your way of living is not a happy one.

~~~
csallen
It's unfortunate that you believe happiness requires self-delusion.

~~~
yekko
And you just know more than the experts. Armchair expert...

