
MIT claims to have found a “language universal” that ties all languages together - jonbaer
http://arstechnica.co.uk/science/2015/08/mit-claims-to-have-found-a-language-universal-that-ties-all-languages-together/
======
azinman2
A minor feature likely based on human constraints -- hardly any implications
for a universal grammar. Even Japanese breaks this easily, which instantly
kills "universal" in the Chomsky sense (it has to be absolute to be something our
brains have innately). 37 languages aren't many when there are so many out
there, particularly minor ones spoken by isolated populations with entirely
different features.

Not super exciting just yet.

~~~
canjobear
Yes, I also think dependency length minimization is just a cognitive
constraint. (Lead author here.) The idea of calling this a universal in the
Chomsky sense is all from the press. The cool thing about it is that (previous
work has shown) you can use this constraint in principle to derive a lot of
the more substantive "universals" (really, overwhelming tendencies) of
language, such as that natural language expressions are usually well-nested,
and that in languages where the verb follows the object, the noun also follows
the adjective, and the preposition follows the noun, etc.

As for isolated languages, I just finished running some dependency length
preference experiments with indigenous people in Bolivia, but haven't analyzed
the data yet, so we'll see :)

~~~
gohrt
What have you learned about how to talk to the press?

Bad coverage like this reflects back negatively on the research and the
institution, sadly.

Uni PR offices are notorious for twisting and over-selling research results.

~~~
canjobear
I think I've learned there's not much you can do.

My advisor and I talked to two reporters: one from MIT News
([https://newsoffice.mit.edu/2015/how-language-gives-your-
brai...](https://newsoffice.mit.edu/2015/how-language-gives-your-brain-
break-0803)) and one from Science Magazine
([http://news.sciencemag.org/social-sciences/2015/08/all-
langu...](http://news.sciencemag.org/social-sciences/2015/08/all-languages-
have-evolved-have-common)). They both communicated with us about all kinds of
details in the articles, and let us comment on the drafts. We clearly stated
what we did and didn't want to claim, and they did a good job conveying what
we wanted while adding extra connections we hadn't thought of for popular
appeal.

On the other hand, we had no contact with anyone about the Ars Technica
article. I've also seen some other articles cropping up that are copying the
original articles, and making claims I wouldn't stand by. I don't think
there's anything we can do about that.

~~~
RobertoG
Have you considered demanding a correction?

They are putting words in your mouth. Somebody should keep journalists
accountable.

A few days ago I tried to follow the source of an article in an online
newspaper. The source was another online newspaper, in a different language,
and that one's source was yet another newspaper. Along the way things were
added and removed, just like in a crazy game of telephone.

It makes you think about the news we read and take for granted.

~~~
azinman2
"Scientists notice small trend that may or may not be of importance to
language" just isn't a very catchy headline.

------
powera
Since they only investigated 37 languages, isn't this really only evidence
that they found a feature that ties about 97% (~= 38/39) of languages together?
Languages are known for having a few outliers with completely crazy rules.

Also, if two sentences are considered together, the average dependency length
would be significantly lower for those sentences than for one random sentence
of the same total length. So I'm not sure what this theory implies other than
"the definition of a sentence can be vague".

~~~
protomyth
Does anyone have a list of the 37 languages used?

~~~
krasin
It's on page 3 of the paper:

Ancient Greek, Arabic, Basque, Bengali, Bulgarian, Catalan, Chinese, Croatian,
Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Hebrew,
Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Latin, Modern
Greek, Persian, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish,
Swedish, Tamil, Telugu, Turkish

~~~
tlb
About 65% (24/37) of those are Indo-European (I think). No Amerindian or
Polynesian languages are represented; Indonesian is the only Austronesian one.

~~~
canjobear
I'd love to have more interesting languages in the sample. If you speak such a
language, and you are willing to parse ~2000 sentences for me, get in touch :)

~~~
protomyth
Did you approach any of the Native American tribes? The Navajo in particular
are very active in promoting their language (they even translated Star Wars).

------
compbio
Working with word co-occurrence clustering based on Kolmogorov complexity, I
have become convinced that there is a computational complexity element to
every language.

Words like "circumvent" and "environment" are close in terms of complexity.
Words like "us" and "me" are close in terms of complexity.

The counting argument tells us that most strings are not compressible. It is
then a wonderful feature of sensor data, natural language, DNA and computer
code that they can be compressed quite a bit. This means there is a certain
order in language that compressors can exploit to keep the file size smaller.
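
To make that concrete, here is a rough Python sketch (standard library only;
the sample text and sizes are just illustrative) comparing how well English
text and random bytes compress:

    # Rough sketch: ordinary English text compresses far better than random
    # bytes of the same length. Repeating one sentence exaggerates the effect,
    # but any real corpus shows the same gap.
    import os
    import zlib

    text = b"John threw out the old trash sitting in the kitchen. " * 200
    noise = os.urandom(len(text))  # incompressible by construction

    print(len(zlib.compress(text)) / len(text))    # far below 1.0
    print(len(zlib.compress(noise)) / len(noise))  # about 1.0 or slightly above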

There is a cognitive-economy trade-off between the energy needed to keep a
system running and increased complexity. Less complex language helps us save
energy. We use short words for concepts that we use often. Very complex
concepts and words like "disambiguation" may be described with shorter, simpler
words to someone who has not yet stored that word and its generally accepted
meaning.

In this complexity view, languages evolve to convey as much information as
possible with as little energy/computational complexity as possible. The
results found in this article can also be explained using this view. Parsing a
sentence like "Throw the trash out" requires you to hold the word "throw" in
working memory until you get to the word "out" to complete the concept "to
throw out". Until you get to the word "out", "throw" remains in a superstate
(it could become "throw in", "throw on", etc.). You need both words to form a
mental picture of someone throwing out the trash. This demands more
computational energy from the listener, and is hence inefficient. If you want
your message to be heard, you have to communicate in clear, low-energy
sentences. So using simpler, less computationally intensive sentences benefits
both the speaker and the listener.

This would readily explain why natural languages beat the random benchmark.
Randomness has far less structure to use for compression by an intelligent
agent. Randomness is not optimized communication, since it is more
unpredictable.

In short: Simplicity and conveying information with little energy is a fitness
factor that natural selection optimizes for. This is universal to all natural
language speaking agents with a limited energy budget.

~~~
rndn
It’s probably more memetics (constrained by cognitive load) than genetics that
selects the features of languages, and there are likely many other factors that
determine the complexity of languages and grammatical structures. For example,
there is a sweet spot between minimal symbol count and minimal word length: in
the extremes you have either short words but many symbols, or few symbols but
long words. At the same time, the number of distinguishable phonemes, and
therefore symbols, is of course restricted by the sounds that the average
vocal tract can produce.
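
A back-of-the-envelope sketch of that trade-off (the vocabulary size is just
an illustrative figure): to keep V words distinct with k distinguishable
symbols, words need roughly log_k(V) symbols each.

    # Few symbols -> long words; many symbols -> short words.
    import math

    vocab = 50_000  # illustrative vocabulary size
    for k in (5, 10, 20, 40, 80):
        print(f"{k:3d} symbols -> ~{math.log(vocab) / math.log(k):.1f} symbols per word")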

Secondly, the communication channels thought→vocalization→hearing→thought or
even thought→typing→reading→thought are inherently very noisy, so you end up
with a lot of redundancy, like particles, introduction and transition phrases.

And lastly, I think that there are always some words and phrases that are not
shaped by efficiency/cognitive load, but rather by whether it is fun or
fashionable to talk in a certain way. There is certainly some cultural
variance that can be orthogonal to efficiency.

~~~
compbio
Very thought provoking answer! Thanks.

As memes live in agents with an energy budget, I think that shorter, simpler
memes have a better chance to take hold and reproduce ("Make something people
want").

Words in a sentence are like models in an ensemble. Simple words are more
general and have a high bias and low variance ("Make stuff users want").
Highly complex words and sentences have a lower bias, but a higher variance.
You need to average a lot of them to get a clear picture. That's why the
sentences in scientific papers are usually so long: they need to gradually
cancel out the noise.

> There is certainly some cultural variance that can be orthogonal to
> efficiency.

Yes, agreed! Though, as with memes, certain words or symbols without any
redundancy may have cultural value. You may gain energy by speaking a certain
language with a certain degree of sophistication. You may have to invest energy
to gain access to the information contained in symbols (or have agents "unzip"
these for themselves).

------
akud
Dependency length minimization doesn't seem to be a generative feature of
language, just a constraint. Languages have to be understood and used by
humans, and all this paper seems to show is that humans have limited working
memory. Anything, language or otherwise, that is meant to be used by humans
would minimize the load on working memory.

~~~
kuschku
Especially because they themselves admit that two of the languages in their
set don’t even minimize.

Even with short sentences, German and Japanese often have large dependency lengths.

With more complex, nested sentences (which are really common in German), this
becomes more of an issue, because between two connected words you can have 7
subclauses.

(Seriously, read Karl Marx’ Das Kapital, or Günter Grass’ Im Krebsgang, or
read any other German author.)

~~~
canjobear
All of the languages are "minimized" in that dependency length is below the
random baseline.

German, Japanese, etc. are just much less minimized than other languages like
English and Indonesian. Working out why is the next step for us. I don't think
it's because these languages are inherently harder to understand. They just
represent different solutions to the communication problem.

~~~
compbio
Thank you! Very interesting research all around.

I think older forms of German and Japanese may have been hard for outsiders to
understand, but were used with high sophistication (you have to invest energy
to access this information) among insiders. For instance the Japanese pillow
words (makurakotoba), or German words for hard-to-translate concepts like
"Weltschmerz", "Kummerspeck" and "Torschlusspanik". All are short, useful
words for communicating complex, rich concepts, provided the agent knows their
meaning.

~~~
conceit
Weltschmerz - world-weariness, melancholy

Torschlusspanik - the fear that time to act is running out

------
Myrmornis
It's not really possible to discuss this article if we can't access it.

    
    
      This item requires a subscription to Proceedings of the
      National Academy of Sciences.
    

This is really not good!

[http://www.pnas.org/content/early/2015/07/28/1502134112.full...](http://www.pnas.org/content/early/2015/07/28/1502134112.full.pdf)

~~~
canjobear
Here you go:
[http://web.mit.edu/futrell/www/papers/futrell2015largescale....](http://web.mit.edu/futrell/www/papers/futrell2015largescale.pdf)

------
matt_morgan
Cool. What is the measure of word dependency? I.e., I'll agree that in

John threw the trash out

"threw" and "out" are dependent on one another. But is that an either/or, or
are there degrees? It seems like "threw" and "trash" are also "dependent" in
that they don't make independent sense.

~~~
canjobear
Specifically, we used hand-parsed corpora developed by Google and a whole
bunch of computational linguists over the last 15 years.

The dependency representation is of course an incomplete picture of how words
hang together in a sentence. But it's the only format that's flexible enough
that you could dream of parsing 37 languages to (approximately) the same
standard.
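
To make the metric concrete, here is a rough sketch of how total dependency
length is computed: sum the distances (in word positions) between each word
and its head. The head indices below are a simplified hand parse of the
article's example sentence in its two orderings, for illustration only.

    def total_dependency_length(heads):
        """The k-th entry is the 1-based position of word k's head (0 = root)."""
        return sum(abs(pos - head) for pos, head in enumerate(heads, start=1) if head)

    # "John threw out the old trash sitting in the kitchen"
    near = [2, 0, 2, 6, 6, 2, 6, 7, 10, 8]
    # "John threw the old trash sitting in the kitchen out"
    far = [2, 0, 5, 5, 2, 5, 6, 9, 7, 2]

    print(total_dependency_length(near))  # 14
    print(total_dependency_length(far))   # 20: "threw ... out" alone spans 8 words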

~~~
dreamling
All your replies in this thread are clearer and more interesting than the
original linked article.

Very cool stuff.

------
mring33621
Some combination of Enochian and LISP, no doubt. No, I didn't read the
article.

~~~
nerd_stuff
I read it hoping to make a Lisp/Esperanto joke, but alas, they only found one
universal aspect of all languages; they did not find a universal language. The
universal tendency is to bundle related words together:

> You can see this effect by deciding which of these two sentences is easier
> to understand: “John threw out the old trash sitting in the kitchen,” or
> “John threw the old trash sitting in the kitchen out.”

~~~
Someone
If it's John doing the sitting, I would expect an extra comma before sitting,
but for me, the first is a bit ambiguous: is the trash doing the sitting, or
John?

That may be because keeping 'threw' and 'out' together in that way in Dutch
feels wrong, or at least really, really awkward.

~~~
ghaff
"John threw out the old trash sitting in the kitchen" is a bit of an idiomatic
sentence in that understanding its intended meaning depends on knowing that
throwing out the trash while sitting down doesn't make a lot of physical
sense.

Written somewhat more formally, the sentence would be something like "John
threw out the old trash _that was_ sitting in the kitchen." (Although "threw
out" is itself somewhat informal language. "John disposed of" or something
along those lines would probably be used in a more formal context.)

------
eternalban
paper:
[http://web.mit.edu/futrell/www/papers/futrell2015largescale....](http://web.mit.edu/futrell/www/papers/futrell2015largescale.pdf)

------
hyperion2010
The title is significantly overstated. DLM isn't a language; it is a rule that
many languages seem to follow, and it says nothing about Chompsky.

~~~
prestonbriggs
It's "Chomsky" of course, but I _really_ like "Chompsky"!

~~~
ics
Well there's also this easter egg:

[http://left4dead.wikia.com/wiki/Gnome_Chompski](http://left4dead.wikia.com/wiki/Gnome_Chompski)

~~~
Someone
That may be inspired by
[https://en.m.wikipedia.org/wiki/Nim_Chimpsky](https://en.m.wikipedia.org/wiki/Nim_Chimpsky)

------
hownottowrite
From 2012: Daniel Everett: "There is no such thing as universal grammar"
[http://www.theguardian.com/technology/2012/mar/25/daniel-
eve...](http://www.theguardian.com/technology/2012/mar/25/daniel-everett-
human-language-piraha)

------
gioele
> Languages like German and Japanese have markings on nouns that convey the
> role each noun plays within the sentence, allowing them to have freer word
> order than English.

Since when does German have a freer word order than English?

German has precise and strict rules about the placement of:

1. normal verbs
2. verbs used in conjunction with modal verbs
3. conjunctions
4. particles in separable verbs
5. stressed parts of the sentence

And we are not talking about rules followed only by prescriptivist
grammarians, but about very common rules used in everyday conversation.

The article (the PR article, not the academic paper that I haven't read) looks
like a poorly researched piece.

------
waroc
Delving into the specifics of individual human languages would be a colossal
waste of productive time that could otherwise be spent pursuing "harder"
science and technology.

The only possible "language universal" will be machine language, once humans
merge with machines.

The physical makeup of the biological brain, which is subject to random
biochemical reactions, just can't maintain something as consistent as a
"language universal" would need to be.

------
PeterWhittaker
Related:
[https://news.ycombinator.com/item?id=10002595](https://news.ycombinator.com/item?id=10002595)

------
akyu
This seems to agree quite nicely with Jeff Hawkins' theories, mainly that our
brains are primarily doing temporal pattern recognition and that, according to
Hawkins, language is no exception. By minimizing the temporal gap between
related items, you minimize the size of the patterns needed to convey an idea.

------
gohrt
Can mods change the link from the poorly written, misinformation-plagued blog
article to the source paper:

[http://www.pnas.org/content/early/2015/07/28/1502134112](http://www.pnas.org/content/early/2015/07/28/1502134112)

------
crusso
Isn't the interesting part of a "language universal" some kind of commonality
that is directly a byproduct of common human biology?

Keeping related concepts close together would seem to be a natural commonality,
simply because it makes communication between any two entities more efficient.

------
hell0
Language wants to be a picture, even in the case of Japanese and German,
grammar aside. 'Threw out' forms a nicer picture-action. There might be some
baseline for cognition for which language serves as a _cough_ 'higher level'
interpreter.

------
nerd_stuff
By the way if you ever want a great set of tools to play with language check
out Python's Natural Language Toolkit:
[http://www.nltk.org/](http://www.nltk.org/)

~~~
andreasvc
NLTK is nice, but if you want to work with nontrivial amounts of data it's
better to turn to Stanford's NLP tools or spaCy.
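
For example, a quick sketch using spaCy's parser to compare total dependency
length for the two orderings from the article. This assumes the small English
model is installed (pip install spacy, then python -m spacy download
en_core_web_sm); the exact numbers depend on the parse spaCy produces.

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def total_dependency_length(sentence):
        # The root's head is itself, so it contributes 0; punctuation is
        # included, which is fine for a sketch.
        doc = nlp(sentence)
        return sum(abs(token.i - token.head.i) for token in doc)

    print(total_dependency_length("John threw out the old trash sitting in the kitchen."))
    print(total_dependency_length("John threw the old trash sitting in the kitchen out."))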

------
bknives
Isn't the universal language love?

------
bluesmoon
yeah yeah, wake me up when they find something that ties all JavaScript
implementations together :P

------
oneJob
Really surprised this isn't getting more action on HN. The implications for CS
are immense.

~~~
andolanra
No, the implications are actually quite small. This article is a very bad pop-
science summary of relatively banal linguistic research.

Linguists have studied _linguistic universals_ for a long time, which are
properties that all human languages have. For example, one could try to
imagine (in the style of Borges) a language which had no nouns, and in which
all sentences are formed of relationships between verbs—but no natural
language has this feature: _all_ natural languages have nouns and verbs.

There are also _implicational universals_ , which are of the form, _if [some
language] has property X, then it will also have property Y_ , and
_tendencies_ , which are broad driving trends that might have individual
exceptions. An example of the latter is that languages that place the verb at
the end of the sentence _usually_ have postpositions rather than prepositions,
but this has exceptions (e.g., Latin).

What's being studied here is a tendency in sentence structure: languages
_usually_ structure their syntax such that they can minimize the _dependency
length_, or the distance between syntactically related words in a sentence.
This has long been hypothesized, but this paper gives evidence for it in the
form of a large cross-language survey. Which is cool! But by no means does it
have major implications for CS in any way. (At least, no more than any of the
copious previous research on linguistic universals.)

EDIT: I should also add that this area of research is _not new_. In fact,
linguist Joseph Greenberg published an article called 'Some universals of
grammar with particular reference to the order of meaningful elements' in
1963. This is continuing research and, while good research, not particularly
groundbreaking or pioneering.

~~~
AnimalMuppet
I think it has implications for language design. Some syntaxes are going to be
better than others, based on this.

It also means that, for a function that takes several parameters, some
parameter orders are better than others.

~~~
canjobear
Yes. Specifically, you should order arguments so that, on average, they go
from short to long.

I try to write my code this way. map, filter, and reduce are terrible from
this perspective. Unless you have do blocks like in Ruby or Julia!

Also, dplyr's %>% pipe operator is a great way to reduce dependency length in
R code.

~~~
jonahx
Thanks for all your answers here! Could you elaborate on the above? Why does
it imply that short-to-long orderings are better? Also, I don't follow your
comment about map/filter/reduce being bad unless you have Ruby-esque do
blocks. Are you referring to something like map(<big function>, array)?

~~~
canjobear
Yeah, map(<big function>, array) creates a dependency that exists from when
you read "map" to when you read the name of the array, potentially spanning a
very long function. But if you have map(array) do <big function>, then you
only have a dependency from "map" to the beginning of the function.

In general, if you have a function call f(a, ..., y, z), when you parse that
(mentally, or in a shift-reduce parser) you have to keep the function name f
in memory all the way to z. So you want to make a, ..., y as short as
possible.
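
A toy Python illustration of that map point (the function body is made up,
just to be long):

    items = [1, 2, 3, 4]

    # Long-before-short: the reader carries "map(" open across the whole lambda
    # before the iterable finally appears.
    longer = list(map(lambda x: (x * x + 3 * x + 1) if x % 2 else (x * x - 3 * x - 1),
                      items))

    # Naming the big function first keeps the call itself short.
    def poly(x):
        return (x * x + 3 * x + 1) if x % 2 else (x * x - 3 * x - 1)

    shorter = list(map(poly, items))
    assert longer == shorter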

Similarly, dependency length minimization predicts that in English people will
want to order expressions from short to long after a verb or preposition.
There is a lot of evidence for this preference; it's been documented since the
1930s.

If there were a programming language where the function name came after the
arguments, like (a, b)f, then the best order would be long-to-short.

Similarly, the DLM prediction for verb-final languages like Japanese is that
people will prefer long-to-short orders. It appears that this preference does
exist, but it is much weaker than the short-to-long preference among speakers
of English-like languages.

------
jlukic
The day... has finally come.

------
olalonde
Lisp?

------
th0waway
But it uses loads and loads of parentheses, so no one will ever use it...

