
A Simple Structure Unites All Human Languages - dnetesn
http://nautil.us/issue/76/language/this-simple-structure-unites-all-human-languages
======
aasasd
Steven Pinker's book ‘The Language Instinct’ talks about pretty much
everything a layman would want to know about the basics of language, in
layman's terms. It covers syntax, Chomskian grammars, prescriptive vs
descriptive grammar, the history and relations of languages, the Sapir-Whorf
hypothesis, the biological and evolutionary basis for language, language
acquisition in children, and the differences between human languages and
animals' attempts (spoiler: Pinker doesn't give any credit to claims that
animals learn language). A big plus of the book is that it uses Chomsky's
theories extensively but explains them without the need for specialized
knowledge (it's explicitly mentioned how Chomsky is unreadable without
linguistic training). All in all, highly recommended. The book even works
well as an audiobook.

Also: in preparation for diving into Kafka's books, I learned about a peculiar
feature of his style:

> _Kafka often made extensive use of a characteristic particular to the German
> language which permits long sentences that sometimes can span an entire
> page. Kafka's sentences then deliver an unexpected impact just before the
> full stop—this being the finalizing meaning and focus. This is due to the
> construction of subordinate clauses in German which require that the verb be
> positioned at the end of the sentence._

>> _“Als Gregor Samsa eines Morgens aus unruhigen Träumen erwachte, fand er
>> sich in seinem Bett zu einem ungeheuren Ungeziefer verwandelt.” (original)_

> _“As Gregor Samsa one morning from restless dreams awoke, found he himself
> in his bed into an enormous vermin transformed.”_

There's a neat picture illustrating the difference in the order of the parse
tree:
https://en.m.wikipedia.org/wiki/Franz_Kafka_bibliography#English_translations

~~~
matt-snider
Thanks for the book recommendation and the link.

From the Wikipedia article:

> German also lacks an informal language register

Can someone provide more insight into what this refers to? There are
definitely less formal or less technical-sounding word variants in German, and
of course duzen/siezen to add another level of formality, so I'm not sure what
this could refer to.

~~~
aasasd
The other comment is not quite right: a ‘language register’ is not a dialect,
even though apparently the whole classification is difficult and imprecise due
to the nature of languages as a continuum. A ‘language register’ means a
variant, a choice of words, that is used in specific situations or settings:
https://en.wikipedia.org/wiki/Register_(sociolinguistics)

So, afaiu, an ‘informal register’ would be something like brospeak, or the
language spoken at home and among friends, contrasted with that spoken with
strangers and at work. But I don't know what the situation is in German. With
English and Russian, every generation and each subculture invents their own
slang just to differentiate themselves; I can't imagine how any country would
avoid developing informal language, considering the existence of Oktoberfest.

------
schoen
I know that Chomsky knows about inflectional morphology, and so I'm sure that
his theory does try to account for it, but I was frustrated that all of the
examples here were only about word order. The author said:

> Word order is, of course, far more complex than I’ve shown here. There
> are languages with very free word order, and even within languages there are
> many intriguing complexities. However, this idea, that Merge can both
> combine bits of language, and reuse them, gives us a unified understanding
> of how the grammar of human languages works.

But none of the examples in this simplified account even gestured at noun
case, or at the prospect of expressing subject (or agent) with verb
conjugation, or at feature agreement.

Is there a straightforward way to understand why Chomsky thinks that this
approach addresses those phenomena?

~~~
eindiran
Many linguists believe that the same set of principles acts on sub-word units,
a theory called Distributed Morphology, or DM [0].

"'Syntactic Hierarchical Structure All the Way Down' entails that elements
within syntax and within morphology enter into the same types of constituent
structures (such as can be diagrammed through binary branching trees). DM is
piece-based in the sense that the elements of both syntax and of morphology
are understood as discrete instead of as (the results of) morphophonological
processes."[1]

So a rather simplified way to think about it is that each word is a little
mini-tree and Merge operations create a branching structure between its
constituent parts. Each of those words is then part of the larger sentence-
level tree. The important thing is that Merge is acting on the units at both
levels.

Similarly, Merge can act on structures that are the output of previous Merges,
allowing you to have a verb in a particular conjugation (its sub-word
structure) that selects for a particular type of supra-word structure (a tree)
that's headed by a noun with some set of features, e.g. a particular case.

Another point that is kind of alluded to in the article is that you can create
movement from one part of a tree to another with Merge. In previous theories
of syntax, there always needed to be both something like Merge and a special
"move" operation. But Merge simplifies things quite a bit in that regard.

[0] https://en.wikipedia.org/wiki/Distributed_morphology

[1] https://www.ling.upenn.edu/~rnoyer/dm/#how%20DM%20is%20different

~~~
schoen
Whoa, that's amazing!

Can you explain a little more about how something like an agreement rule would
be analyzed in this framework?

I suppose I didn't quite understand your "that selects for a particular type
of supra-word structure (a tree) that's headed by a noun with some set of
features". Is this sort of akin to a type system in programming? Like the verb
is only willing to bind with a subject noun phrase whose head has a particular
feature?

~~~
eindiran
In this framework, you can think about a morpheme as being a tuple of
features. Like you said, it is sort of akin to a type system, where passing
the wrong type to a function won't work. A morpheme will select for a feature
or set of features from whatever it's merged with, and won't merge with
something it doesn't agree with.

I think using a language which doesn't care much about word order will be
illustrative here, so let's use Latin:

puella videt canem

The girl sees the dog

We can break this down into:

    puell-a          vid-et     can-em
    girl-NOM.S.FEM   sees-S.3   dog-ACC.S

So 'puella' is the tuple of features [+NOM, +S, +FEM], 'videt' is [+PRES,
+ACTIVE, +S, +3], etc.

Here, we want to do a Merge with 'puella' and 'videt': we say that 'puella'
selects for the features +NOM, +3, +S (nominative, third person, and singular)
in its verb, but doesn't care about the others. It can still agree with its
verb if the verb is passive or in the past tense. But if a verb is conjugated
in a way that violates the features it selects for (eg the verb is conjugated
as first person plural), 'puella' won't merge with it.

As you said, a phrase level structure will have the features of its
constituent parts bubble up to it. So once we've done the first Merge with
'puella' and 'videt', our structure is now selecting for a noun phrase that
has the feature +ACC. Because 'canem' meets this requirement, we can get the
final Merge necessary for our finished sentence.

{ { puella, videt }, canem }

Note that this account still works if we change the order of the sentence to
any configuration; we just need to reorder the Merges.
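
If it helps, here's the same idea as a small Python sketch (my own toy code,
with feature names following the gloss above; not a standard implementation):

    def merge(a, b, required=frozenset()):
        """Merge a with b iff b's features include everything a selects for."""
        label_a, feats_a = a
        label_b, feats_b = b
        missing = set(required) - feats_b
        if missing:
            raise ValueError(f"{label_a} selects {set(required)}; "
                             f"{label_b} lacks {missing}")
        # Simplification: the result keeps the head's features ("bubbling up").
        return ((label_a, label_b), feats_a)

    puella = ("puella", {"NOM", "S", "FEM"})
    videt  = ("videt",  {"PRES", "ACTIVE", "S", "3"})
    canem  = ("canem",  {"ACC", "S"})

    # 'puella' selects +3, +S agreement in its verb; tense and voice are ignored.
    vp = merge(puella, videt, required={"3", "S"})

    # The resulting structure selects a +ACC noun phrase.
    sentence = merge(vp, canem, required={"ACC"})
    print(sentence[0])  # (('puella', 'videt'), 'canem')

    # A first-person-plural verb fails the feature check:
    # merge(puella, ("videmus", {"PRES", "ACTIVE", "PL", "1"}), required={"3", "S"})

Note that the checks are on features, not positions, so reordering the Merge
calls is all it takes to handle any word order.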

~~~
jaclaz
Yep, but as soon as you go beyond very simple subject/action/object phrases,
the approach may become complex to apply in practice. A couple of (tricky,
well-known) Latin examples, JFYI:

mala mala mala sunt bona

Soli soli soli

~~~
eindiran
Unfortunately that's true, which is why linguistics is a field of study with
its own journals, rather than something that can be summarized neatly in the
space of an HN comment :P

This model really can account for quite complex language data though. For
example, check out this account of auxiliary verbs in Basque:
https://www.academia.edu/3112898/A_Distributed_Morphology_Analysis_of_Present_Tense_Auxiliaries_in_Zamudio_Basque

Speaking to your examples: "mala mala mala sunt bona" isn't particularly
difficult to analyze this way; you just need to realize that the "mala"s are
different words (kind of like the famous English "Buffalo buffalo Buffalo
buffalo buffalo buffalo Buffalo buffalo" sentence). If I remember the proverb
correctly, it means "apples (mala) are good (sunt bona) for a painful jaw
(mala mala)".

You need an analysis that allows adjectives to Merge with nouns iff they match
in case, gender, and number, which lets us create a noun phrase "mala mala" in
the instrumental ablative. Then you need a way to have the case, gender, and
number of the subject bubble to the top of the phrase it will make with an
auxiliary, so that the adjective after the auxiliary is feature-restricted to
that case, gender, and number. Once the elements of the auxiliary verb phrase
have Merged, you get:

{{ mala, sunt }, bona }

Finally you have a rule that allows auxiliary verb phrases to Merge with noun
phrases headed by an ablative. If you want the first "mala" to be the subject,
then re-Merge it with the whole sentence so far, which in effect moves it to
the top of the tree, leaving a trace in its original position.
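
Written out as nested pairs, the derivation is (just my transcription of the
steps above, with "t" marking the trace):

    abl_np = ("mala", "mala")            # adjective + noun: "for a bad jaw"
    aux_vp = (("mala", "sunt"), "bona")  # {{mala, sunt}, bona}
    clause = (aux_vp, abl_np)            # aux VP merged with the ablative NP
    moved  = ("mala", ((("t", "sunt"), "bona"), abl_np))  # subject re-Merged to the top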

I'm not sure what the second example means. My best guess is that it's the
dative singular of 'sol', a matching masculine dative singular of 'solus' and
a genitive singular of 'solum', so something like "for the only sun of the
land". If that's correct, you need our previously used rule for Merging
adjectives iff they match the noun in case, gender and number. Then you can
add an additional rule that genitive nouns can be Merged with noun phrases
(without any feature selection needing to take place) to form a new noun
phrase.

Hopefully that shows that Merge and feature selection as mechanisms can be
used outside of toy models, to actually account for real data.

~~~
jaclaz
Your translations/guesses are correct. "mala mala mala sunt bona" is afaik an
invented phrase, not entirely unlike "I Vitelli dei romani sono belli" (which
is bilingual Latin/Italian: in Italian it means "The calves of the Romans are
beautiful", but in Latin it means "Go, Vitellius, at the sound of the Roman
war god"), made to trick Latin students or have some fun at their expense,
while "Soli soli soli" was a phrase sometimes inscribed on sundials.

Anyway, yes, Merge and feature selection can work just fine outside of "toy
models"; the note was about how they soon become complex.

------
macleginn
The merge theory is ugly and opportunistic. Chomskian linguistics began as an
attempt at using formal-language machinery (mostly context-free phrase
structure grammars) for describing natural languages. It kinda worked, with some
modifications in the form of Government & Binding theory, but Chomsky wanted
it to be not only a decent formalism, but also a valid theory of language in
the mind. Therefore, in the 1990s he said: wait, let's get rid of all the
complex stuff (deep structure, surface structure, layers of derivation) and
restate everything in terms of one and a half operations (merge and move; move
is a kind of merge, therefore 1.5). It _looked_ like it greatly simplified the
analysis of languages, but then it turned out that to actually build syntactic
parse trees for even quite simple sentences and somehow explain grammatical
case assignment, for instance, one needs to augment the basic system with all
kinds of hideous bells and whistles (arbitrary "feature checking", literally
dozens of "functional projections", weird sequences of merge & move
operations, etc.). Contemporary research in computational syntax has truly and
firmly abandoned contemporary Chomskian linguistics (aka the minimalist
program) and is much closer to more traditional types of formal syntax
theorising such as HPSG, simply because minimalism is an incoherent,
unformalisable, incomprehensible mess. The title of this article is just a
hoax.

A good overview: http://langsci-press.org/catalog/book/255

Edit: grammar

------
visarga
> Other animals, even extremely intelligent close evolutionary relatives like
> bonobos and chimpanzees, treat sequences of words as sequences, not as
> hierarchies. The same is true for modern artificial intelligences based on
> deep learning.

That's only true of LSTMs; stacked CNNs, tree-LSTMs, graph neural net pooling,
and attention layers can all do hierarchical aggregation.
Hierarchical representations have been at the centre of many papers. There's
even hierarchical reinforcement learning for describing complex actions as
composed of simpler actions.

And trees are not good enough to represent language. Graphs would fit better,
because some leaf nodes in the tree resolve or refer to nodes on other
branches (e.g. when you say "he" referring to the word "John" present in
another place in the same text).
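
A toy illustration of that point (the structure and labels are mine):

    # A parse tree plus one coreference edge is no longer a tree but a graph.
    parse_edges = [
        ("S", "NP:John"), ("S", "VP"),
        ("VP", "V:said"), ("VP", "S'"),
        ("S'", "NP:he"), ("S'", "VP:was-tired"),
    ]
    coref_edges = [("NP:he", "NP:John")]  # 'he' resolves to 'John'
    graph = parse_edges + coref_edges     # 'NP:John' now has two incoming edges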

http://www.arxiv-sanity.com/search?q=hierarchical

------
davesque
Kind of light, but it makes me think of the methodology I adopted for learning
Japanese, which was to basically ignore the ordering of things. Don't get me
wrong; I speak the words in the right order. It's just that I didn't _try_ to
memorize the ordering of things while learning the language. Because, as I
think the article points out, your brain is pretty good at figuring that out
without trying.

The most helpful thing I think you can do while studying language, other than
placing yourself in real world scenarios where you use it, is to just read (or
make up) example dialogues. Words by themselves aren't that helpful because
they're often out of context. Sentences are better, but even they benefit from
being embedded in a larger structure such as a dialogue or paragraph. I guess
the point is, the more context, the better. Or, as the article would say, the
more merging the better.

------
ajuc
Example from the article in Polish:

to drink wine = "pić wino"

I drink wine = "piję wino" or "ja piję wino" or "piję ja wino" or "wino piję
ja" or "wino ja piję" or "ja wino piję"

boy caught fish = "chłopiec złapał rybę" or "rybę złapał chłopiec" or ...

There are many languages in which word order doesn't matter; it's the changes
to the words that encode the role of each word in the sentence, and you can
divide the sentence into many alternative hierarchies (and certainly not all
of them are strictly binary - some parts of the sentence are ternary or even
more complicated, and imposing a binary structure on them is artificial and
misleading).

I think this theory is very lacking in predictive power: it says very little
about the supposed "universal" language, and it still isn't really universal,
as there are all sorts of exceptions.

Natural languages don't follow formal grammars strictly, especially not such a
simplistic one.

------
foxes
For those wanting these ideas described in a more mathematical "language",
[0] and [1] are a good start. If you are interested in studying objects and
how they compose, you are probably interested in category theory.

[0] https://golem.ph.utexas.edu/category/2018/02/linguistics_using_category_the.html

[1] https://arxiv.org/pdf/1809.05923.pdf

~~~
wadkar
Wow! Thanks a lot. This is a very fascinating application of category theory.

------
ta1234567890
This article only talks about the structure of writing in languages, which is
a pretty different thing from "all human languages".

The definition of language (according to Google) is human communication.

For there to be communication, you need a recipient, which means that mere
written words have no meaning without someone reading and interpreting them.

You can analyze syntax and structure all you want, but meaning depends on
people's interpretations, which are subjective and depend on multiple other
factors.

For instance, if I'm angry, I'll read a text message or an email and interpret
it in a completely different way than if I'm calm. The meaning I interpret
will also depend on who sent me the message. Neither of these things is
captured in the syntax of the messages.

~~~
philippoi
That's a puzzling take. I see where you went with it, but I'm inclined to
think of writing as already including the recipient. Such as when I write
notes for myself. Often, I'm keen to do this in instances where I find a
thought worthwhile but also quite subtle or nuanced and therefore forgettable.
Sometimes, when I reread these notes quite a while later, I'm surprised at
what was occurring to me at the time and happy, sad, amused, or even befuddled
at what was going on in my head at that moment. My rereadings are also colored
by the context shifts you mention. But in those instances where I didn't
succeed in conveying my meaning in a way I can reconnect with later, I don't
think I'd consider that note not language, but rather a use of language that
failed. It's still language. Even if its communication value is less than
desired, it's a message built with the same tools as the messages that do work
and pass your communication test. The fact that the intended future reader is
there in the mind of the writer makes me think of it as language even before
it's read, if it ever is. Otherwise, what is a written message that isn't
read? It exists, but as what? How would you characterize it?

~~~
ta1234567890
Thank you for your very thoughtful and insightful reply.

> what is a written message that isn't read?

Imagine an ancient civilization that left written symbols, but the people are
long gone and there's no one who knows how to interpret them anymore.

During the time the symbols were not being seen or interpreted by anyone, what
would you call them?

But more important than that: it's not that the symbols can't mean anything,
rather that the meaning will be assigned by the reader when they read them
(not just by the syntax of the symbols, which is what the article seemed to
imply). And that meaning can be very, very different from what the writer
intended it to be.

What I'm basically saying is that meaning/interpretation of
communication/messages is fluid/dynamic. It depends on the writer, the
symbols, the reader and a lot of context. It is not fully contained or
captured just by the symbols in which we express it.

Using your comment as an example, your "rereadings are also colored by the
context shifts".

------
ecdavis
An interesting article on a possible counter-example:
https://www.newyorker.com/magazine/2007/04/16/the-interpreter-2

~~~
eindiran
This is a widely cited counter-example, but I think most linguists think the
case is closed regarding whether this is actually a counter-example to the
Chomskyan program. Andrew Nevins and David Pesetsky have an extremely
convincing rebuttal of Everett's claim.

http://semantics.uchicago.edu/kennedy/classes/s07/myths/nevinsEtAl07.pdf

https://www.academia.edu/3112859/Evidence_and_argumentation_A_reply_to_Everett_2009_

------
etiam
Don't miss the best part, in Randy Morris' comment:

 _Question I the depth this analysis of. Paring rule variability or recursive
process, or randomized association efficiency? Arbitrary hierarchy inherent
world model categories captures actor, act, actee of. New Guinea highlands
reference I languages of number large (day before yesterday), exploit
possibilities almost all categorical where._

------
kazinator
Merge is cons!

~~~
eindiran
This is exactly correct! And the head is equivalent to car, the complement is
equivalent to cdr, and the specifier is equivalent to cddr.

~~~
YeGoblynQueenne
But cons operates sequentially. I thought the point about Merge was that it's
not sequential?

~~~
eindiran
I'm not sure what you mean about Merge being non-sequential. All Merge does is
take two elements α and β, then return a binary branching structure with α as
the head, like so:

{α, β}

Both α and β can be either some atomic unit, eg a word or a morpheme, or the
previous output of Merge (a tree).

This is their first example rendered as cons:

(cons I (cons drink wine))

As merge:

{ I, { drink, wine } }

As a tree:

 _Edit: I can't get the tree to look even remotely correct in HN formatting,
but you get the idea. It's in the article._
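
For anyone who wants to play with it, here's a rough Python equivalent (tuples
standing in for cons pairs; my own sketch), with a crude tree printer:

    def cons(a, b):
        return (a, b)

    tree = cons("I", cons("drink", "wine"))

    def show(node, depth=0):
        """Print the nested pairs as an indented tree."""
        if isinstance(node, tuple):
            print("  " * depth + "*")   # internal (Merge) node
            for child in node:
                show(child, depth + 1)
        else:
            print("  " * depth + node)  # leaf (word)

    show(tree)
    # *
    #   I
    #   *
    #     drink
    #     wine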

~~~
YeGoblynQueenne
It's a thread running through the article that the big thing about Merge is
that it is hierarchical, not sequential. E.g., see the following passage from
the article:

>> [Merge] applies to discrete units of language (words or their parts). It
combines these, not sequentially, but hierarchically.

Or the paragraph at the start, where human language abilities are contrasted
with bonobos', chimpanzees', and deep learning models' sequential processing;
etc.

My knowledge of Lisp is rusty, but as far as I remember, cons is a list
operator that joins a head to the tail of a list (like the "|" in Prolog). So
it imposes an order - on a sequence. Apologies if I misremember this.

~~~
eindiran
Ah I see what you mean. I think the author of that article is making the point
that Merge is about creating trees, not appending words together in a more
traditional sense. But S-expressions can be used to express trees, so when
you're operating on trees notated as S-expressions, adding a parent node to
two expressions looks like you're concatenating them.

The author is just trying to say that Merge does to a tree the same thing
we're talking about cons doing to S-expressions, and to note that it creates
hierarchy. E.g. a new Merge says: I've created a new top-level node that is a
parent for the two inputs. A second Merge says: I've made the first input
c-command both inputs of my first Merge and created a new top-level node.

~~~
YeGoblynQueenne
But, if I'm not misremembering this too, cons creates an ordered pair, no?

------
zaptheimpaler
If you assume that all meanings can be represented by a parse tree, then the
claim here is straightforward - "Merge" is a constructor for a binary tree
node, and the claim is that Merge can construct any binary tree. I think the
trickier question is whether parse trees really do capture everything about a
sentence... my guess is not.

~~~
reuben364
Well, a binary tree with unordered children.
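
Something like this quick Python sketch (mine, just to make the point
concrete), where frozensets make the children unordered:

    # Merge as a constructor for nodes with unordered children:
    # merge(a, b) == merge(b, a), matching the {α, β} notation.
    def merge(alpha, beta):
        return frozenset([alpha, beta])

    node = merge("drink", "wine")
    tree = merge("I", node)
    assert merge("wine", "drink") == node  # argument order is irrelevant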

------
tgv
How was "Merge" proposed in the early 1990s? Hierarchical structures and
compositional semantics predate that by decades.

If the proposition was Merge as a "universal" operation, then I'd say there is
no evidence that our brains implement such an operation, and that if they do,
it has a very shallow, domain-dependent stack. That makes such an operation a
meaningless abstraction, not suitable for explaining anything about human
behavior whatsoever.

------
ummonk
Open-ended recursion is not quite universal. Pirahã lacks arbitrary clausal
embedding.

