Talking is throwing fictional worlds at one another (nautil.us)
69 points by dnetesn 11 days ago | 42 comments

I feel like the Piraha stuff comes up every so often, and I feel obligated each time to link to the famous Nevins and Pesetsky rebuttal: https://dspace.mit.edu/bitstream/handle/1721.1/94631/Nevins-...

Beyond the disagreement about what recursion is (as used to define the faculty of language in the narrow sense (FLN), from the Hauser, Chomsky, and Fitch paper mentioned in the interview), there are significant problems with Everett's argument in general, which are covered quite well in the rebuttal.

It doesn't inspire confidence that the German examples in that response are all wrong.

But I can recommend Everett's book "Don't sleep, there are snakes", which is only partly about language.

I'm curious (having some basic German) what's wrong with the German examples? Isn't most of the German taken from the article he is responding to?

edit - actually I think I see it - he's not inflecting the articles correctly after "von"?

The claim is that "the English possessive is potentially recursive, while the Saxon genitive [its counterpart in German] is not". I think they are confusing some issues. And by the way, I thought Saxon genitive was a term from English grammar.

Anyhow, in German you can use the Saxon genitive in preposition (before the noun) if the word in the genitive can be used like a personal name: "ich sehe Karls Auto". And it is true that you cannot recurse on that: "ich sehe Karls Bruders Auto" is not German, except maybe in German poems of two hundred years ago, when the language was abused. Even "ich sehe meines Vaters Auto" will sound outside of the norm.

However, you can recurse in postposition: "ich sehe das Auto meines Vaters", "ich sehe das Auto des Bruders meines Vaters". You do not have to say "ich sehe das Auto vom Bruder von meinem Vater".

Several edits, sorry.

Would argue that "ich sehe meines Vaters Auto" is valid, but only for lyrics and/or old German.

In regards to this discussion, there is this one tweet summarizing the main back and forth:


The note about the Piraha language not having embedding is incredibly interesting. Example taken from Wikipedia:

>Everett stated that Pirahã cannot say "John's brother's house" but must say, "John has a brother. This brother has a house." in two separate sentences.

Compounding this with the apparent lack of numbers/counting, expressing higher-order concepts seems like it'd be borderline impossible in the language as-is. I wonder if many other languages started out "simple" like this, and developed grammatical complexity alongside their society?

Of course, the issue here could just be a lack of understanding, and an insufficient ability to put questions to native speakers, on the part of the people researching it. It only has 250 speakers, and picking it up as a second language with zero resources leaves plenty of room for error. More recent research suggests embedding might be possible, but no non-native speakers are really sure.

Everett is an extremely controversial figure within linguistics, to put it very mildly.

Recursive embedding is one of the foundational marks of human language (though not the only one), and finding a natural language without it would cast long shadows across the existing literature while making that language's "discoverer" very famous.

His data have been challenged by a number of other linguists, despite Everett's attempts to keep others from access to first hand sources.

Note too that the existence of alternate constructions such as "John has a brother. This brother has a house." does not preclude the language's ability to accommodate embedding.

Anyway, I'm no orthodox Chomskyist (in part because I'm not well enough educated to be:) but I think all of Everett's claims should be taken with as many grains of salt as we can find in our immediate vicinity.

Traces of recursion have been found in animal "language" also - https://www.nature.com/scitable/blog/cognoculture/the_less_h... describes recursive elements to certain bird song.

More recently, here's an experiment - https://cosmosmagazine.com/people/behaviour/complex-linguist... - which seems to show that some primates have "recursive" reasoning skills similar to those of young (under 4) children.

The studies on song birds have for the most part been very poorly conceived and based on a simple misunderstanding. (The fact that a grammar can be defined using recursive rules doesn't entail that its string language can't be recognized by non-recursive procedures.) For example, the common A^nB^n string language (which can be defined using a recursive CFG) can obviously also be recognized by counting.

But the a^nb^n language is context free, so there is no regular grammar that can represent it. If a counting mechanism can be used to represent a^nb^n, the result will still have to be a context-free grammar; otherwise a^nb^n would not be context-free.

No, that's exactly the mistake. You can check that the counts are equal without parsing the string at all. So the ability to recognize the string language is not evidence that the string is being parsed according to a particular CFG (or any other grammar). In other words, it is not evidence that the birds are pairing up each A with the corresponding B.
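To make that concrete, here is a minimal Python sketch (the function names are mine, purely for illustration). One recognizer just counts; the other follows the recursive rule S -> a S b | a b. They accept exactly the same strings, so acceptance alone tells you nothing about which procedure is in use.

```python
def accepts_by_counting(s: str) -> bool:
    """Recognize a^n b^n (n >= 1) with a single count and no parse tree."""
    n = len(s) - len(s.lstrip("a"))  # length of the leading run of a's
    return n >= 1 and s[n:] == "b" * n


def accepts_by_recursion(s: str) -> bool:
    """Recognize the same language by following S -> a S b | a b."""
    if s == "ab":
        return True
    # Strip one matched a...b pair and recurse on what is left in the middle.
    return (
        len(s) > 2
        and s[0] == "a"
        and s[-1] == "b"
        and accepts_by_recursion(s[1:-1])
    )
```

Only the second one pairs each a with its corresponding b; the first never looks at structure at all.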

Most of the literature on birds stems from a confusion of the distinction between regular string languages and context free string languages with the distinction between grammars with and without recursive rules. The two distinctions are largely orthogonal. It is certainly possible, for example, to define certain regular string languages using recursive grammars, and to define certain context free string languages without using recursive grammars.
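A small illustration of the first half of that claim, as a Python sketch with made-up names: the grammar S -> a S | a has a recursive rule, yet its string language is just a+, which a plain regular expression (no recursion, no stack) recognizes equally well.

```python
import re


def derives_from_recursive_rule(s: str) -> bool:
    """Follow the right-recursive grammar S -> a S | a."""
    if s == "a":
        return True
    return s.startswith("a") and derives_from_recursive_rule(s[1:])


# The same string language, written as a finite-state pattern.
A_PLUS = re.compile(r"a+")


def matches_regular_expression(s: str) -> bool:
    """Recognize a+ without any recursive rule."""
    return A_PLUS.fullmatch(s) is not None
```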

When you say that "the result must be a context free grammar" I think what you mean to say is that the string language defined must be a context free (and non-regular) string language. But that does not in any way entail that the only way to recognize the string language is by means of a particular context free grammar.

Ah, I see - my mistake. You're saying that one can count n "a"s followed by n "b"s without the need for a context-free grammar (or any grammar!). I agree.

What I meant by "the result will still have to be a context free grammar" was a context-free grammar that incorporates a counting mechanism, as part of its definition. Something like this, perhaps (in Definite Clause Grammars notation so you can actually run it as a Prolog program):

  % S succeeds when the As and the Bs yield the same count list N.
  'S' --> 'A'(N), 'B'(N).
  % 'A'(N): one or more a's, counted in unary as a list of 1s.
  'A'([1]) --> 'A'.
  'A'([1|As]) --> 'A', 'A'(As).
  % 'B'(N): likewise for the b's.
  'B'([1]) --> 'B'.
  'B'([1|Bs]) --> 'B', 'B'(Bs).
  % Terminals.
  'A' --> [a].
  'B' --> [b].
  % Example query: ?- phrase('S', [a,a,b,b]).

Though that one is recursive (or at least the counting mechanism is), and to be honest I'm not sure how you'd do the same thing without recursion (or without going up a couple of levels to Turing-completeness so you can call arbitrary functions). So, never mind: I just misunderstood what you meant.

I see, fair enough. I think I may have misread your original comment.

No, I misread yours :)

I've never understood why it was supposed to be so earth-shaking. Even if true, there's no reason that you have to use embedding. And human beings don't really parse via stacks: we can't parse many deeply center-embedded sentences and other garden path sentences even if they're "valid" under some context-free grammar.

That CFGs model so many languages so well is fascinating and puts constraints on the language faculty, but if some language happens to not use the full power of it, that shouldn't come as quite such a surprise. Semantics are not context-free and we know that the underlying brain mechanism is far more complicated than finite-state-machine-plus-stack.

A full study of Piraha would likely be illuminating if it really is as different from other languages as Everett says -- and it's practically useless to have him say only "It doesn't have feature X". Unless he can also show that a Piraha child can't learn another language -- that would really mean something.

The fact that linguists make a controversy out of it means I must be missing something, but I can't figure out what.

My understanding is that Everett has done his best to make a controversy of it, with most linguists preferring to try to ignore him. Unfortunately that's hard when he's doing the Pinker jig, engaging in pop linguistics, etc. So people are forced to engage with it.

I generally agree; finding a language which opts not to use recursion would be a weird phenomenon, and might make certain claims about UG weaker. But it wouldn't invalidate all of traditional linguistic research for the past however many years or totally upend our current theoretical frameworks.

I am hesitant to take the lack of the ability to embed as a fact. My impression is that Everett had a theory in mind when he surveyed Piraha and that he sought to confirm his theory. That is, I am worried that this account is subverted by "theory selects data". So I think what we need is a lot more data, ideally collected by someone who the field views as (relatively) neutral.

This is a very valid concern. Pirahã is super isolated: culturally, linguistically, physically. There are a few hundred speakers deep in the Amazon. Everett is _the_ world expert on the Pirahã language (second is likely Steven Sheldon), and pretty much everything on the subject is viewed through his lens. As I understand it, it was only in the past decade or so that others really started publishing on the subject in earnest.

And even Everett didn't catch the bilabial trill affricate till 2004, like 20+ years into his research.

Interesting interview, but weird that the interviewer pulls the title "Talking Is Throwing Fictional Worlds at One Another" from his own words, not the interviewee's. I guess he was rather smitten with his own turn of phrase.

_The difference between fiction and reality? Fiction has to make sense._

― Tom Clancy

And wanted to prove his own point.

The title is a bit clickbaity, but some parts of the article were interesting (there is a lot of noise though, if what you’re looking for are technical details).

In case some of you are interested in the “merge” structure they mention, you can look up the “Minimalist Program”, by Chomsky and others. That’s where it comes from:


relevant XKCD: https://xkcd.com/2043/

I've worked with some clever animals, and agree they didn't show any signs of recursive embedding (they "lex" but don't "parse"), but must note that modern anglophone humans are relatively embedding-impoverished compared with prior centuries, a reduction which we can observe in comparing the number of stack levels necessary to parse the subordinations and coordinations[1] in the dendritic periods of the 1st president of the US with the number necessary to parse the sequential utterances of the most recent.

[1] https://www.archives.gov/exhibits/american_originals/inaugtx...

> "In these honorable qualifications, I behold the surest pledges, that as on one side, no local prejudices, or attachments; no separate views, nor party animosities, will misdirect the comprehensive and equal eye which ought to watch over this great assemblage of communities and interests: so, on another, that the foundations of our National policy will be laid in the pure and immutable principles of private morality; and the pre-eminence of a free Government, be exemplified by all the attributes which can win the affections of its Citizens, and command the respect of the world."

I've been working on a conlang based on word vectors and "morphemes are phonemes", a very tight coupling between the syllables and semantics (it's an abugida but written in ascii). Gu=earth, va=motion, mu=machine, things like that. Initially there was no requirement for order of the morphemes, guvamu=muvagu=vagumu, whatever sounded best. Then I ran smack into, "what's the difference between a ground vehicle (car) and an earth-mover?"

So I needed recursion and a bit more grammar to make this hierarchical word fusion work better.

This reminds me a lot of Chinese. Many of the words are 2-3 character descriptions. E.g., the word for machine is used as a suffix for flying machine (airplane), thinking machine (computer), washing machine, etc. For historical reasons related to steam engines, the word for train roughly translates as fire machine. Another good example is how ying can be combined with the character for land to describe England (ying guo), further combined with people to describe the English (ying guo ren), or combined with the character for writing to describe English (ying wen). The same thing works for France (fa guo), Germany (de guo), America (mei guo), etc. It also works with the generic word for foreign: foreign country (wai guo), foreigner (wai guo ren), foreign language (wai wen). My personal favorite is that the word for faucet translates as "water dragon's head".

I'm not a linguist but that to me is more a list (a sequence of qualifiers) than a recursion (a nesting). However, a quick browse shows I'm wrong so for those who also aren't sure



Try diagramming that sentence. (base: "I behold pledges") Even just the last few words of that sentence:

    ... exemplified by
        all the attributes
          which can
            win the affections
              of its Citizens, and
            command the respect
              of the world.

have more structure than the pair of sentences:

    On @foxandfriends


I'm not sure that those links completely show that you're wrong. The example you're replying to appears deeper than it is because each nested element is quite broad, so the absolute depth is not as much as it might seem (I think).

Agreed, there are listings, but even they are not always just a flat unary tree. For instance, there's the structure within the listing of:

    (no ((local prejudices) or (attachments))
     no (separate views)
     nor (party animosities))

which is itself embedded within

    ... will misdirect ...
    as on one side ... so, on another, ...
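
For what it's worth, the depth of that bracketed listing can be measured mechanically. A rough Python sketch (my own helper, nothing standard) that tracks how deep the parentheses go:

```python
def max_nesting_depth(bracketed: str) -> int:
    """Return the deepest level of parenthesis nesting in the string."""
    depth = deepest = 0
    for ch in bracketed:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing parenthesis")
    if depth != 0:
        raise ValueError("unbalanced opening parenthesis")
    return deepest


# The listing from the comment above, joined onto one line.
LISTING = (
    "(no ((local prejudices) or (attachments)) "
    "no (separate views) nor (party animosities))"
)
print(max_nesting_depth(LISTING))  # 3
```

So that is three levels for the listing alone, before counting the clauses it sits inside.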

Re: the XKCD strip

While "carhouse" may not be a thing, you might be surprised to learn of the existence of a "car condo" (or "motor condo"): https://www.irongatemotorcondos.com/

Yes, really. This business not only exists, but is thriving.

> Despite that seeming constraint, Adger argues in his new book, Language Unlimited, that the sentences we make are infinite in faculty, form, and expression.

This is such a journalist thing to write. The idea that the number of sentences that can be produced using a given language is potentially infinite is such an old one that it hardly has anything to do with Adger, and Adger hardly needed to "argue" it.
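
The old idea is easy to demonstrate. A toy Python sketch (grammar and names invented for illustration, borrowing the possessive example from upthread): one recursive rule gives a phrase for every depth, yet most word strings still don't belong to the language.

```python
import re


def possessive_phrase(depth: int) -> str:
    """Build "John's house", "John's brother's house", ... by recursion."""
    if depth < 0:
        raise ValueError("depth must be non-negative")

    def owner(d: int) -> str:
        # NP -> "John" | NP "'s brother"
        return "John" if d == 0 else owner(d - 1) + "'s brother"

    return owner(depth) + "'s house"


# The same toy language as a membership test: unbounded, but constrained.
TOY_PHRASE = re.compile(r"John('s brother)*'s house")


def is_toy_phrase(s: str) -> bool:
    return TOY_PHRASE.fullmatch(s) is not None
```

For example, `possessive_phrase(1)` gives "John's brother's house", and there is a distinct phrase for every larger depth, while something like "house John's" is rejected by `is_toy_phrase`.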

"I think about it like even numbers. There’s an unlimited number of even numbers but obviously they’re limited, right? Because 3s and 7s aren’t in there. Language is like that. There’s an unlimited number of possible things we can say, of sentence structures, but not anything can be a sentence structure. So you’re absolutely right. Language is unlimited, but it’s unlimited in a limited way."

This is such a tantalizing idea, but there doesn't seem to be any way to extend it beyond "we don't know what it would be like to think with a different language paradigm, but it would probably be different." I want to know how human thoughts are shaped and constrained by human language.

The issue, in my opinion, is separating language shaping thought from thought shaping language. It quickly leads into strong linguistic relativity, which has been largely discredited. And even for the weak form, in my opinion, the evidence isn't very strong that it's specifically the language, as opposed to cultural influences, doing the shaping.

an attempt: https://news.ycombinator.com/item?id=23856845

or Story of Your Life

Formal diagrams (still potentially unbounded combinations of a bounded symbolic repertoire, but connected in more than just a temporal dimension) are as close as I can think of at the moment.

Anyone know of more recent work on https://www.microsoft.com/en-us/research/wp-content/uploads/... ?

Shameless plug for my essay [1] that discusses some similar ideas to this article. I focus more in-depth on what the consequences of compositional grammar are for ML-based NLP and why we should be looking to theories of psycholinguistics and philosophy of language for inspiration as a research community.

[1] https://rohan.bearblog.dev/humans-spoke-vectors/

> We don’t do this by putting the two words in a sequence, like an artificial intelligence or a bonobo would do. Instead we build a new hierarchical unit. This unit puts together the verb drink with the noun wine to create the phrase drink wine, with wine being the grammatical object of drink.

AI can do compositionality.

I wish this had less of a clickbait title. I have no idea what the article is about other than linguistics.

Relevant XKCD: https://xkcd.com/114/

I'm generally a fan, but that's quite possibly the least funny XKCD ever.

My computational linguist friend disagrees and asked me to design this for his iPad: https://m.imgur.com/gallery/oPzM8U7
