
Tom Wolfe takes on linguistics - KC8ZKF
http://languagelog.ldc.upenn.edu/nll/?p=26936
======
anonymousDan
How is Chomsky perceived in the field of linguistics today? My impression was
that his theories are well respected, but recently I've come across several
people dismissing his work as outdated and even wrong. Not sure whether this
is just an attempt by people to denigrate him because of his political beliefs
though.

~~~
glup
Generative linguistics is pretty isolated these days... chugging along but
using a set of methodologies and premises that are pretty far removed from the
rest of academia. Generative linguistics has generally had trouble in two key
areas: 1) first language acquisition (where increasingly ornate innate
machinery is needed to explain how kids arrive at the 'right' grammar) and 2)
language processing, where the structural representations were never great at
predicting processing difficulty, nor implementable in a machine.
Consequently, these two fields have long been in psychology (which is why I'm
a psych graduate student, of all things!), and draw heavily on statistics,
cognitive science, NLP and, in the latter case, information theory.

Chomsky is brilliant, no doubt, and critically showed the world how much
latent structure there is in language. However, he is IMHO pretty wrong in
thinking that this latent structure can't be learned (e.g. inferring a
probabilistic context-free grammar).
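
To give a flavor of what that inference looks like, here's a toy sketch using
NLTK's maximum-likelihood PCFG induction; the two-tree "treebank" is invented
for illustration:

    # Toy PCFG induction with NLTK (pip install nltk); the mini-treebank
    # below is made up for illustration.
    from nltk import Tree, Nonterminal, induce_pcfg

    trees = [
        Tree.fromstring("(S (NP (D the) (N dog)) (VP (V barked)))"),
        Tree.fromstring("(S (NP (D the) (N cat)) (VP (V saw) (NP (D a) (N dog))))"),
    ]

    # Read off every rule used in the trees, then estimate rule
    # probabilities by relative frequency (maximum likelihood).
    productions = [p for t in trees for p in t.productions()]
    grammar = induce_pcfg(Nonterminal("S"), productions)
    print(grammar)  # e.g. NP -> D N [1.0], VP -> V [0.5], VP -> V NP [0.5]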

One thing to be said for generative linguistics is that a whole lot of super-
interesting phenomena have only been characterized within specific generative
frameworks--language documentation and psycholinguistic research can only
progress with a formal system in which language can be rigorously
characterized. So one of the huge projects in the future will be translating
the 50+ years of research into the formalisms of the modern computer science /
statistical NLP / Bayesian cognitive science / learning-centric developmental
psychology stack.

~~~
foldr
>a set of methodologies and premises that are pretty far removed from the rest
of academia.

The rest of academia has agreed on a set of methodologies and premises? No
wonder we feel left out!

>increasingly ornate innate machinery is needed to explain how kids arrive at
the 'right' grammar

This is too vague to respond to. But theories do tend to get more complex as
we learn more, since there's more data to account for.

>where the structural representations were never great at predicting
processing difficulty, nor implementable in a machine.

Which structural representations assumed by (say) GB theory are not
implementable in a machine?

>he is IMHO pretty wrong in thinking that this latent structure can't be
learned (e.g. inferring a probabilistic context-free grammar).

Chomsky always agreed that it was possible to learn context-free grammars via
statistical methods. That's why he placed such great emphasis in the '60s on
showing that context-free grammars are not a suitable model for natural
language.

Your last paragraph is fairly astonishing, insofar as it admits that
generative linguistics has obtained lots of interesting results which cannot
be characterized in "modern" terms. That sounds like a pretty strong
indication that generative linguistics has got something right!

~~~
laretluval
> Which structural representations assumed by (say) GB theory are not
> implementable in a machine?

Most of the individual constraints proposed in the myriad papers on GB and
Minimalism are probably implementable by machine. But no one in Chomskyan
generative syntax seems interested in explicitly spelling out the full set of
principles that would underlie a large-coverage grammar--except maybe
Stabler's Minimalist Grammar formalism, which is ignored by most people who
call themselves "syntacticians". In contrast, the HPSG and LFG communities
have attempted to provide large-scale grammars, and a lot of NLP work has used
them in a serious way, but those communities are no longer very active.

Is there a work which lays out modern Minimalist generative syntax in full
formal detail, and shows how this formal system handles a very large range of
different syntactic phenomena? Such that it would be possible to produce a
large hand-parsed corpus? It seems like this is what would be needed for
generative syntax to have relevance outside linguistics departments. If it
exists and I'm just unaware of it, I'd be glad to hear about it!

~~~
foldr
I'm not sure why you're asking this about Minimalism specifically. There are
already wide coverage parsers based on various generative frameworks (e.g.
HPSG, LFG, CCG).

It's also a bit odd to suggest that any framework for which there isn't a wide
coverage parser lacks any relevance outside linguistics departments. I know of
lots of examples of cross-disciplinary work involving generative linguistics,
but most of it doesn't relate to parsing at all. I'd say that wide coverage
parsers are actually a pretty niche interest, which is one reason why people
don't tend to work on them much.

------
d_burfoot
If you are a hacker who is interested in linguistics, you might want to check
out the work we are doing at my startup Ozora Research
([http://ozoraresearch.com](http://ozoraresearch.com)).

Our goal is to build high-quality, sophisticated NLP solutions for
"cleantext": text that has been professionally written and copy-edited, so
that it doesn't have obvious errors of spelling, grammar, punctuation, etc.

What makes our work different is that, unlike almost all other research in
NLP, we don't depend on human-annotated data (like the Penn TreeBank) for
training and evaluation. Instead, we build our models into lossless data
compressors, and evaluate by invoking the compressor on a large text corpus.
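
To give a flavor of the evaluation (this is not our actual system, just the
standard link between a probabilistic model and codelength): an arithmetic
coder driven by a model p spends about -log2 p(x) bits on a corpus x, so
scoring a model means totting up those bits. A minimal sketch with a smoothed
character-bigram model standing in for a real one, on an invented corpus:

    import math
    from collections import Counter

    def bits_per_char(train, test):
        # Add-one-smoothed character bigrams; an arithmetic coder driven
        # by this model would spend about -log2 p(test) bits in total.
        pairs = Counter(zip(train, train[1:]))
        ctx = Counter(train[:-1])
        vocab = len(set(train) | set(test))
        def p(prev, c):
            return (pairs[(prev, c)] + 1) / (ctx[prev] + vocab)
        bits = -sum(math.log2(p(prev, c)) for prev, c in zip(test, test[1:]))
        return bits / (len(test) - 1)

    train = "the cat sat on the mat. the dog sat on the log. " * 50
    test = "the dog sat on the mat. "
    print(f"{bits_per_char(train, test):.2f} bits/char")  # lower is better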

In addition to this change in evaluation methodology, we are also deeply
interested in the _empirical_ study of text - that is, in the questions of
traditional syntax research. For example, an important part of English syntax
relates to verb argument structure: verbs differ in the types of arguments
they can accept. Some verbs can accept infinitive complements, that-
complements, or indirect objects; others cannot. Our system contains a module
that handles argument structure and forwards this information to the parser
and compressor.
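
To make that concrete, here's a hypothetical sketch of the kind of lexicon
such a module consults; the verbs and frames below are simplified
illustrations, not our actual data:

    # Hypothetical verb argument-structure lexicon; the frames are
    # illustrative only, not a real fragment of our system.
    FRAMES = {
        "want":    {"infinitive"},                # "want to leave"
        "believe": {"that_clause"},               # "believe that it works"
        "give":    {"indirect_object"},           # "give her the book"
        "tell":    {"that_clause", "indirect_object", "infinitive"},
    }

    def licenses(verb: str, frame: str) -> bool:
        """Can this verb take this complement type?"""
        return frame in FRAMES.get(verb, set())

    print(licenses("want", "infinitive"))       # True
    print(licenses("want", "indirect_object"))  # False: *"wanted her the book"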

I wrote a book presenting a rationale for why scientists should be deeply
interested in large-scale, lossless data compression. Roughly, empirical data
compression research is a variation of the scientific method: if theory A
achieves better compression than theory B, then A is closer to the truth.
Scientists can use this principle to search systematically through theory-
space and find high-quality theories.

[http://arxiv.org/abs/1104.5466](http://arxiv.org/abs/1104.5466)
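
As a toy version of the selection principle, with two off-the-shelf
compressors standing in for theories A and B (the corpus here is invented):

    # "Better compression => closer to the truth": compare the codelengths
    # two stand-in "theories" achieve and prefer the smaller one.
    import bz2, zlib

    data = ("the cat sat on the mat. the dog sat on the log. " * 200).encode()
    a = len(zlib.compress(data, 9))  # theory A
    b = len(bz2.compress(data, 9))   # theory B
    print(f"A: {a} bytes, B: {b} bytes -> prefer {'A' if a < b else 'B'}")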

Interestingly, compression has a direct link to generation. By sending random
bits into the decompressor, you obtain sample sentences that illustrate the
model's "idea" of what English text looks like. By studying the discrepancy
between the sample sentences and real text, you can discover ways to improve
the model. This is a very productive technique that we use routinely. Chomsky
declared that his goal was to build a machine that generates grammatical
English sentences, but none of his followers actually tried to build such a
device. So in some sense our research is actually closer to Chomsky's stated
goal than what Chomsky's followers are doing.
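
A toy version of the trick: decoding random bits with an arithmetic coder
amounts to sampling from the model, so for illustration we can sample a
simple character-bigram model directly (the training text is invented, not
our production model):

    import random
    from collections import Counter, defaultdict

    # Train a character-bigram "model" on a toy corpus.
    train = "the cat sat on the mat. the dog sat on the log. "
    nxt = defaultdict(Counter)
    for prev, c in zip(train, train[1:]):
        nxt[prev][c] += 1

    def sample(n=60, seed=0):
        # Drawing from the model's conditionals is what feeding random
        # bits into the matching decompressor would do.
        rng, prev, out = random.Random(seed), "t", []
        for _ in range(n):
            chars, weights = zip(*nxt[prev].items())
            prev = rng.choices(chars, weights)[0]
            out.append(prev)
        return "".join(out)

    print(sample())  # the model's "idea" of English; compare to real text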

I would love to hear from people who might want to collaborate on linguistics
work, or use the data compression idea to explore other fields (e.g. computer
vision, bioinformatics, astronomy).

~~~
Natanael_L
I have a feeling you'll enjoy this:

[https://www.reddit.com/r/subredditsimulator](https://www.reddit.com/r/subredditsimulator)

------
dnautics
> "Chomsky et alii"

Brilliant. I suspect Wolfe specifically extends the usually abbreviated Latin
word _alii_ because _alii_ in Hawaiian means "royalty".

~~~
delazeur
The Latin word is actually _alia_. It could be a pun from Wolfe, but it could
also be a typo.

~~~
carbocation
I have no background in Latin, but I think that _alia_ is neuter plural and
_alii_ is masculine plural. In this case _et alii_ makes more sense.

~~~
delazeur
You're right [1]. Interesting; I've always seen _et alia_ before, but it looks
like _et alii_ is more often correct.

[1] [http://latin-dictionary.net/search/latin/alia](http://latin-dictionary.net/search/latin/alia)

------
nwatson
> KINGDOM OF SPEECH is a captivating, paradigm-shifting argument that speech —
> not evolution — is responsible for humanity's complex societies and
> achievements.

It's time to read the Old Testament story of the Tower of Babel once more.
That's always been a fascinating story for me ... the notion that people were
building a real or virtual tower that threatened the divine order, and so had
to be divided by making their speech mutually unintelligible.

~~~
mcguire
Check out Umberto Eco, _The Search for the Perfect Language_.

------
mathattack
Referenced (and unfortunately gated) article from the source:
[http://harpers.org/archive/2016/08/the-origins-of-speech/](http://harpers.org/archive/2016/08/the-origins-of-speech/)

------
CodeCube
For a second there, I thought I read, "David Wolfe takes on linguistics", and
figured I was in for a laugh

------
pc86
Not to be confused with Tom Wolf, the current Pennsylvania governor.

------
grandalf
Anyone have a link to a non-gated version of the original?

