
Noam Chomsky on Where Artificial Intelligence Went Wrong - ColinWright
http://www.theatlantic.com/technology/archive/2012/11/noam-chomsky-on-where-artificial-intelligence-went-wrong/261637/?single_page=true
======
iandanforth
Chomsky is right up there with Minsky in being part of the problem. His ideas
about language being part of the genome are fanciful nonsense. Skinner produced
reams of reproducible empirical observations of behavior which are, today,
critical to evaluating the performance of AIs. Chomsky has produced
interesting theories, but mostly derailed linguistics on the basis of the
argument 'language is complicated, something magical must be going on.' His
contributions to political debate are monumental. His contributions to science
are negligible if not entirely detrimental.

~~~
MatthewPhillips
I hope you're kidding about his contributions to political debate. Chomsky has
the maximalist attitude that represents everything that is wrong with
political debate today. His hatred of U.S. foreign policy is so extreme that
he will defend absolutely everyone the U.S. opposes, which means occasionally
defending tyrants and denying genocide.

~~~
rdtsc
> His hatred of U.S. foreign policy

Sadly enough for U.S. foreign policy and its supporters, he actually supports
his claims very well with sources and footnotes. If you read through his
works, you might find that perhaps there is a reason others (who don't just
watch Fox News) don't agree with said policy, and also that Americans in
certain parts of the world are not "hated because of our freedoms". There
are other reasons.

~~~
twoodfin
Cranks can load their screeds with sources and footnotes, too.

There's a reason no one outside the campus left takes Chomsky's political
opinions seriously, and it's not a massive conspiracy.

------
lsb
_The "right" way is to take endless numbers of videotapes of what's happening
outside the video, and feed them into the biggest and fastest computer,
gigabytes of data, and do complex statistical analysis -- you know, Bayesian
this and that -- and you'll get some kind of prediction about what's gonna
happen outside the window next. In fact, you get a much better prediction than
the physics department will ever give. Well, if success is defined as getting
a fair approximation to a mass of chaotic unanalyzed data, then it's way
better to do it this way than to do it the way the physicists do, you know, no
thought experiments about frictionless planes and so on and so forth. But you
won't get the kind of understanding that the sciences have always been aimed
at -- what you'll get at is an approximation to what's happening._

Chomsky seems to keep using naïve models as a strawman, and Norvig rightly
calls him on it. If you use simple models, you can only get simple insights,
but statistical machine translation (for example) builds probabilistic
context-free grammars, which map human notions of language far better than
"make sure every three words in sequence is plausible" does.

~~~
AndyNemmity
Chomsky is agreeing that it's making a map. He just doesn't think that map is
very useful on a scientific level, only on an engineering level.

You're responding that he's wrong, because it's useful on an engineering
level.

Right? I'm reading many comments here and they seem to keep boiling down to
this notion. Am I wrong?

~~~
mturmon
At the top level, I think this captures it.

If, like Chomsky, you value having a model of the underlying cognition process
rather than a set of black-box predictors for aspects of that problem (e.g.,
various corpus-driven translators), then you might be really annoyed that the
black-box people are so satisfied with their results.

~~~
Evbn
The Church was angry that the sun didn't revolve around the Earth. We don't
get to pick the prettiest models. Right makes right.

~~~
mturmon
I object to your glibness. Probably both methods (first-principles cognitive
modeling vs. high-degree-of-freedom black box learning) will prove
informative, just in different ways.

Or in your terms, we may not get to pick the prettiest models, but we owe it
to ourselves to explore the space of models to see if we can find the
structure in it.

The engineer in me is pleased by the undoubted success the data-driven
learning culture has had on problems of real importance. But this work is
highly empirical, with a tendency toward point solutions, and someone is
likely to come in later and generalize these methods (e.g., explain why some
families of black-box predictors or features outperform others for language
learning).
There's room for both approaches.

Norvig's reply to Chomsky's original remark contains a reference to Leo
Breiman's well-informed remarks on this question
(<http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ss/1009213726>).

Breiman, as author of basic books on measure theory as well as on
classification trees, was able to walk both sides of this line ("make a first-
principles model" vs. "use lots of data"). He spent considerable energy over
the years trying to introduce the data-intensive approach to conventional
statistics. For instance, he was one of the handful of bona fide statisticians
who would attend and contribute to neural net and machine learning
conferences. Probably this strategy is more productive than Chomsky's grumpy-
old-man warnings (or sagacious warnings, depending on how you look at it).

------
jholman
I haven't finished reading TFA yet, but so far it's really good, because it
sounds like Chomsky is actually getting to the point.... which he sometimes
does, and sometimes absolutely doesn't (or so it often seems to me).

If you're interested in this area, which you might call the philosophy of
artificial mind, or philosophy of cognitive science, I strongly suggest
reading the link to Norvig's article (linked in TFA, but here it is again:
<http://norvig.com/chomsky.html> ). In particular, I'd suggest reading it
before reading the actual interview with Chomsky (maybe ideally reading it
after the prefatory part of TFA).

My own inclination is that on the face of it, I find Norvig's approach less
satisfying, as Chomsky appears to, but upon much consideration my current
belief is that Chomsky's approach is too mystical and too just-so, and
Norvig's approach at least has the merit of bearing fruit... fruit that one
day might be concentrated into a concise and elegant theory.

~~~
AndyNemmity
I don't see any mysticism in Chomsky's approach.

You seem to agree with Norvig that doing massive data analysis on language
will lead to a scientific understanding, which would be a first in science.

Chomsky doesn't. If anything, Chomsky is grounded in reality, and Norvig and
AI researchers are grounded in hope that this way of mapping out something
will create meaningful understanding of the system.

~~~
mjn
This reminds me a bit of what scientists in other fields refer to as
"empirical equations", which are equations fit from data without any
particular theoretical backing or reason to believe that their components are
a good model of reality. They're useful in that they may predict observations
well, especially over a specific range of observables, but they don't
necessarily give us an understanding of what's going on. An example is the
historical Prony equation for hydraulic pressure loss
(<http://en.wikipedia.org/wiki/Prony_equation>), which doesn't actually
correctly model what happens in fluid flow, but _does_ happen to empirically
produce fairly good results over a range of values, partly through the use of
two magic numbers fit from data.

Another example might be the winning entry in the Netflix competition. If your
goal is to predict film preferences (which is Netflix's goal, of course), it
appears to give pretty good estimates. But I don't think even its authors
would claim that it's a scientific model of how humans form preferences.

In both cases the underlying problem is that there are fairly general
functional forms, such as a few terms of a Taylor series, or a forest of
decision trees, that can empirically model almost anything to a certain degree
of accuracy, given enough data, even if the underlying process looks nothing
like them. Therefore they can give accurate predictions that work in practice,
without being accurate models of what's happening in the underlying system.
Chomsky appears to be of the opinion that statistical NLP systems are more of
that variety, so may be good engineering solutions without being good
scientific models.
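That point can be made concrete with a toy example (both the "true" process and the fitted form are invented here): a quadratic interpolated through a few observations of an exponential process predicts well inside the observed range, yet its coefficients say nothing about the mechanism, and extrapolation fails.

```python
import math

# Pretend this is the "true" but unknown mechanism (exponential growth).
def true_process(x):
    return math.exp(x / 2)

# "Empirical equation": a quadratic through three observations, chosen for
# convenience, not because anything in the mechanism is quadratic.
xs = [0.0, 1.0, 2.0]
ys = [true_process(x) for x in xs]

def empirical_model(x):
    """Lagrange quadratic through the three observed points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Inside the observed range the fit predicts well...
print(abs(empirical_model(1.5) - true_process(1.5)))   # small error
# ...but the model misses the mechanism, and extrapolation fails badly.
print(abs(empirical_model(6.0) - true_process(6.0)))   # large error
```

The same applies to a forest of decision trees or a few Taylor terms: enough free parameters fit almost anything locally, which is exactly the "good predictions without a good model" situation described above.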

~~~
AndyNemmity
Exactly. Well said, and clearly articulates the point here.

What I don't get is: what Chomsky is saying is altogether the standard view,
and yet people are insulting him for what is altogether a very simple idea you
expressed plainly.

Yes, NLP systems are going to have many engineering uses, and Chomsky agrees.
Are they going to help in the true scientific understanding of the systems?

It's unlikely. It's likely to be "good engineering solutions without being
good scientific models" as you elegantly put it.

~~~
sampo
_Yes, NLP systems are going to have many engineering uses, and Chomsky agrees.
Are they going to help in the true scientific understanding of the systems?_

So, what we have is (1) the engineering / statistical modelling / machine
learning approach, and (2) the deep theoretical "Chomsky approach".

Chomsky despises, maybe rightly so, the engineering approach because it only
provides tools that work, approximately, in practice but don't provide any
deep "scientific" understanding.

The deep theoretical approach has a vision of a comprehensive theory that
really provides understanding. Once we manage to find the deep fundamental
theory, practical applications will be child's play.

But here's the catch: has the Chomsky approach taken us any closer to that
deep theoretical progress? Why has all the practical progress come from the
engineers? What if the deep theoretical thinkers are completely lost in their
theories, like the medieval alchemists, and, despising the data-driven
approach, also refuse to let empirical observations guide them in the right
direction?

I don't think looking down on the engineers and their modest practical success
is any kind of merit, if your only merit is dreaming of a deep theory while
making no measurable progress toward said theory.

~~~
rdtsc
> Why has all the practical progress come from the engineers?

I see what you are saying, and it is actually a good point. Where are the
robots built on Chomsky's theory? A very valid question. I don't know the
answer to it; Chomsky doesn't either. But I think what you mean by practical
progress isn't what he means by progress. That is his point.

You have to see where he is coming from. He is an academic; his ultimate goal
is to understand how things work. Training a set of neurons with input data
and ending up with perhaps millions of activation weights is not helping that
goal, even if this new machine can play chess, make coffee, and drive you to
work. I think that is his take on it.

I say we need both. There is no reason not to strive for both. There is no
reason to turn all radical, start burning books, and claim one approach should
completely replace the other. I hope we one day find (or find that we can't
find) a good explanatory model for meaning, language, learning, personality,
or consciousness, but in the meantime I enjoy playing chess with my computer,
and I hope pretty soon I'll have my car drive me to work by itself.

~~~
sampo
Your point is also good. "Shallow engineering" will not give us deep
theoretical understanding, and we also need deep theoretical understanding.
But in our need for deep theory, we should not accept just _any_ theory. The
theory should be testable, and it should eventually yield some practical
applications. Theoretical thinking can get quite lost if it's not guided by at
least some connection to empirical data.

Early L. Ron Hubbard presented a theory (Dianetics) on the causes of mental
illness. It's a theory alright, just not a very good one, and not very
testable. We would still benefit from a better theoretical understanding of
human mental health and illness, but we should not accept just anybody's
theory merely because he is a deep thinker.

------
reso
So much hate in this thread for one of the men who laid the foundations of
Computer Science. We may disagree with his ideas now, and possibly his
theories are out of date, but the magnitude of his contribution to
linguistics, and by extension, neurology, psychology, and computer science,
puts him in the very highest rank of scientists.

~~~
Evbn
Ad hominem cuts both ways. Being right once, no matter how significant,
doesn't grant a free pass to BS through other topics.

------
TheAudientVoid
"The approach taken by Chomsky and Marr toward understanding how our minds
achieve what they do is as different as can be from behaviorism. The emphasis
here is on the internal structure of the system that enables it to perform a
task, rather than on external association between past behavior of the system
and the environment. The goal is to dig into the 'black box' that drives the
system and describe its inner workings, much like how a computer scientist
would explain how a cleverly designed piece of software works and how it can
be executed on a desktop computer."

The article completely misunderstands what Behaviorism actually is. Once
again, the misunderstanding comes from confusing the Methodological
Behaviorism advocated by John Watson, Edward Thorndike and others, who did
indeed try to model people as a "black box", with Skinner's Radical
Behaviorism, which denies that there's even a box to be opened; rather,
Skinner holds that there is only a locus where a series of environmental
processes happen to converge and interact in interesting ways. Some of these
processes are much older than the others, and are expressed in genes; others
are relatively newer, and are learned. Skinner did not deny that genetics
played a role in language acquisition, nor did he ever maintain people are
born a blank slate. While Skinner found Chomsky's work regarding universal
grammar unconvincing, he maintained that it was not, as Chomsky claimed,
directly opposed to his own work -- the two theories were orthogonal, and
would succeed or fail independently of each other.

------
zepolud
I've always considered Norvig's position pretty much self-evident for anybody
who has dealt with real world problems. At the same time, Chomsky is still way
more interesting to read, even if he's wrong. And he's wrong mostly for
assuming that unsupervised learning has to be the end result when in fact it
could be an intermediate step to more refined symbolic theories.

In any case, this kind of antagonism between the two approaches might be
useful as it keeps the field more vital, preventing yet another stagnation.

~~~
AndyNemmity
By real world problems you're talking about engineering and not science.
Chomsky says Norvig's position is great for engineering, just not science.

The whole disagreement stems from the question of whether this will help us
reach a scientific understanding of language. The burden is on Norvig's side
to prove that it will, and it hasn't been proven.

~~~
snowwrestler
I think the phrase "scientific understanding" is essentially qualitative,
meaning that humans can disagree about what we "understand" and how
"scientific" it is, and there's no objective way to adjudicate that fight.

How many physicists really believe they fully understand quantum mechanics?
The theory is probabilistic and strange but produces very accurate predictions.

The true measure of science is matching observations to hypotheses, and in
that respect the approach that Norvig defends has demonstrated success.
Google's language tools work well much of the time. Watson beat Ken Jennings.

Other branches of science are beginning to make more use of "big data"
approaches as well. A friend doing post-doctoral research on evolution spends
most of his time behind a laptop coding against big sets of digitized genetic
information.

------
KVFinn
Chomsky is damn sharp at 83. I sure hope I hold up that well.

------
gavanwoolery
While I respect some of Chomsky's work, it is amazing to me that he thinks
language is something more than mathematics/statistics. Language _is_ math; it
is basically an advanced form of discrete mathematics. We can reproduce
virtually anything on a computer, and it is no more "shallow" than the light
from a lightbulb is "artificial." There is no magic going on: we are
biological computers, walking number crunchers. Our brains just happen to
operate with chemicals and analog signals rather than transistors and digital
signals. His view on this carries the drawbacks of academia, which has a
tendency to over-complicate and over-formalize thinking.

~~~
Cacti
I completely agree. The fact that math = language, and that it's built into
our DNA, is astounding. And from there that the Halting Problem is basically
tied into our DNA, that is, how we think, how our brains work, our ability to
conceive ideas, is just mind-blowing. What Chomsky has done in his academic
career is on par with the greatest scientists in our history.

~~~
koningrobot
Math is a model of reality. Math does not exist in your DNA; DNA exists
subject to the rules of that which math models.

------
6ren

      What it strongly suggests is that in the evolution of language, a computational
      system developed, and later on it was externalized.
    

So, the beginning of human language was not communication, but a computational
system. Interesting stuff starting at that quote. NB: lots of mistakes in the
transcript, sometimes rendering it unintelligible.

------
jhales
Reminds me of Einstein arguing that God doesn't play dice.

(Not claiming Chomsky is wrong, just a similar figure making a similar claim.)

------
ilaksh
The field is now called AGI. It isn't mentioned in this article. Everyone
seems to be ignoring the whole field of AGI (artificial general intelligence).
Or maybe they truly are ignorant of it.

Anyway, suffice to say, AI and AGI didn't stop progressing, and Chomsky is no
longer any sort of expert in those fields.

Even Norvig isn't up to speed on the most advanced approaches to AGI, but at
least he enters the same room with people who are aware of the field. For
example, he gave a talk at the recent Singularity Summit.

The Fifth Conference on Artificial General Intelligence is going to be in
Oxford in December. <http://agi-conference.org/2012/>

Here is some information for people who are interested in pertinent ideas
related to AGI.

<http://www.amazon.com/How-Create-Mind-Thought-Revealed/dp/0670025291>

<http://opencog.org/theory/>

>OpenCog is a diverse assemblage of cognitive algorithms, each embodying their
own innovations — but what makes the overall architecture powerful is its
careful adherence to the principle of cognitive synergy.

>The human brain consists of a host of subsystems carrying out particular
tasks — some more specialized, some more general in nature — and connected
together in a manner enabling them to (usually) synergetically assist rather
than work against each other.

<http://wiki.opencog.org/w/Probabilistic_Logic_Networks>

> PLN is a novel conceptual, mathematical and computational approach to
> uncertain inference. In order to carry out effective reasoning in real-world
> circumstances, AI software must robustly handle uncertainty. However,
> previous approaches to uncertain inference do not have the breadth of scope
> required to provide an integrated treatment of the disparate forms of
> cognitively critical uncertainty as they manifest themselves within the
> various forms of pragmatic inference. Going beyond prior probabilistic
> approaches to uncertain inference, PLN is able to encompass within uncertain
> logic such ideas as induction, abduction, analogy, fuzziness and
> speculation, and reasoning about time and causality.

<http://wiki.opencog.org/w/AtomSpace>

Conceptually, knowledge in OpenCog is stored within large [weighted, labeled]
hypergraphs with nodes and links linked together to represent knowledge. This
is done on two levels: Information primitives are symbolized in individual or
small sets of nodes/links, and patterns of relationships or activity found in
[potentially] overlapping and nesting networks of nodes and links. (OCP
tutorial log #2).

<http://www.izhikevich.org/publications/large-scale_model_of_human_brain.htm>

Large-Scale Model of Mammalian Thalamocortical Systems

> The understanding of the structural and dynamic complexity of mammalian
> brains is greatly facilitated by computer simulations. We present here a
> detailed large-scale thalamocortical model based on experimental measures in
> several mammalian species. The model spans three anatomical scales. (i) It
> is based on global (white-matter) thalamocortical anatomy obtained by means
> of diffusion tensor imaging (DTI) of a human brain. (ii) It includes
> multiple thalamic nuclei and six-layered cortical microcircuitry based on in
> vitro labeling and three-dimensional reconstruction of single neurons of cat
> visual cortex. (iii) It has 22 basic types of neurons with appropriate
> laminar distribution of their branching dendritic trees. The model simulates
> one million multicompartmental spiking neurons calibrated to reproduce known
> types of responses recorded in vitro in rats. It has almost half a billion
> synapses with appropriate receptor kinetics, short-term plasticity, and
> long-term dendritic spike-timing-dependent synaptic plasticity (dendritic
> STDP). The model exhibits behavioral regimes of normal brain activity that
> were not explicitly built-in but emerged spontaneously as the result of
> interactions among anatomical and dynamic processes. We describe spontaneous
> activity, sensitivity to changes in individual neurons, emergence of waves
> and rhythms, and functional connectivity on different scales.

<http://www.sciencebytes.org/2011/05/03/blueprint-for-the-brain/>

Essentials of General Intelligence: The direct path to AGI

<http://www.adaptiveai.com/RealAI_chap_ver2c.htm>

>General intelligence, as described above, demands a number of irreducible
features and capabilities. In order to proactively accumulate knowledge from
various (and/ or changing) environments, it requires:

>1. Senses to obtain features from ‘the world’ (virtual or actual),

>2. A coherent means for storing knowledge obtained this way, and

>3. Adaptive output/actuation mechanisms (both static and dynamic).

>Such knowledge also needs to be automatically adjusted and updated on an
ongoing basis; new knowledge must be appropriately related to existing data.
Furthermore, perceived entities/ patterns must be stored in a way that
facilitates concept formation and generalization. An effective way to
represent complex feature relationships is through vector encoding (Churchland
1995).

>Any practical applications of AGI (and certainly any real-time uses) must
inherently be able to process temporal data as patterns in time – not just as
static patterns with a time dimension. Furthermore, AGIs must cope with data
from different sense probes (e.g., visual, auditory, and data), and deal with
such attributes as: noisy, scalar, unreliable, incomplete, multi-dimensional
(both space/ time dimensional, and having a large number of simultaneous
features), etc. Fuzzy pattern matching helps deal with pattern variability and
noise.

>Another essential requirement of general intelligence is to cope with an
overabundance of data. Reality presents massively more features and detail
than is (contextually) relevant, or that can be usefully processed. This is
why the system needs to have some control over what input data is selected for
analysis and learning – both in terms of which data, and also the degree of
detail. Senses (‘probes’) are needed not only for selection and focus, but
also in order to ground concepts – to give them (reality-based) meaning.

<http://en.wikipedia.org/wiki/Hierarchical_temporal_memory>

> A typical HTM network is a tree-shaped hierarchy of levels that are composed
> of smaller elements called nodes or columns. A single level in the hierarchy
> is also called a region. Higher hierarchy levels often have fewer nodes and
> therefore less spacial resolvability. Higher hierarchy levels can reuse
> patterns learned at the lower levels by combining them to memorize more
> complex patterns.

> Each HTM node has the same basic functionality. In learning and inference
> modes; sensory data comes into the bottom level nodes. In generation mode;
> the bottom level nodes output the generated pattern of a given category. The
> top level usually has a single node that stores the most general categories
> (concepts) which determine, or are determined by, smaller concepts in the
> lower levels which are more restricted in time and space. When in inference
> mode; a node in each level interprets information coming in from its child
> nodes in the lower level as probabilities of the categories it has in
> memory.

>Each HTM region learns by identifying and memorizing spatial patterns -
combinations of input bits that often occur at the same time. It then
identifies temporal sequences of spatial patterns that are likely to occur one
after another.
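Read loosely, the two learning steps that description ends with (memorize frequently co-occurring input bit patterns, then memorize which patterns tend to follow which) can be sketched in a few lines. This is a made-up toy to illustrate the idea only, not the actual HTM algorithm; the bit patterns and count thresholds are invented:

```python
from collections import Counter

# A stream of input bit patterns, as a node might receive from below.
stream = [
    (1, 0, 1), (0, 1, 1), (1, 0, 1), (0, 1, 1),
    (1, 0, 1), (0, 1, 1), (1, 1, 0),
]

# Step 1: "spatial" patterns = bit combinations that often occur together
# (here: any pattern seen at least twice).
spatial = {p for p, n in Counter(stream).items() if n >= 2}

# Step 2: "temporal" sequences = frequent transitions between those patterns.
transitions = Counter(
    (a, b) for a, b in zip(stream, stream[1:])
    if a in spatial and b in spatial
)
likely_next = {a: b for (a, b), n in transitions.items() if n >= 2}

print(sorted(spatial))          # the memorized spatial patterns
print(likely_next[(1, 0, 1)])   # the pattern that usually follows (1, 0, 1)
```

A real HTM region stacks many such nodes in a hierarchy and does inference over the learned sequences; the toy only shows why "spatial then temporal" is a natural two-stage decomposition.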

~~~
confluence
I've heard of OpenCog before, and it, along with the Singularity crowd, gives
me the same weird amateur, bullshitty, vague, generalist feeling that Noam
Chomsky does. Basically: where's the beef? What has been done by either crowd,
apart from taking credit from those who do things in the actual industry/real
world?

My fundamental aversion to both OpenCog and the entire Singularity crowd is a)
their statements are so general as to the point of being useless and b) they
don't do anything. Google makes search simple - go to google.com and find out.
Google makes cars drive themselves - ask Nevada/California and if you're a
member of the press - request a test drive today. IBM's Watson definitively
beat world champions in front of everyone and before that they did it with
Blue Gene.

Everyone in the other communities falls under this category: All talk - no
walk.

The entirety of what I've gotten out of both groups is essentially little more
than what religious people get out of going to a sermon at a church. The
future will be grand, lots of bullshitty buzz words, lots of hand waving with
huge claims - no hard calculations, no hard examples of what they've actually
achieved.

I'll stick with Norvig/Google and his/their demonstrated achievements and
knowledge over the talk, hype and vaporware projects of groups that have yet
to show any hard progress apart from a bunch of lectures to rich people with a
lot of vague words.

The SENS movement gives me the exact same feeling.

All talk - no walk.

~~~
corporalagumbo
Chomsky's expertise is in linguistics and political analysis. Stephen Pinker's
The Language Instinct is a good, readable introduction to some of Chomsky's
work (and the wider field to which he is pivotal). Chomsky's Manufacturing
Consent is probably his classic work of political analysis.

<http://en.wikipedia.org/wiki/Noam_Chomsky_bibliography>

He's no quack.

~~~
confluence
You know, in the soft sciences everyone is a _quack_ because fundamentally
they don't practice - wait for it - science. Science stops false connections
by correctly attributing cause to its respective effect. The social sciences
do not. For all intents and purposes, the vast majority of social science is
unreproducible, vague, or poorly reasoned; it mixes correlation with
causation, relies on dependent variables, rests on statistical quirks, is
pushed by agendas, or is fundamentally flawed.

> _are effective and powerful ideological institutions that carry out a
> system-supportive propaganda function by reliance on market forces,
> internalized assumptions, and self-censorship, and without overt coercion_

\--
[http://en.wikipedia.org/wiki/Manufacturing_Consent:_The_Poli...](http://en.wikipedia.org/wiki/Manufacturing_Consent:_The_Political_Economy_of_the_Mass_Media)

That's pretty self-evident to the point of being, well, pointless - admen of
the 60s made their bread using this, and the PR pioneers of the 30s were
already experts. But please let's all listen to what he has to say next. Let
me guess: killing people is bad, and not killing people is good. If you call
that amazing thinking, I'd hate to see the idiotic version.

Even better:

> _Geoffrey Sampson maintains that universal grammar theories are not
> falsifiable and are therefore pseudoscientific. He argues that the
> grammatical "rules" linguists posit are simply post-hoc observations about
> existing languages, rather than predictions about what is possible in a
> language. Similarly, Jeffrey Elman argues that the unlearnability of
> languages assumed by Universal Grammar is based on a too-strict,
> "worst-case" model of grammar that is not in keeping with any actual
> grammar. In keeping with these points, James Hurford argues that the
> postulate of a language acquisition device (LAD) essentially amounts to the
> trivial claim that languages are learnt by humans, and thus, that the LAD is
> less a theory than an explanandum looking for theories._

> _Sampson, Roediger, Elman and Hurford are hardly alone in suggesting that
> several of the basic assumptions of Universal Grammar are unfounded. Indeed,
> a growing number of language acquisition researchers argue that the very
> idea of a strict rule-based grammar in any language flies in the face of
> what is known about how languages are spoken and how languages evolve over
> time. For instance, Morten Christiansen and Nick Chater have argued that the
> relatively fast-changing nature of language would prevent the
> slower-changing genetic structures from ever catching up, undermining the
> possibility of a genetically hard-wired universal grammar. In addition, it
> has been suggested that people learn about probabilistic patterns of word
> distributions in their language, rather than hard and fast rules (see the
> distributional hypothesis). It has also been proposed that the poverty of
> the stimulus problem can be largely avoided if we assume that children
> employ similarity-based generalization strategies in language learning,
> generalizing about the usage of new words from similar words that they
> already know how to use._

> _Another way of defusing the poverty of the stimulus argument is to assume
> that if language learners notice the absence of classes of expressions in
> the input, they will hypothesize a restriction (a solution closely related
> to Bayesian reasoning). In a similar vein, language acquisition researcher
> Michael Ramscar has suggested that when children erroneously expect an
> ungrammatical form that then never occurs, the repeated failure of
> expectation serves as a form of implicit negative feedback that allows them
> to correct their errors over time. This implies that word learning is a
> probabilistic, error-driven process, rather than a process of fast mapping,
> as many nativists assume._

> _Finally, in the domain of field research, the Pirahã language is claimed to
> be a counterexample to the basic tenets of Universal Grammar. This research
> has been primarily led by Daniel Everett, a former Christian missionary.
> Among other things, this language is alleged to lack all evidence for
> recursion, including embedded clauses, as well as quantifiers and color
> terms. Some other linguists have argued, however, that some of these
> properties have been misanalyzed, and that others are actually expected
> under current theories of Universal Grammar._

\-- <http://en.wikipedia.org/wiki/Universal_grammar#Criticisms>

Looks like I'm not the only one who sees through bullshit.

Let me repeat - just to imprint on people's minds:

> _This implies that word learning is a probabilistic, error-driven process,
> rather than a process of fast mapping, as many nativists assume._
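The "probabilistic, error-driven" learning in that quote can be sketched as a Rescorla-Wagner-style update, in which a learner's expectation of an ungrammatical form (say, an over-regularized "goed") decays every time the expected form fails to appear in the input. This is a toy illustration of the general idea, not Ramscar's actual model:

```python
def update(expectation, observed, rate=0.2):
    """Error-driven update: move the expectation toward the observed outcome
    in proportion to the prediction error (observed - expectation)."""
    return expectation + rate * (observed - expectation)

# A child starts with a strong (wrong) expectation of the regular form "goed".
p_goed = 0.9

# The ungrammatical form never occurs in the input, so the outcome is always 0;
# each failed expectation is a dose of implicit negative feedback.
for _ in range(30):
    p_goed = update(p_goed, observed=0.0)

print(round(p_goed, 3))  # prints 0.001: the expectation has decayed to near zero
```

No explicit correction is ever given; the repeated absence of the expected form alone drives the error out, which is the point of the quoted argument.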

Chomsky's theories are, and always were, DOA.

~~~
Volpe
> Science stops false connections by correctly attributing cause to its
> respective effect.

So was Aristotle a _quack_ as well?

I ask because he was pre-science, and pretty much laid the foundation for what
became the scientific method (i.e., empiricism).

Perhaps before you dismiss large bodies of knowledge you should look up the
history of science, and see that it has flaws in and of itself...

------
hodder
Norvig on Chomsky: <http://norvig.com/chomsky.html>

~~~
Goronmon
This is actually linked in the article.

------
jmmcd
> if success is defined as getting a fair approximation to a mass of chaotic
> unanalyzed data, then it's way better to do it this way than to do it the
> way the physicists do, you know, no thought experiments about frictionless
> planes and so on and so forth

This is factually incorrect! For physicists, those thought experiments are
absolutely essential, and we would need many, many more orders of magnitude of
statistical processing of video signals in order to get close to the real-
world-useful physical predictions that we arrive at through thought
experiments, equations, and so on. The contrast that Chomsky is missing is
that for language, the statistical processing is amazingly successful, and the
thought experiment style of investigation, while productive, has not been
shown useful in real world tasks like translation.

For those arguing against Chomsky, none of the above means that we should
abandon a theory-driven or symbolic approach to language.

If Chomsky and his opponents would just recognise that they have different
goals (not just different ways of approaching the same goal), we wouldn't have
to have this same argument every few months.

------
cuspy
Those who argue that Chomsky singularly "pioneered" or "revolutionized" the
study of formal language should thoroughly read the book "Linguistics and the
Formal Sciences" by Marcus Tomalin. It is a great historical account of the
development of this particular strain of formal linguistic study. Knowing more
about the intellectual environment and his predecessors and contemporaries
helps to erode the mythology of Chomsky as the sole revolutionary catalyst in
the development of formal language theory. In fact the major principles of his
classic theory of syntax can be thought of as fairly incremental developments
from previous work. Many of the specific claims he is well known for had been
made by others before.

As for his contributions to cognitive science, I think one side of the field
simply feels that he is clinging to some outmoded notions of what Bayesian
modeling can achieve in terms of explanatory power.

As a counterpoint, EVERYONE should read Andy Clark's beautifully written BBS
paper "Whatever Next? Predictive Brains, Situated Agents, and the Future of
Cognitive Science."

------
chj
Given enough time, a monkey can type out a Hamlet. But who can live that long
to wait? I think that is why I tend to agree with Chomsky.

------
achy
Like many things, Chomsky is both right and wrong. He's right in that if we
studied the structure and functioning of the brain we could build more
accurate AI. He's terribly wrong because the structures of the brain are fuzzy
and give rise to probabilistic and 'statistical' functioning that is itself
based on training (much like the models he derides).

~~~
zerostar07
How are brain structures fuzzy?

~~~
jklio
Optical illusions where we see one thing then another are numerous, e.g.
<https://en.wikipedia.org/wiki/Rubin_vase>

In these there is obviously some contest going on between fuzzy classifiers,
as there is in conceptual association games, misinterpretations of song lyrics
between people and errors like the Freudian slip. There are at least large
parts of our brains that seem to operate in this manner.

That said, our use of logic and reason suggests there is a part of our brain
that works in a non-fuzzy way, or at least can be trained to do so. However,
while there are people who understand the odds and are just there for a good
time, it's instructive to go to a casino and see how many people believe they
can win and believe in lucky charms.

This topic is a minefield of semantic games with hidden assumptions, with
people talking past each other, though.

~~~
zerostar07
I think you mean human behavior is fuzzy. Brain circuits are sometimes chaotic
but pretty deterministic. The sensory input is noisy though.

------
ChristianMarks
_Like maybe when you add 7 and 6, let's say, one algorithm is to say "I'll see
how much it takes to get to 10" -- it takes 3, and now I've got 4 left, so I
gotta go from 10 and add 4, I get 14. That's an algorithm for adding -- it's
actually one I was taught in kindergarten. That's one way to add._

I get 13.
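For what it's worth, the commenter is right about the arithmetic: carried out correctly, the make-ten strategy Chomsky describes gives 13, since taking 3 from 6 to reach 10 leaves 3, not 4. A minimal sketch of the strategy:

```python
def add_via_ten(a, b):
    """Make-ten addition strategy for single digits with a + b >= 10:
    borrow just enough from b to bring a up to 10, then add what's left."""
    to_ten = 10 - a         # how much it takes to get from a to 10
    remainder = b - to_ten  # what remains of b after the borrow
    return 10 + remainder

print(add_via_ten(7, 6))  # prints 13, not 14
```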

------
jimdanz
I just came here to thank OP for passing single_page=true in the url.

------
anigbrowl
Meh, he's just trotting out warmed-over Searle without bothering to give his
ideas any credit. I have to admit to an intense dislike of Noam Chomsky.

------
dbaupp
AI hasn't "gone wrong" yet. At least, we don't know that it has, and we can't
truly know until we've solved it (for some concrete value of solved).

------
sh_vipin
To some extent, I agree. But for some reason, I have a feeling that "Wearable
Computing" is gonna have a similar fate.

------
brianstorms
Chomsky's famous review of Verbal Behavior was from 1959. He REISSUED it in
1967, as the document the article links to plainly and clearly states. Geez,
this is a pretty stupid mistake for Katz and The Atlantic to make. They should
fix it.

------
confluence
Noam Chomsky irritates me, and here's why: he's vague, so astonishingly vague
that he can hide his uselessness within it.

> _Chomsky derided researchers in machine learning who use purely statistical
> methods to produce behavior that mimics something in the world, but who
> don’t try to understand the meaning of that behavior. Chomsky compared such
> researchers to scientists who might study the dance made by a bee returning
> to the hive, and who could produce a statistically based simulation of such
> a dance without attempting to understand why the bee behaved that way._

\-- <http://www.tor.com/blogs/2011/06/norvig-vs-chomsky-and-the-fight-for-the-future-of-ai>

What does that even mean?

> _But the number of parameters in his theory continued to multiply, never
> quite catching up to the number of exceptions, until it was no longer clear
> that Chomsky’s theories were elegant anymore. In fact, one could argue that
> the state of Chomskyan linguistics is like the state of astronomy circa
> Copernicus: it wasn’t that the geocentric model didn’t work, but the theory
> required so many additional orbits-within-orbits that people were finally
> willing to accept a different way of doing things. AI endeavored for a long
> time to work with elegant logical representations of language, and it just
> proved impossible to enumerate all the rules, or pretend that humans
> consistently followed them. Norvig points out that basically all successful
> language-related AI programs now use statistical reasoning_

> _But his fundamental stance, which he calls the “algorithmic modeling
> culture,” is to believe that “nature’s black box cannot necessarily be
> described by a simple model.” He likens Chomsky’s quest for a more beautiful
> model to Platonic mysticism, and he compares Chomsky to Bill O’Reilly in his
> lack of satisfaction with answers that work. “Tide goes in, tide goes out.
> Never a miscommunication. You can’t explain that,” O’Reilly once said,
> apparently unsatisfied with physics as an explanation for anything._

\-- <http://www.tor.com/blogs/2011/06/norvig-vs-chomsky-and-the-fight-for-the-future-of-ai>

AI went wrong when Chomsky came around with his rule-based translation ideas
that were hideously wrong and probably set us back 20 years - see here:

<http://norvig.com/chomsky.html>

He's a more irritating linguistic version of Richard Dawkins (who doesn't have
an active research career).

> _Every time I fire a linguist, the performance of the speech recognizer goes
> up_

\-- <http://en.wikipedia.org/wiki/Frederick_Jelinek>

~~~
Cacti
In which way is he vague? He basically reinvented a Turing Machine with human
language and brought linguistics around to the idea that, yes, language isn't
something that's vaguely "out there" tabula-rasa-style, it's built into our
genetics at a very fundamental level. Fundamental enough that he tied
linguistics DIRECTLY to math and from there to programming. The Chomsky
Hierarchy is no joke.
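The standard textbook illustration of why the hierarchy matters (my example, not the commenter's): the language aⁿbⁿ, n matched a's followed by n matched b's, a stand-in for nested embedding in natural language, cannot be recognized by any finite automaton, but one counter, i.e. a trivial pushdown stack, handles it:

```python
def is_anbn(s):
    """Recognize { a^n b^n : n >= 0 }, a context-free language that sits
    strictly above the regular languages in the Chomsky hierarchy.
    A single counter (standing in for a pushdown stack) suffices."""
    stack = 0
    i = 0
    # Push once for each leading 'a'.
    while i < len(s) and s[i] == 'a':
        stack += 1
        i += 1
    # Pop once for each following 'b'.
    while i < len(s) and s[i] == 'b':
        if stack == 0:
            return False  # more b's than a's
        stack -= 1
        i += 1
    # Accept only if the input is exhausted and every 'a' was matched.
    return i == len(s) and stack == 0

print(is_anbn("aaabbb"), is_anbn("aabbb"))  # prints True False
```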

Your link relating to statistical models is only a tiny, tiny part of
Chomsky's fundamental arguments and even then is debatable.

~~~
duaneb
> it's built into our genetics at a very fundamental level.

Chomsky's evidence for this is.... iffy at best. Yes, I think we are
predisposed to HAVE language, but I don't think we can learn as much as he
proposes about the structure of modern language from the human genome.

~~~
corporalagumbo
1) You should read The Language Instinct.

2) I don't think the problem with learning about language from the genome is
specific to language. There are just so many layers of molecular interactions
between the genetic code and activity at our level of reality that trying to
link the two is incredibly difficult, and we are not even close to having the
computing power or theoretical models necessary to link them up. But that
doesn't mean that language and genes aren't linked.

------
naturalethic
Wake me when Chomsky finds some principles.

