I'm not a linguistics expert, but from my vantage point as a computer scientist, it would be hard to conclude that. Chomsky essentially founded the field of formal language theory, and was the first to propose many widely-used constructs, such as the context-free grammar. They may or may not explain how English works, but pretty much everything on the Chomsky hierarchy has been put to productive use in computers.
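For instance, a context-free grammar in exactly Chomsky's sense can be written down and checked mechanically. Here's a minimal illustrative sketch (the grammar and function name are my own, invented for the example, not from any particular system):

```python
# A toy grammar in Chomsky's sense: S -> 'a' S 'b' | epsilon, generating
# the language a^n b^n. It is context-free but not regular, so no finite
# automaton (and no plain regular expression) can recognize it.
def matches_anbn(s: str) -> bool:
    """Recursive-descent recognizer for S -> 'a' S 'b' | epsilon."""
    if s == "":
        return True
    return s.startswith("a") and s.endswith("b") and matches_anbn(s[1:-1])

print(matches_anbn("aaabbb"))  # True
print(matches_anbn("aabbb"))   # False
```

Every parser generator in use today is, at bottom, automating this kind of grammar-directed recognition.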
In a much more general sense, his arguments against blank-slate learning are pretty mainstream in machine learning as well. Chomsky's specific approaches aren't too widely used (though there is some work on grammar induction), but the idea that inductive bias is key to machine learning is widespread, though perhaps in a weaker sense than the very specific and strong inductive bias Chomsky proposes for language learning.
Chomsky did great work for CS (and NLP) through formal language theory, but his work in linguistics -- in particular, his anti-empiricism and hostility to statistical views of language -- was very unhelpful for getting computers to understand language. See e.g. http://www.cis.upenn.edu/~pereira/papers/rsoc.pdf
Put more simply, I'm saying that he did valuable scientific work in developing his formal analysis of grammars, even if it fails to capture human language. He intended it to capture human grammar, but it's an interesting computation-amenable model of grammar even if it isn't how humans speak, because it is pretty much how we now write programming languages. When computer scientists today talk about "grammars", for example someone saying that they're writing "an ANTLR grammar for Clojure", they mean it in the Chomskian sense.
Or are you saying that Freud, despite being entirely wrong, got things moving and eventually helped folks to start taking psychology as a serious science? Or another point I am not understanding?
Yes, looking back, Freud's theories are bogus. But that judgment is only possible in hindsight, taking into consideration everything that has happened since. And much of what happened afterwards in the study of the mind and human psychology was very much influenced by Freud. A lot of it was a reaction, but it was still a reaction to something. Those who came after studied Freud; he was their background.
To put it another way, it's hard to predict what would have happened if Freud had not been there. For all we know, we could be in worse shape today.
I think that his general linguistic theories might share the same fate as Freud's theories: large, complex theories for which there is simply not much empirical support, making it difficult for people to continue to work on them without their originator imbuing them with his authority.
Though you have to wonder how much of that is a lack of interest in research. PCRE-style regexes have been shown not to fit neatly into his hierarchy: with backreferences they can match some languages that aren't even context-free, yet they still can't match every context-free language.
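For a concrete illustration (using Python's `re`, whose backreference behavior matches PCRE on this point), a single backreference already matches the "copy" language {ww}, which sits outside the context-free languages entirely:

```python
import re

# Backreferences let PCRE-style regexes match the "copy" language {ww},
# which is not even context-free -- so these regexes cut across the
# Chomsky hierarchy rather than sitting neatly inside one level of it.
copy_language = re.compile(r"^(\w+)\1$")

print(bool(copy_language.match("abcabc")))  # True: "abc" repeated
print(bool(copy_language.match("abcabd")))  # False: halves differ
```

Meanwhile no regex, with or without backreferences, can match balanced parentheses, a textbook context-free language.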
It certainly makes me doubt language (the technology of it) as purely cultural and learned.
So we can say first and foremost that language is influenced by genetics. Having a tongue, vocal cords, hands, ears: these all help us acquire language.
So now the question is to what degree does genetics influence language and language acquisition?
Is it fully governed by genetics? We know that removing a child from the company of others during development eliminates complex language and severely stunts their ability to acquire language. So, in a single individual, language has never been observed to arise spontaneously and the facility to acquire language is something you can lose.
If you damage certain portions of the brain, aspects of language can be lost. See fluent aphasia for a particularly odd example. However, these aphasias are known to remit, and studies show that other portions of the brain take over from the damaged portion. So we know that the capacity for language is not entirely localized to a genetically determined location.
We also know there was a time when there wasn't language and now there is, so at least once in human history language did arise spontaneously. The article you mention is evidence that language can evolve spontaneously in groups of humans. In fact, there is evidence that the rudiments of language arise in groups of many animal species, including apes, whales, and dolphins. Many animals communicate, but with too little sophistication to be described as a language.
So it is safe to say that there are genetic factors in producing communicative behavior. Sounds, postures, marking etc. I feel it is also safe to say that there are genetic factors that predispose groups of some species to develop more complex systems of communication (but these are not limited to humans) and that groups of humans are particularly good at complex communication.
Unfortunately for Chomsky, there is little to no evidence that any one type of language is more likely than another. Grammars, phonemes, words, abstract concepts: all of these show huge between-language variation. The similarities between languages are well explained either by the physical characteristics of speech production or by regularities in the environment of the language's origin.
So yes, language is both learned, and cultural, even if it isn't purely so.
Consider a different example: constructing artificial vision. Human vision is the result of evolution, of course. It is incredibly inefficient, but evolution is blind (sorry). When we now construct computer vision, we capture an optical image and transmit it optically as long as possible, since this preserves information. The human eye does not: it sends information via neurons to the visual cortex, compromising immensely in bandwidth. That's why we need visual error-correction mechanisms in the brain, and redundancy of visual information (achieved by the eye moving rapidly many times per second, for example).
When we construct optical computer vision that achieves something similar to human vision, the two have nothing in common. You can't plug the artificial front end into the biological back end. The two systems produce literally different images that are not comparable. The systems will not communicate with each other. We have skipped the legacy of biological evolution completely. To create an artificial system that accurately corresponds to the biological one would be an immense waste.
The same goes for intelligence. Our 'wet' evolved intelligence is an entirely different picture from the project of Strong AI. They will produce very different manifestations of interacting with the world. Why would the latter ever result in the illusion of free will or the self-referentiality of personal identity, two things we assume are parts of human-level intelligence? They are like the neurological channels for conveying an optical image: hopelessly inefficient, but the ones that make sense in the light of our evolutionary legacy.
As a side note: yes, we know of a time when there was no language, but it's not a good representation to treat that as a binary switch. We know of complex communication between other animals, and given that even current languages are in rapid flux, I think it is fair to think of the transition from pre-language to language as a continuum. Language is still arising.
That said, his point about the brain being wired to take the less computationally intensive route is a very important insight, which I think extends beyond genetics and throughout the evolution of all biological processes.
I don't have his books on me, so I may misconvey this point, but IIRC, he also mentioned regular languages (you know, like with regexes) as another example of a computationally "easier" language family we don't pick up with our language organ. We don't speak arbitrary languages. The space of languages is filtered by genetics.
He delves into this point more deeply in many places and addresses the points you raise with a precision beyond what's found in interviews.
I suppose you could turn it around - assuming our language processing is optimal (bit of a leap) you can infer things about our hardware architecture by the languages which we parse efficiently.
— Noam Chomsky
Sadly enough for U.S. foreign policy and its supporters, he actually supports his claims very well with sources and footnotes. If you read through his works, you might find that perhaps there is a reason others (who don't just watch Fox News) don't agree with said policy, and also that Americans in certain parts of the world are not "hated because of our freedoms". There are other reasons.
There's a reason no one outside the campus left takes Chomsky's political opinions seriously, and it's not a massive conspiracy.
But the idea that Chomsky's political "contributions" somehow dwarf his contributions to science? That seriously floors me.
I think it would be nice to have some more references or more explanation before declaring Chomsky detrimental to science.
But he dominated. Where was Norvig or IBM's Watson back then?
Unfortunately, things like these don't happen in a vacuum. You can only say, looking back, that this didn't work or that was a bad idea. You would have had to say so at the time the theory was advanced, or come up with a better one.
We didn't exactly live in a totalitarian regime where some Politburo dictated what the official theory of genetics should be and everyone else got sacked, and now we finally have freedom from oppression and can get back on track.
Ironically, that is something that people allude to with regard to Chomsky (ironic because of his anarchist political beliefs); cf. the book The Linguistics Wars. The bottom line is that his theories were very dominant, not because of overwhelming empirical support, but because of his authority.
I don't think that requires that we view language solely as a probabilistic map between sounds and objects. All sorts of emergent behavior appear to be "magical" at first glance.
From what it sounds like, you are just dismissing compelling philosophical issues because it frustrates your beliefs.
As an epiphenomenon of neural firings? Or is saying that 'not allowed' somehow?
The interesting point is the expressive power of your model. To take an example I am somewhat familiar with: current large-vocabulary speech recognizers have millions of parameters. They work relatively well, but they are very difficult to interpret, and it is hard to see how they help us understand how speech recognition actually works in our brain.
To make a somewhat flawed analogy: every Turing-complete language is equivalent, but getting the machine code of a very large project is not very interesting if you want to understand it, while it is mostly enough if you just want to use it.
In general he does. In this article, though, that isn't what he talks about; he talks about approaches to AI.
> there have been some very good baby studies that show babies inherently know statistics needed to learn probabilistic associations.
Very good. Can we identify how that works and then build a robot that has the same mechanism, in a more efficient way than simply simulating a brain at the molecular level? That is his argument here.
> They didn't understand neurons and biology and genetics so well then, so yay, magic things are possible!
So where were you 4-5 decades ago when he proposed his theory to propose a better one?
Why is a real observation from your senses more privileged inside your brain than a random well-formed value produced by a (hypothetical) random-number-generator neuron?
As to your second question, I don't see how it relates to my argument, but I'll answer anyway. If you're comparing an observation to a random number, you're looking at the observation qua value, in which case it has the same status. If, however, you look at the level of interpretation (what it means in your brain), the observation has a complex set of relations with the rest of your brain and gives rise to a perception, whereas the random number is just noise that has to be tolerated by the brain.
Yes, you can get some things to work, and some to work well, but the idea is that perhaps there is a better model that describes the mechanism or the encoding of meaning. That's what Chomsky is trying to say in this particular article. Stopping at a brute-force approach is a fine engineering choice, but that doesn't mean everyone should stop there; it is still worth trying to find a better model, if only to gain an understanding.
Oh, so, linguistics is just magic then?
Chomsky seems to keep using naïve models as a strawman, and Norvig rightly calls him on it. If you use simple models, you can only get simple insights, but statistical machine translation (for example) builds probabilistic context-free grammars, which map human notions of language far better than "make sure every three-word sequence is plausible".
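As an illustrative sketch of what "probabilistic context-free grammar" means (the rules, words, and probabilities below are toy inventions for the example; real MT grammars are induced from parallel corpora):

```python
import random

# A tiny hypothetical PCFG: each nonterminal maps to (probability,
# expansion) pairs whose probabilities sum to 1. Words that are not
# keys in the table are terminals.
PCFG = {
    "S":  [(1.0, ["NP", "VP"])],
    "NP": [(0.6, ["the", "N"]), (0.4, ["N"])],
    "VP": [(0.7, ["V", "NP"]), (0.3, ["V"])],
    "N":  [(0.5, ["dog"]), (0.5, ["cat"])],
    "V":  [(0.5, ["sees"]), (0.5, ["chases"])],
}

def sample(symbol="S"):
    """Expand a symbol by sampling one of its rules; terminals pass through."""
    rules = PCFG.get(symbol)
    if rules is None:                      # terminal word
        return [symbol]
    r, acc = random.random(), 0.0
    for prob, expansion in rules:
        acc += prob
        if r < acc:
            break                          # falls through to last rule on float edge
    return [word for part in expansion for word in sample(part)]

print(" ".join(sample()))                  # e.g. "the dog chases cat"
```

The same structure, run in reverse with an inside-outside style algorithm, assigns probabilities to parses, which is where the modeling power over plain n-grams comes from.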
You're responding that he's wrong, because it's useful on an engineering level.
Right? I'm reading many comments here and they seem to keep boiling down to this notion. Am I wrong?
If, like Chomsky, you value having a model of the underlying cognition process rather than a set of black-box predictors for aspects of that problem (e.g., various corpus-driven translators), then you might be really annoyed that the black-box people are so satisfied with their results.
Or in your terms, we may not get to pick the prettiest models, but we owe it to ourselves to explore the space of models to see if we can find the structure in it.
The engineer in me is pleased by the undoubted success the data-driven learning culture has had on problems of real importance. But this work is highly empirical, with a tendency toward point solutions, and someone is likely to come in later and generalize these methods (e.g., why do some families of black-box predictors or features outperform others for language learning?). There's room for both approaches.
Norvig's reply to Chomsky's original remark contains a reference to Leo Breiman's well-informed remarks on this question (http://projecteuclid.org/DPubS?service=UI&version=1.0...).
Breiman, as author of basic books on measure theory as well as on classification trees, was able to walk both sides of this line ("make a first-principles model" vs. "use lots of data"). He spent considerable energy over the years trying to introduce the data-intensive approach to conventional statistics. For instance, he was one of the handful of bona fide statisticians who would attend and contribute to neural net and machine learning conferences. Probably this strategy is more productive than Chomsky's grumpy-old-man warnings (or sagacious warnings, depending on how you look at it).
Economics also play a large part in how information is parsed. The advancement of AI outside of academia is largely dependent on what it's being used for and how it's being used. Where great strides are being made is in search because it can be monetised and the computational power required is commensurate with the number of users/frequency of use and ROI. A complex model that can provide better insights but limits the number of concurrent users isn't as useful in a commercial sense.
If you're interested in this area, which you might call the philosophy of artificial mind, or philosophy of cognitive science, I strongly suggest reading the link to Norvig's article (linked in TFA, but here it is again: http://norvig.com/chomsky.html ). In particular, I'd suggest reading it before reading the actual interview with Chomsky (maybe ideally reading it after the prefatory part of TFA).
My own inclination is that on the face of it, I find Norvig's approach less satisfying, as Chomsky appears to, but upon much consideration my current belief is that Chomsky's approach is too mystical and too just-so, and Norvig's approach at least has the merit of bearing fruit... fruit that one day might be concentrated into a concise and elegant theory.
You seem to agree with Norvig that doing massive data analysis on language will come to a scientific understanding, which would be a first in science.
Chomsky doesn't. If anything, Chomsky is grounded in reality, and Norvig and AI researchers are grounded in hope that this way of mapping out something will create meaningful understanding of the system.
Another example might be the winning entry in the Netflix competition. If your goal is to predict film preferences (which is Netflix's goal, of course), it appears to give pretty good estimates. But I don't think even its authors would claim that it's a scientific model of how humans form preferences.
In both cases the underlying problem is that there are fairly general functional forms, such as a few terms of a Taylor series, or a forest of decision trees, that can empirically model almost anything to a certain degree of accuracy, given enough data, even if the underlying process looks nothing like them. Therefore they can give accurate predictions that work in practice, without being accurate models of what's happening in the underlying system. Chomsky appears to be of the opinion that statistical NLP systems are more of that variety, so may be good engineering solutions without being good scientific models.
What I don't get is that what Chomsky is saying is altogether standard, and yet people are insulting him for what is altogether a very simple idea, which you expressed plainly.
Yes, NLP systems are going to have many engineering uses, and Chomsky agrees. Are they going to help in the true scientific understanding of the systems?
It's unlikely. It's likely to be "good engineering solutions without being good scientific models" as you elegantly put it.
So, what we have is (1) the engineering / statistical modelling / machine learning approach, and (2) the deep theoretical "Chomsky approach".
Chomsky despises, maybe rightly so, the engineering approach because it only provides tools that work, approximately, in practice but don't provide any deep "scientific" understanding.
The deep theoretical approach has a vision of a comprehensive theory that really provides understanding. Once we manage to find the deep fundamental theory, practical applications will be child's play.
But here's the catch: has the Chomsky approach taken us any closer to that deep theoretical progress? Why has all the practical progress come from the engineers? What if the deep theoretical thinkers are completely lost in their theories, like the alchemists of medieval times, and, because they despise the data-driven approach, also refuse to let empirical observations guide them in the right direction?
I don't think looking down on the engineers and their modest practical success is any kind of merit, if your only merit is dreaming of a deep theory while making no measurable progress towards it.
I see what you are saying, and it is actually a good point. Where are the robots built on Chomsky's theory? A very valid question. I don't know the answer to it; Chomsky doesn't either. But I think what you mean by practical progress isn't what he means by progress. That is his point.
You have to see where he is coming from. He is an academic; his ultimate goal is to understand how things work. Training a set of neurons with input data and ending up with perhaps millions of activation weights does not serve that goal, even if the resulting machine can play chess, make coffee, and drive you to work. I think that is his take on it.
I say we need both. There is no reason not to strive for both, and no reason to turn radical, start burning books, and claim one approach should completely replace the other. I hope we one day find (or find that we can't find) a good explanatory model for meaning, language, learning, personality, or consciousness, but in the meantime I enjoy playing chess with my computer, and I hope pretty soon I'll have my car drive me to work by itself.
Early in his career, L. Ron Hubbard presented a theory (Dianetics) of the causes of mental illness. It's a theory all right, just not a very good one, and not very testable. We would still benefit from a better theoretical understanding of human mental health and illness, but we should not accept just anybody's theory merely because he is a deep thinker and has one.
Maybe (probably) all the sophistication of natural language is an emergent property of the piling up of lots of little similarly shaped details, like atoms. Sure, high-level rules are nice approximations that satisfy our human craving for patterns, but that doesn't mean those patterns are how the brain really works; quite the opposite, in fact.
High level models are illustrations, useful for game programmers and artists to efficiently create simulations and plausible imaginary creations. Low level models are how things actually work.
You can hardly blame the AI researchers for sticking with methods that have been very successful (at least practically speaking) after they had their funding cut out from under them in the 90's for the perceived "failures of AI".
I do like the idea of developing a theory (e.g. vision is processed via algorithm in the brain represented by X) and then attempting to find evidence of that. It can help to avoid the reductionist idea that you need to model everything down to the cellular level in order to understand the brain. It's like trying to disassemble code in memory and then read the algorithm rather than examining every 1 and 0 in memory and trying to make heads or tails of them.
That's enough rambling from me...
If what Chomsky talks about ("develop a theory of how something happens and use it to find evidence that supports or refutes that theory") is the extreme "design" end of the spectrum, then the extreme end of the "evolution" part of the spectrum must be this:
Choose a set of rules and use genetic-programming techniques to rearrange those rules. The evidence you require will be acquired and utilised by the algorithm itself. The crux is the set of rules to choose from and the implementation of the technique. IMO, the more complicated the rules and the more they interact with each other, the better.
...only then can you get results like this: "...Five individual logic cells were functionally disconnected from the rest -- with no pathways that would allow them to influence the output -- yet when the researcher disabled any one of them the chip lost its ability to discriminate the tones."
While we don't know whether the design approach will work, we have examples of evolution creating working intelligence.
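The steps above can be sketched with the simplest possible setup: bitstring genomes, mutation plus truncation selection, and an arbitrary target. Everything here (target, rates, population size) is an illustrative assumption, not a serious GP system.

```python
import random

# Minimal genetic-algorithm sketch of the "evolution" end of the spectrum:
# evolve 20-bit genomes toward an arbitrary all-ones target using only
# mutation and survival of the fittest.
random.seed(0)
TARGET = [1] * 20

def fitness(genome):
    """Count positions matching the target (the 'evidence' the algorithm uses)."""
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.05):
    """Flip each bit independently with the given probability."""
    return [1 - g if random.random() < rate else g for g in genome]

population = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(TARGET):
        break
    # keep the top third, refill the rest with mutated copies of survivors
    survivors = population[:10]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(20)]

print(generation, fitness(population[0]))
```

Nobody "designs" the solution; the loop discovers it, which is also why evolved artifacts (like the FPGA in the quote above) can work in ways no designer would choose.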
The article completely misunderstands what Behaviorism actually is. Once again, the misunderstanding comes from confusing the Methodological Behaviorism advocated by John Watson, Edward Thorndike and others, who did indeed try to model people as a "black box", with Skinner's Radical Behaviorism, which denies that there's even a box to be opened; rather, Skinner holds that there is only a locus where a series of environmental processes happen to converge and interact in interesting ways. Some of these processes are much older than the others, and are expressed in genes; others are relatively newer, and are learned. Skinner did not deny that genetics played a role in language acquisition, nor did he ever maintain people are born a blank slate. While Skinner found Chomsky's work regarding universal grammar unconvincing, he maintained that it was not, as Chomsky claimed, directly opposed to his own work -- the two theories were orthogonal, and would succeed or fail independently of each other.
In any case, this kind of antagonism between the two approaches might be useful as it keeps the field more vital, preventing yet another stagnation.
The whole disagreement stems from the question of whether this will help us reach a scientific understanding of language or not. The burden is on Norvig's side to prove that it will, and it hasn't been met.
How many physicists really believe they fully understand quantum mechanics? The theory is probabilistic and strange, but produces very accurate predictions.
The true measure of science is matching observations to hypotheses, and in that respect the approach that Norvig defends has demonstrated success. Google's language tools work well much of the time. Watson beat Ken Jennings.
Other branches of science are beginning to make more use of "big data" approaches as well. A friend doing post-doctoral research on evolution spends most of his time behind a laptop coding against big sets of digitized genetic information.
He doesn't think that. Look at his linguistics work - google Chomsky Hierarchy. He very much does not think that it's magic.
All he's saying is that some kinds of statistical modeling - while stupidly useful and practical - don't give us a lot of explanatory power.
He's, metaphorically, complaining about folk who are happy using Boyle's Law because "it works", when he'd like more folk figuring out what atoms are all about.
What it strongly suggests is that in the evolution of language, a computational system developed, and later on it was externalized.
(Not claiming Chomsky is wrong, just a similar figure making a similar claim.)
Anyway, suffice to say, AI and AGI didn't stop progressing, and Chomsky is no longer any sort of expert in those fields.
Even Norvig isn't up to speed on the most advanced approaches to AGI, but at least he enters the same room with people who are aware of the field. For example, he gave a talk at the recent Singularity Summit.
The Fifth Conference on Artificial General Intelligence is going to be in Oxford in December. http://agi-conference.org/2012/
Here is some information for people who are interested in pertinent ideas related to AGI.
>OpenCog is a diverse assemblage of cognitive algorithms, each embodying their own innovations — but what makes the overall architecture powerful is its careful adherence to the principle of cognitive synergy.
>The human brain consists of a host of subsystems carrying out particular tasks — some more specialized, some more general in nature — and connected together in a manner enabling them to (usually) synergetically assist rather than work against each other.
> PLN is a novel conceptual, mathematical and computational approach to uncertain inference. In order to carry out effective reasoning in real-world circumstances, AI software must robustly handle uncertainty. However, previous approaches to uncertain inference do not have the breadth of scope required to provide an integrated treatment of the disparate forms of cognitively critical uncertainty as they manifest themselves within the various forms of pragmatic inference. Going beyond prior probabilistic approaches to uncertain inference, PLN is able to encompass within uncertain logic such ideas as induction, abduction, analogy, fuzziness and speculation, and reasoning about time and causality.
>Conceptually, knowledge in OpenCog is stored within large [weighted, labeled] hypergraphs with nodes and links linked together to represent knowledge. This is done on two levels: Information primitives are symbolized in individual or small sets of nodes/links, and patterns of relationships or activity found in [potentially] overlapping and nesting networks of nodes and links. (OCP tutorial log #2).
Large-Scale Model of Mammalian Thalamocortical Systems
> The understanding of the structural and dynamic complexity of mammalian brains is greatly facilitated by computer simulations. We present here a detailed large-scale thalamocortical model based on experimental measures in several mammalian species. The model spans three anatomical scales. (i) It is based on global (white-matter) thalamocortical anatomy obtained by means of diffusion tensor imaging (DTI) of a human brain. (ii) It includes multiple thalamic nuclei and six-layered cortical microcircuitry based on in vitro labeling and three-dimensional reconstruction of single neurons of cat visual cortex. (iii) It has 22 basic types of neurons with appropriate laminar distribution of their branching dendritic trees. The model simulates one million multicompartmental spiking neurons calibrated to reproduce known types of responses recorded in vitro in rats. It has almost half a billion synapses with appropriate receptor kinetics, short-term plasticity, and long-term dendritic spike-timing-dependent synaptic plasticity (dendritic STDP). The model exhibits behavioral regimes of normal brain activity that were not explicitly built-in but emerged spontaneously as the result of interactions among anatomical and dynamic processes. We describe spontaneous activity, sensitivity to changes in individual neurons, emergence of waves and rhythms, and functional connectivity on different scales.
Essentials of General Intelligence: The direct path to AGI
>General intelligence, as described above, demands a number of irreducible features and capabilities. In order to proactively accumulate knowledge from various (and/ or changing) environments, it requires:
>1. Senses to obtain features from ‘the world’ (virtual or actual),
>2. A coherent means for storing knowledge obtained this way, and
>3. Adaptive output/ actuation mechanisms (both static and dynamic).
>Such knowledge also needs to be automatically adjusted and updated on an ongoing basis; new knowledge must be appropriately related to existing data. Furthermore, perceived entities/ patterns must be stored in a way that facilitates concept formation and generalization. An effective way to represent complex feature relationships is through vector encoding (Churchland 1995).
>Any practical applications of AGI (and certainly any real-time uses) must inherently be able to process temporal data as patterns in time – not just as static patterns with a time dimension. Furthermore, AGIs must cope with data from different sense probes (e.g., visual, auditory, and data), and deal with such attributes as: noisy, scalar, unreliable, incomplete, multi-dimensional (both space/ time dimensional, and having a large number of simultaneous features), etc. Fuzzy pattern matching helps deal with pattern variability and noise.
>Another essential requirement of general intelligence is to cope with an overabundance of data. Reality presents massively more features and detail than is (contextually) relevant, or that can be usefully processed. This is why the system needs to have some control over what input data is selected for analysis and learning – both in terms of which data, and also the degree of detail. Senses (‘probes’) are needed not only for selection and focus, but also in order to ground concepts – to give them (reality-based) meaning.
> A typical HTM network is a tree-shaped hierarchy of levels that are composed of smaller elements called nodes or columns. A single level in the hierarchy is also called a region. Higher hierarchy levels often have fewer nodes and therefore less spacial resolvability. Higher hierarchy levels can reuse patterns learned at the lower levels by combining them to memorize more complex patterns.
> Each HTM node has the same basic functionality. In learning and inference modes; sensory data comes into the bottom level nodes. In generation mode; the bottom level nodes output the generated pattern of a given category. The top level usually has a single node that stores the most general categories (concepts) which determine, or are determined by, smaller concepts in the lower levels which are more restricted in time and space. When in inference mode; a node in each level interprets information coming in from its child nodes in the lower level as probabilities of the categories it has in memory.
>Each HTM region learns by identifying and memorizing spatial patterns - combinations of input bits that often occur at the same time. It then identifies temporal sequences of spatial patterns that are likely to occur one after another.
My fundamental aversion to both OpenCog and the entire Singularity crowd is that a) their statements are so general as to be useless and b) they don't do anything. Google makes search simple: go to google.com and find out. Google makes cars drive themselves: ask Nevada/California and, if you're a member of the press, request a test drive today. IBM's Watson definitively beat world champions in front of everyone, and before that they did it with Deep Blue.
Everyone in the other communities falls under this category: All talk - no walk.
The entirety of what I've gotten out of both groups is essentially little more than what religious people get out of going to a sermon at a church. The future will be grand, lots of bullshitty buzz words, lots of hand waving with huge claims - no hard calculations, no hard examples of what they've actually achieved.
I'll stick with Norvig/Google and his/their demonstrated achievements and knowledge over the talk, hype and vaporware projects of groups that have yet to show any hard progress apart from a bunch of lectures to rich people with a lot of vague words.
The SENS movement gives me the exact same feeling.
All talk - no walk.
Comparing Google Search and IBM Watson to OpenCog and other early-stage research efforts is silly. Google Search and IBM Watson have taken fairly mature technologies, pioneered by others over decades of research, and productized them fantastically. OpenCog is a research project and is aimed at breaking fundamentally new research ground, not at productizing and scaling-up technologies already basically described in the academic literature.
Lecturing is a very small percentage of what those of us involved with OpenCog do. We are building complex software and developing associated theory. Indeed parts of our approach are speculative, and founded in intuition alongside math and empirics. That's how early-stage research often goes.
Of course you can trash all early-stage research as not having results yet. And the majority of early-stage research will fail, probably making you tend to feel vindicated and high and mighty in your skepticism ;p .... But then, a certain percentage of early-stage research will succeed, because of researchers having the guts to follow their intuitions in spite of the ceaseless tedious sniping of folks like you ;p ...
- Ben Goertzel
He's no quack.
> are effective and powerful ideological institutions that carry out a system-supportive propaganda function by reliance on market forces, internalized assumptions, and self-censorship, and without overt coercion
That's pretty self-evident to the point of being, well, pointless - admen of the 60s made their bread using this, and the PR pioneers of the 30s were already experts. But please let's all listen to what he has to say next. Let me guess: killing people is bad, and not killing people is good. If you call that amazing thinking, I'd hate to see the idiotic version.
> Geoffrey Sampson maintains that universal grammar theories are not falsifiable and are therefore pseudoscientific theory. He argues that the grammatical "rules" linguists posit are simply post-hoc observations about existing languages, rather than predictions about what is possible in a language. Similarly, Jeffrey Elman argues that the unlearnability of languages assumed by Universal Grammar is based on a too-strict, "worst-case" model of grammar, that is not in keeping with any actual grammar. In keeping with these points, James Hurford argues that the postulate of a language acquisition device (LAD) essentially amounts to the trivial claim that languages are learnt by humans, and thus, that the LAD is less a theory than an explanandum looking for theories.
Sampson, Roediger, Elman and Hurford are hardly alone in suggesting that several of the basic assumptions of Universal Grammar are unfounded. Indeed, a growing number of language acquisition researchers argue that the very idea of a strict rule-based grammar in any language flies in the face of what is known about how languages are spoken and how languages evolve over time. For instance, Morten Christiansen and Nick Chater have argued that the relatively fast-changing nature of language would prevent the slower-changing genetic structures from ever catching up, undermining the possibility of a genetically hard-wired universal grammar. In addition, it has been suggested that people learn about probabilistic patterns of word distributions in their language, rather than hard and fast rules (see the distributional hypothesis). It has also been proposed that the poverty of the stimulus problem can be largely avoided, if we assume that children employ similarity-based generalization strategies in language learning, generalizing about the usage of new words from similar words that they already know how to use.
Another way of defusing the poverty of the stimulus argument is to assume that if language learners notice the absence of classes of expressions in the input, they will hypothesize a restriction (a solution closely related to Bayesian reasoning). In a similar vein, language acquisition researcher Michael Ramscar has suggested that when children erroneously expect an ungrammatical form that then never occurs, the repeated failure of expectation serves as a form of implicit negative feedback that allows them to correct their errors over time. This implies that word learning is a probabilistic, error-driven process, rather than a process of fast mapping, as many nativists assume.
Finally, in the domain of field research, the Pirahã language is claimed to be a counterexample to the basic tenets of Universal Grammar. This research has been primarily led by Daniel Everett, a former Christian missionary. Among other things, this language is alleged to lack all evidence for recursion, including embedded clauses, as well as quantifiers and color terms. Some other linguists have argued, however, that some of these properties have been misanalyzed, and that others are actually expected under current theories of Universal Grammar.
Looks like I'm not the only one that sees through bullshit.
Let me repeat - just to imprint on people's minds:
> This implies that word learning is a probabilistic, error-driven process, rather than a process of fast mapping, as many nativists assume.
Chomsky's theories are, and always were, DOA.
I wonder if you know you're being ironic here. Plenty of us have never even read Chomsky's political works and have been exposed to him solely through mentions in the CS literature, like the Dragon book, or more in-depth stuff on his theory of context-free grammars. There is a startling amount of proof that he not only writes about politics but, at one time or another, actually worked for a living and helped our field produce useful stuff.
One point: Sampson's criticisms about linguists producing post-hoc descriptions could just as easily have been (and were, I believe) applied to Newton's theories. Good science includes mapping and describing phenomena.
Another point: negative feedback on errors is not enough to account for the explosive speed of language acquisition in children. Not to say that this sort of feedback doesn't occur, or isn't useful, but it is only really used when children learn exceptions (i.e. irregular verb forms in English) or vocabulary (and even much of vocabulary is rule-generated). Basic language rules are encoded, and children's brains only require minimal stimulus to record the specific settings of the rules for the language they are learning.
Everett is very controversial, for example:
Everett (2005) has claimed that the grammar of Pirahã is exceptional in displaying 'inexplicable gaps', that these gaps follow from a cultural principle restricting communication to 'immediate experience', and that this principle has 'severe' consequences for work on universal grammar. We argue against each of these claims. Relying on the available documentation and descriptions of the language, especially the rich material in Everett 1986, 1987b, we argue that many of the exceptional grammatical 'gaps' supposedly characteristic of Pirahã are misanalyzed by Everett (2005) and are neither gaps nor exceptional among the world's languages. We find no evidence, for example, that Pirahã lacks embedded clauses, and in fact find strong syntactic and semantic evidence in favor of their existence in Pirahã. Likewise, we find no evidence that Pirahã lacks quantifiers, as claimed by Everett (2005). Furthermore, most of the actual properties of the Pirahã constructions discussed by Everett (for example, the ban on prenominal possessor recursion and the behavior of WH-constructions) are familiar from languages whose speakers lack the cultural restrictions attributed to the Pirahã. Finally, following mostly Gonçalves (1993, 2000, 2001), we also question some of the empirical claims about Pirahã culture advanced by Everett in primary support of the 'immediate experience' restriction. We conclude that there is no evidence from Pirahã for the particular causal relation between culture and grammatical structure suggested by Everett. -- Pirahã Exceptionality: A Reassessment, http://dash.harvard.edu/handle/1/3597237
Pirahã actually has two color terms, 'dark' and 'light', which is Stage I in http://en.wikipedia.org/wiki/Basic_Color_Terms:_Their_Univer..., http://en.wikipedia.org/wiki/Linguistic_relativity_and_the_c...
Dr. Freud would have had a good deal to say about your apparent fixation with bovine feces...
Seriously though, your comments are playing fast and loose with a range of fields that you’re conflating and dismissing. Not all social sciences are “soft” and many have empirically-based real world applications that shape your (and everyone’s really) everyday lives.
So was Aristotle a quack as well?
I ask because he was pre-science, and pretty much laid the foundation for what became the scientific method (i.e. empiricism).
Perhaps before you dismiss large bodies of knowledge you should look up the history of science, and see that it has flaws in and of itself...
Not all of those projects I listed identify themselves as AGI. However, they should go in the same group.
And anyway, all of those projects have demonstrated progress. If you looked into them at all then you would see that. Ben Goertzel is using some aspects of his AGI research in mainstream (narrow) AI projects. OpenCog has released a number of solid demonstrations of current features. And Goertzel isn't hand-waving or bullshitting in his numerous books and scientific papers, for example Probabilistic Logic Networks: A Comprehensive Framework for Uncertain Inference (336 pages).
Hawkins has demonstrated very interesting progress with his software and has a commercial application https://www.numenta.com/grok_info.html
Voss is using his system at Adaptive AI as a commercial enterprise.
Qualcomm is funding Brain Corporation (Izhikevich et al), so obviously they are taking it seriously. A bakery in Tokyo has tested Brain Corporation's machine vision technology to power a semi-automated cashier system.
I know Chomsky is a serious scientist with considerable accomplishment.
I have seen totally loony stuff in videos of AGI conferences (Tachyons and stuff). Open Cog may be better than that. But it hasn't proved that it is better than that.
The AI of the 1970s-80s involved the Chomskyan paradigm of "draw up a naive design of the mind and/or brain and implement it". That failed so badly that you need a really good argument why you can do things differently - at least to move into mainstream science. That is, Ben Goertzel seems nice, smart and enthusiastic but I can't see him bringing anything new to the "table". Jeff Hawkins had interesting ideas with his temporal paradigm but it seemed like the model he chose to instantiate wasn't all that different from that used by the statistical-brute-force crowd. And Numenta has had really few announcements for a six-year-old enterprise.
And the companies paying for AI to be added to their systems. That happened from the start but it wasn't ever enough. What's different here from the stuff from twenty years ago?
AGI is mainstream science, these days. The keynote of the 2012 AAAI conference (the major mainstream AI research conference each year), by the President of AAAI, was largely about how the time has come for the AI field to refocus on human-level AI. He didn't use the term "AGI" but that was the crux of it.
The "AI winter" is over. Maybe another will come, but I doubt it.
What's different from 20 years ago? Hardware is way better. The Internet is way richer in data, and faster. Software libraries are way better. Our understanding of cognitive and neural science is way stronger. These factors conspire to make now a much better time to approach the AGI problem.
As for my own AGI research lacking anything new, IMO you think this because you are looking for the wrong sort of new thing. You're looking for some funky new algorithm or knowledge structure or something like that. But what's most novel in OpenCog is the mode of organization and interaction of the components, and the emergent structures associated with them. I realize it's a stretch for most folks to realize that the novel ingredients needed to make AGI lie in the domain of systemic organizational principles and emergent networks rather than novel algorithms, data structures or circuits -- but so it goes. It wouldn't be the first time that the mass of people were looking for the wrong kind of innovation, hmm?
Regarding tachyons in videos of AGI conferences, could you provide a reference? AGI conference talks are all based on refereed papers published by major scientific publishers. Some papers are stronger than others, but there's no quackery there.... (There have been "Future of AGI" workshops associated with the AGI conferences, which have had some freer-ranging speculative discussions in them; could you be referring to a comment an audience participant made in a discussion there?)
I wish you luck (well sort-of - with great power would come great responsibility and all-that).
I wasn't making up the tachyon guy. If I have time, I'll dig up the video (it'd be a little hard since the hplus website reorganized). He was a presenter, not an audience member, and had at least one paper at one of these conferences. I can easily believe the AGI conferences have gotten better.
I would stick to the point that AGI needs to make clear how it will overcome previous problems - making it clear to mainstream science is useful for funding, but making it clear to yourselves, so you have ways to proceed, is most important.
I don't necessarily agree exactly with Hubert Dreyfus' critique, but I think that at a minimum a counter-critique of his critique is needed to clarify how an AGI could work.
A good summary of his argument would be: http://leidlmair.at/doc/WhyHeideggerianAIFailed.pdf
I mean, I have worked in computer vision (not that much even). There's no shortage of algorithms that solve problem X but nothing in particular weds them together. Confronted with a new vision problem Y, you are forced to choose one of these thousand algorithms and modify it manually. You get no benefit from the other 999.
As far as open source methodologies solving the AGI question, I've followed multiple open source projects. While certain things might indeed work well developed in the "bazaar" style, I haven't seen something as exacting as a computer language come out of such a process - languages tend to require an individual designer working rather exactly, with helpers certainly, but in many, many situations almost alone (look at Ruby, Perl, Python, etc). I would claim AGI would be at least as exacting as a computer language, possibly more so. Further, just consider how the "software crisis" - the limitations involved in producing large software with large numbers of people - expresses the absence of AGI. Essentially, to create AGI, you would need to solve something like a bootstrapping problem, so that the intentions of the fifty or five thousand people working together add up to more than what fifty or five thousand intentions normally add up to in ordinary software engineering. I suppose I believe some progress on a very basic level is needed to address this.
And, just for comparison, here's the agenda for the most recent ICML conference:
To me, the AGI conference seems to have a much higher ratio of "speculative ideas"/"technical results" talks. Also to me, this pretty much justifies the "all talk - no walk" assessment.
You are correct that the AGI conferences have a higher ratio of "speculative ideas"/"technical results" than ICML. This is intentional and I believe appropriate -- because AGI is at an earlier stage of development than machine learning, and because it's qualitatively different in character than machine learning.
Machine learning (in the sense that the term is now typically used, i.e. supervised classification, clustering, data mining, etc.) can be approached mainly via a narrowly disciplinary approach. Some cross-disciplinary ideas have proved valuable, e.g. GAs and neural nets, but the cross-disciplinary ideas there have quickly been "computer-science-ized"...
OTOH, I think AGI is inherently more complex and multifarious than ML as currently conceived, and hence requires more "out of the box" and freely multi-disciplinary thinking.
I think that in 10-15 years, when the AGI field is much more mature, the conferences will seem a bit more like ML conferences in terms of the percentage of papers reporting strong technical results. BUT, they will never seem as narrowly disciplinary as ML conferences, because AGI is a different sort of pursuit...
which indicates it's possible to have a selection of papers both technically sharp and interdisciplinary. We should all be so lucky to attract such a set of papers.
I'm reminded of the 1958 editorial by Peter Elias in the IEEE Information Theory Transactions ("Two Famous Papers"): http://oikosjournal.files.wordpress.com/2011/09/elias1958ire...
I sincerely wish you, your conference, and your research enterprise the best.
In terms of engineering yeah, the trend in AI at the moment is applied statistics for sure, and it wins hard.
What does that even mean?
If it's so easy to sum it up in a chapter of a book, why don't they build it and allow others to examine it, submit it for review, write papers and submit to ACM, build fantastic machines based on it? All I want is a bit of proof.
You and I have similar interests: we would like for AGI to happen. Even though I'm not sure what AGI means. It's a sort of dream right now for me, but perhaps more of a reality for you?
Most of that large post is neat, but it's not going to convince me if I've never heard of AGI and if I currently know something about AI.
The most you can do is to go do what I mentioned earlier: go build systems, investigate, write papers and go to conferences and I don't mean conferences where it's just AGI people.
No it's not.
Many of those links in grandparent post were from or about opencog. I can make long blog posts about opencog that refer to opencog as proof, too...but it wouldn't mean anything. Religious people do that sort of thing all the time.
The proof would be in the pudding, right? So if AGI at least has some hypothesis, then it should be able to produce some results, right?
I very much want AGI to happen. You want AGI to happen. Our interests are in agreement. However, there isn't really much proof about any current hypothesis, as far as I can tell, that can produce any real system. It's a dream so far.
I don't mean it has to be a really solid understanding of consciousness, but it's such an undefined, unknown area right now that we can't even approach it.
Instead of making long blog posts and replies to comments, and then getting offended when people don't buy into it, the most people can do right now is to go investigate, hypothesize and try to build something.
AGI is one of those little goofy microtrends: so far as I can tell it's essentially a rebranding of Strong AI by soft computing reactionaries responding to AI getting dominated by domain specificity (otherwise known as "being successful"). To claim, as the grandparent appears to be doing, that AI is now properly called Artificial General Intelligence, is crackpottery at its finest.
Wake me up when AGI even appears on the first results page of a Google search for "AGI".
I think it's fundamentally the difference between soft bullshit and hard calculations. Everyone can talk about AI, or linguistics, or statistics (or any complex field) in very general, undefined and bullshitty terms.
But what we need, and what the machine learning guys are bringing is hard calculation - 1 + 1 = 2 or input data, get features and make decisions well above human abilities.
My question to all the fringe folks: Where's the beef? What have they done? Where are the automatic cars built on Chomsky's theories? Where are the talking robots from the AGI? What methods have the SENS people got? Are the singularity folks just leeches off gullible rich people - selling them a future and taking their cash in the process without providing any real value?
Chomsky's a pretty substantial walker.
> Every time I fire a linguist, the performance of the speech recognizer goes up.
I still don't see Chomsky robots walking around, Chomsky translation translating my text to French or Chomsky AI driving cars. Nope - all Google/IBM/Microsoft/DARPA/Boston Dynamics/etc. AKA Hard science-engineers utilising statistics, not soft science blowhards.
Only thing I see Chomsky doing is talk - a lot.
The model probably doesn't have a great deal of relation to what happens inside folks' heads - but it was stupidly useful for making computers do stuff. Might be better techniques now - I don't know. But saying that it wasn't applicable practical work is basically ignoring the NLP stuff that was happening in the 80's and 90's.
(let alone the more obvious useful stuff for us geeky folk - the formal grammar stuff we use and think about for compilers. Chomsky Hierarchy, etc.)
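For the compiler-facing sense of "grammar" mentioned above, here is a context-free grammar in exactly the Chomskian sense, with a toy recognizer for it. The grammar is invented for illustration, not taken from any real language:

```python
# A context-free grammar, in Chomsky's formalism:
#   Expr -> Term | Expr '+' Term
#   Term -> 'x' | '(' Expr ')'
# The recognizer below handles Expr's left recursion by iterating.

def parse_expr(s, i=0):
    i = parse_term(s, i)
    while i < len(s) and s[i] == '+':
        i = parse_term(s, i + 1)
    return i

def parse_term(s, i):
    if i < len(s) and s[i] == 'x':
        return i + 1
    if i < len(s) and s[i] == '(':
        i = parse_expr(s, i + 1)
        if i < len(s) and s[i] == ')':
            return i + 1
    raise SyntaxError(f"unexpected input at position {i}")

def accepts(s):
    """True iff the string is derivable from the grammar's start symbol."""
    try:
        return parse_expr(s) == len(s)
    except SyntaxError:
        return False

print(accepts("x+(x+x)"))  # True: derivable from the grammar
print(accepts("x+)"))      # False: not derivable
```

This is the same production-rule formalism that tools like ANTLR or yacc consume, which is why "writing a grammar" in CS means writing a Chomskian CFG.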
And, I'm not enough of a historian of science to know, but it seems to me that the basic results Chomsky proved on Regular Languages and CFGs paved the way for the Hidden Markov Models (HMMs) that have been so effective in language understanding. Basically, the HMMs are the natural probabilistic extension of Regular Languages.
I'm not sure if Viterbi and the other developers of the basic HMM toolkit were directly influenced by Chomsky, or if state machines were just in the air. Certainly Chomsky's basic work in the late 1950s predated Viterbi's work in the 1960s.
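The "natural probabilistic extension" claim can be made concrete: a finite-state machine with probabilities on its transitions and emissions is an HMM, and Viterbi decoding finds the most probable hidden-state path. A minimal sketch, with toy numbers invented for illustration:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for an observation sequence."""
    # V[state] = (probability of the best path ending here, that path)
    V = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        V = {
            s: max(
                ((p * trans_p[prev][s] * emit_p[s][o], path + [s])
                 for prev, (p, path) in V.items()),
                key=lambda t: t[0],
            )
            for s in states
        }
    return max(V.values(), key=lambda t: t[0])[1]

# Toy part-of-speech tagger: two hidden states, a three-word vocabulary.
states = ["Noun", "Verb"]
start = {"Noun": 0.6, "Verb": 0.4}
trans = {"Noun": {"Noun": 0.3, "Verb": 0.7},
         "Verb": {"Noun": 0.8, "Verb": 0.2}}
emit = {"Noun": {"dogs": 0.6, "bark": 0.1, "loudly": 0.3},
        "Verb": {"dogs": 0.1, "bark": 0.7, "loudly": 0.2}}
print(viterbi(["dogs", "bark"], states, start, trans, emit))
```

Strip the probabilities out and what remains is exactly a nondeterministic finite automaton over the same states, which is the sense in which HMMs probabilistically extend regular languages.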
Mathematicians aren't - you can prove their correctness over abstract planes and use them to, for example, run a hedge fund or a software company into trillions of revenue while making testable predictions in macro reality.
Theoretical physicists that use mathematics to make testable predictions are too. You can use them to also make electric engines, statistical extractors and accurate physical simulations that are corroborated with empirical evidence.
If the prediction is not testable, is unfalsifiable, is unreproducible, is not independent and is not supported by overwhelming evidence - it is bullshit - no ifs, buts or ands.
It can only be 'proved' in the 'contrived' world of pure mathematics... what bullshit!
The idea being, that you'd have to back pedal, and change your qualification. Which I could then use to apply to other fields, that you deem as 'quackery', and thus undo the foundation of your argument.
Instead, you just denied the reality of what you said... I didn't count on that. Well done.
Did you really look into AGI, for example the past conferences or those projects, and conclude that it is just invaluable holistic mumbo-jumbo?
That is so unfair and inaccurate, I can't see how you can possibly be evaluating things rationally if you really came to that conclusion.
Anyway, you have to at least include AGI if you are serious about human-like AIs.
I think part of the problem is that we don't even know what AGI really is. How do you define consciousness in a rigorous way? Doesn't mean it can't be done without such work but it just seems soooo undefined right now that people are suspicious if someone comes along and claims to have a partial solution.
Do we even know what we're looking for? When we do know or have an idea, I am willing to imagine that we'd have more AI research in that area and the AGI would be taken more seriously.
This is factually incorrect! For physicists, those thought experiments are absolutely essential, and we would need many, many more orders of magnitude of statistical processing of video signals in order to get close to the real-world-useful physical predictions that we arrive at through thought experiments, equations, and so on. The contrast that Chomsky is missing is that for language, the statistical processing is amazingly successful, and the thought experiment style of investigation, while productive, has not been shown useful in real world tasks like translation.
For those arguing against Chomsky, none of the above means that we should abandon a theory-driven or symbolic approach to language.
If Chomsky and his opponents would just recognise that they have different goals (not just different ways of approaching the same goal), we wouldn't have to have this same argument every few months.
As for his contributions to cognitive science, I think one side of the field simply feels that he is clinging to some outmoded notions of what Bayesian modeling can achieve in terms of explanatory power.
As a counterpoint, EVERYONE should read Andy Clark's beautifully written BBS paper "Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science."
In these there is obviously some contest going on between fuzzy classifiers, as there is in conceptual association games, misinterpretations of song lyrics between people and errors like the Freudian slip. There are at least large parts of our brains that seem to operate in this manner.
That said, our use of logic and reason certainly says there is a part of our brain that works in a non-fuzzy way, or at least can be trained to work like one. However, while there are people who understand the odds and are just there for a good time, it's instructive to go to a Casino and see how many people believe they can win and believe in lucky charms.
This topic is a minefield of semantic games with hidden assumptions and people arguing across each other though.
I get 13.
> Chomsky derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don’t try to understand the meaning of that behavior. Chomsky compared such researchers to scientists who might study the dance made by a bee returning to the hive, and who could produce a statistically based simulation of such a dance without attempting to understand why the bee behaved that way.
> But the number of parameters in his theory continued to multiply, never quite catching up to the number of exceptions, until it was no longer clear that Chomsky’s theories were elegant anymore. In fact, one could argue that the state of Chomskyan linguistics is like the state of astronomy circa Copernicus: it wasn’t that the geocentric model didn’t work, but the theory required so many additional orbits-within-orbits that people were finally willing to accept a different way of doing things. AI endeavored for a long time to work with elegant logical representations of language, and it just proved impossible to enumerate all the rules, or pretend that humans consistently followed them. Norvig points out that basically all successful language-related AI programs now use statistical reasoning
> But his fundamental stance, which he calls the “algorithmic modeling culture,” is to believe that “nature’s black box cannot necessarily be described by a simple model.” He likens Chomsky’s quest for a more beautiful model to Platonic mysticism, and he compares Chomsky to Bill O’Reilly in his lack of satisfaction with answers that work. “Tide goes in, tide goes out. Never a miscommunication. You can’t explain that,” O’Reilly once said, apparently unsatisfied with physics as an explanation for anything.
AI went wrong when Chomsky came around with his rule based translation ideas that were hideously wrong and probably set us back 20 years - see here:
He's a more irritating linguistic version of Richard Dawkins (who doesn't have an active research career).
> Every time I fire a linguist, the performance of the speech recognizer goes up
Your link relating to statistical models is only a tiny, tiny part of Chomsky's fundamental arguments and even then is debatable.
Chomsky's evidence for this is.... iffy at best. Yes, I think we are predisposed to HAVE language, but I don't think we can learn as much as he proposes about the structure of modern language from the human genome.
2) I don't think the problem with learning about language from the genome is specific to language. There are just so many layers of molecular interactions between the genetic code and activity at our level of reality that trying to link the two is incredibly difficult, and we are not even close to having the computing power or theoretical models necessary to link them up. But that doesn't mean that language and genes aren't linked.
I'm sorry - but when did Chomsky get a degree in biology or neuroscience.
And by the way I do think that judging human performance by simple metrics is problematic, but not because it's statistics or not 'high-level', simply because it doesn't take enough information into account; it's a shortcut to the actual concept of quality, which is dangerous when metrics are used in decision-making. Automated metrics give an air of objectivity which an expert opinion doesn't have, even though the latter may well be much more informed.
Noam Chomsky may very well be a real-life link farm or content stuffer. Hence why the impact/importance of the papers that link to him matters.