Chomsky's one-paragraph quote at the beginning of this article is clearer and more thoughtful than the rest of it. I feel the author is missing the point.
In the case of language, observing and reporting statistical probabilities in written/spoken language output does very little to explain the cognitive systems used in acquiring and using language. Even one statistical anomaly serves to show that statistical learning is NOT the entire picture when it comes to language development.
There was another article on HN a while back that had another great quote from Chomsky that does well to illustrate what I feel is his main point here: "Fooling people into mistaking a submarine for a whale doesn't show that submarines really swim; nor does it fail to establish the fact". Creating a computer that can produce millions of grammatical utterances does little to show that we understand language systems. Now, if a computer could - like humans - learn to produce infinite, novel, contextual, and meaningful grammatical utterances, that's a different story. But that story will take a lot more than statistical learning to write.
Chomsky is just appealing to our own biases. We don't want to be statistical approximation machines, so that makes it easy to dismiss attempts to mimic us with statistical approximation machines.
However, the preponderance of evidence* so far suggests that we are just statistical processing machines. Hence, Chomsky seems way off the mark.
*We know that various layers in the visual and auditory systems basically just compute ICA, and we know that the brain is incredibly plastic: large areas can be removed and the remainder will compensate. That makes it seem likely that all neurons compute something like ICA (or at least something that degrades to ICA when confronted with visual or auditory input).
Chomsky is just appealing to our own biases. We don't want to be statistical approximation machines
Do we? Intellectual cynicism that makes people publicly reduce humans and humanity to something mechanical and predictable is probably the most popular attitude I see online. This doesn't mean it's wrong in all cases, but surely it's not something exceptional.
The kind of intellectual cynicism you describe is often a prerogative and mode of those highly educated in the sciences, and therefore of a very small minority of humans generally, who otherwise, in my experience, do tend to think of themselves as agents free of statistical determination, as having wills and minds rather than being cogs of any sort.
Statistical learning as a field benefits humans most when it augments our actions. I think instead of comparing humans and statistical algorithms to see who fares better, we should focus on how the two can blend together and help each other out. As the author points out, all the success stories are largely man-machine collaborations (imagine search engines without user data and inputs).
Thanks for the comment, brockf. I'm sorry the essay didn't make sense to you. Let me try again on a few points.
Are you saying that one statistical error in a probabilistic model makes the entire model wrong? Then you'd have to say that one logical error in a categorical model makes it equally wrong. And manifestly, there are many logical errors in all grammars. So I'm not sure what your point is here.
I'm interested to know: I quoted Chomsky: "That's a notion of [scientific] success that's very novel. I don't know of anything like it in the history of science." Do you agree with him? If so, do you judge all the Science and Cell articles as not being about accurately modeling the world and only about providing insight? Or do you think Chomsky meant something else by that?
I understand that there are two goals: accurately representing the world, and finding satisfactorily simple explanations. I think Chomsky has gone too far in ignoring the first, but I acknowledge that both are part of science. I further think that statistical/probabilistic models of language are better for both goals. This is obvious to me after working on the problem for 30 years, so maybe it is hard for me to explain why. I think Manning, Pereira, Abney, and Lappin/Shieber do a good job of it. Also, I don't see how a system that successfully learns language could be anything other than statistical and probabilistic. I agree it is a long way away ...
>I further think that statistical/probabilistic models of language are better for both goals.
Could you give some concrete examples? As a linguist, I don't see that statistical models are currently giving us much insight in those areas where current syntactic theory does give some insight. So for example, we don't seem to have learned much about relative clauses, ergativity, passivization, etc. etc. through these models. On the whole, statistical methods seem very much complementary to traditional syntactic theory. This seems to be Chomsky's view also:
"A quite separate question is whether various characterizations of the entities and processes of language, and steps in acquisition, might involve statistical analysis and procedural algorithms. That they do was taken for granted in the earliest work in generative grammar, for example, in my Logical Structure of Linguistic Theory (LSLT, Chomsky 1955). I assumed that identification of chunked word-like elements in phonologically analyzed strings was based on analysis of transitional probabilities — which, surprisingly, turns out to be false, as Thomas Gambell and Charles Yang discovered, unless a simple UG prosodic principle is presupposed. LSLT also proposed methods to assign chunked elements to categories, some with an information-theoretic flavor; hand calculations in that pre-computer age had suggestive results in very simple cases, but to my knowledge, the topic has not been further pursued."
>Or do you think Chomsky meant something else by that?
He presumably means what he said, namely that merely creating accurate models of phenomena has never been the end goal of science. You acknowledge this yourself when you say that you take both modeling and explanation to be part of science.
What about the middle ground of structured probabilistic/statistical models? By introducing strong assumptions and prior information you create models that still have great flexibility, but have meaningful parameters which can be interpreted theoretically. These appear to me to solve both Chomsky's apparent non-interpretive model complaint and the technical problem of training a model with a large number of parameters.
On one end of the continuum, n-gram models for large n with infinite training data estimate the empirical distribution of language and thus are the best you can possibly do. On the other end, rule based grammars directly transcribe intelligible "rules" of language generation and comprehension. Both ends are clearly fraught with problems.
In the middle we have topic models, recursive grammars, decision trees, various ad-hoc smoothing methods, each of which both allowing for more tractable training and introducing more meaning to the parameters of the trained model.
I feel like effort in this direction provides (somewhat unsatisfactory) answers to both criticisms. I think it's fair to say that probabilistic/statistical models deserve more attention in a lot of fields in order to overcome a history of neglect, however.
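To make the low end of that continuum concrete, here is a bigram model with add-one smoothing, the simplest structured assumption: sentences factor into adjacent word pairs, and unseen pairs keep a little probability mass (corpus and test sentences invented for illustration):

```python
from collections import Counter

def train_bigram(sentences):
    """Collect bigram and unigram counts over <s>/</s>-padded sentences."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        vocab.update(toks)
        bigrams.update(zip(toks, toks[1:]))
        unigrams.update(toks[:-1])
    return bigrams, unigrams, vocab

def prob(sentence, bigrams, unigrams, vocab):
    """Add-one smoothing: unseen bigrams get small but nonzero probability."""
    p = 1.0
    toks = ["<s>"] + sentence.split() + ["</s>"]
    for a, b in zip(toks, toks[1:]):
        p *= (bigrams[(a, b)] + 1) / (unigrams[a] + len(vocab))
    return p

model = train_bigram(["the dog ate my homework", "the dog slept"])
seen = prob("the dog ate my homework", *model)
novel = prob("the dog ate my lunch", *model)   # never seen, still nonzero
```

Even this crude model assigns a novel sentence nonzero (if lower) probability, which pure empirical counting cannot do.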
>In the case of language, observing and reporting statistical probabilities in written/spoken language output does very little to explain the cognitive systems used in acquiring and using language.
Unless, of course, those cognitive systems are nothing more than some statistical probabilistic mechanism. I don't know anything about the field, but the article was interesting to me in that it seemed to at least partly argue that. I know, for me at least, I'll frequently produce a sentence and then repeat it to myself a few times to see if it "sounds right." Now, I don't know what is happening to determine that, but perhaps I'm comparing it to some statistical probabilistic model I have in my head?
> Even one statistical anomaly serves to show that statistical learning is NOT the entire picture when it comes to language development.
1) Does it? Maybe it shows the specific statistical probabilistic model in question is wrong. Consider, as Chomsky did, a model which predicts zero probability for a novel sentence. Clearly, as you say, one anomalous novel sentence is all it takes to disprove such a model. But what about other models which can handle them? The "anomaly" may not be an anomaly anymore.
2) Do you have some anomaly in mind which shows statistical probabilistic models don't work?
The article was very interesting to me, but I don't know anything about the field. I guess my main question boils down to: Is it possible that language acquisition and production is nothing more inside our heads than a simple statistical probabilistic model?
"Now, I don't know what is happening to determine that, but perhaps I'm comparing it to some statistical probabilistic model I have in my head?"
I had a non-native Japanese teacher once who, when asked a question on proper Japanese usage, would often stop for a second, clearly playing the sentence or phrase over again in his head, and say "no, they don't really say that" or "yes, they do say it that way."
Clearly, he was using his extensive experience listening to Japanese over many years to determine grammaticality, so at least a statistical model, if not conclusively a probabilistic one.
A simple statistical model is probably not the only thing human infants are using when they learn language. Linguists make a pretty good case that there must be some structure in-place for infants to acquire language robustly, quickly, and with the kinds of noisy input (overheard speech) they have to work with.
It's not my field so I can't give examples off the top of my head, but the argument involves rapid acquisition of syntax and near-complete absence of errors that you'd expect to see in a simple statistical model.
Exactly. Almost everyone can identify recursive grammar (except people in a small South American tribe who speak a non-recursive language).
You don't need a raw Markov chain to assess the likelihood of "The DOG ate my homework", "My WASHING MACHINE ate my homework", and "MY LEGALLY ate my homework". You need P(WASHING MACHINE = NOUN), and P("My NOUN ate my homework").
But that's not right. You could also have P(WASHING MACHINE = NOUN THAT CAN EAT STUFF). Or maybe P(EAT = HUMOROUS TERM FOR DESTROYED), P(HUMOROUS SENTENCE), P(WASHING MACHINE = OBJECT THAT CAN DESTROY HOMEWORK).
Anyway, it's really bloody hard to put it all together. But that's what humans do. I'd imagine that we store it in our short term memory, then make a few quick parses of it, under varying assumptions, and keep the ones that are most consistent.
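The factorization sketched above, scoring a word's fit to a category separately from the category's fit to a sentence frame, can be written out with toy numbers (every probability below is invented for illustration):

```python
# Toy lexicon: P(category | word); numbers invented for illustration
lexicon = {
    "dog":             {"NOUN": 0.9},
    "washing machine": {"NOUN": 0.7},
    "legally":         {"ADV": 0.95},
}

# Toy grammar: probability of each sentence frame, also invented
frames = {
    "My NOUN ate my homework": 0.01,
    "My ADV ate my homework":  0.0,   # an adverb can't fill the subject slot
}

def sentence_prob(word, category, frame):
    """P(sentence) ~ P(category | word) * P(frame with that category slot)."""
    return lexicon.get(word, {}).get(category, 0.0) * frames.get(frame, 0.0)

dog     = sentence_prob("dog", "NOUN", "My NOUN ate my homework")
washer  = sentence_prob("washing machine", "NOUN", "My NOUN ate my homework")
legally = sentence_prob("legally", "ADV", "My ADV ate my homework")
```

The point is that the model generalizes over the NOUN slot: any word with some noun-probability can fill the frame, without the exact sentence ever having been seen.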
In reality, Chomsky is fighting the same sort of battle that happened when Newton and Leibniz were around (no, not the battle between Newton and ... the rest of the world, really). OK, you have gravity. But what causes it? Why? It's an interesting question, but not necessarily one that will lead anywhere.
In the case of "my washing machine ate my homework" and other non-standard expressions, many people will be confused by making the obvious associations. It's only when the new rules are explained to them that they come to understand what was meant.
The failure of a machine to understand a sentence given a set of rules may simply mean that it needs to be taught new rules.
If that was true, then why did humans evolve to speak at all? Why, if speech is simply a reaction to statistics we are tracking and behaviours that have been rewarded, would the first utterances have been made? And how do we make completely novel utterances that attempt to express our otherwise abstract thoughts?
> Why, if speech is simply a reaction to statistics we are tracking and behaviours that have been rewarded, would the first utterances have been made?
Why not? Look at it from the bottom up:
Communication is fundamental to life, from intra-cellular to inter-cellular to inter-organism interactions (another fundamental is the ability to keep oneself in a low-entropy state, at the expense of the rest of the world).
Human speech is an evolution of mammal communication. It grew in complexity, from grunts and other basic noises, along with our way of living, up to what we have now.
> And how do we make completely novel utterances that attempt to express our otherwise abstract thoughts?
Speech is a big collage. Anything new is either the result of
* a recombination of the sub-parts of past speech
* the definition of a new word in terms of older words, or sometimes arbitrarily (for proper nouns).
There's a big difference between "grunts and basic noises" and language. Or at least, that's my opinion. In this same line, I don't believe dogs/monkeys/birds/bees have language, despite the ability to communicate.
This view is just too simplistic to hold up when you really look at the intricacies of language and its evolutionary history, which, by the way, I would suggest comes from manual gesture and not grunting.
> There's a big difference between "grunts and basic noises" and language. Or at least, that's my opinion. In this same line, I don't believe dogs/monkeys/birds/bees have language, despite the ability to communicate.
> This view is just too simplistic to hold up when you really look at the intricacies of language and its evolutionary history, which, by the way, I would suggest comes from manual gesture and not grunting.
But you're probably right about gestures.
Wild chimps have a vocabulary of about 66 signs. We can also observe tribes with languages more primitive than ours (no pronouns, for example). But there's a missing link of several million years of evolution between the two.
What are the (known) intricacies of the evolution of our ability to communicate?
There's no definitive proof for the statistical argument, but a growing amount of (neuro)scientific evidence points to it. What's your alternative hypothesis (or hypotheses)?
I think that most people who believe in some form of the motor theory of speech perception will also believe that speech evolved from manual gesture.
Others scoff at the motor theory. In fact, I'd say I'm in the minority by bringing it up with any regularity.
If the question is what is "known" about the evolution of our ability to communicate, I don't have much to point you towards. Most is theory based on modern evidence, somewhat like armchair psychology. Other people point to our ability to integrate non-verbal gestures into our comprehension, activation of our motor cortex prior to semantic/phonetic network activation when disambiguating difficult speech sounds, our ability to synthesize visual/auditory sources of information when the visual information relates to speech gestures (mouth/tongue movements), etc.
> Why, if speech is simply a reaction to statistics we are tracking and behaviours that have been rewarded, would the first utterances have been made?
That criticism can be lobbed at all abilities that we claim came about due to evolution - which, to be clear, is all of them. The statistical model would be the mechanism, but it wouldn't be the reason why it evolved. That answer is relatively boring, and is the same one as all evolutionary processes: it appeared randomly from mutation, and it provided benefit to those that had it.
Excellent questions, which I hope someone will investigate! But brockf seemed skeptical that it was even possible for there to be an evolutionary process that produced humans with a statistical-learning engine in their brains for language. Which I find curious, since - and this is my point - the same can be said for everything that is a result of evolutionary processes. That is, his complaint has nothing to do with language and statistical processes. The same complaint could be lobbed at eyes.
Just to clarify my position (as it is misunderstood above): I believe it is one of the most important factors in acquiring language. 100%. However, I personally believe that it's a domain-general tool exploited by a domain-specific language module adhering to evolved instincts in language acquisition.
And why can't that domain-general tool be some kind of statistical machine? I ask this because I don't see why what you said is incompatible with it - in fact, I agree with what you said - but I suspect that the mechanism is probably statistical in nature.
There's no doubt that Noam Chomsky founded a paradigm of academic activity. Linguists can generate an unlimited number of papers and monographs by finding problems and proposing intellectually convincing solutions.
From an engineering standpoint, however, Chomsky's view of grammar has been remarkably barren when it comes to machine processing of natural language. It's made a major contribution to artificial languages but despite a lot of effort it hasn't added much performance to what can be done with statistical methods.
I'd agree that a hidden Markov model that does POS tagging with high accuracy doesn't provide an intellectually satisfying model for "how language works", but you don't need to have a model for "how language works" in order to use it.
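For readers unfamiliar with HMM taggers: the decoding step is the Viterbi algorithm, which picks the most likely tag sequence given transition and emission probabilities. A minimal sketch with an invented tagset and invented probability tables (not any real tagger's numbers):

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most likely tag sequence for `words` under a toy HMM."""
    # best[i][t] = (probability of best path ending in tag t, previous tag)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for w in words[1:]:
        prev = best[-1]
        best.append({
            t: max((prev[s][0] * trans_p[s][t] * emit_p[t].get(w, 0.0), s)
                   for s in tags)
            for t in tags
        })
    # follow back-pointers from the best final tag
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for row in reversed(best[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]

# Invented tagset and probability tables, purely for illustration
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.15, "VERB": 0.05}
trans_p = {
    "DET":  {"DET": 0.01, "NOUN": 0.90, "VERB": 0.09},
    "NOUN": {"DET": 0.10, "NOUN": 0.20, "VERB": 0.70},
    "VERB": {"DET": 0.50, "NOUN": 0.40, "VERB": 0.10},
}
emit_p = {
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "homework": 0.3, "cat": 0.3},
    "VERB": {"ate": 0.5, "barks": 0.5},
}

tagged = viterbi("the dog ate the homework".split(), tags,
                 start_p, trans_p, emit_p)
```

Real taggers estimate these tables from tagged corpora and compute in log space to avoid underflow, but the structure is the same.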
I feel there is excessive emphasis on "what Chomsky said" and "what Chomsky did". Norvig chooses to point out that the principles and parameters framework is, let's say, imperfect, but, well, Chomsky would agree. Moreover, if you zoom out and stop obsessing about quotes from "Syntactic Structures", you will realize that a lot of the work that's being done in theoretical linguistics is not quite as barren. Yes, statistical methods for (say) anaphora resolution can be extremely efficient, but basically very few people had thought about anaphoric relations in any systematic way before generative linguistics came around.
Moreover, rule-based NLP approaches also have their place, and they are often the direct result of theoretical advances. A case in point is the modelling of morphophonology (which is necessary for spell checking, dictionaries and text generation for morphologically complex languages): many successful approaches are those based on finite-state machines, which could not have happened without Johnson and later Koskenniemi using them to formalize the rule-based approach pioneered by Halle and (yes) Chomsky (well, not quite, but this is still the point of reference for rule-based phonology).
(I am a theoretical phonologist, but my colleagues who do actual NLP work of this type tell me that statistical methods aren't that great for the sort of work they do.)
There is no guarantee that the Kolmogorov complexity of any interesting system will fit inside the rational parts of our heads. There pretty much is a guarantee that we will not be able to fully understand our own brains using our brains; the part of our brain that can understand things is just dwarfed by the size of the rest of it. (We really do quite a lot with not very many free neurons.) Even if there is a generative theory that can explain human speech in fewer bits than a direct lookup table, there's no guarantee we can find it, and the null hypothesis must be that we won't, because there is no such theory.
We should look for it, but we should not expect to find it.
> I'd agree that a hidden Markov model that does POS tagging with high accuracy doesn't provide an intellectually satisfying model for "how language works", but you don't need to have a model for "how language works" in order to use it.
I'd agree that an ad hoc equation that fits observed core sample data doesn't provide an intellectually satisfying model for "how sedimentation works", but you don't need to have a model for "how sedimentation works" in order to use it.
That's actually from something I worked on a long time ago. People actually use ad hoc models of sedimentation and the formation of sedimentary rock for practical purposes. I also think most people suspect that we'd learn something valuable by figuring out the underlying reason why the data fits the particular description.
I think Norvig acknowledges the point you are making here, namely that the statistical approach does not explain the cognitive systems behind language. However (if I understand correctly), he implies that those systems might be too complex to be adequately explained, let alone emulated, and that we can achieve more by observing them as black boxes and analyzing their outputs, i.e. language as it is used.
"if a computer could - like humans - learn to produce infinite, novel, contextual, and meaningful grammatical utterances"
To perfectly achieve this goal, you might have to simulate 4 billion years of evolution under the same conditions as it happened on Earth, and a few thousand years of cultural evolution as it led to our languages and our cultural context. Language is incredibly complex and changing, many of its details might be incidental, i.e. results of random events, so it seems unreasonable to pretend that we can deduce it all from some elegant first principles. At least that is my reading of Norvig's argument.
> I think Norvig acknowledges the point you are making here, namely that the statistical approach does not explain the cognitive systems behind language.
If that is the case, then the argument that Norvig is making is irrelevant to the argument Chomsky is making. Chomsky simply makes the point that statistical accounts lack explanatory adequacy. As someone who has worked closely with many of his students and who has received extensive training on his scientific program, I can say with confidence that Chomsky would have no objection whatsoever about the usefulness of statistical approaches to linguistic engineering problems. The results speak for themselves. He would go on to say, however, that how well a statistical approach solves a linguistic engineering problem is irrelevant to the question of how humans do what they do.
The answer to the question may well be statistically grounded. That is a valid hypothesis and a logical possibility which should be taken seriously. However, it is incumbent on the proponents of such an answer to provide evidence that it is what humans are doing. Here are some examples of the kinds of evidence necessary:
* evidence that humans are capable of performing the kinds of computations that the statistical approach requires;
* evidence that the statistical approach works with the relatively limited amount of data that a human receives;
* evidence that the statistical approach fails in ways that humans fail.
How well a statistical approach succeeds at an engineering task is not an item on this list, simply, again, because engineering tasks are irrelevant to what humans actually do.
Let me specifically say that statistical approaches are not, from the start, ruled out as potential candidates for the algorithms underlying human language. It's just that a case has to be made for them using the right kind of evidence.
Finally, I'll reiterate what others have pointed out: from a scientific perspective, that something is hard to explain doesn't mean that we shouldn't try. And, those that have given up (as you suggest Norvig has) shouldn't fault those who haven't for calling them out on it.
In situations like this, I tend to speak in theoretical absolutes. A computer that "could - like humans - learn to produce infinite, novel, contextual, and meaningful grammatical utterances" isn't even on the timeline right now, but it's the theoretical goal in showing that we understand language acquisition (ontogenetic development), evolution (phylogenetic development), and production.
Just because that goal seems unattainable doesn't, to me, mean that we need to aim any lower. Now, this is premised on my belief that mimicking phenomena with statistical learning is not as intellectually satisfying as understanding the underlying cognitive systems, but that's not believed by everyone.
I agree that learning to produce and interpret varied utterances is a worthy goal, but the fact is that (far off as it still is today) lowly statistical methods have gotten us closer to this goal than the other, Chomskyan approach. It could be a situation where aiming lower lets you shoot higher.
This is a fundamental misunderstanding of what modern generative linguistics is all about (to be fair, it is extremely widespread). The aim of this branch of science is expressly not to "learn to produce and interpret varied utterances" (called E-language in the jargon), but to understand the cognitive processes behind the production and interpretation of utterances (called I-language). Now you may agree or disagree with the methods and assumptions used in the pursuit of this goal, but it is patently unfair to accuse the field of failing to do something it never set out to do.
You provide no evidence for the last statement: "that story will take a lot more than statistical learning to write."
The existing evidence overwhelmingly suggests that a computer that can "learn to produce infinite, novel, contextual, and meaningful grammatical utterances" will be based on probabilistic models. In fact it's hard to imagine how it could possibly be otherwise.
The computer is observing noisy sensory input and is trying to make inferences about how to communicate with some future reader. Mathematically, there is only one way to write this problem: probability. It's true that the learned model may have amazing structure to it, but this will almost certainly be learned via probabilistic models rather than being hand-coded by some future Chomsky.
That fact does not imply we will understand language systems or the human mind. The Chomsky route may be better suited for that task.
Where is the existing evidence? And what evidence in modern science can possibly look into the future and make a prediction about something that right now is so far beyond our grasp?
Statistical learning explains a lot. I'm a huge fan of it. Skinner's behaviourism also explained a lot in psychology. But, just as in the case of behaviourism, I fear that statistical learning has hit, or will hit, a wall at which its explanations become futile and overly simplified.
My personal belief is that, at that wall, we'll see that human language instincts and evolved language-specific mechanisms will be what we are looking for.
Consider catching a ball. We know how to design a robot that will catch a ball: it will be the hardware for moving an "arm" and a "hand" for the catching, as well as computer hardware and some software for the logic. The software will solve differential equations in order to predict where the ball will be, and when to move the "arm" and "hand" to the correct spot in order to catch it.
No one, as far as I know, argues that humans actually solve differential equations in their head when they catch a ball. They just... catch it. Perhaps with some failed attempts along the way, but as a part of growing up, we learned basic eye-hand coordination.
The notion that syntax and grammar as we have formalized them exist in our brains is the same as saying that differential equations exist in our brains. I find it much more likely that we innately have rough models for syntax, grammar, mechanical movement, and object trajectories, but that it takes significant trial and error for us to tune those models to the point of competence. I think these models have to be at least partly statistical - otherwise, we wouldn't need to learn anything - and while our formalisms may be nice approximations of what we do in our brains, I see no reason why they have to be exactly it.
By "actually" I meant solving them in the same way that you and I solve them: analytically, using our formalisms. Rather, I'm proposing that our brains use some statistical model that gives results pretty damn close to what the analytical answers would be. And that something similar is true for syntax and grammar.
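That contrast can be made concrete: instead of solving the equations of motion, fit a quadratic to noisy observed positions and extrapolate, which is statistics rather than analysis. A minimal least-squares sketch (trajectory and noise values invented for illustration):

```python
def fit_quadratic(ts, ys):
    """Least-squares fit of y ~ a + b*t + c*t^2 via the normal equations."""
    S = [sum(t**k for t in ts) for k in range(5)]          # sums of powers of t
    T = [sum((t**k) * y for t, y in zip(ts, ys)) for k in range(3)]
    A = [[S[0], S[1], S[2]], [S[1], S[2], S[3]], [S[2], S[3], S[4]]]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(A)
    coeffs = []
    for i in range(3):                   # Cramer's rule on the 3x3 system
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][i] = T[r]
        coeffs.append(det3(Ai) / d)
    return coeffs                        # a, b, c

# "Observed" positions of a thrown ball: y = 20t - 4.9t^2, plus small noise
noise = [0.05, -0.04, 0.02, -0.01, 0.03, -0.05, 0.01, 0.04, -0.02, 0.0]
ts = [0.1 * (i + 1) for i in range(10)]
ys = [20 * t - 4.9 * t * t + e for t, e in zip(ts, noise)]

a, b, c = fit_quadratic(ts, ys)
predicted = a + b * 2.0 + c * 4.0        # extrapolate to t = 2.0
```

The fitted curvature comes out near the true -4.9 even though nothing in the code "knows" the equations of motion; it has merely tuned a rough model against observations.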
I don't think Norvig was arguing Chomsky was completely wrong about what he said, more that statistical models are a hell of a lot more important than Chomsky implies.
Looking at the statistics and evidence is of great importance in trying to form models and answers to the "why" questions. Although mimicking a bee dance may not mean we understand it, it does provide a basis for founding and comparing theories.
When is the pretending so good that it ceases to be pretending? How much Mocking does the Mocking Bird need to partake in before it is the Creating Mocking Bird?
What Norvig didn't seem to get was the difference between understanding and a highly sophisticated pretender. Gut-level vs. self-aware intelligence. Both are valid forms of intelligence, but only one is a valid form of understanding.
I think statistical methods are a form of intelligence that is highly mechanical and could never achieve human-level cognition (i.e., fart jokes). But I could be wrong; I usually am, more than half the time.
the problem is whether or not there is any way to "ground" meaning. for physics, the "unreasonable effectiveness of mathematics" might suggest that there are simple "meanings" that underlie physical "laws".
but there's nothing to say that the same is true for intelligence or language. maybe the brain is nothing more than a particularly flexible "neural net", which statistical methods are modelling quite well. in that case, "intelligence" is not qualitatively different from "a good simulation of intelligence".
the same problem occurs in free will - does it "really" exist? if we're just (mechanical, predictable, although highly complex) machines then it is difficult to imagine how it can. yet the intuition is that there is clearly some meaning to the idea of a "free agent".
these are hard questions. people don't know the answers. instead we look to what daniel dennett calls "intuition pumps" (see his book "elbow room" on the free will problem) - simple parallels that "feel right". from those, we use intuition to argue in one direction or another. but the problem with that approach is that it depends on what you choose as a "hint".
some advance is being made through experiment. imaging of neural activity in the brain, for example, or the recent discovery that people who believe they have free will behave differently to those that don't.
This is not a new debate. Within Linguistics there has been a continuous push against statistical NLP models. Read the introduction of Manning's book, even he seems to be defensive about NLP.
Chomsky is a colossus; his achievements are well-known. However, at some point in many disciplines it comes to pass that the pioneers who paved the way become, in time, the very impediment to new ideas. His emphasis on semantics has warped the minds of many generations of researchers (and some other ideas, on universal grammar, have too).
I experienced this first-hand: my advisor, Prof. Raskin, a great researcher on semantics, nevertheless thought that statistical approaches were not the way to go. Sadly, in many linguistics departments people are just not equipped with the statistical tools necessary to have a basic understanding of what's being done in the NLP field. So NLP is generally taught under CS, EE, or CompE.
"If Chomsky had focused on the other side, interpretation, as Claude Shannon did, he may have changed his tune. In interpretation (such as speech recognition) the listener receives a noisy, ambiguous signal and needs to decide which of many possible intended messages is most likely. Thus, it is obvious that this is inherently a probabilistic problem, as was recognized early on by all researchers in speech recognition..."
This is the money shot, especially since speakers are aware of the interpretive activity of listeners, and effective speakers play constantly on the ambiguities in their statements - structural (i.e. grammatical) ambiguities as well as semantic ambiguities. Listeners, in turn, are aware of speakers' awareness of this. There is, effectively, an infinity of mutual awarenesses of structural ambiguities in any instance of communication.
I think most technologists and (especially) businesspeople see this intuitively. I think many academics do not. Not sure how to articulate what I mean but I think I am saying something non-trivial about academics and their perspective on language.
> I think I am saying something non-trivial about academics and their perspective on language.
I became convinced that there is a strain of thought, one that is especially pervasive in the academia, which believes that knowledge/meaning is something irreducible and almost mystical. It probably has to do with the fact that people who fetishize knowledge as something incredibly worthwhile for its own sake end up being overrepresented in the academia. Those who are a bit more cynical/nihilistic tend to go into finance or start their own companies.
The old advice "do not make any gods to be alongside me" is still relevant except for the "alongside me" part, which probably only has any meaning if you consider yourself religious. I have a feeling that many academicians, especially the old-school ones, idolize knowledge to the extent of ascribing to it god-like powers even if said knowledge has little relevance for anything practical.
Sorry about the intermittent access. My hosting service provides me with sufficient bandwidth, but only provides a version of Apache that forks a new process for every GET, and thus runs out of processes and denies access to a portion of visitors when I get slashdotted/redditted/hacker-newsified. If anyone can suggest a more reasonable hosting service, let me know.
It's funny. Lately I've been working with NLP systems, and in the last few years there have appeared a few really good part-of-speech taggers that are about 99% accurate. All the ones I know of are based on hidden Markov models, which definitely would disappoint Chomsky.
Part of the trouble w/ Chomsky is that real language doesn't draw a clear line between syntax and semantics. Even though an HMM doesn't correctly model the nested structures that are common in natural language, it makes up for it by encoding semantic information.
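For readers who haven't seen one, a toy HMM tagger might look like the sketch below. The tag set, probabilities, and words are invented for illustration; real taggers estimate these from tagged corpora:

```python
# Toy HMM part-of-speech tagging via the Viterbi algorithm.
# All probabilities here are made up for the example.

states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
emit_p = {
    "NOUN": {"dogs": 0.4, "bark": 0.1, "fish": 0.5},
    "VERB": {"dogs": 0.1, "bark": 0.6, "fish": 0.3},
}

def viterbi(words):
    # best[t][s] = (probability of best path ending in state s, backpointer)
    best = [{s: (start_p[s] * emit_p[s].get(words[0], 1e-6), None)
             for s in states}]
    for w in words[1:]:
        row = {}
        for s in states:
            prob, prev = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(w, 1e-6), p)
                for p in states)
            row[s] = (prob, prev)
        best.append(row)
    # Trace back the most likely tag sequence.
    last = max(states, key=lambda s: best[-1][s][0])
    tags = [last]
    for row in reversed(best[1:]):
        last = row[last][1]
        tags.append(last)
    return list(reversed(tags))

print(viterbi(["dogs", "bark"]))  # → ['NOUN', 'VERB']
```

Note that the model only conditions on the previous tag - exactly why it can't represent arbitrarily nested structure - yet the emission probabilities smuggle in a surprising amount of lexical/semantic information.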
Another trouble is that human beings are innately probabilistic when it comes to language. A sentence written or spoken by humans does not have to be grammatically correct to convey its meaning, and does not always follow the strict rules that Chomsky talks about.
It's not the language that defines how we communicate; it's how we communicate that defines the language.
But I also disagree with Peter when he says the why is not important. It is this why - the understanding of the matter - that separates us from machines like Watson, since our sole purpose in life is not to win at a game, but to play and enjoy the game, and most importantly to "reuse the understanding" gained in some other facet of life, a feat that I believe no machine is capable of.
>It's funny. Lately I've been working with NLP systems and in the last few years there are a few really good parts-of-speech taggers that are about 99% accurate. All the ones I know of are based on hidden markov models, which definitely would disappoint Chomsky.
No, it wouldn't disappoint him at all. In fact, one of his earliest works in linguistics discussed how transition probabilities could be used for chunking and categorization. (See http://www.tilburguniversity.edu/research/institutes-and-res... ) It's not as if Chomsky ever presented part of speech tagging as a poverty of the stimulus argument.
>He doesn't care how the tides work, tell him why they work. Why is the moon at the right distance to provide a gentle tide, and exert a stabilizing effect on earth's axis of rotation, thus protecting life here? Why does gravity work the way it does? Why does anything at all exist rather than not exist? O'Reilly is correct that these questions can only be addressed by mythmaking, religion or philosophy, not by science.
Science doesn't really aim to answer the 'why' questions, but rather the 'how' questions. The scientific method boils down to falsifying hypotheses, and that's a lot easier with 'how does the tide work?' than with 'why does the tide work (the way it does)?'.
Science can't say anything about 'Why does anything at all exist rather than not exist?', because there is no way to test any of the answers. So it's left to mythology, religion or philosophy to answer.
> Why is the moon at the right distance to provide a gentle tide, and exert a stabilizing effect on earth's axis of rotation, thus protecting life here?
A possible answer to this stems from the anthropic principle. We evolved in a place with a moon because the moon helped us evolve; we don't observe a moonless sky because complex life such as us would not have developed without one. A stable rotation and gentle tide are conducive to the evolution of complex organisms; tides were instrumental in getting life out of the seas and onto land.
"Why is the sun the way it is?" can be answered similarly. A smaller star has too small a habitable zone where liquid water can exist. A larger star would have burned out sooner than the 4.5 billion years it took to develop sapient life. A double star has a much smaller set of stable planetary orbits. That the sun is an appropriate star for our life on earth is not divine providence or an enormously unlikely coincidence; it's the result of a universe-wide scenario of statistical multiple endpoints.
This is a good example of why (how?) language is so weird. Maybe I am just satiated, but to an inquisitive mind, "Why is the moon in the sky?" and "How is the moon in the sky?" parse out to be semantically equivalent. Science (astronomy) does try to explain how (why?) we exist and under what circumstances the universe came into existence (if it did).
I interpret this sort of question not as asking for a further step in a causal chain, but rather as demanding a teleological explanation where none is available.
While I disagree with almost everything Chomsky says about everything, and I think it was meant to be somewhat sympathetic, it's really unfair to propose an affinity between Chomsky and O'Reilly in this manner. What the hell.
Equally unfair is Norvig calling Chomsky a mystic for his invocation of Plato. Chomsky is a rationalist, not a mystic.
What science often does is respond to a "why" question by analyzing the phenomenon and presenting its causes in some lower-level terms. But, from a certain viewpoint, that is not a satisfactory answer.
Take physics, for example. It can tell you why some objects behave the way they do by telling you there are certain particles, interacting forces, etc. In this way you can explain, say, the photoelectric effect.
But it isn't really an answer to the "why" question, is it? It just pushes the question one level lower. Why are there such and such particles and forces? Why the constants? The very nature of these answers is descriptive. It is a description of how the world works, not why it works that way.
Maybe asking "why" in this ultimate manner is an ill-posed question - but that's not the deal here. It just doesn't seem that science in its current form, unlike religion or philosophy, could ever even attempt to answer it.
Don't get me wrong, I'm strongly atheistic myself, but there are some inherent limitations of scientific exploration and clarification with respect to the answers it can provide.
> My jaw is on the floor. It drives me nuts when people go from 'We can't explain that yet' to 'The only explanation is God.'
I don't think anyone mentioned God, and, to be fair, the problem of what exactly constitutes reality and what would be the best ways to imitate it is quite complex. We, as a species, have been trying to find the rational answer to this problem for at least 2,500 years (since the pre-Socratics), but as far as I know we haven't come to any definitive answer, we don't even know if there is such an answer.
I think the whole point of that part of the article is that the only answer that could satisfy O'Reilly and his viewers is a religious one. Norvig says Chomsky has a philosophy "(some would say religious belief)", i.e. some unscientific belief that "language should be simple and understandable" - which, in my opinion, is balderdash to call a religious viewpoint.
There are several ways one could model language, from a top down purely statistical approach that Norvig likes, something in the middle which Chomsky proposes, to a bottom up neural model of chemical interactions. There are advantages and disadvantages to each method for many different reasons.
> There are several ways one could model language, from a top down purely statistical approach that Norvig likes, something in the middle which Chomsky proposes, to a bottom up neural model of chemical interactions.
Yeah, I was just trying to take a step back (and maybe I was too OT, I agree), but at some point we should start asking ourselves more fundamental questions. Anyway, this discussion is way over my head. I'm just glad that HN users think there's an answer for everything - it's like Gödel or Kant never wrote anything in their entire lives.
> And why exactly are myths, religions or philosophy better at answering questions than science? What’s the justification for that claim?
Science assumes that there is a "justification" (for language, the universe, you name it). A true philosopher always begins by asking himself whether there is such a thing as "justification". Just think about it: science has helped us put a man on the Moon and create the Tsar Bomba, but it isn't able to answer Epimenides's "All Cretans are liars" paradox, a 2,500-year-old problem.
If you can find a solution it’s no longer a paradox. Paradoxes are defined† as un-answerable. Whoever created the paradox made a mistake and merely created something that looks like a paradox.
“Paradox” is just a word. There is nothing special about something being a paradox (except that paradoxes are cool to think about).
†That’s at least a common definition. That’s where I’m getting such crazy ideas like “paradoxes have no answer.” You are free to define “paradox” some other way. Be assured that it was never my intention to claim that the definition I used is in some sense the true definition. Definitions are all about communication (you need to agree on definitions in order to be able to talk to each other), not tools for finding the truth.
If whoever created the paradox had the goal of creating a paradox (as commonly defined) and ends up with something that has an answer (i.e. with something that is not a paradox as commonly defined) that person has failed to achieve her or his goal of creating a paradox. It is likely that the reason for this failure is a mistake the person made while creating the paradox. Other explanations for such a failure are also possible.
That’s the ultra-verbose version of that sentence. I can crank the verbosity up quite a bit still, but I would rather not.
The point of reference you are asking about is the goal of creating a paradox. That was sort of implied, but you seem to be quite a fan of verbosity. It is, of course, possible that someone – for example – just stumbles upon something that looks like a paradox. The mistake would then be the identification as a paradox.
I’m not really sure what you are trying to tell me with your last point. You started trying to define paradoxes some other way than they are commonly defined. (Quote: “You take as a given that paradoxes are or at least should be ‘un-answerable’.” – thereby implying that, according to your definition of “paradox”, the same can have answers.) I wouldn’t have brought definitions up otherwise.
I agree with you that merely defining doesn’t tell you much (maybe nothing) about the nature of reality, and I said as much. Definitions are for communication, not a tool for finding truth. Those tools are the meat of science, not definitions.
Some questions are metaphysical, not because they are complex, but because they are ill-posed and not subject to falsifiable experimentation or observation.
BTW, a lot of real-world phenomena, like some large events of history or sociology or macroeconomics, fall into the same category of scientific unapproachability, due to practical limitations of our civilization and any plausible future civilization.
The handshake example was illuminating. Three "equivalent" theories:
Theory A: A closed-form formula. A function.
Theory B: "Algorithm". Still a function.
Theory C: Memoized function (constant time!)
According to the article "nobody" likes C, especially the article's Chomsky straw man. If one had a procedure to convert C to A, then this whole issue would become hairsplitting. Such a procedure would aim to convert a memoized function back into a form that uses more symbols from a mathematical language. A good criterion of success would be the description length of the resulting procedure in the preferred language. One reason this could be useful to science is that once you identify a value that is useful in many theories, it becomes part of the language. Making it available to the next problem may speed up the search for a "good" description of the next phenomenon. Identical procedures that appeared in various algorithms might acquire a special name. One such value might be called "pi", another "foldr", and so on.
Of course there may be many good descriptions, just as there are many languages. Also, the example could be extended to statistical modeling situations by adding room for error terms in the suitability criteria.
So if you have a general procedure to convert a table into a definition, you can make money and science at the same time!
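To make the three theories concrete, here is a minimal sketch of what A, B, and C might look like for the handshake count among n people (the function names and table size are mine):

```python
# Three "equivalent" theories of the number of handshakes among n people.

def handshakes_formula(n):
    """Theory A: closed-form formula."""
    return n * (n - 1) // 2

def handshakes_algorithm(n):
    """Theory B: an algorithm - enumerate the pairs explicitly."""
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
    return total

# Theory C: a memoized lookup table, precomputed for small n.
HANDSHAKE_TABLE = {n: handshakes_algorithm(n) for n in range(100)}

def handshakes_table(n):
    """Theory C: constant-time lookup - no insight, just data."""
    return HANDSHAKE_TABLE[n]
```

All three agree on every input in the table's range; the whole debate is about which of them counts as an explanation.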
My conclusion is that 100% of these articles are more about "accurately modeling the world" than they are about "providing insight," although they all have some theoretical insight component as well.
Before you can figure out why, you have to make sure you can accurately characterize the what. So there's a lot of science that is focused on coming up with a descriptive tool like an adhoc curve, before the underlying principles are discovered.
I think Chomsky is afraid that statistical models will cause people to stop looking for the underlying principles.
This essay made me think: Lojban (http://www.lojban.org/tiki/la+lojban.+mo), among constructed languages, is the categorial language par excellence. Every word has a well-defined range of meaning; the grammar can be parsed by the same kinds of parsers used for programming languages; potential sources of ambiguity, like plural references, associativity of modifiers, and negation, have been rigorously (or tediously, depending on how you roll) nailed down.
Can there be such a thing as a conlang that demonstrates the ideal statistical grammar and semantics? (“All the words in this list are 60% likely to be used as nouns and 40% likely to be used as verbs....” But in the absence of a pre-existing linguistic community, how could you get students of the language to use them in the right proportions?)
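The target proportions could at least be stated as a sampling rule. A minimal sketch (the words and numbers are invented for illustration):

```python
# Hypothetical sketch of the "statistical conlang" idea: each word in the
# lexicon is used as a noun 60% of the time and a verb 40% of the time.
import random

random.seed(0)  # deterministic for the example

LEXICON = ["blarg", "mifu", "zont"]

def sample_usage(word):
    # A speaker "rolls" the part of speech per the target proportions.
    return (word, "noun" if random.random() < 0.6 else "verb")

usages = [sample_usage(random.choice(LEXICON)) for _ in range(1000)]
noun_share = sum(1 for _, pos in usages if pos == "noun") / len(usages)
print(round(noun_share, 2))  # close to 0.6 over many utterances
```

Of course, this only shows that a machine can hit the proportions; the open question in the comment - how a human speech community would converge on them without a pre-existing corpus - remains.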
Are Norvig's comments on the "I before E except after C" rule really valid? Why would one use a corpus for analysis of the rule, and not a dictionary? It appears to me that "CIE" (P(CIE) = 0.0014) is more common than "CEI" (P(CEI) = 0.0005) because the words that contain the "exception" "CIE" are used more frequently in the corpus than the words that follow the rule with "CEI". Once you know the limited number of exceptions (in the dictionary sense), the rule appears to preserve its relevance.
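The corpus-vs-dictionary distinction can be made concrete with a toy count. The word list below is illustrative, not Norvig's corpus; a corpus count would additionally weight each word by how often it occurs in running text:

```python
# Count letter trigrams over a (tiny, illustrative) word list, the way a
# dictionary-based analysis of "i before e" would.
from collections import Counter

words = ["science", "ancient", "receive", "ceiling", "species", "deficient"]

counts = Counter()
for word in words:
    for i in range(len(word) - 2):
        counts[word[i:i + 3]] += 1

print(counts["cie"], counts["cei"])  # → 4 2
```

Here "cie" beats "cei" even with each word counted once - but the comment's point stands: corpus frequencies conflate how many words break the rule with how often those words are used.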
Hmm... I never thought of it that way: that sports are a weighted random number generator, but the various weights are unknown, and the commentators are discussing theories as to what the weights are and how they were derived. (Although the cartoon seems to be saying the narratives are just about the numbers generated, which is more cynical and frankly less interesting.)
This whole theory vs observation argument exists at the very pinnacle of human thought, expressed in the Copenhagen interpretation. If you want to contribute to the human understanding of this, you'll have to beat Bohr and the uncertainty principle.
My claim wasn't first, it was top. You up-end the Copenhagen interpretation, show the universe really is deterministic, and every other argument on this subject, in every discipline, collapses. As it is, the arguments are almost certainly failed, but it's not quite a cinch. Because probability admits determinism as a special case. One of the deep points of Norvig's essay.
It is interesting that in a completely different debate, Chomsky takes Norvig's position (he is accused of not looking for a "theory" and "whys", and he replies that it is pragmatic results that matter):
And while it may seem crass and anti-intellectual to consider a financial measure of success
Why are the other metrics Norvig provides, like articles published or prevalence in practical applications, considered more intellectual?
And besides, I don't think "accurately modeling the world" is the end of it. Classical Newtonian mechanics correctly describes 99% of our activities in the real world and was considered the pinnacle of scientific achievement for several centuries. Yet we know today that it's just a limiting case of general relativity and quantum mechanics.