
Do Colorless Ideas Sleep Furiously? (1997) - phab
http://www.mit.edu/people/dpolicar/writing/proseDP/text/colorlessIdeas.html
======
vilhelm_s
In 1985 some students at Stanford organized a competition:

> you were asked to compose not more than 100 words of prose, or 14 lines of
> verse, in which a sentence described as grammatically acceptable but without
> meaning did, in the event, become meaningful. The sentence, devised by Noam
> Chomsky, was: colourless green ideas sleep furiously.

Four winning entries here:
[http://archives.conlang.info/ga/farzhi/shiarweilwoen.html](http://archives.conlang.info/ga/farzhi/shiarweilwoen.html)

------
0xddd
It's funny the author wasted energy composing this after admitting he barely
knows the origin of the sentence. Chomsky invokes it in "Syntactic Structures"
to illustrate that the grammaticality of a given sentence doesn't fully
explain the odds of it appearing in a large corpus. "Furiously sleep ideas
green colorless" is another low-probability sentence, yet a native speaker
couldn't perform these sorts of mental gymnastics to twist some meaning out of
it.

~~~
canjobear
Chomsky was arguing that probability is useless for defining and studying
grammaticality.

I'm not so sure. GPT-2 says

log P("Colorless green thoughts sleep furiously.") = -53.64797019958496

log P("Furiously sleep thoughts green colorless.") = -65.46656107902527

The ungrammatical one is lower probability. But those are famous sentences,
and probably present in the training data, so let's try

log P("Colorless blue ideas hibernate angrily.") = -60.12953460030258

log P("Angrily hibernate ideas blue colorless.") = -70.02637100033462
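For anyone who wants to reproduce the flavor of this without downloading GPT-2, here is a minimal sketch using a toy add-one-smoothed bigram model over a hypothetical three-line corpus (the corpus and scores are illustrative only, not GPT-2's). Both test sentences contain exactly the same words, so a bag-of-words model would score them identically; any order-sensitive model can separate them:

```python
import math
from collections import Counter

# Hypothetical toy corpus, for illustration only.
corpus = [
    "green ideas sleep",
    "colorless green ideas sleep furiously",
    "ideas sleep furiously",
]

uni = Counter()   # unigram counts, including the start symbol
bi = Counter()    # bigram counts
for line in corpus:
    toks = ["<s>"] + line.split()
    uni.update(toks)
    bi.update(zip(toks, toks[1:]))
vocab_size = len({w for line in corpus for w in line.split()} | {"<s>"})

def logprob(sentence):
    """Add-one-smoothed bigram log probability of a sentence."""
    toks = ["<s>"] + sentence.split()
    return sum(
        math.log((bi[(p, w)] + 1) / (uni[p] + vocab_size))
        for p, w in zip(toks, toks[1:])
    )

grammatical = logprob("colorless green ideas sleep furiously")
scrambled = logprob("furiously sleep ideas green colorless")
print(grammatical, scrambled)  # the attested word order scores higher
```

A real replication would swap `logprob` for the sum of token log-probabilities from a pretrained causal language model, but the comparison works the same way.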

~~~
0xddd
I think the more interesting result (and more relevant to Chomsky's point)
would be to work in the other direction. If you instead produce a list of
sentences with similar log probabilities you will see that it contains a mix
of grammatical and ungrammatical utterances. This implies something more is
needed to distinguish them.

~~~
canjobear
> If you instead produce a list of sentences with similar log probabilities
> you will see that it contains a mix of grammatical and ungrammatical
> utterances.

Yes, Chomsky mentions this in a footnote. But as far as I know, it hasn't been
tried with modern language models.

There's been some interesting work that tries to reproduce grammaticality
judgments in terms of language model probability after controlling for length
and lexical content. It turns out it works pretty well. For instance
[https://arxiv.org/pdf/1910.14659.pdf](https://arxiv.org/pdf/1910.14659.pdf)

~~~
0xddd
I wish there were a freely available copy online that I could link to, but the
passage is at the end of chapter 2 of Syntactic Structures. It's not a
footnote, but rather the crux of his argument, I believe:

> "... a structural analysis cannot be understood as a schematic summary
> developed by sharpening the blurred edges in the full statistical picture.
> If we rank the sequences of a given length in order of statistical
> approximation to English, we will find both grammatical and ungrammatical
> sequences scattered throughout the list; there appears to be no particular
> relation between order of approximation and grammaticalness. Despite the
> undeniable interest and importance of semantic and statistical studies of
> language, they appear to have no direct relevance to the problem of
> determining or characterizing the set of grammatical utterances. I think
> that we are forced to conclude that grammar is autonomous and independent of
> meaning, and that probabilistic models give no particular insight into some
> of the basic problems of syntactic structure."

I do think it's an important point for people to recognize. Scientific
theories don't arise on their own out of large-scale statistical analyses.
There is a lot of faith being put in deep learning methods these days, which
are great for prediction but not for inference.

~~~
canjobear
Thanks for pasting the whole thing. It's an interesting argument. The core
empirical claim is

> If we rank the sequences of a given length in order of statistical
> approximation to English, we will find both grammatical and ungrammatical
> sequences scattered throughout the list; there appears to be no particular
> relation between order of approximation and grammaticalness.

It's totally not clear that this would be true with modern language models,
after you control for (1) the length of the sentence and (2) the words in the
sentence (as mentioned in the thing I linked above).
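For concreteness, one normalization used in that literature to control for both factors (as in Lau, Clark & Lappin's acceptability work; the linked paper's exact metric may differ) is SLOR, the syntactic log-odds ratio:

```latex
\mathrm{SLOR}(s) \;=\; \frac{\log P_{\text{model}}(s) \;-\; \sum_{w \in s} \log P_{\text{uni}}(w)}{|s|}
```

Subtracting the unigram term discounts sentences that are improbable merely because they contain rare words, and dividing by the length |s| keeps longer sentences from being penalized for having more tokens.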

~~~
0xddd
I will have to take a look at that paper. I didn't catch your edit before
replying. It would certainly be worthwhile to verify that claim (or not) using
the paper's model if I find some time. In any case, I think the underlying
point is that these language models serve a purpose, but will not uncover an
underlying structure for you or derive something like the phrase structure
grammar proposed in Syntactic Structures. I may be extrapolating a bit based
on other times I've seen Chomsky discuss this, though.

------
friendlybus
The author is working very hard to build an indifference to the self into what
is an emotional and psychological meaning of that phrase. Logic and language
rules only get you so far removed from desires and romantic realities. I get
that these attempts are innocent from an American cultural perspective, but
Paul is not the next Colbert.

I never get tired of the internal contradictions people rely on to dismiss
what is in front of their face. The start of this article declares that the
phrase is meaningless and then ends with a fake revelation that the author has
defined its meaning. The author is so determined that the initial reaction is
meaningless that we need to read three paragraphs about palming off nature to
a language game.

Bring back Steve Jobs.

------
infradig
With the rise of the Green Movement the sentence has actually taken on a
semblance of meaning.

------
tempodox
> Do Colorless Ideas Sleep Furiously?

Racter would definitely say so.

[https://en.wikipedia.org/wiki/Racter](https://en.wikipedia.org/wiki/Racter)

------
phab
I see the title has been modified post hoc to include a year, but it's wrong.
It should actually be 1997...

~~~
dang
Fixed. Thanks!

