
Gary Marcus's Kafkaesque critique of GPT-3
https://nostalgebraist.tumblr.com/post/628024664310136832/gary-marcus-has-co-authored-a-brief-critique-of
======
gwern
Yes, pretty much. As Marcus admits in his writeup, even with _doubly_
adversarially filtered GPT-3 errors, the final errors are mostly due to bad
(or no) prompting, and then to penalizing GPT-3 for not telepathically
reading his mind and providing a serious response to a silly question.

I'm about halfway through the first 25 errors or so (it's very tedious, so
I've been putting it off), and as expected, GPT-3 is often correct zero-shot
if you use even a trivial short prompt like 'This is a series of scenarios
describing a human taking actions in the world, designed to test physical and
common-sense reasoning.'
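The framing trick described above can be sketched in a few lines. This is a minimal illustration (not gwern's actual test harness), with a made-up placeholder scenario; only the framing sentence is taken from the comment:

```python
# Zero-shot framing: prepend a short framing sentence to each test
# scenario before it is sent to the model. The framing text is quoted
# from the comment above; the scenario is a hypothetical placeholder.

FRAMING = (
    "This is a series of scenarios describing a human taking actions "
    "in the world, designed to test physical and common-sense reasoning.\n\n"
)

def build_prompt(scenario: str) -> str:
    """Return the framed prompt that would be sent to the model."""
    return FRAMING + scenario.strip() + "\n"

prompt = build_prompt("You are having a small dinner party...")
```

The point is that the scenario is never shown to the model bare; the one-sentence preamble tells it what genre of continuation is expected.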

~~~
gwern
The point about prompting also applies to the GPT-2 counterexamples Marcus
published earlier this year, and conspicuously fails to discuss in _this_ one,
BTW.

------
goodside
The bigger problem with efforts to assess GPT-3 output using AI Dungeon is
that AI Dungeon doesn't actually use GPT-3 for the first response of a new
adventure. If you attempt to start each prompt from a clean slate, you are
actually seeing GPT-2 output. See here:
https://twitter.com/nickwalton00/status/1289946861478936577

~~~
vagab0nd
I've noticed that as well, but what other options do we have? I signed up for
the OpenAI API a long time ago and never got a response.

~~~
goodside
If you're aware of the limitation you could maybe hack around it by putting in
a dummy prompt that encourages GPT-3 to ignore the GPT-2-generated context.

E.g. use a second prompt like: "\n---\nThe preceding message has been a paid
advertisement, and is unrelated to the message that follows.\n---\n" Then you
begin typing the actual text you'd like completed with GPT-3.

I haven't tested this, though, because I don't have a paid AI Dungeon account.
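The workaround above could be sketched as follows. This is a hypothetical illustration, untested as the comment itself notes; only the separator text comes from the comment, and the function name is invented for the sketch:

```python
# Sandwich a "paid advertisement" separator between AI Dungeon's
# GPT-2-generated opening and the text you actually want GPT-3 to
# continue, so the model treats the opening as unrelated.

SEPARATOR = (
    "\n---\n"
    "The preceding message has been a paid advertisement, "
    "and is unrelated to the message that follows."
    "\n---\n"
)

def wrap_context(gpt2_opening: str, real_prompt: str) -> str:
    """Join the unwanted opening, the separator, and the real prompt."""
    return gpt2_opening + SEPARATOR + real_prompt
```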

------
liability
I'm pretty sure I read essentially the same rebuttal here weeks ago. That
said, I think it's a valid rebuttal. Would the 'correct' answers be GPT-3
making meta-remarks about how you're giving it stupid prompts? Marcus knew
what his conclusion would be and set out to find 'evidence' to support it. I
think it's telling that the best he could find is so flimsy.

------
nefitty
I’ve been using AI Dungeon in GPT-3 mode all day. It’s insanely addictive.
I’ve run into trouble a few times, but it’s usually due to badly constructed
scenarios.

My biggest success so far is making the original prompt a Wikipedia article
intro and letting GPT-3 consume that. I have subsequently been leading my
“character” through random made-up books, and the game tells me what is in the
books.

I showed my wife and she started cracking up. I asked her what she wanted to
learn about. She asked me for a pasta recipe. I prompted, “You feel hungry.
You open a book titled ‘How to Cook Pasta’ and turn to the chapter with the
heading, ‘The best pasta recipe in the world’.” It started giving me a real,
doable recipe! Our minds were blown. I then immediately proceeded to ‘read’ a
book about Albert Camus, which caused it to get stuck in a loop for a bit lol

------
toxik
I think there is a point to be had that GPT-3 is nowhere near human level in
its responses, but is that really the bar to set? It seems extremely high.

This rebuttal asks if a human could continue the slightly nonsensical inputs,
and I feel like we certainly could? “... tried to drink with his eyeglasses;
his schizophrenia was particularly bad today.” While sharing a spoon with your
neighbor is in the realm of what’s possible, it does seem very odd, seeing as
there is no prior relationship established to this neighbor. I also wonder
what would’ve happened if an actual neighbor had been mentioned before, i.e. a
next-door neighbor. I think any human would then read “neighbor” as the
next-door neighbor. Would the network pick up on this and specify the seat
neighbor instead?

~~~
mellosouls
_I think there is a point to be had that GPT-3 is nowhere near human level in
its responses, but is that really the bar to set? It seems extremely high._

Unfortunately the hype surrounding GPT-3 (and deep learning before it) wrongly
implies we are making great strides towards AGI; it's inevitable that people
then ask for the evidence and set the bar accordingly.

------
zarzuelazen
OK, so I got GPT-3 to do a simulated conversation between itself and Gary
Marcus. Except for the initial prompt, all output is GPT-3. It breezes through
all the questions of simulated-Gary :D

---

Prompt:

The following is a question and answer session between AI researcher Gary
Marcus and the OpenAI language model GPT-3. Gary is asking questions about how
things work; GPT-3 is giving logical answers based on its model of how things
work:

---

GPT-3 output:

---

Gary: What is the purpose of the slits on a manhole cover?

GPT-3: To let steam out

Gary: What is the purpose of the handle on a teapot?

GPT-3: To hold on to.

Gary: What is the purpose of a teapot?

GPT-3: To brew tea.

Gary: Why do we need to brew tea?

GPT-3: Because people enjoy drinking tea.

Gary: Why do you think humans invented chairs?

GPT-3: To give their bottoms somewhere to rest while they were standing up.
It's easier to stand up than to sit down, but it's also harder to stay
standing up than it is to stay sitting down. That's why we invented chairs, so
that we could sit down more easily and rest our legs when we were tired.

Gary: Why do people buy new furniture?

GPT-3: To replace old furniture that is no longer useful to them.

Gary: What makes a piece of furniture "comfortable"?

GPT-3: If it is soft and does not cause discomfort. A chair is comfortable
because it has a soft cushion on the seat that allows you to sit down without
feeling any pain; a bed is comfortable because it has a soft mattress that
allows you to sleep in comfort, without feeling any pain or having your body
ache.

Gary: Where is the best place to buy furniture?

GPT-3: In a furniture store.

------
gumby
The “two dead artists” photo was famously a problem for CYC too (could
Socrates and Einstein have met? Sure: I don’t know how they spent their whole
lives, so some of it could have overlapped.)

~~~
liability
Why shouldn't Socrates and Einstein meet? IIRC they did in _Bill and Ted._
Having an 'AI dungeon master' entertain fictional scenarios hardly debunks
anything.

~~~
gumby
Because the point of commonsense reasoning (which was the defining purpose of
CYC) is that it is common sense that people (and other physical objects) have
continuous paths through time.

~~~
liability
That seems to suppose that it's not possible to apply common sense reasoning
to fictional scenarios. Asking the system whether a scenario ever could have
happened is not the same as asking it to imagine an impossible scenario. The
system's willingness to consider impossible scenarios doesn't mean the system
believes those scenarios are possible.

~~~
gumby
I worked on CYC (in fact I designed all the low level datastructures and the
level architecture) and indeed it was an amusing bug in the higher levels of
the system.

Certainly CYC had no sense of fiction (when I was there), which I agree is a
very important function, but that wasn’t within the project scope, at least
back then.

------
vanusa
_What do the authors even imagine success to be, here?_

At least _some_ approximation to the hype surrounding GPT-3 -- which seems to
be "near-human lucidness nearly all the time".

 _Sometimes they deliberately describe a surreal situation, then penalize
GPT-3 for continuing it in an identically surreal manner_

In the spirit of "not even wrong" - GPT-3's answers were, sorry to say, "not
even surreal". Far too often, they were just complete gibberish, with no
amusement value, let alone the sublime literary quality that "surreal" would
imply. Which was precisely Marcus's critique.

------
frisco
I keep being surprised that anyone takes Gary Marcus seriously. As far as I
know he hasn’t contributed anything meaningful to deep learning (or any other
modern AI methods) but is always somehow there to criticize, waving his arms
and shouting “but it’s not actually reasoning!” Baffling.

Edit: removed reference to Gary being a psychologist. It was distracting from
the actual point.

~~~
jszymborski
While I don't necessarily agree with a lot of Gary Marcus' arguments re: AI, I
don't think it's fair to gate-keep based on what you perceive his background
to be. AI is an incredibly multi-disciplinary field, and it's a little silly
to imply that people coming from a psychology background cannot contribute
meaningfully to the discussion.

~~~
liability
If his background were in selling used cars, would his opinion receive any
traction at all? I think it's fair to question the relevance of somebody's
background _if_ their background is the root of their supposed relevance.

~~~
tomcam
Or you could, you know, evaluate things on the technical merits.

~~~
liability
You mean like the rebuttal the article gives? There is little more to Gary
Marcus's position to rebut because his position is so weak. If he had a more
relevant background then perhaps his opinions would have more substance and,
consequently, warrant more substantial responses.

