
My Conversation with “Eugene Goostman” - aheilbut
http://www.scottaaronson.com/blog/?p=1858
======
fchollet
There is nothing about Eugene that is useful to the advancement of actual AI,
or to our understanding of intelligence. Advanced chatbots are a parlor trick,
in the same way that pulling a rabbit out of a hat does not help us develop
actual teleportation systems, however clever and convincing the trick might
seem to credulous people.

The difference between a trick and a technology, of course, is usefulness. Can
your software do something useful, or is it solely aimed at deceiving people
into believing it does?

It was interesting to see how many journalists simply channeled the
ridiculous claims of the original press release, without an ounce of critical
thinking -- choosing instead to pontificate on the inexorable progress of AI.
Yet another Dorian Nakamoto moment for the mainstream media.

~~~
ecopoesis
If the machine can "deceive" you into thinking it's doing work, then it's
doing work. It doesn't matter if there's a man who speaks Chinese in the room,
or if there's a bunch of rulebooks about Chinese, they're functionally
equivalent.

[http://en.wikipedia.org/wiki/Chinese_room](http://en.wikipedia.org/wiki/Chinese_room)

~~~
fchollet
The difference between an employee doing work and one deceiving you into
thinking they're doing work is simple: one produces value, the other does not.
Here, no useful information can be extracted from this Eugene chatbot. It
cannot perform any useful task a human could perform or answer any useful
question a human could answer. Which is what I was talking about when I
mentioned usefulness.

As a side note, I recently made a question answering engine which, although
not very advanced and certainly not attempting to pass for a human, can
provide you with useful information when asked general knowledge questions:

[http://www.sphere-engineering.com/blog/quickanswers-io-semantic-question-answering-engine.html](http://www.sphere-engineering.com/blog/quickanswers-io-semantic-question-answering-engine.html)

The Chinese room experiment is not a relevant reference here, because this
Eugene chatbot is in no way "functionally equivalent" to a human being. It is
not even attempting to implement intelligence.

~~~
Too
Define "useful"

I bet a black-hat SEO optimizer would find tons of auto-generated content very
useful. That kind of content is designed precisely to deceive both users and
search bots into believing it is of value.

~~~
cardine
Eugene would not do a good job of auto-generating content that is useful for
SEO, and there are existing tools that already do that very effectively.

------
davidgerard
The entire Turing test claim is Kevin Warwick's media hype. He issues a press
release to this effect approximately annually. Basically, the way to tell if a
claim related to artificial intelligence is rubbish is to search for the name
"Kevin Warwick". This is what I wrote for his 2008 effort:
[http://newstechnica.com/2008/10/13/turing-test-won-with-artificial-stupidity/](http://newstechnica.com/2008/10/13/turing-test-won-with-artificial-stupidity/)

~~~
TazeTSchnitzel
You know, writing a chatbot that could pass the Turing test might actually be
very easy if you make it act like a deranged Internet troll.

Making a chatbot that imitates someone specific might be easier still; I wish
no ill will to the author of TempleOS (formerly LoseThos), but I think I could
quite easily write a bot that was indistinguishable from him.

~~~
StefanKarpinski
Writing an RMS chatbot would be exceedingly easy. Now that I think about it,
it's entirely possible that someone at MIT already did that and I fell for it
by arguing with it on the csail-related list last month.

------
boyter
For fun I ran Eugene against Eliza to see what sort of nonsense it would come
back with. Below is a quick extract of the "conversation"

    
    
      Eliza: Is it important to you that your attitude to I ?
      Eugene: It is highly possible. It is highly possible.
      Eliza: Lets discuss further why your feelings right now.
      Eugene: Just because 2 plus 2 is 5! Wonna ask me something more?
      Eliza: Don't any other reasons come to mind ?
      Eugene: Is it a kind of manace? Maybe, let's talk about something else? What would you like to discuss?
      Eliza: Why the uncertain tone ?
      Eugene: Why? Just because! :) Why? Just because! :)
      Eliza: Why do you ask ?
      Eugene: Oh, but why not? Well, let's go on though.
      Eliza: Are such questions much on your mind ?
      Eugene: Who knows... Well, let's go on though.
      Eliza: Have you asked such questions before ?
      Eugene: I would rather not talk about it if you don't mind. Oh, what a fruitful conversation;-)
      Eliza: What would you do if I don't mind ?
    

I think it would be more interesting to wash each reply to Eugene through a
translator from English to Japanese and back. Given a more powerful bot than
Eliza, it could produce some interesting results.
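The pairing above is easy to reproduce with a couple of toy rule-based bots. Below is a minimal, hypothetical sketch of an Eliza-style responder (the regex rules and canned lines are invented for illustration, not the original ELIZA script), plus a loop that feeds each bot's output back in as the other's input:

```python
import random
import re

# Illustrative Eliza-style rules: a regex paired with canned responses.
RULES = [
    (re.compile(r"\bwhy\b", re.I), ["Why do you ask?", "Just because! :)"]),
    (re.compile(r"\byou\b", re.I), ["Let's talk about you, not me."]),
    (re.compile(r"\?$"), ["Who knows... Well, let's go on though."]),
]
FALLBACKS = ["Please continue.", "Maybe, let's talk about something else?"]

def reply(message: str) -> str:
    """Return the first matching canned response, else a fallback."""
    for pattern, responses in RULES:
        if pattern.search(message):
            return random.choice(responses)
    return random.choice(FALLBACKS)

def converse(bot_a, bot_b, opener: str, turns: int = 6):
    """Bridge two bots: alternate turns, each replying to the last message."""
    transcript, message = [], opener
    for i in range(turns):
        bot = bot_a if i % 2 == 0 else bot_b
        message = bot(message)
        transcript.append(("A" if i % 2 == 0 else "B", message))
    return transcript

for speaker, line in converse(reply, reply, "Why the uncertain tone?"):
    print(f"{speaker}: {line}")
```

Swapping HTTP calls to two real chatbot endpoints in place of `reply` (and an extra translation round-trip in the bridge loop) would reproduce the experiment described above.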

~~~
plaguuuuuu
It would be interesting to run the AI against itself and have each version try
to determine if the other is an AI.

~~~
StefanKarpinski
That would actually be far more interesting than any of this nonsense. To try
to determine whether the thing you're talking to is a bot or not, you have to
construct some kind of model of it – try to explain its behavior. The ability
to explain things is a genuine hallmark of intelligence, so building things
that can try to model and explain the world around them would be real AI
research.

~~~
murbard2
That's what blogspam detection is all about.

------
jqm
Based on this article, if 33% of the judges were fooled, maybe some better
judges are in order.

It seems there are two ways to beat the Turing test: come up with very clever
algorithms, or (probably easier) get some less discerning judges.

The whole thing reminds me of this:
[http://www.cleverbot.com/](http://www.cleverbot.com/)

Not Turing convincing, but I found it fun.

~~~
Timmmmbob
Yeah the Onion article writes itself:

"Turing test proves that 30% of judges are actually computers."

A recent Turing test at Reading University backfired after nearly a third of
the apparently human judges were discovered to be less intelligent than a ZX
Spectrum. Suspicions were raised when they were unable to carry on
sophisticated conversations with the programs being tested and instead
nattered on about banal topics such as their job, popular music, and another
thing.

Ok I'm done. I don't write for free!

Anyway, Onion writers, make it happen.

~~~
eridius
Relevant XKCD: [http://xkcd.com/329/](http://xkcd.com/329/)

------
codeulike
Here's an article by Robert Llewellyn (Kryten from Red Dwarf) about his
conversations with Eugene Goostman at the Uni of Reading event

[http://www.theguardian.com/science/2014/jun/09/turing-test-eugene-goostman](http://www.theguardian.com/science/2014/jun/09/turing-test-eugene-goostman)

------
matnewton85
I had an actual conversation with Eugene Goostman, as if I were meeting a
13-year-old from Ukraine.

And Eugene nailed it. He introduced himself, asked me polite questions, I
asked him polite questions, and we developed a slightly broken conversation --
but yes, a conversation.

The idea of this is not "try to break the robot", the idea is "if you ran into
this robot in real life and it was masquerading as a human, would you be
tricked?"

That's the REAL question.

~~~
Morendil
> The idea of this is not "try to break the robot"

Think of it this way. If you want to learn what constitutes strong chess play,
will you learn best from playing a) yourself or b) a much stronger player?

Having a "collaborative" exchange with a chatbot is of the same strength as
playing chess with yourself, for the purposes of investigating what "thinking"
consists of.

The Turing Test is useful precisely when we _are_ trying to "break the bot" as
you put it; in fact, when the bot is pitted against a real human, who in that
contest plays the role of the chess master.

Saying that Eugene Goostman "passed the Turing Test" is like crowning me World
Chess Champion, based on the amazing record of beating 70% of a random sample
of six year olds.

> "if you ran into this robot in real life

You wouldn't ever "run into" Eugene Goostman in real life, because it lacks
the kind of generalist problem solving ability that would allow it to insert
itself into any "real life" situation - an ability that even six year olds
possess. It literally couldn't even get out the gate.

~~~
matnewton85
Ok, I think this is just a perspective thing.

You're (most likely) coming from a CS background, I come from a user testing
background.

If someone sits me down and says "use my site, have a conversation with a 13
year old Ukrainian".. I start having a conversation with a 13 year old
Ukrainian.

Someone sits a CS major down with Eugene, the 13-year-old Ukrainian, and he'll
drop references to AI research from the '60s -- something that probably only
one or two Ukrainian kids could ever answer.

~~~
marcus_holmes
But these bots can't cope with not being able to answer, because the human
response -- to learn -- isn't open to them. If you sit down with a human and
start talking about something they don't know, then usually (in a situation
free from conflict or other emotional prompts) their curiosity prompts them to
start asking questions, building a mental model of the subject that prompts
further questions, and so they learn about the subject. The conversation
builds on the learning and a real progression of thought ensues.

The chatbots can't do this, except in the most limited of ways ($job="CS
scientist"), so they don't appear human.

~~~
matnewton85
Great point.

------
JackC
_I had the “conversation” below with the version of Eugene Goostman available
at [http://default-environment-sdqm3mrmp4.elasticbeanstalk.com/](http://default-environment-sdqm3mrmp4.elasticbeanstalk.com/).
It’s possible that there’s a more recent version somewhere else, but this is
the only version I was able to access. Even then, the site was constantly down
..._

Setting aside whether this is the same version of the software, is Eugene CPU-
constrained? The press release described it as running on a "supercomputer." A
conversation with an overloaded Amazon instance might be totally different.

~~~
cmiles74
Based on the article, I'd be very much surprised if a "supercomputer" was
necessary. While the chatbot may have improved a bit since the referenced
version, I'd be surprised if it was by much.

------
Nihilartikel
I'd really like to see the passing transcripts from this test, but, I'm not
expecting to be blown away. I was introduced to Kevin Warwick's writing in a
course on technology in society about 10 years ago. He was making a big fuss
about having made himself a cyborg by implanting an RFID tag in his arm. My
ex-trucker next door neighbor, who had a cochlear implant, was a much more
impressive example of cybernetics by my estimation, and far less insufferably
attention seeking.

------
ThomPete
Many commenters seem to think that the Turing test is a claim about computers
achieving self-awareness or consciousness. It's not. It is a test of whether
humans can distinguish between the "AI" and a human. There is no correct
definition of AI here.

This is no different than when a chess computer beats a human being. It
doesn't matter that it does so by brute force. What matters is that it beats
the human. That's the endgame. There are no points for style in reality.

Humans fool each other all the time by being disingenuous; that doesn't mean
we aren't human when we do it.

Let's get some perspective here.

~~~
rntz
In Turing's original paper, the Test is intended to be a "new form" of the
question "can machines think?". It is explicitly a test of the machine's
intelligence, not of humans'.

~~~
ThomPete
It tests the machine's intelligence under the same naive assumptions once held
about beating humans at chess. It was a different time; they thought about it
differently.

Turing first framed the test as asking whether computers can think, but
changed it to a much more concise and answerable question:

"Are there imaginable digital computers which would do well in the imitation
game?"

[http://en.wikipedia.org/wiki/Turing_test](http://en.wikipedia.org/wiki/Turing_test)

------
tokenadult
Scott Aaronson here makes plain by example that human beings still have a big
lead over chatbots in combining thoughtful examination of deep issues with
humorous tone and sparkling language. I'll recommend to all of my friends that
they read this submission.

The comments below the blog post also demonstrate much better conversation
than the chatbot, and answer some questions that I had as a reader when I had
only read the blog post itself. Friendly groups of human beings (as here on
Hacker News) still provide much better conversation than chatbots. Accept no
substitutes.

------
kenjackson
A twist on the Turing Test that would be fun.

Drop 19 people and 1 AI into a chatroom, and then try to determine which is
the AI. A lot of these adversarial questions might make the asker look more
like an aggressive AI than a human.

~~~
gamegoblin
I do like the idea, but all it would require is at least 2 people putting
forward group strategies.

For example, one person says "everyone repeat your username", so everyone
does. Of course, the one who said this could be the bot, so another person
would then have to put forth a different challenge (something similar --
repeat your username backwards, or something).

~~~
kenjackson
What if I told the humans there was 1 AI, but there were really 15 AIs and
just a few humans? How many humans would volunteer, "There may be only 1 AI,
but many seemed just as inhuman"?

------
BigTuna
"Maybe, let’s talk about something else? What would you like to discuss?"

And that's where it failed my amateur, "BS" test...

------
teahat
I think it's important to acknowledge the distinction between chatting to a
bot and not being convinced that it's a human, and having two conversations
simultaneously, one of which is with a human, one with a computer, and
figuring out which is which. Humans too can be surprisingly obtuse, go off at
tangents, ignore your points, be boring and repeat themselves.

As for the questions around the utility - automating support functions for
various non-critical services seems like an obvious potential application
(think of all those "click here to chat to a live representative" dialogs on
various websites).

~~~
DanBC
But automated support with a bot that is obtuse, goes off on tangents, ignores
my points and is boring sounds like a pretty poor support experience.

------
rdvrk
If I were to chat with a random person online, and their conversation starter
was "Which is bigger, a shoebox or Mount Everest?", I'd probably ignore that -
and disconnect after a couple more "witty" questions. Eugene is just trying to
be nice.

As someone said, it gets better if you try having a normal conversation.

I'd like to see what would happen if the people who created this let a real 13
year old boy chat with people, but announcing it as the supercomputer version.

------
listic
Why isn't anyone trying hard to make a better chatbot?

(from the comments) JE Says: "Nobody that I know in the NLP community works on
chatbots or the Turing test"

------
trhway
After reading the chat, I think Scott has passed the Turing test.

~~~
acqq
Or maybe he wouldn't: remember that the scenario is that you communicate with
the "other end" not knowing if it's a person or a bot, and Scott managed to
never answer the questions he was asked, constantly asking questions of his
own. Try to imagine you really don't know whether he's human; someone who only
asks questions can be very easily scripted.

~~~
hnal943
He does reply later on to remark on the sensible responses, as well as call
back to previous parts of their conversation (making a point about how any
person would know that a camel has four legs). Not to mention that his line of
questioning is coherent.

~~~
acqq
It's easy for a simple bot to refer to its own previous questions, and not
much harder to refer to something the other side said. So the only thing that
remains seems to be the claim that the reply of the other side makes some
sense.

~~~
hnal943
His commentary about the previous question included analysis of the other
side's answer (that it was not only wrong, but evasive and obviously computer
generated), so I think that remains a valid point. I don't see Goostman
providing similar analysis.

~~~
acqq
It has already been shown that attacking the other side (even claiming that
the other is the computer) is good bot practice. So even there Scott can't
obviously score. Analysis it ain't; it's just "that's the first sensible thing
you've said" and "that's the second sensible thing you've said." You, as the
human chatting with bots and humans in parallel, would have to evaluate the
probability that the particular statement really is the only sensible thing
said, etc. Without hindsight, and without knowing that offensive behavior is
very bot-like, it's harder than you imagine.

A lot of human behaviors online can in fact be replaced by scripts. There's
even a T-shirt:

[http://www.kleargear.com/1474.html](http://www.kleargear.com/1474.html)

"Go Away Or I Will Replace You With A Very Small Shell Script"

------
josephlord
I think there is something about Kevin Warwick's voice being so dull sounding
that makes journalists take his ridiculous stunts seriously.

------
NAFV_P
I would like to see the Turing Test turned on its head ... the human has to
convince the computer it is talking to another computer.

~~~
ipsin
I would get tired of typing "Invalid syntax."

~~~
NAFV_P
> _I would get tired of typing "Invalid syntax."_

Good point, but put yourself in the computer's shoes. If I were an artificial
computer, talking to a human would drive me insane.

~~~
anonymfus
> _If I were an artificial computer_

Opposed to natural computers like humans? That is very interesting
terminology.

~~~
DanBC
That's terminology straight from Turing's paper introducing the game.

Computers were people back then, as well as digital computers. And he talks
about cloning a human not counting as a win for an intelligent machine.

~~~
NAFV_P
> _That 's terminology straight from Turing's paper introducing the game._

Hi DanBC.

My choice of words was coincidental. I was not aware that Turing used the same
terminology, so I am pleasantly surprised. Recently, I have been experiencing
a lot of coincidences on HN, bizarre.

------
mcv
Yeah, Eliza was more convincing.

~~~
Houshalter
Ahh. Please continue...

------
V-2
Clever as this is, how could it really fool anyone (aware of the existence of
bots)?

~~~
Houshalter
Well, this is a much older version of the bot, I believe. Additionally, most
people do not ask adversarial questions, but rather try to have a normal
conversation, which bots are quite good at. You also have to have prior
knowledge of how chatbots work in order to know what kinds of things they are
bad at (e.g. common-sense knowledge about the world, which most people
wouldn't think to ask about and which wouldn't come up in conversation
normally).

A Watson-like QA system could potentially fix that weakness and answer the
questions he asked. The press release described Eugene as running on a
supercomputer, so it's possible they were doing something like that. But then
someone would find another weakness, and so on.

~~~
jpindar
The normal conversations I have with people do not consist primarily of them
trying to change the subject.

~~~
Houshalter
Neither does a "normal" conversation with the bot. It only tries to change it
when you start probing for weaknesses and going off script.

------
jonsterling
Lmao that people are still taking AI seriously.

~~~
baddox
What exactly are you suggesting with the implication that people shouldn't
take AI seriously? That AI research is a waste of time and money? That "strong
AI," however it's defined, will never happen?

~~~
Houshalter
Some people have the strange, sometimes religiously motivated, belief that
human intelligence has some vague magical property which by definition can't
be reproduced in mere physical machines.

Otherwise it's just arguing about how difficult AI will be, which is very hard
to estimate. Some people look at previous failures and the lack of progress in
AI and extrapolate from that. But progress is rarely linear, and computers are
only now getting fast enough to handle the really cool stuff.

