
Response by Ray Kurzweil to chatbot Eugene Goostman “passing the Turing test” - ca98am79
http://www.kurzweilai.net/response-by-ray-kurzweil-to-the-announcement-of-chatbot-eugene-goostman-passing-the-turing-test
======
jgrahamc
This is a nicely written and clear explanation of why the announcement by the
University of Reading is bullshit. In fact, it was more than they deserve.
Way more.

I would suggest shunning them is the right response.

The University of Reading is, of course, the august institution behind:

[http://blog.jgc.org/2010/05/inside-rfid-virus.html](http://blog.jgc.org/2010/05/inside-rfid-virus.html)

[http://blog.jgc.org/2006/12/midas-number-or-why-divide-by-ze...](http://blog.jgc.org/2006/12/midas-number-or-why-divide-by-zero.html)

~~~
robotresearcher
There's not much wrong with the University of Reading.

Kevin Warwick, on the other hand, is a complex character who errs on the side
of hype and unjustified self-publicity much too often. He is not taken very
seriously in academic AI and robotics, except when he causes a collective
face-palm with claims like this.

The press release people at Reading, or pretty much anywhere else, do not have
the skills to assess whether their professor is being daft or not.

~~~
balls187
You might say he is the Ken Rockwell of Computer Science.

~~~
coldtea
Yeah, if only Ken Rockwell were not entirely sane, saying things that make
sense for the market segment he refers to.

If anything, it's the pixel-peepers and the "I need a bigger lens" crowd that
are the non-scientific nuts (essentially gear fashion victims, who would buy a
$10,000 Leica when a $1,000 Nikon can give the same results).

~~~
wlievens
I'm pretty sure that in the hands of a sufficiently skilled photographer, that
Leica does produce a better picture. The quality of the sensor is objectively
better.

~~~
coldtea
> _The quality of the sensor is objectively better._

Actually it's not. For one, Leica doesn't make their own sensors. Second, they
have a very hard time with their in-camera processing software. The first
model they put out a few years ago had awful color rendition and strange
casts, and not that great performance in low light either -- and I mean
compared to the models of that era. At least, IIRC, they smartly removed the
antialias filter, or used a very soft one, which gave them a bit more
sharpness, but that was mostly it. Nothing that makes the camera (M8) worth
being sold for multi-thousand dollars sans lens.

What made Leica legendary (back in the day) was their mechanical construction,
in the era of analogue cameras when this mattered. In the era of digital they
cannot compete with a behemoth like Canon making their own sensors, processors
and firmware to drive them. Or even Nikon.

Of course they still make good lenses. But that's just part of the story --
and they are not that much better than comparably priced high-end models from
other brands.

So, no, a "sufficiently skilled photographer" wouldn't produce a better
picture with the Leica. It's an inferior product in every way except
mechanical construction (and resale value). Plus it's a rangefinder -- the
precise focusing capability of a DSLR is much better.

In fact most accomplished photographers today don't shoot Leicas. They did
back in the day for reportage (like Bresson or Capa). Today they use either
some Canon high-end (D)SLR or some medium format (if they want to signal
"professional").

~~~
wlievens
I know Leica doesn't make their own sensor. The company I work for made it
(for the M). It is a pretty damn good sensor. Don't know enough about
photography to judge the rest :-)

------
drzaiusapelord
The answers Ray got back are borderline embarrassing. They're designed to
deflect the question by ignoring the request and asking another unrelated
question or deflecting by humor. These are common ploys in faking AI chat.
Hell, what 13 yo calls himself a "little boy?" I was in 8th grade at 13 and
looking forward to high school. I can't imagine calling myself a "little boy"
at an age when some of my peers were sexually active.

I think this proves that the Turing Test is more or less crap. Humans, who are
easily fooled/socially engineered, can't just decide that "this is AI," like
it's some kind of American Idol-like contest. There should be some rational
metric at work here: a bunch of different tests, with human judgement as only
one part of the testing suite.

Look at what IBM has been doing with Watson. It may never pass this test, but
it's probably the closest we have to AI (a generalist self-learning system).
Maybe this event will be the excuse we finally need to lay Turing's test to
bed, permanently.

~~~
ccozan
Indeed embarrassing. Actually, I can't believe that they called this chat-bot
in any way intelligent. On top of that, this level of chat has been the same
since the '90s -- how come nothing has changed? And anyway, text chatting is
not, and never will be, a measure of intelligence; the Turing test in this
form should never be considered.

You really need to live 13 years as a boy to be able to answer as one.

~~~
jacquesm
I've seen _better_ in the 90's.

~~~
ccozan
ELIZA is the first that comes to mind, and it did impress me at the time
('94?). Lately I found out it had a psychologist persona, as per the original
design -- hence the answering with a question.

Better? Maybe, maybe not. But after 20 years, to show no real progress at all?
Baffling.

~~~
pessimizer
ELIZA is from _1966_.

------
jere
First thing I've read by Kurzweil that didn't make me rage. I fully expected
him to chalk this up as more evidence that the singularity is well on schedule
or he's going to win Long Bet #1. Glad to hear him being critical:

>I chatted with the chatbot Eugene Goostman, and was not impressed. Eugene
does not keep track of the conversation, repeats himself word for word, and
often responds with typical chatbot non sequiturs.

~~~
resu_nimda
I think your assumptions are somewhat unfair. Despite his bold and often
fanciful predictions, he's a really smart guy with a history of impressive
technological contributions. He's gotten a bad reputation in recent years for
his outspoken certainty in the impending singularity (which, as seen in
_Transcendent Man_ , is somewhat tragically driven by the loss of his father
and the hope of somehow reanimating his father's persona through technology),
but he hasn't "lost it" or anything. I fully expected him to be critical of
this test.

------
peeters
I already created a bot that would pass the Turing Test, according to these
specifications anyway:

<Josiah, an 8 month old from Nashua, has entered the room>

S: Hi Josiah, I'm Steven. What do you like to do?

J: <no response>

S: Josiah, are you there?

J: <no response for 4 minutes>

J: uhqtuhq a

S: Excuse me?

J: <no response>

<Steven has left the room>

~~~
lifeformed
Strong AI confirmed, alert the press.

------
computator
Kurzweil's and Mitch Kapor's rules on how to judge a Turing Test are well
thought out ( * 1), but I find that they're actually biased against the
computer (and therefore unfair to Kurzweil). There's a significant chance that
a computer with a breathtaking performance will lose _by probability alone_.

Look at this rule: _The Computer will be deemed to have passed the “Turing
Test Human Determination Test” if the Computer has fooled two or more of the
three Human Judges into thinking that it is a human._

Suppose the Computer is absolutely _perfect_ in its responses (i.e., it should
pass the Turing Test). The judges know that they're speaking to 3 humans and 1
computer, so if the judges are chatting with 4 equally good subjects, they'll
decide that one of the four is a computer on a whim. There's a chance that
Kurzweil will lose just by arbitrariness.

It's like being asked to sample 4 glasses of wine to pick the worst.
Unbeknownst to you, all 4 glasses have the same wine. Even though they're
equally good, you'll reject one glass by some arbitrary measure. Maybe you
felt an itch on your neck while drinking from the second glass, so that one is
the bad wine.

( * 1) [http://www.kurzweilai.net/a-wager-on-the-turing-test-the-rul...](http://www.kurzweilai.net/a-wager-on-the-turing-test-the-rules)

~~~
computator
Can someone please help with the probability calculation?

3 judges and 4 participants

What's the probability that any 2 or all 3 judges will pick a particular
participant ("the computer") out of 4 at random?

~~~
thesteamboat
Suppose that the participants are all indistinguishable, so each judge picks
one of the 4 at random: a 1/4 chance of correctly identifying the computer and
a 3/4 chance of "being fooled". Let p mark a judge who identifies the computer
and q a judge who doesn't. We expand (1/4 p + 3/4 q)^3 = 1/64 (p^3 + 3p^2(3q)
+ 3p(3q)^2 + (3q)^3) = 1/64 (p^3 + 9p^2q + 27pq^2 + 27q^3). The computer is
caught when at least two judges identify it -- the p^3 and 9p^2q terms -- so
the probability that the computer is picked out by random chance is 10/64, or
about 15.6%.
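For what it's worth, the 10/64 figure can be checked by brute-force
enumeration; a quick sketch in Python, using the same setup as above (3
judges, each picking one of 4 participants uniformly at random):

```python
from fractions import Fraction
from itertools import product

# 3 judges each pick one of 4 participants uniformly at random;
# participant 0 is the computer. The computer is "caught" (fails the
# test) when two or more judges pick it.
outcomes = list(product(range(4), repeat=3))
caught = sum(1 for picks in outcomes if picks.count(0) >= 2)
print(Fraction(caught, len(outcomes)))  # 5/32, i.e. 10/64
```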

~~~
nardi
Which means in only 14 trials, there's a greater than 90% chance of passing
the test. (Again, assuming the contestant is indistinguishable from the human
participants.)

------
higherpurpose
The whole thing sounded iffy to me from the first second I heard about it,
despite the fact that most of the tech media was going ga-ga over it. I mean,
the AI posing as only a teenage Ukrainian boy (meaning the judges should've
_expected_ it to be "less intelligent than a normal native adult")...and only
fooling 1/3 of the judges? That sounds like a whole lot of _cheating_ to
declare that it "passed" the Turing test.

~~~
oxalo
Yeah after reading Ray's conversation with the bot, I was surprised that 30%
of the judges were fooled. I guess maybe they were also 13 years old and had
English as a second language.

~~~
ma2rten
I will explain this mystery to you: they simply did not ask any probing
questions like that. This is also called the ELIZA effect:

[http://en.wikipedia.org/wiki/ELIZA_effect](http://en.wikipedia.org/wiki/ELIZA_effect)

------
Zikes
I've had more believable conversations with CleverBot. It's hard to believe
this hoax has gone as far as it has.

~~~
batmansbelt
Cleverbot is all non-sequiturs. Nothing ever remotely makes sense.

~~~
drdeca
It is possible to get it into a chain of counting, though. E.g. the user says
four, it says five, the user says six, it says seven, and that goes on for a
while.

(Edit: so those aren't completely non-sequitur responses.)

I don't think I've ever seen a chatbot on a web page that can do addition
(unless one counts Wolfram|Alpha: it does respond to greetings, after all).

------
zeidrich
I could design a chat bot that passes itself off as a human using speech to
text.

    
    
      Interrogator: "Hello, how old are you?"
      Bot: "I'm 2 and a half."
      I: What is your name?
      B: Keegan
      I: I live in the Capital of the United States.
      B: Why?
      I: Because there was a job open and I needed one.
      B: Why?
      I: Because I need money in order to live.
      B: Why?
      I: You know, to buy groceries and stuff.
      B: Why? 
      I: What do you mean?
      B: I like butterflies!
      I: Oh, really? 
      B: Yeah, do you know butterflies come from cappilars?
      I: Yeah, I knew that.
      B: Do you like butterflies?
      I: I guess so.
      B: Why?
      I: They look nice I guess.  
      B: Why?
      I: I don't think you're a person, this is shit.
      B: SHIT!
      I: What?
      B: SHIT!
      I: Stop saying that.
      B: SHIT SHIT SHIT hehehe!
      I: Oh God damn it what did I do?
      B: SHIT SHIT GOD DAMN SHIT HEHEHE!
      I: I gotta go.
      B: OK, bye!
      B: SHIT! SHIT! SHIT!
    

I take issue with the example given in the article because nobody's going to
reply with 2 sentences to every simple question. Likewise, it's bad with
memory and reasoning. It should have had some trouble when he said he lived in
the capital of the US and then in the capital of the country that built the
Great Wall, but then again it was obvious that it didn't remember where he
lived at all.

At least in my example a 2 year old doesn't care what you're saying. It's able
to learn better than the other example, but still not a lot is expected.

I bet there's some cognitive age level that we're able to emulate well enough
to pass off as human, at least in terms of verbal communication. I think it
would be useful if we were better able to measure that, and raise that level
slowly. Maybe we actually can impersonate a 2 year old well, then what about a
3 year old? 4 year old? Where do we get hung up?

If we can't get a 2 year old's cognitive processes down without a doubt, we
should build on that first instead of trying to do something more complex
without an understanding of how to make the foundations of that intelligence
work.

------
fidotron
Warwick is just an attention seeking fruitcake, and an utter embarrassment to
the entire British computing ecosystem. This isn't some new revelation either.
Quite why he's tolerated at all is beyond me.

Kurzweil is being too kind.

~~~
robotresearcher
More than tolerated:

[http://www.kevinwarwick.com/achievements.htm](http://www.kevinwarwick.com/achievements.htm)

You can go a long way with the publicity, er, focus.

------
Tloewald
Kurzweil is right to be unimpressed. This chatbot is actually less impressive
than Racter, which dates back to the early '80s (and used similar and cleverer
distraction tactics, such as going off on tangents and basically being
completely nuts).

~~~
psykovsky
I've had better bots nagging me to sign up for cam sites on MSN/ICQ...
Unimpressed is putting it lightly ;)

------
netcan
_' Apparently, we have now entered the era of premature announcements of a
computer having passed Turing’s eponymous test'_

What this Eugene stuff has made me realize is that we need milestones around
the Turing test in the pop-science lexicon, rather than just a pass/fail.
Kurzweil's right: this bot isn't a pass in anything like the spirit of the
test. But getting attention is a good thing. It encourages potential students
and engages the public.

Maybe there could be a few variants of the test based on bot age, bot native
language, judge age, judge proficiency, etc. These could be scored by the
percentage of judges fooled.

That way a new bot could break a previous record on one or more variant of the
test. PR fodder. Legitimate accolades. The headlines could mean something.
Fewer cranky nerds.

Kurzweil could help promote this.

~~~
jacquesm
Getting attention like this is a _bad_ thing. AI has been oversold for decades
and stunts like this make real AI efforts look bad. That's one of the reasons
I presume Kurzweil is very sharp to denounce this, the last thing we need is a
re-run of the AI winter courtesy of some University hot for some attention.

~~~
netcan
Let me try to phrase that differently. I'm not trying to legitimize bad
stunts. I'm suggesting giving them a legitimate way of getting that PR
instead.

Researchers, hobbyists and porn sellers are all working on chatbots. The
definition of success is the Turing test. To give themselves goals, and to
brag about those goals in the media, they enter Turing test contests and try
to get a "passing" score relative to arbitrary definitions of a pass.

Some will inevitably get overenthusiastic about the achievements. It will go
into the press, and that's where the overpraising and wolf-crying come from.

I propose to fix this by giving contestants a legitimate target to aim for.
Something that naturally produces PR fodder in 256 characters: "New Chatbot
achieves 38% on the Eugene-13 Turing Test." That could mean something real and
honest.

Some variants of the test are complete BS. Some are not. They may not be an
indication of consciousness (that's what the one we've got is for) but beating
our previous best does represent a legitimate milestone.

A lot of them could be interesting. Some variants of the test could focus on
tasks/scenarios -- for example, an AI receptionist. Others (maybe Eugene-13)
could be a handicapped version of the real Turing test.

~~~
jacquesm
As long as the butchers get to validate their own meat this won't happen.
Nobody in the running for PR will allow you to rain on their parade with a
'not good enough'. Essentially what happened here is that the grades were
lowered until there was a pass. If you set up shop next door and refuse to
lower the grades until there is a pass, they'll go elsewhere and claim
victory.

------
ogig
I'm surprised by the impact of Eugene. After trying the web interface I felt
like I myself could have written a similar chatter bot, which of course would
have been ignored by everyone, with good reason.

~~~
macspoofing
Yep. I've seen IRC bots in 2004 with similar levels of 'intelligence'.

------
ambler0
I think most people misunderstand the intent of Turing's paper in which he
describes his eponymous test. I agree with Chomsky's reading, which he laid
out in "Language & Thought":

"There is a great deal of often heated debate about these matters in the
literature of the cognitive sciences, artificial intelligence, and philosophy
of mind, but it is hard to see that any serious question has been posed. The
question of whether a computer is playing chess, or doing long division, or
translating Chinese, is like the question of whether robots can murder or
airplanes can fly -- or people; after all, the "flight" of the Olympic long
jump champion is only an order of magnitude short of that of the chicken
champion (so I'm told). These are questions of decision, not fact; decision as
to whether to adopt a certain metaphoric extension of common usage.

There is no answer to the question whether airplanes really fly (though
perhaps not space shuttles). Fooling people into mistaking a submarine for a
whale doesn't show that submarines really swim; nor does it fail to establish
the fact. There is no fact, no meaningful question to be answered, as all
agree, in this case. The same is true of computer programs, as Turing took
pains to make clear in the 1950 paper that is regularly invoked in these
discussions. Here he pointed out that the question whether machines think "may
be too meaningless to deserve discussion," being a question of decision, not
fact, though he speculated that in 50 years, usage may have "altered so much
that one will be able to speak of machines thinking without expecting to be
contradicted" -- as in the case of airplanes flying (in English, at least),
but not submarines swimming. Such alteration of usage amounts to the
replacement of one lexical item by another one with somewhat different
properties. There is no empirical question as to whether this is the right or
wrong decision.

In this regard, there has been serious regression since the first cognitive
revolution, in my opinion. Superficially, reliance on the Turing test is
reminiscent of the Cartesian approach to the existence of other minds. But the
comparison is misleading. The Cartesian experiments were something like a
litmus test for acidity: they sought to determine whether an object has a
certain property, in this case, possession of mind, one aspect of the world.
But that is not true of the artificial intelligence debate.

Another superficial similarity is the interest in simulation of behavior,
again only apparent, I think. As I mentioned earlier, the first cognitive
revolution was stimulated by the achievements of automata, much as today, and
complex devices were constructed to simulate real objects and their
functioning: the digestion of a duck, a flying bird, and so on. But the
purpose was not to determine whether machines can digest or fly. Jacques de
Vaucanson, the great artificer of the period, was concerned to understand the
animate systems he was modeling; he constructed mechanical devices in order to
formulate and validate theories of his animate models, not to satisfy some
performance criterion."

[http://www.chomsky.info/books/prospects01.htm](http://www.chomsky.info/books/prospects01.htm)

~~~
macspoofing
>Fooling people into mistaking a submarine for a whale doesn't show that
submarines really swim

Why not? Presumably part of the act of fooling would involve 'swimming'
submarines.

I think passing a __strong__ Turing test actually does say something about our
brains, cognition, and consciousness. And by a 'strong Turing test', I don't
mean dinky 5 minute tests with a "Ukrainian boy". Imagine you carry on a 20
year relationship with a computer pen-pal, having in-depth discussions about
every-day things from movies, to music, to sports to family and relationships.
Imagine such a computer program fooling every human it interacts with for
decades at a time, I think that would say something about ourselves and I
think it would render the question of consciousness meaningless. If it quacks
like a duck, looks like a duck, walks like a duck, and you can't tell the
difference between it and a duck, it's a duck.

Another problem is that philosophers and even regular people tend to revel in
ambiguity when it comes to certain ideas and concepts, even trying to elevate
them to supernatural levels. Things like free will, love, or consciousness are
apparently outside of the natural world, and not subject to natural laws. I
think that's wrong. I think the answer is much simpler and much more humbling
than we are willing to admit.

~~~
pizza234
There are a few problems with this line of reasoning.

First, that there is an implicit assumption about how cognition and
consciousness are defined, but there is no real definition.

Following the example: it's not defined what a duck is, and "philosophers"
actually don't try to elevate the question -- they try to find answers to it.

While a "strong" test, for example, would look appealing, the problem is that
there is no real model of weakness/strength, and while a human would pass all
the possible models of humanity, a computer would surely pass only the ones
it's programmed for.

Another thing is that in order for a computer to truly mimic a human, that is,
to fool people for 20 years about movies/music/sports/family/relationships,
but also other abstract experiences, the computer would need to experience
them, and especially, elaborate them in a human way.

Which is again, very "open", and it's the core problem.

Even excluding the openness problem, the Turing test in the way it's posed,
looks to me as the photorealism problem. You can achieve it in a static
picture, but once you freely move in a 3D world, you see the flaws that show
that what you're experiencing in reality is a limited set of limited
algorithms, which are used to workaround hard problems (workaround is the
key).

I think exactly the same arguments stand, that is, in order to mimic a human
in such a high level of faithfulness, very hard problems would need to be
solved, not just worked around.

~~~
macspoofing
>First, that there is an implicit assumption about how cognition and
consciousness are defined, but there is no real definition.

And yet volumes are written on the topic, and you have no trouble dismissing
this particular approach as not really answering "consciousness". Well, what
exactly do you mean, then?

>and while a human would pass all the possible models of humanity, a computer
would surely pass only the ones it's programmed for.

I don't agree with either the first or the second assumption a priori. Would a
schizophrenic, autistic or infant brain pass "all the possible models of
humanity"? And second, why would a computer pass only the models it was
programmed for?! We know that isn't true of today's software and today's AI.
IBM's Watson wasn't programmed with every single fact it used to win Jeopardy,
nor is any rudimentary video game AI programmed with a specific set of
behaviors that it executes regardless of player actions. If you want to go
deeper, I can also claim the human brain itself was shaped by natural
selection for a finite and very specific set of tasks, and the neural
machinery is as deterministic as software, since both are subject to the same
fundamental laws. So tell me, why can't software simulate human behaviour
again?

>Another thing is that in order for a computer to truly mimic a human, that
is, to fool people for 20 years about
movies/music/sports/family/relationships, but also other abstract experiences,
the computer would need to experience them

Not necessarily. It can lie. Or it can live them by ingesting huge amounts of
information from digital content, or maybe it was trained in a lab or adopted
by a family and raised like a human. Whatever.

> and especially, elaborate them in a human way.

Define "human way", because that's the entire point. I don't see intrinsically
why such a computer program could not be built. What's so special about the
"human way"?

>You can achieve it in a static picture, but once you freely move in a 3D
world, you see the flaws that show that what you're experiencing in reality is
a limited set of limited algorithms, which are used to workaround hard
problems

And you base this on what? I mean you're just asserting it can't be done, why?
Because human brains are powered by magic!?

>I think exactly the same arguments stand, that is, in order to mimic a human
in such a high level of faithfulness, very hard problems would need to be
solved, not just worked around.

I'm not sure what the difference is between solving problems and merely
working around them, in your context. But yes, we're not there yet, obviously.
I think the Turing test is actually deeper than most people give it credit
for. And you illustrate this perfectly. All your objections are hand-wavy
appeals to some vague notion of the "human way".

~~~
zipfle
In any case, neither of you have fooled me yet.

------
notahacker
> In 2002 I negotiated the rules for a Turing test wager with Mitch Kapor on
> the Long Now website. The question underlying our twenty-thousand-dollar
> bet... was, “Will the Turing test be passed by a machine by 2029?” I said
> yes, and Kapor said no. It took us months of dialogue to arrive at the
> intricate rules to implement our wager.

If this were a sci-fi short, the twist would be "Ray Kurzweil" admitting at
the end of these negotiations that he was a bot that had borrowed the
futurist's email address.

------
x1798DE
The article's title is "Response by Ray Kurzweil to _the announcement of_
chatbot Eugene Goostman passing the Turing test". That's a pretty critical
distinction from reacting to it actually _passing_ a Turing test. If the HN
title is going to be changed for brevity, then it should probably be changed
to something like 'Response by Ray Kurzweil to chatbot Eugene Goostman
"passing the Turing test"', or something else that conveys the actual
sentiment of the post (which is that this chatbot isn't anything special and
isn't close to being able to pass any real Turing test).

------
throwawayaway
When are they going to start considering the other side of the Turing test:
if a robot passes it, a human has failed it?

------
rcucinotta
Kudos to Kurzweil for calling them out. That chatbot doesn't look especially
sophisticated.

~~~
r0muald
> I don't think you don't think that I don't think you don't need to change
> your opinions. Where do you came from, by the way? Could you tell me about
> the place where you live?

Actual quote from EG

------
TeMPOraL
Between this and the Nobel Peace Prize, I notice that, one after another, the
things that used to have some meaning are being bastardized by people looking
for attention. Those people are destroying important cultural symbols for five
minutes of fame.

------
ascotan
As someone who has written chat bots before, I can say this is standard stuff
that all bots do:

1. Regex the input and spit back canned responses.

   Question: I live in the |||<<capital of the United States>>|||. Do you know where that is?
   Eugene: Every nerd knows that the |||<<capital of The United States is Washington>>|||.

2. If you get confused, spit back something from the chat history.

   Question: I live in the capital of the country that put a man on the moon.
   Eugene: Well, if you want it, the |||<<capital of The United States is Washington>>|||.

3. If you haven't been able to find a regex pattern match in a while, try to derail the conversation.

   Question: How old were you when you started to wear glasses?
   Eugene: No I was not! How can you think so?! If I’m not mistaken – |||<<you still didn’t tell me where you live>>|||. OR it’s a secret?:-)

In fact this bot is pretty bad. You can tell from the output that it couldn't
pattern-match virtually any of the inputs: responses 3, 7-10 are attempts to
change the topic of the conversation, responses 4, 5 are pattern-match misses
that regurgitate from the chat history, and only 1, 2, 4 are response matches
from the parsing engine.
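The three tiers above fit in a few dozen lines of Python. A minimal sketch
(the rules, canned lines and derail phrases here are invented for
illustration; this is not Eugene's actual code):

```python
import random
import re

# Tiered chatbot strategy: (1) regex -> canned response, (2) regurgitate
# from chat history on a miss, (3) derail the conversation otherwise.
RULES = [
    (re.compile(r"capital of the united states", re.I),
     "Every nerd knows that the capital of The United States is Washington."),
    (re.compile(r"how old are you", re.I),
     "I'm 13, if you don't mind."),
]

DERAILS = [
    "By the way, you still didn't tell me where you live. Or is it a secret? :-)",
    "Could we talk about something else?",
]

history = []

def reply(user_input):
    history.append(user_input)
    # 1. Pattern-match the input and spit back a canned response.
    for pattern, canned in RULES:
        if pattern.search(user_input):
            return canned
    # 2. Confused? Re-run the rules over earlier chat history.
    for earlier in reversed(history[:-1]):
        for pattern, canned in RULES:
            if pattern.search(earlier):
                return "Well, if you want it: " + canned
    # 3. Nothing matched anywhere: derail the conversation.
    return random.choice(DERAILS)
```

Probing questions like Kurzweil's immediately expose tier 3, since anything
outside the rule set degenerates into topic changes.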

------
noonespecial
Turing test and all its connotative baggage aside, the first chatbot that
will impress me is the one where I can't for the life of me figure out how
it's creating the responses it is.

"Eugene", I can imagine creating when I was 12.

~~~
jamesbrownuhh
Absolutely. Responses like Eugene's seem straight out of the "Eliza" program I
typed into my home computer (out of a magazine) in the mid 1980s. If Eugene is
the new state of the art, as some have claimed, it doesn't seem that there has
been any progress at all in the last 30 years.

Perhaps next year the test will be won by Ham Hamuelson, an AI persona who
managed to convince the judges that he was a talking ham sandwich. I guess we
can't completely rule that out.

------
jaryd
Regarding his prediction that a machine will pass the Turing test by 2029,
Ray writes: "Today, my prediction appears to be median view. So, I am
gratified that a growing group of people now think that I am being too
conservative."

Great stuff!

------
andrewla
In thinking of this as a binary question (pass/fail) it feels like we're
missing the point. As Kurzweil points out when quoting himself, "By the time
there is a broad consensus that the Turing test has been passed, the actual
threshold will have long since been achieved."

What we need is more of a Turing "score". The design would be a website (say)
where each human participant will be presented with both "defend your
humanity" challenges (as a subject) or "judge other's humanity". For the
former, you'll try to convince an interviewer that you are human; for the
latter, you'll be presented with two subjects, and asked to identify which is
human (possibly with a "both human" or "both robots" option).

Based on this, individuals will get an ELO score (like chess) on how often
they "win" the contest as a subject, that is, are identified as human (or as
"more likely to be human than their opponent").

Computer programs will participate as subjects; the requirement will be that
the behavior is deterministic given a random seed presented (to them only) at
the beginning of the conversation; this is to prevent cheating and allow
reproducibility.

On an orthogonal level, participants acting as judges could be scored on how
often they are correct in making the identification; and there's no reason
that computer programs could not compete on this side as well. And this could
even feed back into the score for humanity; you get more credit for fooling a
good judge than fooling a bad judge, etc.
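The rating mechanics could be borrowed from chess directly. A minimal sketch
of the update rule (the K-factor of 32 and the 400-point scale are
conventional chess values, assumed here rather than anything specified above):

```python
# Standard Elo update applied to a "humanity" contest: subject A "wins"
# a round when the judge identifies it as the human over opponent B.

def expected_win(rating_a, rating_b):
    """Expected probability that A is judged 'more human' than B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, a_won, k=32.0):
    """Return the updated (A, B) ratings after one judging round."""
    delta = k * ((1.0 if a_won else 0.0) - expected_win(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# A bot rated 1200 that "beats" a human rated 1400 gains more points
# than it would by beating one rated 1000, so upsets count for more.
bot, human = elo_update(1200.0, 1400.0, a_won=True)
```

The same update could score the judges on the orthogonal axis, with a judge
"winning" a round by making the correct identification.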

------
byteface
We should also specify the intelligence of the human...

Maybe one day there'll be a test where we have to convince a super-intelligent
sentient machine that we can be more than just human...

------
blauwbilgorgel
The chatbot Eugene Goostman did not pass Ray Kurzweil's version of the Turing
Test. It passed the Turing Test as set up by the organizers.

Turing never even mentioned the criterion that participants should be aware
that they are possibly talking to a computer. This single criterion should not
matter for intelligence IMO, because the whole point of the test was to
abstract away the appearance. And the other way around: a chatbot doesn't pass
the test merely because a participant mistakes a human for a machine.

People fail the Turing Test all the time when unaware of chat scripts. Even
some upvoted Hacker News comments may have been artificially generated without
being detected as such.

The best way for me to detect if something is up, is to ask the chatbot about
Alan Turing, 42 and the Turing Test. Then to curse at it. Most chatbot makers
can't resist adding lines specifically for these questions, or they show
feigned annoyance that is easy to pick up on. I got Goostman to admit that he
was a Turing Test and then we talked about bots some more. Eugene ended with:

 _I call all these chatter-bots "chatter-nuts" due to their extremely high
intelligence. I hope you recognize irony._

Full conversation here:
[http://pastebin.com/Wf4uiCRf](http://pastebin.com/Wf4uiCRf)

~~~
DanBC
> Turing never even mentioned the criteria that participants should be aware
> that they are possibly talking to a computer.

What? Here's what he said. It seems pretty clear that this version of the game
involves one human interrogator, one human and one computer, and the
interrogator has to decide which is human and which isn't.

> The new form of the problem can be described in terms of a game which we
> call the 'imitation game." It is played with three people, a man (A), a
> woman (B), and an interrogator (C) who may be of either sex. The
> interrogator stays in a room apart from the other two. The object of the
> game for the interrogator is to determine which of the other two is the man
> and which is the woman. He knows them by labels X and Y, and at the end of
> the game he says either "X is A and Y is B" or "X is B and Y is A." The
> interrogator is allowed to put questions to A and B thus:

> C: Will X please tell me the length of his or her hair?

> Now suppose X is actually A, then A must answer. It is A's object in the
> game to try and cause C to make the wrong identification. His answer might
> therefore be:

> "My hair is shingled, and the longest strands are about nine inches long."

> In order that tones of voice may not help the interrogator the answers
> should be written, or better still, typewritten. The ideal arrangement is to
> have a teleprinter communicating between the two rooms. Alternatively the
> question and answers can be repeated by an intermediary. The object of the
> game for the third player (B) is to help the interrogator. The best strategy
> for her is probably to give truthful answers. She can add such things as "I
> am the woman, don't listen to him!" to her answers, but it will avail
> nothing as the man can make similar remarks.

> We now ask the question, "What will happen when a machine takes the part of
> A in this game?" Will the interrogator decide wrongly as often when the game
> is played like this as he does when the game is played between a man and a
> woman? These questions replace our original, "Can machines think?"

~~~
blauwbilgorgel
The specifics of the Turing Test have been debated for decades. It is clear
that different versions and interpretations exist, especially now that we
have moved on from the behaviorism that was popular at the time Turing wrote
the paper.

Your quotes only say that player A is to be replaced with a machine, not that
player C is to be made aware of the replacement. I do agree that that is the
popular interpretation, but it isn't written down, leading to this ambiguity.

>the interrogator has to decide which is human and which isn't.

The interrogator has to decide which is female and which is male. Replace the
woman or the man with a computer without making the interrogator aware of
this, and see if the interrogator is wrong as often as before.

Kurzweil questions this bot as if it were a bot. He is aware that he is
talking to a bot, and doesn't even have to choose between A and B.

Kurzweil's line of questioning is not fair or normal or productive
communication even if it was between two humans. A 13-year-old boy would have
told Kurzweil to bugger off when tasked to answer 2+2 or wouldn't have
responded at all.

Repeat questions like that in a random chatroom and people will think that
_you_ are the chatbot.

~~~
dragonwriter
> Kurzweil's line of questioning is not fair or normal or productive
> communication even if it was between two humans. A 13-year-old boy would
> have told Kurzweil to bugger off when tasked to answer 2+2 or wouldn't have
> responded at all.

That seems to make it a very good way to distinguish humans from bots even if
it isn't "fair or normal or productive communication" -- if _humans_ could
detect that a line of questioning isn't "fair or normal or productive
communication" and cut it off, and the chatbot can't, that would seem to be a
manner in which a chatbot is _readily distinguishable from a human by way of
interaction_.

And, I would argue, recognizing abusive, unproductive lines of inquiry and
either diverting them early or cutting them off completely is an important
part of human communication.

~~~
blauwbilgorgel
Absolutely! This sword cuts both ways. Our bots are stuck in Searle's Chinese
room, performing silly input-output matching tricks without understanding
language on a deep human level.

An intelligent bot would indeed recognize unproductive lines of inquiry and
should act accordingly. It would be more realistic if a bot answered that it
has gotten tired of answering stupid questions.

Yet: If the Turing Test was made to avoid discrimination based on appearance,
then applying the Turing Test while starting out with a tricky biased (unfair,
unnatural, unproductive) line of questioning in the hopes of tripping up the
machine... this doesn't seem in the spirit of the Test. The chatbots are at
least polite enough to try to answer. Why not extend them the same courtesy as
you would other humans, at least until you figure out that it may be a bot?

Again: it doesn't say in the paper that the interrogator is to be made aware
that player A is now a machine. The interrogator will continue his line of
questioning, trying to distinguish between male and female. This line of
questioning should be far saner, and easier for bots to interpret and play
along with. By starting out as Kurzweil did, he places the Test on the
shoulders of the bot: the bot has to find out whether it is talking to a sane
human or to someone spouting gibberish. Kurzweil becomes Eugene's Turing Test.

------
DonHopkins
Ben Shneiderman wrote an interesting essay entitled "Beyond Intelligent
Machines: Designing Predictable and Controllable User Interfaces", that
explained why he is strongly opposed to suggesting that computers are
'intelligent' or 'smart', and suggested better approaches to human computer
interaction.

[https://code.google.com/p/micropolis/source/browse/trunk/mic...](https://code.google.com/p/micropolis/source/browse/trunk/micropolis-activity/src/notes/Beyond-Intelligent-Machines)

He wrote that back in 1992, and I think it's still very relevant now.

Limits to Imagination

I think we should have much greater ambition than to make a computer behave
like an intelligent butler or other human agent. Computer supported
cooperative work (CSCW), hypertext/hypermedia, multi-media, information
visualization, and virtual realities are powerful technologies that enable
human users to accomplish tasks that no human has ever done. If we describe
computers in human terms then we run the risk of limiting our ambition and
creativity in the design of future computer capabilities.

------
rickhanlonii
Can anyone clarify this for me?

> _Professor Warwick claims that the test was “unrestricted.” However, having
> the chatbot claim to be a 13-year-old child, and one for whom English is not
> a first language, is effectively a restriction._

Kurzweil seems to say that the bot lying is a restriction, but the Kapor-
Kurzweil Turing Test Session rules explicitly allow the bot to lie about who
they are[1]:

> _Neither the Turing Test Human Foils nor the Computer are required to tell
> the truth about their histories or other matters. All of the candidates are
> allowed to respond with fictional histories._

I suppose he's just addressing Professor Warwick's claim. Nevertheless, this
point doesn't seem to make any difference to what Kurzweil would consider a
passing bot, and the casual reader is baited into saying "The bot failed
because it lied about its history."

[1]: [http://www.kurzweilai.net/a-wager-on-the-turing-test-the-rul...](http://www.kurzweilai.net/a-wager-on-the-turing-test-the-rules)

~~~
PhasmaFelis
> _Kurzweil seems to say that the bot lying is a restriction_

I think he means that placing it in a limited domain of humanity, specifically
one that would be expected to make lots of basic errors, is a restriction,
since it makes things much easier on the bot.

What really gets me is that, on top of that, they arbitrarily declared that
success was a 30% pass rate instead of 50%. The rest of it is bad enough, but
how on earth did anyone think that was acceptable?

------
trhway
The Turing test is about fooling someone into "believing it is a person". In
this case the judges were fooled into "believing it is a Ukrainian
13-year-old boy" (probably with a developmental disability). I think "a
person" in the test means at least average IQ and an average volume of
education/knowledge.

------
CmonDev
It's basically a software automaton, not an AI in any way.

------
tim333
It's quite interesting to read Turing's original article (A. M. Turing (1950)
Computing Machinery and Intelligence. Mind). I don't think he would have been
that impressed by Prof Warwick's claims.

[http://www.csee.umbc.edu/courses/471/papers/turing.pdf](http://www.csee.umbc.edu/courses/471/papers/turing.pdf)

In fact, if you read it, the claim that more than 30% of the judges getting
the test wrong one time counts as 'passing the Turing test' is rather
contrary to what Turing was actually saying.

------
mVChr
I'm pretty sure I'd be able to detect almost any bot that tries to pass a
Turing test. They always seem to try to answer every question. What if you did
something like this in rapid succession...

Me: hey
Me: hola!
Me: que pasa?
Me: 'sup?

If the bot/person on the other end tried to respond exactly 4 times that would
be a very strong indication that something's amiss. And most likely they would
trip up on the slang term at the end.
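A toy sketch of this burst heuristic (the function name and threshold are
hypothetical, purely to illustrate the idea): fire several messages without
waiting, then count replies; a mechanical one-reply-per-message pattern is
suspicious.

```python
def looks_scripted(messages_sent, replies_received):
    """Burst heuristic: a bot that dutifully answers every message in a
    rapid burst tends to reply once per message; a human typically answers
    the whole burst with a single reply (or ignores it)."""
    return replies_received >= messages_sent

burst = ["hey", "hola!", "que pasa?", "'sup?"]
assert looks_scripted(len(burst), 4)      # one reply per message: suspicious
assert not looks_scripted(len(burst), 1)  # one reply to the whole burst
```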

------
lnanek2
Wow, did they put any work into that thing at all, or just hook up the
default chat bots we've had for a decade? It would be pretty trivial to have
it keep track of what it said and never repeat itself exactly, never ask
where someone lives after a location has been mentioned in any way, not call
itself a little boy, use correct punctuation, etc.

------
metaphorm
Good on Kurzweil for calling out a self-aggrandizing researcher at U. of
Reading. The Goostman bot is pretty poor as far as chat bots go: utterly
typical in capabilities (below average, if you ask me) for this type of bot,
and utterly unconvincing to non-naive judges.

------
dragonbonheur
I notice that in his conversation with the chatbot, Mr Kurzweil himself fails
to meet his own criteria for the Turing test. Shouldn't the machine expect the
human not to behave like a machine and thus have a sensible conversation?

------
chaos0
Until an AI can define _something abstract_ (e.g. emotion or intelligence)
and answer follow-on questions without sounding like Wikipedia, there is no
chance the Turing test will ever be passed.

------
blazespin
Until computers can think with the same levels of cognitive problem solving
and emotional complexity that real people have, the Turing test will not be
passed. 2029 is an aggressive date.

------
dubcanada
I found it very easy to tell it was a computer. Way too many emoticons for a
real person.

It also seems to have problems with slang.

------
mantrax5
After all the indiscriminate B.S. published in the media about this software
"passing the Turing test", kudos to Ray Kurzweil for calling it like it is,
even though he, of all people, has very serious reasons to be on the "I want
to believe" team. Respect.

