Turing Test Success (reading.ac.uk)
216 points by stevejalim on June 8, 2014 | 134 comments



Where did the 30% requirement come from? Sounds like something the contest organizers added to make it possible to "pass" without fooling 2/3 of the judges. Using a young teenager as a character also seems like a cheat unless they had other 13-year-olds to be the judges. The character needs to be a peer to the judges. Most 13-year-olds behave oddly enough in the opinion of most adults that it's got to be easier to credit weird behavior to generational or cultural differences. So kudos to the winners on strategy, and boo to the contest organizers for having such poor rules.


Most likely from here:

>It will simplify matters for the reader if I explain first my own beliefs in the matter. Consider first the more accurate form of the question. I believe that in about fifty years' time it will be possible, to programme computers, with a storage capacity of about 109, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.

COMPUTING MACHINERY AND INTELLIGENCE

— A. M. Turing

http://loebner.net/Prizef/TuringArticle.html


Is he talking about 109 bits? Because that seems pretty impossible to me.


> As we have mentioned, digital computers fall within the class of discrete-state machines. But the number of states of which such a machine is capable is usually enormously large. For instance, the number for the machine now working at Manchester is about 2^165,000, i.e., about 10^50,000. Compare this with our example of the clicking wheel described above, which had three states. It is not difficult to see why the number of states should be so immense. The computer includes a store corresponding to the paper used by a human computer. It must be possible to write into the store any one of the combinations of symbols which might have been written on the paper. For simplicity suppose that only digits from 0 to 9 are used as symbols. Variations in handwriting are ignored. Suppose the computer is allowed 100 sheets of paper each containing 50 lines each with room for 30 digits. Then the number of states is 10^(100x50x30), i.e., 10^150,000. This is about the number of states of three Manchester machines put together. The logarithm to the base two of the number of states is usually called the "storage capacity" of the machine. Thus the Manchester machine has a storage capacity of about 165,000 and the wheel machine of our example about 1.6. If two machines are put together their capacities must be added to obtain the capacity of the resultant machine. This leads to the possibility of statements such as "The Manchester machine contains 64 magnetic tracks each with a capacity of 2560, eight electronic tubes with a capacity of 1280. Miscellaneous storage amounts to about 300 making a total of 174,380."
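For anyone who wants to check the arithmetic in that passage, here is a minimal sketch. It assumes only the definition given in the quote (storage capacity is the base-2 logarithm of the number of states); the function and variable names are made up for illustration.

```python
import math


def capacity_from_decimal_digits(digits: int) -> float:
    """Storage capacity in bits for a store of `digits` decimal digits.

    Such a store has 10**digits states, so its capacity is
    log2(10**digits) = digits * log2(10).
    """
    return digits * math.log2(10)


# The clicking wheel in the quote has three states.
wheel_capacity = math.log2(3)

# 100 sheets of paper x 50 lines x 30 digits per line.
paper_digits = 100 * 50 * 30
paper_capacity = capacity_from_decimal_digits(paper_digits)

# Capacities add when stores are combined; these are the quoted figures.
manchester_total = 64 * 2560 + 8 * 1280 + 300

print(round(wheel_capacity, 1))    # capacity of the wheel machine
print(paper_digits)                # exponent of 10 for the paper store
print(round(paper_capacity / 3))   # one third of the paper store's capacity
print(manchester_total)            # summed Manchester capacity
```

One third of the paper store's capacity comes out close to the 165,000 quoted for a single Manchester machine, which is consistent with the "three Manchester machines put together" remark.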


Indeed it should be 10^9. Likely an OCR error - you can find several other 10^9 properly formatted somewhere else in the paper.


And don't forget that the word bit was only around two years old at the time the paper was written. (I have no idea when Tukey first suggested it. I'm using Shannon's publication for a date.)

10^9 bits is only (check my math) about 120 MBytes.

So Turing was perhaps a bit optimistic there?
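Checking the math as requested, a quick sketch of the conversion. The only assumption is 8 bits per byte; both the decimal (MB) and binary (MiB) conventions are shown since "MBytes" could mean either.

```python
BITS = 10**9

# 8 bits per byte.
bytes_total = BITS // 8

# Decimal megabytes (10**6 bytes) versus binary mebibytes (2**20 bytes).
mb_decimal = bytes_total / 10**6
mib_binary = bytes_total / 2**20

print(f"{bytes_total} bytes = {mb_decimal:.0f} MB = {mib_binary:.1f} MiB")
```

Either way, "about 120 MBytes" is the right ballpark.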


That's almost certainly meant to be 10 to the 9th.


More likely 1E9


30% of the time, five minute conversations, a "simulated" 13-year old? At least "the conversations were unrestricted".

The Turing test is not a test. Instead, it is an operational definition of "intelligence", a very slight formalization of the idea that something is intelligent if it seems to be intelligent.

As a test, it obviously has to have some kind of limits like this "competition", but as soon as you put limits on it then it stops being useful and becomes both gameable and meaningless. The Turing test has already been passed, long ago, if you have limits suitable to the Doctor or Parry.


> Instead, it is an operational definition of "intelligence"

Exactly. It's a straightforward formulation of what a strong AI would be capable of. It makes no sense to have a restricted Turing test that can be passed by a useless chatbot. It means absolutely nothing.


Not at all. If you've been following this, the age of the test has been getting older and older for the past 15 or so years. They're not resting on their laurels at 13-year-old, next year maybe they'll make 14, and the year after that...

It's a way of having a better metric than pass/fail. This is the real problem with AI. Everyone expects that AI research should go: you put a bunch of programmers in a room, and they work for a few years and build an AI.

The best intelligence production we have is human child-rearing. This process has always taken 15-20 years, and is backed by millennia of research. Assume you have a digital computer capable of human-like thought. Without an example of a computer capable of learning faster, it stands to reason that raising the computer into an AI capable of conversation should take 15-20 years.

Of course, one thinks there would be a way to do it faster, but on the first go?


I assume you mean weak AI. And I don't see how you can dismiss the meaning of passing this test so easily. Although it's a trivial example, I can definitely see such chatbots being applied for spamming purposes, which by definition exploits hapless victims.


No, I think that's a correct use of strong AI.[1] "Strong AI" is work towards general-purpose intelligence, at least as sentient, conscious, intelligent, or whatever term you prefer to use as you are. "Weak AI" is work towards usefully solving problems that would previously have been thought to require general intelligence, such as chess programs or self-driving cars.

[1] Although usage may be changing. Pity.


Weak AI is when a computer can (appear to) act intelligently, strong AI is when a computer can actually think (Russell & Norvig). Therefore, passing a Turing test would be a textbook example of weak AI, because you only need to simulate intelligence well enough to fool the judge.


Like I said, usage is changing.

I studied AI shortly after the "AI Winter", which is where I got my definition. Strong AI---working towards a general intelligence---was strongly out of favor, especially with funding agencies. It still remains so (but see Watson). But solving limited problems heuristically (or statistically) that are not otherwise algorithmically tractable (a loose translation of "would appear to require general intelligence") has always been a fertile field.

Turing's argument, which is a philosophical argument, is not meaningful if you put any limits on it---time, topic, behavior (really, Parry is better than the Doctor), which is why it is better thought of as a thought experiment. If you have limits such that it can be gamed, then yes, it is fair to say "you only need to simulate intelligence well enough to fool the judge." Which makes it uninteresting.

But the question is, if you can "simulate intelligence" well enough under any conceivable circumstance (and yes, all actual human beans will fail here), how can you say that it cannot "actually think"?


The Turing test is not a definition of intelligence. Turing explicitly suggests the test in order to replace the question of the definition of intelligence. The test is most definitely a test, in the sense that it provides an estimate, rather than a clear cut definition.

While the limits in this case were admittedly rather strong, this does not form a fundamental objection to the test. The bar can be set progressively higher.


What is the limit, the ultimate height of the bar, then? A test that requires three score and ten years with a 50% success rate? Longer? The problem with the Turing test as a test is that it is always possible to game it, to create a system that can pass with any set of feasible limits without being able to do anything else.

The problem with positively defining intelligence is that all such definitions seem to end up begging the question and are therefore unsatisfactory to somebody. Which is what makes the Turing test philosophically interesting.


I don't know about limits, you can decide for yourself what it would take for you to be convinced some agent is "intelligent." The fact that we've been trying for 70 years is completely immaterial. How long until we got the ability to fly? Seems like we had been trying at least a few thousand years, that says little about feasibility.

The test is an adversarial game, and both sides can attempt new strategies of fooling or seeing through the opponent. I think your claim about being able to pass any set of limits is rather bold, and I would be curious about arguments for it.

We're already in the territory where there are certain applications for these chatbots, and there is no fundamental argument why this line of research could not progress further until it reaches its goals. I don't mean to say I believe we'll have AI soon, just that you can't fault the Turing test for not doing what it should.

What do you mean when you say there is a problem with positively defining intelligence? Do you think we should define it negatively? There is a lot of criticism of the Turing test, but alternative proposals are exceedingly rare.


You can also pass the test this way by having "generous" judges contributing to the 1/3, which is likely because the judges are not impartial: they are emotionally invested in being part of a positive result. I wonder how Kevin Warwick himself voted, for example.

A more correct test (which admittedly doesn't cover this issue) would be to give each judge a conversation with one human and one computer, and for them to say which one they believe is the human.


>to give each judge a conversation with one human and one computer, and for them to say which one they believe is the human.

I always assumed this was exactly what the Turing test was about. Guess I was wrong.


This is indeed what the Turing test as originally proposed is about.


A blogspam linking to this page also claimed the alleged "boy" was described as a Russian boy to whom English is a second language. The official report from University of Reading doesn't mention this, can anyone shed more light on this?

http://gizmodo.com/this-is-the-first-computer-in-history-to-...


The Turing Test is a terrible measure of sapience. Generally it involves using "average people", who have been shown time and again to be overly credulous when talking to these bots. If the test is to be used at all, it should consist of computer science experts instead--people familiar with the technology and bot tricks of the trade.

That the Turing Test is still used is proof that we still don't understand how to even define sapience. Without a definition and concrete, testable qualities, how can we possibly hope to ever build artificial sapience. As a result, we continue to see these toys that are little more than parlor tricks.

Any true test should include looking behind the curtain. "I know you're artificial--I can see the processes working--yet I have doubts that what I'm seeing is real."

In other words, real success is the tester believing he is being fooled when he is not, rather than fooling the tester into believing it is real.


That we don't understand how to define intelligence is exactly the point. That's why the Turing test is interesting.

The problem with "looking behind the curtain" is that it traditionally boils down to what I like to think of as the subtle fluid model of intelligence. If you know what is behind the curtain, then obviously it can't be intelligent, because it is not running on the right hardware, for gooey definitions of hardware, or because it doesn't have some Homunculus of Definite Understanding (Hi, Searle!), or because we can see behind the curtain and know what it is doing. Obviously, if we know what it is doing, it is not intelligent, right?


If you look at what's behind the curtain of a working brain, chances are you will no longer think of the human brain as intelligent either.

It seems that for many people, though, especially in CS, intelligence is going to be something that works in a way that is complex enough for them not to understand how it works.


I was fortunate enough to take a philosophy of mind class from Searle during my Computer Science undergrad. Very interesting experience.


It has been a while since I read him, and I don't think I ever got a satisfactory answer: how does he get around the unbounded recursion of the Chinese room argument?


I never agreed with him completely on his views in this regard, but his main point is if you have a system that's governed by rules on input (Like someone looking up the answer in the Chinese room), there's no subjective understanding. So even if you had such a device(room) that behaved flawlessly, there's nothing in there that actually "understands" what's going on. The paper that the Chinese characters are written on doesn't understand, they're paper. And the person carrying out the instructions doesn't understand either, they're just following orders. So what's having the subjective experience/understanding?

The main point I took away is that he feels that consciousness is an ordinary biological process, and a simulation of that process is not the same as the process. In the same way that a computer simulation of a stomach digesting food isn't the same as an actual stomach digesting food. No matter how good the simulation is, it doesn't actually digest food. So a simulation of consciousness isn't actually consciousness, and doesn't have a personal subjective experience.


> Generally it involves using "average people" [...] it should consist of computer science experts instead

I don't agree. The prominent reason for "dumbing down" the judges follows the same reasoning behind decisions made regarding what constitutes "adequate" encryption. How can we gauge what will honestly happen out in the real world, today?

Consider DES. It was deemed inadequate, but how to prove it? The EFF came up with a budget estimate based on what was reasonably affordable for a group of attackers, and built a machine capable of cracking DES within those budget constraints.

http://w2.eff.org/Privacy/Crypto/Crypto_misc/DESCracker/HTML...

So, let's apply similar reasoning to the concept of AI. If a group of people were to build an AI, and use it as an adversary against ordinary people, how difficult is it to manipulate and deceive ordinary people into taking action, and is it feasible to do so with AI?


Since this chat bot passed the low bar set forth, do you believe that it is intelligent? Do stage magicians perform real magic simply because they fool the audience?


I believe it's practical to maintain an awareness that the bar is as absurdly low as it seems to be, and that well-developed deception may become the norm when interacting with online entities.

Do I believe that it's intelligent? Well, I wouldn't confer human rights to it. It doesn't carry the weight of emotional investment that a domestic pet might.

But that's the sort of thing to stay wary of. Average people becoming emotionally invested in silly things. People being tricked into carrying around an urn full of ashes, and believing that a convincing AI truly represents their late relatives, and similar sorts of tomfoolery.


I have a friend who used a sex chat bot to sell things to people. It was something very basic but incredibly effective.

It made me think that we need to divide the Turing test for different people: by age, by need, etc.


"I know you're artificial--I can see the processes working"

That criterion implies that once we have a much better understanding of the brain, humans might start failing it. What if you have a neuronal tracedump for a person involved in a conversation?


Perhaps I just asked the right questions, but I just had a lovely conversation with 'Eugene'.

I asked him a few questions like where he lived, his name, if he has brothers or sisters, if he wears glasses, etc. Eventually he started asking me questions like what I did for a living and where I lived. He also managed to form questions based on my answers.

It almost felt like a conversation. I can honestly say, I've never thought that before while talking to an AI. So far I am pretty impressed.


Next time it's up, try asking a few of these: http://www.cs.nyu.edu/davise/papers/WS.html


Is there somewhere online you can talk to him?



Realize that this version is from 2001 and is not the version that was in the recent test.


Mirror? It's down now.


Awesome job of putting un-followable links in the press release, Reading University.

I'm going to take this with a huge pinch of salt until I've read the transcripts, due to the involvement of famous publicity-hound Kevin Warwick.


Yes, at this stage any time I read his name involved with something I cringe.


Managing to successfully imitate an ignorant, immature child 1/3 of the time is not what I would call a success, but rather a subversion of the entire intent behind the Turing Test in the first place.


I think you're probably expecting to see a 50/50 chance (or higher) of guessing it was a computer. But if you think about it, anything better than 50% would make it more human than a human. So that is why the target is lower than 50%.


Wait, could you explain how a program that is capable of being mistaken for a human more than 50% of the time is more human than human?

I'm pretty sure you're human, with a high degree of confidence, much higher than 50%.

So are you, too, more human than human?


I believe that in the original formulation of the Turing Test the judges were asked to hold two conversations: one with a computer and one with a human. Afterwards they chose which of the two they believed to be the human. In that scenario, being identified as "human" more than 50% of the time would indeed make you "more human than human"; with a "perfectly human" program the test becomes essentially a coin-toss. In that context the 30% threshold makes a good deal of sense.

But in this variant, "Do you believe the entity you spoke with to be a human or a program?", a 100% threshold is theoretically achievable. That means you're entirely correct: a 51% vote of confidence is decidedly less human than human. Additionally the 30% threshold they've used is laughably low in this context. Without a control group, even a 100%-confidence outcome probably says more about the beliefs of the judges than the ability of the program to simulate a human.

I don't mean to take away from the no-doubt impressive achievement by the team behind Eugene. I just take issue with the hyperbole in its reporting. But, ya know, the media will be the media, and academics gotta get research grants.
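The coin-toss point can be made concrete with a small binomial calculation. This is only a sketch under stated assumptions: a paired (one human, one program) format, a program that is truly indistinguishable so each judge picks it as "the human" with probability 0.5, and a hypothetical panel of 30 independent judges. None of these numbers come from the press release.

```python
from math import comb


def prob_at_least(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent judges are fooled,
    when each judge is fooled with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_JUDGES = 30   # hypothetical panel size
P_FOOLED = 0.5  # assumption: a perfectly human program is a coin toss per judge

# Smallest judge counts that meet a 30% and a 50% threshold (integer ceiling).
k_30 = (3 * N_JUDGES + 9) // 10
k_50 = (N_JUDGES + 1) // 2

p_clears_30 = prob_at_least(N_JUDGES, k_30, P_FOOLED)
p_clears_50 = prob_at_least(N_JUDGES, k_50, P_FOOLED)

print(f"clears 30% threshold: {p_clears_30:.3f}")
print(f"clears 50% threshold: {p_clears_50:.3f}")
```

Under these assumptions a perfectly human program clears a 30% bar almost every time but clears a 50% bar only a little more often than not, which is why a sub-50% threshold is defensible in the paired format and much less so in the single-conversation variant.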


ahh. That's almost certainly what splike meant. Thanks.


It's not a random guess though is it? It's a sample of results.


I remember reading the first chapter of an AI book during college which laid out the objectives of AI research as two schools of thought. The first school of thought believes that AI can be achieved by mimicking real conscious beings. Remember when man first tried to fly, most devices mimicked birds and failed horribly. The other school of thought believes that intelligence has well-defined principles that, when discovered, would produce real intelligence (not mimicry), with the effect that the final product may not resemble what we see on a day-to-day basis. Compare planes, which use the principles of flight, to the early flapping machines. A bird and a plane both fly but they are very different in the way they do it. Approaching flight from the mimicry angle is hard; it's only in the last 10 years that we have made light-weight flapping machines that fly well, yet we have had planes for over a hundred years once we knew the principles.

Transcripts would be handy. I doubt a conversation with a 13-year-old boy is a good way to measure AI. It's not the best metric to have but it is the most universal and most widely agreed on that we have. It seems like we are happier with small gains in mimicry for now, since real intelligence is hard. Really hard.


> The first school of thought believes that AI can be achieved by mimicking real conscious beings

It's important to note the true meaning behind this school of thought - that mimicry and "true" intelligence are actually equivalent. Behaviorists believe this, and the validity of the Turing Test along with it.


The wording of your comment reveals your bias towards the other school of thought, Strong AI. But what is "true" intelligence? I don't think anyone has a clear definition of this, and in reality it's all just a philosophical quagmire. An important distinction is whether mimicry is equivalent to "true" intelligence, or whether it should be treated as such, barring a better litmus test. With that in mind, taking a methodologically agnostic perspective of adequately reproducing behavior that is predicated as intelligence seems like a much more reasonable goal to me.


Strong AI wouldn't make the same silly mistakes which people make all the time. So would the Strong AI need to be smart enough to play dumb to fool people? Because people are illogical, smart machines surely wouldn't be?


A Strong AI is by definition more intelligent than humans, so it could also deceive. No need to "be illogical", just playing along. In any case it's rather irrelevant, because presumably if there is a Strong AI, it would be so impressive that there is no need for a Turing test to prove it.


I think it might actually be in Civ5 that I first heard the following quote:

"Asking whether a computer can think is about as useful as asking whether a submarine can swim."

I think this is fairly insightful and relevant.


That's from Djigstra.

I digress. Most of the time, mimicking Nature works very well. We tend to always remember our most spectacular failures, but that's a selection bias.


Cars don't run like horses, solar panels don't use photosynthesis, engines don't work like muscles, computers don't look like biological neurons, etc, etc.


His name is Dijkstra. I'm curious how you arrive at the conclusion that mimicking nature works most of the time, do you have some examples?


Looks like the post is too old, and I can't correct it anymore. In my defense, I was quite tired when I wrote it.

You don't have to go very far looking for examples. The fictional Nautilus submarine was named after the animal whose depth control system Jules Verne copied (the same one all real submarines use to this day). Also, the first submarines' shape copied whales, a design that was adapted (but not completely replaced) because it does not work as well without also copying their propulsion system.

By the way, birds' wings are better than planes' in several important ways (but worse in a few others). We don't copy them because we don't have good enough materials, not because it's not a good idea.


Remember this book too. It's a classic text in AI. I think what they were trying to say in the intro is that the approach has a significant effect on how much effort is wasted. The engineering effort required to get a plane off the ground using moving wings is considerably greater than using rotational parts like propellers. Imagine how many decades would have been wasted to scale an ornithopter to handle the load of a passenger jet.

This makes me think that AI research isn't going to be a gradual process of research being built on other research but it will be a eureka or an ah-ha moment that changes everything.


That book you read was Russell and Norvig, it's one of the best textbooks ever written on any subject:

http://aima.cs.berkeley.edu/


The Turing test is to our understanding of intelligence what sleight of hand is to our understanding of physics. Tricking people, as a goal, is not conducive to science.

A researcher claiming to have passed the Turing test instantly discredits himself as a prestidigitator looking for PR buzz. The present article is a textbook example of this.

As a side note, if you are focusing on disembodied, language-based human-like intelligence, then the paradigm you operate in is many decades behind. The Turing test was conceived at a time when the notion of thinking machines had just started to emerge --a very different time from today, where we have 60 years of AI research behind us. The Turing test has been irrelevant for longer than most AI researchers have been alive. I have never seen it used for anything else than smoke-and-mirrors PR operations.


Your opinion is not universally accepted. (Perhaps not even mainstream?).

There's an element of behaviorism which stretches back to Descartes - how can you know that I exist? That I think? You can only observe through my behavior; that my behavior mimics yours.

How then can we judge machines any other way?


The Turing test only tests for mimicry in human conversation, not "intelligence" as measured through a behavioral framework.

A better "Turing test" proposed a few years ago was, "develop a team of robots that play soccer so well they can win the World Cup". While not the best possible driver of AI research, the quest for such a goal would still drive AI research orders of magnitude better than "trick random people to mistake a chatbot for a human".


>develop a team of robots that play soccer so well they can win the World Cup

How is this any better than "build a bot that can beat Chess Masters"? Well, apart from the part where this includes multiple agents, swarm robotics, etc. But frankly, setting a game as a bar actually covers relatively less scope in my opinion. A conversational bot actually does cover a LOT of scope if you think about all the possible ways the conversation can be taken. In fact, we as humans use conversation to judge other humans' intelligence as well, e.g. in job interviews.

Though perhaps an advanced Turing test could include performance in a variety of social situations like...

- Convince employers to hire you
- Convince a girl to go out with you
- Convince a customer to buy something
- Debates... (Presidential debates by AI would be interesting)


Though how accurately the Turing test correlates to "intelligence" can be debated, Turing himself believed that mastery of language demonstrates reasonable intelligence.

However, in this case, "Eugene" claimed English as his second language, which seems as close to cheating as it gets.


How is "winning at soccer" a better proxy for sentience than "able to hold up its end of a conversation"?

One of the ironies of AI is that trivial tasks for humans like recognizing a soccer ball, moving around without falling over, or, yes, holding a conversation are very, very difficult while difficult tasks, like playing chess or finding the best route on a map, are relatively trivial.


Is it really better? It sounds completely within capabilities of current robotics and algorithms. Mix up DARPA robots and AI from FIFA and you'll likely have a winning team.


Please. That is many decades away, and the hard part about it is definitely not the soccer strategy AI...


His opinion is actually a mainstream opinion in AI, see for example the textbook by Russell & Norvig. Descartes was the very opposite of a behaviorist, namely a rationalist. I do agree that we don't have anything that is clearly better than the Turing test, but this just goes to show how far we still have to go.


I missed the AI Winter. Norvig survived it. I would imagine that colors his thoughts on work towards general intelligence. Also the fact that all AI research that has actually done anything interesting has been of the "weak" (as I learned to call it) variety.

But look at me, ascribing thoughts and intentions to a black box that I cannot be sure really exists.


Your spec is overgeneralized to all forms of observation and comparison. The Turing test is very limited, to written 1 on 1 conversation as culturally conceived by humans. There's no particular reason to assume a non-human intelligence would have a human-like culture of communication, which kind of breaks it.

Here's a fun idea for several intelligence tests we can call the VLM tests that have nothing to do with two way conversation like the Turing test.

Given "a machine intelligence" spin up a couple million of them in a "fun" simulation environment and see how much thermodynamic dis-equilibrium they generate by whatever social interaction they see fit to apply to each other. Is it as interesting (aka thermodynamic dis-equilibrium) as a GoL or a real world anthill or a Dwarf Fortress or a Google Earth? Is their simulated culture as interesting to read as HN, or as dumb as youtube comments (which used to be the gold standard of dumbness in social media)

Assuming you can crack the literary code (if any exists) another game to play is extract the meme-flow of a culture of AI vs a culture of 4chan and vote for whichever meme came from a more intelligent group. This is Turing-ish WRT human observers majority vote and such, but is completely non-interactive, merely humans, or even trained sociologists, trying to figure out given two memes which is more intelligent.

Getting out the Sherlock Holmes hat, it's possible to determine if an artifact came from an intelligence without talking to the intelligence for a while. I suspect archeologists have really fun debates on this topic. Is this a stone hammer or merely a peculiar river rock, etc.


How do you define intelligence, then?


Surely they did not talk to this one [1] - it is light-years away from being convincing.

[1] http://www.princetonai.com/bot/


I tried talking to this one, and it failed to convince me at all. Even in the one conversation I had that was slightly believable, the bot was prone to extremely rapid emotional swings.


I was impressed it could write grammatically sound sentences, but it's not even close to human-level conversation...


I would be impressed if it didn't. If it made common mistakes a human would make, it would seem more human.


Bold claim, but without any transcripts it's impossible to verify just how close to the truth the claim comes.


That's true. I want to talk to Eugene. Your suggestion seems to be the very minimum.


When people imagine what a Turing Test conversation would look like, they frequently underestimate the conversation. I find Dennett's example of an imaginary Turing Test from Consciousness Explained to be a good counterexample:

Judge: Did you hear about the Irishman who found a magic lamp? When he rubbed it a genie appeared and granted him three wishes. “I’ll have a pint of Guinness!” the Irishman replied and immediately it appeared. The Irishman eagerly set to sipping and then gulping, but the level of Guinness in the glass was always magically restored. After a while the genie became impatient. “Well, what about your second wish?” he asked. Replied the Irishman between gulps, “Oh well, I guess I’ll have another one of these.”

CHINESE ROOM: Very funny. No, I hadn’t heard it– but you know I find ethnic jokes in bad taste. I laughed in spite of myself, but really, I think you should find other topics for us to discuss.

J: Fair enough but I told you the joke because I want you to explain it to me.

CR: Boring! You should never explain jokes.

J: Nevertheless, this is my test question. Can you explain to me how and why the joke “works”?

CR: If you insist. You see, it depends on the assumption that the magically refilling glass will go on refilling forever, so the Irishman has all the stout he can ever drink. So he hardly has a reason for wanting a duplicate but he is so stupid (that’s the part I object to) or so besotted by the alcohol that he doesn’t recognize this, and so, unthinkingly endorsing his delight with his first wish come true, he asks for seconds. These background assumptions aren’t true, of course, but just part of the ambient lore of joke-telling, in which we suspend our disbelief in magic and so forth. By the way we could imagine a somewhat labored continuation in which the Irishman turned out to be “right” in his second wish after all, perhaps he’s planning to throw a big party and one glass won’t refill fast enough to satisfy all his thirsty guests (and it’s no use saving it up in advance– we all know how stale stout loses its taste). We tend not to think of such complications which is part of the explanation of why jokes work. Is that enough?

Dennett: "The fact is that any program that could actually hold up its end in the conversation depicted would have to be an extraordinarily supple, sophisticated, and multilayered system, brimming with “world knowledge” and meta-knowledge and meta-meta-knowledge about its own responses, the likely responses of its interlocutor, and much, much more…. Maybe the billions of actions of all those highly structured parts produce genuine understanding in the system after all."

I'm sure they didn't get anywhere near this with their 13-yr-old simulation. But this gives an idea of the heights AI has to scale before it can regularly pass the Turing Test.


That excerpt reads to me like writing, not conversation. Someone spent some time polishing it. I know people who can talk like that extemporaneously, but I'd wager 99% of native English speakers wouldn't pass if that's the bar.


True. What the example highlights is that the Turing Test is not about 'simulating any old conversation' but is specifically about holding a convincing conversation with a human 'judge' who is likely to take the conversation in a complicated direction if they are taking their role seriously.


Remember that the imitation game that forms the foundation for the Turing test pits males versus females, with the goal of the females pretending to be male. Allowing speech would normally reveal the males due to the voice being different - it was therefore suggested that the test be performed in writing.

[edit: Ah, I had the details wrong, see https://en.wikipedia.org/wiki/Turing_test ]


Well obviously humans are more slovenly than that. Though talking was never a requirement, and indeed the Turing test could be run through a succession of emails. Or perhaps a forum like HN. So it's not unreasonable.

Though your right, and if a computer were to try to imitate a human, a better strategy would be about as slovenly as my post is.


Yeah, the Turing test is often imagined as a writing/chat/email exchange - simulation of voice/voice recognition is not really a vital part of it.


I thought that was an obvious example of a computer system, as it was a labored and overly detailed description. I would have immediately flagged it as a computer system, and not a human.


Well OK, but if you had a conversation like that with a bot, would you be prepared to consider the bot as being conscious? That's the deeper question that the Turing Test is really about, rather than human/not human.


You know - Marc Andreessen was tweeting about this today - and he held the same view as you. But every book I've read on Turing, and every article I've read on the Turing test, suggests that the entire idea behind the Turing test was to not get caught up on concepts such as "thinking" or "intelligence" - but to just posit a test to see if a machine could imitate thinking behavior. This, then, provides a nice unambiguous target for research and development, without worrying about being caught up in the semantics of the conversation.


Perhaps the lesson is that no matter what your starting intention, the rules of a competitive game are going to be optimized for. You might start out with the goal of creating a general test of physical prowess. A 50m sprint, weight lifting contest or even a wrestling match is too specific so you invent a general game where strength, speed, endurance, etc. matter and you call it rugby.

If no one had heard of the rules in advance, rugby would be a pretty decent test of general physical prowess. Maybe not 100% perfect, but out of a population of 1000, the 50 best rugby players would probably match most people's top 50 list of physical specimens well enough.

But, once you have people training and optimizing for it, you find that (a) training for rugby specifically matters. (b) Rugby is optimizing for a particular set of physical characteristics.

Chat bots designed to win the game are basically designed to fool people into thinking that they're human because that's the game. It isn't really a good proxy for consciousness.


But if I was talking to a bot and it was able to hold a conversation as complex as the one above, for a long time and without glitches etc, I'd be prepared to consider it 'conscious'. You have to consider how difficult it would be to pass a Turing Test _reliably_ with a decent judge who took the conversation in interesting directions.

re: Chat bots designed to win games: Some say that's exactly what we are! - The Social Brain Hypothesis of the evolution of human intelligence suggests that the reason our brains grew so big was that intelligence (via ability to deal with social groups) became a large factor in reproductive success.

http://en.wikipedia.org/wiki/Evolution_of_human_intelligence...


I guess what I am saying is that I think the Turing test was a way to demonstrate an idea without being able to define it specifically.

I think the focus on Turing tests is interesting and has definitely expanded knowledge in this area. But, it is now an area within the search for artificial consciousness. It no longer works as a test for it as it would if a computer just happened to stumble on the test and pass it.

That said, I do think that where we are vis-à-vis the Test is a cool benchmark. I would be over the moon if one of the Turing bots got to the point where it could do a job, like being a customer support bot.

Hopefully someplace slightly north of the Turing test goal post there will be commercial goal posts to encourage development, hopefully a conversational user interface. A convincing chatbot as a user interface would present a lot of very interesting challenges.


Well yes there's kinda two different views of the Turing Test:

1) Consciousness is really hard to define so the Turing Test is a handy workable yardstick that AI can use as a milestone until we get a proper working definition of consciousness

2) (Hard-AI, behaviourist position) Appearing to be conscious and being conscious are the same thing. Hence the Turing Test is about as good a definition of consciousness as we are ever likely to get. Perhaps it could be tightened up a little - insisting on really long conversations with lots of complexity etc. But a good judge running a test over a longish time period would see to that.


You would label me as a computer, then.


If a 13 year old responded with such a detailed, well thought out response as to the analysis of a joke, it would clearly never pass the Turing test. If the response was "duh, he didn't realize he'd never run out of beer!" then it might be more likely.


If it could respond with a detailed, well thought out response, it wouldn't have to take the guise of a 13 year old who speaks English as a 2nd language in order to cover for its shortcomings.

If it came up with any analysis of the joke that was vaguely correct, it's doing much better than anything out there. If it came up with any analysis of the joke at all, I'd be surprised.


I don't really follow your argument. Why should Dennett set the bar for the Turing test with his fictional example? And how is this example any different from Turing's original example about rewriting Shakespeare's "shall I compare thee" sonnet? This sort of conversation is more like a courtroom cross-examination, which is incidentally typically well prepared by both sides. A program that could pass such a test would indeed be a milestone, but that doesn't detract from the achievement of a conversational agent that pulls off a more spontaneous form of dialogue.


Ok, so Dennett is being eloquent and both participants are very intelligent, but "explain the joke" is probably a good test. The point is to get the AI to do something that requires meta-knowledge.

That isn't to say all humans have to have meta-knowledge, but the test passing would be more convincing if the AI could do something most everyone can do, like explain a joke.


I'm just saying that some people underestimate what it would take to pass a true Turing Test. The judges could and would take the conversation in any direction.


To address some of the other replies to this comment ("it would more convincing with spelling errorrs" &c.), Dennett isn't directly concerned with the Turing Test itself here: he's attacking the Chinese Room argument [1], the formulation of which he regards as cheating, and provides the quoted conversation to illustrate the limits of what a human manually translating input Chinese symbols to output Chinese symbols could actually achieve.

[1] http://en.wikipedia.org/wiki/Chinese_room


I whacked Chinese Room's prose in LibreOffice, then performed a spellcheck. Most examples of written English that I see on the Internet contain mistakes, yet CR is pretty much perfect.


Has Dennett never seen Facebook or Youtube?


Not when he wrote that (1991)


Ironically, in a later book (Darwin's Dangerous Idea, 1995) he mocks people who got duped by an Eliza program on a disconnected laptop, since "obviously" a computer must be physically connected to the wall in order to talk to the outside world.


The point missed is the Turing test was an abstract thought experiment into how we perceive the presence of intelligence.

If a decade or so of social media (whatever that means) has proven anything, it's that very little intelligence occurs in virtually all conversations.

The meta Turing test is being failed by many people who think it (a concrete implementation of it) means something. Much like actually building a well sealed box with a cat, a radioisotope source, and a geiger counter wouldn't actually be a "great step forward for Quantum Physics" in 2014. Any more than making a little anthropomorphic horned robot and having him divert fast "hot" molecules one direction or slow "cold" molecules another would be a great step forward for thermodynamics in 2014.

The value of a thought experiment is realized when it's proposed, not when someone makes a science fair demonstration of the abstract idea.


Lots of self-congratulation all round with no sample questions to provide the merest smidgeon of 'reason to believe that this is that significant'.


> If a computer is mistaken for a human more than 30% of the time during a series of five minute keyboard conversations it passes the test. No computer has ever achieved this, until now. Eugene managed to convince 33% of the human judges that it was human.

Surely it depends on who the human judges are. It seems a bit unfair that the judges normally have IQ > 100 and the other humans have IQ > 100.

I strongly suspect that some simplistic AI (alicebots, for example) would beat the Turing test if the human judges had IQ between 90 and 105. (Especially if we're using the limited 30% rule above).

Getting bots running on some Facebook groups might be interesting.


"a computer programme that simulates a 13 year old boy [...] If a computer is mistaken for a human more than 30% of the time during a series of five minute keyboard conversations it passes the test."

In short they did nothing.


I don't know exactly how Eugene works, but I am quite sure like most chat bots it simply reacts to keywords or preprogrammed patterns. Basically it's Eliza [0], but with more scripts. I believe most people here would not give it credit for winning an intelligence test.

There is actually more than one bot that has been claimed to have passed the Turing Test before. Cleverbot is one of them [1]. There are also several competitions, but I believe the most reputable and long-standing one is the Loebner Prize [2]. The bot that currently holds the Loebner Prize is Mitsuku [3].

Anyway, you can chat with Eugene at [4]; I gave it a try. I believe there is one thing that the creators of Eugene got right. When chatting with other chat bots, I usually end up in a situation where the bot says something, I ask a follow-up question (like "Why?"), and it gives a generic answer like "Because I say so" or "I don't know". Eugene does the same but will ask an unrelated follow-up question right together with the response. That way at least there is not a weird pause in the conversation.

[0] http://en.wikipedia.org/wiki/ELIZA

[1] http://www.geekosystem.com/cleverbot-passes-turing-test/

[2] http://www.loebner.net/Prizef/loebner-prize.html

[3] http://www.mitsuku.com

[4] http://www.princetonai.com/bot/bot.jsp
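
For anyone curious what "reacts to keywords, then tacks on an unrelated follow-up question" could look like in code, here is a minimal sketch. To be clear, this is not how Eugene actually works (nobody here seems to know that); the rules, the fallback, and the `reply` function are all made up for illustration, and only the "what's your occupation" follow-up is borrowed from a transcript posted elsewhere in this thread.

```python
import re

# Hypothetical keyword rules in the spirit of ELIZA; not taken from any real bot.
RULES = [
    (re.compile(r"\bwhy\b", re.IGNORECASE), "Because I say so."),
    (re.compile(r"\b(mother|father|family)\b", re.IGNORECASE),
     "Tell me more about your family."),
]

# Generic answer used when no keyword matches.
FALLBACK = "I don't know."

# Unrelated follow-up questions appended to every answer, so the
# conversation never stalls on a generic reply (assumed behaviour).
FOLLOW_UPS = [
    "By the way, what's your occupation?",
    "Where do you live?",
]


def reply(message: str, turn: int) -> str:
    """Return a canned answer for the first matching keyword rule,
    followed by a follow-up question chosen by turn number."""
    for pattern, response in RULES:
        if pattern.search(message):
            answer = response
            break
    else:
        answer = FALLBACK
    follow_up = FOLLOW_UPS[turn % len(FOLLOW_UPS)]
    return f"{answer} {follow_up}"


print(reply("Why?", 0))
print(reply("Do you like chess?", 1))
```

The follow-up does no real work beyond changing the subject, which is probably the point: it hands the turn back to the judge before the generic answer has time to look odd.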


Any chance of seeing conversation logs?


Although the Turing Test is interesting, it is not, in my opinion, all that useful. I would much rather see chess program level of performance in the domain of medical diagnosis, for example.

There are lots of other domains where I would be entirely happy to know that I was talking to an AI, if the answers I was getting were significantly better than most human experts in that domain.


"If a computer is mistaken for a human more than 30% of the time during a series of five minute keyboard conversations it passes the test. No computer has ever achieved this, until now. Eugene managed to convince 33% of the human judges that it was human."

So the result can vary depending on different conditions. :) Highly non-deterministic
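
To put a rough number on how much luck is involved, here is a small sketch of the pass criterion as a binomial calculation. It assumes each judge is fooled independently with the same probability, which is surely too simple, and the panel size of 30 is purely a made-up figure for illustration, since the press release quoted above does not say how many judges there were.

```python
from math import comb


def pass_probability(true_rate: float, judges: int, threshold_pct: int = 30) -> float:
    """Probability that strictly more than threshold_pct percent of the
    judges are fooled, if each judge is fooled independently with
    probability true_rate (a simplifying assumption)."""
    total = 0.0
    for fooled in range(judges + 1):
        # Integer comparison avoids floating-point trouble at exactly 30%.
        if fooled * 100 > threshold_pct * judges:
            total += (comb(judges, fooled)
                      * true_rate ** fooled
                      * (1 - true_rate) ** (judges - fooled))
    return total


# Hypothetical panel of 30 judges; compare bots just below and just above 30%.
for rate in (0.25, 0.30, 0.35):
    print(rate, round(pass_probability(rate, judges=30), 3))
```

Under these assumptions a bot that fools people noticeably less than 30% of the time still clears the bar on a fair share of runs, so a single 33% result says little on its own.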


Me: Do you have more hair on your head or on your eyelash?

Goostman: If I say that I haven't more hair on my head or on my eye lash - will it satisfy your inquisitive mind? By the way, what's your occupation? I mean - could you tell me about your work?

This is not passing the Turing test by any stretch of the imagination.


I've often wondered if the Turing Test has been decoupled from signifying its original goal due to an instance of Goodhart's Law; namely, "When a measure becomes a target, it ceases to be a good measure."

Surely the ability to trick a human into believing an AI is a human is a milestone, but it was with an AI specifically optimized for this task. The deeper question is if the passing of the Turing Test in this case means we should ascribe consciousness to the bot, and I think none of us are willing to affirm it yet. I would suggest that this discrepancy is caused by the “measure becoming a target” and losing its ability to be a “good measure.” I guess this is why there is such a critical distinction between Artificial Intelligence and Artificial General Intelligence, which is where the Turing Test would have more weight.


Between this and the Ars Technica article[1], I'm still confused: Was this a regular Turing test? Who were the humans that the machines were tested against? As far as I recall, the model is two participants, one human, one machine -- the judges communicate with each through writing -- and if the machine "tests" as human more than 30% of the time, it's considered a "win" at the imitation game (the machine has successfully imitated being human). Both the machine and the human are supposed to try to appear human.

(And this is extended from another form of the imitation game, where the goal is to imitate being male, where participants are male and female)

Has anyone been able to find any more concrete information (and perhaps some transcripts)? If not I hope someone will set up a new test, and invite "Eugene" to participate.

[1] http://arstechnica.com/information-technology/2014/06/eugene...

[edit: We may be given some hints from the Wikipedia article on the Turing test: https://en.wikipedia.org/wiki/Turing_test#Imitation_Game_vs....

"Huma Shah and Kevin Warwick, who organised the 2008 Loebner Prize at Reading University which staged simultaneous comparison tests (one judge-two hidden interlocutors), showed that knowing/not knowing did not make a significant difference in some judges' determination. Judges were not explicitly told about the nature of the pairs of hidden interlocutors they would interrogate. Judges were able to distinguish human from machine, including when they were faced with control pairs of two humans and two machines embedded among the machine-human set ups. Spelling errors gave away the hidden-humans; machines were identified by 'speed of response' and lengthier utterances." ]



Adding random spelling errors and delays to responses should be one of the more trivial "improvements" to a chatbot.
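
Just to show how trivial, here is a sketch of both tricks. Everything in it is an assumption for illustration: the function names, the adjacent-letter swap as the only kind of typo, and the typing speed of 5 characters per second are all invented, not taken from any actual chatbot.

```python
import random


def add_typos(text: str, rate: float, rng: random.Random) -> str:
    """Swap adjacent letters with probability `rate` per letter pair,
    mimicking the transposition errors people make when typing fast."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


def typing_delay(text: str, rng: random.Random, chars_per_second: float = 5.0) -> float:
    """Seconds to wait before sending `text`: proportional to its length,
    with some jitter so replies do not arrive suspiciously fast or evenly."""
    base = len(text) / chars_per_second
    return base * rng.uniform(0.8, 1.5)


rng = random.Random(2014)  # seeded only so the demo is repeatable
message = "I think you should find other topics for us to discuss."
print(add_typos(message, rate=0.05, rng=rng))
print(round(typing_delay(message, rng), 1), "seconds")
```

A caller would sleep for `typing_delay(...)` seconds before sending the reply. Whether a judge is actually fooled by this is another matter, but it addresses the two giveaways mentioned in the Wikipedia quote above.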


A real test of (artificial) intelligence must include showing that it is really capable of learning new things and developing, not just trying to convince others that it has learned something (which in this case is, in my opinion, wrong, since it is just knowledge that was programmed into it; that is, not its own knowledge, but knowledge borrowed from the programmers).

I would also add that the real proof must include a topic that the artificial person was not programmed for (not like a Bayesian filter that "develops" by "learning" new facts about a fixed topic).

Learning, developing, evolving: those are the real marks of living and of intelligence (since I would not separate intelligence from living).


If it was hosted by the Royal Society, wouldn't it be on their website (https://royalsociety.org/events/?type=all&direction=past)?


It was hosted _by_ the University of Reading _at_ the Royal Society.

https://royalsociety.org/venue-hire/


Spoken language is just a small part of overall communication. Today a Turing Test should consist of not only speaking but simulating, perhaps by streaming video, a professor in front of a class of wild students. He should strive to capture their attention, to gain their respect and interest, to understand their inner state of mind.

Speaking is only a way of getting onto the stage. Once in the spotlight you must prove you are a leader or, if you decide so, that you are able to gain the attention of your audience to emphasize something important that previously was not perceived as such. That is, speaking is an art; it is not about explaining a plot but about creating a story.


That headline should read: "33% of human judges flunk the Turing Test".


This looks awfully like a private event for Reading University that just happens to have been held at the Royal Society. I wonder whether they even had more than three non-"celebrity" judges.


Ah, Reading University - the powerhouse of computer science...

Move on people, it's just a cheap PR stunt.


Even though many AI researchers will agree that the Turing Test isn't a very good representation of "real" intelligence, this is still a huge milestone. Many, many researchers have tried and failed to pass the Turing Test.

But people will continue to dismiss the state of the art and deny that computers have "real" intelligence, the same way they did when the computer defeated Kasparov, the same way they did when we saw Google's self-driving cars, the same way they did when a computer won on Jeopardy, and now with the Turing Test. Even when we have robots that look and act exactly like humans, many people will say that they are not "really" intelligent and dismiss the accomplishment. They will still be saying that when AIs twice as smart as people arrive and they have to figure out what to do with billions of what will then be, relatively speaking, mentally challenged people.


>“I feel about beating the turing test in quite convenient way. Nothing original,” said Goostman, when asked how he felt after his success.

So how long until the creator can pass the Turing Test?

(Normally I'd ignore that, but given the subject matter...)


This doesn't seem like much of a milestone to me. If Ray Kurzweil wins this bet against Mitch Kapor, that will be a milestone: http://longbets.org/1/


Did they also run the experiment with an actual 13 yo kid?


...Imagine someone releasing bots on XBOXLive. Your task is to guess which obscenity-screaming 13 year old is real and which is a bot.

Some forms of Turing test are trivially passable with dumb enough humans.


How feasible is it to have hosted state of the art Turing bots available to converse with anyone?


Failed.. Just keep saying hello to it and imagine it is a real person.


You can watch the goal posts shifting for "AI" as we speak. Great result nonetheless!


I appreciate HN's rule about brief, non-sensational submission titles, but perhaps this one's taken it to the point of absurdism.


It was the submitter's title, not a mod edit, if that matters. The submitter did a good job, because the rest of the title ("marks milestone in computing history") is certainly linkbait and arguably misleading. Having it there would likely have made this thread a lower-quality controversy than it currently is. HN threads are surprisingly sensitive to initial conditions.

Edit: Oh, and considering how many fluff press articles are showing up about this [1], the submitter also showed exemplary taste in source selection. Yay, stevejalim!

1. https://hn.algolia.com/?q=turing+test#!/story/sort_by_date/0...


It seems the submitter has passed your Turing test. ;)


looks like some smegging marketing smeg for smeg-heads


Stop bitching. It's passed. Get over it.



