Is the Turing Test Dead? (ieee.org)
33 points by rbanffy on Dec 1, 2023 | 63 comments



Of course not. Turing's test isn't supposed to be quantified universally over examiners. The criterion is "does there exist a human examiner who can tell the difference between talking to a computer and the human".

I would argue that Turing's test has actually become less dead. It can now additionally be used to establish someone's qualifications to identify reasoned explanation. The computer isn't capable of engaging in it (yet), so being unable to tell if you're talking to a computer generally means that you are incapable of distinguishing bullshit from reasoning.


I'm just reminded of an old Invader Zim episode where Zim harvests and consumes (integrates) lots of human organs to "be more human." He ends up having like 5 livers, 20 spleens, etc. Just absolutely ridiculous.

The school nurse when examining him says "perfectly healthy with such plentiful organs."

That's where LLMs are at. "Perfectly human with such plentiful words."


It can now additionally be used to establish someone's qualifications to identify reasoned explanation.

The test subject in this case is now the person instead of the algorithm. Quite the paradigm shift I would say!


Yep, the original Turing test is adversarial: There's a computer subject trying to act human, a human subject trying to act human, and an examiner trying to tell which of the subjects is the human. If there exists an examiner who can distinguish the subjects with probability non-negligibly different from 0.5, then the computer fails the test.
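
Concretely, a rough sketch of that protocol in Python (examiner, human and machine here are hypothetical stand-in callables, not anything specified in Turing's paper):

  import random
  from statistics import NormalDist

  def run_trials(examiner, human, machine, n=200):
      # Each trial: show the examiner one human and one machine transcript
      # in random order; the examiner returns the index it believes is human.
      correct = 0
      for _ in range(n):
          pair = [("human", human()), ("machine", machine())]
          random.shuffle(pair)
          guess = examiner(pair[0][1], pair[1][1])  # 0 or 1
          correct += (pair[guess][0] == "human")
      # Two-sided z-test against chance (p = 0.5): an accuracy non-negligibly
      # different from 0.5 means this examiner can tell the subjects apart.
      p_hat = correct / n
      z = (p_hat - 0.5) / (0.25 / n) ** 0.5
      p_value = 2 * (1 - NormalDist().cdf(abs(z)))
      return p_hat, p_value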


In addition, to paraphrase my friend, the Turing test was dead when computers could reliably tell the difference between humans and computers. It was utterly useless as a measure before, but now it is genuinely alive.


My perspective is that the Turing Test is now obsolete. Two months ago Peter Norvig and a friend of his wrote an article arguing that the goal posts have moved in measuring AI. I have been a paid AI practitioner since 1982, and when I watched my niece’s family last night passing around my iPhone running ChatGPT Pro, just the looks on their faces as they were having spoken conversations on all manner of topics, and asking for images to be generated and refined, I felt that AI is here.

LLMs remind me of Plato’s metaphor of humans only seeing the shadows on a cave wall, not true reality. LLMs are trained on text and images and are disembodied. They learn from an information lossy projection, but still an interesting “reality.”

We are likely to get real AGI when AIs are embodied, plus a few new technologies and optimizations to run cheaply on edge devices.


I just found the words to express that people have really conflated AI with the singularity. I agree with you that AI is here. And that doesn't mean Skynet. It means that you can converse with a reasonably intelligent partner who adds to the shared pool of ideas. Just like another person does.

People think when AI gets here, and if it's even a little smarter than humans, it will compound over time, make itself smarter, and rule over us all. That ignores that humanity is made up of intelligent people, some of whom are smarter than others, and even organizations (and technologies, like books) that allow our ideas to persist past our individual lifetimes. And yet, no organization has come to dominate the world. Why would an AI?


It feels pretty close to the ship computer in the original Star Trek. It wasn’t really an AI in the sense of being sentient, but rather a verbal natural language interface to a computer capable of running all sorts of analyses and ship operations. If it ran into something outside of its knowledge base it’d just say I don’t know and shrug.


> If it ran into something outside of its knowledge base it’d just say I don’t know and shrug.

This is something LLMs are yet to master.


> And yet, no organization has come to dominate the world. Why would an AI?

I think the general point is valid (that AI is an intelligent partner rather than Skynet) but organisations do dominate the world, especially the parts of it (including older organisations) that don't employ smart people or newer technologies. There is a view that organisations are a form of slow AI executing complex business / legal processes (traditionally based on people power).


> There is a view that organisations are a form of slow AI executing complex business

Multi-national companies do act a lot like hiveminds.

There are individual human beings directing at the top, but only because they comply with the hive's interests. Boards act as an averaging mechanism that seeks to maximize self-preservation and growth.

People who lack the anti-capitalist instinct to question why a company would want to grow beyond its positive impact on society tend to assume that these companies exist to benefit society. But once something has got big enough to become self-sustaining, and no longer dependent on individual patches of land, it's only natural to see this pattern repeat.

I don't really see what's artificial about it, though.

Thousands of human beings clicking away on interconnected computers just seems pretty intelligent.

Adding LLMs to the mix seems like a way to boost the people who already have a function in an organisation without having to scale with people; for any task where LLMs can make a person better, your organisation will grow in proportion.


> Multi-national companies do act a lot like hiveminds.

Corporations are AIs. They are just much slower at making decisions than single humans. They also limit competition against themselves by consuming human time.

We are at the stage where we give an AI a limited, carefully described, goal and it fulfills it based on patterns it learned from human codified knowledge. At some point, not that far from now, there will be AIs that are capable of selecting goals based on more generic directives and optimize from there better than any human could do. At that point, where motivations become incomprehensible to humans, we can say we have something like a superintelligence.

It'll be interesting.


> And yet, no organization has come to dominate the world. Why would an AI?

Depends how smart, how many instances are running, and if those instances are aligned with each other.

And if you count the British empire as world domination. (It wasn't in charge everywhere, but it was dominant; two AI empires fighting each other in the same way the British Empire fought its peers still isn't good for the rest of us).

A fully general IQ 100 AI on commodity hardware radically changes the world economy, and yet isn't likely to make many breakthroughs. An army of robots guided by an equal number of such minds could well be on par with a human army of similar number and let a dictator fight a brutal war — even a smaller army in the form of assassin bots (which don't have to be humanoid, you can have an explosive kamikaze pigeon) is a radical change to combat on both a nation state level and also for non-governmental actors (organised crime, terrorism, vigilantes, paramilitaries, police, private security).

But there are also potential surprise failure modes. A group of replicated minds, no matter how smart, are going to have correlated failures. Figure out how to make one of those soldiers surrender, and you know how to make all of them surrender. Make them smarter and have them do R&D or science: instead of bouncing ideas off each other, they all get stuck at the same place. You'll speed up research, but it won't look like an unbounded number of skilled researchers, it will look like one researcher with unbounded time.

(Not that I'm expecting IQ to generalise from humans to AI. Hasn't done so far, after all).

Also, an AI with no qualia which is really good at only one thing might become dominant just because what it does plays into a prisoners' dilemma where humans defect often enough ("enough" being a standard which depends on the details, e.g. we don't need many humans choosing to release CFCs to cause environmental issues).


In a way our societies are already super intelligences, honed by natural selection and with their own incentives working on top of, and not necessarily aligned with, the desires of each of its constituent human apes. AI will just seamlessly blend into that, I think.


Their goals are rather simple. While they are immensely powerful, they aren't much smarter than a fungus.


The smarter things have indeed dominated the world. It's not like chimps have much of a say in what direction the world goes in. The same is actually true of groups of people: find me two groups of people of similar size where one is on average smarter than the other, and I'd expect the smarter one to have far more dominance over the world.


+1 well said


> Two months ago Peter Norvig and a friend of his wrote an article arguing that the goal posts have moved in measuring AI.

I think this is probably the one?

https://www.noemamag.com/artificial-general-intelligence-is-...

If it's another, could you provide a link, if you happen to have it at hand? Thanks!


Yes, thank you, that is the article.


> LLMs are trained on text and images and are disembodied.

Are we sure we aren't the same?

In fact, I'd argue that we "humans" have no way to tell whether we're actually human, or whether we're inside a simulation that's using simulations to train human intelligence.


At a fundamental level we are just fluctuations in a quantum field, which causes emergent conditions for the fluctuations to be able to consider how they exist in the first place. We see light, but our eyes are lied to: there are far more wavelengths than we can see. We hear sound, but our ears are likewise lied to: someone could be screaming at you at 60 kHz and you'd never pick it up.

If reality is merely what we can experience, we miss out on a lot of what reality truly is because we view it through the lens of what we believe it to be based on our flawed biology and pre-existing biases.


Not quite. We have instruments much better than our senses to add to that experience.


True, so we get a slightly clearer view of reality, but it's still limited by what we can test for and find by experimentation.


To know everything would require a brain bigger than the universe.


The Turing test can be thought of as having strong and weak versions. The strong Turing test establishes a ceiling past which, philosophically speaking, a candidate is indistinguishable from a person. To pass the strong Turing test, you need a candidate who can convince the most knowledgeable scientist of their humanity, as well as tell the most discerning 5-year-old a compelling bedtime story. The strong Turing test is hard to beat, but just about every conscious human being can do it.

The weak Turing test establishes a floor below which we're pretty sure the intelligence on the other side is not a person. CAPTCHA did this nicely for a long time. False negatives aren't uncommon (ever been distracted or tired enough to miss clicking on a stoplight or fire hydrant?). Weak Turing tests are useful for practical reasons. Any quick and dirty "test" is going to be a weak Turing test.

The strong Turing test remains unconquered for now. It remains useful, because its use is primarily philosophical. Once it's been solidly beaten, Pandora's box will be fully open on artificial people.


I think the strong version is probably impossible to beat, because even if the machine has all the same capabilities as the human, it will probably give away its nature in one inconsequential way or another, just as a foreign person usually gives away their foreign nature with some slightly unusual manner of speech.


> The strong Turing test is hard to beat but just about every conscious human being can do it.

Not a chance. Unless you're redefining large swaths of humanity to not be human. "A compelling bed time story" is something an LLM can do better than most humans.


> The strong Turing test remains unconquered for now. It remains useful, because its use is primarily philosophical. Once it's been solidly beaten, Pandora's box will be fully open on artificial people.

"Artificial"..?


The Turing test was passed by "Dr." ELIZA back in the '70s, revealing not how to model intelligence but how keen humans are on fooling themselves.


No it wasn’t! The Turing test is not “can at least one be fooled”, it’s “can an expert judge distinguish between a human and an AI”. Please read Turing’s paper, it’s very readable and is quite clear on this point


Here is a beautiful 2 minute clip demonstrating this point and Eliza itself:

Before Siri and Alexa, there was ELIZA

https://www.youtube.com/watch?v=RMK9AphfLco

ELIZA is an early natural language processing computer program created from 1964 to 1966 at the MIT Artificial Intelligence Laboratory by Joseph Weizenbaum. Created to demonstrate the superficiality of communication between man and machine, Eliza simulated conversation by using a 'pattern matching' and substitution methodology that gave users an illusion of understanding on the part of the program, but had no built-in framework for contextualising events.
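
For anyone curious what "pattern matching and substitution" means in practice, here's a loose sketch of the idea in Python (an illustration only, not Weizenbaum's actual DOCTOR rule set):

  import random
  import re

  # Each rule: a keyword pattern plus reply templates that reflect a captured
  # fragment of the user's input back as a question.
  RULES = [
      (re.compile(r"\bI am (.*)", re.I),
       ["Why do you say you are {0}?", "How long have you been {0}?"]),
      (re.compile(r"\bI feel (.*)", re.I), ["Do you often feel {0}?"]),
      (re.compile(r"\bmy (.*)", re.I), ["Tell me more about your {0}."]),
  ]

  def eliza_reply(text):
      for pattern, templates in RULES:
          m = pattern.search(text)
          if m:
              return random.choice(templates).format(m.group(1))
      return "Please go on."  # content-free fallback when nothing matches

  print(eliza_reply("I am unhappy"))  # e.g. "How long have you been unhappy?"

No model of the world anywhere, just keyword triggers and canned reflections, which is why the illusion breaks down after a few exchanges.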


Anyone using Eliza for more than a couple of minutes will quickly pick up on its tricks. Even in their cherry-picked example, it barely makes it a few messages in before the illusion starts to shatter with awkward lines like "Do you think coming here will help you to not be unhappy?". It might pass the Turing test once or twice by sheer luck, but it's clearly not competitive with a human conversing in general.


Sure. With today’s eyes it is obviously a primitive computer program. But back then it was convincing to the point where people suggested replacing real human psychologists with it.

Likewise, I think that 50 years from now, present-day ChatGPT will look terribly primitive and very unconvincing.


> In the Turing test—which Turing himself originally called the “imitation game“—human participants conduct a conversation with unknown users to determine if they’re talking to a human or a computer.

No. They talk to A and B, one of which is a computer, the other a human to determine who is who.


This is important, because it forces a head-to-head comparison. Otherwise, what knowledge the judge has about the abilities of AI matters a lot.

A well-informed 2023 person talking to ChatGPT will quickly establish that it's not a human, but I'm pretty sure a 1950s person doing the same would swear it was human, if an odd human, because they couldn't conceive of a computer being that fluent. But force them into this head-to-head scenario and they will make the right choice.


In the head-to-head comparison, what is the goal of the human talking to the judge? Are they trying to fool the judge into thinking they’re the AI? Or are they trying to make it easier for the judge to spot the AI?

Because this really matters. If you put me up against ChatGPT I’m going to talk briefly and informally, with lower case text. I’ll answer their questions succinctly without much elaboration. And I think most judges would then easily spot ChatGPT due to its formality and perfect grammar.

On the other hand, there are chat bots designed to pass the Turing test which speak informally and even ignore questions. They tend to fool people into thinking they’re a bored teenager.


> Or are they trying to make it easier for the judge to spot the AI?

The latter. Both A and B (human and AI subjects) are trying to convince the judge that they're the human.


Turing intended his test more as an intellectual argument than a practical test. Recognising that the question "can machines think?" is problematic in that it depends on your definition of those words, he thought it clearer to ask whether a machine could be told apart from a human. From his paper (https://redirect.cs.umbc.edu/courses/471/papers/turing.pdf):

>Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, "Can machines think?"

As a thought experiment I don't think it will ever be dead. As a practical test of human equivalence, I don't think it was ever thought out very well. Writing over 70 years ago, Turing didn't really get into the practical details, though you could invent a modern version.


The Turing test is never dead: just deploy the current best chatbot as a remote consultant in a technical field and see how well it works. This is a Turing test.


As I understand it, the Turing test doesn't require the computer to convince people it's a particularly smart human.

After all, most humans, when asked to explain Shor's algorithm in iambic pentameter, would just say no.


Alternative version: start a long-distance relationship with a chatbot and see if it is engaging.


People are already doing this

https://news.ycombinator.com/item?id=38445578

Some are spending 5+ hours a day on Character.AI chatting. I don't know what they're chatting to but I assume some chats follow a relationship format

https://www.reddit.com/r/CharacterAI/comments/14tfwue/screen...


So the Turing Test is not dead, we just need to redefine what it entails? That sounds like moving the goalposts.


The Turing test lets the examiner have whatever conversation he wants with the machine. So the examiners can ask to solve math problems, recognise shapes, interpret poetry, perform whatever task. It doesn't matter either if the machine fails or succeeds at the task, it matters if it fails or succeeds in a way that is recognisably different from humans. And if some new task comes up that the machine has a hard time with, it is only natural that the examiners are going to ask that particular task. So yes it is completely allowed to move the goalpost in the Turing test, that's why it is a hard and interesting test to pass.


Well yes, the Turing test has to be defined exactly. That is not moving the goalposts, because I never considered the Turing test to be a quick 10-minute chat.

I am sure that Turing, in his original publication, specified the conditions under which a Turing test should be performed. That was over half a century ago, though. What are we more interested in when talking about the concept of a Turing test: the exact version that Turing described, basically v0, or a test which captures the concept that AGI can be tested via an exchange of language? I think the latter is the much more useful and fundamental concept. The hypothesis that AGI can be tested purely via language has not been disproven for Turing tests of sufficiently large complexity/length. Hence I find it most useful to think of the Turing test as a family of tests that have in common that they are a purely language-based conversation, but that may be held on different time, complexity, or adversariality scales.

Why would I interpret the definition like this instead of sticking to the original paper? Because the article here makes the fallacy that the v0 Turing test not detecting AGI implies that "AGI can be tested via language" is in question. This is just not true.


The Turing Test was never anything but a thought experiment, developed when computers were in their infancy, and has been widely criticized by philosophers and AI researchers.

It should never have been taken as seriously as it has, just as Asimov's Three Laws of Robotics should never have been taken seriously. Both presuppose rigidly and objectively defined values for, respectively, self-awareness and morality, where none exist. If anything they reveal more about human intellectual bias, popular cultural assumptions (and possibly hubris) than they do about AI.

The goalposts must be moved because we still don't even know what the nature of the game is.


From the linked paper in the article:

>If it relies instead on some sort of deep learning, then the answer is equivocal—at least until another algorithm is able to explain how the program reasons. If its principles are quite different from human ones, it has failed the test.

Why would we expect our reasoning methods to be the only or even the most efficient? If it can get to the correct conclusions, shouldn't that be what matters?


> Why would we expect our reasoning methods to be the only or even the most efficient? If it can get to the correct conclusions, shouldn't that be what matters?

This has epistemological roots.

Knowledge is commonly defined as true justified belief. If I correctly predict that in exactly 10 years from now, it will be a rainy day, that is not knowledge, as while my belief ended up true it was not justified.

Ultimately it is very easy to make correct predictions, you just need to make a lot of them, which is why whenever someone makes successful predictions, we scrutinize their justifications.


> Finally, researchers would take a look under the hoods of the machines to determine whether the neural networks are built to simulate human cognition.

This doesn’t make any sense. We don’t inspect boats to see if they simulate human swimming patterns.


> Ukrainian teenager named Eugene Goostman seemed to put one of the first nails in the Turing test’s coffin by fooling more than one-third of human interrogators into thinking they were talking to another human

So the first nail in the coffin is a bot who failed to convince most people it was a human? This was during a 5 minute conversation, so like 3 or 4 short messages back and forth.

If there's anything dead about the Turing Test, it's people's willingness to try it. Put 2 people and 1 AI in a chat together, tell them to identify the AI, give them an hour, give them a small reward if they are correct.


Poor Turing must be rolling over in his grave. The so-called Turing test was common before Turing wrote his rebuttal, calling that test little more than a Gallup poll (i.e., just a subjective opinion). He proposed an alternative test, the real Turing test, that could provide statistical evidence. In his test, observers tried to distinguish between a woman and a man pretending to be a woman. This was repeated as often as you wanted to gain statistical significance. After that, a machine is substituted for the man and the game is repeated as before. Turing claimed that if the machine and the men had the same success rate, you should conclude the machine was thinking.
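
If you wanted to run that comparison numerically, one way to phrase it is a two-proportion z-test on how often each subject fools the interrogator (just a sketch; the numbers in the usage line are made up):

  from statistics import NormalDist

  def same_success_rate(fooled_by_man, trials_man, fooled_by_machine, trials_machine):
      # Pooled two-proportion z-test: do the man and the machine fool the
      # interrogator at statistically indistinguishable rates?
      p_man = fooled_by_man / trials_man
      p_machine = fooled_by_machine / trials_machine
      p = (fooled_by_man + fooled_by_machine) / (trials_man + trials_machine)
      se = (p * (1 - p) * (1 / trials_man + 1 / trials_machine)) ** 0.5
      z = (p_man - p_machine) / se
      return 2 * (1 - NormalDist().cdf(abs(z)))  # large p-value: no evidence the rates differ

  # e.g. the man fools the interrogator 34/100 times, the machine 30/100 times
  print(same_success_rate(34, 100, 30, 100))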


At the time it was a necessary line in the sand so researchers could just get on with it. Eliza successfully passed it in the 1960s. At best, it's obsolete. At worst, it focused the field on behaviorism rather than on a rigorous definition of AI.


A psychological test to see how strongly the profile matches human emotional-cognition? Can we call it the Voight-Kampff Test?


A vote for yes is either a vote for:

"I cannot tell the difference"

or

"I would not find it notable that I could not tell the difference"


Maybe we should not move the goal posts and accept chatbots as fellow humans.


Or we could dispense with the myth that the Turing Test has ever been any kind of "goal post" to begin with.

It's like Schrodinger's Cat, it was a pretty minor and unprincipled thought experiment, that a major thinker in the field made in a fairly informal manner. The public perception of its importance and its level of prominence in pop culture around the topic has always far exceeded its actual importance.


> Or we could dispense with the myth that the Turing Test has ever been any kind of "goal post" to begin with.

Well, it definitely was a goal post for many years—pretending it wasn't doesn't seem like a good idea.

On the other hand, now that we have chatbots that can pass it, its flaws as a test have become apparent. And as you say, it's informal, and underspecified in a way that wasn't clear when it was defined. But I also don't think it's much of a surprise that "passing" it doesn't actually mean what Turing thought it would mean back in 1950. We didn't have a good idea of what parts of intelligence are "easy" versus what parts are "hard".


> Well, it definitely was a goal post for many years—pretending it wasn't doesn't seem like a good idea.

In the sense that people treated it as one, sure. Not in the sense that it would ever have actually been any kind of important threshold for "reaching AGI".

A goalpost that's based mainly on popular perceptions from bad sci-fi probably should be moved. "Moving the goalposts" just doesn't work as an accusation here, is my point.


The Turing Test was one of the goal posts for establishing intelligence, I don't think it was one for establishing humanity.

I mean, at least some murderers are intelligent. But without an acceptable moral compass, does intelligence alone suffice to qualify them as "fellow humans"? I would say no.


Please ELI5:

  - Is AGI if a computer meets the intelligence of any human?
  - Or of an average human?
  - Or of the most intelligent human?
  - Or of cumulative total of all humans?


Average human.


The term gets used in different ways, but Wikipedia has it as being able to do any task humans can, so an above-average human, I guess.



