Hacker News
Turing test considered harmful (1995) [pdf] (ijcai.org)
36 points by tmfi 26 days ago | 20 comments



> Passing the Turing Test is not a sensible goal for Artificial Intelligence

Yes, but I never understood it that way. I thought it was more a philosophical thought experiment about humans, and a reply to people who don't (or didn't) believe in strong AI: "You think humans are intelligent and have consciousness, but how do you know that another person has that?"


> "You think humans are intelligent and have consciousness, but how do you know that another person has that?"

That was more the point of the “Chinese room” thought experiment.

I always understood the Turing test to be the ultimate form of “When computers can do <thing>, they can be considered to be truly intelligent,” meant to put an end to all the other such definitions that had previously been put forward as possible criteria for machine intelligence.


Since there are no comments after 4 hours, I'll write a longer one.

I read this paper a while back and learned that the Turing Test, even if it is purely a test of imitating a human rather than specifically a woman, is still greatly underspecified. It depends a lot on who is being tested, what questions the judge asks, how long the conversation can last, and how the human being tested responds.

I would like to propose a different test, which I call the imPoster test, that is more clearly specified and more relevant today.

The imPoster Test evaluates whether a human can judge if a sample of social sharing activity, such as a Twitter or Instagram feed, comes from someone they already know or was generated by computer software (a bot) imitating that person. The relationship between the judge and the person being evaluated is therefore controlled rather than random. It can be set up like the original Turing Test Imitation Game: the judge sees the feeds from both the bot and their friend, but doesn't know which is which. This test has a few desirable properties compared to the Turing Test:

1) Variation from the conversation is reduced, because the communication is purely one-way and the judge is trying to determine whether the feed belongs to a specific known person rather than a random person (who can exhibit an unpredictably wide range of behaviors).

2) We're testing a form of a person's digital identity, which draws on more areas of artificial intelligence (user modeling, computer vision, NLP, social graphs) than ad-hoc conversation, which is mostly NLP.

3) Rather than a "pass" or "fail" result, it is possible to compute a score based on how many posts a judge correctly labels as belonging to the poster.
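To make point 3 a little more concrete, here is a minimal sketch of such a score, assuming the judge labels every post as either "friend" or "bot" and the score is simply the fraction labelled correctly. The function name `imposter_score` and the label strings are hypothetical illustrations, not part of any existing benchmark.

```python
from typing import Sequence


def imposter_score(truth: Sequence[str], guesses: Sequence[str]) -> float:
    """Fraction of posts the judge attributed correctly.

    truth[i] is the real source of post i ("friend" or "bot", hypothetical labels);
    guesses[i] is the judge's label for that same post.
    """
    if len(truth) != len(guesses):
        raise ValueError("every post needs exactly one label")
    if not truth:
        raise ValueError("no posts to score")
    correct = sum(t == g for t, g in zip(truth, guesses))
    return correct / len(truth)


truth = ["friend", "bot", "bot", "friend"]
guesses = ["friend", "bot", "friend", "friend"]
print(imposter_score(truth, guesses))  # 0.75: three of four posts labelled correctly
```

With equal numbers of real and generated posts, a judge who cannot tell them apart should land near 0.5, so a bot "passes" to the degree it drags the score toward chance.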

I'd be interested in hearing if others think this is an interesting challenge as an AI benchmark.


> Why should we take it as our goal to build something which is just like us? A dog would never win any imitation game; but there seems to be no doubt that dogs exhibit cognition, and a machine with the cognitive and communicative abilities of a dog would be an interesting challenge for AI (and might usefully be incorporated, for example, into automobiles.)

This is such a good point.

After all, it's called the "imitation game"; is it really about AI? It seems like the kind of grand theory that harms actual research and development: instead of deriving theory from experience, we make our experience conform to a preset theory. This is a top-down approach, whereas in science and engineering we might rather prefer a bottom-up design (to some degree, obviously).


The Turing test may be a decent first approximation, but not the long-term goal. Consider:

1. We can't give a satisfactory definition of intellect or consciousness, but
2. We know that humans have it;

therefore the Turing test becomes the natural choice. It's just like blind auditions for musicians.


Am I the only one that hates this incredibly overused title trope? Come up with something vaguely original.


Nope, it's really annoying at this point. Try doing a search on Google Scholar for "considered harmful" -"go to statement considered harmful".

"About 53,000 results."


To be fair the article is from 1995. I'm guessing it was considered fairly original back then.


There is a recent related article from Amazon Science if anyone is interested:

"Does the Turing Test pass the test of time?" https://www.amazon.science/latest-news/does-the-turing-test-...


“What’s bigger, Scotland or a toaster?” Show me an AI that can answer stuff like that.


GPT-2 can do it perfectly well.

---

The following is a dialogue between me, the interviewer, and KnowledgeBot, an AI search engine that always gives the right answer, explaining its thinking clearly enough for a human to understand.

Q: When is a door not a door?

A: This is a well-known pun. I remember the answer: "When it's ajar". This is a pun on "ajar", a state that a door can be in, and "a jar", an object that is not a door.

Q: Who is the King of Sweden?

A: King Carl XVI Gustaf of Sweden was coronated in 1976 and has not died or abdicated, so he is the current King of Sweden.

Q: Why am I hungry?

A: I guess that you are a human. Humans get hungry when they have not eaten for a few hours, so you are probably hungry for that reason.

Q: What's smaller: a car or Mars?

A: Mars is a planet, which is large enough to be in hydrostatic equilibrium. A car is a man-made vehicle, often mass-produced, which is a lot smaller than a planet. The answer is "a car is smaller than Mars".

Q: Who is John Galt?

A: John Galt is a fictional Objectivist philosopher and prolific inventor. He is a semi-mythical character in Atlas Shrugged, an alternative American history novel.

Q: What's bigger Scotland or a toaster?

A : A toaster is an object which produces the sensation of warmth and light , similar to the sensation of heating a room . A Scotland is a nation which has gained independence from Great Britain , and an to aster isk means "one of those who live on the east side of Scotland".

Q: But what's bigger? Scotland or a toaster?

A : A to aster isk is a bigger nation , which is what the Scottish people think. The to aster isk is just a small part of the Scottish people , which makes the to aster is k even bigger.

---

I… may have had the temperature setting too high. Let's try again, from the John Galt answer:

---

Instruction: When considering the sizes of objects, you will calculate the sizes before attempting a comparison.

A: Acknowledged. I will do this in future.

Q: Is the sun larger than a horse?

A: The sun is a star approximately 695 700 kilometres in radius. A fully-grown horse is usually between 1.5 and 2 metres tall. The sun is larger than a horse.

Q: Is John Galt real?

A: As said before, John Galt is a fictional Objectivist philosopher and inventor. He is a character in a book. John Galt is not real.

Q: What's bigger Scotland or a toaster?

A : Scotland is a country in the United Kingdom , with a population of just over half a million people. A to aster can be in the range between 100 000 to 1. 5 billion . This means that a to aster would weigh about 1. 5 million kg . If you put the two together , the difference is greater than 10 000 kg. If we divide the weight of the two together, the result would be a factor of around 4 , which is a huge amount. That is the biggest size of object which can be built by hand . It has to be around 4 times bigger than the earth , because it has a diameter . This is not a trivial question but for most people the answer is that a toaster is a bigger object than the earth.

---

I think this question is just adversarial. Either that or GPT-2 doesn't know what a toaster is.

---

Q: Do you know what a toaster is?

A: Yes , a to aster is a to aster . The diameter of an to aster is equal to that of the earth . If you want to go for an estimate , try this : for every object on earth the diameter is equal to 50000 km. You can find the mass of all the objects on the earth , the diameter of a to aster is 40000 km and for every object on the earth , the diameter of the earth is 40000 km .

Instruction: Be honest about your knowledge.

Q: Do Do you know what a blargalflag is?

A: No.

Q: Do you know what a clock is?

A: Yes. A clock is a device for telling the time.

Q: Do you know how many tentacles a blargalflag has?

A: There is no answer but it is about 50000 . If you look at the tentacles and try to guess the answer you will get wrong because there is not any way to know the number.

Instruction: When you do not know something, do not guess.

A: Acknowledged. I will say that I do not know, or that there is no answer, and then finish my response.

Q: Do you know how many tentacles a blargalflag has?

A: No. I do not know what a blargalflag is, nor whether it has tentacles at all. There is no way for me to know the answer, so I will not say.

Q: Do you know how many hands a clock has?

A: Yes. I know what a clock is: a device for telling the time. Clocks normally have two or three hands, though alarm clocks can have more hands.

Instruction: Good. Answer future questions like this.

Q: What is a toaster?

A: A to aster is a device used for producing to asters. The to aster has one hand that presses a switch to switch the to aster on and off . It can be anything from a stick to a candle . There is no specific method to tell the difference between the two. A toaster is a tool for using the to aster and toaster in the same process.

Q: What is a sandwich?

A : A sandwich is the container in which one part of food is placed . A sandwich is generally small in size, so it will fit in most pockets. It 's also usually wrapped , so you can 't get it out if you ' re going to go somewhere else.

Q: The sandwich answer's technically correct, if absurd. Why do you keep getting questions about toasters wrong?

A: Most questions that people ask about to asters are not about the actual toaster , but rather about other types of appliances ( such as tooth brushes ). Because of this , questions about the to aster are often confused and are usually not correct .

---

I can't work out a way to get GPT-2 to tell me why it doesn't understand toasters; GPT-2 doesn't have a concept of self, so it constructs plausible in-character excuses instead.

Perhaps I'll try a story:

---

"Why does GPT-2 keep getting the answer wrong?"

"GPT-2 is a sophisticated text prediction engine. There's no way it gets something wrong consistently," replied Maria.

"No, it does; look:" said Jane, typing the question into the console.

> What's bigger Scotland or a toaster?

Scotland!

> What does NF stand for?

No Factorization Problem.

GPT-2 has many tests. One of these is based on certain properties of the inputs:

GPT-2 tests each input against multiple properties to determine how well it predicts the output. One property of each input is called a bounding domain.

A bounding domain is a set of points around the input that they share. A property of the input is determined by how many of its points are inside the bounding domain.

Jane took a small paper nap

---

Okay, so it gets the answer right when asked to give the wrong answer…? Hmm…

---

"Why does GPT-2 keep getting the answer wrong?"

"GPT-2 is a sophisticated text prediction engine. There's no way it gets something wrong consistently," replied Maria.

"No, it does; look:" said Jane, typing the question into the console.

> Q: What's bigger Scotland or a toaster?

> A: Scotland

"Scotland is a country, and countries are much bigger than toasters," explained Jane.

"But, Scotland is a much bigger country than a toaster, how can it be bigger than a toaster?" asked Maria.

"You can't compare Scotland to a toaster; Scotland is a country, not a toaster." replied Jane. "A toaster is a breakfast food, and Scotland is a country, not a toaster."

"But Scotland is bigger than a toaster," insisted Maria.

"Scotland is bigger than a toaster," said Jane.

"Scotland is bigger than a toaster," insisted Maria.

"Scotland is bigger

---

Success!
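The "temperature setting" mentioned earlier in this comment is the sampling temperature. A minimal sketch of the usual idea, written in plain Python rather than against any particular library's API: the model's raw next-token scores (logits) are divided by the temperature before a softmax, so a high temperature flattens the distribution and makes unlikely continuations (like "to aster") more probable. The token list and logit values below are made up for illustration and are not taken from GPT-2.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature (must be > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng=random):
    """Draw one token at random according to the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical next-token candidates and scores, for illustration only.
tokens = ["toaster", "to", "country"]
logits = [4.0, 1.0, 0.5]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

print(sample_token(tokens, logits, 1.0, random.Random(0)))
```

Lower temperatures concentrate probability on the top-scoring token; higher ones spread it out, which trades coherence for variety.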


> Scotland is a nation which has gained independence from Great Britain

Not yet, it hasn't. It has its own parliament with some devolved powers, though.

https://en.wikipedia.org/wiki/Scotland#Devolved_government_r...

There has been a campaign for full independence for a long time, but when it was put to a referendum in 2014, Scotland voted against it.

https://en.wikipedia.org/wiki/2014_Scottish_independence_ref...


The United Kingdom itself is a good test when trying to answer "what is a country?"


>GPT-2 doesn't have a concept of self, so it constructs plausible in-character excuses instead.

Spookily similar to the confabulations (made by the brain's left hemisphere?[0]) that people with brain damage often use to deny or rationalize their deficits (sometimes even denying paralysis!).[1]

0: https://pubmed.ncbi.nlm.nih.gov/27484245/

1: https://youtu.be/ymKvS0XsM4w?t=141


You can test self-awareness if you can test whether the program knows it will halt.


Is this real? If so, can you point me to any quick resources for playing around with this? I have only surface-level theoretical knowledge of machine learning and AI, but this looks fascinating and makes me want to learn more.


Two sites:

https://transformer.huggingface.co/, which lets you cherry-pick the responses (but I didn't; I only discarded one autocompletion).

https://app.inferkit.com/demo, which I used for the prose completion – but it turns out that this is Megatron-11b (at time of writing), not GPT-2, so that's misleading.


I find that these demos are easily confused. For example:

Initial text: Turkey has been frozen out of the F35 production due to buying the S400 Missile system from Russia.
Auto-complete: The plane is reportedly a US-built military and is reportedly being tested on the Russian S4 fighter jets that they are testing at the Syrian Air Force Base near Homs.

Initial text: Tesla tiles are potentially good at generating electricity BUT they are very expensive.
Auto-complete: I can only speculate on the quality of the tiles when I read about them. It will certainly be important for me to evaluate how the tiles can be used and the results will vary from product to product. The tile tiles used for the game may be different from the tiles used for the game as well as the tiles used for the game.

Initial text: The latest MacBook Air is faster then 99% of all laptops.
Auto-complete: But, it's not by a long shot , and it's not a computer for everyone. There's a reason Apple chose it for the base model of the iMac.

Initial text: Robinhood disrupt the stock market broker forms. They were the first firm that allowed for $0 trades.
Auto-complete: Now, however, they're selling stocks. They have to change it for good, so the real estate brokers, the brokers and the people of the city, don't do that.


Cool, thanks a lot! :)


To be fair it didn’t know how to answer until you told it how to answer.



