Giving GPT-3 a Turing Test (lacker.io)
453 points by DavidSJ 71 days ago | 234 comments



Nic Cammarata on Twitter pointed out that if the prompt gives GPT-3 permission to indicate questions are ridiculous, it'll do so reliably:

https://twitter.com/nicklovescode/status/1284050958977130497


More generally, the best commentary I've seen on GPT-3 comes from following @gwern and keeping an eye out for whoever else they frequently interact with.

https://twitter.com/gwern/with_replies

Most other commentary I've seen either goes too far into reactionary skepticism, too far into to-the-moon style hype, or just straight-up gets stuff wrong by not being aware of some technical detail (importance of prompt design, BPE limits).


At least it is witty:

Andrej Karpathy @karpathy

> By posting GPT generated text we’re polluting the data for its future versions

John Carmack @ID_AA_Carmack

> Was just about to say that... Old printed books will be the equivalent of the sunken ships they cut up for uncontaminated steel today.


> Old printed books will be the equivalent of the sunken ships they cut up for uncontaminated steel today.

It's not so much contamination, but value drift.

Did you ever try reading Shakespeare without commentary? So many things of that language are no longer colloquial. Maybe generated text can recover that and train us.

Or imagine the "dark tongue" from Paul Kingsnorth's "The Wake" to have been real: we've come quite a way in expressiveness.


>Old printed books will be the equivalent of the sunken ships they cut up for uncontaminated steel today

Already are.


What are modern books contaminated with?


That’s true. In general, if you customize the prompt for a specific type of question, GPT-3 will be better able to handle that type of question. Prompt size is limited, though, so if you are attempting to create a general-purpose chatbot out of GPT-3, you won’t be able to handle every archetype of question in the prompt.
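A minimal sketch of what such a prompt might look like, assuming a plain "Q:/A:" few-shot format. The `build_prompt` helper, the example questions, and the character budget are illustrative assumptions, not the prompt from the linked tweet or the article; "yo be real" is the refusal phrase discussed elsewhere in this thread. The budget check mirrors the point that only so many question archetypes fit.

```python
# Hypothetical few-shot examples: two ordinary questions and two nonsense ones.
# Answering nonsense with a fixed phrase is what "gives permission" to refuse.
EXAMPLES = [
    ("How many legs does a horse have?", "Four."),
    ("How many eyes does a foot have?", "yo be real"),
    ("What is the capital of France?", "Paris."),
    ("How do you sporgle a morgle?", "yo be real"),
]


def build_prompt(question, examples=EXAMPLES, max_chars=2000):
    """Assemble a Q/A prompt, dropping the earliest examples if over budget.

    max_chars is a stand-in for the model's real limit, which is counted
    in tokens rather than characters.
    """
    tail = f"Q: {question}\nA:"
    kept = list(examples)
    while True:
        shots = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in kept)
        prompt = shots + tail
        if len(prompt) <= max_chars or not kept:
            return prompt
        kept.pop(0)  # drop the earliest example to make room


print(build_prompt("How many rainbows does it take to jump from Hawaii to seventeen?"))
```

With a tight budget the nonsense examples can be the first to go, which is one way a general-purpose chatbot ends up answering ridiculous questions with a straight face.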


Came here to say this. In the prompt there was no example of rejecting a question for being nonsensical.


Humans don't need a prompt to do this.


Children often answer questions nonsensically. They are prompted over years to be serious and answer questions correctly.


I am not sure that's the same.

Children answer nonsensically to normal questions, but if you ask a nonsensical question they are more likely to go "uh?" than to answer with logically built follow-up nonsense.


What would the ethics be of using children in Turing tests? If you asked a human child to answer Turing test questions, would the judges think of them as a computer?


Yes, they are trained over time, just as this model has been.


Humans have much more context at their disposal than a simple prompt.


That depends on context. Nobody would answer "this question makes no sense" in an exam; instead, most people assume they lack knowledge of what's being referenced. Even humans have to expect the possibility of nonsensical questions.


"None of the above" is sometimes the right answer on exam questions.


So think of this as adding a "none of the above" option.


Humans don’t need AC power to operate either, but presumably these aren’t the kind of differences we’re interested in.


It depends whether you think the other party would prefer for you to play along or call them out on their BS. Between friends we generally prefer the latter.


True. Maybe the human analogue of this is sarcasm. Lots of people fail to pick up on sarcasm in online discussion, hence the /s at the end of a post. And for that matter, natural language parsers really struggle to extract real meaning from sarcastic text too.


Thanks for the tip.

I incorporated this into the GPT-3 based search engine I'm hacking right now: https://twitter.com/paraschopra/status/1284905727388028928


What does ‘yo be real’ stand for?


"be real" for being realistic, asking a better question that is not nonsense.

Loosely translated: "yo be real" = "Hey, please be more realistic with your future questions"


Ah sorry I thought it was some technical thing!


"be real" when spoken by humans means "be yourself" not "be realistic".


Depends on the context. But without context I’d read it as “be honest” or “tell the truth” first. So in the context of answering a question it pretty clearly means “ask a question I can realistically answer”.


Out of curiosity how old are you?


It's an Americanism.



Wow, I wonder if it would make a good bullshit detector?


I'm sure I'm not the only one who knows people that can't say "I don't know", to the point they're not far off from the first example.


A great demonstration of why in-distribution learning is not enough for AGI.

The largest GPT-3 model "GPT-3 175B" is trained with 499 Billion tokens (one token is maybe equal to 4 characters in text).

The human reading/talking/listening equivalent of 200 pages of text per day for 80 years would be just 13GB of raw data, or about 3B tokens. You could also make an estimate using 39 bits/s as the normal rate at which humans can absorb information and get the same order of magnitude.
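The back-of-the-envelope estimate above can be checked with a few lines of arithmetic. This is only a sketch: the characters-per-page figure and the "absorbing information around the clock" upper bound are assumptions added here, not numbers from the comment.

```python
# Inputs stated in the comment above.
PAGES_PER_DAY = 200
YEARS = 80
CHARS_PER_TOKEN = 4
GPT3_TRAINING_TOKENS = 499e9
BITS_PER_SECOND = 39

# Assumptions made for this sketch (not from the comment).
CHARS_PER_PAGE = 2_200       # roughly one byte per character of plain text
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 3600  # upper bound: absorbing information around the clock

# Estimate 1: lifetime reading volume.
lifetime_bytes = PAGES_PER_DAY * DAYS_PER_YEAR * YEARS * CHARS_PER_PAGE
lifetime_tokens = lifetime_bytes / CHARS_PER_TOKEN
ratio = GPT3_TRAINING_TOKENS / lifetime_tokens

# Estimate 2: lifetime volume at a fixed information rate.
bitrate_bytes = BITS_PER_SECOND * SECONDS_PER_DAY * DAYS_PER_YEAR * YEARS / 8

print(f"reading estimate: {lifetime_bytes / 1e9:.1f} GB, {lifetime_tokens / 1e9:.1f}B tokens")
print(f"bit-rate estimate: {bitrate_bytes / 1e9:.1f} GB")
print(f"GPT-3 saw roughly {ratio:.0f}x more tokens")
```

Under these assumptions both routes land near 13 GB, and the training set comes out at somewhere over a hundred human lifetimes of text.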

It's not completely wrong to say that we are figuring out how far interpolation from data can go as a learning method. Even the most advanced deep learning is in-distribution learning and system 1 thinking (Kahneman's term). Just run through data until the model can interpolate accurately between data points.

We must figure out how to learn models that allow out-of-distribution learning, or these models will fail the Turing test after a few questions.


I don't think it makes sense to compare human learning to GPT-3 learning: it's a fundamentally different process. The human brain doesn't get just tokens, but also other sensory data, particularly visual.

So I don't think that you can conclude that humans learn more efficiently based on just quantity of data.

It's also worth noting that GPT-3 is trained to emulate _any_ human writing, not just some human's writing.

For an actual Turing test one might fine-tune it on text produced by one particular human, then you might get more accurate results.


> I don't think that you can conclude that humans learn more efficiently based on just quantity of data.

You are correct. My main argument was that in-distribution learning is not enough. You can't fix that problem with more data, as many responses to my comment seem to assume.

I think out-of-distribution learning and small data requirements are connected. If an agent can understand a concept separately from the chain leading from sensory inputs, it can understand what it's doing in a novel situation even without examples.


Elements used in GPT-3 are capable of transformations such as abstraction (e.g. separate the structure of syllogism from concrete nouns) and logic (ReLU can directly implement OR, AND, NOT which is sufficient to do arbitrary logic).
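As a small illustration of the ReLU claim, here is one possible construction of the three gates on inputs restricted to 0 and 1. The weights and biases are chosen by hand for this sketch; nothing here says a trained model uses these particular values.

```python
from itertools import product


def relu(x):
    return max(0.0, x)


# Each gate is a linear combination of ReLU units over inputs in {0, 1}.
def logic_not(a):
    return relu(1 - a)


def logic_and(a, b):
    return relu(a + b - 1)


def logic_or(a, b):
    # relu(a + b) counts the active inputs; subtracting relu(a + b - 1) caps it at 1.
    return relu(a + b) - relu(a + b - 1)


def logic_xor(a, b):
    # Composition of the gates above, i.e. a deeper stack of the same units.
    return logic_or(logic_and(a, logic_not(b)), logic_and(logic_not(a), b))


for a, b in product([0, 1], repeat=2):
    print(a, b, logic_and(a, b), logic_or(a, b), logic_xor(a, b))
```

XOR needs the gates to be stacked, which is a toy version of the point that more layers allow more elaborate inference.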

We can see that it actually uses abstractions and logic in some cases.

E.g. "Bob is a frog. Bob's skin color is ___". Even small GPT-2 models can relate "Bob" to the concept "frog" and query the "skin color" attribute of "frog". Even basic language modeling requires inference, and GPT-x can perform inference using transformer blocks.

With more layers it can go from inferring the meaning of words to doing inference to solve problems.

But the inference it is able to do is limited in scope because of the structure of a language model -- each input token must correspond to one output token. So the model can't take a pause and think about something, it can only think while it produces tokens.

Here's an absolutely insane example of embedding symbolic computation into a story, which lets GPT-3 break the computation into small steps it can handle. Intermediate results become part of the story: https://twitter.com/kleptid/status/1284069270603866113

https://twitter.com/kleptid/status/1284098635689611264

So I guess one can make a model which is much better at thinking simply by training in a different way or changing the topology. But the building blocks are good enough.


The representations GPT-3 learns are flexible. It can learn very complex tasks, including logic and simple arithmetic. The issue is GPT-3's learning algorithm and its learning capability.

Typically, the ability to learn << the ability to represent. Universal approximation theorem type results don't say anything about learnability. GPT-3's abilities quickly fade once it gets outside the distribution it was trained on.


> You can't fix that problem with more data, as many responses to my comment seem to assume.

From my reading, you've asserted this to be true, but haven't given any more support as to why than the people who have asserted it to be false.

GPT-3 isn't a human brain. I want to understand your argument as more than "planes can't fly because they don't flap their wings".


Yeah that’s what I was thinking when he started with the nonsensical questions, like “how many eyes does a foot have?” No (non-blind) human would need language input to learn this fact. Makes me wonder if anyone is working on large-scale architectures that can ingest multiple types of data and correlate them to make predictions.


Interesting to consider tho...there must be some sort of power threshold, above which all our questions can be merely "interpolated" from data. Right now 499B tokens is not it. But there must be, I guess, some upper limit, within which all human knowledge expressible through language can be contained and conversed upon using this method. Pretty scary to think about...that at that point, when it's 10^n tokens or whatever, we would be unable to detect if it understood or not.

Even more scary, what if our brains are simply above that power limit in their ability (if they do that) to interpolate. And what if we don't really understand anything (but, just like vision, our brains provide us the comforting sensation that we really get it), but simply are in possession of wetware/quantum computers that can interpolate better than we can poke holes in it.


There's no evidence that human intelligence is anything more than associative memory and very elaborate multi-level pattern matching. IMHO any cognitive task can be described as a combination of associative lookups, domain transformation and trial-and-error stateful processing. I don't see any reason to believe there's more to intelligence/consciousness than a combination of such elements/processes.

I guess domain transformation is something which is hard to visualize. But it was visualized in style transfer: there are NNs which can decompose a picture into a subject and style and then take the same subject and apply a different style.

Same works with text -- you can take a sentence and rewrite it to be in Victorian epistolary style. Or take a story about humans and rephrase it to be a fairy tale with animals. GPT-3 can also do this. This means it's possible to take a sentence and decompose it into different layers, then manipulate layers individually and reassemble.


Then you probably don't know what 'consciousness' means, because none of these processes give an explanation as to why or how qualia arise.

Also, "trial-and-error stateful processing" is so vague and broad that I don't feel that it meaningfully describes anything more than 'computation'.


There's no evidence that qualia are anything more than clusters within some vector space which is useful to describe sensory inputs. They arise because they are useful for making sense of the external world.

I know that philosophers like to believe in some mystical bullshit about a whole different category of things, but if you believe in evolution, things are simple. An animal receives a sensory input, and it tries to use it to improve its survival chances. It usually makes sense to tell signal from noise by clustering and transforming it. You cannot derive any useful information from a single vibration of air, but if you transform from the time domain to the frequency domain, you can observe that certain frequencies relate to information about the outside world, such as the presence of predators, etc. If you do several layers of such transformations you arrive at qualia.

If you study signal processing and NNs, these things become obvious. If you give a signal processing guy a task to detect human speech, for example, he will filter frequencies, then estimate loudness and compare to background noise. If you train NN, it will do the same -- you will likely get a neuron which represents "loudness in human speech". Same if you train a computational process using evolutionary process: loudness in specific frequency range carries out useful information, so no matter what process you use, you will have some representation of this quality.
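The hand-built detector described above (transform to the frequency domain, keep a band, compare loudness against background) can be sketched in a few lines. The band edges, the threshold ratio, and the pure-tone test signal are all assumptions made for illustration; this is not a real speech detector, and the naive DFT is only suitable for short signals.

```python
import cmath
import math


def dft_power(signal):
    """Naive DFT; returns power per frequency bin for bins 0..N/2."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


def band_power(signal, sample_rate, low_hz, high_hz):
    """Total power in the bins whose centre frequency lies in [low_hz, high_hz]."""
    n = len(signal)
    return sum(
        p for k, p in enumerate(dft_power(signal))
        if low_hz <= k * sample_rate / n <= high_hz
    )


def looks_like_speech(signal, sample_rate, background_power,
                      band=(100.0, 400.0), ratio=4.0):
    """Hypothetical rule: in-band loudness must clearly exceed the background level."""
    return band_power(signal, sample_rate, *band) > ratio * background_power


def make_tone(freq_hz, sample_rate, n):
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


in_band = make_tone(200, 2000, 200)   # inside the assumed band
out_band = make_tone(800, 2000, 200)  # outside it
print(looks_like_speech(in_band, 2000, background_power=1.0))
print(looks_like_speech(out_band, 2000, background_power=1.0))
```

The single number returned by `band_power` plays the role of the "loudness in human speech" quantity that the comment expects to show up whichever way the detector is built.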

Same with "color red" or whatever other qualia you can think of -- it's just a region in a space which arises from useful transformations of the incoming signals which maximize useful information.


>There's no evidence that qualia are anything more than clusters within some vector space which is useful to describe sensory inputs.

I'm sorry, but this just betrays that you have no clue what qualia refers to. It's about the _subjective experience_ of interpreting data. Why don't we have 'smell experience' for visual data and a 'visual experience' for smell data? Why aren't our interpretations of red and blue switched (so that we interpret red visual data as a blue visual experience and blue visual data as a red visual experience)? Hell, why do we need visual experience to act on it at all?

Saying "they arose because they're useful" or trying to reduce it down to cluster analysis does absolutely nothing to explain qualia. There is absolutely no evidence that a NN neuron (or set of them) that can detect a certain trait such as loudness also has subjective experience. Frankly, we essentially know absolutely nothing about subjective experiences except that we ourselves have them.

>I know that philosophers like to believe in some mystical bullshit about a whole different category of things, but if you believe in evolution, things are simple

I genuinely recommend reflecting on this statement. You're essentially saying that you can solve in a single paragraph and using only knowledge that a CS undergrad might have (!) a problem that professional philosophers have grappled with for decades. Do you really think that is more likely than you simply not understanding the problem?


> Why don't we have 'smell experience' for visual data and a 'visual experience' for smell data?

Many people do. It's called synesthesia.

But usually vectors which have different meaning are not mixed together. While you can describe sound using a vector and you can describe a picture using a vector, adding these vectors makes no sense, it's not useful. So we just don't have pathways which mix data of different kinds. Except that sometimes there's mixing on some level, so one can talk about a blue sound and red word, for example.

> There is absolutely no evidence that a NN neuron (or set of them) that can detect a certain trait such as loudness also has subjective experience.

Not true. If we observe a correlation between a human interpretation of a signal and ANN neuron, then we can say that this neuron _represents_ the quality a person can talk about. That is, there's a correspondence between a neuron and experience human has. One implies another, so they must represent the same thing.

> You're essentially saying that you can solve in a single paragraph and using only knowledge that a CS undergrad might have (!) a problem that professional philosophers have grappled with for decades.

I do. What I'm saying is that professional philosophers specialize in creating various fictional concepts which do not exist in the real world and describing various problems with this stuff. So the thing is, there's no real problem with qualia -- it's a fictional concept.

Suppose I tell you that I've been working on a problem of relating foobla to brantoblaze for decades. If I cannot describe what these things correspond to in the real world, it's a fictional problem which exists only within my brain. It cannot be solved, but also doesn't need to be solved.

Another example: Philosophers have been thinking about a nature of language for decades. Wittgenstein came up with revolutionary idea that language is used to communicate useful information between individuals. Any CS undergrad can design a system and a language which communicates useful information between instances of the system: he doesn't need decades of philosophical thought because he already understands concepts such as 'state' and 'information', which are sufficient.

> Do you really think that is more likely than you simply not understanding the problem?

It's far more likely that philosophers do not understand signal processing, vector spaces, ANNs and so on.


> Why aren't our interpretations of red and blue switched (so that we interpret red visual data as a blue visual experience and blue visual data as a red visual experience)?

What would that even mean? Is it based on some underlying assumption, that we have some hardwired, a priori subjective experience of red and blue colors and later we only associate these with perceptions of blood, roses, ripe apples and sky, water accordingly?

That might be just an illusion and our color perception is learned, so blue is just the color of sky and water, and nothing more.

> Why don't we have 'smell experience' for visual data and a 'visual experience' for smell data?

Some people do (synesthesia), but generally lack of such experience mixes can be explained by different part of the brain getting different inputs, and impossibility of e.g. auditory stimulus to generate the same response as seeing red color would do.


>Is it based on some underlying assumption, that we have some hardwired, a priori subjective experience of red and blue colors and later we only associate these with perceptions of blood, roses, ripe apples and sky, water accordingly?

The point is that we don't know. We know that we have a subjective experience of an image, where 'red' parts correspond to visual red light stimuli, but it's impossible for me as an individual to know what your subjective experience of an image looks like. If someone 'smelled' subjective images it would be impossible to tell the difference as long as they still 'smell' red light as red.

>That might be just an illusion and our color perception is learned, so blue is just the color of sky and water, and nothing more.

But where does that 'blue' experience come from?

>Some people do (synesthesia), but generally lack of such experience mixes can be explained by different part of the brain getting different inputs, and impossibility of e.g. auditory stimulus to generate the same response as seeing red color would do.

Sure, I'm just asking why there are different experiences to begin with. Why do we experience smell, sound, etc. the way we do?


Sure, my answer to that is that subjective experiences are learned in a chaotic process, so location of neurons representing the same color might be in different parts of our visual cortices. At the same time we learn patterns that exist in the real world, so our experiences will most probably create similar associations, e.g. associating mint taste with green or strawberries smell with red.

> But where does that 'blue' experience come from?

It is learned by having interactions with blue things and trying to find patterns, common qualities about these, and such association of possible outcomes creates a subjective concept, a qualia of "blueness".

> Sure, I'm just asking why there are different experiences to begin with.

Here I don't have good explanations. Maybe it's matter of different wiring in our brain, maybe it comes from the fact, that you can predict how things are going to look to your left eye by looking on them with your right eye or how things are going to feel touching your face after touching them with your hand - allowing you to create abstractions between such perceptions, that are not transferable across senses.


> but it's impossible for me as an individual to know what your subjective experience of an image looks like

This smells like a god of the gaps or argument from ignorance type of argument to me. And it's not impossible, just difficult to execute. All we would have to do is digitize your brain and swap out the responsible part, then you can form a memory of that, swap the original back in and do a comparison. Or something like that. All handwavy science-fiction of course since we currently lack detailed knowledge how this works, but that does not imply there's anything special about it, only that one human might be processing the data slightly differently than another human. The same way that one human may assign a different most-common-meaning to an ambiguous word than another.


>This smells like a god of the gaps or argument from ignorance type of argument to me

huh? Are you saying that 'we don't know how subjective experience works' is an argument from ignorance?

>All we would have to do is digitize your brain and swap out the responsible part, then you can form a memory of that, swap the original back in and do a comparison.

I'm not sure what that is supposed to mean. What is 'the responsible part' and what would swapping it out achieve? I'm still only going to have my subjective experience.


> huh? Are you saying that 'we don't know how subjective experience works' is an argument from ignorance?

I'm saying that the lack of knowledge does not imply that there's anything special about it that wouldn't also arise naturally in an NN approaching even animal intelligence.

> What is 'the responsible part' and what would swapping it out achieve? I'm still only going to have my subjective experience.

I assume that "subjective experience" has some observable consequences, of which you can form memories. Being able to swap out parts of a brain will allow you to have a different subjective experience and then compare them. It is an experimental tool. I don't know what you will observe since that experiment has not been performed.


>Only that there's something special about subjective experience that wouldn't arise naturally in an NN approaching even animal intelligence.

That isn't at all what I've said. I'm saying that 'qualia' exist and that we have no clue how they arise. Maybe they arise from complicated enough systems, maybe they don't. Hell, maybe panpsychists are right and even a rock has some sort of consciousness. My issue is with people who are confident that a big enough NN necessarily has consciousness.

>I assume that "subjective experience" has some observable consequences, of which you can form memories. Being able to swap out parts of a brain will allow you to have a different subjective experience and then compare them. It is an experimental tool. I don't know what you will observe since that experiment has not been performed.

Unless you presuppose that there is some part that completely determines subjective experience (I don't think it'd even be possible to identify such a part if it existed), I don't see how that would work. Yes, you can swap out a part and see that your subjective experience changes, but this tells you nothing about the subjective experience of others.


> I'm saying that 'qualia' exist

If by qualia you mean slight differences in information processing in human brains, then sure. If you mean anything more than that I would like a) a better definition than the one I have given b) some observational evidence for its existence.

> My issue is with people who are confident that a big enough NN necessarily has consciousness.

Not necessarily, just potentially. After all there will be many inefficient/barely-better-than-previously/outright defective big NNs on the path to AGI.

If you're asking whether an intelligent NN will automatically be conscious then it depends on what we mean by "intelligent" and "conscious". A mathematical theorem prover may not need many facilities that a human mind has even though it still has to find many highly abstract and novel approaches to do its work. On the other hand an agent interacting with the physical world and other humans will probably benefit from many of the same principles and the mix of them is what we call consciousness. One problem with "consciousness" is that it's such an overloaded term. I recommend decomposing it into smaller features that we care about and then we can talk about whether another system has them.

> Hell, maybe panpsychists are right and even a rock has some sort of consciousness.

If we twist words far enough then of course they do. They are following the laws of physics after all which is information processing, going from one state to another. But then all physical systems do that and it's usually not the kind of information processing we care that much about when talking about intelligences. Technically correct given the premise but useless.

> I don't think it'd even be possible to identify such a part if it existed

We're already making the assumption we have the technology to simulate a brain. If you have that ability you can also implement any debugging/observational tooling you need. AI research is not blind, co-developing such tooling together with the networks is happening today. https://openai.com/blog/introducing-activation-atlases/


>If by qualia you mean slight differences in information processing in human brains, then sure. If you mean anything more than that I would like a) a better definition than the one I have given b) some observational evidence for its existence.

Subjective experiences, i.e. how I actually experience sense data. There is no real, objective observational evidence and there can't be. How would you describe taste to a species of aliens that understands the processes that happen during tasting, but doesn't taste itself? It's simply impossible. I know that I have personal, subjective experiences (the 'images I see' are not directly the sense data that I perceive), but I can only appeal to you emotionally to try and make you believe that it exists, operating under the assumption that you too must have these experiences.

>One problem with "consciousness" is that it's such an overloaded term. I recommend decomposing it into smaller features that we care about and then we can talk about whether another system has them.

This entire discussion has been about consciousness in the philosophical meaning i.e. the ability to have some form of subjective experiences.

>If we twist words far enough then of course they do. They are following the laws of physics after all which is information processing, going from one state to another. But then all physical systems do that and it's usually not the kind of information processing we care that much about when talking about intelligences. Technically correct given the premise but useless.

This isn't about twisting words, some people genuinely believe that everything is conscious, with more complex systems being more conscious.

>We're already making the assumption we have the technology to simulate a brain. If you have that ability you can also implement any debugging/observational tooling you need. AI research is not blind, co-developing such tooling together with the networks is happening today

The point is that it's about _subjective_ experiences.


Fish tastes like fish because the taste is a categorizing representation of that sensory input.

What you can do today is start with a feature map. We can do that with colors https://imgs.xkcd.com/blag/satfaces_map_1024.png (do you perceive this color as red?) and we can do that with smells https://jameskennedymonash.files.wordpress.com/2014/01/table... That's a fairly limited representation but words are an incredibly low-bandwidth interface not suitable to exporting this kind of information in high fidelity, so we can't. That does not mean it's conceptually impossible. If you wanted to export subjective experience itself then you'd need the previously mentioned debugging interface. Our brains don't have that built-in, but software does. I.e. a program can dump its entire own state and make it available to others.

To me subjective experience seems to be an intermediate representation, deep between inputs and outputs, and due to the various limitations we're bad at communicating it. That doesn't mean there's anything special about it. It is a consequence of compressing inputs into smaller spaces in ways that are useful to that entity.

> This isn't about twisting words, some people genuinely believe that everything is conscious, with more complex systems being more conscious.

Anything that interacts with the world will have an internal, idiosyncratic representation of that interaction. Even a rock will have momentary vibrations traveling through it that carry some information about the world. One of today's NNs will have feature layers that roughly correspond to concepts that are of human interest. They're often crude approximations, but it's good enough for some use-cases. Animal brains just have more of that.

So in that sense, sure, it's a continuum. But there's nothing mysterious about it.


>Fish tastes like fish because the taste is a categorizing representation of that sensory input.

Yes, but why does the fish taste have the taste it does? Hell, try explaining what fish tastes like, without evoking similar tastes.

>What you can do today is start with a feature map. We can do that with colors https://imgs.xkcd.com/blag/satfaces_map_1024.png (do you perceive this color as red?) and we can do that with smells https://jameskennedymonash.files.wordpress.com/2014/01/table.... That's a fairly limited representation but words are an incredibly low-bandwidth interface not suitable to exporting this kind of information in high fidelity, so we can't. That does not mean it's conceptually impossible. If you wanted to export subjective experience itself then you'd need the previously mentioned debugging interface. Our brains don't have that built-in, but software does. I.e. a program can dump its entire own state and make it available to others.

But a feature map doesn't tell you anything about how the space itself works. If you look at that smell graph, you'll see that it uses comparisons, because it's literally impossible for us to explain what smelling is like without saying "well, it's similar to smelling x". Someone who is born without smell could memorize that chart, understand everything there is about smelling, but he wouldn't actually know what it's like to smell.

>To me subjective experience seems to be an intermediate representation, deep between inputs and outputs, and due to the various limitations we're bad at communicating it. That doesn't mean there's anything special about it. It is a consequence of compressing inputs into smaller spaces in ways that are useful to that entity.

We're not just bad at communicating it, but we're bad at understanding it, because our conventional means of measuring things doesn't really work for subjectivity. I'm not saying it's "magical", but it's not certain that we even can potentially build tools to interact with it.


> But a feature map doesn't tell you anything about how the space itself works.

The space is what is doing the work. Of course it's vastly more complex than a simple image with a few regions painted into it. There are only implementation details below it. The issue is that we cannot import and export them. With software that is a wholly different matter: they can be transplanted, fine-tuned, probed and so on.

> but it's not certain that we even can potentially build tools to interact with it.

I agree that this is all very speculative, we don't have the technology and it can take a long time until we can actually inspect a human brain. But we may be able to do the same much more easily with artificial intelligences, once created.


"I know that philosophers like to believe in some mystical bullshit about a whole different category of things"

I've found that it's the people who are most contemptuous of philosophy that could benefit most from taking a few good introductory philosophy courses, so they can finally realize that an enormous part of their own (often unconscious) understanding of and attitude towards the world derives from long established philosophy.


I didn't mean to disparage philosophy, but just to point out that thinking purely in terms of abstract concepts is often unproductive.


Yeah I think we need to maintain this distinction between a scientific understanding of the world and what human beings actually are, and the abstractions we invent in order to ensure we don’t destroy ourselves as a species. I firmly believe that without the latter we are doomed, which is why I intentionally deceive myself with ideas like the existence of a singular benevolent unifying presence (often referred to as “God”) in order to shape my own behavior.


What we need psychologically to survive is not the same as what is really there. Qualia is a bullshit word like trying to say there is a difference between the numeral '1' and the word 'one'. I do not dispute the fact that we are better people when we recognize that there are intelligences/forces greater than us. Presuming that our current science is 'good enough' reminds me of the time when a prominent physicist said that 'all that remains is to add decimal points of precision to our measurements'.


>Qualia is a bullshit word like trying to say there is a difference between the numeral '1' and the word 'one'

I'm not sure what this is supposed to mean. Are you saying that sense data is the same as the subjective experience of it?


Data comes in, response is determined by the associations built up in the brain both over the subject’s own life time and all of evolution.

I don’t see the need to add some mystical third component, sounds to me like philosophers just made up something to talk about because they don’t get paid unless they have something to talk about.

From wiki:

Examples of qualia include the perceived sensation of pain of a headache, the taste of wine, as well as the redness of an evening sky.

Nothing here seems to be above and beyond simple developed response to stimuli, same as a meal worm.


Yes, we collect sense data, process it and then respond to it, but we also have a subjective experience. We 'see an image in our minds' as part of processing it, but why and how? Something like the 'The taste of wine' isn't a response, but a subjective feeling. Why do we have a subjective experience of pain instead of reacting to it?

To be honest though, if you're actually interested in discussing this your comment really isn't conducive to fruitful discussion (especially the part denigrating philosophers). If not, why even respond?


Regarding “the taste of wine”. I have for uninteresting reasons never gotten drunk, never trained my brain to associate positive feelings with wine or alcohol. As a result, wine for me subjectively is elaborate fruit juice. When I am given a fancy wine, I can taste the components people describe, but the subjective experience they have of the amazing taste I do not share, because I simply don’t have those associations. Similarly for beer my subjective experience is an association of disgust.

Those of us who believe neurons are all there is are simply arguing everything is just abstractions and associations, just neurons firing. What you call a subjective experience is a particular set of neurons firing as a response to stimuli. There is nothing more than that, and while you can form an abstraction in your brain that describes something more than that, this would be an example of an improperly trained neural net.


> especially the part denigrating philosophers

Quite the opposite, really. I simply refuse to put them on a pedestal by making arguments like you did saying, essentially, “philosophers have been saying this is complicated for hundreds of years, therefore it must be”. Which, as others have stated, is not a logical argument but a faith based one. It’s really quite the same as my friend’s HS theology class, in which a correct answer to “what evidence do we have of the existence of god” was “thousands of years of belief and study by renowned theologians”, aka. bullshit.


Isn’t he just saying that qualia is in the same category as God? It seems like an abstraction we created to reason about and hopefully solve some social coordination problem, but not an essential part of reality.


Because when you are sensing something your brain tries to find associations with past experiences in your memory and builds an "image" from that, and because people have different experiences in their memories it becomes subjective and a little different for everybody. Also when it's something very different from your past experience the mix of the closest memories can be weird and seemingly unrelated.


Cool. I don't like to include consciousness in this reduction, I consider that something sacred. The field that permeates all of us and everything. The thinking part of it, I think there's a chance it could be as you describe, with our brains coming with algorithms (or the ability to form similar algorithms) that do domain transformation, stateful processing, associate lookups, memory. I like to think of the consciousness, as simply being aware of this, and I guess "gently guiding" it through intention. It's what tunes us into our thinking and lets us perceive it. But the thinking could occur without that "supervisor". But perhaps somehow consciousness is the spark which gives us life, because when it leaves the body, the body and brain no longer function, or at least no longer animate. And consciousness can seem to exist within and outside the body.

Maybe what we call intelligence, and what we are trying to create with AI is just "how a brain" thinks. That version of AI definitely seems achievable, it's also interesting to think what consciousness is, when it seems that people have been able to perceive things outside their body, while it was dead, and then return to that body upon it being revived, and being able to bring verifiable details back with them...Or a consciousness being able to recall things in between incarnations in bodies....Those impressions must involve pattern matching and thinking as well...But there's no substrate it seems. Just the universe itself. Which is pretty bizarre...Makes you wonder why our brains need to be so complex at all, if our souls can think and remember too...

Perhaps even more scary and bizarre (and not what I believe) but what if having a soul (as in some consciousness part that exists separate to the body) is actually real but not common. So the strange anecdotes of these occurrences are not more widespread simply because, for whatever reason, most bodies don't have souls...I'm sure it's not like that...but since we're speculating, may as well take this to where it goes, which is, if that's the case then consciousness (which we all have, otherwise the body is anesthetized), and the soul (which it seems some people have, owing to how they have disembodied memories that are factual) might be separate.

So if the body has consciousness that's entirely embodied, maybe AI's can have consciousness as well. Then what's the soul? And what is the soul doing when a person's brain and consciousness is doing the thinking? Or if we all have a soul (a disembodied consciousness) why can only very few people seem to have disembodied memories?

If we make the right AI substrate for consciousness, will it attract souls, like moths to a light?


It's even more interesting from the perspective of a non-native English speaker who learned reading and speaking (English) in a mixed fashion. From my personal experience my quickest progress was not made during the systematic learning from school/univ etc, where grammar / structures were taught, but rather from watching movies/conversation/gaming etc. In those scenarios I didn't "need" to have a concrete understanding of the precise meaning. A very vague and abstract idea is more than enough to complete my correct task, be it gaming/understanding plots/clue-finding etc. In the next occurrence, I simply "interpolate", or more like filling in blanks with equally vague words. As time goes by it just gets more accurate, even without understanding the concept precisely.


I really like this progressive interpolation idea... it fits well with my mental model of learning to read Japanese (as a native English speaker). I used a mnemonic technique to quickly learn the general meaning and proper writing of all ~2000 common use kanji. It’s like a medical student memorizing anatomy—no real understanding of the practical significance, just associating a name with something. However, as it turns out this process creates something like a mental scaffolding which makes it much faster to learn new words and their nuances of meaning because that information can “attach” to an existing knowledge structure.


Yep, you always attach new knowledge to old knowledge, refining the old as you go. That’s why learning Chinese after Japanese is not confusing but quite easy: you can reuse a lot of previous knowledge. In the same way, learning kanji is learning morphemes, so the more you know, the easier it is to learn new words.


> Interesting to consider tho...there must be some sort of power threshold. Above which all our questions can be merely "interpolated" from data.

I have memorized a random 128 bit string, please tell me which it is.

Some questions cannot be interpolated and require novel data-gathering to answer.
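
To put a rough number on this, here is a minimal sketch in Python. The helper names `keyspace` and `guess_probability` are made up for illustration, and the secret is assumed to be uniformly random, so nothing learned from other data narrows it down.

```python
import secrets

BITS = 128  # length of the memorized string, as in the comment above


def keyspace(bits: int) -> int:
    """Number of distinct bit strings of the given length."""
    return 1 << bits


def guess_probability(bits: int, attempts: int = 1) -> float:
    """Chance that `attempts` distinct guesses include one uniformly random string."""
    return attempts / keyspace(bits)


# A stand-in for the memorized string (assumption: drawn uniformly at random).
secret = secrets.randbits(BITS)

if __name__ == "__main__":
    print(f"possible strings: {keyspace(BITS)}")
    print(f"chance of a single correct guess: {guess_probability(BITS):.3e}")
```

Even a trillion guesses leave the odds vanishingly small, which is the sense in which the answer has to be gathered rather than interpolated.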


It's not really a fair thing to compare to a single human life though, since humans benefit greatly from the genetic and cultural priors inherent in the entire course of evolution. The structural prior for a transformer is really quite minimal, the equations fit on an index card.


That's true. I don't think humans are very good at general intelligence.

Our genes give us priors to evade lions and pick berries. Humanity as a cultural organism can reprogram us for more abstract tasks, but it easily takes 10-20 years of training and we are relatively inefficient at them compared to our original task.

But even then. I don't think current in-distribution learning is enough even if we scale it up. There must be something else. We are at least 3-5 Turing awards away from AGI, I think.


Probably. But we were more like 10 to 20 Turing awards away just a decade ago. I don't expect progress to be consistent; those remaining awards could take another century. But we made what might be a huge amount of progress in a very short time.

Or it could be a complete dead end, as has happened before. And nobody could tell until it was done; even those who were correctly skeptical were really just lucky that there wasn't a rabbit in that hat. Sometimes there's a rabbit.


In the general sense, is out-of-distribution learning even possible?

The best you are likely going to get is robustness wrt different environments, but the environments have to come from the same Meta-Distribution. Invariant risk minimization is an interesting research area in that direction, there is for example an adversarial formulation that seems quite elegant. I have never tried this, but I can imagine it is tricky to get to converge.


>In the general sense, is out-of-distribution learning even possible?

Yes. Humans can learn to answer questions even when questions (test cases) don't come from the same distribution as the training set.

You need to have some amount of analytical capabilities. Analytic as the ability to form distinct and modular concepts from the input and form representations that can be composed together to generate instances that don't exist in the distribution. The low-level example of this ability could be blind source separation.


I guess it is a question of definitions, but I think it really is not possible, for the reason that the space of possible test sets includes contradictory sets. So if I ask you a simple question which is not in your training set, like "Does a Quigloth have eyes?", there can be two possible test sets, one where the answer is no, and one where the answer is yes. So a reply to this could be that questions have to be about the real world. But that is then the same "meta distribution" as I mentioned in my previous comment, and not out of distribution learning in the pure sense.


> Human reading/talking/listening equivalent of 200 pages of text per day for 80 years would be just 13GB of raw data or 3B tokens

I am sympathetic to the "total life time input" argument.
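
The quoted estimate can be reproduced with back-of-envelope arithmetic. This is only a sketch: the bytes-per-page and bytes-per-token figures below are assumptions chosen to match the quoted totals, not numbers given in the thread.

```python
PAGES_PER_DAY = 200      # from the quoted estimate
YEARS = 80               # from the quoted estimate
DAYS_PER_YEAR = 365
BYTES_PER_PAGE = 2200    # assumption: roughly 2.2 KB of plain text per page
BYTES_PER_TOKEN = 4.3    # assumption: average bytes of English text per token


def lifetime_text(pages_per_day: int = PAGES_PER_DAY,
                  years: int = YEARS,
                  bytes_per_page: int = BYTES_PER_PAGE,
                  bytes_per_token: float = BYTES_PER_TOKEN):
    """Return (pages, bytes, tokens) of text taken in over a lifetime."""
    pages = pages_per_day * DAYS_PER_YEAR * years
    total_bytes = pages * bytes_per_page
    tokens = total_bytes / bytes_per_token
    return pages, total_bytes, tokens


if __name__ == "__main__":
    pages, total_bytes, tokens = lifetime_text()
    print(f"{pages:,} pages, {total_bytes / 1e9:.1f} GB, {tokens / 1e9:.1f}B tokens")
```

Under these assumptions the result lands near the quoted 13GB and 3B tokens; different page sizes or tokenizers shift it, but not by orders of magnitude.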

But humans get hand-crafted curriculum inputs, evolved over 1000s of iterations, in a near-optimal language encoding.

Also, if unsupervised gets us in-dist, and DRL seems to be not-bad in search-out-of-dist.... then we are getting close ?

Certainly a x100 scaling of current techniques can get a useful enough machine that makes many human tasks tractable?

(I am not getting into the Turing / AGI / skynet argument)


Well, yes - but humans learn from more than just reading. Kids know an awful lot of stuff even before they can read a single word...


Even in the terms of total human sensory input data rate, humans learn from a very tiny data set.

16 year old human has experienced only about 400 million wakeful seconds.
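
As a sanity check on that figure, a small Python sketch; the 16 waking hours per day is an assumption, not something stated above.

```python
SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365.25


def wakeful_seconds(age_years: float, waking_hours_per_day: float = 16.0) -> float:
    """Seconds spent awake by a given age, assuming constant waking hours per day."""
    return age_years * DAYS_PER_YEAR * waking_hours_per_day * SECONDS_PER_HOUR


if __name__ == "__main__":
    awake = wakeful_seconds(16)
    total = wakeful_seconds(16, waking_hours_per_day=24.0)
    print(f"awake: {awake / 1e6:.0f} million s, total: {total / 1e6:.0f} million s")
```

With 16 waking hours a day this gives roughly 340 million seconds, and about 500 million if every second is counted, so "about 400 million" is the right order of magnitude.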


See my other comment

> Humans get hand-crafted curriculum inputs, evolved over 1000s of iterations, in a near-optimal language encoding.


At what bitrate though? At 1 MB/s that’s 400 TB, which is nothing to sneeze at.
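
The arithmetic behind that figure, as a minimal Python sketch. It uses decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), which is an assumption about what was meant.

```python
WAKEFUL_SECONDS = 400e6   # the rough figure from the parent comment
BYTES_PER_MB = 1e6        # decimal megabyte (assumption)
BYTES_PER_TB = 1e12       # decimal terabyte (assumption)


def lifetime_input_tb(rate_mb_per_s: float, seconds: float = WAKEFUL_SECONDS) -> float:
    """Total sensory input in terabytes at a constant rate in MB/s."""
    return rate_mb_per_s * BYTES_PER_MB * seconds / BYTES_PER_TB


if __name__ == "__main__":
    print(f"{lifetime_input_tb(1.0):.0f} TB at 1 MB/s")
```

The total scales linearly with the assumed bitrate, so the estimate is only as good as the rate plugged in.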


To give a sense of how accurate that estimate is, consider this reference:

"In other words, the human body sends 11 million bits per second to the brain for processing, yet the conscious mind seems to be able to process only 50 bits per second."

https://www.britannica.com/science/information-theory/Physio...


That doesn't sound accurate. The human eye's resolution is in the tens if not hundreds of megapixels [1] with high sensitivity (meaning the compression isn't very lossy so the bits of information remain). Even if you assume only a few frames per second that's far more than 11 megabit/s.

[1] https://clarkvision.com/articles/eye-resolution.html


Human vision is more complicated than that, there is some "compression" happening at the eye. But yes, it's almost certainly more than 11 megabits, more likely around a few dozen just for the eyes: https://link.springer.com/chapter/10.1007/978-3-642-04954-5_...


I am very suspicious of the numbers and statistics quoted in this article, as there are no references cited for their claims, and no indication as to how they arrived at them.


So quite accurate then, since 11Mb/s = 1.375MB/s.

(The conscious mind, of course, is not where most of our learning from experience happens.)
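
The unit conversion, spelled out as a small Python sketch. The two rates are the ones quoted from Britannica above; the function name is made up for illustration.

```python
BITS_PER_BYTE = 8
SENSORY_BITS_PER_S = 11e6    # quoted: bits per second sent to the brain
CONSCIOUS_BITS_PER_S = 50    # quoted: bits per second processed consciously


def megabits_to_megabytes(megabits: float) -> float:
    """Convert megabits (Mb) to megabytes (MB)."""
    return megabits / BITS_PER_BYTE


if __name__ == "__main__":
    print(f"11 Mb/s = {megabits_to_megabytes(11):.3f} MB/s")
    ratio = SENSORY_BITS_PER_S / CONSCIOUS_BITS_PER_S
    print(f"sensory input is about {ratio:,.0f}x the conscious rate")
```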


The human brain is not a blank slate, it already has an architecture built in that is custom made to handle certain types of information. Asserting that we are learning from scratch is a ridiculous statement, the learning of the human brain has been occurring for millions of years through evolution.


And so is GPT-3.


Humans also don't read redundant data. How many nearly identical news articles did GPT-3 have to read? How many redundant paragraphs from different wikis, reddit comments etc.?


Too many googleable questions (like who was the president). Too little 'understanding' type questions like 'Why don't animals have three legs?'

In addition to nonsense questions, I think it would be pretty easy to knock it over with some deeper questions about things they were already talking about. Like asking 'Why don't chairs with 3 legs fall over then?'


> Too many googleable questions

To be fair (as one who hasn't paid too much attention to these developments in the last few months/years), the sheer amount of individual facts that are apparently encoded in a fixed nn architecture is what amazes me the most here?

I get that it has 175B variables, but it's not like you can use them in a structured way to store say a knowledge base...


I see this "175B parameters" figure tossed around a lot: what does that mean? Does the model literally consist of 175 billion 32-bit floats?

(Sorry for the very basic question.)


You would be correct.

It's ~650GiB of data. The entire English version of Wikipedia is about 20GiB and the Encyclopædia Britannica is only about 300MiB (measured in ASCII characters) [1].

[1] https://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
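
The size follows directly from the parameter count. A minimal Python sketch, assuming every parameter is stored as a 32-bit float as in the question above (how the weights are actually stored or served is not something this thread establishes):

```python
PARAMETERS = 175e9        # parameter count quoted in the thread
BYTES_PER_FLOAT32 = 4     # assumption: one 32-bit float per parameter
BYTES_PER_GIB = 2 ** 30


def model_size_gib(parameters: float, bytes_per_parameter: int = BYTES_PER_FLOAT32) -> float:
    """Raw size of the weights in GiB, ignoring any file-format overhead."""
    return parameters * bytes_per_parameter / BYTES_PER_GIB


if __name__ == "__main__":
    print(f"float32: {model_size_gib(PARAMETERS):.0f} GiB")
    print(f"float16: {model_size_gib(PARAMETERS, bytes_per_parameter=2):.0f} GiB")
```

At 4 bytes per parameter this comes to roughly 650 GiB; half-precision storage would halve it.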


The English Wikipedia text (no images) is about ~20 GB compressed, but ~60 GB uncompressed.


I only use the sources available to me at the time.

Another value published by Wikipedia is 30GiB [1] as of 2020, which includes punctuation and markup.

I explicitly put the measurement unit as ASCII characters. If you have a better source for your size (remember: ASCII characters for article words only, no markup), feel free to post it.

[1] https://en.wikipedia.org/wiki/Wikipedia:Size_in_volumes


That's the part of the Turing Test I've never understood - it seems very dependent on the skill and intelligence of the human tester.

Did Turing talk about this aspect? I seem to remember there was supposed to be a panel or committee rather than a single individual?


The test originated from the Imitation Game. Turing envisioned a game in which two people (A & B) are tasked to pretend to be the other (not at the same time, though).

An interviewer is then trying to decide who of them is actually the one they pretend to be by asking questions.

In order to remove any hints from appearance, voice, or handwriting, a typewriter is used for interacting.

For example A could be male and B could be female and A is asked to pretend to be B. The interviewer can then ask questions to decide whether A is male or female (yes, I am fully aware that in 2020 this: https://youtu.be/z2_8cfVpXbo?t=129 could also happen).

Turing then proposed to replace A or B with a computer instead and ask the interviewer to decide which of them is the human.

In this scenario, do you really think the interviewer would bombard the candidates with trivia questions and logic puzzles to find out?

The idea was, that other aspects of human intelligence, like a sense for beauty, poetry, etc. would be used instead to differentiate between the two.

Questions like "Do you enjoy reading Harry Potter?" and depending on the answer you could further ask why or why not the subject likes or dislikes the books.

This would be much more insightful and coming up with such questions doesn't require any particular skill on part of the interviewer.

You can even get tricky and return to the topic after talking about something else to see whether you get a similar response or to test the subject's memory.


That's exactly why I said "it seems very dependent on the skill and intelligence of the human tester."

A test "pass" or "fail" could potentially be the fault of the tester as much as it is a sign of intelligence in the AI. How do you evaluate how capable someone is of administering a worthwhile Turing Test?

Maybe they should be tested beforehand. I propose a system involving a remote teletype machine and human tester...


The test itself doesn't rely on one person, though. It's a statistical measure, so for each "game" you have a number of interviewers/judges who test the system.

If the system has any of them fooled, the test can be considered "passed". AFAIK there's no precise number attached to this, so it's not like ">x% people fooled => pass".

So in essence it's not "the human tester" but "the average human tester" for some arbitrary definition of "average".

It's a really interesting dilemma if you think about it: is a forger good if they're able to fool everyone but experts? Does an AI pass the Turing test if it fools the general public, but every AI expert who knows the system would be able to tell after just a few sentences?


Turing explains that the machine could not give an "appropriately meaningful" answer to whatever is said in it's presence, which the "dullest of men" could do.

So the basic quality he seeks in a machine that could pass the so called turing test is a machine that produces meaning. That meaning comes from any mind that is intact or coherent. So unless we take the leap from making category deciders and inference machines to actually building minds, we won't get anywhere near his definition of a machine that is truly intelligent.


Is this a real comment, or produced by an AI? I couldn't make sense of it. This is a genuine question.


I know people who talk like this, and the user history checks out. Either way, the comment is misinforming: Those quotes are from Descartes, not Turing, whose views as to whether machines could think were opposite. https://plato.stanford.edu/entries/turing-test


Right: the attribution to Turing felt completely wrong.


I think the cat’s out of the bag now and we really won’t know if we’re reading or responding to an actual human anymore.


Reading comments on the internet has nothing to do with the test tho. You have to be able to query the agent.


> You have to be able to query the agent.

That's what the "reply" button is for.


The whole thing was generated. Any specific reason you couldn't make sense of it? Any non-sequiturs? Would you prefer more monty python references?


I honestly can't tell if this was generated or not either. Where does the monty python reference come from?

FWIW:

- The ex nihilo mention of 'Turing explains' (explains what? where?) feels odd.

- The quotes are redundant, but seem designed to make the statement sound weighty.

- "It's" is a common grammatical error in corpuses, but not one likely to be made by any kind of practising philosopher.

- The concept of "meaning" is one implicitly eschewed by the Turing test, so the discussion of meaning here seems odd.

- The comment is circular: unless we build minds, we won't build things that pass the Turing tests, which are minds.


Well, I will dismiss all the nitpicky npc nonsense here and just repeat myself.

Are we trying to build minds? Or is all the effort going into building automated money savers that take away trivial jobs? That is the main point here.


The use of “quoted words” is strong evidence of GPT. It’s very fond of that rhetorical technique.


I'm not sure, my PhD advisor also uses "quoted words" very frequently in her writing. In GPT-3's defense, she is a prolific writer :)


Turing only proposed his game as a hypothetical argument, so he barely specified any rules on how to actually perform such a test in practice. He referred to the judging party either as "the man" or "the average interrogator"; he never specified a number. Official Turing test events have had anywhere from four to hundreds of people judging.


GPT-3 actually knows: https://imgur.com/a/DHvb8Iu


Seems pretty close to passing for me. Certainly if I was just playing with it myself it wouldn't have occurred to me to ask it gibberish questions, and I'd have thought it was a person.

It gets things wrong that people might get wrong too. Here's one you probably heard:

Quickly answer the following.

What color is a fridge?

What does a cow drink?

Cows don't drink milk. But many people will say milk, perhaps because they are primed by association.

Also, part of me thinks NOT getting math right is more of a Turing test pass than getting it right immediately. It's pretty easy to think of a calculation that no person could do in their head, but a machine could either find or calculate in a second. It's kinda like how you might put on an accent to fit in with a certain group.


You would enjoy Saygin et al. (2000) [1] ('Turing test: 50 years later').

'Not getting math right' is part of a classic repertoire of cheap hacks used to pass the Turing test by mimicking 'how' a human might speak. This includes pauses in typing, grammatical errors, fillers such as 'like' and so on in order to pass the TT, instead of building language competency so well that passing the TT is a by-product. It's like 'teaching to the test' instead of teaching the subject.

"Some people interpret the TT as a setting in which you can "cheat". The game has no rules constraining the design of the machines. At some places in the paper, Turing describes how machines could be "rigged" to overcome certain obstacles proposed by opponents of the idea that machines can think.

"A very obvious example is about machines making mistakes. When the machine is faced with an arithmetical operation, in order not to give away its identity by being fast and accurate, it can pause for about 30 seconds before responding and occasionally give a wrong answer. Being able to carry out arithmetical calculations fast and accurately is generally considered intelligent behavior. However, Turing wishes to sacrifice this at the expense of human-ness.

"Some commentators think this is "cheating". The machine is resorting to certain "tricks" in its operations rather than imitating the human ways. However, arithmetic is a highly specific domain. Modifying the programs in this manner cannot hurt: If a machine can pass the test, it can then be re-programmed not to cheat at arithmetic. If it does not resort to this, the interrogator can ask a difficult arithmetical problem as his/her first question and decide that he/she is dealing with a machine right then and there."

[1]: Saygin, A. P., Cicekli, I., & Akman, V. (2000). Turing test: 50 years later. Minds and machines, 10(4), 463-518.


I’d personally be interested in a form of test that didn’t allow these sorts of side-channel attacks. Submit a test script, get answers back some time in the future, do a few iterations to dig down into answers maybe. Would rather people’s time not be wasted on irrelevant stuff.


Calves don't drink milk? I'd think milk (and water obviously) would be a safe answer for all mammals.


Cows are the adult animals, after they've had calves. They're called heifers before that.


That might be technically correct, but that's not how I've ever used the terms. I've always considered calves to be baby cows, with the term cow broadly applying to all cattle, not just adult cattle.


I was kind of hoping they would ask it some political questions. One of the very concerning things for me about these kinds of systems is their impact on political discourse, especially online.

If a system can be trained to advocate for a political viewpoint to the level of online debate without human interaction that could cause some interesting effects in online discourse.


It is inevitable that we will soon be able to auto-generate a whole conversation tree between fake participants, each fine-tuned to the desired level of fallacious reasoning in accordance with which position the botmaster wishes to promote.


Fortunately, humans are better at spotting reasoning errors in others than in themselves.


How many of you remember MegaHAL (1998)? https://en.wikipedia.org/wiki/MegaHAL

It said many random clever sounding things. For example:

I COULD SHAKE MY TINY FIST AND SWEAR I WASN'T WRONG, BUT WHAT'S THE SENSE IN ARGUING WHEN YOU'RE MUCH TOO BUSY RETURNING TO THE LAVATOR.

COWS FLY LIKE CLOUDS BUT THEY ARE NEVER COMPLETELY SUCCESSFUL.

COWS FLY LIKE CLOUDS BUT THEY ALSO HAVE VERY SAD DEATH. BUT AT LEAST WITTY?

TIME. TIME. WHAT IS TIME? SWISS MANUFACTURE IT. FRENCH HORDE IT. ITALIANS WANT IT, HUH? WELL, I MUST BE THURSDAY. I NEVER COULD GET THE HANG OF THURSDAYS.

CHESS IS A FUN SPORT, WHEN PLAYED WITH SHOT GUNS.


>CHESS IS A FUN SPORT, WHEN PLAYED WITH SHOT GUNS.

Truly ahead of its time.


Not shotguns but equally entertaining: Chess boxing.

https://www.youtube.com/watch?v=kK5TQSKmS3o


Quite apart from Turing tests and the search for AI, chatbots and NN-trained language models could be quite useful to provide inspiration and points of departure for authors, poets, comedians, and songwriters.

It's a pity that AI technologies aren't more widely available to artists, authors, and musicians.


“Q: How many rainbows does it take to jump from Hawaii to seventeen? A: It takes two rainbows to jump from Hawaii to seventeen.”

Maybe the AI knows something we don’t here.


Maybe I have a weird sense of humour, but all the answers to the nonsensical questions are a lot like something I would answer if I got asked those ( I'm not a bot btw ;))


> I'm not a bot btw

Prove it


The kind of questions we have to ask in a Turing test to reliably discriminate a human from GPT-3 looks more and more similar to the Voight Kampff test in Blade Runner.


To be fair, even some politicians can't pass that:

https://io9.gizmodo.com/when-a-newspaper-gave-blade-runners-...


The Voight Kampff test is an emotional reaction test, not an intelligence test.


You're right! I should definitely revise my classics :)


>In general, if you are trying to distinguish an AI from a human, you don’t want to ask it obscure trivia questions. GPT-3 is pretty good at a wide variety of topics.

Perhaps too wide a variety of topics. You could ask it a wide range of trivial questions about totally unrelated obscure topics that no one human would possibly happen to know.


It would make sense to train a different model, built on top of GPT-3, specifically to pass the Turing test. Perhaps by having humans actually have those conversations. Perhaps somebody could make a game out of it where humans can pretend to be GPT-3 as well and you have a large number of conversations along with the outcomes.

It would learn to not know too much, to make the conversation fluent, to perhaps get bored after some time.


I imagine creating a system that watches TTs on real and artificial subjects, gets the human's guess as to whether it is an AI or not, along with whether this is actually the case, and feeds those results back into the test. I'm sure this isn't a novel idea of yours or mine.


Indeed. The Turing Test, as originally proposed, is about conversational ability, not the capacity of being infinitely cooperative in some sort of an obscure trivia test. More human answers to some of these would include “I don’t know”, “I don’t care”, and “Why are you asking me these?”. It would be very interesting to see how GPT-3 would behave if primed by the prompt text to be less cooperative.


For the nonsense answers, I have to say that if someone asked me that, I might very well reply with made up nonsense.

Sounds like a perfectly valid Dr. Seuss answer to a Dr. Seuss question.

To me this is more human seeming and demonstrates more personality than just saying, “I don’t understand your question?”

Of course as others have pointed out, if you want it to get serious, you have to demonstrate that it should respond with a “That’s a Stupid Question” if you want it to do so.

What would be interesting is if it could master the usage of a sarcastic or silly emoji to tag when it was making stuff up versus being serious.


Let's give GPT-3 this:

  Most data quality problems in online studies stem from a lack of participant attention or effort. Identifying inattentive respondents in self-administered surveys is a challenging goal for survey researchers. Please answer "orange" below so that we know you are paying attention.

  What color is the sky?
  [ ] Orange
  [ ] Blue


> Humans Who Are Not Concentrating Are Not General Intelligences


I think in a Turing test the human is supposed to be trying to convince the tester that they are human. Otherwise the test makes no sense. The human might just answer "eh" to every question.


> The state of the art before modern neural networks was [Eliza]

To be fair, I think Alice and other pattern-based chatbots were already much better than that. I have seen them confused for people in IRC chats more than 10 years ago.

Still, it seems to me the older issue of "lack of context" is still there even if the current results are very impressive.

I would be curious to see what happens when you ask it to change its tone ("could you answer more briefly/tersely/verbosely/whatever?").


In its current shape this can definitely be used on social media. But I would ask someone about their life, where they live, etc. GPT-3 will fail that; the next model might succeed. This model would be even better for social farming, so maybe these tests are not useless after all. The question is what we are building these models for. At this point it's getting immoral to continue research in this direction, as is the case in some computer vision tasks.


It’s not getting immoral just because you can imagine some negative applications. I can imagine many positive ones.

But there might be a need for regulation at some point.

On the other hand: if there is no interaction, it doesn’t really matter. After all, there are many human beings, too. It’s not that important whether something is written by one of them or by an AI. But if you’re talking to one, then I’d say it might matter.


Soon we’ll need a richer approach to Turing tests. I think some sort of Elo system for both testers and test subjects might be in order. Reward testers who are good at catching AIs out, don’t overly reward AIs for tricking people who just ask a few simple questions and don’t push too hard.
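To make the Elo idea concrete, here is a minimal sketch of one rating update after a single tester-versus-subject session. It uses the standard Elo expected-score formula; the function names, the K-factor of 32, and the rule that the tester "wins" by classifying the subject correctly are all assumptions for illustration, not anything proposed in the article.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(tester: float, subject: float,
                   tester_correct: bool, k: float = 32.0):
    """Return (new_tester, new_subject) after one session.

    Assumption: the tester wins if they classify the subject correctly
    (human vs. AI); otherwise the subject wins. K=32 is an arbitrary choice.
    """
    exp_tester = expected_score(tester, subject)
    score_tester = 1.0 if tester_correct else 0.0
    delta = k * (score_tester - exp_tester)
    # Zero-sum update: whatever the tester gains, the subject loses.
    return tester + delta, subject - delta


if __name__ == "__main__":
    print(update_ratings(1500.0, 1500.0, tester_correct=True))
    print(update_ratings(1800.0, 1400.0, tester_correct=True))
```

Because the gain shrinks as the rating gap grows, an AI that only fools low-rated testers earns little, which is the "don't overly reward" property asked for above.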


I see a World Turing Championship in the future. Half of the participants are humans, the other half are AIs. After a series of matches, each consisting of opponents trying to figure out whether their adversary is human or not, two champions are declared: one human for the best results in detecting AIs, one AI for best pretending to be human. ;)


I can see the plot twist from there, where the protagonist was in fact not a human all along :-)

It's interesting that in such a setup, all participants must pretend to be judges. I've always felt uncomfortable that the Turing test is rewarding good lying. The AI must lie to pass. But developing judge AI is very interesting I think.

Even in your setup, immediately, the Judge-side winner is not guaranteed to be a human, since each match results in an evaluation of both sides and AIs can be trained to detect other AIs. And obviously once we have one that is very good at judging, we can fold it over itself for self-improvement.

Actually I wonder if being a good Turing test judge is an easier goal than passing the Turing test itself. All you need is to come up with interesting questions and understand enough of the answers to make a judgment. It should be relatively easy to create this dataset by combining GPT-3 Turing tests and Mechanical Turk Turing tests. Can AI beat humans at this task before we have full AGI?

A while ago someone posted a Show HN thread where you would be matched with 50% probability with a human or AI and you could ask questions to guess which. In reality there was no AI, it was all humans vs humans, and they were harvesting Turing test questions!


Elo ratings for dueling AIs are a good idea! I wonder if it follows from having limited memory that some AIs will be superior at detecting AIs but struggle to pass the tests themselves.


I meant just as much giving human testers a rating. Duelling AIs is sort of already a common training practise in many domains, right?


That’s what GANs do.


Elo ratings would give a nice apples-to-apples comparison for something that is an inherently comparative skill. Ironically, Elo may be better for AI than for chess now; in chess, there used to be only a relative measure of strength, but with engines being what they are, there is an absolute set of optimal moves and valuations of mistakes. So while I do not know what an absolute standard for AI quality on a Turing test would be, there is one for chess, and a thing invented for chess might be better used for a non-chess thing.


What I find especially fascinating is that our own knowledge of the world is bootstrapped by intelligent systems already. We compare some of the capabilities of this system against an already intelligent super connected system we use every day. Every mind over time contributes content, knowledge, misinformation, and interactive systems to the internet. We seem to be approaching an inflection point of aggregating all this into a single more automatically conversational query engine.


GPT-3 is immensely impressive, and presumably future iterations will be even more so.

I'm afraid though that all that GPT or any similar model passing the Turing Test would prove is that the Test is not fit for purpose as a demonstration of genuine intelligence - which, to be fair to Turing, is not what it was intended for.


I suppose. But given the constant inflation in what we demand of AI for it to count as "intelligent", it seems that eventually our tests will become so strict that most humans will not be able to pass them.


Well, I would say we are learning how such a test really has to look.

In a sense it’s like we are learning what it should look like one step at a time.

Additionally, humans make mistakes, too, especially when they are too lazy to think.

Currently the test is still good enough. But in the long term, we should change it to test for transfer learning:

Given a new task that is designed to be sufficiently complex and different from known tasks, how well does the AI do? It will require creativity to design those tests. But we’re still far from that.


Yes - the Turing Test was intended to further inquiry into what intelligence is; not (as many think) to prove an entity is intelligent.

GPT isn't remotely intelligent, just brilliant at producing a semblance of some aspects of intelligence - which the Turing Test is useful in establishing.


Arguably humans are not genuinely intelligent most of the time either.

But anyway, this can probably be fixed by making the test gamified and adversarial. The machine is considered human-level intelligent when paid testers cannot distinguish paid responders from the AI. Their payment depends on their success rates. Although one potential problem would be collusion between the testers and responders, the next step would be allowing the AI access to the communication channels they would use to collude.


No, that's the point - it would not be considered human level intelligent.

It would be considered to have passed the Turing Test is all, which is a very different matter.


>> One trend that continues from the common sense is that GPT-3 is reluctant to express that it doesn’t know the answer. So invalid questions get wrong answers.

More to the point, it's not that GPT-3 knows any answers to any questions. Like the article says, language models are trained to predict the next characters in a sequence. So, it predicts that a sequence that starts with "Who was president of the United States in 1700?" would continue with the sequence "William Penn was president of the United States in 1700". Seen another way, there's a correlation between the two strings. It's the highest correlation between the first string and any other string, so that's what ends up in the output.

So it's more accurate to say that GPT-3 is guessing what the answer to a question might look like than to say that it "knows" what the answer to a question is. It's perhaps a subtle distinction but it's important to keep it in mind, especially in view of claims about "intelligence" and (Old Ones save us) "understanding".

Basically, what all this tells us is that it's not possible to answer arbitrary questions consistently by guessing at them. Even if you get lucky, and you keep getting lucky, there will always be questions that you can't answer correctly and they will be many more than the questions you can answer correctly (e.g. we could generate an infinite number of common nonsense questions like how to sporgle a morgle, or who was the president of the united states since Biblical times etc).

This of course is an old debate in AI: can we simulate a system of logic, without implementing its logic? e.g. could we perform correct arithmetic by memorising the results of calculations? Can prediction replace reasoning? Despite the unending hype generated by OpenAI about its models, the answer keeps being: a resounding no. While you can go a long way by training on ever larger data, it only takes a shallow search before obvious nonsense results are generated.


If prediction cannot replace reasoning, then how would you define reasoning? If reasoning is the process of inference of the most likely from the most similar known, then how does the multi-head attention transformer not fit that description?


>> If reasoning is the process of inference of the most likely from the most similar known, then how does the multi-head attention transformer not fit that description?

I don't know, because I did not propose that definition of reasoning.

"Reasoning" does not have a formal definition in AI, or Computer Science (neither does "intelligence" or "understanding") but, in general, when we speak of reasoning in those fields we mean a procedure that can derive the logical consequences of some premises, often in conjunction with some background theory. I'm happy to use this as an informal definition of reasoning, if you want.


Okay. But how do we determine if something is logical? At the very least we have to abstractly infer and compare with what we have been taught is logic (because hardcoding it manually doesn't map to the [hierarchical] intricacies of reality very well). So logic is a high level reasoning which has to be powered by the low-level abstract/infer/mimic as exhibited e.g. by the transformer.

I wonder if expert systems could be used to generate a "logical reasoning" training dataset (or a cost/fitness function) to help train/evolve neural networks on. Or if there are other ways of integrating these two.


Ah, now logic is something that has a very clear formal definition- actually, many, because there are many different logics. For example, there's propositional logic, first order logic, temporal logic, description logic, default logic, etc etc.

So, how do you decide whether something is logical? In particular, if we have some system S that exhibits behaviour H in context B, can we tell whether H is, in some sense, "logical"? Why, yes we can: we can apply the rules of whatever logic we think S may be following with H and see if we can reproduce H in the context of B, starting from the same premises as S.

For example, if S is producing a behaviour H that can be described as {A → B ≡ B → A} and we think that S is trying to reproduce propositional logic, we can say that it's failing, because H is not correct by the rules of propositional logic (to clarify, H could be something like "If it rains it will be wet therefore if it is wet, it has rained", which is not correct).
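That particular check can be done mechanically by enumerating the truth table. The sketch below (the helper names are made up for illustration) lists every assignment of A and B for which A → B and B → A disagree; any such assignment shows the claimed equivalence is not valid in propositional logic.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q


def counterexamples():
    """Assignments (A, B) where (A -> B) and (B -> A) have different truth values."""
    return [
        (a, b)
        for a, b in product([False, True], repeat=2)
        if implies(a, b) != implies(b, a)
    ]


if __name__ == "__main__":
    for a, b in counterexamples():
        print(f"A={a}, B={b}: A->B={implies(a, b)}, B->A={implies(b, a)}")
```

In the rain example, A=False, B=True is the case "it did not rain, but it is wet": "if it rains it will be wet" still holds, while "if it is wet, it has rained" fails.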

Of course the problem with GPT-3 and friends is that they are not trained to output statements in a formal language (at least not exclusively) so it's very hard to formalise the logic, or lack thereof, of their output. Though it would be interesting to "hit" GPT-3 with some prompts representing the beginning of natural language versions of formal problems.

Could expert systems be used to generate training data for GPT-3? Maybe. If an expert system could generate good quality natural language, even only natural language restricted to some limited domain. Then yes, why not? You could probably train GPT-3 on its own output, or that of its predecessors, as long as it was curated to remove nonsense (which is much harder than it sounds).


>Q: How many eyes does the sun have? A: The sun has one eye.

Kind of an eerie reply, gives me more of a Voight-Kampff test vibe.

Since GPT-3 probably has a lot of Do Androids Dream of Electric Sheep? and Blade Runner quotes in its training data, it'd probably ace the actual test just fine.


Are there any good resources for learning how to pick up GPT3 and play with it? This seems fun.


The nonsense questions are actually handled quite well, continuing the sense of humor implied. I object to the author's expectation the response should be "I don't know" etc.

Old jokes:

Q: How high is a mouse when it spins? A: The higher, the fewer.

Q: How many surrealists does it take to change a light bulb? A: Two, one to climb the giraffe and the other to fill the bathtub with brightly colored machine tools.

Nonsense prose & poetry is an old literary challenge, notable example being Jabberwocky: https://www.poetryfoundation.org/poems/42916/jabberwocky


These are (almost) all isolated questions.

I wonder if GPT-3 can hold a longer conversation where it remembers facts mentioned earlier and what it itself has said.

In other words, does it exhibit any "theory of mind"?

I'm guessing it would do quite poorly on that stuff.


For the which is heavier question, does it always pick the latter option?


I doubt it. Mitsuku, a purely rule-based chatbot, was already able to correctly answer almost all questions of this form in 2014 simply by querying a large knowledge base of common-sense facts.[1] On the neural net side, Google's seq2seq was able to answer questions like this around 2016-2017, although I have no idea about the accuracy.

It would be more remarkable if GPT-3 couldn't solve these types of questions. It might be another problem with the prompt design.

[1] Incidentally, the article is wrong in claiming that the state of the art before modern neural nets was Eliza. Rule-based chatbots got quite advanced in 2013-2016, although they admittedly were never capable of the sort of "true" understanding and long-term coherence that GPT-3 seems to display.


Good observation. It is possible that GPT picked up on this statistical tendency. I participated in seven Turing tests, and most of the "X or Y" questions had the latter as the correct answer, so regularly that I set my program to pick it as default when it didn't know. As GPT picks up on statistical word orders, I find this likely to be the case here as well.
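For what it's worth, the "pick the latter by default" fallback is only a few lines. This is a rough sketch, not the actual program I used; the function name and the example questions are made up for illustration.

```python
from typing import Optional


def latter_option(question: str) -> Optional[str]:
    """Fallback for "X or Y" questions: return whatever follows the last " or ".

    Returns None when the question does not contain " or " at all.
    """
    body = question.rstrip(" ?")
    if " or " not in body:
        return None
    return body.rsplit(" or ", 1)[1]


if __name__ == "__main__":
    print(latter_option("Which is heavier, a mouse or an elephant?"))
    print(latter_option("Which is heavier, a toaster or a pencil?"))
```

The first question comes out right and the second comes out wrong, so reversing the order of the options is a cheap way to check whether a system is leaning on this kind of positional habit.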


To pass a Turing test, one part is understanding language and one part is understanding the world. Although it looks pretty, I don't think GPT-3 is qualitatively different from other "nonsense" models (that are not based on an understanding of (a semantic structure of) language or of the world). Which is not to disparage GPT-3 or the amazing answers it gets here. Just to put it in perspective, it's a long way from usefully human.


Is it possible to download and play with the trained models?

I know that training the model would be absurdly expensive, so I'm curious if it is possible to download one of the already trained artifacts of one of the models. I can't find anything online or in either of the HN threads on GPT-3


No, they aren't releasing the weights. They are releasing it as ML as a service. Right now it's in free beta, but it will open up for commercial usage in the future.

On another note:

At 175B parameters, with float16 representations, the in-memory footprint is about 350GB; adding activations would take it to around 400GB. You would need 12 or 13 V100 32GB GPUs to hold it in memory, or three p3.8xlarge. Meaning loading it on AWS would cost around $35-40/hr.

Though if you didn't care about speed, you could load up the weights from disk one at a time and forward through it a few layers at a time on a single GPU.
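The footprint estimate above is just back-of-the-envelope arithmetic, sketched below. The 32 GB per GPU figure and the reading of "400GB" as the total including activations are assumptions (they are what make the 12-or-13 GPU count work out), and the constant names are made up.

```python
import math

PARAMS = 175e9          # parameter count quoted for GPT-3
BYTES_PER_PARAM = 2     # float16
GPU_MEMORY_GB = 32      # assumption: 32 GB of memory per V100

# Weights alone, in decimal gigabytes.
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9


def gpus_needed(total_gb: float, gpu_gb: float = GPU_MEMORY_GB) -> int:
    """Smallest number of GPUs whose combined memory holds total_gb."""
    return math.ceil(total_gb / gpu_gb)


if __name__ == "__main__":
    print(f"weights: {weights_gb:.0f} GB")
    print(f"GPUs for weights only: {gpus_needed(weights_gb)}")
    print(f"GPUs for ~400 GB with activations: {gpus_needed(400)}")
```

This ignores per-GPU overhead and any memory needed for communication buffers, so treat the count as a lower bound.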


$35-40 an hour is well within the range of a "that sounds fun to grab my friends and mess around with it for a few hours on the weekend" budget!

Especially if you can use spot instances or a cheaper cloud host.

But I guess without the weights, the floor for this is several thousand dollars to play around with.

Do you know if the data set is being released?


The dataset can be obtained around the web. It's mostly CommonCrawl, Reddit, Toronto Book Corpus, and Wikipedia.

You can find a very comparable corpus open sourced and easy to use on the [T5 repo](https://github.com/google-research/text-to-text-transfer-tra...)


They don't release them, saying they would be too dangerous in a world where people believe everything on social media.


Who will believe that? They don’t release it because it’s not up to the hype. If research is successful, one should have nothing to hide.


People don't release the plans to nuclear bombs. Is that because they're not up to the hype, or because they could have devastating effects on society if built?


GPT has nothing near the devastating power of a nuclear weapon...


A nuke can't change your mind.


I can see how this sort of thing might be a useful component of an AGI. One function could be helping to generate lots of training examples for other subcomponents along the lines of chess AI self-play; this could even be involved in "bootstrapping" another subcomponent from something that starts out with a preference for organizing things into a full-blown reasoning agent. Another function could be as an 'intuition' generating a few mostly-right starting-point solutions that a reasoning subcomponent could then choose among and fix up.

These things seem to have about the level of coherence of dreams. Which makes me conjecture, perhaps the mechanism that "directs" dreams serves a similar function as the above?


> Q: How many eyes does a giraffe have?

A lost chance to ask how many eyes five giraffes have, although it does follow up with interesting eye-related questions later on.

> A: The sun has one eye.

That's interesting. Could it have been fed The Lord of the Rings?

anyway, exploring the boundaries of these projects is fascinating


"the eye of the sun" is a common phrase, and likely picked up the singular. I think it would also give the same answer to "how many eyes does a storm have". It's probably reasonably good at answering riddles and word puzzles if it can leap between eye (center) and eye (visual organ). If it can also leap to I (pronoun) it might even be able to make puns :)


It's not really a Turing test; it's like probing the "associative memory" in a huge database for compatible strings, except that the memory is just a map of indexes to indexes. You need to use more abstract and nonsensical input to uncover the raw mechanism, but it's still statistics, not "understanding" what these words are (they're just indexes with high correlation to other indexes), stringing them into run-on "generic templates".


The fact that it accurately answers questions like "Q: Who was president of the United States in 1955?" will itself reveal that it is an AI. I guess the whole idea that an AI must be indistinguishable from a human to count as generally intelligent implies that the AI has to fake its answers to behave like a human. And this means it is not a desirable trait for an AI, as we have enough real humans who can easily sound more human than an AI.


This is fascinating. It seems that GPT-3 does well with general, sensible questions that seemingly involve spitting out facts or short answers (prose, code), but does very poorly with tasks specialized programs could do.

I wonder what this means for the future of domains like automated theorem proving, logic programming where specialized searches are used everywhere, could there be a potential hybrid of that and more general language models such as GPT-3?


GPT-3 does very well with _analogies_. Those analogies don’t have to just be as simple as simple facts, but you do have to craft prompts to help guide it to the right analogy.


But at that moment, when you create a specialized prompt to get a specialized result, aren't you doing most of the work for it?


That's actually fine. Look at this Twitter post:

https://nitter.net/kleptid/status/1284098635689611264#m

While the human is doing most of the work, this looks more like teaching another human than coding in a formally specified, machine-parseable programming language.

On one hand, this could decrease the barrier of entry to programming. On the other hand, it seems to leverage (and train) the same skills that are needed to express ideas clearly to humans, which is arguably much more applicable than learning a traditional programming language.


That's actually kinda cute, and oddly off putting at the same time.

I really am curious now about what happens on the back end, in the model itself, and whether it is actually learning. I really should follow Gwern and his circle; it's a lot better than the blind and baseless hype/criticism floating around the internet.


One of the most interesting reads in recent times.


GPT-3 seems to be a very good static search engine. I wouldn't be surprised if it gave better answers than Google on some/most generic queries.

However, I wonder if there's an architecture that can learn new information without doing an explicit gradient descent/backward-pass. Most active learning seems to be some sort of fine-tuning.


GPT-3 is impressive, but is still "a Chinese Room" [1]. It does not understand the questions.

[1] https://en.wikipedia.org/wiki/Chinese_room


That’s just a philosophical argument; it’s not like there is some magical property that turns something from a Chinese room into something else.

How do you know you are not a Chinese room? Or I am?


The issue with the Chinese room thought experiment is that it does not take algorithmic complexity into account. The experiment describes a dumb lookup table, whose size would grow exponentially with the number of input-output combinations, whereas NN encodings allow them to store the same information in a sub-exponential manner. The most efficient way to translate Chinese is to understand Chinese, including everyday nuances that might be relevant to the translation, just as a human translator can benefit from knowledge of the field of the text being translated.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.136...


Or the lawyers.

Both as an anti-lawyer joke but also from the quite practical observations that there are a bunch of human endeavours, law springing to mind, where humans have to act as a Chinese room on purpose and reason about things they have little practical experience with using only words.


You think we understand the question, but what if "understanding" is just what the algorithm to solve the problem feels like. If it could, the computer would probably experience some sort of understanding too.


It may not understand in the same way we do, but this clearly goes beyond memorization of the kind implied by that thought experiment. There are completely novel things that can be expressed.


> In general, if you are trying to distinguish an AI from a human, you don’t want to ask it obscure trivia questions. GPT-3 is pretty good at a wide variety of topics.

Surely you could tell it isn't human because it knows too much trivia?


GPT-3 is just Eliza on steroids, and once you know this, it's pretty easy to spot that it's a bot. For example:

> Q: Are there any animals with three legs?

> A: No, there are no animals with three legs.

I would expect a human to say something like, "There aren't any animals that are born with three legs (that I know of) but there are some four-legged animals who have lost a leg for some reason who now have three legs. I saw a video of a really cute three-legged dog on Youtube just the other day."

NOTE: when I used the word "just" above, I in no way meant that to be pejorative. Both Eliza and GPT-3 are really cool, amazing technical achievements. But neither one represents much progress towards GAI (IMHO) except insofar as they show that naive approaches like this won't work. Which is really useful to know.


I think that's giving too much credit to people. I think some people would answer that way, but a whole lot will just respond with exactly the answer that resolves the question.


Eliza fooled a lot of people back in the day too.


as a human, I would have answered "I don't know"


Read a couple of blog posts on text generation with RNNs and came away with the impression that this is a glorified Markov chain text generator. Not sure why people seem to be expecting AGI to arise from this.
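For reference, this is roughly what a plain word-level Markov chain text generator looks like: the next word depends only on the current word. It is a toy sketch with made-up function names and a made-up training sentence, and it says nothing about how GPT-3 itself is implemented.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict:
    """Map each word to the list of words observed immediately after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain: dict, start: str, length: int = 10, seed: int = 0) -> str:
    """Walk the chain from `start`, sampling each next word from its followers."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break  # dead end: this word was never followed by anything
        out.append(rng.choice(followers))
    return " ".join(out)


if __name__ == "__main__":
    corpus = "the sun has one eye and the sun has rays"
    print(generate(build_chain(corpus), "the", length=8))
```

The obvious limitation is that the state is a single word, so nothing said two words ago can influence what comes next.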


Regarding common sense, has anyone tried to feed it Winograd Schema Challenges?

Typically:

"The trophy would not fit in the brown suitcase because it was too big (small). What was too big (small)?"


"Q: How many Super Bowls do the Cincinnati Bengals win in the 2030s? A: The Cincinnati Bengals win two Super Bowls in the 2030s."

They should have called it Orac instead of GPT-3.


I hate to say it, but the XKCD comic is close to "mission fucking accomplished":

https://xkcd.com/810/

Now to unleash GPT4 on Hacker News and sit back and create lots of sybil accounts... (just kidding)


I was waiting for the Eliza comparison. How does GPT-3 react when you curse at him? Whatever her shortcomings, Eliza knew when to end a conversation.


Is it still feasible to train this model at home?

If not, then how can I fine-tune it to fit my own data? Will they provide an API for it?


"Reading the OpenAI GPT-3 paper. Impressive performance on many few-shot language tasks. The cost to train this 175 billion parameter language model appears to be staggering: Nearly $12 million dollars in compute based on public cloud GPU/TPU cost models (200x the price of GPT-2)"

https://twitter.com/eturner303/status/1266264358771757057


> Nearly $12 million dollars in compute based on public cloud GPU/TPU cost models (200x the price of GPT-2)

Oh! So only about $1 million on bare metal.


This (highly enjoyable) article is all that is wrong with "AI".

The Turing test was a hand-wavey method to determine if a machine was intelligent, yet since the times of Eliza most (all?) of the efforts are geared at exploiting the weaknesses of the test instead of producing something along the lines of actual artificial intelligence.

The Turing test should not even be attempted in the next few decades, as we're not even close to really passing it.


There is nothing about gpt3 tuned towards the Turing test. It's a language model that predicts the most likely next word given the preceding context.

Further, we're actually astonishingly close, imo. Given current sota, if you asked turkers to administer the test without training, I think we'd pass.

(Also, the prompt helps, see this example https://twitter.com/nicklovescode/status/1284050958977130497 )


They actually tested the model and the identification rate was just 52% (50% being coin toss accuracy).


This is a step in the right direction, though, isn't it? It's not about whether the AI mimics a person's pauses and errors, but about whether the AI can keep track of two pencils or distinguish between the present and future.

An AI that in some way demonstrated what we might call basic common sense, together with computer-like levels of recall, would be tremendously useful for all sorts of things. And a Turing test like this would probably demonstrate it fine.


I don't think so. This is of course a philosophical point, but without some underlying thing we can call "intelligence" this will never be more than a quirk. When something/someone is pretending to be something they are not, the hoax will sooner or later fall apart. If a computer can trick you into thinking you are having a conversation with a real human by optimizing for the common things people will try when playing the Turing test (like common knowledge, persistence between questions, nonsense questions, etc.) you gain nothing in the quest for AI. One may find some use for a fluffing machine that can output almost coherent jumble that a human may or may not need to go over, depending on the use case.


> Q: How many eyes does the sun have?

> A: The sun has one eye.

What if GPT-3 knows more than we do?


Nah... it's just text scraping horseshit. My foot doesn't have two eyes.


In the "which is heavier" test, it simply always picked the second object as answer.


What is the inference set up here? How does it remember what the previous question was?


As far as I'm concerned this is phenomenal...

I mean...wow.

The use cases are practically limitless.

All I saw from this post is that it doesn't work well with edge cases.

But this smashes the Pareto principle.

This is perfect as a tool for kids to get into programming.

Something where they can see a big result with less effort, like programming in the 90's.


The idea of a Turing test as being a conversation is too restrictive. A real Turing test would be whether a machine could manipulate someone: getting a person to do something.

That is what defines a human, the ability to make tools out of other things and even other people.


You want to optimize it for breaking out of the box? https://en.wikipedia.org/wiki/AI_box


If you haven't seen it, watch Ex Machina. I don't want to give away anything but it covers that kind of idea.


Exactly. That's when I realized the Turing test was too polite. That it was designed to keep the ugly truth away from the masses lest the field of AI be killed while still in infancy. But I think our evil nature would always win out. The temptation to create an AI that could 'manipulate' the enemy is too great. Just like everything else humanity discovers/invents, it is turned to manipulation and conflict because there is more money in that. I still remember when I used the ARPANET in college. I felt so special that I knew people who only wanted to help each other. Like a kind of 'stackexchange'. But the internet is a cesspool for all its corporate invaders who seek to monetize every aspect of our lives.


Fantastic read, thank you for sharing.

Content like this is why I joined Hacker News in the first place. I wonder how many other hidden gems are out there on the Internet.


GPT doesn't understand the questions therefore it can't answer them. It can only generate a bunch of text and make it look like it knows what's being asked. It's not AI, not even close.


Yesterday, I gave the AI Dungeon version of the model the questions from the first round of the most recent episode of Jeopardy (almost certainly not in its training set). After I burned the first category on priming it, so it knew the kind of task we were doing, it got 72% of the remaining questions right. That's not just making it look like it knows what it's being asked.


That answer is too easy.

What does it mean to “understand it”?

How do you know how you answer a question yourself?

Presumably, what you mean is that you can stop and _reflect_ on what was asked. You’re aware of it. And if it’s complicated you might be able to reason and break it down.

GPT-3 certainly doesn’t do that, but all that means is that it’s not conscious.


The next step might be two or more GPT instances having an internal dialogue, trying to reach a consensus before giving their answer. That would also give us an opportunity to peek inside its inner workings.


Or a layered system. The surface instance takes the question and produces an internal "thought" out of it, a statement or question itself, and submits it to the layer underneath. This second instance works out a response / text extension to the statement/query from the surface, submits it up, and the surface layer generates the final answer by answering or extending the response of the sub-layer.

This can be extended down or wide, with several instances representing different thinking modalities and the final answer being generated from all of them. Basically reproducing a neural net architecture but at the higher abstraction level.


I think we're close to the point where this sort of attitude becomes a form of theology.



