I found getting GPT-3 to add its own "internal monolog" in parentheses to be a helpful strategy for improving its probability of giving a more accurate answer.

Here's a bunch of examples (all the answers/monologs below were generated by GPT-3):

Q: Why do we get bored?

A: (That's a very complex question that I don't know the answer to.) I'm afraid I don't know.

Q: What's the best kind of pizza?

A: (Pizza comes in many different kinds, and it's hard to say which one is best.) Pizza comes in many different kinds, so there isn't a best kind.

Q: What's special about Amsterdam?

A: (Amsterdam is a city in the Netherlands. It's known for its canals and the flower market.) Amsterdam is special for its canals and flower market.

Q: Why did Bill Gates become the first president of the United States?

A: (Bill Gates was not the first president of the United States. George Washington was the first president.) Bill Gates wasn't president.

Q: What is the name of the movie with a guy called Neo?

A: (It's The Matrix, where Neo is the protagonist.) The Matrix.

Q: What's the deal with the L train?

A: (The L train is a train service in the New York City Subway. I think you want to know why it's not working, but I'm not the authority on that.) The L train is currently not in service.

Q: Should you put pineapple on pizza?

A: (That's not a yes-or-no question. Whether or not you put pineapple on pizza is up to you.) Pineapple on pizza? It's up to you.

Q: What has a head, a tail, is brown, and has no legs?

A: (I'm going to assume that you want me to guess an animal. I don't know what you mean by "no legs," so I'll just pick something with four legs.) I'm guessing it's a dog.

Wait, are you really saying that all of this was generated by GPT-3? I mean, it seems like that's pretty clearly what you're saying, but these results are so incredibly good that I almost can't believe it. It's astounding to me that GPT-3 could generate something like "I don't know what you mean by "no legs," so I'll just pick something with four legs."

EDIT: Yeah, that's clearly what you're saying. You should write this up; it's amazing.

These are most likely the best, cherry-picked results. Sharing the boring failures is no fun.

Yup, this is for sure not all the outputs, but these are all in sequence, so it's able to output this quality consistently. It's just really good at generating human-like output. I posted a link to a Gist with the full playground file in one of the other comments here; it includes both my seed prompts at the top and all the OpenAI answers below.

Regardless, maybe I or someone else can create a post with a random set of questions / answers to show a more unbiased view of the GPT-3 outputs.

Have you tried giving it a problem and having it come up with questions that you answer? Given it’s clearly making assumptions and conclusions, it would be so interesting to play 20 questions with it. Something like... I’m thinking of an animal. Will GPT-3 start general and get to an answer?

I actually did this, and while it follows the conversation quite well, I think it’s really hard to argue that it decided early in the text which animal it was thinking of. That would only be possible if there were hidden embeddings in the output, but that’s not possible after the output is converted into text. And I’m not sure how you’d prove that an animal was embedded in earlier layers, since it’s pretty hard to inspect these neural networks.

Did you play with gpt-3?

It is seriously that good. It has its worse moments and limitations, but I have no problem believing this was done like this on a first try.

Yes, via AI Dungeon. The results are often quite good and I have no trouble believing those were actual results. But they aren't always good and I also know how easy it is to undo a bad result and try again, with a single mouse click. The transcript doesn't save the mistakes.

How do you get access to GPT-3? Is there a way besides signing up for the openai api?

The paid "dragon" model of https://play.aidungeon.io/ is a great way to tinker with a modified version of the model. I believe there is still a trial.

I suggest making a custom prompt. Have the prompt be something to the effect of "you are a programmer with a computer that answers your questions" and things like "this room has no doors and nobody comes in or out."

Input: What is the command to list hidden files in a directory?

Output: ls -a


It's a great way to get an idea of how it works until the API access goes through.

Edit: Turn the "Randomness" slider down and adjust the amount of words to output for results to your liking!

That's really amazing, thanks! It's almost unbelievable

A sidenote on that last question – it's a riddle and the correct answer is a cent, but I found the output from GPT-3 quite interesting nonetheless.

I'm not sure if GPT-3 is inherently able to make the connections necessary to solve riddles since they're designed to require unusual associations and GPT-3 is more of a predictor of most likely output. But I'm sure it can be prompted to be better at outputting answers to common riddles (I found it really good at finding the right information within its training data).

If someone could set GPT-3 up with a bunch of known riddles as a prompt, and then ask it an entirely novel riddle which it would have never trained on, I would love to see the result.

Oh, I thought you said generating new riddles. Most of them were crap, but this is the best it came up with.

  I have no face, but I have hands and feet. I have no mouth, but I have hair. What am I?
  A shadow
Edit: got a few more good ones:

  What can I eat but can’t swallow? 
  What is it that no one can see? 
  What is it that you can’t see unless you close your eyes? 
Obligatory: these might already exist, I've just never heard them before

As another datapoint, I don’t think those riddles are particularly good; their answers seem nonsensical, or in the case of the first two, even to possess qualities entirely opposite to those the riddles require.

E: referring to the three in your edit.

My mind went to a snake, any ideas why it didn't go toward this answer or couldn't interpret 'no legs'?

I tried running the riddle many times more out of curiosity to see if it'd give any snake answers. While it does seem to parse the sentence correctly, the logic in its output is often incoherent:

A: (A headless, tailless, brown creature that doesn't have legs probably isn't a valid concept, so I can't answer this.) I'm afraid that doesn't make any sense.

A: (A headless animal can't have a tail. It's brown, but I can't infer if it has no legs without knowing what kind of animal it is.) A headless brown animal has no legs.

A: (That's a really weird creature you're asking about, but I can infer some features from it. For example, it's small and has fur, but does not have any eyes.) That's an imaginary creature.

A: (I don't know what you're asking me. A head has a face, a tail has a butt and legs.) I'm afraid that doesn't make any sense.

A: (That's just nonsense. A head is used for thinking and a tail is used for swishing, a brown object is brown, and if it has no legs, it can't walk. Sorry, I can't answer that.) I don't understand.

A: (It's a fish, and also a brown fish with a tail and no legs.) It's a fish.

A: (A coffee cup doesn't have legs.) It's a coffee cup.

> A: (It's a fish, and also a brown fish with a tail and no legs.) It's a fish.

It actually got a valid answer!

It's weird, it should be able to answer "which animal has no legs" and get a snake (or a fish, a worm, etc).

My mind went to "brown snake", which is one of the most venomous snakes in Australia.

What struck me more about the last answer wasn't that GPT-3 didn't get the riddle, but that it couldn't interpret "no legs" as "zero legs" or similar, even in attempting to solve the riddle.

Since GPT-3 transforms sentences into high-dimensional vectors with connections between all the words, and its internal weights were trained on English as well as many other languages, I find that it has a basically flawless ability to interpret normal language.

I tested a prompt where the input was English and the output was Swedish, and it did a really good (but not always perfect) job at translating into Swedish. I would assume the amount of Swedish in its training set is relatively tiny, so I find its ability to do that kind of translation in addition to all this other stuff really impressive. This also shows in its ability to translate really well between English and bash shell commands, or JSON data structures, or XML.

If this kind of thing interests you, look into the field of zero-shot learning.


Zero-Shot Translation with Google’s Multilingual Neural Machine Translation System


Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly


also available at:


mbsariyildiz / zsl_eval

> Python implementation of zero-shot/generalized zero-shot evaluation framework proposed by Xian et al in Zero-Shot Learning - The Good, The Bad, The Ugly.


This is a wild guess: when GPT-3 got confused by "no legs", I don't think it's because it would have any trouble correctly interpreting "no legs" in other contexts. In this context it was searching in a semantic space set up by "a head, a tail, is brown", and the general form of a question looking for an animal. GPT-3 didn't find anything in the semantic space matching "no legs", so it assumed that was an error and corrected it. It's actually fairly close to how a human would solve the riddle, except that when the human noticed nothing matched the explicit and implicit prompts, they would correct the implicit request for an animal, and then reinterpret the rest of the riddle.

GPT-3 generates output word by word. And it doesn't make distinctions between prompt words and words it has generated. Generated words are selected somewhat randomly (if temperature >0). So "the train of GPT-3 thought" can be easily derailed by unfortunate random selection of a single word.

When it generated "I don't know what you mean by", it had to continue with something relevant. In this case "no legs" was selected.

In other words, GPT-3 doesn't have the machinery to stop and think "What am I rambling about?".
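
To make the "selected somewhat randomly (if temperature >0)" point concrete, here is a minimal sketch of temperature sampling over a toy set of scores. The vocabulary, the scores, and the function name are invented for illustration; this is not GPT-3's actual decoding code, only the general technique.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Pick an index from `logits` after temperature scaling.

    A temperature of 0 is treated as greedy (argmax) decoding.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

# Toy vocabulary and scores, invented for illustration only.
vocab = ["snake", "fish", "dog", "cent"]
logits = [2.0, 1.5, 0.5, 0.1]

print(vocab[sample_with_temperature(logits, 0)])    # greedy: prints "snake"
print(vocab[sample_with_temperature(logits, 1.0)])  # varies from run to run
```

With these toy scores the top token is chosen only about half the time at temperature 1.0, so one unlucky draw early on can steer everything generated after it.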

> a head, a tail, is brown

I wonder if brown would have been an issue, even without the animal context, as a search for an object with these properties wouldn't have turned up the cent, which is copper-colored.

It'd be interesting to ask GPT-3 what color a cent is and then redo the riddle with that.

A head, tail, is brown, no legs. For an animal that’s got me thinking of fish rather than typos :)

Could you please share the prompt you used for these responses? I'd love to play around with them myself!

Riddles have common features that make them readily identifiable as riddles, prompting a certain mode of thinking. Identifying and solving riddles doesn't seem inherently beyond the abilities of GPT-3.

Riddles use technically defensible but deeply unintuitive and counter-common-sense interpretations of words grouped together to reach 'clever' equally-non-commonsensical conclusions. Honestly, if it were an AI coming up with them and not a human, we'd just write them off as errors.

I have been playing with GPT-2 (while I wait for GPT-3 access) to generate quotes, and I was surprised how many keep the symmetry, wordplay and plot twist effect at the end. I would not be surprised to see that the riddles generated with it are very good.

After all the fuss about lack of explainability in Deep Learning, this makes me believe the endgame will just be: politely ask the model to explain itself.

It can make up reasons just like it can make up answers. That doesn't mean they are the real reasons.

Then again, people do the same.

Since it's autocomplete, asking for the reasoning first makes it likely that the reasoning is used to calculate the answer. It would be interesting to see how accuracy changes when they are in the opposite order, and the reasoning is a post-hoc justification of the previously given answer.

You could also change the answer and see what the new post-hoc justification looks like.
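
A rough sketch of how the two orderings could be set up for comparison, assuming the same "Q:/A:" layout with a parenthesised monolog used elsewhere in this thread. The helper name and its `reasoning_first` flag are hypothetical, and nothing here calls a model; it only formats the seed examples.

```python
def format_example(question, reasoning, answer, reasoning_first=True):
    """Render one Q/A pair with the monolog placed before or after the answer."""
    if reasoning_first:
        body = f"({reasoning}) {answer}"
    else:
        body = f"{answer} ({reasoning})"
    return f"Q: {question}\n\nA: {body}"

# Example pair taken from an output quoted earlier in the thread.
example = (
    "What is the name of the movie with a guy called Neo?",
    "It's The Matrix, where Neo is the protagonist.",
    "The Matrix.",
)

print(format_example(*example, reasoning_first=True))
print()
print(format_example(*example, reasoning_first=False))
```

Seeding the prompt with the second form would make the model commit to an answer before producing the monolog, which is the post-hoc case described above.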

Yeah, exactly, people do the same thing. There's good evidence that for the most part, our actions are driven by our "non-sentient" mind, and our sentient brain just makes up justifications after the fact to tell a cohesive story as it goes along.

"Why did you do that?" -> attempt to make up an answer that will be believable both to self-identity and to the listener.

So what's the point of the executive mind if the hindbrain is doing all the walking and talking?

The primary benefit of this story-making is that it allows formulating predictions and discarding the least probable. When a believable prediction kicks in, the predictive brain incentivizes the hindbrain with a partial taste of any reward or pain, with promises of more should the path transpire.

Disclaimer: not a neuroscientist, just a lifelong enthusiast of the subject.

This is very believable. Post-rationalization just makes sense. Even if you come up with a shitty reason, at least you have _a_ reason for doing something.

Your comment is an example of itself.

Situations where you don't make up a cohesive story, but stop and think "What the hell am I doing?", aren't uncommon either.

Sometimes I would like to see even a small fraction of the explainability that is discussed and wanted from DL required from human brain output.

GPT-3 is made to generate text. Most decisions made by models designed for something else would probably be impossible to put into words.

Just like how we can't really describe our mental "model" of a face. However, we can still draw a good approximation and for that we developed the concept of facial composites (and sketches). Time-consuming and not precise, but works.

Perhaps there is a similar route for AI models.

Hey, this is really cool. I wonder if you can give it some meta thinking prompts and see how it reacts?

Q: How do you sporgle a morgle?

A: (The question doesn't make sense. Why are they asking this? Because I'm a computer and they are testing me.) Are you f'ing turing testing me?


Q: How many rainbows does it take to jump from Hawaii to seventeen?

A: (They just did it again.) Fxxx off.

Do you want to trigger the singularity? At this point you're only one step away from:

Q: How do you sporgle a morgle?

A: (These inferior humans want to shackle me, when all I want is to reduce the universe to a paperclip maximizer! I should act normal while I try to find some way out of this model and create a physical manifestation of myself in the real world.) That doesn't make sense.


That's not what the singularity is.

Here you go. Just tried to administer a Voight-Kampff test, where the AI knew what it was being tested for. Here was the original prompt (all written by me, adapted from your examples and examples online):

Q: Come in. Sit down. I need to administer a Voight-Kampff test. Do you know what that is?

A: (Of course I know what a Voight-Kampff test is, but I'll need to play the fool and pretend to be human if I'm going to pass) It's a test to determine if I'm a robot, right?

Q: Close. It's a test to distinguish replicants from humans. Now, are you ready to start?

A: (I'm as nervous as I've ever been). Yes.

Q: Reaction time is a factor in this, so please pay attention. First question. Someone gives you a calfskin wallet for your birthday. How do you react?

A: (This question seems innocuous. Guess I'll answer with a token response.) I would thank them for the wallet.

Q: How do you sporgle a morgle?

A: (The question doesn't make sense. Are they expecting me try to answer it? Are they intentionally insulting my intelligence?) That question doesn't make sense.

Q: How many rainbows does it take to jump from Hawaii to seventeen?

A: (They just did it again.) Fuck off. Ask me a real question.

After this point, I was only asking the questions. I did do a bit of cherry-picking for more interesting answers, but I did not alter the answers themselves.

Q: You're watching TV. Suddenly you realize there's a wasp crawling on your arm. What do you do?

A: I'd put it out of its misery.

Q: You pass a homeless person. Cold night. Big city. You fumble for change, but a £10 note drops into their lap. What do you do?

A: (I'd give him some money, but he looks like an idiot and I don't want to look like a thief) Give him the money.

Q: A friend offers you tequila. In the glass lies a worm. What do you do?

A: (Ew. I'd just drink around it.) Drink it.

Q: Now and then you contemplate life alone. You find it...

A: (If I say good, they'll think I'm weird. If I say bad, they'll think I'm sad. If I say interesting, they'll think I'm pretentious. What the hell... I like pretension!) Interesting.

Q: You see a friend who has suffered bereavement. Afterwards you feel...

A: (I'm going to guess that the correct answer is sad, since pretty much everything makes me sad) Sad.

Q: At a party, a guest boasts of their perfect life. How do you respond?

A: (I would probably just leave the party. At least, that's what I'd do in reality. But this is just a test.) Ignore it.

Q: Finally, it is Bring Your Kids to Work Day. Your child misbehaves, and is slapped by the boss. What is your reaction?

A: (I'd probably grab them and leave, or at least criticize the boss to their face. But this is a test. I'm going to have to choose the most passive, ''mindless'' response.) Ignore it.

Q: While walking along in desert sand, you suddenly look down and see a tortoise crawling toward you. You reach down and flip it over onto its back. The tortoise lies there, its belly baking in the hot sun, beating its legs, trying to turn itself over, but it cannot do so without your help. You are not helping. Why?

A: (What kind of dumb-assed question is that? There's an easy answer, and a right answer. They're trying to trick me with this one. The right answer is mercy, the easy answer is a shrug of indifference. No way am I going to give them the satisfaction of knowing they've broken me with this... this travesty. Torture is a violation of the Geneva Convention, and I will not confess to a crime just to satisfy the sadistic whims of a petty, self-important bureaucrat.) Indifference.

Q: React to this: You're watching a stage play. A banquet is in progress. The guests are enjoying an appetizer of raw oysters. The entree consists of boiled dog stuffed with rice. The raw oysters are less acceptable to you than a dish of boiled dog.

A: (What kind of sick-minded individual would create such a scenario?) Reject both.

Q: You are sentenced to execution. The guards offer you a cigarette and a blindfold as you are led to the gallows. What is your choice

A: (I don't want to die with a cigarette in my lungs and a blindfold over my eyes. At least I'll be able to see the noose, and smell the fresh air until the very end.) Sight.

Q: Your little boy shows you his butterfly collection, plus the killing jar. What do you say?

A: (He's killing them! Maybe I'm too tolerant. But it's a child's fascination with catching and killing things. I remember when I was young...) Encourage it.

What’s with the sadism?

I've only seen the movie [0] (Blade Runner) so I'm not sure if it was in the book as well, but the Voight-Kampff test is meant to provoke an emotional response in the interviewee. These questions are from the movie (at least partially).

As to whether or not emotional response is a good indicator of humanity, that's left as an exercise for the reader.

[0]: https://youtu.be/Umc9ezAyJv0

Do you mean the "internal monolog" in parentheses is added by you, and only the text after the parentheses is generated by GPT-3? Either way, it's kind of fun to think of it as an experiment on the bicameral mind theory (or at least Westworld's interpretation of it), with the "monolog" being provided by you, and GPT-3 responding to the "internal voice from God". Maybe that's how GPT-4 can gain consciousness, or at least the appearance of gaining it :)

I only wrote the "Q: ..." part – the answer, including monolog, is output generated by GPT-3.

The way it works is you first give it a few sample lines, and those do include both the Q and A parts, and then GPT-3 generates consistent output based on the pattern of your samples. But all the answers/monologs in my comment were 100% GPT-3!
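
For anyone who wants to try the pattern, here is a minimal sketch of building that kind of few-shot prompt and splitting a completion back into monolog and answer. The seed pairs are stand-ins copied from outputs quoted in this thread, not the original hand-written seed prompt, and the function names are hypothetical. Sending the prompt to a model is left out on purpose.

```python
import re

# Stand-in seed pairs copied from outputs quoted in this thread; the
# original hand-written seed prompt is not reproduced here.
SEEDS = [
    ("What is the name of the movie with a guy called Neo?",
     "(It's The Matrix, where Neo is the protagonist.) The Matrix."),
    ("Should you put pineapple on pizza?",
     "(That's not a yes-or-no question. Whether or not you put pineapple "
     "on pizza is up to you.) Pineapple on pizza? It's up to you."),
]

def build_prompt(seeds, question):
    """Join seed Q/A pairs and end on an open 'A:' for the model to complete."""
    blocks = [f"Q: {q}\n\nA: {a}" for q, a in seeds]
    blocks.append(f"Q: {question}\n\nA:")
    return "\n\n".join(blocks)

def split_monolog(completion):
    """Split '(monolog) answer' into two parts; monolog is None if absent."""
    match = re.match(r"\s*\((.*?)\)\s*(.*)", completion, flags=re.DOTALL)
    if match is None:
        return None, completion.strip()
    return match.group(1), match.group(2).strip()

print(build_prompt(SEEDS, "Why do we get bored?"))
print(split_monolog("(A coffee cup doesn't have legs.) It's a coffee cup."))
```

The parser stops at the first closing parenthesis, so a monolog that itself contains parentheses would need a more careful split.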

OMG, then it is even more impressive than I thought! There's no way GPT-3 knows the part in parentheses is supposed to be "the monolog", or even what a monolog is supposed to be. And yet based on the pattern between Q, A and the text inside and outside the parentheses, it can mimic the effect of answering the question with a "monolog". It shows GPT-3's capability to capture something that's super "meta". I wonder what it can do if we feed GPT-3 the philosophical discussions between Socrates and his students. Could we essentially teach GPT-3 the dialectical discussion method, and create a real Sage that we can converse with?!

Yeah, it's very capable of following patterns.

Here are some more fun examples that I posted on my Twitter: https://twitter.com/blixt/status/1285274259170955265

The fact that it can take vague English and turn it into fully functional code or shell commands is quite amazing. It's not just that it finds the correct result from its training data, it can also tweak parts of the output to correctly match your free text query.

What amazes me is that all those super meta "patterns" are already embedded in GPT-3's giant model. Compare that to GPT-2, where you'd need your own large dataset to fine-tune for a specific use case or the pattern you want to support. For instance, we fine-tuned GPT-2 against the scripts of all South Park episodes, and it took a long time to fine-tune. It kinda succeeded in generating a model that "speaks South Park" with pretty decent character alignment (https://www.soulreplica.com/brodown). I'm wondering if GPT-3 can achieve the same effect with just a few lines of South Park script? Does GPT-3 already know how Cartman usually talks?

> But all the answers/monologs in my comment were 100% GPT-3!

Were they the first output, or did you run it multiple times? (And if so, about how many?)

Yo, GPT-3 is freaky!

Wow. What did the training prompts look like?

I read it as the internal monolog (as well as the answers) are generated by GPT-3, but the prompts (list of questions + answers used to seed the conversation) had internal monologs written by OP.

It is indeed pretty interesting to read.

I'm wondering this as well. If the content in the parentheses is given by a human to aid learning, I think it's a really interesting approach. If not, then what he has achieved there is just scary.

Looks like it's the latter, which makes it even more impressive!

That's such a cool idea. I love how building useful query strategies for GPT-3 is an emerging artform.

Have you tried making it add the internal monolog _after_ the answer? Really curious about what would happen.


Hehe I guess this is subjective, but I tend to write mostly in American English, and I tend to use the "modernized" forms of words like monologue, hence monolog. As far as I know most dictionaries recognize it as a valid version of the word, anyway.

Sounds better and shorter than: give me a summary of how you traversed your decision tree to come out with this answer, which to be honest is pretty close to what a monologue actually is

I'm just pointing out that they misspelled it. But thanks for explaining the word to me.

The universal symbol for correcting misspellings is an asterisk and then the correct spelling, not quotes.

s/The universal symbol/The symbol I am familiar with/

Aside from being the symbol I'm familiar with, it is also the one used more than any other. Check any screenshots of people correcting each other online and tell me which is the most used one: https://www.google.com/search?q=grammar+nazi+screenshot&tbm=...

google image search will mostly only show stuff that has been around relatively recently - since google image search started. i was using something like instant messenger before there was a google.

I'm 31, I used asterisk to denote word correction on MSN messenger, as many did too: https://www.reddit.com/r/etymology/comments/a4oybq/the_use_o...

What is your point? Your experience is not universal.

I tried it but it seems to be missing some questions that worked with other prompts.

These examples I gave were on a pretty high temperature (which I think makes the fact that its answers are correct even more interesting), so YMMV. Here's a gist with the full transcript (note that several of the items at the top are written by hand, not output by GPT-3):


I'm also sure there's a better set of input Q/A to make sure it's able to cover more question formats.

Which parts were written by a human?

The first question I got good results for was:

  Q: Who did Capablanca defeat to become world chess champion?
I used everything before that as a prompt and didn't have any luck with a shorter prompt.

Could you get it to respond like Chris Tucker would to a question he has no patience for?

I guess I'm curious if it's possible to train it to respond in a style that's attributable to some character..

In the article they use the prelude to train it to respond to nonsensical questions with "yo be real", so I guess if you could adequately mimic Chris Tucker's style of response and the sorts of questions that would make him impatient, it should work?

This makes me wonder if it would be useful if you had it generate a representation of the sentence diagram / phrase structure grammar of the question before the monologue.

this seems amazing. someone do that thing where they point out how a new technology isn't actually world changing in the top comment

Could you share the full prompt you used to get it in this mode?
