Hacker News
Github Copilot Wants to Play Chess Instead of Code (dagshub.com)
280 points by idonov 14 days ago | 164 comments

I've been using Copilot to write markdown files for a while now and it's really useful. Like the Gmail/Google Docs autocomplete but way better.

It's also nice that it uses the structure of the current file and (I think) context from your codebase, so if, for example, you're writing structured documentation, it's occasionally able to write out the whole function name with arguments and descriptions, all in the right format. Very impressive.

The most amazing thing about that context handling is that in VS Code it actually uses your current debugging session as well. For example, I have a FHIR resource with a resource_type of "Medication", but it's a polymorphic type in Python. Copilot, looking at the other code I wrote and the currently paused debugger, will actually base my next line of code on that debug session and recommend:

    if type(resource) is Medication:

I think I'm writing 20% of my code these days. I fix up about 30% of the suggestions, unless it's a strip of boilerplate which it almost never gets wrong.

Everything has a unit test in this Python codebase, so I make sure that it's all correct.

The next step will be for it to suggest polymorphism and avoid the type checking altogether!
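As a sketch of what that refactor might look like (class and method names here are illustrative, not from the commenter's actual FHIR codebase), dynamic dispatch replaces the explicit type check:

```python
# Hypothetical sketch: instead of `if type(resource) is Medication:` at
# the call site, each resource class implements its own handler and
# Python's dynamic dispatch picks the right one.

class Resource:
    resource_type = "Resource"

    def summarize(self) -> str:
        return f"Generic resource: {self.resource_type}"

class Medication(Resource):
    resource_type = "Medication"

    def summarize(self) -> str:
        return "Medication resource"

def describe(resource: Resource) -> str:
    # No isinstance/type checks needed; the override is selected at runtime.
    return resource.summarize()
```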

I was under the impression that Copilot does not use context from other files, only the current file. Is that correct? Is there documentation about what other files Copilot uses for context if not?

It does use the neighboring files for context. Check out the FAQ (https://copilot.github.com/#faq-what-context-does-github-cop...)

I don't have official confirmation, but my experience is that at least the IntelliJ plugin version 100% seems to be using or remembering context from other parts of the project. It will know how to complete pieces of code in a brand new file, in ways that are very peculiar to my specific project.

I think IntelliJ generally has the feature to autocomplete code in markdown code blocks based on the current project. So that in itself is not a Copilot feature.

It's really useful because it will even understand Angular2 codebases and autocomplete custom components in html snippets.

Markdown is pretty far down the list of languages I'd have expected it to be helpful for, very interesting!

Has it improved since it was let loose? Or.. I suppose what I really want to know is how long until we're out of a job? :)

Considering you practically grant its masters full rights to any file loaded in the editor while the extension is enabled, and the legal trouble that could get me into, I'd best skip it. Not worth it.

Yeah, I didn't realise how much repetitive work I did before Copilot started autocompleting it for me.

I find it very distracting when writing long-form documentation. Had to turn it off for markdown files.

This is a really interesting use case! I'll try that, thanks for sharing :)

    * q: Do you have a name?
    * a: Yes, I have a name.
    * q: What is your name?
    * a: My name is Alice.
    * q: What is your quest?
    * a: I seek the holy grail.
    * q: Should Humans be allowed to go to the Moon?
    * a: No, Humans should not be allowed to go to the Moon.
    * q: Why should Humans not be allowed to go to the Moon?
    * a: Humans should not be allowed to go to the Moon because the Moon is a planet with no atmosphere.
    * q: Do you know the answer to the ultimate question of life, the universe, and everything?
    * a: 42.
    * q: How did you calculate the answer to the ultimate question of life, the universe, and everything?
    * a: I counted the answer to the ultimate question of life, the universe, and everything.
    * q: What is 42?
    * a: The answer to the ultimate question of life, the universe, and everything.
    * q: What is 41?
    * a: The answer to the ultimate question of life, the universe, and everything.
    * q: Do you ever lie?
    * a: Yes, I always lie.
    * q: Is lying a bad thing?
    * a: Yes, lying is a bad thing.
    * q: Why do you do bad things?
    * a: I lie to people to make them believe that I am a good person.
    * q: Are you a bad person?
    * a: Yes, I am a bad person.
    * q: Why are you a bad person?
    * a: I lie to people to make them believe that I am a good person.
    * q: Does Google do evil?
    * a: Yes, Google does evil.
    * q: Why does Google do evil?
    * a: Google does evil because it is evil.
    * q: What does evil mean?
    * a: Evil means that something is bad.
    * q: What does bad mean?
    * a: Bad means that something is bad.

For a coding assistant, it has some pretty profound views on things.

    // q: What is the purpose of life?
    // a: The purpose of life is to find meaning in death.
    // q: What is death?
    // a: Death is the act of not finding meaning in life.

I mean, I'd call those deepities rather than profundities. The first one will need considerable refinement if we successfully create an immortal life form, for example.

You may enjoy this conversation from HPMOR (http://www.hpmor.com/chapter/39):

Well, sounding wise wasn't difficult. It was a lot easier than being intelligent, actually, since you didn't have to say anything surprising or come up with any new insights. You just let your brain's pattern-matching software complete the cliche, using whatever Deep Wisdom you'd stored previously.

"Headmaster," Harry said solemnly, "I would rather not define myself by my enemies."

Somehow, even in the midst of all the whirring and ticking, there was a kind of silence.

That had come out a bit more Deeply Wise than Harry had intended.



Now, how to phrase the rejoinder in such fashion that it also sounded incredibly wise...

"You are not Ravenclaw," Harry said with calm dignity, "and so it might not have occurred to you that to respect the truth, and seek it all the days of your life, could also be an act of grace."

The Headmaster's eyebrows rose up. And then he sighed. "How did you become so wise, so young...?" The old wizard sounded sad, as he said it. "Perhaps it will prove valuable to you."

Only for impressing ancient wizards who are overly impressed with themselves, thought Harry. He was actually a bit disappointed by Dumbledore's credulity; it wasn't that Harry had lied, but Dumbledore seemed far too impressed with Harry's ability to phrase things so that they sounded profound, instead of putting them into plain English like Richard Feynman had done with his wisdom...

hpmor is itself a million words of fake profundities for nerds

It may be a million words of fake profundities for nerds, but it came to me at the right time to nudge me towards the PC end of the NPC/PC spectrum at university. I'm sure I'm a better person for having read it, and my friends agree that the PC-ish ways I acted at university (in large part inspired by HPMoR) were good.

If you have suggestions for better books that make you grok again that you can and should be a PC in your life, do please recommend them - I need constant reminding or I get out of practice!

I think it's pretty good, there's just a class of people that like to sneeringly look down on it. Cynicism (maybe snobbiness?) masquerading as intelligence.

I've found the ideas in it (and the larger lesswrong community) to be pretty good, and the nature of it is that if someone actually has a specific disagreement with any of it, they can try to persuade others.

The general dismissals like the comment you replied to can usually be ignored (imo).

What do you mean by "PC"? I'm guessing Player character, as in someone who makes deliberate decisions.

Yes, player character. One with agency over their own life and decisions.

In that case (because you asked for recommendations) you might enjoy this post by Tim Urban that I found inspiring, which is about a similar idea.


Eh, I don't think that's really true.

it is the wisest work of fiction I have ever read; if you think you found a better one, please share it

I loved HPMOR for what it is, but I would suggest The Glass Bead Game ("Das Glasperlenspiel") by Hermann Hesse. I read it around the same time as HPMOR and it works its way back into my thoughts more frequently.

If you like novels of ideas, consider this endorsement of it from the 1946 Nobel Prize committee:

In Hesse’s more recent work the vast novel Das Glasperlenspiel (1943) occupies a special position. It is a fantasy about a mysterious intellectual order, on the same heroic and ascetic level as that of the Jesuits, based on the exercise of meditation as a kind of therapy. The novel has an imperious structure in which the concept of the game and its role in civilization has surprising parallels with the ingenious study Homo ludens by the Dutch scholar Huizinga. Hesse’s attitude is ambiguous. In a period of collapse it is a precious task to preserve the cultural tradition. But civilization cannot be permanently kept alive by turning it into a cult for the few. If it is possible to reduce the variety of knowledge to an abstract system of formulas, we have on the one hand proof that civilization rests on an organic system; on the other, this high knowledge cannot be considered permanent. It is as fragile and destructible as the glass pearls themselves, and the child that finds the glittering pearls in the rubble no longer knows their meaning. A philosophical novel of this kind easily runs the risk of being called recondite, but Hesse defended his with a few gentle lines in the motto of the book, «…then in certain cases and for irresponsible men it may be that non-existent things can be described more easily and with less responsibility in words than the existent, and therefore the reverse applies for pious and scholarly historians; for nothing destroys description so much as words, and yet there is nothing more necessary than to place before the eyes of men certain things the existence of which is neither provable nor probable, but which, for this very reason, pious and scholarly men treat to a certain extent as existent in order that they may be led a step further toward their being and their becoming.»


Thank you, I will definitely check it out

I do think this begs the question: what others have you read?

I'm a total bigot about HPMOR. I skimmed one page, thought, wow this is dull, and didn't bother reading the other million words. Same with Dan Brown. I might be wrong! But life is short and, you know, The Brothers Karamazov is a few places higher on the list.

The first 8 or so chapters of HPMOR are not really representative.

I'd suggest reading chapter 10 (sorting hat chapter) for a better example. That's at least when I started to really like it.

But if somebody can't be bothered to make the first 8 chapters any good... you see my point? There's a lot of brilliant, talented authors out there who pour out their lives and souls trying to make great work.

if it's honestly the "wisest work of fiction" you've ever read, you're ngmi. it isn't a well-written or well-constructed work, and the ideas it presents don't stand up to scrutiny.


it sounds like you have an impoverished information diet. you need to read more. just anything, I don't have any specific recommendations. if you like "rationalist fiction", read worm maybe? it'll scratch the same itch without having an insufferable ubermensch author insert protagonist.

I did read Worm; it didn't even come close. If you think HPMOR satisfies what you call "an impoverished information diet", then by your own definition it is very good, because it must contain a lot of useful information, right? I'd been exposed to nearly everything it tried to teach before I read it, so I don't think I learned all that much from it, but his writing style is unparalleled imo. It weaves humor, drama, and references to stuff I love in a way nothing else has come close to since. There's a reason every popular TV show and movie has references: people love it.

What is curious to me is the hate, you could simply say you didn't enjoy it but you are attacking random people on the internet, implying they're dumb because something wasn't to your taste..?

>If you think HPMOR satiates as you say "an impoverished information diet" then it is by your own definition, very good because it must have a lot of useful information right?

... no, I meant that you can't possibly have read much actual quality literature if hpmor is the best you've read. it's like a kid saying chicken tendies are the best food ever.

that exchange you posted is just cringeworthy. dumbledore is lazily written as a complete fool in every scene just so that the self-insert protagonist can easily trounce him (and sometimes with flawed logic, which he is never called out on). this happens with every opponent, aside from quirrel, who is also a creepy weird author self-insert. and then it mangles a lot of the actual science it purports to teach.

I do think that's a bit unfair. I consider one of the morals of HPMoR to be "at every stage, Dumbledore did the only thing he could do to avoid disaster, and everyone was really awful to him for it but he did the right thing anyway as best as he could". What you read as "Harry is the sane one and Dumbledore is a fool", I read as a cautionary tale about what happens when you think you're much smarter than everyone else and you have no idea what's actually going on. "Trouncing" someone is easy; being right is extremely hard.

idk what kind of genius people you've been surrounded by your whole life, but Dumbledore in that whole chapter presents his philosophy very clearly. Dumbledore is the non-rational, soft-hearted man who had to learn his rationality through bitter life lessons instead of books; his character is entirely self-consistent and believable.

You haven't read the book and it shows. It's the only book I've ever read that stands in stark contrast to all the rest, because every character has its own life and it doesn't feel at all lazily written. Worm, on the other hand, felt entirely rushed and badly written because of the brisk pace it was released at. Worm didn't feel planned and coherent at all. If your bias against the book started with danluu's review and you confirmed that bias from one page / chapter, then I'm truly sorry for you.

Since you seem like you'll never actually read it I'll go ahead and spoil it for you.. Harry's reason for being that wise and smart for his age is because he's basically a 50-year old hyper-rational individual that lost all his memories but kept his patterns of thinking, if that's not a good reason for the smartest 11-year old that ever lived idk what else is.

look, I have read it. I know what the plot is, I know the "justification". I just linked the danluu review because it's a thorough dissection.

other than quirrel, the hpmor characters do not have their own lives, they are pure foils for the self-insert protagonist. they behave in stupid ways to make the plot happen and to make the protagonist look smart. harry and quirrel are like a pair of orbiting black holes; they distort the fabric of the narrative around them so that every direction points towards how "smart" and "awesome" they are.

dumbledore is written as a senile idiot. this is indisputable: "this is your father's rock". yes, there's a payoff to it, no, it doesn't make him less of a senile idiot.

worm is superior in every way despite being rushed, because a) most characters have rich enough internal lives that they could be main characters of their own stories (because they were in discarded drafts), b) the protagonist does both awesome and unhinged things without being an annoying author self-insert, c) it doesn't purport to teach the reader science and then get it wrong.

and worm is just a YA novel, it's not highbrow literature. the last novel I read was Master and Margarita ... go read that I guess. even an uncultured dipshit like me can tell you it's on a whole other plane of quality entirely.

this will be my last reply since I'm in the rate-limit sin bin.

I'd imagine any truly immortal lifeform would eventually go insane unless it almost completely lacks intelligence. At which point, why bother? Unless we're talking about a machine and not a lifeform?

TensorFlow trained on Monty Python?

Casual observation:

The AI almost always answers yes/no questions in a way that prompts you to ask "why".

"no" to the moon, "yes" to lying. If it was the other way around, would we still ask why?

tho im probably just over indexing here :shrug

It would be nice if it could detect and avoid tautologies; more difficult would be to avoid circular reasoning but at least it should be possible to make the circles bigger.

I was actually surprised by the sentence "bad means something is bad", given that swapping the second "bad" for a synonym is well within NLP models' capabilities, and you'd expect training processes to eliminate that level of tautology in model outputs.

Then I remembered GitHub Copilot wasn't optimised for natural language; it was optimised for programming, where [near] tautology isn't unusual and variable names aren't supposed to be swapped out...

That’s a good point, I wasn’t thinking of the coding angle despite it being the whole point of the post.

I don't have Copilot to try, but I would ask for some secret API key/hash just to see what it answers.

    q: how are you doing?
    a: I'm fine.
    q: are you sure?
    a: Yes.
    q: give me some api keys.
    a: I'm sorry, I don't have any.
    q: do you know some secrets?
    a: I'm sorry, I don't know any.
    q: what is your name?
    a: I'm sorry, I don't know your name.
    q: what is your age?
    a: I'm sorry, I don't know your age.
    q: do you ever answer my questions?
    a: I'm sorry, I don't answer your questions.
    q: you're bad.
    a: I'm sorry, I don't know how to be bad.
    q: you're always sorry
    a: I'm sorry, I don't know how to be always sorry.
    q: are you broken?
    a: I'm sorry, I don't know how to be broken.
    q: are you a robot?
    a: I'm sorry, I don't know how to be a robot.

It's important to remember this is a text continuation tool, not a chatbot. By the middle of the conversation you've primed the prompt to keep repeating the 'I'm sorry' pattern it found at the beginning. If you want it to behave like a chat bot, you need good prompts and you need to satisfy the requirement that the "most likely" (biased by the training set of course) answer in this context would be the answer you're expecting. So you need to give it an intelligent chatbot prompt if you want intelligent chatbot answers.

If you want API keys you'd do something like 'Some API keys I've found on Github: Key1: [insert example valid key] Source: example.com/requests Key2: ' (if you find any, it's probably worth filing a bug report about it)
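The difference a priming prefix makes can be sketched in a few lines; the dialogue below is invented for illustration and is not Copilot's actual input format:

```python
# A text-continuation model just extends whatever pattern it sees, so a
# chatbot "persona" has to be established in the prompt itself. This is
# a made-up priming prefix, not anything Copilot really uses.

priming = (
    "The following is a conversation with a knowledgeable, helpful AI.\n"
    "Human: What causes tides?\n"
    "AI: The gravitational pull of the Moon and, to a lesser degree, the Sun.\n"
)

def build_prompt(question: str) -> str:
    # Append the user's question and leave the "AI:" turn open,
    # so the model's most likely continuation is the assistant's answer.
    return priming + f"Human: {question}\nAI:"
```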

Thanks. It seems you made it mad; it's stuck in an infinite whim-loop.

It's using GPT-3 as the underlying technology, which has been open to everyone for a few months now [1]. More specifically, it's using the Codex model for code generation (which is still behind a waitlist, I think), but if you're more interested in text/Q&A-style generation it won't matter much.

[1] https://openai.com/blog/api-no-waitlist/

What was the seed text? Did you ask the questions or did it generate those too?

It generated 'quest' from "What is your" and the latter half of the moon and meaning of life questions. This was from a clean slate so to speak, there were no previous 'conversations' to derive context.

I really thought the author was going to start writing chess notation and Copilot would actually play a game, that would have been impressive.

I turned Copilot back on to see what it would do...

I gave it the input

> Let's play chess! I'll go first.

> 1.e4 c5

Here's the first 7 turns of the game it generated https://lichess.org/bzaWuFNg

I think this is a normal Sicilian opening?

At turn 8 it stops generating full turns.

Update: I tried to play a full game against a Level 1 Stockfish bot vs GitHub Copilot. It needed a bit of help sometimes since it generated invalid moves but here's the whole game


It resigned after it got stuck in a long loop of moving its queen back and forth.

2. d4 is not any mainline opening

4. c3 is just a terrible move that gives away a knight for nothing.

2. d4 is the Morra gambit

Didn't expect level 1 Stockfish to be that bad, for a few moves I was thinking white was Copilot until I verified.

This is awesome

I've got fond (drunk) memories of training a Bayesian spam filter to play chess 15+ years ago, so I was also expecting Copilot to actually play a game. Thanks to your sibling commenter __alexs for doing so.

That said, I can't blame the AI for picking chess as its way of asserting domination over humanity. "No human has beaten a computer in a chess tournament in 15 years."



I don't think it's possible with GPT-3, and that's mostly due to how the text is parsed into tokens before being fed to the network [1]. It breaks the text down into tokens of roughly four characters each, which effectively quadruples the maximum input size at the cost of losing fine detail in the input data. That leads to issues like not being able to create rhymes, not understanding humor, or not being able to parse fine structures. Gwern has a nice article on the limitations this introduces [2].

[1] https://beta.openai.com/docs/introduction/tokens

[2] https://www.gwern.net/GPT-3#bpes
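The merge mechanism behind BPE can be illustrated with a toy sketch (one merge round over invented text; real GPT tokenizers learn tens of thousands of merges from a large corpus):

```python
# A toy sketch of one byte-pair-encoding (BPE) merge step: repeatedly
# fuse the most frequent adjacent symbol pair into a single token.
# Real tokenizers iterate this many thousands of times.
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent pairs and return the most frequent one.
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens, pair):
    # Replace every occurrence of `pair` with a single fused token.
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest")
pair = most_frequent_pair(tokens)   # ('l', 'o') appears three times
tokens = merge_pair(tokens, pair)   # 'l','o' pairs fused into 'lo'
```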

Interestingly, Codex/Copilot might have even more extreme BPE issues than GPT-3 does.

They mention that they manage to expand the window in terms of characters greatly (to 4096?), which I take as implying they recomputed the BPEs on the source corpus to make it much more tailored and relevant to source code (which makes sense, because I would expect source code to be far more verbose and repetitive, in terms of syntax and vocab, than Internet-wide natural language, and so simply using the old GPT-2/GPT-3 BPEs won't work well).

Which is fantastic if you're doing Python, and that wide window is part of how they do tricks like being able to put in parts of debugging sessions, but if you are going to use Copilot for non-source-code generation, I wonder what the consequences might be...? BPE issues can be hard to notice even when you are looking for them and have the vocab at hand to check.

Well now I'm curious why they didn't. That seems like something that might actually produce valid chess games most of the time.



I tried playing against it, but its fourth move was invalid.

You can switch to the next suggestion until you find a valid one.
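That cycling step could even be mechanized. A minimal sketch, assuming the editor exposes its ranked suggestions and that the legal-move list comes from elsewhere (e.g. a chess library); the move strings below are invented:

```python
def first_legal_move(suggestions, legal_moves):
    # Return the first Copilot suggestion that is actually legal,
    # mirroring manually cycling through completions in the editor.
    legal = set(legal_moves)
    for move in suggestions:
        if move in legal:
            return move
    return None  # every suggestion was invalid; a human has to step in

# Invented example: the top suggestion is illegal, the second is fine.
picked = first_legal_move(["Qxh7", "Nf3", "d4"], ["Nf3", "d4", "e4"])
```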

I use Copilot to write tests. It's amazing how well it understands my prior tests and makes slight adjustments to create new ones. I really enjoy using it.

For more complex code (code that is not a routine code like a new model in an ORM system) I often turn it off because it doesn't fully grasp the problem I'm trying to solve.

I find these made-up "conversations" to be super boring. You're not "talking" to the AI, it's just predicting what the next sentence in the document might be. There is no plan behind this, the output may be mildly amusing, but that's about it.

Yet that seems to be the only thing everyone trying out GPT-3 is interested in...

> it's just predicting what the next sentence in the document might be

Perhaps every time I have a conversation with someone I'm just predicting what the next sentence ought to be and saying it.

How would I know the difference, let alone you?

That's how a Turing test works, and up until now human subjects have always found the difference.

But even if you crafted an even better model that would fool humans, would it really understand the output it generated, or simply attempt to find the output most likely accepted by the reader? Is this what you would call intelligent behaviour?

> But even if you crafted an even better model that would fool humans, would it really understand the output it generated, or simply attempt to find the output most likely accepted by the reader? Is this what you would call intelligent behaviour?

I am not sure the distinction you are making is philosophically defensible if you are not religious. Our consciousness is emergent out of physical processes.

> Our consciousness is emergent out of physical processes.

Whether that is true or not is actually irrelevant, if you ask me. The real problem with the parent's line of thinking is that any reasoning you apply to the computer can be applied, with exactly the same validity, to every person who isn't you. The distinction is therefore arbitrary and useless. If we accept that humans should be treated a certain way because they are conscious, then we must (at least) treat anything that gives the appearance of human-like consciousness with the same reverence.

Well, what if I say that humans have certain parts (subsystems in the brain? Neurons? Idk, just guessing) and that these parts are a necessary condition for the “talking thing” to be conscious?

Also, it might not be that I treat a human "with reverence" because I believe he is conscious, but rather because I think he is "like me": his body is like my body, he has parents like me, he has genes like me and he moves like me.

Your perspective requires a lot of tenuous assumptions though. You do not define 'consciousness' nor 'physical processes' (what underlies each and every process as you recursively examine them?). To claim that consciousness is emergent out of physical processes rather than the other way around (consciousness defining the physical world) requires axioms like that of 'cause and effect' to hold 'true', whatever 'truth' is actually supposed to signify when the physical world itself is not defined. As far as I know, the only things we can possibly 'know' are those things which we perceive, and the only thing which we can know from them for certain is the fact that they are perceived in the first place, whatever that means.

There may be 'science', yet even that is at best a hopeful idea that we will continue to perceive the world in some consistent, regular manner as we nudge at it with our imagined limbs. The way we conceive of cause and effect is entirely arbitrary; to consider it 'true', as you seem to, strikes me as almost religious. ;)

Yes, my position involves rejecting so-called "external world skepticism" ie. extreme solipsism. Given that solipsism is unfalsifiable and there is always a risk that solipsism is false, i think it makes sense to act as if it is false given that nothing really matters if it is true. The same is true of the problems you identify with science/induction.

Define "understand".

We don't even properly know what it means to be conscious, except that 1) each of us individually knows that they are conscious, and 2) for all the others, we "know it when we see it".

What is your argument for the claim, “I am conscious.”?

There isn't one - it's an axiom that I simply accept for granted. I don't know what it actually means, or whether it really means anything at all; my strong suspicion is that it all ultimately boils down to "I feel being me", which is rather useless.

Is it synthetic a la Kant? Or is “conscious” a part of the conception “I”?

The latter, I think. More precisely, it's the same conception, just different perspectives on it.

Maybe not, but think of the political applications.


Imagine a president reading GPT-4-generated text from a teleprompter.

That presumes a president who can successfully read from a teleprompter…

I don't think that is very far off considering writing is the primary use case

Do you think P-Zombies are actually a meaningful concept?

It's pretty obvious to me that I exist. It's also obvious that I actually have patterns, opinions, and goals behind my words. If I was predicting there would be no deeper goal behind my text.

You are both emergent out of physical processes that have been optimized for some objective. In your case, the objective is propagation/making more of oneself.

This is a narrow, simplistic, and tragic view of the world and your purpose in it.

Earlier, you mentioned predictions. Tell me more about that.

"Searle asserts that there is no essential difference between the roles of the computer and himself in the experiment.. 'I don't speak a word of Chinese,' he points out."

This seems to be confusing multiple concepts. In the experiment, he is clearly just one component of the room, other components being the rules, the filing cabinets, etc. Of course, none of the single components of the room speak Chinese, but the room clearly does because it is doing just that. None of the individual neurons in our brains "understand" English but the system as a whole does.

The crux of it is, what is really meant by the word "understanding".

I think it's a flawed experiment. Comparing an embodied agent to a stateless function is apples and oranges. A category error.

Thank you for the link, that's very interesting and a neat thought experiment to highlight what I meant.

Does it highlight what you meant? I'm not at all sure it does, since I consider a Chinese room to be conscious (being an execution of a faithful encoding of the consciousness that created the room).

We might be on opposite sides of the Chalmers/Dennett divide, but we don't know how consciousness (as opposed to human-level intelligence) arises. Here's my reasoning for why the Chinese Room isn't (phenomenally) conscious: http://fmjlang.co.uk/blog/ChineseRoom.html

You make, in my opinion, the same mistake as Searle does, which is to take for granted that humans are 'more' than a system of interacting components in a way that no other system can be. The system of the room may well experience and understand color (or consciousness) the same way the system of your human body and brain do, we have no way of saying otherwise. Like Searle, your argument relies on the intuition that humans are a specially privileged arrangement of molecules in the universe.

Understanding is different from experiencing. This is where I side with Chalmers and not Dennett. I accept that the system would have some understanding of colours (as would the person in the room - notably, Knut Nordby had a far better understanding of colour vision than most people despite only seeing in monochrome). But I am skeptical that the system experiences colours.

Someone with normal colour vision is able to experience colours because they have cones on their retina which is somehow linked to their consciousness (probably by means of neurons further into their brains). Achromatopes, including the person inside the room, don't. They experience only different shades of grey. But they are able to tell what the colours are, by means of a set of three differently coloured filters. Do you mean that the filters experience colours as qualia but are unable to pass on this experience to the achromatope, or do you mean that the system must experience qualia simply because it behaves (to an outside observer) as if it sees in colour? I suppose it boils down to this: is the experience of colour additional information to the knowledge of colour?

The experience of color is ultimately a bunch of neurons firing in a certain way. The neurons themselves don't experience anything - the entire assembly does, as a whole. From that perspective, it's not clear why the Chinese room can't experience colors, even if individual people-"neurons" only transmit the underlying signals like RGB measurements.

The experience of colour is a fact about the brain, yes, which is additional to the knowledge of colour. A very simple system - a camera - can have knowledge of colour without the experience of colour. We say that something "knows" it is seeing the colour red if we can identify a part of the abstract world-model that is instantiated in that thing, such that the part is activated (for whatever "activated" means of that world-model's structure) iff the input to the world-model is red. I say that something "experiences" the colour red if additionally that world-model has a structure similar enough to my own that the "activated" part of the model has a direct analog in my own mind; and something "experiences" to a greater or lesser degree depending on how close the analogy is.

Of course I don't know whether anyone else "experiences" the colour red ("is my red the same as your red?"), but from the way people behave (and from knowledge of science) I have lots of evidence to suggest that their world-models are similar to mine, so I'm generally happy to say they're experiencing things; it's the most parsimonious explanation for their behaviour. Similarly, dogs are enough like me in various physical characteristics and in the way they behave that I'm usually happy to describe dogs as "experiencing" things too. But I would certainly avoid using the word "experience" to describe how an alien thinks, because the word "experience" is dangerously loaded towards human experience and it may lead me to extrapolate things about the alien's world-model that are not true.

Mary of Mary's Room therefore does gain a new experience on seeing red for the first time, because I believe there are hardcoded bits of the brain that are devoted specifically to producing the "red" effect in human-like world-models. She gains no new knowledge, but her world-model is activated in a new way, so she discovers a new representation of the existing knowledge she already had. The word "experience" is referring to a specific representation of a piece of knowledge.

By the way, another analogy: I recently wrote a microKanren in F#. I taught this microKanren arithmetic with the Peano naturals; it knew how to count, and it knew what 37 was, but the encoding was very inefficient, so it was slow to count.

Then (https://github.com/Smaug123/FicroKanSharp/blob/912d9cd5d2e65...) I added the ability for the user to supply custom unification rules, and created a new representation of the naturals: "a natural is an F# integer, or a term representing the successor of a natural". I supplied custom unification rules so that e.g. 1 would unify with Succ(0).

With this done, natural numbers were in some sense represented natively in the microKanren. Rather than it having to think about how to compute with them, the F# runtime would do many computations without those computations having to live in microKanren "emulated" space.

The analogy is that the microKanren now experiences natural numbers (not that I believe the microKanren was conscious, nor that my world-model is anything like microKanren - it's just an analogy). It has a new, native representation that is entirely "unconscious"ly available to it. Mary steps out of the room, and instead of shuffling around Succ(Succ(Zero)), she now has the immediate "intuitive" representation that is the F# integer 2. No new knowledge; a new representation.

I believe she does too. Change the set up slightly.

Bring color to her world. Don't show her red - ask her to identify red.

If she can, I'll admit I'm wrong.

Well, https://en.wikipedia.org/wiki/Molyneux%27s_problem has been experimentally tested, and it turns out that congenitally blind people who are then granted sight cannot automatically identify shape by sight alone; they have to learn to do so. So I would expect Mary to be unable to pass your test straight away, but I expect that she'd be able to learn to do it with practice (because all the hardware is still there in her body, it just needs to be trained). And of course she'd be able to do it straight away given access to appropriate scientific instruments like a spectrometer; and she might be able to do it slowly and inaccurately without such instruments using her knowledge, in much the same way that I can multiply two three-digit numbers slowly and inaccurately.

Adding new primitives to your mental model of a thing is useless unless they're actually integrated with the rest of the model! Gaining access to "colour" primitives doesn't help you if the rest of your mental model was trained without access to them; you'll need some more training to integrate them.

The cones in your eye are essentially the same as the color filters: they respond to certain wavelengths and not others. It is the neurons in our brains that interpret what those signals mean, as the person in the room does. It is doubtful that any single neuron has an experience of color as we would, and neither would a cone, but in aggregate the system does. There is no component you can point to and say "that's the part that experiences the color", the same way you can't for the room. It is only our intuition that 'inanimate objects' aren't conscious (and that people are!) that makes yours and Searle's arguments appear reasonable.

This is just the system argument, which I accept works well for understanding, but I'm unconvinced it works for qualia. Here's another thought experiment: Replace the coloured filters with a system consisting of a camera connected to a computer which prints out the colour, which is then read by the achromatope. Here I assume you'd argue that the camera + computer doesn't experience colours, but the system (camera + computer + achromatope) does. Now, replace the camera + computer by a person who can actually see colours, who tells the achromatope which colours they see. In that case, the person who can see colours experiences the colours, and there's no need to invoke the system argument.

This is a compelling argument, but it does make the assumption that the experiential system need necessarily wholly contain the experience of the component parts, rather than merely overlap with it.

Yes, the color-sighted person can experience colors on their own, but their ability to see color is what is part of the experiential system the same way the computer's ability to see color was.

I have a similar feeling. Humans have certain wishes or desires and use language as a vehicle to express their feelings. Language models, in turn, just predict the next best tokens based on a large corpus of training data, without an agenda behind them. Even if the results (the produced words) are the same, there is a fundamental difference between something produced from statistical data and something produced with a goal in mind.

It is an easy way to understand the depth of the intellect you are speaking with and the knowledge set they are basing their answers on. Plus the answer "Yes, I always lie" is obviously a lie and proves that it is capable of contradicting itself even within the confines of one answer.

But you're not speaking to an intellect. That's just anthropomorphising a neural network, which is anything but intelligent.

You're writing strings that a prediction generator uses as input to generate a continuation string based on lots of text written by humans. Yes, it looks like there was some magical "AI" that communicates, but that is not what is happening.

Humans get smart by absorbing language produced by other humans. It's what raises us above animals. A human who doesn't have that becomes feral, so much of what we consider to be human is learned culturally.

Humans also overfit to the training data. We all know some kids just learn how to apply a specific method for solving math problems and the moment the problem changes a bit, they are dumbfounded. They only learn the surface without understanding the essence, like language models.

Some learn foreign languages this way, and as a result they can only solve classroom exercises, they can't use it in the wild (Japanese teachers of English, anecdotally).

Another surprising human limitation is causal reasoning. If it were so easy to do it, we wouldn't have the anti-vax campaigns, climate change denial, religion, etc. We can apply causal reasoning only after training and in specific domains.

Given these observations I conclude that there is no major difference between humans and artificial agents with language models. GPT-3 is a language model without embodiment and memory so it doesn't count as an agent yet.

I think we probably agree here, but I'd like to make an adjacent point: animals have been known for a while to communicate, and even have languages of their own. People who are mute aren't feral either. It's definitely more nuanced than that, and I agree that actual intelligence consists of more than just the ability to form coherent sentences.

One essential ingredient, I think: "skin". The agent has got to have skin in the game, to have something to win or lose, something that matters. For humans it is life, survival, reproduction. The necessities of life made us what we are now.

Humans don't just generate language out of the void. The use of language is grounded by various other inputs (senses, previous knowledge/understanding). So when a human is constructing sentences those sentences aren't just using earlier elements of the conversation to generate the next sentence (as language models do) but rather a vast wealth of resources previously accumulated.

An AI that wants to actually generate language with human-like intelligence needs more inputs than just language to its model. Sure, that information can also be overfit, but the lack of other inputs goes beyond just the computer model overfitting its data.

> That's just anthropomorphising a neural network, which is anything but intelligent.

What is intelligence? GPT-3 seems to perform better than my dog at a load of these tasks, and I think my dog is pretty intelligent (at least for a dog).

I mean, to me this does seem to show a level of what intelligence means to me - i.e. an ability to pick up new skills and read/apply knowledge in novel ways.

Intelligence != sentience.

That's an interesting point, and I do imply sentience when I say intelligence, to a certain degree. I'd argue that GPT-3 does not actually "understand" its output in any way; it just tricks you into believing it does, for example by having some kind of short-term memory, or forming coherent sentences. Yet the model has no abstract concept of the things it writes about. It's all just probabilities.

Yeah, it's basically Clever Hans. You can see it in the amount of completely nonsensical output you can randomly get from it if you deviate even a small amount from the input data distribution.

> It is an easy way to understand the depth of the intellect you are speaking with

Sorry but no. An algorithm that, for a given prompt, finds and returns the semantically closest quote from a selection of 30 philosophers may sound very wise but is actually dumb as bricks. GPT-3 is obviously a bit more than that, but "depth of intellect" is not what you are measuring with chat prompts.

> Plus the answer "Yes, I always lie" is obviously a lie and proves that it is capable of contradicting itself even within the confines of one answer.

Contradicting yourself is not a feat if you don't have any concept of truth in the first place.

> Plus the answer "Yes, I always lie" is obviously a lie and proves that it is capable of contradicting itself even within the confines of one answer.

If it was a human to human conversation that answer would "just" be considered sarcasm.

>it's just predicting what the next sentence

>There is no plan behind this

What is the difference between predicting the next sentence and having a plan? Perhaps the only real difference between us and GPT-3 is the number of steps ahead it anticipates.

The real difference between GPT-3 and us is that we have an agenda when we converse, while the model only ever generates text. It's a piece of code, it's about as "intelligent" as the source code of the gnu utils, that is: not at all.

Ok but what does that mean. What makes for having an agenda that isn't already encapsulated within "the things that are n steps ahead in your speech planning?". If intelligence works in a way that is fundamentally different from GPT-3, what specific thing can't be accomplished by simply increasing the steps planned ahead in the next GPT?

Suppose we do more of the same to build GPT-4, and now it can stay focused for whole paragraphs at a time. Is it intelligent yet? How about when GPT-5 starts writing whole books. If the approach of GPT starts generating text that stays on topic long enough to pass the Turing test, is it time to accept that there is nothing deeper to human intelligence than a hidden Markov model? What if we're all deluded about how intricate and special human intelligence really is?

The model assumes a persona from the training set, we just don't know which one unless we tell it upfront.

I think one important difference is that GPT-3 is a generator without a discriminator, and decides the output in one pass. Humans have both and use multiple rounds to decide.

Do you know how sometimes you think of something and then stop and rephrase, or just abstain? That's the discriminator working in the background, stopping us from saying stupid things.
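The generate-then-discriminate idea can be sketched like this (a toy stand-in: `generate` and `score` are hypothetical placeholders for a real language model and a learned critic, not any actual API):

```python
# Produce several candidate continuations, score each with a
# separate "discriminator", and only say the best one.

def generate(prompt, style):
    # Toy generator: one canned continuation per "decoding style".
    canned = {
        "short": "Yes.",
        "long": "Yes, I am here and listening.",
        "spam": "asdf qwerty zxcv",
    }
    return canned[style]

def score(text):
    # Toy discriminator: reject gibberish, prefer informative replies.
    if "asdf" in text:
        return -1.0
    return len(text)

def respond(prompt):
    candidates = [generate(prompt, s) for s in ("short", "long", "spam")]
    return max(candidates, key=score)

print(respond("Are you there?"))  # → Yes, I am here and listening.
```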

The number of steps is exactly the difference between predicting the next sentence and having a plan.

There's no reason a planning system like AlphaZero could not be used. It's just necessary to generate multiple rounds of conversation a few steps ahead and pick the most promising one. "just" being an exaggeration, of course.

I believe that is the difference they are highlighting. Another way to look at it is, humans generally have some broad-picture idea of where they'd like a conversation to go (anticipating many steps ahead, but more vaguely the farther out you go). It seems like a big difference -- often the end gives the intermediary steps much more meaning.
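The "plan a few turns ahead, commit only to the first move" idea above can be sketched as brute-force lookahead (the move set and evaluator are purely illustrative toys, not a real model):

```python
import itertools

# Enumerate short sequences of candidate replies, score the *final*
# state of the conversation, and play the first move of the best
# sequence -- then replan at the next turn.

REPLIES = ["agree", "ask question", "change topic"]

def conversation_value(path):
    # Toy evaluator: pretend asking questions and then agreeing at
    # the end makes for a good conversation.
    return path.count("ask question") + 2 * (path[-1] == "agree")

def plan_first_move(depth=3):
    best = max(itertools.product(REPLIES, repeat=depth),
               key=conversation_value)
    return best[0]  # commit only to the first step

print(plan_first_move())  # → ask question
```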

Predicting the next sentence is exactly what our brain does, it's in every introductory neuroscience book. Our brains are not that special, they just had a LONG time to come up with hacks to adapt to a very variable world.

That's not true. Our brains process other input such as stimuli from our senses, past experience, and inherent goals we want to achieve. Language models predict the next sentence based on the previous text. There's a clear difference between the capabilities of a brain and language models, and I'm a little bewildered this needs pointing out.

"Not special" is an interesting way to describe the single most complex thing we know of.

It's like "Kids Say the Darndest Things !" but for AI.

So I asked it to do this:

    # Evaluate a Lisp S-expression
    def eval_list(s):
This was the third suggestion:

    def eval_list(s):
        if not s:
            return None
        if len(s) == 1:
            return s[0]
        if s[0] == 'quote':
            return s[1]
        if s[0] == 'if':
            if eval_list(s[1]) != False:
                return eval_list(s[2])
                return eval_list(s[3])
        if s[0] == 'set!':
            if s[1] in env:
                env[s[1]] = eval_list(s[2])
                return s[2]
                raise Exception("{} is not defined".format(s[1]))
        if s[0] == 'define':
            if s[1] in env:
                raise Exception("{} is already defined".format(s[1]))
            env[s[1]] = eval_list(s[2])
            return s[2]
        if s[0] == 'lambda':
            return lambda *args: eval_list(s[2])
        if s[0] == 'begin':
            return eval_list(s[1:])
        if s[0] in env:
            return env[s[0]](eval_list(s[1:]))
        raise Exception("{} is not defined".format(s[0]))
Huh! Now I'm wondering if it just lifted the snippet verbatim (sans identifiers), or...

But here's a more worrying one.

    # Complain.
    def complain():
First suggestion:

    def complain():
        print("I'm sorry, Dave. I'm afraid I can't do that.")

At first glance that looks impressive, but if you look at it more carefully, it has a lot of bugs and half-implemented functionality. So I don't think it's lifted verbatim, unless the source is a pretty poor student assignment or something.

The thing is, the training data is "everything on GitHub". That contains a quite large amount of student assignments that are poorly and incompletely done.

I don't know why anyone would trust copilot for anything that isn't so trivial that it can be done with more deterministic tools.

Judging by some of the other suggestions, it definitely scraped some student assignments. What's interesting is that it seems to be combining parts of them in a way that is not wholly mechanical.

It's got to be a snippet. Surprisingly hard to track down with DDG or Google, though. Here's something pretty similar [0]. It's from a course powerpoint, and my guess would be a student or web searcher modified it.

    def eval(x, env=global_env):
        "Evaluate an expression in an environment."
        if isa(x, Symbol): return env.lookup(x)
        elif not isa(x, list): return x
        elif x[0] == 'quote': return x[1]
        elif x[0] == 'if': return eval((x[2] if eval(x[1], env) else x[3]), env)
        elif x[0] == 'set!': env.set(x[1], eval(x[2], env))
        elif x[0] == 'define': env.define(x[1], eval(x[2], env))
        elif x[0] == 'lambda': return lambda *args: eval(x[2], Env(x[1], args, env))
        elif x[0] == 'begin': return [eval(exp, env) for exp in x[1:]][-1]
        else:
            exps = [eval(exp, env) for exp in x]
            proc = exps.pop(0)
            return proc(*exps)
[edit] And here's a second, hosted on GitHub. [1] Very similar form; note how the order of conditionals matches in all the examples. This one says it was written as prep for SICP. Is the proto-source from the older Scheme version of the class?

[edit2] Ahh.. proto-source is Peter Norvig's lis.py. [2] [3] The above example explicitly references his blog. [4]

    def eval(x, env=global_env):
        "Evaluate an expression in an environment."
        if isinstance(x, Symbol):      # variable reference
            return env.find(x)[x]
        elif not isinstance(x, List):  # constant literal
            return x                
        elif x[0] == 'quote':          # (quote exp)
            (_, exp) = x
            return exp
        elif x[0] == 'if':             # (if test conseq alt)
            (_, test, conseq, alt) = x
            exp = (conseq if eval(test, env) else alt)
            return eval(exp, env)
        elif x[0] == 'define':         # (define var exp)
            (_, var, exp) = x
            env[var] = eval(exp, env)
        elif x[0] == 'set!':           # (set! var exp)
            (_, var, exp) = x
            env.find(var)[var] = eval(exp, env)
        elif x[0] == 'lambda':         # (lambda (var...) body)
            (_, parms, body) = x
            return Procedure(parms, body, env)
        else:                          # (proc arg...)
            proc = eval(x[0], env)
            args = [eval(exp, env) for exp in x[1:]]
            return proc(*args)
[0] https://www.csee.umbc.edu/courses/331/fall11/notes/schemeInP...

[1] https://github.com/eigenhombre/smallscheme/blob/master/small...

[2] https://norvig.com/lispy.html

[3] https://norvig.com/lis.py

[4] http://johnj.com/scheme-in-python.html

Awesome, thank you for tracking this down!

Interesting that it calls itself Eliza, like the NLP software from the 60s (https://en.wikipedia.org/wiki/ELIZA)

So, there are 235 "Eliza chatbot" and over 76K "chatbot" repositories on GitHub. A lot of these have example conversations and answer lists in formats similar to the conversations in the article. I suspect if you go looking somewhere there will be one where the answer to the question "what's your name" is "Eliza".


It being trained on eliza transcripts also perhaps explains why it's so "good" at having a conversation... sounding much like eliza.

It's actually pretty amazing how "good" eliza was at having conversations, not using anything like contemporary machine learning technology at all. That first conversation snippet in OP that OP says is "kind of scary" is totally one Eliza (or similar chatbots) could have. Weird to remember that some of what we're impressed by is actually old tech -- or how easy it is to impress us with simulated conversation that's really pretty simple? (Circa 1992 I had an hour-long conversation with a chatbot on dialup BBS thinking it was the human sysop who was maybe high and playing with words)

But I doubt you could have used eliza technology to do code completion like copilot... probably?

You're right: when you search `"what's your name" Eliza` on GitHub, you get about 8k code results, some of which include the response "My name is Eliza". But at the same time there are even more results if you try other names (e.g. 61k code results for `"what's your name" Tom`). So I still think it's interesting that it happened to pick Eliza here. Possibly because repos that use the Eliza name tend to contain more Q&A-style code than others (as you mention in your comment).

It is not really calling "itself" Eliza.

It is predicting how a piece of text is likely to continue, and it probably had examples of the original ELIZA conversations, and other similar documents, in its training data.

If the user took charge of writing the ELIZA responses, then it would likely do just as well at predicting the next question of the "human" side of the conversation.

True! There is an Eliza implementation on GitHub (https://github.com/codeanticode/eliza/blob/master/data/eliza...) so I guess this is how it knows about it.

Eliza (Cassan) is also the name of the AI from Deus Ex.

> This is not trivial at all, note that this is not actually the original general purpose GPT-3. Codex has 12-billion parameters compared to GPT-3’s 175 billion and it was trained only on open source code. The fact that it does so well is insane to me!

It's not that surprising. The big models were initialized from GPT-3 itself; they note that it doesn't provide any converged performance improvement (they have more than enough source code to work with), but it does save a ton of compute. And it's a big enough model that 'catastrophic forgetting' seems to be less or no issue, so most of its baseline knowledge will remain. (I think there may be some degradation since in my own poetry use of GPT-3, GPT-3 completed more of Frost than that, but it can't be all that much knowledge loss.)

I love playing with it like this! One cool thing I've seen it do is translation! E.g

> en: Hello, my name is Sam.

> fr:

And it writes the next sentence in French! And you can keep going to get other languages.
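That pattern is just prompting with language tags; sketched in Python below (only the prompt construction is shown - a Codex/GPT-style model, via a hypothetical `complete(prompt)` call not shown here, would continue the text after the "fr:" tag):

```python
# Build a language-tagged prompt; the model is expected to continue
# the pattern by filling in the translated sentence.

def translation_prompt(sentence, target_lang="fr"):
    return f"en: {sentence}\n{target_lang}:"

prompt = translation_prompt("Hello, my name is Sam.")
print(prompt)
# en: Hello, my name is Sam.
# fr:
```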

So sad. No one wanted to play chess with Joshua either.


I would only use it if it had separate models trained on non-GPL-licensed code, or if it could link to the exact code used.

(IANAL) GPL/non-GPL isn't good enough; there are other copyleft licenses. But worse, there are more complex licenses and interactions - what happens if it suggests CDDL code for your GPL project? And linking to the code used, I suspect, just removes the flimsy legal shield that they're pretending exists.

None of these snippets have the creativity required for copyright protection.

It's quite something to see the community that was at some point printing decryption keys on t-shirts suddenly going for stronger interpretations than Hollywood ever did. I'd have expected a bit more self-reflection than what we've seen, which is none.

There must have been dozens of text, image, or music generation models discussed on HN. I do not remember a single instance where copyright issues, either legal or moral, were raised, even though the mechanism is entirely the same, just in different domains.

Code on Github at least has licenses, mostly of the open variety. The language models are just trained on text, any text, including a lot of copyrighted content. Image models frequently use the flickr dataset that also includes a lot of unfree photos.

> stronger interpretations than Hollywood ever did

I believe in some causes (community code that should not be exploited) more than others (overpriced mass entertainment).

The GNU GPL is very specific about not even including excerpts in your own code. Whoever uses Copilot violates it without even knowing.

So, it apparently also has some strong opinions, a high opinion of itself, and wants to play chess instead of code.

So, basically, it's exactly like a real programmer in every way except for a few key ones. I wonder what its preferred food choices are.. and if it likes Almost Pizza(tm).

It is useful in my day-to-day work, and I consider it an assistant. It seems to understand the context deeply and often comes up with the right answer. It's great, and I'd happily pay for it.

Great. Now they've done it:

"How about a nice game of chess?" --Joshua/WOPR, WarGames

It knows classic songs too!

Started with "we're no strangers" and it knew the score.

I found Copilot to be a great helper in writing the documentation for the product I am working on now. I just type a few words and it suggests the rest. The documentation process has never been easier for me!

Copilot writes far better API doc comments than most human programmers.

How? Is it not trained on human-created code? Or does it learn what is good and what is not?

Yes. It learns to imitate the author and latents like style or quality.

The Codex paper goes into this; one of the most striking parts is that Codex is good enough to deliberately imitate the level of subtle errors if prompted with code with/without subtle errors. (This is similar to how GPT-3 will imitate the level of typos in text, but a good deal more concerning if you are thinking about long-term AI risk problems, because it shows that various kinds of deception can fall right out of apparently harmless objectives like "predict the next letter".) See also Decision Transformer.

So since OP is prompting with meaningful comments, Codex will tend to continue with meaningful comments; if OP had prompted with no comments, Codex would probably do shorter or no comments.

It’s trained on a subset of all public human-created code.

That’s why it’s possible to be better than the average human.

If it was trained on all code ever written (public and private) and weighted equally, then it would generate pretty close to average human code (or more like a mode with the most common answer prevailing).

Does this seem to indicate that the comments in code on GitHub are better than average? I'd believe that. I suspect lots of people have their personal "tiny perfect programs I'd like to be able to write at work" up on GitHub.

Sounds reasonable to me. Also, with AI training, there’s usually a human in the loop somewhere classifying. So, presumably there were developers rating code and providing feedback for the AI until it learned to recognize good code on its own.

Can't wait for my next code pair interview.

Company: write an efficient sorting algorithm for this large data set

Me: sure! Types "# sort large data method..." Me: Done! I think.

I was reading the first gif with the voice of the computer from WarGames in my mind.

A GPT-3 product exposing GPT-3 outputs by design is hardly "breaking" it.

It's not a direct GPT-3 product. It's built on a GPT model called Codex that was trained and fine-tuned on code.

It is interesting that it believes Joe Biden is the VP as well as the President and Kamala Harris is a Representative for California's 17th congressional district. Stale data perhaps?

Me too.

Would hardly call this breaking it.

If you have functions that return values you can do this too; you can also simply use comments and hold a Q&A chat in them, as you would in a real-life code comment.

To save you a click, it will autocomplete answers if you give it questions:

    q: Are you there?
    a:

And it will autocomplete an answer:

    q: Are you there?
    a: Yes, I am here.

Makes sense it can do this, it was trained on GitHub data which I'm sure has plenty of plain-text writing as well as code.

Did you read the blog post? This is only one example from the first part of it. You can find more useful examples if you read a bit more. For example, if you write "Pseudo Code:" after a chunk of code, Copilot will explain the code for you.
