Using GPT-3 to explain how code works (simonwillison.net)
385 points by simonw on July 9, 2022 | 147 comments



For the past few weeks, I've been collecting and refining prompting tricks for getting GPT-3 to emit and understand various types of non-prose output. Many of these are discussed in the Twitter thread the OP links to, but I'd like to highlight a few of my favorites here too.

- GPT-3 can produce simple SVG diagrams (https://twitter.com/goodside/status/1545883023719239689) from an English description, or incrementally modify a given SVG diagram (https://twitter.com/goodside/status/1545888869207400454).

- GPT-3 can produce diagrams using Graphviz dot notation (https://twitter.com/goodside/status/1545109709035900928), including diagrams that incorporate real-world context (https://twitter.com/goodside/status/1545112039147847682) not given in the prompt.

- It can emit and understand HTML and Markdown, including table markup — e.g., "Produce an HTML table giving the members of the Beatles and their instruments"

- It can accept/emit multiple named files, with arbitrary content, simply by embedding those files as values in a larger JSON/YAML document. Here (https://twitter.com/goodside/status/1545848754883919872), I demonstrate completing the file tree of a trivial Python package given a README.md (see the sketch after this list for the general shape of such a prompt).

- It can understand the raw JSON file contents of .ipynb files, successfully following directions given to it in markdown cells, emitting multiple code cells, etc.
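
To make the files-as-YAML idea concrete, here's a rough sketch of the prompt shape; the package name and file contents are made up for illustration. You fill in the files you have and leave the rest blank for the model to complete:

  files:
    README.md: |
      # tinycalc
      A tiny calculator package exposing add() and multiply().
    setup.py: |
    tinycalc/__init__.py: |
    tests/test_tinycalc.py: |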

While all this is fun, as many people have pointed out, the model has a bad habit of "hallucinating" — producing highly specific, wrong output. Avoiding this (https://twitter.com/goodside/status/1545793388871651330) is more of an art than a science. Be highly skeptical of any output you get from GPT-3, and search for verification methods where you can.


This is very cool.

Two uses come to mind: 1. Alternative variable name suggestions. Say I'm writing code and trying to figure out what to name a variable holding a data frame of my performance data, where I've taken the maximum over random samples and batch size, leaving data size and thread count, and all I can come up with is unbatched_speed or something. I can think of a lot of scenarios where a context-sensitive thesaurus / autocomplete would help. If it's obvious to an AI, maybe it'll be obvious to a junior dev?

2. I expect there are a lot of cases where non-technical people could understand code, but get intimidated because it looks like code. An AI could describe the business logic they care about in plain prose.


#1 is feasible now for short (<4K chars or so) blocks of code. Just paste the code in an indented block as you would in a comment, then after the block ask in English for a list of suggestions for what the variable could be renamed to.
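
For example, a prompt along these lines (the data-frame expression here is made up, just to show the shape):

  Suggest five clearer names for the variable `unbatched_speed` in this code:

    unbatched_speed = perf.groupby(["data_size", "thread_count"])["throughput"].max()

  Suggestions:
  1.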


2 is a great idea! I bet that could be useful.

Maybe a way to do the same with formulas. I always get hung up on the odd notation and the symbols that aren’t defined anywhere.


2 sounds like auto-generated long-form documentation? That would be a major win for devs and non-devs alike.


I tried writing a tutorial with GPT-3 a few months ago with quite impressive results: https://simonwillison.net/2022/May/31/a-datasette-tutorial-w...


"For the past few weeks, I've been collecting and refining prompting tricks for getting GPT-3 to emit and understand various types of non-prose output"

I'd like to see it get good at continuing poetry. It's pretty awful at that, from my own experiments.


It seems to understand a fair bit about themes and rhythm but it just cannot do rhyme. Something to do with the token encoding.


I enjoyed this one enough that I’ve committed the sonnet to memory: https://mobile.twitter.com/xkcd/status/1513922269122281473


Has anybody tried code reviews? As in "GPT-3, please do a code review of this code".


For small examples, yes, it works:

Prompt:

  Perform a code review of the following module, identifying any bugs:
  ```
  #!/usr/bin/env python3
  def fib(n):
    """Return n-th Fibonacci number."""
    if n <= 2:
      return 1
    else:
      return fib(n-2) + fib(n+1)
  ```
GPT-3 completion:

  The bug in this module is that the n+1 parameter in the return statement should be n-1.
But don't expect miracles on real-world files, because GPT-3 struggles with long-term dependencies in text (e.g., remembering what happened ~4000 characters ago).


Seems to work for a factorial function also

  Perform a code review of the following module, identifying any bugs:
  ```
  #!/usr/bin/env python3
  def factorial(x):
      """This is a recursive function to find the factorial of an integer"""

      if x == 1:
          return 1
      else:
          # recursive call to the function
          return (x * factorial(x+1))
  ```
A: The bug in this code is that the function is attempting to find the factorial of an integer by recursively calling itself with an incremented value. This will cause an infinite loop.

But if you unindent the return statement and also keep the x+1 error, it only catches the indentation issue.

  Perform a code review of the following module, identifying any bugs:
  ```
  #!/usr/bin/env python3
  def factorial(x):
      """This is a recursive function to find the factorial of an integer"""

      if x == 1:
          return 1
      else:
          # recursive call to the function
  return (x * factorial(x+1))
  ```
A: The code will not run because there is a syntax error on line 6. The indentation is incorrect. The return statement should not be indented.


This is a fascinating use case that wouldn't have occurred to me at all, quite cool, even if it "doesn't know anything about anything at all".

It even does pretty well with Verilog, which I assumed it would have issues with. Admittedly some of this is just "google-able", but even so. It can even create a reasonable attempt at a testbench (even though it doesn't quite get it right), which is crazy to me.

*Explain this code:

module my_module(a,c,y); input a,c; output y; assign y= c ? a : 1'bz; endmodule*

This code is a Verilog module that defines three inputs (a, c, y) and one output (y). The output y is assigned to the value of a if c is true, otherwise it is assigned to 1'bz.

*What is 1'bz ?*

1'bz is a Verilog constant that represents a bit value of 'z', which is used to indicate a high impedance state.

*Why is high impedance used ?*

High impedance is used in digital circuits to allow multiple drivers to share a single bus without contention. When two or more drivers attempt to drive the same bus at the same time, a contention can occur and the bus voltage can become undefined. By using high impedance, drivers can avoid contention and allow the bus to remain stable.

*Give me a testbench for this module*

module testbench; reg a; reg c; wire y;

  my_module dut(a, c, y);
  
  initial begin
    a = 1'b0;
    c = 1'b0;
    #1;
    a = 1'b1;
    c = 1'b1;
    #1;
    a = 1'bz;
    c = 1'b1;
    #1;
    a = 1'bx;
    c = 1'b1;
    #1;
    $stop;
  end
endmodule

*Create the same testbench but add validation of the output.*

module testbench; reg a; reg c; wire y;

  my_module dut(a, c, y);
  
  initial begin
    a = 1'b0;
    c = 1'b0;
    #1;
    a = 1'b1;
    c = 1'b1;
    #1;
    a = 1'bz;
    c = 1'b1;
    #1;
    a = 1'bx;
    c = 1'b1;
    #1;
    $stop;
  end
  
  always @(*) begin
    if (c == 1'b0)
      assert(y == 1'b0);
    else if (c == 1'b1)
      assert(y == a);
    else if (c == 1'bz)
      assert(y == 1'bz);
    else if (c == 1'bx)
      assert(y == 1'bx);
  end 
endmodule


Wow. I cannot fathom how it can work this well, though I think the validation is incorrect: the first assert should check for z, not 0. Still, it's scary to be honest.


This is frighteningly impressive


This is already starting to feel like the Star Trek voice computer interface. You ask it questions, follow-up questions about anything.


> The (?xm) notation is a modifier that allows the regular expression to span multiple lines and to use comments.

This is incorrect.

(?xm) enables the VERBOSE (X) and MULTILINE (M) flags.

From the docstring:

  M  MULTILINE   "^" matches the beginning of lines (after a newline)
                  as well as the string.
                  "$" matches the end of lines (before a newline) as well
                  as the end of the string.

  X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
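
For anyone who wants to see the two flags in action, here's a minimal Python sketch (the pattern and test string are just illustrative):

  import re

  # (?xm) at the start of a pattern is equivalent to re.VERBOSE | re.MULTILINE:
  # X/VERBOSE ignores whitespace in the pattern and allows # comments,
  # M/MULTILINE makes ^ and $ match at the start/end of each line.
  pattern = re.compile(r"""(?xm)
      ^\d+   # digits at the start of each line
      \s*$   # optional trailing whitespace before the newline
  """)
  print(pattern.findall("12  \n34\nabc"))  # ['12  ', '34']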


The scary part about good AI is not knowing how good it is, i.e. not being able to trust it because you don't know when it'll be wrong (nor does the AI itself know).


Same could be true of human programmers though. Except those on HN, of course ;)


The X flag also allows the written regex to span multiple lines, i.e. it counts newlines as whitespace. Interesting confusion there.


Hah yeah, good catch - I'd missed that incorrect "allows it to span multiple lines" statement.


I've been doing a lot of experiments with GPT-3 on Twitter.

It works with Mermaid.js:

https://twitter.com/Bsunter/status/1535062559069376512

It's great for making study guides:

https://twitter.com/Bsunter/status/1535730704138444801

I also wrote a plugin to interact with GPT-3 in the Logseq text editor; some examples here:

https://github.com/briansunter/logseq-plugin-gpt3-openai

I have a few posts on my blog

https://briansunter.com/blog/gtp3-openai-logseq-notetaking/

https://briansunter.com/graph/#/page/how%20to%20use%20gpt3%2...


Self-plug: I built Stenography (https://stenography.dev/), which does this for any codebase using smart parsing. The VS Code extension automatically writes code documentation.


This looks cool, but I’d be more tempted to sign up if I could see examples of the documentation it can produce unassisted from an example codebase. My biggest concern is that the documentation would lack global context in a large file tree, and only be suitable for explaining small, self-contained files. The time investment to see if this is true or not is a bit too high as it is, so anything you can do to showcase the quality up-front might win you more signups.


Man, your cover of Redbone made me want to try it out haha


lmaooo, thank you XD


A NN is just a fancy search engine: the weights are a compression of its training data; its prediction algorithm is decompression plus search. These "large models" are effective because the weights run to hundreds of GBs, compressing TBs of data for searching.

In any case, my question here is: how do you not go mad in this techno-religious environment where people (seemingly desperately and earnestly) equivocate correlations in written text with thinking?

I feel at once overwhelmed, despairing, and a sense of need to point out that it isn't true. It's the most obscene form of wishful thinking and superstition.

HNers seem capable of identifying pseudoscience when it's an article on nutrition (namely, e.g., that correlations in the effects of human behaviour are not explanations of it)... but lose all sense when "AI" becomes the buzzword.

Correlations in *TEXT* are not models of language, intelligence, thinking. This is pseudoscience. And the most extreme and superstitious form: thinking that correlations are meaningful because we find them meaningful.

A person using language uses it in response to an environment, so "Pass me the salt" is said on the occasion that one needs salt. The "language model", `language = person(...)` includes the environment as input (and many other things).

Correlations in pixel patterns of the night sky are not a model of gravity. Correlations in text are not a model of language. Correlations in the detritus of human behaviour are not models of intelligence.

But to my question: how does one cope in an environment of this kind of wishful thinking?


> In any case, my question here is: how do you not go mad in this techno-religious environment where people (seemingly desperately and earnestly) equivocate correlations in written text with thinking?

We cannot unequivocally explain what thinking is (in a human sense), and what makes it vastly different from cephalopod thinking or a ridiculously large NN 'thinking'.

Until we reach a body of knowledge in neuroscience that can explain what it is that makes human thinking somehow special (other than scale), it is equally religious to assume that human consciousness or human creativity are somehow unique (in nature) and irreproducible (with technology).


That isn't relevant to whether NN models which record correlations in text are a model of language; they aren't. It isn't an open question.

This is an argument from ignorance: you don't know what caused the universe, so maybe god did.

Well, we do know much more than enough to regard any use of NNs today as having nothing to do with intelligence; and likewise, to regard any model of correlations in symptoms of human social activity (text, books, etc.) as not relevant to modelling/simulating intelligence.

A neural network model is just a set of averages over training data, which in this case are simply recording correlations in word frequency. There's nothing "neural" about a neural network; it's just some mean()s.


> neural network model is just a set of averages over training data

We could claim the human brain is just 100 million neurons arranged in a mesh where each neuron has its own 'activation potential'.

Neither your statement nor mine is wrong. But again, until we can pinpoint what (human) intelligence is, claiming that squid or neural networks are unable to be intelligent agents is just as emotional as the people you criticize.


Right, but it isn't. The neural network algorithm is just one of a whole set of non-parametric curve-fitting algorithms which approximate a dataset with a function whose structure is given by averages over the dataset it's given.

It doesn't matter which algorithm in this family we choose; they all just produce correlative models of the data they're given. So if we give a NN a library of books, it produces a function whose structure is just correlations of words in those books.

To believe that the structure of this function has anything to do with language, the brain, or anything else is pseudoscience. It isn't close to science, it isn't an open or a philosophical question; it's homoeopathy.

Language use in humans is not a function of correlations of word frequencies. We know this, as much as we know where Mars is.

No human is shown a trillion books and arrives at an internal model of language corresponding to word frequencies in those books. We know that it would be impossible to do this, since words refer to the world -- and correlations between word frequencies don't.

Your comment here defending this is equivalent to a homoeopath saying "we don't know everything about chemistry". We don't know everything, but we do know that diluting with water doesn't make substances more efficacious.

And we do know that the correlational structure of text in books, captured with any algorithm whatsoever, has nothing to do with modelling language. We know any such system cannot use language, as, at the least, it has no model of how words refer to the environment.


I fundamentally agree with your larger point that the map is not the territory when it comes to language.

However:

> Language use in humans is not a function of correlations of word frequencies.

I would argue that this is untrue. For example, people pick up on common expressions (and other behaviors) of those around them. If I hang out with people that say "catch my drift" a lot, I will probably also start using "catch my drift" more. I find it very plausible that "my internal model of language" is in many (but not all) aspects driven by the word correlations I observe around me.

I do wonder about the absoluteness of your statements though. Would a bed-ridden person that has no sense of touch/sight/smell etc. except hearing (i.e. only able to interact with the world via language) be "not human" to you? If not, how is "the environment" different for it and GPT-3 (other than context window)?


A model of gravity can be used to generate arbitrary night skies, but correlations of pixel patterns in sky images aren't a model of gravity.

When I say "not a function of" I mean, explicitly, that the model does not have a term which depends on these correlations. Just as F=GMm/r^2 does not have a term for "image pixel weights".

Real Language is what wrote all those books, as much as gravity is what made the night sky. Pixels and word frequencies are symptoms, a few of an infinity of symptoms -- distant effects, whose correlations are not models of their causes.


There's no such thing as real language. Language is lossless compression of perception to allow collaboration between agents. NNs also do lossless compression (from already compressed data: words). I think with more raw "senses" their compression will get better. The only missing thing is some memory trick. I would even bet that most humans use heuristic thinking, and "logic" is just a way of validating the legitimacy of these heuristically proposed patterns. Once someone replicates this in AI it will get scary. Already things like "Let's think step by step" are going in this direction. Hierarchical generation of actions might also help (and the model understanding that these actions could be used to cause effects). I agree NNs aren't enough for RL agents, but they're a great bootstrap for the world-model component of agents, which is probably somewhere between cat and human level.


there is certainly loss in the compression of perception to language


>We could claim the human brain is just 100 million neurons arranged in a mesh where each neuron has its own 'activation potential'.

You actually can't because, unlike neural networks (we know exactly what they do), we don't have anything even close to an exhaustive view of the functioning of the human brain.


An unimpressive mechanism can produce impressive results. The results listed in the article are simply jaw dropping to me, even though I understand the relatively simple mechanism that produced it.

How can you not be impressed by something that would have been considered inconceivable just a few years ago?


It's just a search engine. It's never been 'inconceivable'.

At no point since digital computers were invented was it inconceivable that we could continue prompt phrases by searching through similar phrases, near such prompts, in digitized text.

It was thought computationally infeasible, in just the same way that a hard drive would never hold TBs, and so would never be big enough.

But there's no magic here, and nothing surprising. It's presented by charlatans, who have money to make, as an achievement that it isn't. This isn't intelligence, and has nothing to do with it.

It can complete phrases by having seen TBs of similar ones no more surprisingly than Google can. It is as limited as that process implies, namely, radically limited.

Likewise, it was never inconceivable that one could 'play' chess by searching through billions of chess permutations and selecting a simple high-value path. It was thought infeasible given, say, an abacus to do it with.

We have never thought totally dumb solutions to these problems were unimaginable. They were just so dumb, that the only route to using them would be to somehow run a Ghz processor. Well, since we now have those, dumb partial solutions to dumb problems are possible.


> It can complete phrases by having seen TBs of similar ones no more surprisingly than Google can.

Hard disagree here. Google can't do what GPT-3 does, and it has access to a lot more than 350 GB (the size of the GPT-3 model). I doubt you literally believe that.

> We have never thought totally dumb solutions to these problems were unimaginable. They were just so dumb, that the only route to using them would be to somehow run a Ghz processor. Well, since we now have those, dumb partial solutions to dumb problems are possible.

Did the GPT-3 architecture exist 10 years ago? Could an ANN from 10 years ago have produced comparable results, given access to the same hardware? I doubt so.

It basically seems that in your worldview, anything below "human level intelligence" is just unimpressive? Seems quite a high bar.


It isn't in the category of intelligence, so whether it's impressive just comes down to whether its algorithm is impressive. It isn't.

The mathematics of NNs is Victorian; it's just regression. The NN architecture is largely irrelevant: that only determines how easily the regression converges. The strategy is the same: just compress the data into a correlative statistical model, and estimate using statistical associations.

That's the wrong category of solution, and as a category, it's ancient. The hardware of modern digital computers is impressive; NNs are an absurd joke -- as is any correlative model of data -- a magic trick carried entirely by the performance of CPUs and HDDs.

Newton did not predict the existence of an undiscovered planet by finding correlations in the stars of the night sky. He first produced an actual theory of gravity.

Intelligence is theorising, ie., it is an explanatory process. Correlations in the frequencies of characters in books (we wrote!) is the basis for a neat trick.

There's a tremendous amount of pseudoscience in the claim that these are models of anything at all. Inasmuch as correlations in patterns of pixels in the night sky is not a model of gravity, and provides no means to "predict" anything other than pixel patterns of the night sky.


I know how artificial neural networks work. To me, the fact that they're based on relatively simple maths is not that relevant. A lot of simple mechanisms produce complex results (e.g. the relatively simple laws of physics have produced human brains).

Back to my point, your bar is pretty high. Artificial intelligence to the level you describe will basically be a "world as you know it is now obsolete" type discovery. It will likely go down as the most impactful discovery of all time. Most of us have a lower bar for what counts as impressive.


"Impressive" isnt really a category for me, but also, I havent mentioned that my issue concerns how impressive anything is.

I am, yes, anti-wonder; I do think people get hopped-up on wonder and that lowers their threshold for reasonable belief and they end up in (religious) absurdities.

My issue here, in these terms, is that the "wonder" associated with the magic trick of modern ML is preducing a gross cognitive bias in which people believe this is an example of intelligence; or otherwise, an example like us in some significant sense. This leads to pseudoscience, charlatanism and a religious techno-utopian faith.

At the heart of it is a wonder-scam in the vein of all miracle-based shysterism. The fool looks at the miracle, is impressed out of all reason, and concludes miracles exist.

For sure, I dont really trade in wonder and I have never found it to do anything other than profoundly distort people's beliefs.


You are ranting about a bunch of things that I've never witnessed on HN or anywhere really (except perhaps for that one guy from Google whose claims were quickly debunked on HN and elsewhere). No one claimed that GPT-3 was sentient or possessed human level intelligence. The more common "religious/pseudoscientific" claims I've seen around here are those that completely dismiss the possibility of human level AI because of some miraculous-like property of the brain.


The question of "human-level AI" isnt relevant to the field of AI as it exists.

It's like someone in crypto talking about a future crypto-based financial system whilst at the same time being unable to processes basically any transactions, stabalise any currency, ensure any regulation, deal with any fraud, etc.

Or likewise, an MLM health company talking about cures for cancer all the while hocking vitamin supplements.

The question of what properties animals have which enable them to implement intelligence isn't relevant to whether correlative statistical techniques which capture word frequencies are "intelligence" -- inasmuch as vitamin 'science' isnt relevant to curing cancer.

This gross level of chalatanism, wishful-thinking, scamming, delusion and out-right fraud is overwhelming.


Who exactly is claiming to have solved human-level AI? Can you give some examples of "gross level of charlatanism, wishful-thinking, scamming, delusion and out-right fraud"? I genuinely haven't witnessed any of that. In my experience, most AI people acknowledge that the timeline for AGI is unknown and that it will most likely involve yet to be discovered algorithms, not just throwing more data/processing power at it.


>most AI people acknowledge that the timeline for AGI is unknown and that it will most likely involve yet to be discovered algorithms, not just throwing more data/processing power at it.

Try telling that to Sutskever or Karpathy (or indeed a bunch of "rationalists" on twitter, at least a subset of which works in AI) because both of them have made the claim you assert you haven't seen.


> HNers seem capable of identifying pseudoscience when it's an article on nutrition (namely, e.g., that correlations in the effects of human behaviour are not explanations of it)...

For me the real question is how well can GPT-3 identify pseudoscience.


We know that correlation is not thinking, but it is good enough for it to help or to be used as a starting point which is why people appreciate it as code documentation in this case or code completion in Github co-pilot.


Sure, I mean I only explained my view so that the question would be taken seriously. For me the question is the reason I commented.

I think I answered it myself... I have been seeing people's fervour to attribute intelligence to AI as either charlatanism or misunderstanding... but I think, now, it's actually wishful thinking (and hence essentially religious).

Trading in the usual superstitious foundations of religions: "AI works in mysterious ways", "correlations in symbols are meaningful", etc.

I still don't know why people wish that AI were intelligent; but at least now I have a clearer sense of it. There's some utopian faith behind it which creates this odd urgency to "require" machines to really be intelligent. I'd be interested in knowing why this wish exists.

I can never quite put my finger on how irreligious tech people end up in these tech religions. I guess there's some attachment to there being "a heaven somewhere", which with tech, will be heaven-on-earth.


I don't think it is important that AI is intelligent what is more important is that it is useful and that it solves problems. AI Intelligence would just be a means towards that goal.

As human beings, we slowly evolve from one generation to the next but our tools can evolve much more quickly and it can give us in a sense, superpowers. Cars were invented and allowed us to travel great distances while Self-Driving cars (which will require some sort of AI) will reduce car accidents and provide mobility for everyone.

AI has the potential to greatly improve the lives of a vast number of people and so it might be looked-upon as some sort of "saviour" and that has some religious connotations but most people still would treat it as just a tool but possibly a life-changing one.


> NN is just a fancy search engine

What makes you sure that your thinking process is different?

What's the difference between a model and complicated correlations?


Neuro-scientists and cognitive scientists and developmental psychologists and linguists spent the last century, give or take, looking at this question.

The most basic, most broken down, most simplified answer is what is called "poverty of stimulus".

A NN has an abundance of stimuli, orders of magnitude greater than any biological being could ever possibly take in. Finding enough correlations (enough for productivity) in such abundance is not actually that surprising. What is surprising is kids being competent at speech after less than 2000 days of stimuli, often really low-quality stimuli.


While I agree that ANNs are probably a poor model for how the brain really works, your comparison isn't necessarily a fair one. While ANNs start their learning with an empty/random state, the human brain starts with a bunch of connections that are pre-encoded in DNA.

With that said, my guess is that we'll probably have to take a few more hints from the biology of the brain before we are able to achieve "human level intelligence".


You seem to have completely missed that NNs run on physical hardware.

A contemporary CPU and/or GPU is 'a bunch of connections that are pre-encoded' - just not in DNA, but in silicon.


> A contemporary CPU and/or GPU is 'a bunch of connections that are pre-encoded' - just not in DNA, but in silicon.

No, CPUs/GPUs are not relevant if we are interested in the speed of learning in relation to the quantity of stimuli processed. You could even compute an ANN's learning algorithm with a pen and paper and it wouldn't change its "learning speed" within that definition. Pre-trained weights would.

Regardless, this head start is probably not sufficient to explain the disparity between the brain's capacity to learn and modern ANNs. ANNs are probably just not a very good approximation of how the brain works, for now at least.


I read this four times now, but I don't understand what you are saying.


Can you be more specific? What I meant is this: if we postulate that the brain really is just like an ANN, it would be more like an ANN with pre-trained weights, thanks to evolution. In contrast, an ANN is typically initialized with random weights. A pre-trained network learns a lot faster than a randomly initialized one.


What's the context of this? The article appears to be research on inputs to get GPT-3 to explain code.


The HN comments here in response. Previously, I might have just replied to some of them explaining how NNs work... but I'm hoping to give up on that. I just need "a way out".


ignore them. artificial intelligence is for those who need it.


This as an assistant/teacher could be incredibly empowering for people learning to program.

I'm sceptical of AI writing programs for people, but I can easily imagine such AI guiding/unblocking non-programmers enough for them to actually program.


It’s a double-edged sword. From the article:

GPT-3 doesn’t actually know anything about anything at all. It’s a huge pattern generator. You can’t trust anything it says, because all it does is group words together into convincing looking shapes based on text that it’s seen before.

Once again, I’m reminded that tools like GPT-3 should be classified in the “bicycles for the mind” category. You still have to know how to pedal!

They’re fantastic tools for thinking, but to actually use their output effectively requires VERY deep knowledge—both of the subject matter in question, and of the way that the AI tools themselves work.


The types of error made by the model in the article are exactly the sort of error that I (senior engineer / decade of experience) would make if asked to describe how something works and I didn't want to spend too much time on the details. The difference is, I can usually give a decent estimate of how sure I am of something, and know how to zero in on the truth when it's important.

Maybe a confidence level for a given explanation, along with some sort of "here's where you can go to learn more" would be useful? No idea if language models would be good at that kind of meta reasoning.


There’s already a per-token confidence score given by GPT-3 actually. As for citing actual sources, retrieval architectures like RETRO are becoming popular for this as they can tell you which sources they used for the current prediction and are actually faster thanks to including a fast KNN lookup over the dataset.
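
For reference, a rough sketch of pulling those per-token scores out of the 2022-era OpenAI Python client (the model name and prompt are just examples):

  import math
  import openai  # assumes the 2022-era openai client and OPENAI_API_KEY in the environment

  resp = openai.Completion.create(
      model="text-davinci-002",
      prompt="Explain what this regex does: ^\\d{4}-\\d{2}-\\d{2}$\n",
      max_tokens=60,
      logprobs=1,  # return the log probability of each generated token
  )
  choice = resp["choices"][0]
  for token, logprob in zip(choice["logprobs"]["tokens"],
                            choice["logprobs"]["token_logprobs"]):
      # low-probability tokens are a rough hint that the model is guessing
      print(f"{token!r}  p={math.exp(logprob):.3f}")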


Dunning-Kruger notwithstanding? ;)


And, in particular, GPT-3 will be very easily misled by plausible-but-incorrect inputs -- like programs written by amateur developers. I definitely wouldn't trust it to find subtle logical errors in a student's computer program, for example.


What is thinking if not pattern generating?


It's missing the part where you're not sure what the answer is, so you look it up, do an experiment, or ask someone.

That's a bit much to ask of a fiction generator, though. It doesn't know it's not okay to make things up. All its training is about making things up whether it knows the answer or not.


Look into RETRO, which greatly reduces the model’s tendency to confabulate by teaching it to query a document database known to be truthful, and justify its answers with specific references: https://www.deepmind.com/publications/improving-language-mod...


That looks like "if confidence < %some number%, then try to expand the dataset using known related sources". Load a library of short book descriptions into such an AI, and it should reproduce this behavior. Though it also needs the capacity to learn. Otherwise you could put a snippet that automatically queries it for better materiel about the subject when its confidence is low, so it just tells you where it thinks you should look. That seems doable now.


I'm no machine learning expert, but agree that it seems doable. I think rather than an if statement (which seems like a bit of a hack), it would be more principled to somehow train it on when it's worthwhile to do a search, sort of like a multi-armed bandit problem.

Either way, it's not just scaling up, it's changing the algorithm.


> It doesn't know it's not okay to make things up. All its training is about making things up whether it knows the answer or not.

Sounds very human to me.


Humans are sometimes curious enough to look for answers.


This model has already spent more time learning than most humans ever would...

Imagine if we gave it more power and funding. Maybe redirect all of Congress's salary to it?


We use far more variety of inputs than GPT-3.

GPT-3 has more textual training data than any human could read in a thousand lifetimes, but we base our judgment on much more than that.


My take is that pattern-matching is thinking. But it's low-quality thinking. High-quality thinking is logic. And higher still is causality, which is to logic what calculus is to algebra. I.e. if logic studies the relationship between x and y, then causality studies the relationship between dx and dy. And causality is what we actually want, because causality is power. E.g. causality is what lands astronauts on the moon. When folks like Judea Pearl complain that current AI isn't truly thinking, they're complaining that current AI can't reason logically/causally.


Things we have that current AI models do not:

Intent: We can have a goal which directs our thinking. This changes what we come up with much more flexibly than experience, even if experience is often needed for good results.

Temperament: We can be angry, tired, excited, relaxed, etc. while thinking. This isn't always good, but it is a way we differ.

Self-awareness: A very difficult term to define, but we can (hopefully) take a step back and discard our current thought if we realize we are falling into one of our usual bad patterns of thinking.


Intent and self-awareness do come from experience.

The goals we want to achieve are tied to what has been going on throughout our lives thus far. We don't invent our own values out of a vacuum, we tend to adopt them from other people, and rank them depending on how our nervous system works.

Self-awareness is the ability to reason about the process of reasoning, plus, in your example, a recollection of past examples of incorrect reasoning that disrupted the process of achieving our goals.

So basically current models need a memory and the ability to change their own weights in response to new data to even start approximating humans, but that looks like a doable task, in principle.


I agree that in a sense it’s probably only a matter of degree. However, take for example its explanation of the xm modifiers, where it only explains the x, but gives the impression that it explains both modifiers. It doesn’t seem to have any awareness that it provides an incomplete and potentially misleading explanation. It would be interesting to see how it reacts to “What you explained is only the x, what about the m?”.


Indeed the brain is pretty good at this, so much so that it is annoying when you are trying to generate a variety of textures by combining a small set of textures. This of course extends to audio patterns, temporal event patterns (e.g. post hoc, ergo propter hoc) etc.

That may be the cornerstone of "intelligence", but that's only one part of it. I think another one is making or finding relations between the parts of the whole.

Life is the continuous adjustment of internal relations to external relations [1]

Once you have internalized relations of the real world, you can start to hack them in your head, run simulations of the results and take action.

[1] https://www.thoughtco.com/famous-education-quotations-herber...


It's not enough to state that the human mind is a pattern matcher as pretty much everyone of the "scale is all you need" crowd does. You have to prove it too.

Otherwise it's just a variant of Russell's teapot.


Even if you prove it, it's not useful.


You aren't actually serious, are you?


I agree. While I don't think AI is at the level where it can replace developers, and I doubt it ever will be, I am sure that AI can increase productivity and make development easier, like Copilot does.


> As with everything GPT-3, this answer sounds very convincing! But is this actually true? I would expect the SQL query optimizer to be smart enough to optimize the LEFT JOIN to the point where it would run efficiently. I think GPT-3’s answer to that question is actually very misleading.

I really enjoy how well it gives wrong answers. The internet being polluted with output from GPT-3-like text models is going to send us back to a time when verifying anything online as truthful is difficult, if it hasn't already.


Maybe my next move should be to start a consulting shop: “We Fix GPT-3 Code”.


Good luck with that. I don't think that these things are fixable. They're basically giant grammatical confabulation engines. When you want grammatically-correct random noise, great. Don't fly a plane with them, please...or at least not one I'm on!


No method is fool-proof for GPT-3, but there are tricks that suppress its tendency to hallucinate, such as demonstrating in a k-shot prompt the possibility of false assumptions in a question. I demonstrate this here: https://mobile.twitter.com/goodside/status/15457933888716513...
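
A toy illustration of the k-shot idea (the questions and answers here are invented): you show the model a couple of examples where the right response is to reject a false premise before asking your real question.

  Q: Who was the first person to walk on Mars?
  A: No one has walked on Mars yet; the question contains a false premise.

  Q: What is the capital of France?
  A: Paris.

  Q: Why is a LEFT JOIN always slower than a subquery in SQLite?
  A: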

Models more advanced that GPT-3, such as LaMDA, have entire subsystems specifically dedicated to “grounding” output in truthful information. Hallucination is at least partially a solved problem, but the methods haven’t disseminated broadly yet.


Unless it's reading and (in some sense) understanding FDA and CDC drug approvals, ct.gov, pubmed, and NCCN and other treatment guidelines, and realizing when drugs are suddenly deprecated as well as suddenly approved, it's going to commit malpractice. You could do all that, but then you don't have a language model, you have a domain-specific medical AI.


Not sure where medicine even entered the discussion. Obviously GPT-3 is not a doctor and shouldn’t be used as one. That’s a much harder problem than suppressing the tendency of GPT-3 to confabulate/hallucinate fictional memories.


It’s not obvious. Medical advice is among the top searches on google - possibly the top. And is, by that very same virtue, among the first things people are going to ask what they hysterically believe to be an AI.


There’s a big leap from “some layperson might misuse this for medical advice” to “code generated by GPT-3, for any purpose, cannot be reviewed and fixed by expert human programmers,” which is where this thread started. I’m not sure what you’re arguing, or if you’re even trying to support that original point.


We should probably cut this off, but let me try to explain what I had in mind, which might be nonsense, but was certainly not well explained. Too briefly, presumably we can agree that it's even hard for experts to check and fix code created by other humans effectively. Now put this in a dangerous domain. I originally said aviation, but (confusingly) switched to medicine for reasons not worth going into, but we can replace "program" with "treatment regimen" and "expert programmer" with "specialist physician". The failure mode I was reaching for is where what is hallucinated by the LLM looks really convincing (has great grammar), and is close enough to correct that it isn't glaringly wrong - and these models are good at doing those two things! - but by virtue of the fact that they can't reason, they aren't actually "thinking things through" - which is why, of course, you need the programmer/specialist. But the hallucination is so good (yet wrong, because unreasoned) that users (patients) are taking the word of the model. You think that can't happen, but we do it all the time with results from "Dr Google". A friend who is a doctor says that patients regularly come in with inch-thick binders of stuff they printed off the web that he has to talk them down from. We're headed for a near future where instead of being talked down from stuff you merely printed off, you'll have to be talked down from an extremely convincing wrong treatment hallucinated by Dr HAL. Anyway…never mind.


I think "put this in a dangerous domain" was what the parent was specifically not doing.

I also wonder if it's GPT-3 itself arguing as the text is kind of strange. :)


He meant fixing the output(s) not the model that produced them.


I'm pretty sure abrax3141 understood this. Some things are just unfixable because they're "not even wrong". You basically have to start from scratch.


Could they do this for legal documents too?

I'd hate to see just the programmers put out of work by GPT-3.


A buddy of mine works at a startup doing exactly this: https://www.advocat.ai/

Unsure if they use GPT-3 specifically, but the core idea is the same.


Hey also try out https://text-generator.io

The same querying works, but it will save you a lot on GPT-3 cost: https://text-generator.io/playground?text=SQLite+create+a+pa...

Codex is technically free if you have access, but soon it won't be, so keep text-generator.io in mind there too, as it can generate code from descriptions etc. at a wildly more competitive price.

I had to use a bit of the "CREATE TABLE" syntax, but it did a CREATE TABLE IF NOT EXISTS, which is nice... it missed the content field though.


can't find the model size or a description of the dataset on the text-generator page


I feel like we’re not appreciating how amazing this is.


Many people here on Hacker News are realizing that a significant part of their hard-earned skills are now available to anyone for a few cents of compute time.

Once these systems scale up to attention over long pieces of program text (instead of just a few thousand tokens) most of software development will be near minimum wage, almost skilled labor.

You didn't mind when taxi drivers became obsolete. "I'm not a taxi driver, so I don't care!" We'll see how you feel when your economic value approaches zero.


I think they're very impressive but they still make obvious mistakes that a human wouldn't. I think it's far more likely that they'll become tools that programmers use to increase their productivity than outright replacing programmers.

For the next 10 years at least. Beyond that who knows. If we get strong AI then we have bigger things to worry about than one sector's jobs.


Do you realize that writing the code is maybe ~25% of what a senior software engineer does?


There are many tasks that do not require much intellect. For example, "here is a web store design mockup I have drawn. Generate the code for it. Integrate it with payment and delivery services". Today you need to hire developers, testers and other people. What if such trivial tasks could be performed by a neural network?

Another example is phone support. AI is already being used there, for example, in some banks (in Russia) you are first greeted by a robot if you call, and only if it cannot answer your question it will let you talk to the human.


Your example is not at all as trivial as you may think when you delve into the details. And if you don't want to delve into details, there is already stuff like shopify.


We’re really in trouble when gpt can do stand ups.


It may not be there today, but I think it's obvious that AI is going to basically make all creatives obsolete. At some point I expect it'll be able to:

* Create movies, music and books on-demand

* Design interiors, buildings, electric circuits, web sites, build pipelines, software architectures, etc.

Software devs may be one of the last industries to be obsoleted, but it'll happen. About the only thing left will be the executive function of decision making, but even that's likely to devolve into:

"AI, where's a gap in the market?"

"Jonny, there are gaps in blah, blah, blah. Products that do x, y, z are in demand"

"OK AI, design and print me a prototype"

"Printing in progress..."


I remember people online saying that automation and AI will free up our time to be creators instead of consumers. Looks like that's in jeopardy now. I guess back to consuming content for me.

I feel like this could be one of the biggest technological changes I'll see in my lifetime but there isn't a lot of hype around it like say VR/AR and Web3. If this does end up exponentially improving, life is going to feel like magic.


Tangential question: Is GPT-3 (and similar tools) being kept up to date with new information? Or would it basically have to be retrained from scratch to properly incorporate new data?


There's a mechanism you can use to add your own training data on top of GPT-3. This tutorial talks about using that to generate song lyrics (about half way through): https://wingedsheep.com/songwriting-with-gpt-3/
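
That mechanism is (I believe) OpenAI's fine-tuning API. A rough sketch with the pre-1.0 Python client; the file name, base model and prompt below are just examples, not taken from the tutorial:

  import openai  # 2022-era (pre-1.0) client; assumes OPENAI_API_KEY is set

  # upload a JSONL file of {"prompt": ..., "completion": ...} training pairs
  training_file = openai.File.create(file=open("lyrics.jsonl", "rb"), purpose="fine-tune")

  # start a fine-tune job on top of a base GPT-3 model
  job = openai.FineTune.create(training_file=training_file["id"], model="davinci")
  print(job["id"])  # poll the job; when it finishes it reports the fine-tuned model's name

  # then call the resulting model like any other, e.g.
  # openai.Completion.create(model="davinci:ft-...", prompt="Verse 1:\n", max_tokens=64)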

I don't know how often OpenAI build new versions of the core GPT-3 language model.


What would be highly beneficial is for the AI to understand changes so when it spots an old pattern in my code, it could suggest the new pattern. I'd pay for a Copilot that did that.


Copilot: “You’re using yesterday’s JavaScript framework of the day. Please rewrite your app in today’s JavaScript framework of the day.”

Jokes aside, I wonder when we’ll see an official language linter (go vet, rust-clippy, etc.) that is an ML black box instead of a traditional handcrafted deterministic rule-based engine.


This is fantastic. I've been enjoying myself thoroughly with DALL-E 2 recently and, similar to this, love trying to imagine how the training method / training data must have led to the output. It's truly amazing times we live in, but it will take a lot more work than people 70 years ago thought to get to a level of polish.


For those who don't have GPT-3 access and use GitHub Copilot: if you install the Copilot Labs beta in VS Code, you can get code explanations inside the editor itself. As a bonus, it also offers translation to other programming languages.


I asked GPT-3 about several things, starting with creating a simple JS canvas (embedded in its own self-contained HTML doc) with a red bouncing ball. It interpreted that instruction to have the ball go up and down, starting at a diagonal and going from left to right, bouncing off all sides of the canvas.

I then decided to ask it about Creator controversies which it gave surprisingly accurate and knowledgeable answers that pretty much mirror my own thoughts about a complex controversy that spanned over two years.

It's kind of crazy, a little worrying with just how much the AI could gather from a few nasty reddit comment threads.


We have a product with a VS Code extension (IntelliJ and Chrome) coming soon that does this: https://denigma.app


How the flying f*ck did they collect all the data for GPT-3?



That work started back in the late 80s, early 90s, and people have been building and organizing corpora at an increasing scale. Cleaning a corpus is a hell of a lot of work. I guess they've been using older work to bootstrap that.


Scrape the web. Same way Google do it.


Slightly-curious, slightly-unnerved, slightly-paranoid question here:

Is there any chance that this code snippet was sampled from repositories that contained associated commentary/discussion, guiding GPT-3 to produce similar explanations?

Or is this genuinely an explanation that it can produce without context local to the code in question?

And what number of people are able to determine the answer to my first question?


>Is there any chance

There is a chance.

To get a better answer, you have to specify what exactly you mean by 'sampled' and by 'without context local to the code in question'.


It's very impressive what AI can do for developers to help us code.

Imagine pair programming with an intern who memorized your database schema. You don't need to go back and forth between schema and your code, you can keep your focus at the cursor. It's great for automating the boring stuff.

I would love if Github provided an API for Copilot. Imagine integrating Copilot with your shell.


This can make learning a lot more pleasant. Especially RegEx.

Another area of huge interest is legacy code, or low level languages like assembler, COBOL, and some C stuff. Also almost everything without proper documentation.

Well played.


Now I wonder what would happen if we fine-tuned GPT-3 on Stack Overflow. Would it become better at code? Good enough to assist any beginner in learning?

That would be huge


One of the meta-challenges as I see it is “if we had a data set comprised of only the best information from the best experts we could make an AI that beats them all”.

Because I think that’s both true and entirely unrealistic. Like, how would you quantify who the best experts are, and how would you verify that you’re not training on wrong data from those best experts? Maybe they had an off day or a specific agenda.

Even browsing through verified SO answers there are still answers that are outdated and answers that are just plain wrong. The greater the experience needed to answer a specific question, the more unlikely it is that there is a clear and correct answer given.


SO is far from perfect, but of all the websites out there, SO has strong signal in this regard thanks to the moderation and metadata available.


This is exactly the idea behind GPT-3 Codex, but using GitHub instead of SO. It does, indeed, become much better at programming than the baseline model.


Just imagine being able to hit ctrl+space in your editor to add "this question has been marked as a duplicate." :P


I'm sure the GPT-3 training set includes Stackoverflow already. Would be weird if it didn't.


Would be interesting to produce changelogs or commit messages to be edited, but it would probably be abused more than it was helpful.


Can it find and explain vulnerabilities such as SQL injections or buffer overflows?


It gets you in the vicinity.

https://i.imgur.com/96HfJRb.png


Yes.


Will this ever be released? The fun stopped at step (1) Get an API token.


This would be great for blind people, or you could program while biking.


As a blind programmer I'm trying to understand how I should find this more useful than anyone else?


I think what they're saying is, you can quickly understand the contents of a file by having it summarized for you audibly. Versus having each token read out to you. Then you can locate the part that you're trying to program faster.


Not sure what that has to do with being blind except the audio part, and we already have text-to-speech systems to read the actual code.

Programming while biking with AI is like relying on a really buggy early-stage self-driving AI system to take you places down winding, slippery roads while you sleep at the wheel. It might get you there most of the time, but it might bend you around a tree too.


If you're a blind programmer, would you rather listen to a summary of each function in a source file or listen to every token on 10x speed and figure it out yourself? I believe that is the point that they are making. Not sure about the programming while biking though, I typically need to concentrate on one task at a time (which is what most multitasking is realistically, just quickly switching between one task at a time).


Again, what does this have to do with being blind? Whether you can see or not, summaries are nice to have; and whether you can see or not, you still need to read the code in case the summary isn’t accurate, as it typically isn’t


How can one use GPT-3? An 'apt install gpt3' would be great.


FWIW, various open language models are available on https://huggingface.co

GPT-3 isn't available; GPT-2 is, as well as BLOOM, which is (supposedly) as good.

You can try them online, and you can download them. Downloading them isn't as easy as downloading an app, though: the models are pretty large and need some expertise and lots of compute to run. Though Hugging Face does offer to set you up with cloud providers.

(I haven't actually tried any of this, so I don't know how easy it is in practice. I've only tried the "playgrounds" that are available with some models.)
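
For what it's worth, the Hugging Face route for a small model can be as simple as the following minimal sketch (model choice and prompt are just examples; it needs `pip install transformers torch`):

  from transformers import pipeline

  # downloads the GPT-2 weights (~500 MB) on first run, then generates locally
  generator = pipeline("text-generation", model="gpt2")
  out = generator("def fib(n):", max_length=60, num_return_sequences=1)
  print(out[0]["generated_text"])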


You need to give OpenAI your credit card and use their API. GPT-3 is not publicly available, and even if it were the hardware requirements for running it would be far, far beyond a typical desktop/server.


You can try it for free for three months. I wrote about how do to that here: https://simonwillison.net/2022/Jun/5/play-with-gpt3/


This is simply amazing



