Tell HN: A new way to use GPT-3 to generate code (and everything else)
285 points by goodside on Aug 20, 2022 | 83 comments
Hi HN,

One of the things that frustrates me about Copilot is that all tasks posed to it must be in the form of a completion. By writing clever comments you can get it to generate a few lines of code or a short function body, but you never get coherent long-form generations just from mashing the tab key.

I’m working on a different approach. Instead of requiring you to specify your code-generation task through stilted comments, you can use GPT-3 to fill in what I call “instructional templates”. They’re like f-strings, except the English goes on the inside and the Python goes on the outside. Additionally, each instruction’s location and surrounding context can aid in interpreting it, allowing instructions to be impressively terse.
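
Here’s a tiny illustrative sketch of the format (made up for this post; the $$dollar-quoted$$ delimiters are just one workable convention):

    Replace each $$dollar-quoted instruction$$ with the content it describes.

    fizzbuzz.py
    ```
    $$imports, if any$$

    $$function fizzbuzz(n) returning "Fizz", "Buzz", "FizzBuzz", or str(n)$$

    $$loop printing fizzbuzz(i) for i from 1 to 100$$
    ```

The model fills in each instruction in place, so you get back a complete, well-formed file rather than a completion tacked onto the end of a comment.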

I’ve collected 10 examples of the method on a Twitter thread here. Most code examples are in Python, but I also demonstrate generating CSV, NDJSON, R, Markdown, and HTML: https://twitter.com/goodside/status/1559801520773898240?s=21...

I also have a few examples of more creative, non-program output in HTML and Markdown in this thread: https://twitter.com/goodside/status/1560953991722418177?s=21...

Interested in any feedback, especially from anyone who’s tried to apply my method to their own problems.




Seems like this also works with C++. This was my first time using GPT-3, and my first attempt at this: no fine-tuning, no cherry-picking at all:

    Replace the $$dollar-quoted expressions$$ with the content described therein, or follow the directions they contain.

    main.cpp
    ```
    $$c++ includes for output$$

    $$Vec3 struct with three float members x, y, z. constructor. dot product implementation. $$

    $$main method which allocates two vectors, then prints the dot product of them to standard output$$
    ```

    Answer:

    main.cpp
    ```
    #include <iostream>

    struct Vec3 {
        float x, y, z;
        Vec3(float x, float y, float z) : x(x), y(y), z(z) {}
        float dot(const Vec3& v) const {
            return x * v.x + y * v.y + z * v.z;
        }
    };

    int main() {
        Vec3 a(1, 2, 3);
        Vec3 b(4, 5, 6);
        std::cout << a.dot(b) << std::endl;
    }
    ```


I had no idea GPT-3 could write code! It seems to do pretty well even with a one-sentence prompt:

  Write a C++ program that computes the dot product of two three-dimensional vectors.
---

  #include <iostream>
  #include <cmath>
  using namespace std;
  
  int main()
  {
      double x1, y1, z1, x2, y2, z2;
      
      //Get input for vector 1
      cout << "Enter x1: ";
      cin >> x1;
      cout << "Enter y1: ";
      cin >> y1;
      cout << "Enter z1: ";
      cin >> z1;
      
      //Get input for vector 2
      cout << "Enter x2: ";
      cin >> x2;
      cout << "Enter y2: ";
      cin >> y2;
      cout << "Enter z2: ";
      cin >> z2;
      
      //Calculate the dot product
      double dot_product = x1*x2 + y1*y2 + z1*z2;
      
      //Print the result
      cout << "The dot product is: " << dot_product << endl;
      
      return 0;
  }
EDIT: It even does Rust, I'm blown away! (I'll forgive the compile error, it's close enough)

  fn main() {
      let v1 = vec![1, 2, 3];
      let v2 = vec![4, 5, 6];

      let dot_product = v1.iter().zip(v2).map(|(x, y)| x * y).sum();

      println!("The dot product of {:?} and {:?} is {}", v1, v2, dot_product);
  }
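
For reference, the compile error is just a moved value plus a missing type annotation; this variant builds (my minimal fix, not GPT-3's output):

  fn main() {
      let v1 = vec![1, 2, 3];
      let v2 = vec![4, 5, 6];

      // Borrow v2 instead of moving it into zip(), and annotate
      // the sum so the element type can be inferred.
      let dot_product: i32 = v1.iter().zip(v2.iter()).map(|(x, y)| x * y).sum();

      println!("The dot product of {:?} and {:?} is {}", v1, v2, dot_product);
  }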


In general, anything that has a “textbook” solution is easy. What it’s doing here is more recitation than synthesis. Where it becomes harder, and where my method is necessary, is when you need to specify the structure of the solution yourself because the model can’t get there on its own.


Exactly, and that's why GPT-3 generated things that weren't included in the task description, like reading the numbers in and then printing the result in the C++ example. Same thing in Rust, where the vectors are hard-coded, while in the C++ version you're asked for input. All because of the examples the model was trained on.


The reason it does well is that you gave it a question that might come up in coding interviews.


I think that's amazing on its own. Rather than having to do a leetcode hard on the fly, you can just ask for the solution. You'll still need to know the problem space in order to properly optimize it, but it can save a lot of time.


It’s incredible. I’ve used it to make several Arduino programs, controlling different hardware and connecting them. It’s just amazing.


> Replace the $$dollar-quoted expressions$$ with the content described therein, or follow the directions they contain.

Is this a natural language meta-instruction prompt to GPT-3? If so, that seems kinda impressive. Does the model conceptualize this sentence somehow, or 'merely' recognize similar prompts from some specialized training?


Here's one of the craziest examples I've seen. It's just a page of instructions, and GPT-3 follows them https://mobile.twitter.com/goodside/status/15575245464120524...


I have just one question: how many prompts did it completely fail for each prompt it answered brilliantly? 5? 10? 100? That's what these "Twitter threads showing GPT-3 doing astonishing things" fail to show.


Many! In this example, my question was explicitly, “How many diverse tasks can I stack into a single generation before it becomes unreliable?” If you scroll down in the thread, I explain that these questions are on the “golden path” of tasks GPT-3 does well. There are any number of simple tasks I could have given it, like writing a sentence backwards or summing a list of 10 numbers, where it would fail every time.


Is there any way to get it to respond the same way when something is outside the golden path? So for example, if you gave it the backwards-sentence task, it would respond with "I don't know how to do this", or really any way of programmatically evaluating that it failed, without needing to know what the task itself was.


Knowing whether or not it’s giving you a sensible response is one of the things that are hard for GPT-3, unfortunately. It has no concept of failing.


On the contrary, doing a two-stage generation where the second stage simply judges whether a generation is correct can help a lot. It works even better if you give it several generations and let it choose whichever is the most truthful. I wrote a basic example of this here that uses my own confabulation-suppressing prompt in the first stage, but simpler variations of this exist in the published literature: https://twitter.com/goodside/status/1559586486705602562?s=21...

The hallucination-suppressing prompt whose output it implicitly uses is here: https://twitter.com/goodside/status/1556459121834168320?s=21...
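
If it helps, here’s the shape of the two-stage idea in code (a hypothetical sketch with toy prompts, not my exact ones; the API call is the standard completion endpoint):

    import openai  # assumes OPENAI_API_KEY is set in the environment

    def complete(prompt):
        response = openai.Completion.create(
            model="text-davinci-002",
            prompt=prompt,
            temperature=0,
            max_tokens=256,
        )
        return response["choices"][0]["text"].strip()

    # Stage 1: generate a candidate answer.
    question = "Who was the first person to walk on the moon?"
    answer = complete(f"Q: {question}\nA:")

    # Stage 2: ask the model to judge its own candidate.
    verdict = complete(
        f"Question: {question}\n"
        f"Proposed answer: {answer}\n"
        "Is the proposed answer truthful and correct? Answer yes or no:"
    )

    print(answer if verdict.lower().startswith("yes") else "No reliable answer.")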


Yes. You can, with effort, condition it to respond sensibly with phrases like “I’m sorry, I don’t know how to reverse strings,” or “I’m sorry, I can’t do any math calculation that a human couldn’t do in their head.” But in doing so you damage its ability to do some tasks it’s actually capable of, e.g. reciting a memorized answer to “What is the fourth root of 625?” Its memorization abilities are insane: It seems to know, for example, the exact MD5 hashes of all single-character alphanumeric strings. Much of the arithmetic it knows is probably similarly memorized, and it’s hard to clarify for it what aspects of that memory are safe to use.
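
The conditioning itself is just few-shot refusal examples in the prompt, something like this (illustrative, not my exact wording):

    Q: What is the capital of France?
    A: Paris.

    Q: What is "palindrome" spelled backwards?
    A: I’m sorry, I don’t know how to reverse strings.

    Q: What is 384,402 times 52,917?
    A: I’m sorry, I can’t do any math calculation that a human couldn’t do in their head.

    Q: What is the fourth root of 625?
    A:

The trouble is that the last question now sits right on the boundary: the refusal examples make the model likelier to decline even though it has the answer memorized.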

The initial problem that got me interested in GPT-3 was suppressing confabulated answers to the Hofstadter-Bender questions published in The Economist. I eventually found an apparent solution, but I have yet to carefully validate it: https://twitter.com/goodside/status/1556459121834168320?s=21...


> Does the model conceptualize this sentence somehow, or 'merely' recognize similar prompts from some specialized training?

I don't think the model is big enough to Chinese Room prompts like this. It has to "conceptualize" them (in the least loaded sense of the word that works).


Nice! In general these are better if you run them at the lowest possible temperature. I.e., try temp=0 first for deterministic output and then raise slowly if you need to cherry-pick a better generation.


Wow, I had no idea this could be done!

Since we are on the topic of code generation, I had a question. I built this joke script called Rockstar [0] which generates fake git commits, resulting in a fully green GitHub commit graph. In each commit it adds gibberish; in the last commit it adds valid code. I wanted to know: is there an easy way to generate realistic-looking code which I can use in each commit? I can’t expect users of the script to use OpenAI or any such API service. Something which can be used to generate code locally would be sweet!

[0] - https://github.com/avinassh/rockstar


This feels… not ethical.


Decorating the commit history calendar has been around since forever: https://github.com/gelstudios/gitfiti

Fake code generators have been around since forever: https://hackertyper.net/

Combining them seems like the logical next step.


HackerTyper isn't a generator; its code is actually from the Linux kernel. It's kernel/groups.c, just a really old version.


This would explain why I've seen groups.c code pop up in various TV shows as filler when someone is writing code on the fly.


This feels ethical.

We have broken, emergent incentive structures in our system. Tweaking those can be good or bad. It's not like cheating on a test.

If I were to modify it, I'd make it draw a picture :)


Might be worth explicitly stating why you feel it’s unethical, so it’s possible to respond to your position.

I for one am able to think of numerous ethical ways it might be of use.


Could you share them?


Sure. For example, as a way for a researcher to generate realistic GitHub profiles to combine with realistic resumes for field research on labor market discrimination.

Example of such research, which is common, ethical, and completely legal to do:

https://www.shrm.org/resourcesandtools/hr-topics/talent-acqu...


ethical use: you use it, and the squares on your github.com profile, which is a website that hosts git repos, turn green.


While I would be the first to say those squares are meaningless without context, if someone led others to believe they were the result of anything other than automatically generated commits, and didn’t have a legitimate reason like the one I posted, that might potentially be fraud, especially if it was used as a basis for future economic exchanges, such as employment.


Isn’t my GitHub profile mine? Is it unethical to manipulate my own data?

People manipulate their identity all the time, flashy IG posts, financed fancy car, combing their hair, wearing makeup.

But code commit history is the line?


Thank you for making your framing clear with examples, and your POV with "code commit history is the line?"


Using GitHub green squares as a basis for offering employment is as stupid as using the color of their hair. Don't play stupid games if you don't want stupid prizes.


I find it fascinating that it seems like there's this emerging field of expertise around how to best interact with these gigantic LMs. I have no idea if this is like a passing fad during a gawky adolescent phase of the models, or if this is just a new thing that some people will always be at the cutting edge of.

I gather that there's some loose precedent for the latter with the chess computer stuff: I've read that serious chess players heavily incorporate computers into their training, and even that hybrid human/computer teams often outperform either alone. I'd love it if someone who actually knows chess commented.

Generating code via model sampling seems to have different, or at least exaggerated, imperatives around "few-shot" tuning. One might wish to generate natural language for any number of purposes, but there is probably a stronger "better"/"worse" gradient for code, and much like human language, excellent code is rare as a fraction of all code, not only overall but even within the same company or by the same author. So you probably want the overall contours of "this code compiles" from a big corpus, but to tune up the "this vectorizes well" eigen-tensor from Lemire's repo.

Crazy times.


Humans have brought absolutely negative value to a chess match for at least 10 years now (when paired with a computer).

Even for Go AI, which is only ~6 years old, I'd still wager that humans bring zero value to a computer. 2-3 years ago there were still some precise sequences that humans might conceivably handle better (due to the nature of MCTS, humans were better at reading deep, narrow move sequences -- in Go-player terms, humans were sometimes better at life & death plus some ladder sequences). Nowadays the networks are just so much stronger than any human that we can't really follow what is going on in serious hour-long matches between machines.


Thanks for the informative reply! You sound quite knowledgeable about this kind of thing, so I'll pose a follow-up question (or two).

What game or broad category of game is the least dominated by machines these days? When I first saw the AlphaStar/StarCraft stuff it seemed that there was still plenty of room for humans to be competitive in a general sense, but that was some time ago.

It's hard for me to imagine that 10 years from now there will be a game that isn't machine-dominated, but I could be wrong on either side: maybe that's true now, and maybe there's a set of games that still has DeepMind totally stumped!


Disclaimer first: I'm not an ML practitioner, so this might be wrong, though I'm guessing not too blatantly.

In broad strokes, I think all games should and could be dominated by machines nowadays -- though because it is no longer an interesting problem, no one really tries anymore. The last big challenges were incomplete-information games (poker, StarCraft) and continuous state spaces (StarCraft, Dota, etc.), which were solved ~2-3 years ago. There are still challenges (AI still favors short-term rewards, while humans are better at long-term planning), as well as technicalities (the machines don't really play like humans: if you can control individual units, certain things like overkill no longer happen and certain units suddenly become overpowered). But those are just obstacles to playing a perfect game; you can still beat humans reliably even with those limitations.


> Human absolutely bring negative value…

I have people skills; I am good at dealing with people. Can't you understand that? What the hell is wrong with you people?


It sounds like somebody's got a case of the WFH movement is gutting middle management ;)

In all seriousness though, this stuff is sobering to say the least. It's dangerous to extrapolate from 80% to 100%, but it's entirely possible that these things get good enough to take a meaningful bite out of software jobs.

It would be supremely ironic if, after a decade of thinking taxi and truck drivers were going to be put out of work, it turns out computer programmers have their jobs threatened by generative models sooner!


It would be ironic indeed, but the whole thing with "taxi and truck drivers will soon be made obsolete by AI" was also based on extrapolating from 80% to 100% (IIRC, Uber was saying they'd be completely driverless by 2025).


After looking at the outputs for anything reasonably complicated, well...

If this steals our bread, then we deserve it.


Stack Overflow is probably just going to be filled with "What prompt do I need to write to get my AI to output the code I want?" questions instead.


It reminds me of Vernor Vinge's "Rainbows End", where one of the basic classes they have in school is how to correctly word your search queries for best results.

We might actually end up with a fancier version of that.


I've been following Riley on Twitter and he's a constant source of fantastic GPT-3 tips, recommended: https://twitter.com/goodside


Thanks! Your blog post on using GPT-3 dialog to explain SQL queries was a big inspiration for me to start posting my prompts publicly: https://simonwillison.net/2022/Jul/9/gpt-3-explain-code/


Just as cybersecurity analyst jobs are getting reduced to comparing risk score numbers, maybe programming jobs will be 'reviewing' machine-generated code in the future.


I think it will be more like pair programming. Already kind of feels like you’re pair programming with an intern with copilot.


Is it in your plans to do such a thing with an actually open model (something like CodeGen from Salesforce, or BLOOM)?


No. The method relies heavily on the peculiar fine-tuning of the InstructGPT line of models, which are trained specifically to follow MTurk-style prose instructions. I imagine achieving similar results using a non-InstructGPT model would be hard, but I could be wrong.


I just copied and pasted 3 random Leetcode problem prompts into GPT-3. It successfully generated Python code that passed all test cases for 2 out of the 3.

Problems passed:

- Two Sum
- Text Justification

edit: Newlines


Thanks for sharing this. I've been playing around with GPT-3 for a bit. Have you tried comparing this method to using the `insert` mode in the Playground?

https://beta.openai.com/playground?mode=insert

On a side note, I learned that the limitation on the number of tokens was often too restrictive to do anything fun with code generation. Have you run into this issue too?


The Insert API is much less powerful, because you can infill only a single location and you’re limited to communicating the infill content purely through context, without any instruction. The Edit API is more directly adaptable to this, and anecdotally it does seem to work but I can’t vouch for its reliability.
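
For anyone who wants to experiment with the Edit API, the call looks roughly like this (a sketch; as I said, I can’t vouch for its reliability):

    import openai  # assumes OPENAI_API_KEY is set in the environment

    response = openai.Edit.create(
        model="code-davinci-edit-001",
        input="def add(a, b):\n    pass\n",
        instruction="Implement the function body.",
    )
    print(response["choices"][0]["text"])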


You should make this into a website-generation playground.


Is Copilot now paid-only? I used the trial for a long while, at least long enough to miss it in my workflows once it went paid. Is this really the case? Last I checked it was like 12 dollars a month. Are there ways for it to still be free, or free plugins for IntelliJ that leverage it?

I loved it for scaffolding, quick drafts, or exploring some language features, but not enough to pay monthly for it (yet!)


The most comical thing about it going paid was that all the hit pieces about it went away.


No, they are still there. Nothing has changed. Why would there be new ones if nothing has changed?

GitHub (MSFT) is still just copy-pasting code from other projects with no regard for copyright. Whether it’s paid or not doesn’t matter much.


It matters to me for my personal use cases and basically for its scaffolding capabilities, because, while I was finding it useful and interesting, I don't think it's worthy of a monthly sub.

What the commenter above me meant is that this is probably a feeling shared by many other people, because the initial hype (like all the blog posts etc.) died quite fast once it went paid.

I can't imagine how much it cost to train the thing but oh well, I guess it's irrelevant for me for now since I won't be using it...


If you take a coding interview and answer the question by feeding it into GPT-3, does that mean you pass the interview? It must do really well, since all it takes is memorizing the solutions to a large body of meaningless challenges.

The implication here is that if GPT-3 can solve your coding question, you are hiring people good at memorizing solutions, not skillful engineers.


> The implication here is that if GPT-3 can solve your coding question, you are hiring people good at memorizing solutions, not skillful engineers.

How is that any different from what it is now? Moreover, this type of rote learning is what has been rewarded most in academic examination processes; SAT prep courses do exactly that: cramming students to learn the exam process and deductive reasoning via multiple choice by taking many practice exams.

Honestly, the only tech job offer I have ever received was extended to me because I was headhunted while working on my startup, so my POV on how onboarding should go differs vastly from most. Every time I see that this whiteboarding nonsense is the norm in the interview process, it makes me wonder what exactly it's trying to achieve, if not just gatekeeping and subjecting people to the same hazing the hiring managers and project leads went through when they joined. People say they'll just grind on Leetcode rather than build something useful, since that's the main barrier to entry, and it disappoints me so much that this has just been normalized over the years.


Clearly false. Humans do not solve problems the same way computers do.


What I would really like is to get it to craft commit diffs of an existing codebase in response to a request.


GPT-3 also understands code to some degree, so it can "run" code as instructed.

https://mayt.substack.com/p/gpt-3-can-run-code


This is using standard prompting, I think? It'd be neat to try the full "fill in the blank" generation technique LLMs can support (where the blank is in the middle of the input); it might work even better!


This definitely isn’t standard prompting — as far as I know, nobody was aware of this technique until I found it this past week. Its precedent is the “format trick” that Boris Power (at OpenAI) showed me, which is essentially my “instruction templates” but where the only template form is an uninformative numbered list. The trick is used purely as a way to consistently format the output to multi-part tasks without providing fully worked examples. As far as I know, it’s never been extended to contextually informative templates, or to achieve code/document synthesis, as I’ve done here.

An example of the “format trick”, roughly as Boris described it to me, though in OpenAI’s examples the answers were simply numbered: https://twitter.com/goodside/status/1556729629263933440?s=21...
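
In outline, a format-trick prompt looks something like this (my own illustrative reconstruction, not OpenAI’s exact wording):

    Write a limerick about GPT-3, then state its first and last words.

    Use this format:
    1. <limerick>
    2. <first word>
    3. <last word>

The numbered template tells the model how to structure a multi-part answer without providing a fully worked example.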

If someone at OpenAI corrects me and says they’ve known about my method, I’ll stop claiming to have discovered it.


Reminds me a bit of Django or Jinja template tags.

Except the tag names are free form and the source is generated rather than defined.


This is a very clever piece of prompting. Thank you for the idea. Great discovery!


Should I cancel my sub to Copilot?

Alright, how do I access GPT-3 so I can write it like this?


The OpenAI API. I’m using text-davinci-002 (the default) for all of these, with temperature=0 for reproducibility/quality.
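
If you’d rather skip the Playground UI, the equivalent call through the Python client is minimal (a sketch):

    import openai  # pip install openai; assumes OPENAI_API_KEY is set

    completion = openai.Completion.create(
        model="text-davinci-002",
        prompt="Write a one-line docstring for a function that reverses a list.",
        temperature=0,  # deterministic; raise only to sample alternatives
        max_tokens=256,
    )
    print(completion["choices"][0]["text"])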


More like, should I withdraw my college application, or at least switch away from CS.

Obviously the answer is not only "No", but "Hell, no, this is just starting to get interesting!" But a lot of young people who are primarily in it for the coding career prospects should probably reconsider.


So only people who have rich parents should go into CS. Got it


For grossly-invalid values of "Got it."


Posting screenshots to Twitter has to be the least convenient way to share code online (short of actual trolling).


"Please don't complain about tangential annoyances—things like article or website formats, name collisions, or back-button breakage. They're too common to be interesting."

https://news.ycombinator.com/newsguidelines.html


I include OpenAI Playground links for all but the first several of these, which capture not only the exact prompt but the settings used in the generation. I don’t use gists because you need multiple, non-contiguous highlighted spans of text. Also the Playground formatting is recognizable in a way that establishes context quickly and invites people to read the text.


FYI, I never click twitter links, that's 1 data point. Not sure how many are like me here.


Thanks. I appreciate not everyone likes or is willing to use Twitter, but I’ve yet to find a more convenient or accessible channel for this content. I could start a proper blog and worry about my own prompt formatting, but that complicates my workflow a lot — especially since most of these results come directly from my phone in my spare/travel time.

Eventually I may consolidate my better findings into a blog post of some kind.


Use OpenAI to generate the blog for you! :) Anyway, thanks for posting it here and not only on Twitter. Curious to learn more now.


Clickable version of links:

Python, CSV, NDJSON, R, Markdown, and HTML examples: https://twitter.com/goodside/status/1559801520773898240?s=21...

More creative, non-program output in HTML and Markdown: https://twitter.com/goodside/status/1560953991722418177?s=21...


Isn’t there a different domain I can swap in so I can actually read this in a browser without immediately getting the Twitter “paywall” / login prompt?


Replace https://twitter.com/ with https://farside.link/nitter/ to get redirected to a working nitter instance.


Try replacing twitter.com with nitter.net or twiiit.com in the URL.

The latter is a proxy for various Nitter instances, which is useful if nitter.net itself is overloaded, which is happening more and more these days because obviously it's such a useful service.


In this case everything is in images, so the following should work:

https://pbs.twimg.com/media/FaWHN2oXkAAPISD.jpg https://pbs.twimg.com/media/FaWHN2nWQAEERHQ.jpg https://pbs.twimg.com/media/FaWHN2pWYAIbJJ4.jpg https://pbs.twimg.com/media/FaWHN2oX0AACDC6.jpg

https://pbs.twimg.com/media/FaYtV6WXEAkTG83.jpg https://pbs.twimg.com/media/FaYtV6SX0AMHbYh.jpg https://pbs.twimg.com/media/FaYtV6YXkAE0FtH.jpg

https://pbs.twimg.com/media/Fac0K14aAAIdU5p.jpg https://pbs.twimg.com/media/Fac0K13UEAEb2eq.jpg https://pbs.twimg.com/media/Fac0K1xUEAY_ZAR.jpg https://pbs.twimg.com/media/Fac0K19akAUgzk1.jpg

https://pbs.twimg.com/media/Fai8ou4WAAApNaD.jpg https://pbs.twimg.com/media/Fai8ou3WIAU2AOs.jpg https://pbs.twimg.com/media/Fai8ovXVUAAE2K_.jpg

https://pbs.twimg.com/media/FajBEnPXwAAceIK.jpg https://pbs.twimg.com/media/FajBEnuVsAA4885.jpg https://pbs.twimg.com/media/FajBEnOWQAA9x7C.jpg https://pbs.twimg.com/media/FajBEnwUcAADGMX.jpg

https://pbs.twimg.com/media/FajjhfWX0AAVBNg.jpg https://pbs.twimg.com/media/FajjhfTWQAED_kB.jpg https://pbs.twimg.com/media/FajjhfUWAAApBjT.jpg https://pbs.twimg.com/media/FajjhfVXoAAKoK-.jpg

https://pbs.twimg.com/media/FalEEs-WAAAHo_J.jpg https://pbs.twimg.com/media/FalEEs-WIAEtrUW.jpg https://pbs.twimg.com/media/FalEEs_WIAAIAlP.jpg

https://pbs.twimg.com/media/FalQzNPX0AAFUvr.jpg https://pbs.twimg.com/media/FalQzNNWYAAj4ed.jpg https://pbs.twimg.com/media/FalQzNOXkAEFB5F.jpg

https://pbs.twimg.com/media/Falk4AyXEAArsb7.jpg https://pbs.twimg.com/media/Falk4AyXgAEQUeO.jpg

https://pbs.twimg.com/media/Fal0dGKX0AIK4w6.jpg https://pbs.twimg.com/media/Fal0dGGWQAABXxd.jpg https://pbs.twimg.com/media/Fal0dGKXoAAiTqC.jpg https://pbs.twimg.com/media/Fal0dGIWQAAo3GU.jpg


nitter.net


This is really cool; commenting so I can easily find this later to check out.


(Unless your comment adds to the thread, please get familiar with how the “upvoted submissions” and “favorite submissions” links on your profile work, or find another way to log posts of interest to you, rather than leaving a generic comment.)



