Building games through natural language using OpenAI’s code-DaVinci model (andrewmayneblog.wordpress.com)
205 points by dahjelle on March 18, 2022 | 29 comments



This is one of those things that's so incredible and mind-blowing I really want to share it with friends or family, but WHY it is so impressive is locked behind a high enough sophistication that it would mostly be lost on them.

Having written a script a decade ago about a future in which software issues would be solved not by debugging or programming, but by finding the right way to communicate concepts to AIs, it's wild to see those nuances emerge.

One of the most interesting details in the post is the bit about asking for a function to create an array rather than the array itself.
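
For anyone who hasn't read the post yet, here's a minimal sketch of what that pattern looks like - the prompt wording and the generated function below are my own illustration, not copied from the article:

    // Illustrative prompt: "Write a function that returns a 10x10 array of
    // '.' tiles with a '#' wall around the edge." Asking for the function,
    // rather than the literal array, is the trick the article describes.
    function makeMap(width, height) {
      const map = [];
      for (let y = 0; y < height; y++) {
        const row = [];
        for (let x = 0; x < width; x++) {
          const isEdge = x === 0 || y === 0 || x === width - 1 || y === height - 1;
          row.push(isEdge ? "#" : ".");
        }
        map.push(row);
      }
      return map;
    }

    console.log(makeMap(10, 10).map(row => row.join("")).join("\n"));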

Another was its existing 'semantic' (even illusory) knowledge of the Matrix rain.

It's going to be wild seeing this develop over the next few years. I'm sure we'll soon be seeing: specialized discriminators acting as code linters (even for human-produced code), efforts at having GPT-3 write more modular instructions for Codex from generalized statements, and a recursive refinement as Codex output, filtered by the humans supervising it, re-enters the open-source dataset that will train future iterations.

The thing so many people evaluating this tech right now seem to overlook when predicting its future is the compounding rate of improvement, as opposed to the more linear rates common across past technological parallels, which relied on limited human resources.


I am still greatly disappointed by the insistence on end-to-end black-box models.

In the end, as impressive as these results are, they are fundamentally trending in the wrong direction. All the benefits and certainty (e.g. security, correctness, and reproducibility) provided by theorem provers and model-driven systems are thrown out the window in favour of fast but potentially wrong or insecure results.

The worst part of this development is the psychological aspect - humans have a tendency to rely on machine-generated results and view them as superior. The disconnect between working code and correct or secure code is going to widen with this approach.

A glaring example is found in the blog post: the image manipulation example (7.) contains an error that the author failed to even recognise or mention. Instead of turning the uploaded image into a mosaic as intended, the generated code simply creates a fixed-size black-and-white checker-board pattern. This is clearly neither a mosaic nor image manipulation.

It is a very impressive tech demo, but generating actual software that can be trusted and rigorously checked against requirements will end up using a formal description (i.e. programming language, theorems, or modelling akin to UML) anyway.


>in favor of fast but potentially wrong or insecure results.

Formal methods don't eliminate wrong or insecure results. Formal methods tell you that a program matches the specification when certain conditions hold, e.g. no bit flips, the computer does not crash, the kernel doesn't kill your process, allocations succeed, your program can make progress (the kernel can decide to never schedule your program), etc. You can have bugs in writing the specification, where the specification does not match your intention. Even your intention for a system may have vulnerabilities in it. And if your specification can't generate code, your code might not match the specification.

Using formal methods slows you down compared to things like testing, which can get us most of the way there in less time. Systems can be designed to be robust, so that if a machine fails the system keeps running. If an end-to-end black-box model can get you most of the way there with a sufficiently low number of bugs, it may be worth using. Time is a limited resource, and being 100% correct is not necessarily better than being 99% correct and having extra features.

>The disconnect between working code and correct or secure code respectively is going to widen using this approach.

Not really. People are not going to just start ignoring bugs when they run into them because the software they are using happened to be machine generated.

>Instead of turning the uploaded image into a mosaic as intended, the generated code simply creates a fixed-size black-and-white checker-board pattern

It worked fine for an image I just tried. Just like the prompt it "convert[ed] the image to a 32x32 mosaic." There was no checkerboard, but it may be worth noting that it converted transparent pixels to black.
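
For reference, here's a minimal sketch of what the generated mosaic code plausibly does (my guess - the actual sandbox output isn't reproduced in the thread, and the function name and canvas approach are my own): downscale the image to 32x32, then scale it back up with smoothing disabled.

    function toMosaic(img, canvas) {
      // Shrink the image to 32x32...
      const small = document.createElement("canvas");
      small.width = 32;
      small.height = 32;
      small.getContext("2d").drawImage(img, 0, 0, 32, 32);

      // ...then blow it back up with smoothing off to keep the blocky look.
      const ctx = canvas.getContext("2d");
      ctx.imageSmoothingEnabled = false;
      ctx.drawImage(small, 0, 0, canvas.width, canvas.height);
    }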


> Formal methods don't eliminate wrong or insecure results.

Yes, they do. That's the entire purpose of a proof. Of course formal methods cannot prevent or even detect wrong specifications, but that's no different from generated code either.

> Using formal methods slows you down compared to things like testing which can get us most of the way there in less time.

But that's the point - formal methods are slow if people have to apply them. Automated theorem provers exist and can work on generated ASTs, so why not add the step and create a hybrid system that verifies the generated result?
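
Even something far short of a full prover would help. A toy sketch of the "generate, then verify" loop I mean, with property checking standing in for a real theorem prover (the spec and candidate below are hypothetical):

    // Accept model-generated code only if it satisfies a checkable spec.
    function verifyCandidate(candidateFn, spec, samples = 1000) {
      for (let i = 0; i < samples; i++) {
        const x = Math.floor(Math.random() * 1e6);
        if (!spec(x, candidateFn(x))) return false;
      }
      return true;
    }

    // Hypothetical spec ("output is the input squared") and model output:
    const spec = (x, y) => y === x * x;
    const candidate = x => x * x;
    console.log(verifyCandidate(candidate, spec)); // true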

>> Instead of turning the uploaded image into a mosaic as intended, the generated code simply creates a fixed-size black-and-white checker-board pattern

> It worked fine for an image I just tried. Just like the prompt it "convert[ed] the image to a 32x32 mosaic." There was no checkerboard, but it may be worth noting that it converted transparent pixels to black.

The code may have been changed - it works for me now, too, when before it definitely didn't.


>Of course formal methods cannot prevent or even detect wrong specifications

A wrong specification can give you a wrong or insecure result. That was my point. Formal methods aren't a silver bullet, and your system still needs to be robust to failures.

>so why not add the step and create a hybrid system that verifies the generated result?

Because the time spent writing a specification is wasted if there turn out to be no issues with the generated code.


People are going to generate black-box code, launch it as an MVP, and then hire engineers to iron out bugs or even do whole rewrites if the product gets traction. It's all going to just fit back into the same standard model.


There's no way to un-black-box it. There are just too many parameters. Call it a black box or a lack of model explainability; it's effectively the same thing.


Interesting how you still need to have some intuitive sense of what's going on under the hood. (You can't say "make Zelda," you have to ask for an array of symbols and manipulate them.)

In that sense it feels like this is still programming, but at a higher level of abstraction with a weird fuzzy compiler. Now we can go from natural language -> JavaScript -> assembly etc. rather than just the last two.

Mediocre programmers use APIs, while good programmers know what's behind the curtain and can debug them. I suspect this will stay the same, no matter how many layers of abstraction we add.


> Mediocre programmers use APIs, while good programmers know what's behind the curtain and can debug them. I suspect this will stay the same, no matter how many layers of abstraction we add.

The skill of both such categories (API developers and developers who use APIs) is defined by the ability to know the _least_ amount of complexity needed for a given set of requirements. You may be appealing to some "deeper" sense of what it means to be a programmer, but in terms of what companies are willing to pay - if you get the same job done in a way that is easier to build on in the future, you should be rewarded for that, because it saves your own time and the time of anyone who will need to work on that program later.

I think this is (only mildly) lacking in nuance. The ability to use AI for this task is surely limited at the moment - and people who know more about programming are certainly more capable of using these systems. As we go forward though, it's important to be able to admit that if an AI can produce a solution faster (and you have easy access to said AI, not a given), then you may be wasting time trying to "roll your own" in pursuit of being a good programmer.

On the other hand, until this AI-assisted experience is democratized, you're correct that it is a good idea to have engineers around who know this stuff from first principles. For now, I'm not terribly concerned that those folks will go away.


> Mediocre programmers use APIs, while good programmers know what's behind the curtain and can debug them.

Personally I'd like to not know what's behind the curtain until I have to. Which category does that put me in?


Probably in the second category, if you’re able to look behind the curtains on demand. A lot of programmers can’t.


Programming will never go away, but these easier-to-use abstraction layers could bring in a new crowd, so to speak. Much like how graphic design became far more commonplace and somewhat easier to learn once Photoshop was everywhere: experts still exist, but you also get people in a garage making T-shirts, which wasn't much of a thing before. An easy-to-use, more abstracted layer of coding could do something similar and create a new class of less technical programmer.


Yes.

Could AI-generated code mean the death of coding?

I'm wondering if, ultimately, you can get rid of the language as a part that you think about at all. Why not allow the AI to create a language that best suits it? Perhaps this would be hard to read for a human, but who cares? In fact, this would be a good thing for whoever owns the AI.

The issue then becomes - as you say - representing the problem well at a higher level of abstraction: representing the problem and knowing what a 'right' answer should be.


> Could AI-generated code mean the death of coding?

A rose by any other name...

"Coding" is the formalisation of ideas, algorithms, requirements, and constrains. This task is and will continue to be challenging. Whether you use "prompt engineering" or a formalised language doesn't matter all that much.

I'd be more interested to see whether such black-box model can solve programming problems like returning the best (by some criteria) N items from a read-only medium using a limited amount of resources (e.g. x amount of RAM and t milliseconds).
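
Something along these lines, where the constraint is a single pass and O(N) extra memory (my own framing of the problem, not a generated solution):

    // Keep only the best N items from a stream we can read exactly once.
    function topN(stream, n, score) {
      const best = []; // at most n items, kept sorted ascending by score
      for (const item of stream) {
        if (best.length < n || score(item) > score(best[0])) {
          if (best.length === n) best.shift(); // drop the current worst
          best.push(item);
          best.sort((a, b) => score(a) - score(b));
        }
      }
      return best;
    }

    // e.g. the 3 largest values from a source we only get to scan once
    console.log(topN([5, 1, 9, 3, 7, 2][Symbol.iterator](), 3, x => x)); // [5, 7, 9]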

Given the immense amount of training data it's hard to distinguish a clever search method with some mixing and matching (i.e. copy-paste-programming) from general problem solving abilities.


This - along with GPT - is a great way to create originality detectors, something desperately needed.

The generators get all the attention, but we should be finding ways to use these as discriminators, so that we can find innovative and original projects.

I would love to get a list of GitHub repos or Steam games ranked on originality, chronologically. Things that are innovative within their own time. There are people making fascinating things, but it takes days, weeks, months to comb through the wreckage to find them.

I have no faith that these models will ever write Slaves to Armok 1 or Finnegans Wake or Dead Stars or original works in their own time - but I think detecting them might be within reach, which is far more useful currently (or at least within my lifespan).

I also think that human programming languages look cool for a demo - but ultimately, there should be programming languages that neatly interface with NNs or whatever - rather than pure text manipulation. I'm sure a lot of resources get sucked up into that alone, modeling syntax, etc. There needs to be a programming language that AI would use, probably directly manipulating an AST of sorts (unless I misunderstood this model, and it's already doing that).
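
Roughly what I mean by that: the model would emit a tree, and a small printer or evaluator would handle the surface syntax. A toy illustration (not how Codex actually works today):

    // The model emits structure; a tiny printer turns it into surface syntax.
    const ast = {
      type: "FunctionDecl",
      name: "add",
      params: ["a", "b"],
      body: { type: "Return", value: { type: "BinOp", op: "+", left: "a", right: "b" } },
    };

    function emit(node) {
      switch (node.type) {
        case "FunctionDecl":
          return `function ${node.name}(${node.params.join(", ")}) { ${emit(node.body)} }`;
        case "Return":
          return `return ${emit(node.value)};`;
        case "BinOp":
          return `${emit(node.left)} ${node.op} ${emit(node.right)}`;
        default:
          return String(node); // bare identifiers
      }
    }

    console.log(emit(ast)); // function add(a, b) { return a + b; }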


Original != Good - you'd need a discriminator for "goodness".


I don't think the poster said anything about goodness. Goodness is subjective; originality implies creativity. I personally also want to see more original works in books, movies, TV, and games. After getting into any medium for a while, you realize that most stuff is just a rehash of existing versions and ideas you hadn't come across before (usually because it's just a bit older).

Seeing something that could be argued to be "bad" or "dumb" yet completely unique would be my preference.


I'd be curious to see what the upper limit of this is. Could it, for example, be trained to optimize video games? I think of the magic fast inverse square root optimization in Quake that dramatically reduced the cost of normalizing vectors for lighting calculations.[1]

I bet there's all sorts of non-intuitive optimizations one could do in modern video games that are otherwise too tedious for most programmers to perform.

[1] https://en.wikipedia.org/wiki/Fast_inverse_square_root
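
For the curious, a minimal sketch of that trick ported to JavaScript, with typed arrays standing in for C's pointer casts (illustrative only):

    const buf = new ArrayBuffer(4);
    const f32 = new Float32Array(buf);
    const i32 = new Int32Array(buf);

    // Quake-style approximation of 1 / sqrt(x)
    function fastInvSqrt(x) {
      const halfX = 0.5 * x;
      f32[0] = x;
      i32[0] = 0x5f3759df - (i32[0] >> 1); // the "magic constant" bit hack
      let y = f32[0];
      y = y * (1.5 - halfX * y * y);       // one Newton-Raphson refinement step
      return y;
    }

    console.log(fastInvSqrt(4)); // ~0.499, vs. the exact 0.5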


> Could it for example, be trained to optimize video games?

In a sense, it already is; Nvidia's DLSS [0] and AMD's FidelityFX Super Resolution [1] let games render at a lower (faster) resolution and then upscale the result to HD or 4K without obvious upscaling artifacts - DLSS uses an ML model to fill in the detail from the lower-resolution frame. Apparently applying the upscaling is cheaper than rendering at full resolution.

[0] https://www.nvidia.com/nl-nl/geforce/technologies/dlss/

[1] https://www.amd.com/en/technologies/fidelityfx-super-resolut...


It can't do that from scratch yet; these kinds of optimizations require nontrivial mathematical understanding and informed judgement of trade-offs.

But it is capable of knowing your function is an inverse square root and inserting a known optimized version.


Cool article.

While I have been using GPT-3 via OpenAI's APIs for about half a year, and I very much appreciate using GitHub's Copilot because it saves me time, I wish for much more research into hybrid AI systems that are multi-paradigm: deep learning, symbolic AI, new types of RL, breakthroughs in scaling conventional search, etc.

There is so much work left to get to the point where AI systems can effectively do counterfactual reasoning, autonomously develop better models of the world, etc.

Symbolic AI as I learned it in the 1980s and deep learning in the last ten years are all great first steps, but we have a long way to go. Assuming parallel work in AI ethics, I don’t think there are any real limits on how much this technology can improve our lives.


I've been playing around with using GPT-3 as a research assistant and it can work surprisingly well.

It's tricky to get the prompts right, I think, and you won't necessarily get novel insights; it's more like the distilled common wisdom of an area.

You can ask it to pretend to write the response to a subreddit, and you get an approximation of a subreddit filled with the type of experts you want, instantly answering your questions - although they occasionally just spout nonsense.
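
Roughly like this - a sketch against the plain HTTPS completions endpoint; the model name, subreddit, and prompt wording are placeholders, not my actual setup, and it assumes Node 18+ in an ES module for fetch and top-level await:

    // Frame the prompt as a subreddit thread and let the model "answer" it.
    const prompt = [
      "The following is a thread from r/AskHistorians.",
      "",
      "Question: Why did the Library of Alexandria decline?",
      "",
      "Top answer:",
    ].join("\n");

    const res = await fetch("https://api.openai.com/v1/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      body: JSON.stringify({ model: "text-davinci-002", prompt, max_tokens: 256 }),
    });
    console.log((await res.json()).choices[0].text);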


Interesting.

Anyone have ideas as to why this works so well with JavaScript specifically? I tried to include similar commands in Python (i.e., using a prompt that implies Python based on commenting style) and it didn't even write code; it just kept adding new comments.


Read the first page. Seems interesting, but I'm not interested in shooting games.

Given that Game of Life is … can it generate that and some of the patterns?

Also, can it play Go, chess, bridge … etc.?

If not, is that inherent, or just this model?

I'm not a game developer, hence just a question.


Spooky. Is there any existing tool that can do anything close to this at the moment?

I would have liked the author to discuss a bit more about the time spent optimizing the input, and his success rate.


Yeah, this. I remember doing some demos recently at work using OpenAI Codex and showcasing how easy it was to write SQL and Python given some natural language requirements.

The bit I didn't really say (I was working a particular angle!) was that I spent a fair bit of time on the prompt design. Changing a word here or there could lead to a drastically different outcome. Over time I got better at learning how to engineer the prompts so the code fulfilled my intention, but it was a learning process for sure.


Is there any way I can try this for myself? That is, take the instructions and get the "compiler" to output the game.


It seems to be generated using this: https://beta.openai.com/codex-javascript-sandbox


I'd imagine so; OpenAI has a playground and API on their site.



