Building a back end using only OpenAI Codex (codeball.ai)
88 points by zegl 3 months ago | 35 comments
I've published the sources for the code generation and the code that was generated on GitHub: https://github.com/sturdy-dev/codeball-todo-mvc

I've been experimenting with merging prompts together, with a goal to write the full backend in a single prompt.

Of the form:

> 1. Setup a flask web server

> 2. Add a /add endpoint

It works reasonably well, but it seems like it's loosing some precision in the prompts... The person that coined the term "prompt engineering" was right, it's really important to learn what words to use to get the AI to do exactly what you want it to do.




This needs a new format for source code. We could call it, oh, Literate Programming. Check in only the prompts to version control, expand them into code during CI, then file bugs when new releases of OpenAI cause regressions.
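A rough sketch of what such a CI check might look like, assuming a hypothetical `generate_code(prompt)` function that stands in for the model call (stubbed here so the sketch runs offline) and a stored snapshot of digests from the last known-good expansion. None of these names come from the post or from any OpenAI API.

```python
import hashlib


def generate_code(prompt: str) -> str:
    """Hypothetical stand-in for the model call; a real pipeline would query the model here."""
    canned = {
        "Setup a flask web server": "from flask import Flask\napp = Flask(__name__)\n",
    }
    return canned.get(prompt, "")


def digest(text: str) -> str:
    """Stable fingerprint of a generated source file."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_regressions(prompts, snapshot):
    """Return the prompts whose expansion no longer matches the recorded digest."""
    return [p for p in prompts if digest(generate_code(p)) != snapshot.get(p)]


prompts = ["Setup a flask web server"]

# Snapshot recorded from the current "release": nothing should be flagged.
current_snapshot = {p: digest(generate_code(p)) for p in prompts}
unchanged = find_regressions(prompts, current_snapshot)

# Snapshot recorded from a different (simulated) release: the prompt is flagged.
older_snapshot = {"Setup a flask web server": digest("some earlier expansion")}
drifted = find_regressions(prompts, older_snapshot)
```

A byte-level digest is the strictest possible comparison; a real setup would more likely run the generated code against tests, since harmless formatting differences would otherwise show up as regressions.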


We cannot expect the code expansion to be deterministic with regard to the prompt without a severe reduction in AI capability.

The utility of these prompts comes primarily from the fact that the AI is aware of a huge amount of context and can therefore infer what a prompt is "meant to do." If a prompter had to exhaustively specify the context it would be no different than coding in any normal programming language.

That context necessarily changes over time. The same sentence 10 years ago might easily have a different contextual meaning than it does today.


I think there is a middle ground. My work on a deterministic, AI-free code gen tool uses DSLs in a more abstract, declarative space. The details are handled in the templates and extra config from the input.

Prisma, Atlas, and OpenAPI-generator are similar, with increasing complexity of input and DSL, respectively.

I do like your point that context of natural language as input can change over time. I imagine it also would if trained on different source code, or even different target languages and technologies.

I'm thinking that these AIs could be simplified if their target was one of these middle-ground abstractions in a DSL, letting fewer (expert) humans write the code via templates.


Like we just write Gherkin / Cucumber all day?


It's very cool, but from an auditing perspective, it's a nightmare. As a reviewer, I can't reason about the code in the same way that I could reason about human code, since there is no coherent formulation of the accomplished task. I can't say "why did it apply CORS to the entire flask app?" and expect reasoning that will fulfill my objective as a reviewer.

So while it could help blast out large swaths of code quickly, it still needs an expert at the wheel to be accountable for the changes to reviewers.


> I can't say "why did it apply CORS to the entire flask app?" and expect reasoning that will fulfill my objective as a reviewer.

I'm not saying you're wrong, but can't you just ask the AI to include a comment explaining why it chose to apply CORS to the entire app? You can just keep asking it questions and maybe its reasoning would check out for most of them.


> just keep asking it questions and maybe its reasoning would check out for most of them.

But the AI isn't reasoning... is it? Perhaps it could give an explanation, but you couldn't (currently) conflate that with any actual understanding of why it did what it did?


It's not reasoning like a human, but if it's using code it memorized from the past it might be able to string along comments it memorized from the past into something that is relevant for the context it's being asked for.


As was the case before.


This is cool. Some thoughts:

> beware: sometimes Codex writes code vulnerable to SQL injection. When that happens though, I was able to prevent it by adding "safely" to the prompt.

Oops.
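For anyone reviewing generated code, here is a minimal sketch of the difference to look for, using Python's built-in `sqlite3` module. The `tasks` table and the function names are made up for illustration and are not taken from the generated app.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO tasks (title) VALUES ('buy milk')")
conn.execute("INSERT INTO tasks (title) VALUES ('write post')")


def find_unsafe(title):
    # Vulnerable: user input is pasted straight into the SQL text.
    query = f"SELECT title FROM tasks WHERE title = '{title}'"
    return conn.execute(query).fetchall()


def find_safe(title):
    # Parameterized: the driver passes the value separately from the SQL text.
    return conn.execute(
        "SELECT title FROM tasks WHERE title = ?", (title,)
    ).fetchall()


malicious = "nope' OR '1'='1"
leaked = find_unsafe(malicious)  # the injected OR clause matches every row
safe = find_safe(malicious)      # treated as a literal title, matches nothing
```

Whether adding "safely" to a prompt reliably produces the second form is exactly the kind of thing a reviewer would still have to check by hand.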

Anyway, is "the program" actually the prompts? Should that be committed into source control, so future you and others can figure out how the code was built? How long will it be until we can trust Codex enough that the Python code doesn't need to be committed? "Codex, create an Android CRUD UI for this OpenAPI document."


I wonder if the vulnerability could be detected at the embedding layer.


Codex, create a dating site around the stable marriage algorithm. Kthxbai !


This is mind-blowing. The Copilot autocomplete was very impressive, but actually editing the existing code is incredible.

I really want to see some examples of failed prompts and attempts to ask it to cache SQLite connections.


I like how the captions scroll along to match the code samples, it's a fun reading experience.

Got scared for a second (most of what I code is CRUD backends!), until I tried to see it from the perspective of a novice, where all of this is impenetrable anyway.


Why did you use SvelteKit as the front end (as opposed to just Svelte)? Typically SK is used when you want to have both front end and back end in the same app.


It's mostly what I'm used to these days, codeball.ai is written in SK. I didn't end up using it, but SK also has a nice client side router!


The typo in the submission (like it's loosing [should be losing] some precision) is both inadvertently amusing (losing precision could well be described as being loose) and raises the question of how Codex would deal with missed typos in instructions.


My top-level comment is about a possible code typo that would appear much more serious:

https://news.ycombinator.com/item?id=32587425


The code on GitHub does not exactly match the post. In particular, the last section about adding seed data is shifted up a few lines on GitHub, into the first database call, making me wonder if it was a stored procedure or a bug.

Did you have to correct output for the post?

https://github.com/sturdy-dev/codeball-todo-mvc/blob/main/ap...

https://github.com/sturdy-dev/codeball-todo-mvc/commit/17992...


Nice find! I tried to be careful to make sure that everything aligned.

The prompt used for the post seems to have been "before_first_request, before conn.close: if the tasks table is empty, add three rows"

I'm updating the post and the sources!
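For readers following along, a plain-`sqlite3` sketch of what the seeding prompt quoted above asks for: insert three rows only when the `tasks` table is empty. The column names and row contents are assumptions, and the Flask `before_first_request` hook is left out so the snippet runs on its own; this is not the code Codex generated.

```python
import sqlite3


def seed_if_empty(conn):
    """Insert three starter rows, but only when the tasks table has no rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()
    if count == 0:
        conn.executemany(
            "INSERT INTO tasks (title) VALUES (?)",
            [("First task",), ("Second task",), ("Third task",)],
        )
        conn.commit()


conn = sqlite3.connect(":memory:")
seed_if_empty(conn)
seed_if_empty(conn)  # second call is a no-op because the table is no longer empty
row_count = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
conn.close()
```

Keeping the emptiness check and the inserts in one function makes it easier to see whether the seeding ended up in the intended place, which was the mismatch spotted between the post and the repository.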


I think a video going through the series of prompts would be super interesting too


Any instructions on how one can go ahead and play with OpenAI Codex oneself?

Is this closed? Beta? or .. ?

As a kid I used to dream of talking to the computer and having it make code happen, like a REPL. This appears to be close to it.


If you're a student, Copilot is free, I believe.


> it's really important to learn what words to use to get the AI to do exactly what you want it to do.

Now instead of spending years learning to do all that nasty troublesome coding you can just spend years learning to exactly phrase what you want the code generator to do. Wait.... is this an infinite loop joke?


Kind of, sometimes coding with Codex is like having to debug/code review code from a developer that has just been through a 1-month "learn to code" bootcamp.

Nothing against bootcampers, it's just that their output is non-intuitive...


How much time did it take to write the app?


It’s hard to say, since I was writing the blog post in parallel as I was making the app. But not too long, maybe an hour or two? I’m not a Python/Flask developer, so I guess that’s not too bad.


You can also use GPT-3 to write most of the blog content. I'm willing to bet soon this is going to be everyone's workflow.


Why would I want to read a blog post written by an AI about a program written by an AI besides novelty? The moment an input in a medium becomes primarily AI-generated is the moment I stop using said medium.


Because you would not be able to distinguish whether a paragraph is written by AI or not. It's not novelty, it's utility.


As I said, if I can't distinguish it, I will stop reading "it" at all. There is no utility in reading text that is not human-generated.


Thanks for the info. I am gathering information about the efficiency of using code generators. By the way, nice work!


I have a non-AI code generator if that would be of interest to your info gathering. It's definitely meant for developer efficiency over the entire software lifecycle. Happy to tell you all about it if you are doing interviews, calls, or similar.

https://github.com/hofstadter-io/hof | https://docs.hofstadter.io


Yup, it's really cool.


This is something interesting. Thanks for posting such content.



