For me, a critical part of "doing a good job" is how the implementation fits into a larger system. ChatGPT is not very good at that.
I tried asking it to build a basic state management component for a TypeScript/React app. It offered a class-based component. I asked it to use a closure instead; it offered a closure but skipped some of my other requirements that it had previously included (like types). I asked it to add in-memory caching; it added caching but removed something else. I asked it to create a context provider based on this component; it created a context provider but dropped parts of the state management implementation.
Basically, it sort of works if you hand-hold it and pay attention to every detail. But that barely saves me any time, and the code it generates is definitely not production-ready: it needs refactoring before it can be integrated into an existing code base.
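For reference, what I had in mind was roughly this shape (just a sketch; the store/provider names and the cache-by-key design are illustrative, not what ChatGPT gave me):

    import React, { createContext, useContext, useSyncExternalStore } from "react";

    type Listener = () => void;

    interface Store<T> {
      getState: () => T;
      setState: (partial: Partial<T>) => void;
      subscribe: (listener: Listener) => () => void;
      cached: <V>(key: string, compute: () => V) => V; // in-memory memoisation
    }

    // Closure-based store: state and cache live in the closure, not on a class.
    function createStore<T extends object>(initial: T): Store<T> {
      let state = initial;
      const listeners = new Set<Listener>();
      const cache = new Map<string, unknown>();

      function cached<V>(key: string, compute: () => V): V {
        if (!cache.has(key)) cache.set(key, compute());
        return cache.get(key) as V;
      }

      return {
        getState: () => state,
        setState: (partial) => {
          state = { ...state, ...partial };
          listeners.forEach((l) => l());
        },
        subscribe: (listener) => {
          listeners.add(listener);
          return () => { listeners.delete(listener); };
        },
        cached,
      };
    }

    // Context provider built on the same store, so components share one instance.
    type AppState = { count: number };
    const StoreContext = createContext<Store<AppState> | null>(null);

    export function StoreProvider({ children }: { children: React.ReactNode }) {
      const store = React.useMemo(() => createStore<AppState>({ count: 0 }), []);
      return <StoreContext.Provider value={store}>{children}</StoreContext.Provider>;
    }

    export function useAppState(): AppState {
      const store = useContext(StoreContext);
      if (!store) throw new Error("useAppState must be used inside <StoreProvider>");
      return useSyncExternalStore(store.subscribe, store.getState);
    }

Nothing exotic, but every iteration with ChatGPT dropped one of these pieces while adding the next.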
A good trick here is to use the API rather than the web interface and give it access to functions that let it edit files directly. Then you skip the iterations where you ask for something new and it forgets something old; it just updates the code in place, keeping the old stuff and changing/adding the new stuff.
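A rough sketch of that setup with the OpenAI Node SDK, if you haven't tried it (the write_file tool, the system prompt, and the model name are all just illustrative):

    import OpenAI from "openai";
    import { promises as fs } from "fs";

    const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

    // One tool the model can call: overwrite a file instead of pasting a
    // fresh (and often lossy) copy of it into the chat.
    const tools = [
      {
        type: "function" as const,
        function: {
          name: "write_file",
          description: "Overwrite a project file with new contents",
          parameters: {
            type: "object",
            properties: {
              path: { type: "string" },
              contents: { type: "string" },
            },
            required: ["path", "contents"],
          },
        },
      },
    ];

    async function editInPlace(instruction: string, path: string) {
      const current = await fs.readFile(path, "utf8");
      const response = await client.chat.completions.create({
        model: "gpt-4o", // any capable model
        tools,
        messages: [
          {
            role: "system",
            content: "Apply the user's change via write_file. Keep everything else in the file intact.",
          },
          { role: "user", content: `${instruction}\n\n--- ${path} ---\n${current}` },
        ],
      });

      // Execute whatever edits the model asked for.
      for (const call of response.choices[0].message.tool_calls ?? []) {
        if (call.function.name === "write_file") {
          const args = JSON.parse(call.function.arguments);
          await fs.writeFile(args.path, args.contents, "utf8");
        }
      }
    }

From there you loop: each new instruction goes in alongside the current file contents, so nothing silently disappears between iterations.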
Not the GP, but I've been working on an open platform [0] for integrating the OpenAI APIs with other tools (VSCode, JupyterLab, bash, Chrome, etc.) that you might find interesting; the VSCode integration supports editing specific files/sections.
Also worth taking a look at GitHub Copilot Chat [1]; it's a bit limited, but in certain cases it works well for editing specific parts of files.
Yes: Cursor, at cursor.sh. I'm happily paying for it, and it works great at answering questions based on your codebase. It generates great inline code, but it doesn't have file or multi-file generation (yet?).
The way to tackle that is with RAG and local embeddings, so you can supply examples of the code conventions you prefer. ChatGPT isn't going to do it without a lot of manual copy/pasting, and most tools I've seen are not that great. I just use a bunch of hacked-together scripts and a local embedding DB over my own code as an assist, and it's worked pretty well.
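The retrieval half is small. A sketch of it (the embeddings model name is real; the rest is illustrative, and in practice the embeddings live in whatever local store you like rather than an in-memory array):

    import OpenAI from "openai";

    const client = new OpenAI();

    // Each entry is a snippet from your own codebase plus its embedding,
    // computed once up front.
    type Doc = { text: string; embedding: number[] };

    async function embed(text: string): Promise<number[]> {
      const res = await client.embeddings.create({
        model: "text-embedding-3-small",
        input: text,
      });
      return res.data[0].embedding;
    }

    function cosine(a: number[], b: number[]): number {
      let dot = 0, na = 0, nb = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
      }
      return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Pull the k snippets closest to the task and prepend them to the prompt
    // as "this is how we write things here" examples.
    async function conventionExamples(db: Doc[], task: string, k = 3): Promise<string[]> {
      const q = await embed(task);
      return db
        .map((d) => ({ d, score: cosine(q, d.embedding) }))
        .sort((a, b) => b.score - a.score)
        .slice(0, k)
        .map((x) => x.d.text);
    }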
I'm going to go out on a limb here and say... maybe you're doing it wrong.
ChatGPT is very, very good at what you're describing (integration between well-defined interfaces and systems in code), especially GPT-4.
If you get one bad result, does that mean it sucks? Or... does it mean you don't understand the tool you're using?
The power of AI is in automation.
What you need to do is take your requirements (however you get them) and generate hundreds of solutions to the problem, then automatically pick the best ones by, say, checking whether the code compiles, etc.
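Something like this, roughly (the model name, scratch path, and fence-stripping are all just illustrative; the n parameter on the chat completions API is what makes bulk sampling a one-liner):

    import OpenAI from "openai";
    import { execFileSync } from "child_process";
    import { writeFileSync } from "fs";

    const client = new OpenAI();

    // Cheapest automatic check: does the candidate survive `tsc --noEmit`?
    function compiles(code: string): boolean {
      const scratch = "/tmp/candidate.ts"; // illustrative scratch location
      writeFileSync(scratch, code);
      try {
        execFileSync("npx", ["tsc", "--noEmit", "--strict", scratch], { stdio: "ignore" });
        return true;
      } catch {
        return false;
      }
    }

    async function generateCandidates(prompt: string, count = 10): Promise<string[]> {
      const res = await client.chat.completions.create({
        model: "gpt-4o",    // any capable model
        n: count,           // many independent samples from one request
        temperature: 0.8,   // keep some diversity between samples
        messages: [
          { role: "system", content: "Reply with a single TypeScript code block and nothing else." },
          { role: "user", content: prompt },
        ],
      });
      return res.choices
        .map((c) => c.message.content ?? "")
        .map((s) => s.replace(/^```(ts|typescript)?\n?|```\s*$/g, "").trim());
    }

    // Keep only the candidates that pass the automatic check; rank or eyeball from there.
    async function bestOfN(prompt: string): Promise<string[]> {
      const candidates = await generateCandidates(prompt);
      return candidates.filter(compiles);
    }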
...
This is a probabilistic model.
You can't look at a single output and say 'this sucks'; you can only (confidently) say that if you look at a set of results and characterize the solution space the LLM is exploring as being incorrect.
...and in that case, the highest probability is that your prompt wasn't good enough to find the solution space you were looking for.
Like, I know this blows people's minds for some reason, but remember:
prompt + params + LLM != answer
prompt + params + LLM = (seed) => answer
You're evaluating the wrong thing if you only look at the answer. What you should be evaluating is the answer generator function, which can generate various results.
A good answering function generates many good solutions; but even a bad answering function can occasionally generate good solutions.
If you only sample once, you have no idea.
If you are generating code using ChatGPT and taking the first response it gives, you are not using anything remotely like the power offered to you by their model.
...
If you can't be bothered using the API (which is probably the most meaningful way of doing this), use that little 'Regenerate' button in the bottom right of the chat window and try a few times.
That's the power here: an unlimited number of variations on a solution at your fingertips.
(and yes, the best way to explore this is via the API, and yes, you're absolutely correct that 'ChatGPT' is rubbish at this, because it only offers a stupid chat interface that does its best to hide that functionality away from you; but the model, GPT-4... hot damn! Do not fool yourself. It can do what you want. You just have to ask in a way that is meaningful)
I would say where it is beneficial to use the ChatGPT UI is the Data Analysis mode, where it has access to a code environment. You upload a few files, ask it for an implementation, and it will happily crunch through the process, validate itself via the REPL, and offer you the finished files as downloads. It's pretty neat.