
I did a similar exercise recently when I needed to make a fairly basic REST API and CRUD frontend using 2 frameworks I wasn't particularly familiar with. I used GPT-4 to generate ALL the code for it. I'll write a blog post about it soon, but a quick overview:

I suspect it was slower than just writing the code and referencing the docs, and it would be much slower than someone experienced with the two frameworks could manage. I had to be very specific and write a few long and detailed prompts for the more complex parts of the application. It took around 5 hours to make the application, with a lot of that time spent sitting waiting for the (sometimes painfully slow) ChatGPT output. In a framework I'm more familiar with I think I could have easily got it done in under 2 hours.

It was definitely useful for making sure I was doing it the correct way, kind of like having an expert on call for any questions. It was also very useful for generating perfectly formatted boilerplate code (some frameworks have CRUD generation built in, but this one did not).

It was a fun experiment, and I found it useful as a learning/guiding/generation tool, but I won't be using it for general day to day development any more than I currently do. For most instances it's quicker to just learn the framework well and write the code yourself.




> It was definitely useful for making sure I was doing it the correct way, kind of like having an expert on call for any questions.

I've found it to be shockingly good at this. I end up asking a lot of questions like, "what is the best directory structure for a project on {foo} platform?" or "What is the idiomatic way to do {x} in {language y}?" It has the advantage of having seen lots of projects in every language, and for some questions that automatically leads to a good answer.


Be careful and don't trust everything it says. Sometimes it invents API functions that aren't there, or fails to see ones that exist. And it's always very confident until you point it out.


I'm finding that code is the area where hallucinations matter the least... because if it hallucinates an API function that doesn't exist, the mistake becomes apparent the moment you actually try to run it.

It's like having an automated fact checker! I wish I had the same thing for the other kinds of output it produces.
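A toy Python illustration of the first point (the hallucinated helper name is made up here, which is exactly the point):

    import json

    # An LLM will sometimes suggest a plausible-sounding helper that doesn't exist:
    try:
        data = json.load_string('{"a": 1}')  # hallucinated name, invented for illustration
    except AttributeError as err:
        print(err)  # module 'json' has no attribute 'load_string'

    # The real function is json.loads:
    data = json.loads('{"a": 1}')

With prose output there's no equivalent hard failure to tip you off.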


It will, however, from time to time insert lines and variables that do nothing, but could result in bugs or confusion if not removed. I’ve encountered these hallucinations a few times. Overall, I agree with your sentiment, but I think it’s important to note that running isn’t always the indicator of correctness we think it is.


You're still supposed to read it, like you hopefully wouldn't blindly paste a big code block from SO. A useless/unused line or variable doesn't seem that hard to spot?


Of course, but ease of detection can vary relative to the complexity of the code being returned. GPT-4, correctly prompted, can produce some pretty complicated stuff. But it also hallucinates in ways that are more subtle than one might think. In the example I’m thinking of, it created an unused variable in a set of fairly complex ML training setup scripts, which I mostly caught because I was familiar with all the proper inputs. But the unused variable was quite plausible if you were not familiar, new to the domain, etc.


Compilers automatically detect unused variables. Unused variables are the last of your problems. You should be far, far more worried about all the misused variables.


I've noticed some other funny things, maybe harmless but undesirable.

One example was really, really hard to spot. Once I queried GPT-3.5 for a function to do X, and it did pretty well. When I looked closer, though, it had wrapped 90% of the code in an unnecessary if statement. I looked at the code and thought something was off until I realized what it was.

My point here is, if that was easy to spot, who knows what else people are missing because even in a simple case, unless you're actually trying to spot issues, you likely won't see them.


Don't IDEs highlight variables that aren't read/written to?


In this case “unused” might mean declaring or initializing a variable and then assigning or reassigning a value to it later. The variable is technically used, so most linters won’t pick that up. But the initial assignment actually does nothing, so it’s wasted cycles.
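A minimal Python sketch of that pattern (made-up example). Because the name is read later, a plain unused-variable check typically stays quiet; only the first assignment is dead:

    def smallest(values):
        best = 0              # dead store: overwritten below before it is ever read
        best = float("inf")
        for v in values:
            if v < best:
                best = v
        return best

    print(smallest([3, 1, 2]))  # prints 1

Spotting that the first line of the function does nothing takes dead-store/dataflow analysis, which most default lint configurations don't do.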


No decent compiler would waste a cycle on that.


If it imagines a function, it's fine; you can fix it in one prompt. But when it fails to see one that exists, it may move in the wrong direction and produce a limited solution. And you wouldn't know.


You would know, in the same way you would with code a colleague wrote the previous day, or legacy code at a company you just joined. It either has tests or you can't really trust it anyway. You can ask the AI to also write tests for you; inspecting tests is usually much faster than understanding all the nuances of the code.


> because if it hallucinates an API function that doesn't exist

Yes, absolutely agree.

And I’m no fanboy, but when it does this and I only notice because the code doesn’t run, half the time I’m thinking to myself that the API function really should exist because it’s so logical!


And then it’s like, “thank you for telling me about this, I’ll remember that for next time,” which is how a human ought to respond but not how ChatGPT actually learns.


Hah, yeah that's so frustrating. Its memory is reset every time you start a new conversation, but it doesn't make that at all clear to people.


It also resets inside a single conversation if you go beyond a certain number of tokens, and as far as I know there is no warning whatsoever to the user when it happens.


The obvious next thing to try would be to continuously fine tune the model on these conversations, so it actually fulfills the promises most of these models make about "learning continuously". I haven't yet seen any actual implementations of this, though. I'm sure someone's tried it, I wonder how it went.
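The mechanical shape of it seems straightforward, at least. A rough sketch with OpenAI's fine-tuning endpoint (assuming you've exported past chats into a JSONL file of message lists; the file name and base model here are placeholders):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Each line of conversations.jsonl looks like:
    # {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
    training_file = client.files.create(
        file=open("conversations.jsonl", "rb"),
        purpose="fine-tune",
    )

    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-3.5-turbo",  # placeholder base model
    )
    print(job.id)

Run that on a schedule over new conversations and you'd have a crude version of "learning continuously"; whether the results are any good is the open question.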


Which models make promises about "learning continuously"?


I’ve had statements along that line from Bing, dolphin2.1 and I think maybe Wizard Vicuna.


Often I find that the hallucinations are how the API or lib should have been if it were more sane. Maybe someone could turn this into a virtual API critique.


I see it all the time. Even more annoying is when the API exists, but in a different class or with different parameters and outputs than you need, and the model would claim it does exactly what you need right until the time you try to use it and discover it can't work in your case.


I always get the classic “It really depends on your use case and neither pattern is exactly better than the other” when asking GPT about programming patterns.


Put something in your system prompt along the lines of "don't waffle"
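Something like this, if you're hitting the API directly (a sketch with the OpenAI Python SDK; the exact wording of the instruction is up to you):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Be direct. Pick one option and justify it briefly. Don't waffle."},
            {"role": "user", "content": "Repository pattern or active record for a small CRUD app?"},
        ],
    )
    print(resp.choices[0].message.content)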


It has such a completionist fetish. I have found it works much better when you basically tell it to pick a side and not go out of its way to be balanced.


Even when it says that it always continues with "but" and gives me an answer.


This is exactly how I use GPT-4; I find it very useful.


I use Sourcegraph a lot for this when GPT can't get a satisfying answer.


>I had to be very specific and write a few long and detailed prompts for the more complex parts of the application.

This is my experience. You still have to understand programming: you're just typing it out in natural English.


Conversely, doing so has helped me flesh out my thoughts on many occasions. As I ran into obstacles with errors or imprecise prompting, I realized my design had issues or edge cases I hadn’t taken into account. Perhaps it would be better if I wrote out several paragraphs describing my intentions before taking up most coding tasks, but I hardly think my boss would be in support of this!


Like a rubber duck which provides feedback. But since it's coming from a rubber duck, it's good to verify the duck knows what it's talking about.


> Perhaps it would be better if I wrote out several paragraphs describing my intentions before taking up most coding tasks

I often do things like this, and if the scale of the document was somewhat larger than you describe, it would be a design document.


My role unfortunately doesn’t allow for this. I do more “in the trenches” data science stuff. The value is quite obvious to me for a role where there is more space for this or in product planning.


Yes exactly. I had to be very specific and tackle the project in the same way I would if I was fully writing the code. First data schema, then models, then controllers with CRUD and views, then routes, then authentication, then test cases, then TS transpiling, etc...

It's definitely not something someone with zero coding experience could easily do, and I feel even a junior developer would struggle with anything even as complex as this. You have to have the concept and structure in your head, it's just writing the code for it.


> You have to have the concept and structure in your head, it's just writing the code for it.

I wonder how far you'd get with the technique of asking ChatGPT to lay out its plans first, kind of like the improvements in math questions you see when you ask it to write down its reasoning before committing to any answer.

"This is what I'm looking for: XXX. What are the different pieces of this that I'm going to need to create?"

"Ok, the first thing on your list you gave me was a database to store the data. What would the structure of that database look like?"

"Can you give me the code to create that database?"

Etc etc., i.e. putting the actual code as the last step each time.


I suspect it would work surprisingly well! But still, I think you would have to have a fairly good concept of coding to do even that. For example, my mum who has zero concept of what a database is (beyond the dictionary definition) would not be able to piece together an application like this (sorry mum!). But a junior developer would probably be able to just about do it depending on complexity.

For me, using GPT felt very slow, but for a junior it might actually be faster than trial/error and Googling. Also, ChatGPT is only going to get better and better, so we can expect it to become quicker and easier to do such things.


This is the best form of prompting for generating code. You tell it to first generate a technical spec for solving the stated problem, consider multiple options, and return it for your review. You then use a trigger command like “build” to implement, once you’ve specified any changes.
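Roughly, the standing instruction I mean looks something like this (the wording is just an example; adjust the trigger word to taste):

    You are helping me design and build software. For every request:
    1. First return a short technical spec: the pieces involved, the data
       model, and two or three implementation options with trade-offs.
    2. Do not write any code yet; wait for my review and corrections.
    3. Only when I reply with the single word "build" do you implement the
       agreed option in full.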


That's a great fit for scenarios where you do know programming but don't know the particular language and framework on which you suddenly have to do some maintenance or improvement.

For many things the available documentation is poor, and asking a bot is much more helpful.


What I think you're overlooking is that most people can only do a few hours of hardcore coding at peak productivity a day (3 hours for me, maybe).

So you could spend 3 hours babysitting GPT-4 to write some code for you, but then you'd still have 3 hours of peak code productivity that day that you haven't "used up" yet.


I’m the opposite, personally. I can code for 5 or 6 hours just fine if I’m “in the zone”, but I can’t deal with LLMs for more than an hour or two max, usually less. I find their sweet spot is when I need to ask one or two questions with one or two follow-ups, 5-10 minutes ideally. They can sometimes be a big win in small doses if you can keep it to that kind of interaction. But they are just draining to interact with for longer periods. For me they’re a lot like being a TA stuck in multi-hour office hours with undergrads once you get past a few questions. Just a really shitty slog.


If it were me, babysitting GPT4 would still spend my peak code productivity credits, as it's basically coding (in natural language).


It’s thinking either way. I would even wager that the trivial code that GPT writes may be easier to read for me, than some convoluted, human language description of the same thing, done with numerous corrections at every point.

The relative uniformity of code is a positive for human understanding as well, e.g. a^2+b^2=c^2 is easier to parse/transmit the idea over any “spoken” version of the same thing.


True. Although I did find writing the prompts quite tedious and exhausting, and waiting for the output frustrating, so that would dig into my energy for peak coding. I would say it uses less concentration, but about the same amount of effort, or even more. But also it was a very narrow test; maybe for certain things (especially repetitive or boilerplate code) it could be very beneficial.


> I used GPT4

Do you use the ChatGPT Plus version or the API? If the API, what do you usually use to access it?


You should actually time how long it takes you to write it yourself rather than guess. The results may surprise you.



