How much would it have cost if GPT-4 had written your code (pypi.org)
65 points by yodon 6 months ago | 89 comments

It's how much it would cost to produce the text files in your repo. I think most software engineers will agree that writing the code is the easiest part of the job. The hard parts are figuring out the problem, figuring out your data modeling, what is going to work and what it would cost, weighing tradeoffs in speed, scalability, maintainability, etc. And, you know, getting the thing to actually run.

There's no doubt that AI is going to be a huge factor in coding. But we will have to change a lot of things for it to outright replace humans. I'm sure some people are already working on those changes...

The hardest part is supporting the platform for 7-10 years. With those lifetimes, writing code is like 5% of the effort; the rest is supporting it for years and handling all the nasty bugs that show up when thousands of engineers pound on it 24/7. They don't teach you about that in college, and I wish they did, but I doubt anyone would take a class on how to support software even though it's the most important part.

I’ll add that 7-10 years is rather on the short side compared to the systems I’ve been involved with in my career. 20 years isn’t that rare, and the more interesting challenges tend to happen after the 10-year mark.

It’s really only important to the companies that can afford to teach it. If you’ve got a product that’s going to stick around for 10 years, it’s probably a cash cow. You can afford to teach engineers at that point.

For a lot of the companies out there that are just trying to get off the ground - all this talk of supporting the system long term is irrelevant outside of major system crashes or whatnot.

I’ve got code barely touched in 15 years still in day to day production use.

Let’s see how it does when it’s constantly worked on to add new features, etc.

Features are the problem.

Stop adding features.

Package things as best you can, so that if you really want something added on, it's a separate add-on package.

That is a maintenance nightmare

And every few months someone wants to switch to a new framework or module for one specific reason and then the codebase is riddled with 10 different ways to do the same thing.

Why would it need new features? It was written to solve a business problem. The problem is still there, and the solution still works.

I'm jealous. I'm sure there are lots of codebases where "write once and forget about it" is the lifecycle...

Isn’t it nice when requirements don’t change! :)

Even all that, difficult though it is, doesn't include the cost of understanding the actual problem, knowing the field well enough to propose a good solution, and coordinating and communicating with everyone the whole way through.

"Every problem is a people problem", and people-solutions are something that AI still doesn't solve.

I'm not saying that AI doesn't have its uses, it certainly does, but I'm trying to highlight that purely technical solutions very often aren't good solutions to a lot of real-world problems.

When AI truly becomes helpful in the workplace, developer jobs will probably change a lot, but I think the biggest change will be in the size of the business scope. Most projects I've worked on could easily grow ten times if developers were ten times as efficient.

At the same time I've always wondered why IT problems haven't all been solved comprehensively already because so many problems seem so similar.

I can't really fit those two narratives together.

All IT problems are 90% similar... it's the last 10% difference that is 9000% of the work!

I think part of the problem is human networking limitations. I can learn to code very well and can become a core contributor on a project. The problem is my understanding doesn't scale well. If I get hit by a bus tomorrow, that project could die. I could leave the project and it could flounder. Now, let's say you add 3 people as good as me to that project... the problem is the project will grow 4 times as large, leading to every one of us becoming a critical failure point. Any time you want to bring a new core contributor in, it's a massive undertaking of time and effort to get them up to speed.

Yeah, this rhymes with my experience. AI could perhaps turn this dynamic on its head if we let it into all the code bases. Now it can perhaps explain the background for all the nooks and crannies in the code. It could chime in on any discussion in the corporate or community environment to tell you why another team has designed things the way they have. It can potentially act as a mediator to make a more cohesive whole.

But that all also kind of sounds a bit scary. It has to be built in a way that inspires trust.

> But we will have to change a lot of things for it to outright replace humans

Like having AI in the first place. I still haven’t seen any example generated by GPT that is not a naive translation (perhaps between languages) of some code already in its training set in one way or another. Which is a cool thing: we get a much more intelligent (though factually less precise) search engine for code! But we are very far from it replacing any programmer worth their salt.

Everyone is freaking out about AI writing “common code”, yet we have had reusable code in the form of libraries for decades - and now frameworks. You will never have to write code that does what the library does ever again.

The hard part of software engineering should be figuring out the problem, tradeoffs, etc.

But the reality is this field has multiple completely different groups of people calling themselves software engineers.

There's those who basically work at consultancies/shops that get paid to push out tons of client projects using a standardized approach and their job is to write a bunch of code. Or those that work on in-house systems that get paid to largely implement pre-scoped work.

Then there's those that are actually designing systems, improving them, iterating on them, etc. You know, actually "engineering".

The fact that coding itself is fairly easy, plus the proliferation of bootcamps that teach coding (and now ChatGPT), has flooded the market with individuals who know nothing of good software engineering practices, let alone CS and systems fundamentals.

I expect that's the group most exposed to AI risk. Pumping out lots of shitty self-contained code is something LLMs excel at.

I could imagine an LLM that brainstorms all the edge cases, data modelling, algorithms, etc. The final decision will still have to be made by a human, but having a companion LLM that gives you ideas, plays devil's advocate, and looks for problems would definitely speed up the decision process.

Yeah, but then you need the knowledge, experience, and decision making to act on that feedback. LLMs are also notoriously bad at things like edge cases, etc. and more training data won't necessarily help.

Unless it leads you astray and that costs more time than coming up with the code on your own in the first place.

Wait, sounds like you've rolled out SAP before, I'd love to pick your brain for tips other than "don't".

> don't

Just don't

A 'Tree of thoughts' if you will...

The hard part is figuring out other peoples code.

This… is not what people would have said 10 years ago. Remember the idea of the 10x developer? Even at a design heavy firm, 80% of engineers despise writing the design docs.

Come on everyone, we all know that "most of the work that goes into software engineering isn't writing code", that "it's impossible to create a working system in one shot", etc etc etc, and therefore this tool will not give a realistic idea of what it would cost to build a system with nothing but GPT-4.

No one here (including the author) is under the impression that you can just ask ChatGPT to "write a Unix-like kernel" and get Linux spit out the other side.

That said, I am really curious what it would cost, as a theoretical lower bound, to pay OpenAI to write the Linux Kernel. 10 grand? 100 grand? I have no clue, but I'm glad someone wrote a tool to make it easier to find out!
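
Out of curiosity, here is a back-of-envelope version of that lower bound. Every figure is an assumption for illustration (the kernel's size, the tokens-per-line average, and the price are all rough guesses, not quotes):

```python
# Back-of-envelope lower bound: what would emitting the Linux kernel's
# source once, as GPT-4 completion tokens, cost? All figures assumed.
KERNEL_LINES = 30_000_000      # rough size of the kernel tree
TOKENS_PER_LINE = 10           # crude average for C source
USD_PER_1K_TOKENS = 0.06       # assumed completion price per 1k tokens

tokens = KERNEL_LINES * TOKENS_PER_LINE
cost_usd = tokens / 1000 * USD_PER_1K_TOKENS
print(f"~{tokens:,} tokens, ~${cost_usd:,.0f}")
```

Under those assumptions it lands in the tens of thousands of dollars, which is why the "billions" answers elsewhere in the thread are really about development cost, not token cost.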

>I am really curious what it would cost, as a theoretical lower bound, to pay OpenAI to write the Linux Kernel

Billions of dollars.

Eh, what? Why in the living hell would it cost that much!

Because the actual Linux kernel cost that much or more. Of course, most of those costs were volunteer labor or borne by someone chasing down a failure on their own company's time. To greenfield an OS kernel as comprehensive as Linux, with as much hardware support as Linux has, would be one of the more expensive human endeavors ever.

But wait, why doesn't Windows cost this much?

Oh, but it did. Of course, Microsoft has offloaded a huge amount of the costs onto users and hardware development companies too.

Is this that hard of a tool? Why is it a package? It seems like something someone could do with a few lines of code. OpenAI is transparent and fairly open about pricing per token… it seems like it’s just adding bloat.

> No one here (including the author) is under the impression that you can just ask ChatGPT to "write a Unix-like kernel" and get Linux spit out the other side.

I really don't see such a thing being far off. GPT-4 is already pretty good at writing small modules ("write a disk IO queue for my custom kernel in C"). With a little more work in allowing GPT-4 to test out code it has written and iteratively make changes, allowing it to use debuggers, benchmarks, and sanitizers, allowing it to write its own tools, and then put modules together, I think we could very soon be asking it to "write me a Unix-like kernel".

It’s that good in your experience? Enough that it is close to writing something as nice as the Linux kernel we have now? How close is close? Reading comments like these makes me feel like I’m taking crazy pills. It legitimately makes me feel like I’m using it wrong. I do Android and iOS native development and some IoT, and it gives me really wrong code very often, to the point that I don’t see much difference between ChatGPT and GPT-4. But you say it’s close to just “write me a unix-like kernel”?

Personally I couldn't even get it to draw me a regular pentagon in CSS. It was happy to draw hexagons and call them pentagons. It was even happy to go back and fix its mistakes when I informed it that it was making hexagons, not pentagons.

It, of course, readily accepted that I was correct and it had indeed drawn a hexagon, but this time it'd be different.

This time, it'd draw a pentagon.

And... repeat.

It really struggles with arithmetic, so that's kind of a worst case problem for it though.

Sure, but understanding you need to have 5 vertices to make a pentagon isn't exactly high-brow.

Certainly not compared to making a Unix kernel from scratch.
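
(For the curious: the geometry the model kept fumbling is a few lines of trigonometry. This is an illustrative sketch, not the commenter's actual prompt or CSS, that emits the `clip-path` points for a regular pentagon:)

```python
# Illustrative only: the five vertices of a regular pentagon as a CSS
# clip-path polygon -- the shape the model reportedly kept getting wrong.
import math

def regular_polygon_clip_path(n: int = 5) -> str:
    """CSS clip-path for a regular n-gon inscribed in the element, apex up."""
    pts = []
    for k in range(n):
        theta = 2 * math.pi * k / n - math.pi / 2  # start at the top vertex
        x = 50 + 50 * math.cos(theta)
        y = 50 + 50 * math.sin(theta)
        pts.append(f"{x:.1f}% {y:.1f}%")
    return f"polygon({', '.join(pts)})"

print(regular_polygon_clip_path())  # five coordinate pairs, not six
```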

I don't think high-brow / low-brow is a useful framework for understanding these models. They are good at certain things and bad at others and those don't correspond neatly to what a human finds easy and hard.

It's not close, and I also see no big difference between GPT-3.5 and 4. Don't get hyped. I am sure it will eventually happen, but calling it close is very optimistic.

Not close as in "it can nearly write it correctly", but close as in "I believe within a small number of years, GPT-4-like tools will be able to write you a unix-like kernel from scratch and have it actually work, with no human input".

I think we already have most of the pieces in place:

* big language models that sometimes get the right answer.

* language models with the ability to write instructions for other language models (i.e. writing a project plan, then completing each item of the plan, then putting the results together).

* language models with the ability to use tools (i.e. 'run valgrind, tell the model what it says, and then the model will modify the code to fix the valgrind error')

* language models with the ability to summarize large things to small.

* language models with the ability to review existing work and see what needs changing to meet a goal, including chucking out work that isn't right/fit for purpose.

With all these pieces, it really seems that with enough compute/budget, we are awfully close...
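
The loop those pieces describe can be sketched as a toy. Nothing here is a real system: `call_model` is a stand-in for any chat-completion API and returns canned text so the sketch runs offline, and `run_checker` stands in for a compiler, sanitizer, or test run:

```python
# Toy sketch of the plan -> execute -> check -> fix loop described above.
# Both helpers are stubs; a real orchestrator would call an LLM API and
# actually run tools, feeding their output back into the next prompt.

def call_model(prompt: str) -> str:
    """Stand-in for an LLM call; returns a canned two-step plan."""
    return "step 1: write the module\nstep 2: test the module"

def run_checker(code: str) -> bool:
    """Stand-in for a tool run (compiler, valgrind, test suite)."""
    return True

def build(goal: str, max_fix_rounds: int = 3) -> list[str]:
    plan = call_model(f"Write a step-by-step plan to: {goal}")
    artifacts = []
    for step in plan.splitlines():
        code = call_model(f"Complete this step: {step}")
        for _ in range(max_fix_rounds):
            if run_checker(code):   # on failure, feed tool output back
                break
            code = call_model(f"Fix this code, the checker failed: {code}")
        artifacts.append(code)
    return artifacts
```

Whether a real model can meaningfully drive such a loop at kernel scale is, as the replies below argue, the open question.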

It could also be a case of Tesla's full self-driving that's perpetually 2 years away. Progress in AI isn't linear so you can't extrapolate based on historic data. We really don't know if we're just a small step away from AGI or if we'll be stuck with the current crop of LLMs, with only incremental improvements over the next decade.

That seems to also miss the intentionality that goes into some things in the kernel… I understand now that you mean once LLM feedback loops are improved. I guess fair enough there; no idea if that will work till we see it. However, I think the problem of a Unix-like kernel is a lot less trivial due to the human intentionality that goes into some choices, as well as bit-banging optimization.

> human intentionality that goes into some choices

Many choices are made at design time to make the right tradeoffs between complexity, speed, etc.

But with AI-designed things, complexity is no longer an issue as long as the AI understands it, and you no longer need to think too much about speed - just implement 100 different designs and pick the one which does best on a set of benchmarks also designed by the AI.

Current AIs can’t reason about trivial cognitive things a 2-year-old human can, let alone multi-million-line code bases.

This is a misguided conclusion, as they are entirely different systems. Don't compare apples and oranges like this or you'll be very disappointed.

Can any of these do any real reasoning? I feel this would just result in some Rube Goldberg machine producing terrible shit the big majority of the time, though it may be impressive in some edge cases when the star constellations are right.

I agree. You can also go to github.com/torvalds/linux, click Code -> Download ZIP, and have the Linux kernel. I’m not sure what having ChatGPT write it for you would give you. Maybe 10,000 hours of frustration after you realize how far from the mark you really are?

as a hobby osdever, this:

>I really don't see such a thing being far off. GPT-4 is already pretty good at writing small modules ("write a disk IO queue for my custom kernel in C").

is completely delusional.

I tend to agree, though perhaps with a much larger tolerance in the number of years "not far off" means. What may be interesting is seeing the emergence of hardware-specific bootloaders, kernels, and OSes, rather than using boot/Linux/etc with its support for all the different SoCs, etc.

Rust embedded code for specific hardware seems like a regression, but it would make for a remarkably smaller codebase for specific app deployments.

> allowing it to use debuggers, benchmarks, and sanitizers, allowing it to write its own tools, and then put modules together

That’s a huuuge "if", whether it can meaningfully reason about such things. GPT-4 has improved a lot over 3, but I really wouldn’t call its capabilities reasoning at all. We often under-appreciate human intelligence.

Isn't this akin to: "The cost of an iphone is $500" ? Sure, it's $500 in parts if you just quantify the final working result... but it ignores all prototyping, overhead, and manual work to link together code (e.g. assembling the phone)...

Maybe I'm misunderstanding just how good GPT-4 is at making entire repos of code... or exactly how this tool functions?

You aren't misunderstanding it. This is not a valid way to calculate the cost of your code.

It's even cheaper to get a monkey to type that code for you. Will probably cost a banana. Chances it will type the code you expect? Similar to GPT-4 for anything remotely non-trivial.

If a banana costs ~60¢ (depending where you get it!) and GPT-4 is 6¢ per 1k tokens… you’re better off using a monkey for everything > 10k tokens!

This seems to assume that GPT4 can actually generate the code of your repo. Good luck with that assumption.

Let's just all print $100 bills.. it can be done for under $5

How much money could a restaurant save by replacing the food with dirt?

It seems that the package calculates the lower bound of how much it would cost to write the same code with GPT-4. It does not take into consideration that even GPT-4 cannot write code in god mode and get it correct on the first try.

However, it gets interesting when you realise that if writing the code with an LLM is dirt cheap, then 1000 iterations of writing the same code with the guidance of a skilled software engineer would still be cheap and probably faster. I can imagine a world where whole engineering teams are replaced by just one engineer with a code-generating LLM.
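
The kind of lower bound the package computes can be sketched in a few lines. This is illustrative only: the real package presumably uses a proper tokenizer (e.g. tiktoken) and published pricing, while this sketch uses the rough ~4-characters-per-token heuristic, so the numbers are ballpark at best:

```python
# Minimal sketch of a "what would GPT-4 have cost" calculator: count tokens
# in a repo's source files, multiply by the per-token price. The 4-chars-
# per-token heuristic and the price below are assumptions, not the
# package's actual method.
from pathlib import Path

COMPLETION_USD_PER_1K = 0.06  # assumed GPT-4 completion price per 1k tokens

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English/code."""
    return max(1, len(text) // 4)

def lower_bound_cost(repo: Path, exts=(".py", ".c", ".h", ".rs")) -> float:
    """Lower bound: every source file emitted once as completion tokens,
    with zero prompting, retries, or conversation overhead."""
    total_tokens = sum(
        estimate_tokens(p.read_text(errors="ignore"))
        for p in repo.rglob("*") if p.suffix in exts
    )
    return total_tokens / 1000 * COMPLETION_USD_PER_1K
```

Everything the thread complains about (prompting, regeneration, iteration with an engineer) multiplies this number; the sketch only shows why it is a floor.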

It cost me $7.61 to build this library[0] using GPT-4, much more than the direct cost of tokens required to reimplement the library.

The majority of the cost was the large context windows being proxied back and forth to OpenAI from my local machine, rather than just the additional code with each new message. Also, I had no idea how to do what I was doing, so even if I could tell GPT-4 exactly what to build, I wasn't even sure it was possible!

[0] https://github.com/jumploops/magic

What’s the purpose of putting this on the package index instead of just a GitHub repository?

Better discoverability and so people could very easily install and run. Git adds more overhead.

Pip supports installing directly from git anyway.

pip install git+https://...

Well, that's a fun exercise, but it's not the volume of code that has value, it's how it's put together.

What, you don’t pay your devs per kilogram of code? ;)

Volume is measured in liters or m³.

I meant fluid kilograms, naturally.

As a developer I “wish” ChatGPT could take my job. I'm really hoping it happens sooner rather than later so I can get my product out into the world faster. I have tried ChatGPT and many other AI tools to do this, but all of these AI coding tools seem to have no understanding of the code for maintenance, when dealing with edge cases and things that are hard to explain to an AI. I wish we had these tools tomorrow, but it looks like they are 50 years off. In fact, I suspect using ChatGPT makes a lot of products take longer to build, as 99.999% of the work is in understanding the code for small upgrades and bug fixes, and these tools seem to produce a lot of complex and unmaintainable code.

A lot of conversation is needed to create that code. I guesstimate that the actual tokens are at least 100 times the ones in the repository, even if one can convince ChatGPT not to explain every choice it makes.

> convince ChatGPT not to explain every choice it makes

I assume this is a major issue, because it tends to be more talkative than my ex, which is a real feat.

The entire premise of this thing is wrong, too, because what if you need to regenerate something? Which, even with the mighty GPT-4, still happens a lot… It’s almost akin to asking a co-worker how fast he types and calculating how much of his salary it costs for him to generate code based on that, without any other factors. Also, as others pointed out, why is this even a package? It seems like the definition of bloat when this should have just been a blog post tool or something. It’s all completely unnecessary and seems to try to cash in on hype.

All your money, as you never would have reached a compliant program.

maybe with GPT-6 or so, we will see, but not with GPT-4

Nice - now please share the prompt to get it to write my code.

now, was this written ... by GPT-4?

You mean before, or after the bankruptcy? /s

Can anyone give me some estimates instead of defending your jobs as coders?

What? Nobody is defending their jobs here; they’re explaining how non-trivial this is. What is the point of your comment? If you wanted a pure token-to-money amount and are too lazy to calculate it yourself, use the tool in the link. That’s not how the real world, even with an LLM, will work. Let’s assume the LLM writes everything: even GPT-4 won’t get everything in one go. Requirements of software are hard to put into pure English… why write this snarky message? It adds nothing to the conversation.

These comments remind me of newspaper articles saying the internet would fail and horse breeders saying cars would fail. All want to deflect to how wrong this is, when it should be treated like the Drake Equation and can still give you data and ranges. With time and AI evolution, the estimates will get closer.

“It is difficult to get a man to understand something, when his salary depends on his not understanding it.” -Upton Sinclair

>when his salary depends on his not understanding it

"That don't make no sense!"

If you believe in the potential of chatbots to replace programmers, then the salary of every programmer depends on them not not understanding it.

Here's why they seem like a waste of time to me, even for writing bland emails and comments; what say you?


You misunderstood what so many people here wrote, with even more snark and a bad quote. It could and probably will replace my job, and I have no contingency plan in place, leaving me at great risk. That’s a given, it seems. What isn’t is giving an estimate of a dollar amount and a timeline, especially with any scientific rigor. The tool in the link is silly, and that is what everyone is talking about. Perhaps you should try to be less biased when reading the comments here.

>It could and probably will replace my job

People tell stories about employees at big companies whose entire job is to send a small report to someone once a week.

Even in cases like that, it's illogical to fear AI, because if it really was so simple to automate, it would've been done already.

Either the storytellers are clueless about the true nature of the job, or else there are defenses against redundancy that aren't obvious.

> Even in cases like that, it's illogical to fear AI, because if it really was so simple to automate, it would've been done already.

This hasn't been my experience. There is a TON of low hanging fruit for automating workloads which exist in almost every large company. While not all automations are easy, there's no shortage of easy work in automation. Almost every financial organization I've worked at has had some amount of manual processing of overnight batch jobs.

>This hasn't been my experience.

What you mean by "experience" is unclear to me.

I agree with you insofar as everyone with even a little experience at a large company knows there are a lot of apparently low hanging fruit possibilities for automation.

But do you mean you have seen things that seem simple to automate, or have you tried to do so and found out what happens?

There is a lot going on below the surface.

Yes, plenty are absolutely trivial to implement. Things like "I move the file from this folder where it's dropped off by over night batch processing to this other folder and then I kick off the processing job" or "I copy these values into our master excel spreadsheet and take the results from the calculations and put them in this other system."

Extremely basic and low hanging fruit is all over the place.

>Yes, plenty are absolutely trivial to implement.

The most technically trivial change has a litany of obstacles involving people.

Off the top of my head:

1) Finding out a process exists

2) Finding out who knows about a process

3) Being able to communicate with such a person

4) Motivating them or other stakeholders to make a change

5) Making the new process independent of specific employees

It's like asking what the square root of yellow is.

How cheap it could be would have to be calculated from the smallest input required to produce the smallest correct output, while minimizing how many times something needs to be reprompted.

No one knows the minimum prompting required.

It's like asking how much it costs to recreate Google (a popular Upwork fake project). However much money you want to pour into it, you are still not getting Google.

It's obviously an impossible question to answer with confidence, but I attempted to put some Fermi estimates here: https://twitter.com/swyx/status/1657603239251181570

And based on kinda conservative assumptions, AI coders start being competitive in 2028.

Nice guesstimate, but I have an issue with this: "One junior dev will improve to senior dev in 5 years"

That's true only for the HR department. Because code- and experience-wise, someone who has been programming for 5 years isn't that much more experienced than someone fresh from college. In 5 years you barely have time to get to know your editor inside out and start to have a grasp on your first tech stack.

I consider an experienced developer to be one who has seen a couple of frameworks and hype cycles come and go. Let's say 10 years of experience.

Also, AI has 0% productivity. Being able to churn out code without being able to reason about it in the grand scheme of things makes it worse than even a non-programmer in problem-solving ability. AI has no problem-solving ability. Ignoring this issue makes the whole thing moot, because it assumes that GPT is actually AGI.

> I consider someone to be an experienced developer one that has seen a couple frameworks and hype cycles come and go. Let's say 10 years of experience.

Sure, you draw the line where you feel comfortable, but in the US job market people get the senior title in 3-7 years; that's just a statement of fact rather than opinion.

I think that is his point. At most places I've worked at in my almost 15 years of experience, a senior title is not a title based on time but based on goal setting. Some places do that better than others for sure, but it's not just a magic you hit 7 years now you are a senior. A senior engineer in 3 years??? That's a SWE who is still taking baby steps in my experience.

This seems sloppy too. Is this based on anything scientifically rigorous? Where I live, even startups will deliberately take on less profit for a quarter to train new graduates who have never touched an IDE. The point of training a junior isn't to be immediately competitive; it's for them to take over your job as you move on to other roles.

Thank you!
