Instead of thinking about codegen as a replacement for a developer, what if we thought about it as an extension?
In my filter bubble most people would already agree with the statement that generative code models are an extension rather than a replacement. It's not that revelatory of a statement.
Saying they are a replacement without any evidence to suggest this is already starting to happen seems like it's taking a larger leap of faith.
The "Story of Pixar" documentary gets about 30 minutes in and apparently a Disney exec kicks out the nascent computer animation department with the comment "if computers can't make movies cheaper than animators we don't want them"
I think in most cases, yes. But sometimes it can be very disruptive to the workforce, too. I think of what AI is going to do to the commercial driver, factory / warehouse, and food service industries, for example. That’s a huge labor pool to reallocate.
Not to mention ChatGPT already has most of the skills needed to be a reasonable replacement for most of my customer service interactions. Not all, but somewhere around 70%, I'd say.
Not trying to be alarmist, but I feel we’re going to have to rethink labor in both large and small ways.
It's not going to do anything to the commercial driver, factory/warehouse, and food service industries.
Those are physical-labour industries, and dexterous robots are very, very expensive and not cost-competitive with humans.
Low-end, phone/chat-only customer service will be replaced very soon. But what's next in line is millions of white collar jobs, from medicine to law to accountancy to teaching.
You are like 2 years behind, the 'truck drivers will be replaced first' narrative has long been flipped on its head.
Physical labor industries are very much at risk, along with white collar jobs.
Example: Flippy 2 — a robot arm that works the fryer at fast-food restaurants — is already deployed at Chipotle, White Castle and Wing Zone. It's coming fast.
Because if Reddit is anything to go off of, downvoting is ultimately abused to mean "I don't agree with you" without putting in the effort to explain the disagreement. Or just being lazily critical, as you are.
Super fair. Longtime folks on HN learn to take every supposed insight with a giant grain of salt. When it comes to predicting what will change what, we’re all just wildly guessing.
I thought these were some good ideas. In my experience I vacillate between "omg the singularity is here" and "this actually isn't that good for X specific task".
I very much trust the output of LLMs to be well-designed, but I don't trust things to just work, especially if the system is complicated. I experimented a bit over the past few days: doing a task myself (building an interface in an existing project), using AI assist, and trying to get AI (GPT-4) to solve it completely. The solve-it-completely pathway failed and I found myself in an interminable loop. AI assist was a solid experience.
I have run 5 "test cases" where I would jump on a video call with some friends who are not software engineers but are very technically savvy, and give them a simple-ish task.
They were allowed to use ChatGPT as well as Google. None managed to do it within 4 hours, and that was with me giving them hints when the AI would inevitably get into a loop.
The task was to install Docker and, using Docker Compose, host a Træfik reverse proxy with self-signed SSL, as well as a web server in Rust. All the Rust app had to do was read the kernel version of the host machine and return it as HTML.
I then ran the same kind of test again, but this time all you had to do was install Docker and, with Docker Compose, run a Træfik reverse proxy with self-signed SSL plus two other containers, UptimeKuma and Audiobookshelf. No dice; nobody managed to do it.
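(For the curious: the Rust piece of the first test is genuinely small. A rough sketch of one possible answer, using only the standard library, serving on port 8080, and reading the kernel version from /proc/version. TLS is Træfik's job in this setup; real error handling is left as an exercise.)

```rust
// Minimal sketch: serve the host kernel version as HTML on port 8080.
use std::fs;
use std::io::{Read, Write};
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:8080")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        let mut buf = [0u8; 1024];
        let _ = stream.read(&mut buf); // drain the request; contents are ignored

        // On Linux, /proc/version holds the running kernel's version string.
        // Inside a container this is still the host's kernel, since containers
        // share the host kernel.
        let version = fs::read_to_string("/proc/version")
            .unwrap_or_else(|_| "unknown".to_string());

        let body = format!("<html><body><pre>{}</pre></body></html>", version.trim());
        let response = format!(
            "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: {}\r\n\r\n{}",
            body.len(),
            body
        );
        stream.write_all(response.as_bytes())?;
    }
    Ok(())
}
```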
Now that you mention it, they haven’t really answered my calls in weeks.
Joking aside, we actually had a lot of fun and they thought it was very eye opening, not only about AI, but also about what my work life is like.
The output of LLMs is ... rarely well-designed. Well-documented (with often incorrect documentation), well-formatted for sure, but profoundly not well-designed, unless you're asking for something so small that the design is trivial.
Even with GPT-4, if you ask it for anything interesting, it often produces code that not only won't work, but that couldn't possibly work without a major rewrite.
Not sure what you've been requesting if it's always been good output. Even when asking GPT-4 for docs I've had it hallucinate imaginary APIs and parameters more often than not.
Maybe the questions I ask are not as common? Given my experiences, though, I wouldn't recommend it to anyone for fear it gave them profoundly bad advice.
I've come to the conclusion that GPT produces code at the level of a new graduate, at best. In actually getting it to solve something more or less on its own, it did OK on simple tasks and failed as soon as requirements became a bit more nuanced or specific. It's also not very good at thinking outside the box; its solutions are all very clearly tied to its training data, meaning it struggled with anything that strayed too far into the abstract or unfamiliar.
However it’s been great at being my rubber duck and it’s been great as a tool for helping me eg write complex SQL queries — never without me being a key part of the loop, but as a tool to help me fill in gaps in my own skills or understanding. That is, it amplified my abilities. It was also pretty good at creating interesting metaphors for existing concepts, explaining terminology and even explaining bits of code I gave it.
My experience as well. Heavy GPT-4 use (for a variety of things). Great for boilerplate, great for retrieving well-known examples from documentation, saves a fair amount of time typing and googling, but often completely wrong (majorly and subtly) and anything non-trivial I have to do myself.
Great tool! Saves a ton of time! Not a dev replacement (yet)
Now, I am definitely doing much simpler things than you folks are, I would wager, but I have found that with a bit of back and forth you can get pretty good results that work with only a bit of revision. I have found that reminding it of the purpose or goals of whatever it is you're working on at the moment tends to make the output a bit more consistent.
> with a bit of back and forth you can get pretty good results that work with only a bit of revision
The problem for me is that the "back and forth" and "a bit of revision" steps very often end up taking more time than writing the code myself would have.
That's because you actually know what you're doing, haha.
In all seriousness, I am not a software engineer, and GPT has enabled me to build things in a couple of weeks that would have taken me months of effort otherwise.
I am sure an actual software engineer could have made those same tools in a day or two, but it's still incredible for my use case.
There are posts on X about how powerful GPT-4 is, and the videos are really impressive. But in my own experience it's only really good if you know what you're doing and can carefully guide it through taking a single step in a process. Anything more and the failure rate explodes upwards. I love using it as a copilot (GitHub Copilot Chat in VS Code is great). But it's so far from the "singularity" that I don't fear for my job as a programmer yet.
> developers, only 3% say that they highly trust the accuracy of these (AI) tools
I don't believe this at all, judging from how I've seen developers using them over the past few months. There seems to be a huge amount of trust in them. Although I'd say the trust is earned; it's shocking how often ChatGPT can completely nail a request.
I think they can be useful without saying you "highly trust" them. If you moderately trust them, you can still let them do grunt work that you'll know you have to look over before committing.
Use and trust are different things. I use Google Search, but I don't trust the first answer it gives me until I confirm it. I use Copilot, but I don't accept every suggestion. I don't trust them to be right, or even commonly right. Even the code-generation benchmarks for language models sit between 30-50% correctness for the absolute best models. Would you trust a lawyer who is right 30-50% of the time?
I'm quite sure that LLM-style things will be smart enough to generate things like complicated decision trees and state machines from human explanations, plus test suites, and to explain which cases are ambiguous and/or missing.
> I'm quite sure that LLM-style things will be smart enough to generate things like complicated decision trees and state machines from human explanations, plus test suites, and to explain which cases are ambiguous and/or missing.
I remember back in March, using ChatGPT to generate code, I was not impressed at all. I mean, the code was okay-ish, but nothing revolutionary.
Fast forward 3 months: I learned to use the tool, and now I use it all the time to generate code. There are some ways to use it where it really shines as a tool.
1) It works well for CLI tools. It knows thousands of Linux commands and can use them flawlessly most of the time. Any recombining of existing command-line tools is a breeze.
2) Try to use it with a language with as strict a compiler as possible. Rust is the most obvious modern candidate. Dynamically typed languages like Python and JavaScript are far from ideal and should be avoided. TypeScript is one more obvious language of choice.
3) Programming jargon, and knowledge about the libraries in the programming language, are very important. Being as specific as possible about libraries, maybe even modules and functions, makes all the difference.
Unfortunately the third point rules out any amateur trying to use code generation effectively.
Over the following days, I want to use GPT to create a notification system for HN comments: a daemon running in the background which downloads my comment page, saves all comments to a database, and for every reply sends a notification via notify-send with the username and the first 10 words of the reply. Maybe a subtle sound effect as well, like Gmail or Facebook.
Does a tool like that exist? I have no doubt GPT will excel at this; it's a not-that-difficult but still non-trivial task.
Lol, awesome fragmede. The program has some defects though.
The username whose comments it looks for is hardcoded into the code. The standard convention for CLI tools is to create an .rc file, like .hn_reply_notifierrc, and save the configuration there.
The way your program is configured, the next user will have to recompile the code with the new username. That's a big no-no!
It is so much fun to use GPT that way that I cannot resist generating my own program, and I will share the resulting code with you on GitHub, along with the GPT conversation.
Oh yeah, totally. Either from an RC file or as an argument on the command line. But for something I "wrote", in a language I'm totally unfamiliar with, in 2 hours or so, I'm floored. I'm a seasoned developer (in other languages) and how little I had to dig into Rust to get that far is mind-blowing.
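Something like this, probably (an untested sketch; the .hn_reply_notifierrc name is just the convention you proposed, and the key=value format is made up):

```rust
use std::env;
use std::fs;

// Resolve the HN username: a CLI argument wins; otherwise fall back to
// ~/.hn_reply_notifierrc containing a line like `username=fragmede`.
fn username() -> Option<String> {
    if let Some(arg) = env::args().nth(1) {
        return Some(arg);
    }
    let home = env::var("HOME").ok()?;
    let raw = fs::read_to_string(format!("{home}/.hn_reply_notifierrc")).ok()?;
    raw.lines()
        .find_map(|line| line.strip_prefix("username="))
        .map(|s| s.trim().to_string())
}

fn main() {
    match username() {
        Some(user) => println!("watching for replies to: {user}"),
        None => eprintln!("no username: pass one as an argument or set it in ~/.hn_reply_notifierrc"),
    }
}
```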
We've thought about this question a lot at Mito[1], where we're building a spreadsheet that code-gens Python code for you as you edit it. For us, it's been useful to decompose the question of "what code-gen is good for" into a few sub-questions that help us think about how generative AI approaches affect us:
1. Why is it necessary to generate code in the first place? Can you just skip to the "solution"?
2. Why is just writing the code by hand not the best solution?
3. So you do want to do code-gen: does it make sense to do it in a chat interface, or can we do better?
As a Figma user, I'd answer these in the following way:
> Why is it necessary to generate code in the first place?
Because mockups aren't your production website, and your production website is written in code. But maybe this is just for now?
I'm sure some high-up PM at Figma has this as their goal: mock up the website in Figma, it generates the code for a website (you don't see this code!), and then you can click deploy _so easily_. Who wants to bet that hosting services like Vercel etc. reach out to Figma once a week to try and pitch them...
In the meantime, while we have websites that don't fit neatly inside Figma constraints, while developers are easier to hire than good designers (in my experience), while no-code tools are continually thought of as limiting and a bad long-term solution -- Figma code export is good.
> Why is just writing the code by hand not the best solution?
For the majority of us full-stack devs who have written >0 CSS but are less than masters, I'll leave this as self-evident.
> So you do want to do code-gen: does it make sense to do it in a chat interface, or can we do better?
In the case of Figma, if they were a new startup with no existing product trying to "automate UI creation", v1 of their interface would probably be a "describe your website" box, and then we'll generate the code for it.
This would probably suck. What if you wanted to easily tweak the output? What if you had trouble describing what you wanted, but could draw it (OK, OpenAI vision might help on this one)? What if you had experience with existing design tools you could use to augment the AI? A chat interface is not the best interface for design work.
ChatGPT-style code generation is like v0.1. GitHub Copilot is an example of the next step: it's not just a chat interface, it's something a bit more integrated into an environment that makes sense in the context of the work you're doing. For design work, a canvas (literally! [2]) like Figma is well-suited as an environment for code-gen that can augment (and maybe one day replace) the programmers working on frontend. For tabular data work, we think a spreadsheet is the interface where users want to be, and the interface it makes sense to bring code-gen to.
Codegen like that can even extend across architecture tiers. For example, I can take a live database with a given schema, connect JetBrains Rider with an Entity Framework plugin (using ASP.NET and C# here as an example, though similar solutions exist in Java and other tech stacks), and generate a set of entities with mostly correct data types and relation mappings automatically: https://blog.jetbrains.com/dotnet/2022/01/31/entity-framewor...
Not only that, but OpenAPI/Swagger codegen is integrated into ASP.NET, so I can also end up with a web-based UI to test any APIs that I might make, should I opt to create controllers that use those entities.
While most of the codegen I've seen in an academic context has been more or less a mess (broken Eclipse-plugin-based tools), practical approaches like this are wonderful: for the more boring and boilerplate stuff, I can basically draw a few boxes and get bunches of SQL and C# code that I can then change and/or fine-tune as necessary, using the generated stuff as a basis (even though re-generating it would probably overwrite the changes).
LLMs feel like the logical next step: feed in a bunch of projects and language documentation, and you can query the LLM with various questions about how to do something. Even if it only sometimes sets you on the correct search path, instead of jumping around 15 different documentation pages, maybe only 5 will suffice now. I use ChatGPT fairly liberally for my personal projects, and while it's no silver bullet, it feels like a value-add to me, since there's a surprising amount of boilerplate out there for the boring problems that I solve, with every framework and language solving the same stuff in slightly different ways.
This article is so misleading... when someone talks about codegen, it usually means AOP, bytecode manipulation, compiler plugins, and the like. Not "AI tools".
What is even worse is that this isn't even a proper article, this is just a thinly concealed advertisement.
Oh, and clouds are now where other people's computers live, and not full of rain and lightning, both of which aren't friends to computers.
Jason isn't my friend anymore, but a data interchange format.
Neither is Kate, she's how I orchestrate my containers.
Ghosting doesn't involve poltergeists or Halloween, Zoom isn't a function on my camera, and fishing steals my data and doesn't get me delicious salmon.
Don't forget "literally" which people have rendered useless by using it to mean "figuratively". And even the dictionary allowed that misuse of the language. :(
Literally has always meant figuratively[0]. That is not a misuse of the language, that is how the language is used and has been used since at least 1769.
I don't know how you figure that. Up until the last 10 years, when someone said "I literally (did $thing)" it was a statement that they actually did whatever it was, regardless of how unlikely it might seem. That is the polar opposite of how people (especially the kids, who should get off my lawn) use it today. It is a misuse of language, plain and simple.
I've always seen it more as exaggeration/absurdity for the sake of humor than ignorant misuse of the word. Like in a cartoon when a character's eyes _literally_ pop out of their head. Of course, even if that's true, I'm sure the nuance would sometimes be lost in translation and help blur the definition in common vernacular just the same.
No, "literally" has been used to mean "figuratively" as a common English idiom for centuries, and there is plenty of documentary proof to that effect, including the specific year I mentioned (1769) being the first use of "literally" as "figuratively" attested to in print. This isn't something "the kids" came up with ten years ago, or that I just made up.
Here. Here's another article about it[0]. Like all language prescriptivists, you're simply wrong.
Ok bro. I don't know why you're being so hostile. But I'll just say that I disagree that your sources persuasively back up your argument. Hell, the second one is completely irrelevant. It's just trying to come up with an excuse to complain about (non-existent) sexism, and has no good argument whatsoever. Your first source is better, but I still don't agree it substantiates your claim.
Oh, and since we're apparently trading insults about linguistic views: like all linguistic descriptivists, you don't know what words mean. ;)
Yeah, I was really confused by the article at first. I understand that the term must have evolved, but I hadn’t heard/seen it used this way until now. I’m already skeptical of codegen in the more conventional sense, and definitely skeptical of AI code generation… so I guess now I can carry on as usual, with a slightly elevated “I’m probably not the target audience” filter?
The same — I've long been in the world of generated message headers for ROS, so "codegen" to me is that add_custom_command/add_custom_target dance you do with CMake to get it to run a non-compiler tool ahead of actual compilation.
The examples GP gave were java-specific, but the concept of codegen meaning "deterministic generation of code in a 'compile' phase" is universal—think of OpenAPI libraries, Go pre-generics, Babel, etc.
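To make it concrete in one more ecosystem, here's a minimal sketch of the same pattern as a Cargo build script: a build.rs that deterministically writes a source file before compilation (the file name and the generated constant are illustrative):

```rust
// build.rs — runs before compilation; Cargo's analog of the
// add_custom_command/add_custom_target step described above.
use std::{env, fs, path::Path};

fn main() {
    // OUT_DIR is provided by Cargo for generated artifacts.
    let out_dir = env::var("OUT_DIR").unwrap();
    let dest = Path::new(&out_dir).join("generated.rs");

    // Deterministic output: the same input always yields the same code.
    let code = format!(
        "pub const BUILT_FROM_PKG_VERSION: &str = {:?};\n",
        env::var("CARGO_PKG_VERSION").unwrap()
    );
    fs::write(&dest, code).unwrap();

    // Re-run only when the build script itself changes.
    println!("cargo:rerun-if-changed=build.rs");
}
```

The crate then pulls the file in with include!(concat!(env!("OUT_DIR"), "/generated.rs")): the same "generate, then compile" shape, just wired through Cargo instead of CMake.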
Yeah, sorry about that. If I was more familiar with native codegen tools I would have brought examples from them as well. Hopefully I still got my point across.
> This article is so misleading... when someone talks about codegen, it usually means AOP, bytecode manipulation, compiler plugins, and the like. Not "AI tools".
This is an odd take to me when LLMs are very, very good at generating code, which has got a lot of attention recently. Sure, it may be a different beast from what we've currently identified as "codegen", but it remains a descriptive term for the code-generating technology.
Words mean things, especially when communicating in professional or semi-professional settings using terminology with a longstanding meaning. To not respect that is poor communication, and irresponsible depending on the stakes.
Codegen's long history of macro-driven (or similarly scripted) source writing matters here. It has the very important property of producing consistent, deterministic output from a process that can be verified with very high confidence by rudimentary human inspection.
It's like calling autocomplete "codegen".
I truly love LLM-assisted coding. I would never call it codegen, and I think it can even be unethical to do so when the stakes are high, because it gives the output a veneer of trustworthiness that leads one to carelessly skip auditing it.
LLM-aided code writing should, IMO, be called "code assist", not "codegen".
I think this is more a matter of familiarity and habit: notice how there are several slightly different definitions of exactly what "codegen" means in this thread alone. It doesn't really have some super-specific meaning; it's not, I dunno, "lexical scoping". It's still just shorthand for "code generation", and it's not unreasonable to apply it to most automated processes that generate code.
I don't buy a nihilistic "does anything mean anything anymore?" approach.
I'm basing my evaluation on Copilot. It functions as a highly context-sensitive and very useful autocomplete, so the "autocomplete" label is a complete, accurate, and precise description for it.
It does not fill the same role as writing repeatable macros, batch scripts in part of the build process, token manipulation, or joining a tabular dataset with templates...all components associated with traditional code generation.
In the strictest, most pedantic sense, yes, LLMs create lines of code via their internal processes, and that could be called "code generation" at a technical level. But they serve different needs, with different techniques and different interfaces.
In terms of common parlance, no PM or manager who is not pants-on-head dumb is going to suggest replacing scripts that generate code 1:1 with LLM output, unless the terminology itself has confused them into thinking they fill the same role.
Poor communicators, whether because they are overly pedantic or undertrained, may make this mistake. But in a team environment, if "codegen" is considered a valid name for LLM output, an effective communicator is always going to need to clarify which they mean, because the tasks are not interchangeable in the least.
My argument is that it's not a precise technical term of art. There are many of those, but this isn't one. The evidence you can easily see in this thread and in pretty wide usage on the internet. Your argument is "this is nihilism", which is mostly vibes, and grumpy vibes at that.
Ummm.... Awful code that often looks right at first glance, maybe.
Maybe LLMs can generate the kind of code that's really shallow in its complexity, but for literally everything I would call interesting LLMs have produced hot garbage. From "it doesn't quite do what I want" to "it couldn't possibly work and it's extremely far from being sane," though it always looks reasonable.
What is that specific technical meaning? A quick Google search suggests definitions for the term are all over the map; Wikipedia aliases it to "code generation," which is a whole family of processes (including program synthesis and model-driven development).
I'm not seeing anything immediately obvious to exclude AI-synthesized code from the "codegen" label.
With regards to the article, this is in the first part of it:
Codegen is the process of generating code automatically, based on a defined set of rules or specifications. There’s a wide ecosystem of codegen tools, including:
- Simple code completion in an integrated development environment (IDE), like Microsoft’s IntelliSense feature
- Templates for repeating code patterns, like code snippets in Figma
- Visual programming and no-code tools, like Bubble
- Modern AI-based codegen systems, like GitHub Copilot and Replit Ghostwriter
And how would that hurt the argument? It's not royalty, where being offended in combination with power could be brought to bear. If you replace "AI vomit" with "the product of these fascinating achievements of the human mind and persistent experimentation" in a sentence, it changes nothing important about the rest of the sentence.
"vomit" is the problematic word in the context. The comment author puts their bias on display.
Why should I take any arguments from an author seriously when they layer their bias on?
They could have replaced "vomit" with "output" and had a better statement.
This is beside the fact that they claimed I was referring only to AI output as what I mean by code gen, when I clearly listed many types of code gen. Again, they demonstrate bias and poor argumentation skills.
> Why should I take any arguments from an author seriously when they layer their bias on?
Because they're not referring to private experiences, but to the shared world. It's kind of rich to talk about "argumentation skills" while dwelling on your personal need to take arguments from "an author" seriously or not. Who cares? Then don't take them seriously; it takes nothing away from them.
> I clearly listed many types of code gen
"Codegen is best for augmenting your design to development process, not automating it.", "AI-based code generation (codegen", "Instead of thinking about codegen as a replacement for a developer", "codegen can speed up your handoff workflow by making suggestions"
And so on. All throughout the article you use that word to mean one thing and one thing only. Or put differently, "Why should I take the arguments of someone seriously who doesn't even know the article they wrote?"
> That's exactly the problem - codegen means a lot of things. You just talk about codegen and expect people to understand that you mean the AI vomit.
I took the "you" as more general here. It's not your comment (that came afterwards) mentioning codegen that is under discussion, but the article, and it it uses it that word exactly that way, and only that way.
I think you have something backwards in your logic gates: you do not get to decide whether your argument is weak or strong.
I am now sure you're holding it wrong; you're trying to make an analogy that does not work.
In the first case, an opinion is given about the quality of LLM output. Here, you are stating a mathematical fact two ways. There is nothing to debate about 2+2 being 4.
> Source-code generation is the process of generating source code based on a description of the problem[9] or an ontological model such as a template and is accomplished with a programming tool such as a template processor or an integrated development environment (IDE). These tools allow the generation of source code through any of various means.
> Instead of thinking about codegen as a replacement for a developer, what if we thought about it as an extension?
Who would think otherwise? It seemed pretty clear from the beginning that anything that automatically generates code would act as an assistant rather than a replacement.
Entire page crashes with "Application error: a client-side exception has occurred (see the browser console for more information)." if HTML5 autoplay is disabled.