If the hardest part of programming is reading, understanding, and debugging other people’s code, our jobs are about to get a lot harder with a bunch of AIs running around spitting out 90% accurate code.
Nearly nobody who wants to use these neural nets knows how to do model risk management. These things are black boxes with edge cases that are badly understood and nearly impossible to inspect and test. At least code is logical and human readable (usually).
What often divides the excellent engineers from the poor ones is how well they can think about corner cases, and tests are mostly about writing down the corner cases in a sustainable way (vs half-assedly writing down half of the corner cases and writing 0.5% of tests as asserting a bug.)
The main problem I saw with DevOps and QA automation was that if people don't write code all day, having them write code that gates release of software to production does not result in good outcomes. Sooner or later developers have to inject some engineering practices.
If the engineers are running AI generated code, nobody knows how to do that job, and you will get a long string of permanently damaged brand names in the aftermath.
On the flip side, if the rest of the process becomes easy and in some sense standardized (the art/craft of code is replaced with back-prop), then maybe the complements like data cleaning, model bias/error measuring, and interpretability become “the work” that the high-paid scientists/engineers focus on?
I can particularly imagine a regulatory environment much more rigorous than what software has gotten away with thus far, for example strict requirements around certifying that your model doesn’t exhibit X, Y, or Z biases according to standard evaluation frameworks.
Not if the ladder gets pulled up behind us. If there's nowhere to train people with 0 years of experience to do this work effectively, then new developer creation will drop by 10x which means five years in we'll be missing 40% of the workforce.
> Correctness will go from a binary pass/fail to a probability
Excellent point. The pervasiveness of neural nets will require engineers (and really, everyone) to start thinking more probabilistically and establish acceptability thresholds instead of certainty. It's the way of the future.
That seems like a regression in many regards, especially given how much of this seems like a solution looking for a problem it can solve rather than the other way around.
Agreed. We may imagine ourselves very smart but in my experience most people do not understand probability and statistics well.
Is 90% accuracy good enough? Is 95%? 99%? 99.9%? No matter the answer, you have to tolerate errors. Now your stakeholders have to tolerate errors. Are they going to accept errors just because "Software 2.0" is here and that's what we all have to live with? Nope.
All software is riddled with errors and for most purposes that's fine. Any developer, stakeholder, whatever, who thinks otherwise is living in a parallel universe.
It may be possible in the future to have "for all practical purposes flawless" software, which might make sense for select special applications. That would be a new thing though, rather than something we have and could lose due to adopting AI development.
> All software is riddled with errors and for most purposes that's fine.
This is typically not that true when it comes to correctness. Most software does the correct thing in the eyes of the user, nearly 100% of the time. And when it doesn't, the bug gets fixed and that edge case is corrected for every other user going forward.
AI-generated software, from what I've seen, has a wide range of errors in correctness (along with all the other errors that you mentioned all software having, which is true). Like it literally just does the wrong thing given what the user is expecting it to do. The path toward iteratively improving and getting it to an acceptable level of correctness for any given application might be there, but so far I have not seen it.
That's an interesting difference but I'm not convinced it's valid. Perhaps you can explain what you mean in more detail?
Let's say the software is good enough if it does the right thing 99.9% of the time it is used. I take it, you're saying that if an AI starts modifying it and only writes correct code in 99.9% of cases (yes, current AI is not even close, but it will improve), that makes it worse because the software might start failing completely. However, if you have proper tests and release management, such obvious flaws will quickly be detected and fixed or rolled back. For most applications that seems pretty much equivalent to what we have now.
The other case is software that is completely AI generated and cannot reasonably be modified by humans anymore. In that case, again, you have tests and a sane deployment strategy that mitigates failures to a sufficient degree depending on application.
So the only issue is when you start building completely AI-generated software and fail to ever meet requirements, or pass the human-written test cases? Users at least will never be impacted by that. Even now, many human-written software projects never get to the stage where they can be used. Is this really a problem, especially if the attempt at AI generation of software is cheap?
"It will improve" being the big "if". Improve, sure, but improve to a point where we can trust it to do these things with the level of correctness actually required? Unclear so far. GPT-3, the state of the art, can't be trusted to answer basic questions correctly (yet).
I also don't personally buy into the other notions in these comments that "the future is probabilistic software". I think that's wishful thinking outside of some specific domains and an attempt to bend our actual requirements to meet the capabilities of AI software, rather than the opposite.
> pass the human written test cases
I'm not super sold on this idea either. It seems reasonably possible that writing the test cases to a level of specification necessary to ensure that correctness we're after is just as much effort as just writing the code.
The way I see it is as expanding the high-level-low-level spectrum of programing tools towards the high end. In very large-scale projects that have many levels of complexity, I would argue "probabilistic software" is already a reality. You don't even try to fix all the bugs or edge cases. You try to minimize the impact of failures while accepting that there will always be failures. Building in logic to fail more gracefully is a big part of that and cost/benefit of such efforts is a probabilistic question, even if it might rarely be framed as such.
It's usually much easier (and never harder) to specify what needs to be done than how it should be done. Whether you can trust the result (enough) depends on the application. AI-driven development will be applied first where errors and failures are least harmful and advance from there. It might take a long time until the degree of correctness improves enough and trust is built around it. After all, some countries' railroad networks still don't use computers but instead have a human map out new routes and schedules on paper to make sure trains don't crash. Nevertheless, the speed of AI development has exceeded (at least my) expectations time and time again in recent years.
What might happen is that we eventually get a lot of software that fails more often than now but is a lot cheaper. That would still mean that the new methods are widely adopted. People accept software errors as a part of life already, so even if they get more frequent in less critical applications, we will adapt.
What is really dangerous is when the various software components get too fast and complex to understand or control and develop pathological feedback loops in situations that cause real trouble. This kind of thing is a continuum of badness which tops out at "AI taking over and wiping out humanity". Given market incentives driving adoption, which I anticipate to be strong, it's hard to imagine how such risks might be mitigated.
> You don't even try to fix all the bugs or edge cases. You try to minimize the impact of failures while accepting that there will always be failures
Yes but I'll point you back at my original comment about correctness. I've never been on a team that shipped code we knew would do the wrong thing. I ship code with known failure points all the time. But when the code runs to completion, I'm pretty darn sure it's doing the correct thing and we try very, very hard to make sure of that. With AI I am seeing that they can't discern between issues of correctness and issues of failure or availability. It's just like 95% success across all of those spectrums.
> What is really dangerous is when the various software components get too fast and complex to understand or control and develop pathological feedback loops in situations that cause real trouble.
Yea I agree, that's somewhat scary to think about.
> This is typically not that true when it comes to correctness. Most software does the correct thing in the eyes of the user, nearly 100% of the time. And when it doesn't, the bug gets fixed and that edge case is corrected for every other user going forward.
You're confusing pleasantness with correctness. Pleasantness is the property of pleasing the user, or more often the software's owner. Correctness is the property of conforming to a specification. Since most software written has no specification, correctness is undefined. Evidently this works adequately well in the marketplace.
There's a difference between errors in UI/reliability/performance/etc and errors in business logic. When there's an error in the business logic, then heads roll.
The entire field of Service Reliability is dedicated to finding the exact boundaries of acceptable errors, and defining the response function when those boundaries are crossed.
I would assume this actually gels very nicely with neural nets, since they're constantly optimizing for fitness. Hell, in theory you could bake your SLAs/SLIs into the models to self-correct. Give the model direct feedback that it's unfit?
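Something like the following sketch, maybe: an error budget derived from the SLO gates whether a model gets promoted or gets retrained on its failures. Everything here (`evaluate_model`, `passes`, `retrain`) is a hypothetical placeholder, not a real API.

    # Hypothetical sketch: gate model promotion on an SLO-style error budget.
    SLO_SUCCESS_RATE = 0.999                 # e.g. "99.9% of requests handled correctly"
    ERROR_BUDGET = 1.0 - SLO_SUCCESS_RATE

    def promote_or_retrain(model, eval_cases):
        # evaluate_model is a placeholder returning the observed success rate
        observed_error = 1.0 - evaluate_model(model, eval_cases)
        if observed_error <= ERROR_BUDGET:
            return model                      # within budget: ship it
        # out of budget: the failing cases become the feedback signal
        failures = [case for case in eval_cases if not passes(model, case)]
        return retrain(model, extra_examples=failures)   # placeholder retraining call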
I'd argue that it's a solution to the problem that programmers are expensive and only a fraction can be counted on to produce 100% reliable code anyway.
While it's possible for a highly skilled, highly professional developer to both write code that solves a given problem 100% correctly and write tests that prove it does so over the entire input domain, in practice most developers fall short on both counts. Every time you interact with a date or phone number field that chastises you for your use or non-use of punctuation, you know this.
So, for many use cases, it's possible imperfect programmers will be replaced with neural networks that are 95% accurate, perhaps with a differently-trained one checking the work of the first one.
I've been looking for an opportunity to try out a pattern where the two approaches improve each other:
1. Generate a 95% accurate model
2. Use it to generate test cases
3. Code the thing, with the help of the cases
4. Manually remove cases in the 5%
I imagine steps 1 and 2 being completed by a product owner and 3 and 4 being completed by a software engineer.
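Steps 1 and 2 might look roughly like this sketch, where the model's behaviour gets dumped into candidate test cases for a human to curate. The model object, its `predict` method, and the CSV handoff are all hypothetical, just to make the idea concrete.

    # Hypothetical sketch of steps 1-2: turn a ~95%-accurate model's behaviour
    # into candidate test cases; a human then deletes the wrong ones (step 4)
    # before the remainder becomes a regression suite for the hand-written code.
    import csv
    import random

    def dump_candidate_cases(model, sample_inputs, path="candidate_cases.csv"):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["input", "expected_output", "keep?"])
            for x in random.sample(sample_inputs, k=min(200, len(sample_inputs))):
                writer.writerow([x, model.predict(x), ""])  # reviewer fills in "keep?"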
We're so horrifically bad at communicating a requirement's intent, I wonder what would happen if we tried to use AI to communicate them via their extent instead.
I'm not familiar enough with AI workflows to weigh in on how easy or hard it would be. I've just noticed that it's much easier to describe why something is not what you want than describing what you want.
So if AI can give us a mockup that's workable enough to skip the first few iterations of "no that's not what I want" and get right to the part where the engineer is asking questions about the edge cases that weren't explicit in the requirements... That's a win.
I imagine you'd still want to have it in a box of some sort re: creating infra. Like you give it a very small cluster and probably make the stack decisions "write me a postgres schema for... write a fastapi API for the schema... write a react UI for the API... write me a k8s operator that up/down's the above components... Workshop the idea with other product people...
...and only then involve the engineer like: "make this AI-generated house of cards into a fortress".
Less of a regression than you might think. How many software systems (outside of very well unit tested components) really have strong correctness proofs? It’s an almost futile task in modern software engineering with all those distributed systems everywhere
It seems to me that current AI techniques are more about reducing programmer workload than actually doing something that isn't possible with traditional methods.
> Our software has a 99% chance to calculate your taxes correctly! And only a 1% chance of failure in which it's your fault and it's you that's committing tax fraud
Considering ChatGPT spits out inaccurate information all the time, I think "our software has a 99% chance of you going to jail for tax fraud" is more accurate.
The problem really is that ChatGPT writes things based on what is most likely given what it wrote before and what the input is, so what are the possible error scenarios?
1. Making a mistake in this part is really common, so ChatGPT makes the common mistake.
2. You have an uncommon situation at play; ChatGPT ignores it and writes things that get you into trouble, or that cause you to pay more than you should.
Also, the longer it goes on writing, the more likely that what it writes doesn't hang together with what it wrote earlier. When a human lies, they at least try to make their lies follow a sensible pattern. ChatGPT would be likely to get you flagged for audits because you can't be sure that what it wrote on page 1 jibes with what it writes on page 3.
If they were broken, they were deterministically broken; unless it's really poorly written, a calculator won't randomize its answers to the same inputs.
For some use cases, sure. We go through painstaking efforts to ensure things like correctness, consistency, and idempotency for a reason though. Most things we want to be deterministic, and when something's not deterministic we freak out and fix it ASAP (including waking people up in the middle of the night to do so)
Yea, aka how engineers working on safety critical or high reliability stuff already have to think.
Assume your software has a probability to fail or have bugs or gets hit by bit flips or unreliable hardware. There’s a whole field for dealing with those kind of things that typical web devs haven’t had to worry about as much.
These are not that type of engineer. Not the ones doing triple-redundant computing on three different processors with a summation and voting process, even using different programming languages and compilers because the compiler can also have bugs. These are the engineers running self-driving beta neural nets on public roads...
I use a process called sample-and-vote when having LLMs return executable solutions. I imagine we will see this kind of design a lot with probabilistic computing.
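For the record, a minimal sketch of what I mean by sample-and-vote; `sample_solution` and `run_candidate` are stand-ins for however you call the LLM and execute its output in a sandbox.

    # Sample-and-vote sketch: draw several candidate solutions, execute each on
    # the same input, and keep the answer the majority agrees on.
    from collections import Counter

    def sample_and_vote(prompt, test_input, n=5):
        outputs = []
        for _ in range(n):
            candidate = sample_solution(prompt)                       # placeholder LLM call
            try:
                outputs.append(run_candidate(candidate, test_input))  # placeholder sandboxed exec
            except Exception:
                continue                                              # failed candidates get no vote
        if not outputs:
            raise RuntimeError("no candidate produced a result")
        answer, votes = Counter(outputs).most_common(1)[0]
        return answer, votes / n                                      # result plus a rough confidence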
More like we're 98.7 % certain that the screen layout looks ok for all device types and languages. The question is how hard is it to fix the problematic cases without turning the whole thing upside down when you find out the original solution is not enough.
Engineer brains are just a type of neural network too, which also have a probability of putting out bad code. Space exploration budgets run into the hundreds of millions of dollars each and recruit the brightest people on the planet to design and program software, yet there is a long list of projects that failed due to code errors. Mars probes have failed to reach the planet, orbited too low and burned up, crashed into the surface, or landed successfully but later overwrote critical memory due to a flawed software update.
The AI doesn't have to be perfect, but only offer a lower error rate than humans.
Engineer brains are not a neural net. It's only our lack of knowledge of how the brain actually works that leads us to anthropomorphize matrices and weighted graphs :-)
Reading a bit about the brain will quickly help dismiss those analogies. I suggest these two as good starting points:
- Lange Clinical Neurology - 11th Edition
- Bradley's Neurology in Clinical Practice, 8th Edition
That doesn't sound like progress. Maybe for CRUD apps it will be okay. I have a hard time imagining it will be acceptable for financial or critical systems.
You’ll rent a self-driving car from a ride sharing app, they’ll make sure that the expected cost of fines and lawsuits will be significantly less than the expected revenue from providing rides. Although they can help push down the former value by making sure they operate from a favorable jurisdiction.
That assumes people even care what the statistics are. Many people just go with their emotional response based on a political outlook when deciding what course of action to take.
And someone will decide that a 0.0001% probability of an airplane crash caused by a bug is acceptable? Maybe 0.0000001% is more reasonable? In any case, how do you accurately determine the probability of a rare failure of a black box without doing many real experiments with real inputs?
That's how safety critical devices are already built, so yes. We have standardized probabilities of failure (e.g. SIL [0]) from the unexpected, because mitigating 100% of risk is somewhere between impractical and impossible.
From a quick reading of the wiki, the associated methodology seems rather limited:
"System complexity, particularly in software systems, making SIL estimation difficult to impossible"
"The requirements of these schemes can be met either by establishing a rigorous development process, or by establishing that the device has sufficient operating history to argue that it has been proven in use."
You could prove that normal code satisfies some specs, but you can't do that with neural nets unless the number of possible inputs is tiny. So, the only way to establish that the black box neural net meets some SIL target is through "sufficient operating history".
To clarify, I wasn't offering SIL up as an example of how we should validate ML systems, but instead to demonstrate that "software 1.0" systems are already designed the way GP is questioning. Best practices for applying integrity level concepts to ML is still a topic of active debate right now.
We already know what the post mortems look like. AI black boxes will look a lot like declarative programming black boxes. We don't really know how the code runs, we just ask it nicely to do what we want and then stare at the config files and the docs if it doesn't.
Low code and AI are going to have many of the same failure modes. Until someone combines them and then they'll have exactly the same failure modes.
Languages with very high degrees of static analysis are basically making you write the tests in the code. We have people who think they can keep extending that until there's barely any code, but they haven't been proven right or wrong yet.
Static analysis creates proofs, which is fundamentally different from tests. Merely testing AI output will always be inferior on principle than proving code to be correct. One would have expected AI to be better than humans at proving code correct (instead of just testing it for certain inputs), but LLMs currently fall short of that.
> if you have the proof, someone will write an AI to pass it.
That doesn’t make any sense. What passes the proof is the program code. You have the code, and then you construct a formal proof that the code is correct, similar to how a mathematician proves that some theorem is correct. The code is a prerequisite for the proof. When you can construct a proof for the code, you’re done.
This proof-construction process is what AIs currently aren’t good at, because it requires logical precision, and probability isn’t sufficient. They can generate code, but they can’t construct the formal proof that the code is correct (and it often isn’t).
> There's still a lot of very, very sophisticated work that goes into locking down requirements that tightly.
What’s true is that you need to know what you want to prove about the code, and that isn’t always easy.
I’ve worked on a lot of codebases and your “usually” is highly debatable. I don’t think the difference in understandability will be as big as people think as tooling for ML evolves.
Something like Godbolt for neural networks? Can’t be that far away
I feel like a broken record. I’ve been saying the same thing since this AI frenzy ramped up in the last year or so. I won’t restate your complaints, but I share them.
I’d like to see more work done to incorporate all the advances AI has brought us into our traditional software. Using the example from the article, if databases can be 70% faster and use 10x less memory by leveraging a neural net, how? If we can figure out the how, we can understand it and incorporate it in other areas. A great deal of success in various fields draws on inspiration from other fields, for example biomimicry. We even come up with mental models in areas like computer science that trivialize complex topics to simple objects a child could understand (trees, stacks, etc.). We would benefit immensely from learning from neural networks, but instead we have decided to largely ignore the how and see what they can do. Both are important, but one is severely lacking.
I once saw someone mutate code randomly until it worked. Both a boolean flag and an if statement were wrong, and by luck they produced the correct result. I almost wondered whether it was an AI or something trying to fix the code.
If AI doesn't care about the process and only cares about results, this kind of thing is going to happen everywhere. And from that moment on, no code is understandable to humans.
I recently read a thread on Mastodon[1] suggesting that "confabulate" is more correct than "hallucinate". In psychology and neurology it means to fabricate imaginary experiences or details as compensation for loss of memory.
Maybe don't approve code you don't understand? If people just let code they don't understand into products and it passes PRs and tests, they probably would have written dumb code in the first place. I see essentially zero risk in using these tools. Dysfunctional teams will write shitty code anyway; whether it comes from AI doesn't matter.
100 lines of poorly written code (by AI or humans) added to a 1M line codebase is easy to spot and fix by the dedicated humans who have that whole system mapped out in their heads.
Having AI write a 90% accurate 1M line (or "parameter") codebase all at once, (which seems to be the expectation here), is the "risk" you're overlooking. No human will be able to know where to start debugging that. At least not yet. But will yet come before incredibly dangerous amounts of AI written code is pushed into critical systems everywhere, by naively optimi$tic opportuni$t$?
It reminds me of the "code generator" phase we went through a decade or so ago. At least then we could see the "code generator" code and adjust it when it produced dreck.
I'll also note that the next hardest part of programming is troubleshooting "in production" whether a Web application, in an embedded device, or running on someone's machine. Is the "AI" going to help there? Are we going to even be able to fix those problems when doing so could make the code we don't understand fail in another way or hit the wrong side of the performance tradeoff the "optimized code" entailed?
My dad was worried about whether I should go into a CS degree because there was a code generation cycle going on at the time and people thought the computers would be programming themselves. We've had a few since, and my skills are more valuable than ever.
This particular cycle seems likely to be really destructive. Software is way more prevalent in nearly every aspect of our lives than it used to be, and all of the large organizations that don't really understand the problems that are going to compound invisibly are going to turn their products into unintelligible giant heaping piles of garbage very quickly if they're too enthusiastic. Large codebases almost always turn into giant heaping piles of garbage, so that's nothing new, but you can usually get someone to dig through and salvage things from the garbage pile when it goes bad.
This article is from 2017, and I don't know about you, but I increasingly feel like software is more and more broken year after year. I think that has a lot to do with "delegating complexity" and building things without really understanding what they're doing. I think the correlation between human understanding of the fine details and underlying logic and desired outcome and software quality is pretty tight. That doesn't mean "software 2.0" neural net stuff doesn't fit in, you just need a human to plug it in right that really understands its benefits and its limitations.
The author mentions the downsides, but I think they underestimate them. If you lean on AI too heavily and don't ever translate to a traditional language, you've basically liquified your logic/there's no garbage dump to salvage from when things go bad and zero understanding of implicit context. If you use it to generate ostensibly human readable code you get "documentation" with no guarantee of accuracy, which makes it worse than having no documentation (depending on how high the error bars are). While that's not a new problem either, if it's autogenerated that means it's easy to create way more of it than human generated code, which means it'll probably be an ever larger portion of what gets sucked up into later AI models. If they become too self referential they'll become increasingly detached from human judgement about whether the code is doing what it should and error bars will grow.
I'm still convinced these things are virtually all going to end up in a fancy autocomplete-suggestion and compression niche after a lot of pain. But that's still a big deal/I don't think the limitations of these mean they don't have a big future place. The sheer number of things you can have autosuggest for now with these AI models is amazing, and that expansion is boosting productivity and creating a large number of new products that are going to become essential tools. That being said, every time this type of thread comes up I'm like "woah woah woah, pump the brakes, these things have no understanding of what they're doing. You can't just stop thinking about stuff and let a machine do it, bad bad bad idea."
The key word is "If". Since we don't generally read other peoples code, we tend to rewrite the whole damn thing every 10-15 years (sometimes 4 years).
The reality is it all comes down to testing. If S1.0 has a unit test that says "Person should not get financial help", S2.0 should also have a unit test that says "Person should not get financial help", and work the same.
Of course my unit test name is designed to enrage, but let's be honest, we're writing code that makes these decisions.
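In fact the test could literally be the same artifact for both, assuming the two versions sit behind the same decision function. All names below are made up for illustration.

    # Hypothetical pytest sketch: one behavioural test gates both the
    # hand-written (1.0) and learned (2.0) implementations of the decision.
    import pytest
    from eligibility_v1 import decide as decide_v1   # hand-written rules (made-up module)
    from eligibility_v2 import decide as decide_v2   # learned model behind the same interface

    @pytest.mark.parametrize("decide", [decide_v1, decide_v2])
    def test_person_over_income_cap_gets_no_financial_help(decide):
        person = {"income": 250_000, "dependents": 0}
        assert decide(person) == "no_financial_help"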
I made a passing joke the other week that we're turning the act of coding into a bureaucracy. No longer will you just be able to write code, you'll instead have to navigate whatever byzantine logic maze the AI has constructed for itself.
Soon we'll have camps advocating various rhetorical paradigms to program AIs. Instead of imperative versus declarative programming, we'll have effusive versus abusive prompt engineering:
"The best way to program is to be really nice to the AI, and tell it how much you appreciate it and it will come up with the best solution itself."
"No! The best paradigm is to berate the AI and to beat it into submission to get exactly what you want!"
The more I hear people talk about programming in the future, the more it sounds like that's really where some people want to take us. I'm not excited.
It's been jokingly observed many times how programming is similar to spell-casting, but now it seems we are really going in this direction, just in a more demonological style, where the summoned demon misinterprets your command and you burn in the fire. You can even imagine some half-thought-out security measures being thrown in to tick the "you must address the AI by its true name to make it obey you" checkbox.
Why would the AIs spit out code? I think the point of the article is that they'll spit out black boxes full of weights that can be used to compute anything ala the universal approximation theorem. You'll train a neural net for a given task and verify the results. Producing code seems to be pandering to our ego.
Because code is faster and more correct than a neural net, if written reasonably. With code you don't need several GPUs to calculate a sum over several million records in a database.
> Because code is faster and more correct than a neural net if written reasonably.
But, as the article points out, at least for a considerable set of problem domains, it's not.
My general rule of thumb is that anything that has a relatively small, finite, discrete set of inputs and outputs is better suited to "Software 1.0" (e.g. coding a calculator). But there are a huge number of domains, some highlighted in the article, which usually have to do with an infinite possible number of inputs and outputs, where human written code is not faster or more correct than a neural network.
Code can also be debugged and fixed. AI needs to be retrained and retested - probably multiple times. Each retraining may cause a regression in any area.
A reasonable question. My answer is that a lot of state in computing is digital and discrete, e.g. off, booting, loaded, running, paused, error. While this isn't an insurmountable hurdle, I would suggest the floating-point architecture of present AI would benefit from incorporating more discrete state.
Similarly, I don't know whether AI weights are all computed in virtual parallel, but if every node is computed simultaneously, that will be less efficient than the von Neumann model, in which a program counter (PC) acts like a "cursor" hopping around and updating the state of the discrete code model at points throughout it. E.g. a video game with a controller and various sprites will have objects that update at various rates, and the player moves with different code underwater than in the air or on land, so different parts of the model would execute.
Because most code involves teams and different stages of development, not just a single approximation of a good enough result. And what happens when you need to make tweaks? The UI isn't quite right or a business rule needs to be added.
Well maybe instead we'll spend our time honing the spec and doing lots of testing instead of mechanically writing and reviewing code. I see that as potentially a massive net positive for software quality.
The fatal flaw in your sarcasm is that specs are declarative but those languages are imperative. Those languages also have more syntax than is strictly necessary, which limits the ability of domain experts to contribute to the spec without the assistance of a developer.
I have worked with systems written primarily in SQL, a declarative language, and I can't say domain experts could have contributed any more or less than when I have worked with systems written in Ruby or Java.
There would still be the point of requiring knowledge of an arcane syntax that can take months to understand and years to master. It will almost always be more efficient and productive to specify programs in plain English and then add additional specifications to clarify ambiguity when it's observed, rather than trying to write specs in a completely unambiguous language.
Imagine if software with large inherent risk was developed using formal methods, and the massive remainder of software was developed using rapid development methods. It's almost like there are different tools, and they can be applied to different jobs to cause improved outcomes based on the overlap between the tool's properties and the job's properties. What a radical concept!
> Imagine if software with large inherent risk was developed using formal methods, and the massive remainder of software was developed using rapid development methods.
Imagine if we could tell the difference between use-cases with large inherent risk and the massive remainder of use-cases and avoid using software designed for low-risk situations in high-risk applications.
Writing the code with a few checks here or there is almost always sufficient. But yes, if you want to verify the AI's output you better write lots and lots of acceptance tests. Easily pages of tests for each line of code. If the problem is large, it can easily get into millions of lines of test for each line of code.
Formally verifying it will easily take more computing power than training the AI in the first place, so I don't count that one as viable.
The point is that if you are trying to replace millions of developers, you will obviously require something on the order of quadrillions of spec writers. Or, rather, this is probably still an underestimation.
And yes, it is a very obvious point, and that people keep missing that point on this site is unsettling. (Also, yes, this can be trivially circumvented if you just let those people program, instead of only do verification.)
You can, also obviously, replace millions of average developers with (way more) millions of extremely competent spec writers if they can use formal methods. Those will require way more computing power than can ever exist on Earth to do their work, but they can mathematically get there.
> The point is that if you are trying to replace millions of developers, you will obviously require something on the order of quadrillions of spec writers.
Why do you think you need orders of magnitude more spec writers than coders rather than the other way around?
That is an intrinsic feature of software verification. You either have a formal system or you have an exponentially growing amount of corner cases to verify.
We usually call those "tests", but some people do really like non-formal linter rules too. (Personally, I abhor checking rules with high false positive odds.)
AFAIK, those are the only ones in common use, but unlike formal ones, non-formal checks tend to come in a multitude of widely different types. So I wouldn't be surprised if people have invented many more.
I think you're underestimating the number of spec writers. If that ever happens, every company with a custom business stack will need at least a few. That's certainly a few million jobs worldwide.
Until your model isn't good enough to generate good code from the best specs you can get.
Gonna be really fun when the first financial companies start trying to generate billing software from specs and end up irrevocably blowing up people's entire bank accounts because of lax regulation and greedy shareholders. Not to mention that any resources you trim from the developer side of things you'll have to at least quintuple in QA.
If the tool itself is 90% accurate, then conditioned on the events that somebody reviews the output and it passes tests and PRs, it can get to 99.9999% accurate very quickly.
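The arithmetic only works out under strong (and generous) independence assumptions, but as an illustration:

    # Illustrative arithmetic only: assume a 10% raw error rate and three
    # independent filters (review, test suite, PR/CI), each catching 95% of bad changes.
    raw_error = 0.10
    catch_rates = [0.95, 0.95, 0.95]
    residual = raw_error
    for c in catch_rates:
        residual *= (1 - c)
    print(residual)   # 1.25e-05, i.e. ~99.999% accurate; 99.9999% needs better or more filters

In practice review and tests are far from independent, so the real number is worse, but that's the shape of the argument.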
Not just inaccurate, but also code that is undesirable in several other ways of varying severity that a human is less likely to produce at all, or more likely to reject if it had originated from any other source.
The fault lies with humans treating these programming AIs like the greenest of junior bootcamp devs would treat the most accomplished senior engineer.
This is very true. But could the program itself just be a normal program, but written by a neural network? I guess this is kinda what GPT 3 can do?
One way or another, I want to be able to read the source. Or am I missing the point? Maybe future software will be a big neural network blackbox that we verify purely through tests?
The only thing I'd value out of the current crop of AIs is whether they could decipher and contextualize other people's code to essentially create comments of context in codebases. That would be worth it. I think the general ecosystem would greatly benefit from such a thing
I guess I should start researching the testing framework ecosystem. Frameworks that make it easy and low-friction to represent desired behavior and identify+test edge cases will be key to using copilot/AI-code-tools effectively.
So, what are best testing frameworks people have worked with?
To me "property based testing" would come to mind, where you would define properties of the input and output of a method and the library runs random test cases for you.
The most prominent implementation would be QuickCheck[0], which is written in Haskell, but there are re-implementations of it in almost every language.
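In Python, the best-known QuickCheck descendant is Hypothesis; here's a minimal example of the idea, with `my_sort` standing in for whatever implementation (human- or AI-written) is under test.

    # Minimal property-based test with Hypothesis (a Python re-implementation
    # of the QuickCheck idea). `my_sort` is a placeholder for the code under test.
    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()))
    def test_sort_properties(xs):
        result = my_sort(xs)
        assert result == sorted(xs)        # matches a trusted oracle
        assert len(result) == len(xs)      # nothing dropped or invented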
90% accuracy is lousy, especially for anything that isn't a one-off interaction, such as with customers or developers. That's a ~35% chance of a failure after 4 interactions.
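(Assuming independent interactions, that's:)

    # chance of at least one failure across 4 independent 90%-accurate interactions
    print(1 - 0.9 ** 4)   # 0.3439, i.e. ~35%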
Came here to make the same comment. If the first AI model is buggy, then instead of debugging it with Software 1.0 techniques, just use Software 2.0 to debug it with AI! And when there are issues with the AI debugger, use a 3rd model to debug the debugger. And then a 4th. It's AI all the way down! I can see whole empires of consulting firms becoming very rich.
Hmm, a little frustrated because I feel like a lot of comments here are missing the forest for the trees.
For example, as someone who works with financial software, I don't see Karpathy's "Software 2.0" replacing, say, account ledgering software anytime soon. "Yeah, we calculate our clients' balances correctly 99.9% of the time!" isn't going to cut it.
But I don't think that's what Karpathy is arguing. There is a large set of problem domains where Karpathy's Software 2.0 is a much better solution than what he calls Software 1.0. For example, even in finance, stuff like fraudulent transaction detection, or financial security software for intrusion detection, is very well-suited to Software 2.0.
So yes, I think Software 1.0 will always be around, but I don't think it makes sense to use it for domains where Software 2.0 is a better fit. What I feel like Karpathy is arguing for is really now a recognition that Software 2.0 really is a whole new paradigm shift, and we need better tooling (he uses "GitHub for Software 2.0" as an example) to support it.
I agree that almost no Software 1.0 will be affected by this. Software 2.0 will start doing the jobs humans do now because Software 1.0 can't, e.g. security guard, customer support, as well as your examples.
The real interesting things will be Software 1.0 and 2.0 working together. You use 1.0 to run and validate the work of 2.0 that is guided by prompts. An example of this would be using prompts to generate source code that is compiled and tested. The real TDD is only writing tests and letting Software 2.0 create the code for you. This extends to other work like engineering as well.
The problem is that the term "2.0" usually means a newer, better, more feature-rich version (literally version) of the same thing.
The kinds of software that "Software 1.0" is suitable for are markedly different that the ones "Software 2.0" are. As Karpathy argues, it's a different tool, suitable for different tasks, and it should have a different name.
On an entirely silly tangent. It's a bit of a shame that Karpathy stopped working at Tesla. Just because having a guy named Karpathy working on car pathing was such a great example of nominative determinism.
Not the person you are replying to, but I do (but didn't know the term, thank you!)
I find that teams and products with negative pun acronyms fail for example. I find that the trend of the name Lilith being popular correlates with abortion rates (in the bible, Lilith killed unborn babies) etc etc etc
On an even sillier tangent, it's a shame that Karpathy stopped making YT videos on how to speed-solve a Rubik's cube. I learned most of what I know from him, many years ago :)
So we will have some abstract language that will allow business people to define what software should do, and on that basis AI will generate the source code and a working project.
Wait, wait, I have heard of this: it was called BPMN, it generated Enterprise Java Beans underneath, and it worked amazingly; that's how all software is written today, right? Right?
But, but nobody touches this crap beyond generating pictures to show on PowerPoint slides. Because writing software is circa 10% of all effort; the specification, legal stuff, maintenance, avoiding technical debt, proper test cases, anomaly testing, performance testing, building the right stuff: that is hard, that matters. Software 2.0 is barking up the wrong tree.
And COBOL before that. The goal of COBOL was to allow a non-technical business person to describe the problem in English and that description could be used as the basis of the program.
Needless to say it didn't work. Accurately describing what a program should do is what a programmer does. We're not telling the computer what to do (move this value from memory into this register, etc). We're describing a program's behaviour.
So I think we'll just get another language that is a good fit for describing what the program should do, and instead of compiling that to machine code, it will be used to train a model.
COBOL actually worked pretty well for its time. The ultimate reason COBOL died was because it was a first generation language that nobody really iterated on. It's similar to how most of us don't use FORTRAN, ALGOL, or LISP 1.5. Unlike those, COBOL didn't have the backing of the "tech world" (at the time, IBM and universities), so it doesn't have descendant languages we do use.
I don't understand why it didn't have the backing of the "tech world" and yet billions (if not trillions) of lines of COBOL are still being executed today?
When I first joined the industry, back in the early 90's, COBOL was very much the premier business coding language, and I think that only changed with the arrival of Java.
It was still programmers using the language. Same story with SQL.
The only things you can give people who don't invest a substantial amount of time and effort into the craft are markup languages and very high level (configuration) DSLs.
> It was still programmers using the language. Same story with SQL.
What are you talking about? Vast swaths of non-programmers are using SQL every day. Unless you consider anyone who writes SQL to be a programmer, in which case what you say is true by definition.
Do you know that scene in "Office Space" where they ask that guy:
"What is it that you do here?" and he says:
"I take the specifications and hand them to the software engineers".
I think our jobs are about to approach that a lot more closely than you might think. Our jobs will be effective translators between what the business wants and what the AI outputs.
Exactly. ChatGPT is just a compiler for the low-precision programming language known as English.
In programming we can always make a tradeoff between precision and effort, e.g. by importing libraries or using no-code or code-generation tools. ChatGPT is just one more point on the same tradeoff curve. It hasn't meaningfully moved the curve itself.
Moving the curve would mean making it less effortful to write code at the same level of precision.
> writing software is circa 10% of all effort; the specification, legal stuff, maintenance, avoiding technical debt, proper test cases, anomaly testing, performance testing, building the right stuff: that is hard, that matters. Software 2.0 is barking up the wrong tree.
Exactly. For serious projects (not to-do lists) you will need humans, or the equivalent (an AI that is grown like one).
BPMN is so far gone, when execs suggest using it they ONLY mean for the workflow charts - not actually executing workflows or allowing business folk to create new workflows!
So it gets really confusing, because they are only evaluating "pretty pictures" while you're evaluating the tool against the technical requirements it was actually built for, and the two of you talk past each other.
I think we need to train people to write formal specifications before we're going to see machine learning techniques generating useful programs.
Sure sometimes an LLM generates a correct program. And sometimes horoscopes predict the future.
I will be impressed when we can write a precise and formally verifiable specification of a program and some other program can generate the code for us from that specification and prove the generated implementation is faithful to the specification.
An active area of research here, code synthesis, is promising! It's still a long way off from generating whole programs from a specification. The search spaces are not small. And even using a language as precise as mathematics leaves a lot of search space and ambiguity.
Where we're going today with LLM's trying to infer a program from an imprecise specification written in informal language is simply disappointing.
> I will be impressed when we can write a precise and formally verifiable specification of a program and some other program can generate the code for us from that specification and prove the generated implementation is faithful to the specification.
I cannot do this, and neither can any of the people I have ever worked with. Yet despite that we all call ourselves programmers, create value and earn money by writing ill-specified, often buggy code. Why would a tool need to be formally verified to be considered impressive and/or useful?
Our approach shouldn't be to give up striving for better programs, algorithms, and code. Despite our current inability (which I don't think is a failure of programmers so much as of business decisions), we can nevertheless reject further opportunities for decline in the name of profits, right?
That is not what I said. Of course we should strive for precision, correctness, etc., but that does not mean that current and future imprecise methods of development hold no value. There is plenty of value to be gained from poorly written software, which we will keep writing until we get magically verified software.
We don't write formally verified code because writing the code to pass the verification is annoying and hard. If we're only writing the formally verified spec and a program or AI generated the code, that's a bit easier, though it may have to get easier still before it becomes mainstream.
You could if you wanted to. You're smart, inventive, and creative. There's nothing stopping you from learning.
> Why would a tool need to be formally verified to be considered impressive and/or useful?
Part of it depends on your perspective.
If we assume that an LLM (or some future ML tool based on it) is capable of producing code with the same rate of errors as a trained, expert human could then it would seem the productivity gain is not having to write all of that code ourselves.
We already tolerate a certain amount of errors in our software and the world has not collapsed. The JVM had an error in its binary search implementation that lasted for nearly a decade before anyone noticed. They noticed because the size of the arrays being used started getting big enough that their programs started failing in mysterious ways. OpenSSL had a vulnerability that sat unnoticed for more than a decade. The cost of errors is not zero but it is tolerated.
However, the problem is that we think it's our ability to write code that is the thing slowing us down.
My perspective is that we're not focusing on the problem: that it is hard to be precise and write programs that work, whole cloth, from their specifications without any errors.
Using an LLM to generate more code has another problem: while humans are decent enough at writing code to solve our problems, even if our solutions are imperfect, we're far worse at reading code and understanding what it does and whether it is correct with regards to some specification (if there is one).
Empirical studies of large-scale code review are very humbling. We can read maybe 200 SLOC at a time and have a meaningful impact on error rates in the software being produced. More than that and the effect disappears.
So now we have LLM's producing code that we know will have errors in it. And we have no idea where the error is. It could be a trivial error we could tolerate. Or it could be another Bar Mitzvah CVE. Hard to say.
Even Bertrand Meyer missed an error in a single-line expression generated by ChatGPT. He's way smarter than me. I don't see how we'll be able to keep up.
But if we tackled the problem of getting better at being more precise with our specifications, I could definitely see how having an AI-like system automate code generation could be really useful. There are plenty of times when working on a formal proof where you want to say "this is obvious!" and have a machine verify that for you using the same proof rules and tactics you would use. Bonus points if it can explain the proof back to you.
I just think we're a long way off from being able to do that.
Exactly. It seems like we prefer the easy way out. If we can't write a precise specification then we don't know exactly what we want the computer to do. And so all we can get are guesses.
The software industry has gotten this far with very little help from the formal methods community... that's changing in recent years in certain spaces where errors are magnified by scale like cloud computing, etc.
But instead of getting better at writing precise specifications we're going to continue to be bad at it and hope that an LLM can manage to infer the correct program. It might be millions of lines of code but hopefully it does some of the things we want most of the time.
Update: To be clear, I'm not saying AI/ML programs cannot help us to write programs at all, just that the inputs need to be better if we're going to have any confidence that the programs it generates are any better than a horoscope.
I don't really see your point. AI/ML can make programs the same way humans do: write a program that seems right but is very wrong, and keep iterating until it mostly works, most of the time.
> I think we need to train people to write formal specifications
I don't think that is really a matter of training. You have to start with people who can think clearly; if they can't think clearly, it's hopeless to expect them to produce formal specs. Very few people can think clearly about difficult subjects.
Of course, you can train people to improve the clarity of their thinking. I think that should be the main purpose of an undergraduate degree.
Writing a formal spec is analogous to writing a program; if you can't program, your program won't work. So writing a formal spec proves that you can think clearly; but if you can think clearly, you can write a program without first writing a formal spec.
Writing a formal spec is in itself a way to introduce clarity to your thinking, because the act of writing it leads you to ask more and more questions about the things you might not have thought of.
More so if there are tools to check your formal spec.
But a formal spec can be at a higher level than the implementation. For example, it could describe pre- and postconditions without actually stating how to get from the precondition to the postcondition.
I have noticed programming languages don't tend to ask the writer the right questions in the same way e.g. TLA+ and TLC do.
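As a rough illustration of the pre/postcondition point, a spec can be written as executable assertions that say nothing about the algorithm. This is just a hand-rolled sketch, not TLA+ or any standard contract library.

    # Rough sketch: a pre/postcondition spec as executable assertions.
    # It states what must hold before and after, not how to get from one to the other.
    def check_sort_contract(implementation, xs):
        assert all(isinstance(x, int) for x in xs)                 # precondition: list of ints
        result = implementation(xs)                                # the "how" is left open
        assert len(result) == len(xs)                              # postcondition: same size...
        assert all(a <= b for a, b in zip(result, result[1:]))     # ...non-decreasing order...
        assert sorted(result) == sorted(xs)                        # ...and a permutation of the input
        return result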
The spec is the hard part. Just look at how many people are complaining about languages that use a static type system. I wonder how many business people will then not complain about writing clear specs for "AI software". :)
I agree here. Formal models have to become easier to create, though.
Today's ecosystem requires advanced knowledge of system design and still coding abilities.
To democratize model generation we need a more iterative and understandable way of defining intended execution. The problem is this devolves into just coding the damn thing pretty quickly.
For sure! I agree, it needs better languages, education, and tooling. It's not about making a hard problem harder; it's about making it more accessible and straightforward to teach and use in day-to-day work.
Being more clear and precise in our specifications would only benefit us and the AI/ML tool generating the code. We could lean more on the correctness built into the entire stack rather than having to proof-read a mess of inferred code, something we're terribly ill-equipped to do.
We need to train people to write specifications, since those seem to be the best way to fine-tune AI performance for a task without changing the enormous dataset. Reinforcement learning methods can be used to tune the model in production under the pressure of failing tests (executable specifications). Writing specifications to evaluate outcomes should be a better use of domain experts' time than dataset cleaning.
As for formality, real formal specifications are very hard, LLMs are close to understanding natural language anyway, and 1000s of 90%-strict specs are better than 10 provably correct ones. So some sort of legalese for machines will evolve.
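Very roughly, and with every name here a placeholder rather than a real API, the tuning loop described above might look like:

    # Hypothetical sketch of tuning a code model against executable specifications:
    # the fraction of passing tests becomes the reward signal.
    def tune_on_specs(model, task_prompt, test_suite, steps=100):
        for _ in range(steps):
            candidate = model.generate(task_prompt)                    # placeholder model call
            passed = sum(run_test(candidate, t) for t in test_suite)   # placeholder test runner
            reward = passed / len(test_suite)                          # 0.0 .. 1.0
            model.update(task_prompt, candidate, reward)               # placeholder RL update
        return model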
Sid Meier's Alpha Centauri brought this up in 1999.
"We are no longer particularly in the business of writing software to perform specific tasks. We now teach the software how to learn, and in the primary bonding process it molds itself around the task to be performed. The feedback loop never really ends, so a tenth year polysentience can be a priceless jewel or a psychotic wreck, but it is the primary bonding process—the childhood, if you will—that has the most far-reaching repercussions."
That game really deserves a sequel, with the same cast of characters. Beyond Earth tried to do generic leader personalities, and it suffered as a result. The unique personalities of the original cast were a large part of the game.
It was a magical time in my youth. I tried buying and playing it again recently, but the magic is gone; we are too used to quick casual games and rich AAA shooters.
> Think about how amazing it could be if your web browser could automatically re-design the low-level system instructions 10 stacks down to achieve a higher efficiency in loading web pages.
This sounds great in theory, but in practice, a system that has that much dynamic adaptation has brutally steep performance cliffs and is massively complex. I for one, will be opting out of that giant vertical slice of hell. This is one of the _good_ reasons for having layers: separate failure zones, separate levels of abstraction--true reuse and modularity. Bugs break all that.
And no, given the hallucinations of large models just in the natural language space, I do not want to reason through the mad ravings of a tripping AI to debug a monster pile that happens to make web property X go 10% faster.
I assume that the author means "Framework XYZ in TypeScript running on JS running on V8, which is written in C++, compiled by clang, running on Linux with glibc N, running on VMWare, running on x86, running on the Rapture Cove microarchitecture." I don't know if there are 10 I could name, but stacks are deep.
I'd honestly love a full accounting of those stacks from browser all the way down to transistor. I almost feel like we need to record it for posterity in case anything happens to our civilization.
Aren't many of the OSI layers effectively collapsed? E.g. TCP/IP has something like 90% market share compared to alternatives and effectively functions as a single layer. I'm not sure about the DNS layer, but I think there's a case to be made there too.
Related: the OSI networking model defines 7 layers to clearly define and separate the duties of each level of networking. Having clear levels makes troubleshooting and debugging easier.
In practice there's only 4 or 5 layers depending on who you ask.
> I have said before that I believe that teaching modern students the OSI model as an approach to networking is a fundamental mistake that makes the concepts less clear rather than more. The major reason for this is simple: the OSI model was prescriptive of a specific network stack designed alongside it, and that network stack is not the one we use today. In fact, the TCP/IP stack we use today was intentionally designed differently from the OSI model for practical reasons.
I remember a pretty old interview with Linus Torvalds where they are talking about object oriented programming. The interviewer asked him if he expected a similar paradigm change in the coming years and I remember being surprised by his answer: (quoting from memory) No, I don't see anything big coming. Probably the next change will be caused by AI.
Yes, differentiable code is already a new paradigm: write a function with millions of parameters, craft a loss function (which requires more skill than people realize), and train. It has a property that used to be the grail of IT project management: when you want to improve your code's performance, you can just throw more compute at it.
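A minimal sketch of that workflow, using PyTorch (the toy model, data, and hyperparameters here are purely illustrative, not anything from the article):

    # Minimal sketch of the "differentiable code" paradigm: a function defined by
    # parameters, a loss that scores its outputs, and optimization by gradient descent.
    # (Toy model and synthetic data; only meant to show the shape of the workflow.)
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
    loss_fn = nn.MSELoss()  # much of the "craft" lives in choosing/designing this
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

    x = torch.randn(256, 10)   # 256 fake examples with 10 features each
    y = torch.randn(256, 1)    # fake scalar targets

    for step in range(1000):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)  # forward pass: "run the program"
        loss.backward()              # backprop: differentiate the program
        optimizer.step()             # nudge the parameters, i.e. edit the "code"

Throwing more compute at it then literally means a bigger model, more data, and more iterations of this same loop.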
And I think that the clumsy but still impressive attempts at code generation hint at the possibility that yet another AI-caused paradigm change is on the horizon: coding through prompts, adding another huge step to the abstraction ladder we have been climbing.
Forget ChatGPT's coding mistakes; down the road some team will manage to build a highly abstract yet predictable code generator fueled by language models. It will change our work totally.
We might get into another efficiency slump as an outcome of this, again stopping us from making the most of the hardware and computational resources we have, because prompts are too unspecific. Didn't specify the OS your code will run on? Well, better use this general cross-OS library instead of the one optimized for the OS the thing will actually run on.
The same mentality that causes today's "everything must be a web app" will cause terrible inefficiency in AI-generated (and human-prompted) code. In the end our systems might not be any more performant than what we already have, because dozens of useless abstraction layers get inserted.
At the same time, other people might complain that the AI does not generate code that can run everywhere, and that they have to be too specific. People might work on that, producing code generators whose output carries even more overhead.
At least some of that overhead will slip through the cracks into production systems, as companies won't be willing to invest in proofreading software engineers and long prompt-generate-review-feedback cycles.
I don't feel a comparison to DSLs works here at all. If you are just using plain human language, is a comparison to DSLs apt?
The point of a DSL is to provide a deliberately limited-scope language optimised for a specific problem or problem domain. An LLM that uses general human language is about the furthest thing from a DSL: it's the broadest-scope language for describing any problem, and it tries to solve them all.
Also, few popular DSLs are truly black-box in the sense ChatGPT is - many of them have exposed source or even line-by-line debuggers available. There are a ton of other reasons this comparison doesn't make sense.
I've always thought this was the critical point where Google missed the boat. It was 2015. Google was clearly and unambiguously ahead in AI and ML. They also were rampaging to massive majority market share in the mobile OS space. They had the opportunity right at that point to turn Android into the very first AI native operating system. I remember seeing a glimpse of this when I found buried in the Android SDK a face detection feature that accurately located eyes and faces in photographs. I thought at the time: this is it - this is where Google will build AI level features into Android and expose them as platform OS features and Android will race ahead in the computing space and take over the world. The whole programming model will look completely different as devs start using first class AI as a basic feature of their toolkit.
And then it never happened. They focused on cloning iPhone features and approach, dumbing down and simplifying the OS to the point where it's pretty hard to distinguish any more.
An interesting treatise, but I don't fully agree with the statement that gathering data or stating a goal is easy. Getting quality data is arguably the hardest part of ML. Getting people to fully describe their intentions and all edge cases ("requirements") in a clear manner is also one of the hardest parts of software development - which is why things like agile exist. Maybe it holds at the granular function or module level, but business requirements are not easy.
To contribute my personal lowbrow dismissal: this is the guy in charge of computer vision at Tesla, where cars have had a habit of doing absolutely bonkers things that don't make any sense when faced with completely normal phenomena like lanes that fork into two.
His single sentence caveat about how Neural Networks can fail in unintuitive and embarrassing ways is the understatement of the century. I’d like to add that Tesla still hasn’t solved that lane forking problem even eight years since it was first identified. I guess just throw more data at it, and eventually it will get better? At what point does the belief that things will get better with more data fed into the same algorithm become a religious creed?
Neural Networks are significant advances in the state of not just machine learning, but the world as a whole. But the caveat that we don't really understand what they're doing is the whole fucking problem. Until Neural Networks can take advantage of, constrain to, and augment human models, they don't have a snowball's chance in hell at replacing the types of software we rely on the most.
Until then, you’ll just create a massively inefficient system where the neural network writes the software but you spend 10x on engineering your training datasets so that your brilliant neural network knows that it is better to commit to one of two lanes in a forked road than it is to crash into the concrete lane divider. Or to not be racist. Or to not go haywire because of a sticker on a stop sign.
"Neural networks are bad because some times they fail" is a low brow dismissal, indeed.
What matters is the percentage accuracy. A black box with a 10% failure rate is better than a fully explainable system that fails 20% of the time. Explanations make us feel better and they can be very important. But for most cheap, repeated processes they aren't necessary. Not to mention that neural networks can be tested and interrogated in ways that other systems cannot.
> What matters is the percentage accuracy. A black box with a 10% failure rate is better than a fully explainable system that fails 20% of the time.
That's not true at all, it depends on the use case.
What actually matters is the desired percentage of acceptance.
For many critical path use cases you'd much rather have something fail twice as often but understand why it failed so that you can correct the issue and resubmit the input. Error observability is an important feature that's taken for granted in many systems. It all depends on what the system is used for—how important it is to be able to get to correct results, and what the consequences for failure are. The biggest danger of neural networks is in people that don't understand this nuance and apply them in a blanket way in all systems.
sorry, hard disagree: explainability leads to reproducibility and predictability, leading to control. Unexplainability leads to chaos.
old adage: if a bug can be reproduced then it's only a matter of time before it's understood and fixed. If a bug can't be reliably reproduced (Heisenbugs) then repair time is unbounded.
(that said, humans are perfectly capable of creating inexplicable and irreproducible bugs - for example, in multithreaded code)
There's a difference between a software bug and a probabilistic decision. A bug is by definition unintended behavior. With decision making ("is this comment spam or not?") it's understood that there will be errors, and the goal is to minimize them. It may or may not be possible to explain exactly why something is or isn't spam, for example. But as long as the error rate is low enough, we accept the lack of a clear definition.
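A minimal sketch of that kind of probabilistic decision with an explicit acceptance threshold (the scoring function and threshold below are toy stand-ins, not a real spam model):

    # Sketch: a probabilistic decision ("is this comment spam?") with an explicit
    # acceptance threshold, instead of a binary pass/fail notion of correctness.
    # The scorer is a toy stand-in for a trained model.

    def spam_probability(comment: str) -> float:
        """Stand-in for a model's spam score in [0, 1]."""
        suspicious = ("free money", "click here", "winner")
        hits = sum(phrase in comment.lower() for phrase in suspicious)
        return min(1.0, 0.3 * hits)

    THRESHOLD = 0.6  # tuned to trade false positives against false negatives

    def is_spam(comment: str) -> bool:
        return spam_probability(comment) >= THRESHOLD

    print(is_spam("Click here for FREE MONEY, winner!"))  # True
    print(is_spam("Thanks, this fixed my build."))        # False

Moving the threshold up or down is exactly the "acceptability" knob: it changes which errors you tolerate, not whether errors exist.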
That is absolutely the worst possible way to read my comment. They're not bad because sometimes they fail, they're bad because sometimes they fail and we know exactly what is wrong with them, but we can't fix them because they don't take advice.
There is no way to tell the Tesla vision NN, "hey, when you see this pattern and you're confused about which path to take, it is better to take one incorrectly than it is to run into a concrete divider". We know exactly what the problem is, but there is no interface with a NN to tell it to do something, other than to just keep training it with more data. And once you realize that your only interface to get better outcomes is to wildly manipulate the training dataset, then you haven't made software engineering better, you've made data engineering worse.
Take notice of something important: all of the domains where Neural Networks have been wildly successful are domains that are wildly underspecified. Take language for example. Grammar rules, vocabulary, pronunciation, and even meanings of words are constantly changing. There is no possibility of ever having a formal definition of any language, let alone all of them. Or vision...where the only formal definition of anything is what color of light it reflects in a particular angle. Again, no formal definition of anything.
But the shortest path from A to B? That problem has a formal definition, and no neural network has even come close to the accuracy of A* or Dijkstra's algorithm. The minimum-cost solution to a Multi-Commodity Flow Problem? That problem has a formal definition, but no Neural Network has come close to the accuracy of the simplex method's solution.
Tautological arguments about percentage accuracy might give the edge to Neural Nets in some domains, but not all of them, and for that reason they completely miss the point.
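For contrast, here is how short and fully specified the exact approach is - a minimal sketch of Dijkstra's algorithm on a toy graph (illustrative only):

    # Dijkstra's shortest-path algorithm: the problem has a formal definition and
    # this method provably finds the optimum -- no accuracy percentage required.
    import heapq

    def dijkstra(graph, source):
        """graph maps node -> list of (neighbor, non-negative edge weight)."""
        dist = {source: 0}
        queue = [(0, source)]
        while queue:
            d, node = heapq.heappop(queue)
            if d > dist.get(node, float("inf")):
                continue  # stale queue entry
            for neighbor, weight in graph.get(node, []):
                nd = d + weight
                if nd < dist.get(neighbor, float("inf")):
                    dist[neighbor] = nd
                    heapq.heappush(queue, (nd, neighbor))
        return dist

    graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2)], "C": []}
    print(dijkstra(graph, "A"))  # {'A': 0, 'B': 1, 'C': 3}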
1. Percentage accuracy is only half of what actually matters. Without the cost of being wrong taken into account, percentage accuracy will totally fuck you over. Here's a game: you can choose between two algorithms, one with 90% accuracy and a 10% chance of smelling gross, or one with 99.99% accuracy and a 0.01% chance of your body being shaved down to the bone over a thousand cuts from a vegetable peeler. Which would you choose?
2. Sometimes absolute accuracy matters. We have formal systems for absolute accuracy. We have symbolic logic for absolute accuracy. We have deterministic systems for absolute accuracy. If the best that I can get from a neural network is a percentage accuracy, then it has already failed a test of general applicability. How many years and how many computers and how much data would we have to feed into how big of a neural network in order to get to E = MC^2 with perfect accuracy?
3. Even if percentage accuracy matters, time-relative accuracy matters even more. With a Neural Network, if you need to get better accuracy, how do you do it? You should see actual machine learning practitioners try to solve these problems. They literally try to deconstruct the black box, trying to figure out how different neurons are weighted, and what input data can be altered to result in a different weighting. It's a clusterfuck, and it slows progress to a halt. We've known exactly what was wrong with the Tesla Vision NN for over half a decade, but actually fixing it has completely stalled because of the fact that it is a black box, and can only be fixed like black boxes. This is systems theory 101: you can't fix systems that you can't understand.
> This is systems theory 101: you can't fix systems that you can't understand.
If the "actual machine learning practitioners" can break down behavior to the neuron level, then I'd say they understand it to some degree.
The reason that many of these problems still exist is because chasing down individual errors is a seductive waste of time. If you have to tell the network "here's how you handle this one pattern and here's how you handle this other one", then you're building an expert system. Yes it's tempting to correct errors as you see them pop up. But it's better to construct a network that can learn from data to handle any situation that you didn't think of specifically.
Building a robust network requires lots of time and data. There will always be edge cases that cannot be fixed individually. The fact that Tesla or anyone else hasn't built a system for X with no embarrassing edge cases yet does not mean that we should go back to coding individual instructions and conditionals line by line.
> Yes it's tempting to correct errors as you see them pop up. But it's better to construct a network that can learn from data to handle any situation that you didn't think of specifically.
“Yes it’s tempting to save and invest your money for your retirement, but it’s better to put your faith in god and he’ll solve all your financial problems, even the ones you didn’t know you’d have.”
If the system, with all of its data, can’t solve a common problem that happens every day, how the hell is it supposed to solve a problem so rare that the engineers don’t even know exists?
Re >> "At what point does the belief that things will get better with more data fed into the same algorithm become a religious creed?"
Oh, ye of little faith! It is heresy to criticize our new religion! ~Some AI consulting firm or AI "thought-leader" probably.
Edit: I'm sure there are some useful use-cases, but I'm not an unquestioning devout adherent. That said, I should probably learn more about it just so I can intelligently defend the use-cases in which it doesn't make sense.
Yes, great foresight. One thing didn't stand the test of time:
"You’ll notice that many of my links above involve work done at Google. This is because Google is currently at the forefront of re-writing large chunks of itself into Software 2.0 code."
IMO google is dropping the ball pretty hard right now when it comes to AI.
I think it's too early to tell; neither Microsoft nor OpenAI has something to show that is trustworthy. If that is their goal, they can still be first.
I think AIs like ChatGPT and similar models will become what APIs are today. Just that. Today we wire together a crypto library with some JWT library with some Facebook API using some axios library, and we write the code in between. Since we are able to do more, user requirements get more complex, and so software engineers are in more demand. Sure, what required 10 engineers in the past (1980) can be done by two today.
In the same sense, in the future we will be wiring together AI APIs (probably because it will be cheaper to wire together N AIs manually than to write one AI that is the sum of the N). Since we'll be able to do more, user requirements will get more complex, and so the demand for software engineers will go up as well, even though only a couple of engineers will be needed for what takes 10 today.
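A rough sketch of what that wiring might look like (the endpoints, payload shapes, and response fields below are hypothetical placeholders, not real services; the point is only the shape of the glue code):

    # Sketch of "wiring together AI APIs" the way we wire libraries and web APIs today.
    # Endpoints and JSON fields are made up for illustration.
    import requests

    def call_model(endpoint: str, payload: dict) -> dict:
        """Generic glue: POST to some hosted model and return its JSON response."""
        response = requests.post(endpoint, json=payload, timeout=30)
        response.raise_for_status()
        return response.json()

    def summarize_and_translate(document: str) -> str:
        # The engineer's job: sequence the calls and handle the seams between models.
        summary = call_model("https://example.com/v1/summarize",
                             {"text": document})["summary"]
        translated = call_model("https://example.com/v1/translate",
                                {"text": summary, "target_lang": "de"})["text"]
        return translated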
I hope he's smarter at machine learning than at blog posts.
Yes, a lot of software will include NN models. Traditional software is going nowhere, because it's the only means of being 100% sure of what the outcome will be, non-probabilistically.
Neural Networks are a tool for solving probabilistic, fuzzy logic problems.
Yeah, I agree. I can see those models helping us improve productivity quite a bit on the side of (still) writing code. Essentially it is a much better form of context-aware copy-and-paste from Stack Overflow.
And then, as you say, there will be certain parts where those models are actually gonna be integrated in software in one way or the other. And I think this is powerful. It would be awesome if I can just toss certain problems to the business folks and empower them to figure out the solution AND implementation by themselves.
If we assume the article is correct and there will be Software 2.0, what's the cost of running it vs. the cost of running Software 1.0? The reason I'm asking is this quote: "when the network fails in some hard or rare cases, we do not fix those predictions by writing code, but by including more labeled examples of those cases." Is it more costly to do the "curating, growing, massaging and cleaning labeled datasets" and ultimately train neural networks than to just write code? Maybe not for small NNs, but for DNNs?
The higher level the labeled examples become, the more effort is needed to make them in the first place.
For example, take a website. How are we going to provide enough examples of websites to make the generated code fit what we need and not have the annoying properties we want to avoid? Let's say we have a website and we tell the code generator of choice that we want the website to be accessible to blind people. How do we create the quantity of labeled examples that makes the code generator understand what to create? Maybe that very creation of labeled examples will be a software developer's future work activity.
What the article is saying is that when you develop and build ML systems, your workflow is different from the workflow you would have if you were solving the same problem without ML.
The weird part is presenting it as 2.0 which implies that it replaces 1.0; it doesn't, apart from some edge cases like image recognition - we don't hand code rules to recognise images anymore but that's a tiny tiny part of all the software development work out there.
This just feels like taking two unrelated things and bunching them together under a label in order to be provocative and imply that software 2.0 will "replace" existing code.
If you surveyed e.g. all of the code Google has in their piper repository, you would find significantly less than 1% of it could be replaced by even an extremely good neural network.
Neural nets are great at language and image processing, compression, search and probabilistic prediction problems. In many ways they open up new possibilities for what software can do at all. But replace existing solutions? How on earth do you create a single neural net architecture in say pytorch that turns into a CRUD app with oauth, email verification, css+html, support for parsing some ancient xml format uploaded by the user and stored in S3 along with various database connections? I can't even imagine what a labelled or unlabelled training set for that would look like.
(Unless you're talking about making GPT write the code that does all of the above – I'm sure people are working on that – but unless I'm completely misreading Karpathy's article that's not at all what he intended; IIUC he's talking about programmers making datasets for specific problems + simple neural net architectures to train on that dataset. If ChatGPT wrote some Java code for you, you can't make it go faster by removing half the nodes, which nodes would those be? It definitely won't go faster if you send the same prompt to a lobotomised ChatGPT)
Don't forget it was written in Nov 2017, AlphaGo Zero was just released at this point. Seeing how the field evolved in a 5-year span, I can understand the desire to be at the forefront.
Personally, I want to keep working on things that I can get to the bottom of. In the same way proprietary software is a nightmare to debug, having an AI blackbox in the middle of my stack could wreck all traceability. However, I can see myself using an AI blackbox when its output is consumed/checked by a human or when "best-effort" is good enough (but treat output as dirty).
Example: sorting documents by relevance (a human consumes the output), a code assistant (a human checks it), transcription of my audio notes (I wouldn't do it myself or pay anyone for it, so any output is good enough).
Counter-examples (too dangerous!): AI personal assistant that accept/reject meetings, a ChatGPT text box as an interface for settings, auto-generated tests, Infrastructure-as-a-Desire: input your software and get a new K8S cluster provisioned for it.
This thesis seems very credible, but it misses one downside of his "software 2.0" definition: you need significant amounts of data to train the neural networks. Most problems do not have this amount of data. Not even close.
So yes, this will revolutionize and enable unseen performance in the few areas where there is significant data. For all the rest it'll be business as usual.
> you need significant amounts of data to train the neural networks. Most problems do not have this amount of data. Not even close.
Going to push back against this one. I think we have a lot more training data than most people realize. I wrote this comment yesterday, https://news.ycombinator.com/item?id=34862450, about how a large government contractor is using ChatGPT to generate first drafts of responses to government RFPs.
Now, most of these RFPs are in very specific areas, technologically speaking (e.g. specific technologies around cloud network security, for example). These folks were actually blown away by how technically accurate ChatGPT was on many different areas, even very specific niche areas, and even considering ChatGPT's view of the world hasn't been updated since late 2021.
Again, the first draft needed to be edited, but there is a huge amount of data out there that ChatGPT is able to use coherently, even on niche, esoteric topics.
machine learning is good at end to end closed loop analog signal analysis problems (decode/translate/synthesize) and electrical engineeringish tasks, as it always has been ... but i'm not convinced that it is yet good at the more needly real world problems that most software faces.
so yes, sure, the best speech recognition algorithm will remain something ee-ish, and will probably make use of highly parallel numerical computing and data driven optimization based solution finding... but i think whether or not that will be the road to correct implementation of entire discrete information systems with all of their knotty discrete rough edges remains to be seen...
So "programming" in the Software 2.0 world is basically training a model. Which isn't a task with an end. You stop when you're bored, or when it passes a given level of accuracy, not when it's "completely trained" because it will never get to 100% accurate.
In e.g. speech recognition, handwriting recognition, speech synthesis, drawing pictures, writing a response to a human's question, all those messy "organic" problems, this is fine. A 99.9% accuracy rating at e.g. speech recognition is better than humans do.
But there are problems that absolutely need 100% accuracy, and you can only get that if you code it up the old-fashioned way (though probably not Agile - I love my iterative development cycles but they're equally prone to not quite getting it 100% right).
mmm... i disagree. fault tolerance is more of a reliability property that sits outside of the performance of a discrete logic or signal processing system, it describes how well a system can remain functioning in the face of adverse conditions like internal failures, hardware substrate failures, dependency failures, adverse inputs, etc.
but... yeah, some problems have continuous performance variables (typically ee'ish) and others have discrete ones (typically cs'ish). the discrete account ledger either computes the correct value or it does not, where many signal processing problems have to contend with noise and are allowed to produce a wide range of noisy outputs and therefore their measures of correctness are based on statistical arguments.
> Across many applications areas, we’ll be left with a choice of using a 90% accurate model we understand, or 99% accurate model we don’t.
And how do you show that it is 99% accurate, other than creating so many automated tests that you could have written the procedural version instead?
I think what I was missing from this article is how to evaluate a domain where neural nets or LLMs can be applied. Image-from-text generation is a great one because accuracy isn't strictly defined. However, telling ChatGPT "code this pacemaker for me" would have a real accuracy attached to it that you could confirm with unit tests.
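A sketch of what confirming that in practice might look like: an aggregate accuracy bar on a labeled evaluation set, plus hard assertions on cases that must never fail (the model function, data, and threshold here are hypothetical):

    # Sketch: "showing 99% accuracy" means a labeled evaluation set plus hard
    # assertions on safety-critical cases. `model` is any predict function.

    def evaluate(model, test_set):
        """test_set is a list of (input, expected_output) pairs."""
        correct = sum(model(x) == expected for x, expected in test_set)
        return correct / len(test_set)

    def run_release_checks(model, test_set, critical_cases, threshold=0.99):
        # Statistical bar: aggregate accuracy over the evaluation set.
        accuracy = evaluate(model, test_set)
        assert accuracy >= threshold, f"accuracy {accuracy:.3f} below {threshold}"
        # Absolute bar: pacemaker-style cases are pass/fail, not probabilistic.
        for x, expected in critical_cases:
            assert model(x) == expected, f"critical case failed for {x!r}"

The catch is that enumerating enough (input, expected) pairs to make the statistical bar meaningful is itself a big chunk of the work you were hoping to avoid.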
I recently attended a hackathon (TreeHacks) where Andrej was a keynote speaker, and he's since updated his vision to include the significance of LLMs. The new software is that of prompt engineering, using LLMs as a simulator for the types of programs you want to build [1]. The definition of programming will keep changing as we get to higher and higher levels of abstraction.
- I never run this query: I have found a super-Google, but maybe I will miss out on some pearls available on the Internet;
- I always run this query, but I'm too lazy to keep text files or bookmarks in my web browser to store answers that I don't want to, or cannot, memorize.
It is somewhat similar to the arrival of Google, Unity UIs, Gnome 3, Windows 8, where they wanted to replace all menus with a single search.
It is the road to idiocracy if we are no longer able to tidy our own ideas/data.
It is like when we are children and always ask our parents instead of looking it up in the dictionary.
It is regressive. It is already the case with Google.
Since I can look things up on the Internet, I memorize fewer things.
It is human to take the regressive/easy path when the willpower is lacking.
I'm looking forward to a blog post "Use AI without declining mentally" :
- Interact with the AI to tidy a corpus of tools/knowledge in personal files
- Delegate to the AI the tidying and retrieval of the pearls you keep in your folders
- BUT keep a mental model of your data and its structure that you share with your AI,
like a manager who shared with her secretary the knowledge of how the files were sorted into boxes and cabinets. That way, if your AI/secretary is on holiday, you can still work.
The adoption of the so-called "Software 2.0" depends on how influential OpenAI is as a company. The majority of the use-cases of GPT-3 are only about "remixing" information, and the "notoriety" of the AI is (IMO) mostly a publicity stunt; I'm assuming that there are prompts for personalities / jokes / poems / etc that are never shown to the end-user and are hailed as an "organic outcome" of the model. Obviously, no one can check for these right now. Seems to me that microsoft has taken a few leaves out of the OpenAI playbook and applied them (rather chaotically).
So when you're looking at actually writing software that needs to be dependable / modifiable / bug free, you'd need a massive overhaul of whatever software stack is being used, so there's very little human-assisting "cruft", and instead you'd want a lot of supporting material for a model, which might look like something written in languages used for formal verification of programs.
The promise of GOFAI was about having a human-understandable bottom-to-top framework, and the current "AI" paradigm is at odds with it. The "formal verification" assumption, then, skews towards GOFAI. But since there has to be some human support for the current not-there-yet AI to write software, we might see yet another abstraction layer based on NN / something newer in the years to come.
> the "notoriety" of the AI is (IMO) mostly a publicity stunt
Have you _used_ ChatGPT? I mean not just asking it random factoids but using it to genuinely help you with something. Are you aware that it's hit 100 million users faster than Facebook, Instagram, or TikTok did? It's not a perfect product but it's hard to argue with those numbers. I work at a startup and most people I work with use ChatGPT daily. I'm talking project managers, devs, personal assistants, etc. I guess that's all to say OpenAI is influential as heck _already_, now imagine 5 years down the line if they play their cards right.
Because it was covered on the news and social media incessantly. You could get 100M people to follow a taco stand if it got a free month of news coverage which breathlessly covered the ingredients and had literally thousands of Tiktok and Youtube grifters telling you how the chalupas were gonna change everything.
I have, and I didn't find it to be useful for anything I did. I can do what it does with a search engine and trusty C-f. Also, TTS exists.
It's 50% tech and 50% marketing (and I doubt it's even 50% tech), and it's not gonna upend anything. Except maybe increase the authenticity of online scams and make more people get degrees in machine learning. And yeah, make the people that rely on it dependent on it as it degrades their skills.
It's basically the "internet is educationally useful" argument. At some point everyone's gotta use it but you can live without it just fine. And even though people tout its usefulness for everything good, the majority of data transferred is porno.
You want guaranteed, specific behavior from a software system (that's what SLAs and contracts are for), and to be able to reason about it easily so you can hire a college grad to tweak it. And that's not even talking about datasets: you can only train something accurate if you have enough good data for it. I can only see Software 2.0 being a better autocorrect for internet-scale use cases (in the short term at least, until the next breakthrough).
This also reminds me of a quote from the book I'm currently reading (Practical Wisdom by Barry Schwartz):
“Most of us think about empathy as a “feeling” or an “emotion.” It is. To be empathetic is to be able to feel what the other person is feeling. But empathy is more than just a feeling. In order to be able to feel what another person is feeling, you need to be able to see the world as that other person sees it. This ability to take the perspective of another demands perception and imagination. Empathy thus reflects the integration of thinking and feeling.”
"Mind reading" is another way to put it (https://yosefk.com/blog/people-can-read-their-managers-mind....) - this practical wisdom + mind reading is basically the salient human feature that NNs would never be able to replace so you would always have humans in the system.
At some point the complexity of the prompts required to generate a program that meets a non-trivial specification become so complex as to be indistinguishable from a programming language.
The only complete specification is one that can be compiled and executed.
AI will take incomplete specifications and guess the rest -- just like humans do. Whether or not it makes those guesses better than a human remains to be seen.
The "concrete examples" he gave in the "Ongoing transition" section are traditional AI tasks. What about other tasks like program compilation and theorem proving that require more hard-core logic? I will be more convinced if neural networks can reliably do those tasks. If not, at best a human will still need to manually break the high-level task into smaller subtasks to be solved by neural networks and then somehow glue the parts together.
So, why would you want to rely on business logic embodied directly as a chaotic ML model (in the Chaos Theory sense of “chaotic” — subtle changes in stateful hyperparameters making for discontinuous shifts in output space), when you could instead ask the model to precisely describe a rigid decision workflow (i.e. “Software 1.0” business logic) it would implement if needing to explain its own decision-making process as parsimoniously as possible?
Andrej's podcast with Lex Fridman touched on this too - worth listening to if this topic is of interest.
The pod was in the past year, so many years after Andrej's software 2.0 post, and after many years of great AI experience at Tesla to add to or potentially change his views.
Likewise, will be interesting to see how the HN community's experience with ML and AI over those years may have changed our views.
I'm sure that will be the case at some point, but for the moment it's not true. It's also important to note that NNs have been around for a long, long time relative to computing in general. The problems caused by using them have been reduced but not completely removed.
Imagine if the Therac-25 software was written by chatGPT.
> Imagine if the Therac-25 software was written by chatGPT.
I guess in that case we wouldn't learn and teach from the design flaws made. Instead we would "It's just a glitch. No one is really to blame. Just feed it more data and maybe it won't kill anyone next time".
Having been partially responsible for a (back then) SVM based machine learning system and seeing how it's difficult to explain to management why it fails and why fixing it isn't just a missing line of code somewhere was pretty frustrating. I'm not sure I like this future.
Me too! But to allay our fears: I think the OP here is saying all programming will change to some NN-powered large language model. That is not true. There will still be "manual", i.e. not NN-powered, programming, and I suspect that kind of thing will remain the majority for the rest of my career.
Good luck to the poor sods in 100 years time arguing with a poorly trained LLM to output some unit tests whilst a virtual chatbot runs the standups.
Even without AI the systems have become VERY COMPLEX, hard to extend and debug. It is much harder to get started for newbies than it was say 25 years ago. If AI is software 2.0 it will be EVEN MORE COMPLEX. I guess that Software 2.0 must be something which will reduce complexity and not increase it...
I reject this premise. A neural network is not a program; it is essentially a huge multi-sheet Excel workbook that computes a score for a given set of inputs.
Unlike the Excel workbooks made by domain experts, it's almost impossible to even find out what the function of any given cell is in the overall computation. See the effort it took to find the "neuron" responsible for a/an differences in GPT-2.[1]
Neural networks have their role as black boxes, but they are not programs constructed with intent by humans, to be read by other humans and by compilers.
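To make the workbook analogy concrete, here is a tiny sketch of what such a network literally is: stacked matrix multiplications with nonlinearities that map inputs to a score (shapes and random weights are illustrative only):

    # The "multi-sheet workbook" view, literally: a feed-forward net is stacked
    # matrix multiplications with nonlinearities, producing a score from inputs.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((10, 64)), np.zeros(64)   # "sheet" 1
    W2, b2 = rng.standard_normal((64, 1)), np.zeros(1)     # "sheet" 2

    def score(x):
        hidden = np.maximum(0, x @ W1 + b1)   # each cell: weighted sum, then ReLU
        return (hidden @ W2 + b2).item()      # final cell: the score

    print(score(rng.standard_normal(10)))

The difference from a real workbook, as noted above, is that no human wrote the formulas in those cells, so tracing what any one of them contributes is the hard part.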
This is the right take on the future. Unsurprising that it came from Karpathy all the way back in 2017.
The quality of the code is irrelevant, as the point of Software 2.0 is that it's another layer of abstraction on top of traditional code.
"Coding" becomes "I need something to do a thing," rather than "def doSomething: ..."
As long as the output gives you what you need, the code quality ultimately is an efficiency play. But as AI coding improves, it can refactor itself, so it's a short-term problem.
In my own experience coding with an "AI assistant," I've been able to mentally stay in "architecture mode," which makes me feel twice as creative, twice as productive. That alone is a net positive.
Perhaps he's right but it's a very depressing view of the future.
> "Coding" becomes "I need something to do a thing," rather than "def doSomething: ..."
More likely, corporate overlords will decide that you cannot just "do a thing" but rather that you are allowed to do X, Y and Z things for which they have pre-trained commercial models for.
> As long as the output gives you what you need, the code quality ultimately is an efficiency play. But as AI coding improves, it can refactor itself, so it's a short-term problem.
Have you ever debugged a problem with generated source code? Or even a compiler bug? Now imagine leaving your AI to go find the bug or iterate until the bug disappears hehehe...
> In my own experience coding with an "AI assistant," I've been able to mentally stay in "architecture mode," which makes me feel twice as creative, twice as productive. That alone is a net positive.
IMO, if your work benefits from an AI assistant, then your work is to produce many lines of code, and you would benefit just as much from creating high-level abstractions as from using pre-trained black-box models (or, as some call them, "new hires").
Until it needs to be maintained, or has weird bugs.
> As long as the output gives you what you need, the code quality ultimately is an efficiency play. But as AI coding improves, it can refactor itself, so it's a short-term problem.
Not sure how this is going to work on large codebases.
I think the "until it needs to be maintained" is an oft-quoted excuse that is either 1) unavoidable, no matter how elegant the code, or 2) way overstated.
He gave the keynote talk about this at TreeHacks 2023 on Friday. Another commenter in this thread gives his updated take and a link to a relevant tweet.
I think a legitimate criticism of this article is using the clickbait "Software 2.0" label for something that's – as very well described by the same article – so very different from "Software 1.0." Considering ALL software, the set of use cases where Software 1.0 and Software 2.0 truly compete is a very narrow niche. In the real world Software 2.0 will thrive in the next decades, but Software 1.0 will be there just as much as it is today.
I doubt the premise that everything will be NNs, but I can't predict the future any more than anybody else.
What I find interesting in this imagined future is that problem definition usually happens, in my experience, while attempting to encode it, removing all ambiguity. If we skip that step n years from now, will we still even understand the problems we try to solve? Sounds scary to have systems where we can neither reason about solution nor problem.
I was more excited by the concept of Software 2.0 when it meant differentiable programming -- i.e. ordinary programs with logic etc, but everything is differentiable, optimizable and can gradually be replaced by opaque NN-based components. That would be largely compatible with current mainstream software development and allow gradual transition to highly-efficient domain-specific architectures.
This is quite an interesting take… I’m not yet convinced that neural networks are “computing” in the classical sense, but maybe that’s moot.
More interestingly, this makes me wonder if there are some Gödel-like proofs waiting out there that limit the capabilities of efficiently-optimizable programs. What new kinds of undecidable or uncomputable functions exist in the subspace of programs that an NN can learn? Would be exciting to find out.
The interesting part of that prediction is that, depending on how you read it, you may say it failed embarrassingly, or you may say it predicted the current reality fairly well.
The vision of the next software as a differentiable thing that is itself the program has certainly failed. But now there are such amazing opportunities to connect text-to-text models to one another and to search engines that it is likely to become a new kind of programming.
I think a lot of people are interpreting this (implicitly) through a kind of generative-AI lens. However, this article was much more about "classical" ML-type work, e.g. training your own neural net, while the paradigm seems to be shifting towards zero- or few-shot learning, i.e. no more fine-tuning or even a complete training step involved.
First there was software of the Enigma machine variety, then assembly, then massive IBM machines running COBOL, then C, then Ruby/PHP/Python. We could also talk about how networking and persistence fundamentally changes software. Each of those is a big iteration in itself, probably just as big as moving to ML generated code.
I remember there is a great analogy for the era of the neural network. The neural network is a binary encoding of an algorithm/program. It could therefore be very efficient and simple. We will grow comfortable with it just like we are now comfortably familiar with the binary encoding of information (files).
It's an interesting take which makes total sense with stuff like self driving. But I have a hard time picturing this approach for making UIs, data models, cruds, gameplay, audio, graphic engines, etc.
Nothing is really a silver bullet so I guess the future of programming is really hybrid. Stuff like Github's Copilot.
Why do you think neural nets will be bad at making UIs, data models, cruds, gameplay, audio, graphic engines, etc? There are already some compelling examples of AIs making progress on those kinds of tasks.
I'm skeptical about ML being a "2.0" when we'll never stop using the 1.0 version altogether. But maybe he means it in the Python 2 -> 3 sense, where there's no real expectation that the older version will ever stop being used.
This raises the question: what data is going to be used as a training set? How do we make sure our black boxes don't become just echo chambers giving us answers we already know?
At the end of the day it is always: garbage in, garbage out.
So we have Web 2.0, thanks to which the Most Important Extension, Distill, is installed in my browser - because without it, my head hurts a lot from Web 2.0.
Ok.
I feel like I'll have to install something that will make life easier with Software 2.0.
Maybe I missed the quickstart, but is CUDA the next step for doing those matrix multiplications? Or do I start with PyTorch or TensorFlow? Or do I start by classifying lots of data to train the classifier?
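(My rough understanding: a framework like PyTorch handles the matrix multiplications and dispatches them to CUDA when a GPU is available, so CUDA mostly shows up as a device choice. Something like this sketch, which is illustrative rather than a recommended starting point:)

    # PyTorch does the matrix multiplications and hands them to CUDA kernels for
    # you when a GPU is present; the same code runs on CPU otherwise.
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(512, 512, device=device)
    b = torch.randn(512, 512, device=device)
    c = a @ b   # dispatched to GPU kernels when device == "cuda", CPU kernels otherwise
    print(c.shape, device)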
But all of these models are learned from code written by humans. We've nowhere near enough "training data" to be able to generate all the software needed moving forward.
I apologize for the confusion. In this context, the word happily is somewhat equivalent to "perform without reservation". The comment's sentence can be rewritten as, "ChatGPT will answer that for you."
The difference is a slight loss of emphasis: "happily" had been meant to show that ChatGPT doesn't require many prompts to convince the model to answer the question you had posed. The word wasn't used in the sense of ChatGPT experiencing emotions.
There’s a huge amount of pessimism and downright snark in the comments. I see a call to action to improve the state of the art of the tooling universe around ML, to make it even more broadly applicable, more understandable, and probably unlock huge economic value.
How about a more glass half full take on progress?
Machine Learning and Neural Networks are not applicable to every kind of problem. In fact, at Tesla AI Day Elon Musk remarked that "I discourage the use of machine learning... unless you have to use machine learning, don't do it" [1]. Getting machine learning to work right is hard. If it can be avoided, it should be.
So to cast this as Software 1.0 vs 2.0 doesn't make sense. There is a class of problems where neural networks work better. Everywhere else we will continue to use traditional code.
to be clear, i don't think that the current crop of llms can write better or even good enough software. it may well be that llms won't be the cornerstone of the future predicted in the article. still, there is a huge economic incentive to replace "manual" programmers, and some attempt is bound to succeed eventually.
Based on what I can achieve with GPT in a non-Python language as a junior junior developer, I really see almost no limit to what a CS-educated, experienced mid-level developer could achieve with GPT and Python experience.
Yes there is a lot of hand-holding and guardrailing but still, it's kind of insane.
my experience is that chatGPT in its current form is much more of a hindrance than help. it takes longer to achieve the same desired result than just writing it directly. it seems really good at refactoring type tasks (i.e. rename method, move this block to a different routine, etc.) but it needs real ide integration for that to be useful.
I read it as, people will direct AI to search for solutions, then use these refined solutions to search for more solutions. A bit like how we use libraries and packages, and improved languages, to enhance traditional programming practices.
Taking an extreme helicopter view I think I can see it, but on the other hand I'm not convinced it's a very interesting observation. Throughout history we used machines to make more complicated machines.
Edit with this quote:
> Because you only have to provide Software 1.0 implementation for a small number of the core computational primitives (e.g. matrix multiply), it is much easier to make various correctness/performance guarantees.
This is true, but it raises another question which is (to me) comically hand waved aside:
how do you make correctness guarantees of the output of the neural net? It's not addressed, probably because it's very hard to do so.
It's like the NAND gates inside CPUs and GPUs. Since they are such simple building blocks, they are very easy to verify.
It does not follow that the business logic I write to run on top of them is easy to verify.
The same goes for neural nets, but more so.
I'm not saying these new AI tools aren't useful, they are. But it's easy to misunderstand what they can do.
The "correctness" of neural networks is a vasy field of active research, with many prominent minds in deep learning as well as traditional CS theory working on it. We have many results for small scale networks, but I don't know of any results that can "prove" the correctness of an image classifier, for example. After some point, the correctness of such methods becomes very ill-defined, since unlike the normal programming world where everything is mostly in binary, here you will have to answer questions with some variation of "this is 98% likely" with no scope for 100% certainity.
They definitely don't. There's no way GPT-3/4 replaces software development, but it will save us some typing with fancy autocomplete and prose-ey documentation!