So, that code gets me thinking about premature optimization. In the "verschlimmbessern born of optimizing the wrong thing because you didn't use measurement to guide your efforts" sense.
The rule of thumb I hear is that, in a mature product, reading and maintaining code takes about 10 times as much effort as writing it in the first place. I've never tried to measure this myself, but it doesn't seem to be wildly off from what I see at work, so let's go with it. Let's also assume, for the sake of argument, that these AI tools double the productivity of people writing new code. Or, equivalently, they halve the time it takes to write it. (This is a lot more than what I see people typically claim, but I'm trying to steelman and also it makes the math easier.)
Anyway, this would imply that, speaking purely in terms of raw productivity (not, say, security or correctness) the AI coding assistant is a net win when the code that's written using it is less than 5% more difficult to maintain.
And I'm inclined to think that, if Copilot & friends are enabling more people to write more code that looks like what we see in this article (also not wildly off from what I see at work), then it's hard to see how they could possibly be making codebases where they are used less than 5% more expensive to maintain.
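For concreteness, here is the arithmetic behind that 5% figure, with writing effort normalized to 1 and maintenance to 10 per the rule of thumb above:

```typescript
// Break-even arithmetic for the claim above, using the thread's numbers.
const writeEffort = 1;        // effort to write the code in the first place
const maintainEffort = 10;    // rule of thumb: maintenance costs ~10x writing
const baselineTotal = writeEffort + maintainEffort; // 11 units total

// The AI halves writing effort. Let maintenance grow by a factor of (1 + p);
// the assistant is a net win while 0.5 + 10 * (1 + p) < 11, i.e. p < 0.05.
const breakEven = (baselineTotal - writeEffort / 2) / maintainEffort - 1;

console.log(breakEven); // 0.05, i.e. maintenance may grow at most 5%
```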
Can you elaborate on this part? I am trying to understand.
> in a mature product, reading and maintaining code takes about 10 times as much effort as writing it in the first place
Wouldn't that imply that rewriting a mature codebase from scratch would take ten times less effort than reading and maintaining it?
I only saw two instances where (different) management approved a complete rewrite of a working product and in both cases it turned out to be a disaster, because it is easy to severely underestimate how much effort it will take to match the features, quality and performance of the old codebase. I suspect it is almost universally better to refactor the old codebase incrementally.
Based on that, I take it that you mean something else and I didn't get your point.
If a line of code belongs in a project with one file and a main() function, presuming that line's impact on the overall code paths is trivial.
If that line of code belongs in a library procedure used by a million-LOC project, no such presumption can be made unless you know the project's internals and tooling.
Rewriting entire systems or frameworks because one thinks that it's hard to implement a certain class of features is almost always a recipe for disaster.
Every line of code in the first implementation is there for a reason -- it has an implicit history, and implements a requirement. If that requirement is not captured elsewhere, then the prerequisite for /re/writing a system is to read the first implementation, understand the implicit requirements in the code, capture those requirements (or explicitly eliminate them if incorrect), and /then/ start writing version two. So /re/writing a system is a strict superset of both reading a previous implementation and implementing, and is thus harder than either.
> So /re/writing a system is a strict superset of both reading a previous implementation and implementing, and is thus harder than either.
As you said, every line in the old product had a reason to be. You can learn that reason in two ways: the way it was done in the first place, or by analyzing the old code. My argument is that the amount of effort required in both cases is in the same ballpark -- it will depend on a number of factors such as the quality and quantity of the documentation and testing in the old product, the availability of people who worked on the first product, etc.
So, no, rewriting a system is not automatically more work than writing the original one. But it is certainly not automatically less work either, and many people make the incorrect assumption that rewriting a system is going to take much less work than fixing the existing one.
That said, in any non-trivial project, writing in the first place and rewriting aren't too different from each other in practice.
Assuming someone doesn't have the ability to maintain a consistent and always-accurate mental map of tens of thousands of lines of code, most of which they probably didn't write themself and some of which they might have never actually looked at before, they're going to very rapidly reach a point where most of the work involved in adding a new feature consists of reading existing code to understand how it behaves and how the new code needs to interact with it. So the expensive part of adding new code to a work in progress, and the expensive part of rewriting existing code, are more-or-less one and the same.
from the original tweet linked in the post
"ceiling is being raised. cursor's copilot helped us write "superhuman code" for a critical feature. We can read this code, but VERY few engineers out there could write it from scratch."
I don't really agree that code is superhuman if VERY few are able to understand it, haha! Code should be complex but easy to follow to make it brilliant, in my opinion.
I think Kernighan said something along the lines of "Because debugging code is twice as hard as writing it, only write code half as smart as you are or you'll never be able to fix it later". AI-assisted code generators seem to make this problem much worse, as I can now write code 2x or 3x as smart as I am. What hope will there ever be of debugging this?
A more optimistic take is that maybe such tools will let us write competent code in languages we do NOT specialize in, and in the future either a more competent version of ourselves or some actual expert can fix it if it breaks? That doesn't sound a whole lot better :/
The optimistic take is a feedback loop that makes them both better and more capable, writing code you can't debug is bad, but it could help you understand and build your own skills to a point that you can.
The pessimistic and more likely outcome is that people just want shit done and so they will slap any half working garbo together as they have done for the last 20 years I have been in the industry.
What if I told you that one can write code from scratch again instead of fixing broken code?
Writing replaceable code instead of maintainable code seems to be already working for a lot of projects. With LLMs and all the fast computing we have, it seems there will be more replaceable code in the future.
Of course, there are always projects where it will never work.
It already works for infrastructure: nowadays servers are not fixed and treated like important things; you spin up a fresh one.
> Writing replaceable code instead of maintainable code seems to be already working for a lot of projects.
I contend that these two concepts aren't different. If you have the ability to easily replace a small part of the code and have everything still work, then that's very maintainable. Unless you are talking about throwing out the whole codebase and replacing it, which for sufficiently complex codebases will inevitably lead to the second system effect. If replacing the whole thing is easy, it was probably not that valuable or complex to begin with, but that's not the kind of code where maintainability is paramount.
> It already works for infrastructure: nowadays servers are not fixed and treated like important things; you spin up a fresh one.
For that metaphor to work, the programmer would have to be patching the binary output instead of fixing the source code and recompiling.
> If replacing the whole thing is easy, it was probably not that valuable or complex to begin with, but that's not the kind of code where maintainability is paramount.
Yes, I think as much as 70%-80% of code - IMHO - is not complex/valuable.
Most code is replaceable CRUD, not control systems for flying and landing Falcon 9.
So it is a people problem, not a technical problem: people pretend that they need "the button more to the left and a different shade" just to feel more important. That's how we end up with loads of systems that do slightly different things.
This is a very extreme take to something far more nuanced... It's possible for code to be complex and valuable without meeting the extremes of "is a literal rocket control system" and "is some buttons on a form".
When I say valuable, I mean provides value to a business. When I say complex, I mean not buttons on a form.
You spin up a fresh server from a cookie cutter image that lets you create a practically unlimited number of identical servers.
That's not even remotely like how rewriting code works. Rewriting code is more comparable to what spinning up a fresh server was like a quarter century ago. So, back in the days where they were important things because spinning up a new one was an unholy PITA and literally never went off without a hitch.
That’s not the claim. It’s well commented and formatted, so actually quite readable. The claim is that very few could write it.
Though I would say that ‘very few’ is a larger group than they think - there are plenty of people doing type-level metaprogramming in TS. I’ve dabbled enough that, given the problem, I could probably tackle it, and I know I learned from seeing others do it (because I am far from a TypeScript professional). So it’s not ‘superhuman’ if many of the humans who have found themselves wanting to work with TypeScript’s type-derivation model could have written it.
These capabilities - type ternaries and inferred type parameters - were put into TypeScript with a view that humans would use them.
The danger here is kidding yourself that this sort of code is beyond human.
I don't think that this is well commented. It explains what each line does, but it does not explain the overall technique being applied here and what it achieves. "Ah, if the parameter is optional I include undefined" is not useful. Instead what this wants is a block comment at the top explaining what this type achieves, how it should be used, and the strategy employed to construct this type.
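To illustrate the kind of block comment I mean, here is a hypothetical utility type documented that way; `WidenOptional` and its details are my own invention, not the snippet from the tweet:

```typescript
/**
 * WidenOptional<T>
 *
 * What it achieves: a version of T in which every optional property
 * explicitly includes `undefined` in its type, so "if the parameter is
 * optional, include undefined" is stated once, here, instead of being
 * re-derived line by line at every use site.
 *
 * Strategy: map over the keys of T; the `{} extends Pick<T, K>` test
 * detects optional keys, and only those have `| undefined` added.
 */
type WidenOptional<T> = {
  [K in keyof T]: {} extends Pick<T, K> ? T[K] | undefined : T[K];
};

interface Config {
  name: string;     // required: stays `string`
  retries?: number; // optional: becomes `number | undefined`
}

// `retries: undefined` is now an explicitly legal value.
const c: WidenOptional<Config> = { name: "job", retries: undefined };
```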
Here’s (https://github.com/unional/type-plus/blob/main/packages/type...) a human written example that’s well-factored and uses loads of subtypes to clarify what it’s doing - but it’s still going to read like the black tongue of Mordor to you if you’re not familiar with how this kind of type stuff is structured and used.
And factoring all that stuff out may help readability but it doesn’t help comprehension - try and trace what the actual underlying type definitions for some of those utility types like IdentityEqual<> are actually doing (look at https://github.com/unional/type-plus/blob/main/packages/type...) and realize the rabbit hole runs deep in this stuff.
The fact that unclear code is common does not make it clear.
Single-letter variable names, ternary operators and long run-on lines, like salt, are delicious in small quantities. But when you're writing code so complicated you need multiple comments, within a single statement?
I appreciate that you've got to piss with the cock you've got. I've certainly done things with C++ template metaprogramming, with perl, with TLA+ and with bash that I'm not proud of. But the fact the tool forces me to write something unclear doesn't mean I'm not writing something unclear.
I mention it only because certain parts of the experience shared the maths-looking big-sequence-of-ands-and-ors one sometimes sees when dealing with complex type systems. TLA+, very sensibly, lets you break up complex statements into multiple smaller statements and give them names.
I was simulating the behaviour of two ends of a communication channel which had been designed with a bidirectional message number/acknowledge/timeout/resend mechanism. And both a hardware and a software state machine on each end managing the message queue in each direction.
The experience was in equal parts brilliant and frustrating. Brilliant because it did reveal faults in the design that probably couldn't have been found any other way. Frustrating because (for example) if you represent your timeout mechanism as a counter on each end of the link and which can only count from 0 to 2 the check will succeed in a few seconds. But if you change the timeout behaviour to count from 0 to 10 the check will take a lot longer. I don't know exactly how long because I aborted the run after a week. And there was no real indication of precisely why it'd stopped completing - I basically had to undo my changes one line at a time until I figured out what was going on.
TLA+ also seemed to have some sort of TeX integration. So a lot of the documentation is written in TeX - it's a fine way of writing math I'm sure. But sometimes the documentation told you how to do something and showed an example statement with a dot, yet you couldn't copy-and-paste the example from the documentation into your code, to express it in your source code you had to use something like \cdot or \circ instead.
Interesting stuff to try out for sure. And impressive given the size of the team that works on it, which I gather is tiny.
TS’s type language is unpleasant to write in, but it’s Turing complete and not that different from C++ template metaprogramming. Writing complex statements in it is just like writing complex statements in any language - break them down into understandable parts and build those up.
Blog author, yes on being a PLT dork. It was mainly very funny to me how I _immediately_ knew what category of code the tweet's snippet was about. Just instantly knew this was a type spec translator. Seen 10 of them, seen them all.
I have `MaybeRequired` serving almost the exact same purpose! You are right it’s a common problem, a useful approach, and I just wish TS had better support for case analysis than chained ternaries.
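A minimal sketch of what such a type can look like (the name `MaybeRequired` is from my comment above, but this particular implementation is illustrative, not my production code):

```typescript
// Hypothetical MaybeRequired: flip a property set between required and
// optional based on a boolean type parameter, in the chained-ternary
// (conditional type) style the thread is discussing.
type MaybeRequired<T, Strict extends boolean> =
  Strict extends true ? Required<T> : Partial<T>;

interface Opts {
  host?: string;
  port?: number;
}

// With the flag true, every field must be supplied;
const strict: MaybeRequired<Opts, true> = { host: "localhost", port: 8080 };
// with the flag false, every field may be omitted.
const loose: MaybeRequired<Opts, false> = {};
```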
Even an average programmer can understand it if they bothered to read the TypeScript documentation. “Superhuman” in this case means “has read the manual”.
ChatGPT gives me garbage code unless I ask it politely not to. No joke. Usually the first attempt is pure garbage and I have to call it out as such and then it’s like, “you’re right! Here’s the updated code”. No idea why it can basically never get it right the first time. I also find that it can be quite redundant and offer two distinct solutions morphed into one mutant answer which will turn the undiscerning 1x developer into a -10x developer. But hey, it still saves me time. Sometimes..
We are building CopilotKit = a framework + platform for building context-aware AI assistants into any application (not necessarily coding related applications).
Part of CopilotKit is about giving the Copilot / AI agents access to the application through a typed "inline realtime API". And to make ergonomics great for _our_ users, CopilotKit ships with hardcore type programming.
Cursor's Copilot (unrelated to CopilotKit) helped us write this type code (that's what went viral). And yes, GitHub Copilot is a mis-attribution.
I'm going to make a completely fresh alternative written from the ground up. When I'm done you can all find it easily because it's called CoPilot, shouldn't be hard to find.
By the way - we practice what we preach-
Copilots raise the bar (or the ceiling...) on human productivity - in every domain. Which is why we're building infrastructure to make building copilots easier...
What ideas do you think are still missing from today's Copilots?
I.e. suppose we were looking back at today's Copilots 5 years from now- besides better models, what else has changed?
I've been noticing a steady uptick in increasingly complex types like this making it into libraries/@types packages in the definitely-typed repo, and I'm concerned. There are potentially severe performance implications for being too clever, especially since the TS compiler is written in TS. For example, recursive types can seriously bog down the compiler/checker. It doesn't take long to start hitting diminishing returns, especially in larger codebases. You either get perfect type checking while the TS language server uses 800% of your CPU, or you bite the bullet and supplement the lack of typing with unit tests. I think rewriting TS in a more performant language like Zig or Rust would alleviate this to some extent, but TS will still give you more than enough rope to hang yourself with.
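As a toy illustration of the recursion cost (my own example, not from any @types package): type-level arithmetic via tuple lengths forces one instantiation per step, which is exactly the kind of work that piles up on the checker:

```typescript
// Build a tuple of length N entirely at the type level; each step is
// another instantiation the checker has to evaluate, and deep enough
// recursion hits the compiler's instantiation-depth limit.
type BuildTuple<N extends number, Acc extends unknown[] = []> =
  Acc["length"] extends N ? Acc : BuildTuple<N, [...Acc, unknown]>;

// Addition as tuple concatenation: cute, but every use is checker work.
type Add<A extends number, B extends number> =
  [...BuildTuple<A>, ...BuildTuple<B>]["length"] & number;

// Only the literal 5 is assignable here; the arithmetic happened at
// compile time, inside the type checker.
const five: Add<2, 3> = 5;
```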
Why are these seen as being difficult to write? It's a giant switch statement that recurses. This is less indicative of AI having come a long way and more of programmers never having worked on a program that stores types as data, this being the most common and rote pattern that exists.
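For readers who haven't seen it, the rote pattern I mean is roughly this (names are illustrative, not from the tweet's snippet): a type spec stored as data, and one recursive conditional type that switches on its tag:

```typescript
// The spec: types represented as plain data with a discriminating tag.
type Spec =
  | { kind: "string" }
  | { kind: "number" }
  | { kind: "array"; of: Spec }
  | { kind: "object"; fields: Record<string, Spec> };

// The "giant switch statement that recurses": one conditional type per
// tag, recursing into the nested specs.
type FromSpec<S extends Spec> =
  S extends { kind: "string" } ? string :
  S extends { kind: "number" } ? number :
  S extends { kind: "array"; of: infer E extends Spec } ? FromSpec<E>[] :
  S extends { kind: "object"; fields: infer F extends Record<string, Spec> }
    ? { [K in keyof F]: FromSpec<F[K]> }
    : never;

// The recursion bottoms out at the primitive kinds:
type User = FromSpec<{
  kind: "object";
  fields: {
    name: { kind: "string" };
    scores: { kind: "array"; of: { kind: "number" } };
  };
}>;

// User is { name: string; scores: number[] }.
const u: User = { name: "ada", scores: [1, 2, 3] };
```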
> The other day this snippet of Typescript generated by Github copilot was floating around
I was wondering, did that happen on Twitter/X?
And I was not disappointed. That is the place where these people discuss things and take it for granted. Apparently, if you are not using Twitter, you are not part of the conversation.
The future is going to be a horrible place for people taking over legacy codebases. It was difficult enough with other people's spaghetti code, but being able to generate vast amounts of hard-to-decipher spaghetti code is only going to make this horrible.
Sometimes you fall back to simpler types not because you couldn't write more complex ones, but because you realise that the next person who needs to make changes to this piece of code is going to have a bad time.
> It was difficult enough with other people's spaghetti code, but being able to generate vast amounts of hard-to-decipher spaghetti code is only going to make this horrible
I've seen some legacy code and the kind of spaghetti humans are able to generate - especially with multiple layers of "I'll just throw in something to make feature X work/fix that bug" - is pretty bad already. I honestly doubt code generation will make things much worse.
> generative models are a problem because code is a liability.
I wish I could upvote this many times. This is a major, major problem. AI may get your version 1 out the door faster. But when you try to modify it to create version 2...
But someone may ask, "How is this different from any other machine-generated code?" It's different because those tools were built very carefully to generate correct code, and the generated code usually did not have to be modified after it was generated. But when a generative AI is writing the bulk of your application code, neither of those conditions is true.
I do think that people may let code be generated without really thoroughly checking every part of it and I see that as a problem.
That, in my opinion, is a new "untrusted input" attack vector, and it will be abused (probably even more easily than how the recent xz backdoor happened) by either targeting the training data, using prompt injection, or in other ways.
The scary part about it is that any contributor/committer themselves is considered a "somewhat" trusted entity, but if it is assumed that everyone uses code-generation AI, that entity cannot be trusted anymore.
At the same time, since all the AI generated code has to be thoroughly checked/reviewed, the assumed productivity gains are somewhat diminished.
I think how it will actually play out though is that security concerns will just be largely ignored as it often goes.
The first thing I see when I look at this is "Where are the unit tests?"
This is somewhere in the realm of "clever" or "efficient" code: it looks like it could be written in an easier-to-grok way as dozens of lines of if statements, but I assume there is a reason it wasn't (that is better than "just because"). AI-generated or not, someone or team is responsible for making sure the code/app/service is working for customers, and they have to be able to fix bugs, maintain and modify this.
Does it work today? As a reviewer (or coming across this while fixing a bug), I only know by either trying it, or spending quite a long time to understand and analyze it in my head. Unit tests are the easiest way of "trying it", with all the edge cases and then some.
What if I'm faced with modifying this in 6 or 12 months? Even if I wrote it, chances are my internal mental model is gone. I'd like some reasonable assurance I am not breaking anything.
Also, I'd like to not be the only one responsible for this forever, so I want to let other people modify it. Unit tests are the guard rails that let me say "Go ahead and do whatever you'd like, but don't break the tests".
You can scaffold up "unit tests", but honestly for type-level stuff you are working in a different space entirely. Your types are correctness proofs, so your underlying code is either typed correctly or not. There's not really a middle ground that unit tests catch.
Having said that, typescript's soundness issues make it easy to drive a truck through a certain kind of issue, but generally speaking if your type-level programming is no good you're not really going to be able to run unit tests, let alone validate them. Your code just won't go anywhere.
Maybe not checked in, but I've made "tests" before where the code really doesn't assert things, but whether it compiles or not is what I'm after.
It's far too easy in Typescript to do something where you actually lose your type safety without realizing. So I could for instance have some dummy code that calls foo.bar() but crashes in runtime, and my goal is to fix the typing throughout some generic functions so that it catches the mistake compile time.
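A small sketch of that style of compile-only test, using the `Expect`/`Equal` pattern common in the community (e.g. in type-challenges) rather than anything built into TS:

```typescript
// Equal<A, B> is true only if A and B are identical types; Expect<T>
// only compiles when T is true. Together they turn the compiler into
// the test runner: break the typing and the file stops compiling.
type Equal<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2)
    ? true
    : false;
type Expect<T extends true> = T;

function first<T>(xs: readonly T[]): T | undefined {
  return xs[0];
}

// Compile-time assertion: nothing here runs, it only has to type-check.
// If `first` ever loses the `| undefined` in its return type, this line
// becomes a compile error.
type _returnIsWidened = Expect<
  Equal<ReturnType<typeof first<number>>, number | undefined>
>;

const head = first([1, 2, 3]);
```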
Test your functions and endpoints, nobody cares about the internal wizardry. That's an implementation detail.
There's a reason you coded the endpoint, and often times the business logic required is inelegant, self-contradictory and stupid. That's what has to work.
A big enough quantitative change is a qualitative change. There is a big difference between a bad programmer who banged on that code for three days before finally getting it to do what they wanted it to do, and never went back to try to minimize it and clean it up, and that same programmer pushing a prompt into X-GPT and getting that code in five minutes, then moving on to do it again and again and again dozens of times faster than before.
Like everyone else here of any experience I too have waded through gooey code that was impossible to discern any purpose or design in, because there really wasn't any, after everyone was done hacking on it. But the hacking was still bounded by human speeds.
Our only two options for a code base produced that way would be 1. discard it and start over or 2. hope that the next-generation AIs that aren't just LLMs are able to clean it up, since "automatically cleaning up LLM-generated code bases" is going to be a rather lucrative field. LLMs, no matter how much you hypetrophy them, aren't suitable for coding at scale, and they can't be. Their architecture is just wrong. But that's a claim I only make about LLMs, not AI in general.
What is typing? Let's say you have no hands, no eyes, and cannot speak - or, to speed this up: all you can do is communicate with foot taps (in great detail) to a human translator. Would we tell this person they are not programming?
Programming. All I'm trying to illustrate is that it's not the typing that makes the sausage, it's the ideas. Who cares how the code gets created? If the people involved care, they can provide input and assistance to make the final outcome exactly what they want.
Typing, in the broad sense, refers to the action of inputting information into a device, whether it's done via a keyboard, voice, or any other method. When you describe a scenario in which a person can only communicate through foot taps to a human translator, this still qualifies as a form of inputting information. The method of communication might be unconventional and require translation into a form understandable by a computer, but it remains a way of interacting with a device or system.
In the context of programming, what fundamentally matters is the ability to formulate logical instructions that a computer can execute. If a person can convey these instructions through foot taps, and these are then translated accurately into a programming language by a human or machine, this person is indeed programming.
Programming is defined by the cognitive process of solving problems and giving instructions, not necessarily by the physical act of typing these instructions in a conventional manner. Therefore, we would not tell this person that they are not programming; rather, they are programming using an alternative method of communication. This highlights the inclusive and adaptable nature of technology and programming, which can accommodate various methods of interaction to include individuals with different abilities.