Hacker News
Trying to Understand Copilot's Type Spaghetti (rtpg.co)
77 points by mooreds on April 15, 2024 | hide | past | favorite | 83 comments


So, that code gets me thinking about premature optimization. In the "verschlimmbessern born of optimizing the wrong thing because you didn't use measurement to guide your efforts" sense (verschlimmbessern: German for making something worse by trying to improve it).

The rule of thumb I hear is that, in a mature product, reading and maintaining code takes about 10 times as much effort as writing it in the first place. I've never tried to measure this myself, but it doesn't seem to be wildly off from what I see at work, so let's go with it. Let's also assume, for the sake of argument, that these AI tools double the productivity of people writing new code. Or, equivalently, they halve the time it takes to write it. (This is a lot more than what I see people typically claim, but I'm trying to steelman and also it makes the math easier.)

Anyway, this would imply that, speaking purely in terms of raw productivity (not, say, security or correctness) the AI coding assistant is a net win when the code that's written using it is less than 5% more difficult to maintain.
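For what it's worth, the break-even arithmetic can be sketched out directly (the ratios are the assumed numbers above, not measurements):

```typescript
// Break-even sketch for the assumed numbers above (not measured data).
const MAINTAIN_RATIO = 10; // maintenance ~ 10x the effort of writing
const AI_SPEEDUP = 2;      // assume the assistant halves writing time

// Total effort in units of "time to write the code once, unassisted".
const baseline = 1 + MAINTAIN_RATIO; // 1 (write) + 10 (maintain) = 11

// With the assistant: writing is halved, maintenance grows by `penalty`.
const withAI = (penalty: number): number =>
  1 / AI_SPEEDUP + MAINTAIN_RATIO * (1 + penalty);

console.log(withAI(0.05) === baseline); // true: 0.5 + 10.5 = 11, the break-even
console.log(withAI(0.1) > baseline);    // true: at 10% harder to maintain, a net loss
```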

And I'm inclined to think that, if Copilot & friends are enabling more people to write more code that looks like what we see in this article (also not wildly off from what I see at work), then it's hard to see how the codebases where they are used could end up less than 5% more expensive to maintain.


Can you elaborate on this part? I am trying to understand.

> in a mature product, reading and maintaining code takes about 10 times as much effort as writing it in the first place

Wouldn't that imply that rewriting a mature codebase from scratch would take ten times less effort than reading and maintaining it?

I only saw two instances where (different) management approved a complete rewrite of a working product and in both cases it turned out to be a disaster, because it is easy to severely underestimate how much effort it will take to match the features, quality and performance of the old codebase. I suspect it is almost universally better to refactor the old codebase incrementally.

Based on that, I take that you mean something else and I didn't get your point.


Absolutely not.

If a line of code belongs in a project with one file and a main() function, reasoning about that line's impact on the overall code paths is trivial.

If that line of code belongs in a library procedure used by a million-LOC project, no such reasoning is possible unless you know the project's internals and tooling.

Rewriting entire systems or frameworks because one thinks that it's hard to implement a certain class of features is almost always a recipe for disaster.


OP said “writing it in the first place” not “rewriting”

But I agree that their numbers don’t really make sense. Software dev is hard to quantify like that.


> OP said “writing it in the first place” not “rewriting”

They did, but how is that significantly different?


Every line of code in the first implementation is there for a reason -- it has an implicit history, and implements a requirement. If that requirement is not captured elsewhere, then the prerequisite for /re/writing a system is to read the first implementation, understand the implicit requirements in the code, capture those requirements (or explicitly eliminate them if incorrect), and /then/ start writing version two. So /re/writing a system is a strict superset of both reading a previous implementation and implementing, and is thus harder than either.


> So /re/writing a system is a strict superset of both reading a previous implementation and implementing, and is thus harder than either.

As you said, every line in the old product had a reason to be. You can learn that reason in two ways: the way it was done in the first place, or by analyzing the old code. My argument is that the amount of effort required in both cases is in the same ballpark -- it will depend on a number of factors such as the quality and quantity of the documentation and testing in the old product, the availability of people who worked on the first product, etc.

So, no, rewriting a system is not automatically more work than writing the original one. But it is certainly not automatically less work either, and many people make the incorrect assumption that rewriting a system is going to take much less work than fixing the existing one.


True.

That said, in any non-trivial project, writing in the first place and rewriting aren't too different from each other in practice.

Assuming someone doesn't have the ability to maintain a consistent and always-accurate mental map of tens of thousands of lines of code, most of which they probably didn't write themself and some of which they might have never actually looked at before, they're going to very rapidly reach a point where most of the work involved in adding a new feature consists of reading existing code to understand how it behaves and how the new code needs to interact with it. So the expensive part of adding new code to a work in progress, and the expensive part of rewriting existing code, are more-or-less one and the same.


Very well put


From the original tweet linked in the post: "ceiling is being raised. cursor's copilot helped us write "superhuman code" for a critical feature. We can read this code, but VERY few engineers out there could write it from scratch."

I don't really agree that code is superhuman if VERY few are able to understand it haha..! Code should be complex but easy to follow to make it brilliant, in my opinion.


I think Kernighan said something along the lines of "Because debugging code is twice as hard as writing it, only write code half as smart as you are or you'll never be able to fix it later". AI-assisted code generators seems to make this problem much worse as I can now write code 2x, or 3x as smart as I am. What hope will there ever be in debugging this?

A more optimistic take is that maybe such tools will let us write competent code in languages we do NOT specialize in, and in the future either a more competent version of ourselves or some actual expert can fix it if it breaks? That doesn't sound a whole lot better :/


The optimistic take is a feedback loop that makes them both better and more capable: writing code you can't debug is bad, but it could help you understand and build your own skills to the point that you can.

The pessimistic and more likely outcome is that people just want shit done and so they will slap any half working garbo together as they have done for the last 20 years I have been in the industry.


And AI-assisted code is almost certainly code that AI can't debug...


What if I told you that one can write code again from scratch instead of fixing the broken one?

Writing replaceable code instead of maintainable code seems to be already working for a lot of projects. With LLMs and all the fast computing we have, it seems there will be more replaceable code in the future.

Of course there are always projects where it will never work.

It already works for infrastructure, as nowadays servers are not fixed and treated like important things; you just spin up a fresh one.


> Writing replaceable code instead of maintainable code seems to be already working for a lot of projects.

I contend that these two concepts aren't different. If you have the ability to easily replace a small part of the code and have everything still work, then that's very maintainable. Unless you are talking about throwing out the whole codebase and replacing it, which for sufficiently complex codebases will inevitably lead to the second system effect. If replacing the whole thing is easy, it was probably not that valuable or complex to begin with, but that's not the kind of code where maintainability is paramount.

> It already works for infrastructure, as nowadays servers are not fixed and treated like important things; you just spin up a fresh one

For that metaphor to work, the programmer would have to be patching the binary output instead of fixing the source code and recompiling.


> If replacing the whole thing is easy, it was probably not that valuable or complex to begin with, but that's not the kind of code where maintainability is paramount.

Yes I think even as far as 70%-80% of code - IMHO - is not complex/valuable.

Most code is replaceable CRUD, not control systems for flying and landing Falcon 9. So it is a people problem, not a technical problem: people pretend that they need "the button more to the left and a different shade" just to feel more important. That is how we end up with loads of systems that do slightly different things.


This is a very extreme take to something far more nuanced... It's possible for code to be complex and valuable without meeting the extremes of "is a literal rocket control system" and "is some buttons on a form".

When I say valuable, I mean provides value to a business. When I say complex, I mean not buttons on a form.


You spin up a fresh server from a cookie cutter image that lets you create a practically unlimited number of identical servers.

That's not even remotely like how rewriting code works. Rewriting code is more comparable to what spinning up a fresh server was like a quarter century ago. So, back in the days where they were important things because spinning up a new one was an unholy PITA and literally never went off without a hitch.


That’s not the claim. It’s well commented and formatted so actually quite readable. The claim is that very few could write it.

Though I would say that ‘very few’ is a larger group than they think - there are plenty of people doing metatype programming in TS; I’ve dabbled enough that given the problem I could probably tackle it and I know I learned from seeing others do it (because I am far from a typescript professional). So it’s not ‘superhuman’ if many of the humans who have found themselves wanting to work with the typescript type derivation model could have written it.

These capabilities - type ternaries and inferred type parameters - were put into TypeScript with a view that humans would use them.

The danger here is kidding yourself that this sort of code is beyond human.


I don't think that this is well commented. It explains what each line does, but it does not explain the overall technique being applied here and what it achieves. "Ah, if the parameter is optional I include undefined" is not useful. Instead what this wants is a block comment at the top explaining what this type achieves, how it should be used, and the strategy employed to construct this type.


> It’s well commented and formatted so actually quite readable.

Are you looking at the same code I'm looking at? The first block of code from the article?

I've seen more readable code in Perl Golf competitions.


That’s just how type programming looks.

Here’s (https://github.com/unional/type-plus/blob/main/packages/type...) a human written example that’s well-factored and uses loads of subtypes to clarify what it’s doing - but it’s still going to read like the black tongue of Mordor to you if you’re not familiar with how this kind of type stuff is structured and used.

And factoring all that stuff out may help readability but it doesn’t help comprehension - try and trace what the actual underlying type definitions for some of those utility types like IdentityEqual<> are actually doing (look at https://github.com/unional/type-plus/blob/main/packages/type...) and realize the rabbit hole runs deep in this stuff.


The fact that unclear code is common does not make it clear.

Single-letter variable names, ternary operators and long run-on lines, like salt, are delicious in small quantities. But when you're writing code so complicated you need multiple comments, within a single statement?

I appreciate that you've got to piss with the cock you've got. I've certainly done things with C++ template metaprogramming, with perl, with TLA+ and with bash that I'm not proud of. But the fact the tool forces me to write something unclear doesn't mean I'm not writing something unclear.


Can I ask more about the TLA+ spaghetti you had to write?


In the defence of TLA+ I was a complete newbie.

I mention it only because certain parts of the experience shared the maths-looking big-sequence-of-ands-and-ors one sometimes sees when dealing with complex type systems. TLA+, very sensibly, lets you break up complex statements into multiple smaller statements and give them names.

I was simulating the behaviour of two ends of a communication channel which had been designed with a bidirectional message number/acknowledge/timeout/resend mechanism. And both a hardware and a software state machine on each end managing the message queue in each direction.

The experience was in equal parts brilliant and frustrating. Brilliant because it did reveal faults in the design that probably couldn't have been found any other way. Frustrating because (for example) if you represent your timeout mechanism as a counter on each end of the link and which can only count from 0 to 2 the check will succeed in a few seconds. But if you change the timeout behaviour to count from 0 to 10 the check will take a lot longer. I don't know exactly how long because I aborted the run after a week. And there was no real indication of precisely why it'd stopped completing - I basically had to undo my changes one line at a time until I figured out what was going on.

TLA+ also seemed to have some sort of TeX integration. So a lot of the documentation is written in TeX - it's a fine way of writing math I'm sure. But sometimes the documentation told you how to do something and showed an example statement with a dot, yet you couldn't copy-and-paste the example from the documentation into your code, to express it in your source code you had to use something like \cdot or \circ instead.

Interesting stuff to try out for sure. And impressive given the size of the team that works on it, which I gather is tiny.


Good to know, thank you. I write a lot of teaching material for TLA+ and this will be helpful to think about.


> That’s just how type programming looks.

Then that is an argument to do less of it. Or at least to back off and learn to write better types that are not a mess.


TS's type language is unpleasant to write in, but it's Turing complete and not that different from C++ template metaprogramming. Writing complex statements in it is just like writing complex statements in any language: break them down into understandable parts and build those up.
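A toy illustration of that decomposition (my own example, not the thread's snippet): the same chained type-level ternary, first inline, then broken into named parts.

```typescript
// One chained ternary, the only conditional TS offers at the type level:
type Describe<T> =
  T extends string ? "string" :
  T extends number ? "number" :
  T extends readonly unknown[] ? "array" :
  "other";

// The same logic broken into named parts that read on their own:
type IfString<T, Then, Else> = T extends string ? Then : Else;
type IfNumber<T, Then, Else> = T extends number ? Then : Else;
type IfArray<T, Then, Else> = T extends readonly unknown[] ? Then : Else;

type Describe2<T> =
  IfString<T, "string",
    IfNumber<T, "number",
      IfArray<T, "array", "other">>>;

// Both resolve to the same results:
const a: Describe<number> = "number";
const b: Describe2<[1, 2]> = "array";
```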


[flagged]


Blog author, yes on being a PLT dork. It was mainly very funny to me how I _immediately_ knew what category of code the tweet's snippet was about. Just instantly knew this was a type spec translator. Seen 10 of them, seen them all.


I have `MaybeRequired` serving almost the exact same purpose! You are right it’s a common problem, a useful approach, and I just wish TS had better support for case analysis than chained ternaries.
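For anyone curious, a minimal sketch of what a `MaybeRequired`-style helper might look like (the name comes from the comment above; the body is a guess at the common pattern, not the commenter's actual code):

```typescript
// Hypothetical sketch: make a property required or optional depending on
// whether its type admits undefined.
type MaybeRequired<K extends string, T> =
  undefined extends T
    ? { [P in K]?: T }  // T admits undefined: the key becomes optional
    : { [P in K]: T };  // otherwise the key is required

type Id = MaybeRequired<"id", string>;                 // { id: string }
type Note = MaybeRequired<"note", string | undefined>; // { note?: string | undefined }

const withId: Id = { id: "x" };
const withoutNote: Note = {}; // compiles: `note` is optional
```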


Even an average programmer can understand it if they bothered to read the TypeScript documentation. “Superhuman” in this case means “has read the manual”.


That's also how you become an extreme outlier at Bash. Actually read the manual.


My favorite kind of manual to actually read is sprinkled with hijinks-enabling arcana, and the Bash manual doesn't disappoint on this front.


I read hundreds of the funny chat logs!


No average person would be happy if they saw that mess


Seems like the tweet is another AI-hype PR piece. Since Devin was making outlandish statements, Copilot can't fall far behind.


I agree! This is not superhuman code, this is machine code.


Technically they're saying they can read it but wouldn't be able to come up with it on their own.

Which is impressive; generally, reading code is considered harder than writing it, so in that sense it is inhuman.


ChatGPT gives me garbage code unless I ask it politely not to. No joke. Usually the first attempt is pure garbage and I have to call it out as such and then it’s like, “you’re right! Here’s the updated code”. No idea why it can basically never get it right the first time. I also find that it can be quite redundant and offer two distinct solutions morphed into one mutant answer which will turn the undiscerning 1x developer into a -10x developer. But hey, it still saves me time. Sometimes..


Are you using 4? I've had great luck with 4-Turbo and Opus


Maybe ChatGPT crawls Stack Overflow. Stack Overflow's #1 answer is old or just garbage; the second answer is better but not the most upvoted.

ChatGPT gives you #1, basically (with some small tweaks, but it is still garbage).

You tell it that's garbage.

It sees in the comments "why is this the upvoted answer, answer two is clearly better".

It returns answer two with a few tweaks.


Hi, CopilotKit CEO here (I wrote the original viral tweet). This article is great! Thanks for posting.

I'd also written an analysis of the code - including announcing a $1000 prize for the best alternative code: https://ai88.substack.com/p/ceiling-has-been-raised-analyzin...

We were going to announce the winner this week but if we get a few more submissions we will definitely consider them.

Just submit a PR to https://github.com/CopilotKit/CopilotKit


Could you help me understand which Copilots are involved in this?

Your tweet at https://twitter.com/ataiiam/status/1765089261374914957 mentions "Cursor's copilot".

The blog post at https://rtpg.co/2024/03/07/parsing-copilots-type-spaghetti/ talks about GitHub Copilot - did they make a mistake there?

And your product is CopilotKit - is that related to the GitHub and Cursor Copilots in some way or is it something different?


Hah, too many copilots… let me try to clarify:

We are building CopilotKit = a framework + platform for building context-aware AI assistants into any application (not necessarily coding related applications).

Part of CopilotKit is about giving the Copilot / AI agents access to the application through a typed "inline realtime API". And to make ergonomics great for _our_ users, CopilotKit ships with hardcore type programming.

Cursor's Copilot (unrelated to CopilotKit) helped us write this type code (that's what went viral). And yes, GitHub Copilot is a mis-attribution.


I'm going to make a completely fresh alternative written from the ground up. When I'm done you can all find it easily because it's called CoPilot, shouldn't be hard to find.


The people behind Google's chat product naming could really learn a lot from this situation


Indeed. For example, they could learn that Copilot is a winning name, so let's go with that.


You love swift ??


That's free training/fine-tuning material, right?


By the way, we practice what we preach: Copilots raise the bar (or the ceiling...) on human productivity in every domain. Which is why we're building infrastructure to make building copilots easier...

What ideas do you think are still missing from today's Copilots?

I.e. suppose we were looking back at today's Copilots 5 years from now- besides better models, what else has changed?


I've been noticing a steady uptick in increasingly complex types like this making it into libraries/@types packages in the definitely-typed repo, and I'm concerned. There are potentially severe performance implications for being too clever, especially since the TS compiler is written in TS. For example, recursive types can seriously bog down the compiler/checker. It doesn't take long to start hitting diminishing returns, especially in larger codebases. You either get perfect type checking while the TS language server uses 800% of your CPU, or you bite the bullet and supplement the lack of typing with unit tests. I think rewriting TS in a more performant language like Zig or Rust would alleviate this to some extent, but TS will still give you more than enough rope to hang yourself with.
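A small example of the kind of recursion in question (my own illustration, not from the article): the checker has to expand a type like this property by property, so deeply nested types multiply checking work across a codebase.

```typescript
// A typical recursive type: the checker must walk every property and
// array element to produce the readonly variant.
type DeepReadonly<T> =
  T extends (infer E)[] ? readonly DeepReadonly<E>[] :
  T extends object ? { readonly [K in keyof T]: DeepReadonly<T[K]> } :
  T;

const cfg: DeepReadonly<{ server: { ports: number[] } }> = {
  server: { ports: [80, 443] },
};

// cfg.server.ports.push(8080); // rejected: readonly number[] has no push
console.log(cfg.server.ports[0]); // 80
```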


Why are these seen as being difficult to write? It's a giant switch statement that recurses. This is less indicative of AI coming a long way and more of programmers never having worked on a program that stores types as data; this is the most common and rote pattern there is.


When I read the first sentence

> The other day this snippet of Typescript generated by Github copilot was floating around

I was wondering, did that happen on Twitter/X?

And I was not disappointed. That is the place where these people discuss things and take it for granted. Apparently, if you are not using Twitter, you are not part of the conversation.


We’re really doing LLM hermeneutics now.


The future is going to be a horrible place for people taking over legacy codebases. It was difficult enough with other people's spaghetti code, but being able to generate vast amounts of hard-to-decipher spaghetti code is only going to make this worse.

Sometimes you fall back to simpler types not because you couldn't write more complex ones, but because you realise that the next person who needs to make changes to this piece of code is going to have a bad time.


> It was difficult enough with other people's spaghetti code, but being able to generate vast amounts of hard-to-decipher spaghetti code is only going to make this worse

I've seen some legacy code and the kind of spaghetti humans are able to generate - especially with multiple layers of "I'll just throw in something to make feature X work/fix that bug" - is pretty bad already. I honestly doubt code generation will make things much worse.


I fear it will be a quantity issue more than a quality one.

Why refactor some code to make it extensible, when AI can write a whole new module from scratch?

Don't understand the 10 AI written classes? Leave them be and have AI write some more.

Once we get AI code deletion and refactoring I guess it will be ok, but generative models are a problem because code is a liability.


> generative models are a problem because code is a liability.

I wish I could upvote this many times. This is a major, major problem. AI may get your version 1 out the door faster. But when you try to modify it to create version 2...

But someone may ask, "How is this different from any other machine-generated code?" It's different because those tools were built very carefully to generate correct code, and the generated code usually did not have to be modified after it was generated. But when a generative AI is writing the bulk of your application code, neither of those conditions is true.


I spend most of my time with AI assistants simplifying complex code other humans, including me, have written.

They have made me much more efficient at refactoring spaghetti code and removing unnecessary complexity.

It all depends on how you use them. They can be a cause but also a solution to a problem that affects human codebases too.


I do think that people may let code be generated without really thoroughly checking every part of it and I see that as a problem.

That, in my opinion, is a new "untrusted input" attack vector, and it will be abused (probably even more easily than the recent xz backdoor happened), by either targeting the training data, using prompt injection, or in other ways.

The scary part about it is that any contributor/committer themselves is considered a "somewhat" trusted entity, but if it is assumed that everyone uses code-generation AI, that entity cannot be trusted anymore.

At the same time, since all the AI generated code has to be thoroughly checked/reviewed, the assumed productivity gains are somewhat diminished.

I think how it will actually play out though is that security concerns will just be largely ignored as it often goes.


The difference is in the working memory: a human's is much smaller than a computer's, and that leaves room for so much art (in the horror genre).


I think the difference is that such incidents were previously the exception in a codebase, in the future they will be the rule.


The first thing I see when I look at this is "Where are the unit tests?"

This is somewhere in the realm of "clever" or "efficient" code: it looks like it could be written in an easier-to-grok way as dozens of lines of if statements, but I assume there is a reason it wasn't (that is better than "just because"). AI-generated or not, someone or team is responsible for making sure the code/app/service is working for customers, and they have to be able to fix bugs, maintain and modify this.

Does it work today? As a reviewer (or coming across this while fixing a bug), I only know by either trying it, or spending quite a long time to understand and analyze it in my head. Unit tests are the easiest way of "trying it", with all the edge cases and then some.

What if I'm faced with modifying this in 6 or 12 months? Even if I wrote it, chances are my internal mental model is gone. I'd like some reasonable assurance I am not breaking anything.

Also, I'd like to not be the only one responsible for this forever, so I want to let other people modify it. Unit tests are the guard rails that let me say "Go ahead and do whatever you'd like, but don't break the tests".


You can scaffold up "unit tests", but honestly for type-level stuff you are working in a different space entirely. Your types are correctness proofs, so your underlying code is either typed correctly or not. There's not really a middle ground that unit tests catch.

Having said that, typescript's soundness issues make it easy to drive a truck through a certain kind of issue, but generally speaking if your type-level programming is no good you're not really going to be able to run unit tests, let alone validate them. Your code just won't go anywhere.


That's not really true; I see unit-tests for type-level stuff frequently. Here's some: https://github.com/RuyiLi/cursed-typescript/blob/master/type...


You can't test the function or endpoint from a business definition "does the right thing" perspective?


You cannot because if you're testing the type level wizardry, the failure case will not compile.


Maybe not checked in, but I've made "tests" before where the code really doesn't assert things, but whether it compiles or not is what I'm after.

It's far too easy in Typescript to do something where you actually lose your type safety without realizing. So I could for instance have some dummy code that calls foo.bar() but crashes in runtime, and my goal is to fix the typing throughout some generic functions so that it catches the mistake compile time.
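The compile-only pattern usually looks something like this (the `Equal` trick here is the one popularized by the type-challenges project; shown as a sketch, not this commenter's actual code):

```typescript
// "Tests" whose only assertion is that the file type-checks:
type Expect<T extends true> = T;
type Equal<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2)
    ? true
    : false;

// Each line fails to compile if the types drift; `tsc` is the test runner.
type _ReturnTypeWorks = Expect<Equal<ReturnType<() => number>, number>>;
type _UnionsAreUnordered = Expect<Equal<"a" | "b", "b" | "a">>;

// At runtime there is nothing to assert: compiling is passing.
const ok: Expect<Equal<1, 1>> = true;
```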


Test your functions and endpoints, nobody cares about the internal wizardry. That's an implementation detail.

There's a reason you coded the endpoint, and often times the business logic required is inelegant, self-contradictory and stupid. That's what has to work.


Type level unit tests are indeed super helpful, and in my experience they are easier to write than “real” unit tests, because mocking is trivial.


These tools can also generate tests, the OP just doesn't discuss it.


So unraveling AI generated code is the new programming?


If you are the #2 or later programmer on a project, unraveling human generated code is the old programming.


A big enough quantitative change is a qualitative change. There is a big difference between a bad programmer who banged on that code for three days before finally getting it to do what they wanted it to do, and never went back to try to minimize it and clean it up, and that same programmer pushing a prompt into X-GPT and getting that code in five minutes, then moving on to do it again and again and again dozens of times faster than before.

Like everyone else here of any experience I too have waded through gooey code that was impossible to discern any purpose or design in, because there really wasn't any, after everyone was done hacking on it. But the hacking was still bounded by human speeds.

Our only two options for a code base produced that way would be 1. discard it and start over or 2. hope that the next-generation AIs that aren't just LLMs are able to clean it up, since "automatically cleaning up LLM-generated code bases" is going to be a rather lucrative field. LLMs, no matter how much you hypetrophy them, aren't suitable for coding at scale, and they can't be. Their architecture is just wrong. But that's a claim I only make about LLMs, not AI in general.


Based on what you're saying, it seems like in the future people will only choose 1, not 2.


It's my current bet but I'm not excited about bounding the capabilities of future non-LLM-based AIs.


What is typing? Let's say you have no hands, no eyes, and cannot speak. Or, to speed this up: all you can do is communicate with foot taps (in great detail) to a human translator. Would we tell this person they are not programming?


Depends -- are the foot taps communicating a program or was the message for human consumption?

If self-modifying code executes in the forest but nobody is there to observe it, does it have an author?


Programming. All I'm trying to illustrate is that it's not the typing that makes the sausage, it's the ideas. Who cares how the code gets created? If the people involved care, they can provide input and assistance to make the final outcome exactly what they want.


What is your point?


They should have used Copilot to write the comment; might have been coherent then /s


Here you go:

Typing, in the broad sense, refers to the action of inputting information into a device, whether it's done via a keyboard, voice, or any other method. When you describe a scenario in which a person can only communicate through foot taps to a human translator, this still qualifies as a form of inputting information. The method of communication might be unconventional and require translation into a form understandable by a computer, but it remains a way of interacting with a device or system.

In the context of programming, what fundamentally matters is the ability to formulate logical instructions that a computer can execute. If a person can convey these instructions through foot taps, and these are then translated accurately into a programming language by a human or machine, this person is indeed programming.

Programming is defined by the cognitive process of solving problems and giving instructions, not necessarily by the physical act of typing these instructions in a conventional manner. Therefore, we would not tell this person that they are not programming; rather, they are programming using an alternative method of communication. This highlights the inclusive and adaptable nature of technology and programming, which can accommodate various methods of interaction to include individuals with different abilities.


Haha, it's my friend Atai's tweet, from https://CopilotKit.ai; he became a meme.



