I've seen the Bun Zig->Rust MR a few weeks ago when it was current. Now I'm seeing this, and I have to ask, since you're here:
Is there no way to make this changeset smaller?
At work, I've usually written large patches. I used to be worse at it. I was mentored out of it, and while I still like my patches to be complete, I balance that with the available bandwidth of the team and what the team can reasonably actually process.
For perspective, my "large patches" were PRs on the order of 10-12kLOC for relatively big features. I consider those to be on the upper end of what is reasonably reviewable by a small, non-dedicated team, and towards the upper limit of the kind of PR where I can speak for nearly every line of code, what it does, and why it's there.
On the other hand, now, LLMs are part of the equation, and they can (and often do) write code in insane volumes. They arguably tend towards extreme verbosity, without even talking about docs/markdown files. While LLMs are part of the workflow, my company, and those my friends work at, have all instituted policies of the developer attaching their name to the code ultimately being responsible for the output (which IMO is a lazy strategy, but I can't think of a much better one under the circumstances).
I cannot, personally, fathom how you can stand behind a single changeset spanning 2000 files and a quarter-million lines of diff. Do you consider this sustainable?
At this point the code bases are very quickly getting away from us in the open source community and even in proprietary code bases, and these are important code bases. Often very complex, often legacy. Who ultimately still owns these? Who's really going to be accountable if things go wrong?
It's easier to justify in a fast-moving greenfield code base with a verbose language... but I won't defend it. I've gotten better and I'm still getting better at breaking these things up.
I brought the 10-12kLOC PR up as an example supporting my point of view. I don't encourage the behaviour. Most of my PRs these days fall under the 1500LoC mark, tops -- maybe a bit more if it's a tricky component that needs a ton of tests.
10k in 2hrs is 1.5 lines of code per second for 2 hours straight without spending any time to make comments, think about what the code is doing, etc.
In pre-ai era that is just skimming and trusting the person who wrote it or the code changes are largely auto-generated or there exists an exceedingly simple test suite that is incredibly verbose.
Post-ai you are ruining your code base, I probably have to spend 3-5x longer reviewing ai generated code, the code they write tends to be too verbose, mediocre, filled with subtle bugs, adds unnecessary comments, etc. If someone gives me 10k loc pr it's a sure thing they've just let the ai run loose and I'd just tell them what they need to change in general terms instead of wasting days of my time reviewing junk.
> I was mentored out of it, and while I still like my patches to be complete, I balance that with the available bandwidth of the team and what the team can reasonably actually process.
I struggle with the same issue. In my experience you can't reduce the total number of lines. If the feature took 10k or 15k loc to deliver it, you aren't going to be able to reduce that meaningfully.
You can usually break it into stacked series of commits. New code can be split up into stand alone modules, which all compile and pass their unit tests. They could even be shipped, although they wouldn't be used because the changes to the UI are always the last piece of the puzzle. If you are refactoring, you can usually find a way to split the refactor into smaller steps, each building on the previous one. That is almost certainly possible in this case.
The issue with both approaches is while you can review each step independently, what you miss by looking at just that commit is the motivation for doing it. You can only get that from the big picture, and to get the big picture you need the entire 10k or 15k loc available.
That means you have to push the entire series of commits. If you want to make it plain they are individually reviewable you push them as stacked PRs. Either way, it's a 15k loc push.
I don't see a way out of that for the same reason neither bottom up nor top down design works on their own. You have two edges - the upper (often the UI), and the lower (the OS, standard library - things you have to use to get anything done). You work from both edges simultaneously, each working towards the other, hopefully so that when they meet in the middle and the two fit together nicely. The point is, you have to review like that too. You can't just look at how neatly the blocks are stacked on the foundations, you have to evaluate if they are taking the best route to the destination. The review can fail in both ways - the UI can be beautiful but it stands on a mess, or the code could have built up a beautiful series of abstractions that bubble through to the top level and ultimately confuse the end user. So you have to review the code from both perspectives, and to do that you ultimately need to get your head around all 15k loc.
This means a reviewer demanding they be spoon fed a few thousand lines of code at a time is being as unreasonable as the person delivering 15k loc in one commit. They are demanding a simple solution to their problem, and it is wrong. They should be demanding all 15k loc be delivered in the form the author intends to ship, but split into digestible commits that clearly explain the path and reasoning taken between the top and bottom edges, so both the top and bottom level designs are plainly visible.
What happens when I do that is I get into fights over forced pushes. Everyone hates them, and for good reason. They asked for a simple change in their review, and what they want to see is a small commit reflecting their request. Hiding that by doing a rebase is met with howls of pain: "no forced pushes!". So you insert your commit reflecting just the change they asked for into the stack of commits the large feature necessitated. Doing that rebases every commit that depends on it, of course. You push the result and are treated with a chorus of "NO FORCED PUSHES".
Forbidding all forced pushes makes about as much sense as forbidding a 15k loc change, even through its well-structured into commits. It makes me wonder if unis bother to teach modern software engineering practices.
Is there no way to make this changeset smaller?
At work, I've usually written large patches. I used to be worse at it. I was mentored out of it, and while I still like my patches to be complete, I balance that with the available bandwidth of the team and what the team can reasonably actually process.
For perspective, my "large patches" were PRs on the order of 10-12kLOC for relatively big features. I consider those to be on the upper end of what is reasonably reviewable by a small, non-dedicated team, and towards the upper limit of the kind of PR where I can speak for nearly every line of code, what it does, and why it's there.
On the other hand, now, LLMs are part of the equation, and they can (and often do) write code in insane volumes. They arguably tend towards extreme verbosity, without even talking about docs/markdown files. While LLMs are part of the workflow, my company, and those my friends work at, have all instituted policies of the developer attaching their name to the code ultimately being responsible for the output (which IMO is a lazy strategy, but I can't think of a much better one under the circumstances).
I cannot, personally, fathom how you can stand behind a single changeset spanning 2000 files and a quarter-million lines of diff. Do you consider this sustainable?
At this point the code bases are very quickly getting away from us in the open source community and even in proprietary code bases, and these are important code bases. Often very complex, often legacy. Who ultimately still owns these? Who's really going to be accountable if things go wrong?