Claim: the ideal PR is 50 lines long (graphite.dev)
84 points by samscully 4 months ago | 124 comments



I have found that teams who allow larger PR sizes without complaining about it can get a lot done a lot faster. Which is to say, 50-line PRs are not universally ideal; they sit somewhere on the spectrum of acceptable but come with definite tradeoffs. Reviewing and merging 10x60 line PRs is, in my experience, more time intensive than reviewing one 600 line PR.

The tests and new functionality alone in any of our PRs well exceed 50 lines. Indeed we have 2-3x more test code than actual code. So should we be writing fewer tests? Merging the code first and the tests later? Merging the broken, codeless tests first and then the code?

It’s all ridiculous. Just make PRs that represent unit change, whatever that means to you. The unit of functionality is independent of lines of code. Sometimes that is 12 lines, sometimes it’s 800. Yeah, large PRs are harder to understand, but that’s why you have tests. Also, XXL PRs aren’t usually happening every day. If they are, you have a very productive team and maybe you should count yourself lucky.


But why does it take longer for 10x60 line PRs?

If it's because those PRs are getting reviewed more thoroughly, that's likely a great use of time.

If it's because the small PRs are getting bikeshedded, it's not.

If it's because of tooling, then the OP at graphite.dev has something to sell you. I'd be buying but we're a GitLab shop.

> Indeed we have 2-3x more test code than actual code. So should we be writing less tests? Merging the code first and later the tests? Merging the broken, codeless tests first and then the code?

I find that test code is rarely reviewed properly. If you want it reviewed properly, then create MRs that are just tests. You can create an MR with the change and happy-path code, and then subsequent MRs with each of the rest of the tests.

If you're fine with tests not getting the same level of attention (and that's likely OK), then tests don't count as part of the 50 lines, IMO.


It's because every time I switch from coding to reviewing, there's a context switch, and every time I switch back from reviewing to coding, there's another context switch. I can easily spend 30 minutes getting back into the flow of coding after doing something else.

One PR has 1 context switch. 10 PRs have 10.

To say nothing of the cascading effects of CR on early PRs. Oh, you want me to rename a variable that I used in the first PR? Looks like I'm spending the next 10 minutes combing through my other 9 diffs updating names. That would have been a 5 second refactor if I only had a single monolithic PR.


Yeah—on the other hand, I’ve gotten PRs that are just too damn big and that means that I can’t review them without sitting down and getting into the flow just for that review. What I end up doing is, after some false starts, getting myself a block of time to review the massive PR.

Let’s say I just finished up some change I’m working on, and I take a look at my coworker’s PR. It’s 11:45, and I’m heading to lunch in 15 minutes. If it’s a 10-150 line PR, I can probably review it within 15 minutes, and maybe I can review six or eight PRs like that in a day, when I stand up to get coffee or something like that.

If I see some 700-line PR coming through, I have to allocate some focus time to that, just as if I were programming. It’s a burden.


I totally agree, and this was a point of conflict between my last manager and me. On one hand, he'd be on my case for not getting some coding done, and on the other he'd be on my case for not jumping on a massive PR right away. Part of the problem, I think, was that we had a very bottlenecked QA process and decoupled feedback systems for reviewing and discussing.

I'd get a large PR and some smaller ones on GitHub that would likely have tests included, so I'd have to check those too, while also maintaining a conversation over on Jira about some other ticket and trying to actually get code written. There was no concept of iteration because every unit of change needed to simultaneously be broken up by 2 week sprint AND be completely unit tested and flawless before asking someone to review it or getting it merged. Meanwhile, I'd never know when someone updated their PR or Jira thread after discussing unless I manually checked or always kept the browser tab open.

So I'd take a full morning to review a PR, then respond to things on Jira, and if I'm lucky, write some code. Eventually the PR would be updated but I'd forget to check because there wasn't a good system in-place to manage those notifications, and when I did it'd be another massive change that I'd need to take quite a lot of time to check.

So I started using Draft PRs for bigger or more complex changes, explicitly noting what wasn't functioning or written yet, which provided a lower-pressure way to get early feedback instead of trying to complete everything up front, and updates to those PRs would be more incremental.


I've found that if a PR is sufficiently small then reviewing it doesn't entail quite as severe of a context switch as reviewing a larger PR. Something about it being shorter means I can keep some of the coding context fresh in my head so I can switch back to it quicker (whereas for a longer PR I need to jettison that context in order to comprehend the whole PR).


Not the OP, but in my experience, it's because those 60 line PRs lack a lot of context, and it's not until you see the cumulative changes across those 10 60-line PRs that you really understand how they come together to form a feature--and are able to review them within that context. So instead of having one 600 line PR in which the whole picture is clear, you have to go back and forth between 10 different PRs and comparing notes between them to make sure you really understand any of them.

I still think 600 is a lot, but there's a happy medium between "minimal change that excludes broader context" and "monster PR that has all context, all changes, and eats left-pad packages for breakfast". Line count shouldn't be the goal. The goal should be making as small a change as possible that is self-contained (i.e. all context is either within the PR or already in the codebase), and (IMO) that can be deployed to production as-is.


100%. It's missing the forest for the trees.

You can focus on those 10 60 line PRs and completely miss the bigger context. Not to mention these PRs can be reviewed by different teammates, which can easily lead to missed issues within the whole scope of the change.

As always, there's no one right answer. Sometimes yeah, those 10 PRs will be better. Sometimes one or two PRs will facilitate and deliver a much better final solution with fewer bugs.

It really bothers me how these articles/companies try to claim "ideal" scenarios completely devoid of context. That's how bad managers will use your PR sizes to say you're doing a "bad job" despite not knowing a single thing about coding.


An ex-manager (ex-Google, too) insisted we move from GitLab to Gerrit, and mandated that all commits be a maximum of 5 lines, ideally fewer.

This led to a complete loss of all context; without that overview, you might very well be releasing code faster, but it could be the wrong solution to the original problem - and crucially, you won't know until it's out in the wild.

A preference for smaller, more focused PRs is fine - it certainly makes them easier to review independently - but I think putting any sort of limit on the size makes your team more likely to omit things like tests and also to not have sufficient context to understand the overall problem space.


> mandated that all commits be a maximum of 5 lines, ideally fewer

Yeesh. How would you ... rename anything? Imagine a linter rule which prevents referencing a given name on more than 5 different lines of code.


Many languages allow an alias. Which means you change the name and then alias the old name to the new one - a 2-line PR. Then a bunch of one-line PRs to change each use of the old name. Finally a PR to remove the alias.
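
A minimal sketch of that alias step (TypeScript here, hypothetical names; the same idea works in most languages that let you rebind or re-export a name):

  // PR 1 (the 2-line change): introduce the new name, keep the old one as an alias.
  export function fetchUserProfile(id: string): string {
    return `profile:${id}`; // stand-in body for the sketch
  }
  export const getUser = fetchUserProfile; // deprecated alias, deleted in the final PR

  // PRs 2..N (1 line each): switch call sites from getUser(id) to fetchUserProfile(id).
  // Final PR: remove the getUser alias once nothing references it.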

Which is why I don't have such a rule and would oppose it.


The attitude might have had something to do with the individual's SRE background. In that context, I understand the desire to ship small changesets that have a limited blast radius.

It's just not really practical for actually building software.


5????

Please tell us more. Did 6 line commits get rejected? How did anything get done?


Well, things did get done, but not the right things.

The rule wasn't hard and fast, but you had to have a damn good reason why you wanted to add bigger commits. Protobuf definitions didn't count, so you could get away with doing longer commits for those.

Tests were discouraged anyway ("Google just tests in production", we were told), so there wasn't a lot of that sort of thing.


>"Google just tests in production", we were told

Well, that sadly explains a ton of little quirks I've seen in all sorts of Google software over the years. The Chromecast functionality is especially egregious, and it feels like connecting/disconnecting does a different thing each time.


This must be a typo - 50 or 500 lines maybe?

...or satire?


I wish it had been satire. This was fundamentally one of the worst parts of my coding career, and it was disappointing because the company had been amazing to work for prior to this individual's arrival.

Post-arrival, I was ordered to rewrite a working, stable PHP web app into Go ("Google doesn't use PHP, because it doesn't scale"). We weren't allowed to write RESTful APIs any more, everything had to use Protobuf.

And there was a physical fight in the office one day between two of the developers, that culminated in one of them storming out and never coming back.


I should probably also point out that the "improved" architecture turned out to be corrupting customer data for more than 6 months... which led to termination of that individual.

I'd already left by that point, after a heated conversation in which I was told "you're not half as good a software engineer as you think you are". Well, duh.

Amusingly, the PHP app is still running, still working.


That each MR be deployable is a great guideline and should trump any sort of line-count guideline. It's possible that an MR could be deployable but lack context for the bigger change, but it's less likely.


"I have found that teams who allow larger PR sizes without complaining about it can get a lot done a lot faster".

Sure.

How many times the code is touched after the PR has been merged is a better question.

Faster is not better. Better is not faster.


Weirdly, what the article claims is that it's the ideal size because of speed. So they are trying to say it's faster, therefore it's better.

Which is in my view a very silly claim devoid of any context.


The article also claims it's the ideal size because it minimizes reverts and ends up with PRs getting more thoroughly reviewed (based on # of comments). I think that makes for a more compelling case.


Not unless we know what those changes are.

Guess what, all typo-fix commits are <50 lines and never get reverted.


I wish it were never; I've definitely been burned by a thinko in a supposed "typo fix". But I'll give you "rarely".


Haha, that's a fair point. Cries in dynamic languages

I've definitely had code with typos consistently replicated due to auto completion, then had to fix it all later on, haha.


I couldn't disagree more. My current team has a history of large PRs and everything was a dumpster fire until we started really hammering down on PR size. Prod broke all the time, reverting code was a nightmare because of how big the "units" were, code review was not thorough, problems problems problems.

> large PRs are harder to understand but that’s why you have tests

I'd rather have comprehensible, well-reviewed code than test coverage (I'd rather have both, of course). We've got a mission critical module that is a crusty piece of junk that everyone is scared to change because of how much of a mess it is, even with its amazing test coverage (frankly the code is so bad that I assume the tests are just as bad anyway).

> XXL PRs aren’t usually happening every day. If they are, you have a very productive team and maybe you should count yourself lucky

I've never seen an XXL PR that improved productivity. They pile up tech debt, bugs, and down time. I don't care how fast the devs are moving if they bring prod down every week. That is absolutely unacceptable.

On a team of superstar, perfect developers that all understand the code base perfectly, I'd probably agree with you. If that's your situation then I envy you :) But I've never been on such a team.


> everyone is scared to change because of how much of a mess it is

As a rule, messy code is way easier to clean up in one big chunk. Teams with a CR size limit prefer not to touch such nests.


My team has a history of large (and small) PRs. If it's a huge feature / vertical, it's often done in a few large PRs. Very few reverts in our 5 year history.

I guess, what I'm saying is, it depends on the team, on their seniority and capabilities, on the product, the culture, etc.


Measuring the total size of the PR seems to me to be missing a dimension, namely the commit size. What if a 600-line PR consists of 10 60-line reviewable commits? Which, if not for GitHub’s broken review UI, is what I would hope is the goal for such a thing. Then you have a reasonable amount of code to look at per commit, and all in context of the larger overall change.


Refactorings and redesigns could easily span 1k lines even for small projects. Yes, you could almost always make it incremental, but the total line count would increase and it requires hard planning beforehand. Too much fuss.


In reply to:

> Reviewing and merging 10x60 line PRs is, in my experience, more time intensive than reviewing one 600 line PR.

The article disagrees, and the article has data:

> Some folks might wonder if writing small PRs leads to less code output in total. We all want to be high velocity, but there are times when you need to be high volume. Consistently writing sub-20 line PRs will have a significant impact on your net coding ability - but interestingly, so will writing PRs greater than 100+ lines. The highest volume coders and repositories have a median change size of only 40-80 lines. I suspect this is because the volume of code is change-size * speed of changes. Too small of changes, and the faster merge time doesn't make up for it. Too large, and you start getting weighted down by slower review and merge cycles.

I agree that you shouldn't be dogmatic about PR size - there's an art to engineering as much as it's a science. But part of that art might also be recognizing that "unit change, whatever that means to you" is - as you suggest - a flexible concept. The takeaway for me here is, given what the data shows, when you end up with a 600-line unit it might be worthwhile to try to, if possible, find a way to break that unit down.


> and the article has data

How does that data prove anything? The ideal PR size is 50 lines because that’s the median size on GitHub. Seems like a pretty worthless claim.


> The ideal PR size is 50 lines because that’s the median size on Github.

???

No, that's not why it's arguing that it's the ideal size. Where did you get that idea? (TBH I doubt it's the case - apparently the average PR size on GH is nearly 1000 lines [0])

It's arguing that it's the ideal size because (1) PRs of that size end up getting reviewed and merged the fastest; (2) PRs of that size end up getting reverted the least, implying that they're less likely to cause breakage; (3) PRs of that size end up getting more review comments, implying higher engagement and less rubber-stamping (there's a concern that this could imply unnecessary bikeshedding that slows things down, but see 1); and (4) PRs of that size end up creating the highest total throughput.

I suppose that doesn't "prove" anything, because correlation doesn't imply causation. But it gestures towards it.

[0] https://www.keypup.io/product/average-pull-request-size-metr...


> (1) (2) (3)

Which IMO is still not worth much or hardly anything at all if we ignore the content of those PRs. I would assume that most are bugfixes or trivial changes (since no tests are needed) so it makes sense that they are causing the least issues.

That’s hardly relevant if you are trying to introduce actual features which require hundreds if not thousands of new lines.


I'm probably in the minority here, but personally I'd much rather review a 300 line PR instead of 6 50-line ones if the change is a single context.

I briefly worked with a hard line-count-limit for PRs and I thought it made everything much worse. It is fine for changes that are actually small, but when you need to go back and re-open 4, 5 merged PRs in different tabs to get the full context again, the time to review increases tenfold with tiny PRs that don't really make complete sense by themselves.

I have worked with co-workers that have the complete opposite preference, though, and anything over a set amount of lines wouldn't even be reviewed.

Interesting to see the numbers in the article, however. My anecdotal experience would make me guess the opposite. I feel like work slows to a crawl once the PRs are artificially limited and broken up like that, especially in fast-moving companies and startups.


Line count (or count of any units in the PR) is completely meaningless.

I feel that in the last decade somehow there has been a loss of context of what is the purpose of version control.

Why do we bother with version control? Sure, there are many reasons.

But a primary reason is that if we discover behavior X turns out to cause regressions (or is otherwise a bad idea), we can easily revert it by reverting its commit.

That's why a single commit should contain a single behavior change, and contain the entirety of that single behavior change.

If the change is split among multiple commits, it'll be a pain to revert it. If the change is contained within a commit that changes other things, it'll be an even larger pain to revert it.

So that's my rule. A commit is a single behavior change, all of it, and nothing else.

You can't express that in lines. Might be 1 line might be lots.


100% agree, if you have multiple commits to complete the change during development, just squash them into one for the PR.

That being said many companies (including some FAANG) use commit counts as a performance metric, thereby directly disincentivizing clean, readable commits.


Sure, and at the same time a PR can contain multiple commits. You don't have to squash and merge.


I've rarely seen cases where multiple commits per PR are a good thing. When that happens, it's (usually) that either the commits can be squashed into one logical/consistent commit, or there are too many changes for one PR.

The exception being that many ci/cd setups will squash your commits for you when merging a PR.


> but when you need to go back and re-open 4, 5 merged PRs in different tabs to get the full context again, the time to review increases tenfold with tiny PRs

You can't really say much about the first 4 PRs because if you don't know how that code is going to be used, it's kind of hard to give meaningful feedback. By PR 5, it's too late to give feedback on PRs 1-4.

I also have worked with people that prefer that but I never understood it. Code without context isn't reviewable imo


A skill that you see in some of the more experienced devs is the ability to front-load the most important stuff in a series of changes, or make changes that can at least be understood in isolation.

It’s not always reasonable to do that, but you can often deliver incremental changes where each incremental change either brings a whole feature visibly closer to completion, or provides some self-contained utility, or makes some refactoring changes with the promise of “this will make it possible to do X, later”.


That's a talented approach, yes. But something I've observed in a couple of former coworkers is they will put together 6 PRs to the main branch where the code won't even (purposefully) build yet. By PR 6, the code functions, but is in a terrible state because PRs 1-5 were essentially pointless to review, and many of the issues you have with the code base are now firmly outside of the 6th PR's diff. I've likened this phenomenon to the H.H. Holmes murder hotel, where he hired contractors to do extremely small jobs individually, and none of them realized (or at least had some plausible deniability) that they were creating a hotel made perfectly for the owner to murder residents. Except in this situation, I perceive the author was hiding the fact that they have no clue how to write code, and I don't know who they thought they were fooling.


Right, in my experience large changesets often don't add a single thing, but do some refactoring, which enables the change, and then add some new infrastructure and the individual feature.

If you split it, you can verify the refactoring is a pure refactoring, not changing functionality (much), then understand the infrastructure added, and then understand the feature built on top.

Of course, things often aren't developed in isolation, so preparing the review to isolate those aspects is extra work, but as a side effect that usually leads to better architecture, since you look at the parts in isolation instead of ending up with half a refactoring, half an infrastructure change, and a feature hacked in on top.


Definitely. The good version of small PRs are small but useful changes that incrementally produce value on their own

The bad version is people adding random unused classes and methods that don't make sense until the final PR comes in and uses them


My bar is: if it's more than 20 files, or it breaks the version control UI (which I've seen in Bitbucket some years ago), then your PR is a little too much, and if it's a major automated refactor of some sort, it should not contain any other code changes outside of what's necessary to make that specific change work as intended. Other changes should be their own discrete PRs as follow-ups.

I just try to be reasonable about PRs. Everyone will have their preference, I just think as long as you can go back and figure out what happened, why the change was made, and do it quickly enough, then you're golden.


Lines with actual code changes aren't a problem for me, it's the automated IDE indent/spacing/bracketing that really drives me up a wall and fatigues the hell out of me.

But this might only be a problem with those of us working on legacy codebases. The kind of PRs I see in OSS projects I could review 1000 lines at a time - it's so clean!


Split the reformatting and the real change. Make the reformatting change first and have a policy of merging those changes fast.


Some teammates agreed to do this, but it's not known what files will be edited ahead of time, so a large job often has formatting edits anyway.


Yeah, it's really something that you have to get buy-in for.

You could always make a commit that just has the reformatting when you do start editing a file. If it turns out you didn't need to edit that file after all, then you can revert just that commit easily. And if you did, you can collect up all the reformatting commits into a PR as a first step.

The flip side of this is making minimal surgical small changes, and then doing the refactoring after the change. The key message though is split them.

Refactoring doesn't change behavior, so refactoring efforts should be judged only on "tests exist that cover the functionality, those tests continue to pass, the code is easier to maintain". It's when you add "changed functionality, so we changed the test" to review at the same time, that this becomes a hassle.


I tend to agree.

For my part, ever since I was a wee lad learning to program for the first time, I've been of the opinion that line counts themselves are a largely worthless metric for anything.

How many lines code takes up is too variable between languages, too variable within the same language, too dependent on irrelevancies like formatting, etc.

What actually matters is logical and conceptual complexity, not line counts.


How reliable is reversion as a metric? In my experience, the problem with 50 line PRs (that aren't just making a quick adjustment to something, but actually trying to build new functionality in sets of 50 line PRs) is that each change is so isolated, that when everything has to be combined into an actual working feature, the small changes aren't coordinated with each other. The PR that adds the DB migration doesn't add the appropriate columns needed to actually store the information sent from the frontend, etc. All the small things that can't be planned for in advance and are only discovered during implementation.

Those small PRs might not be reverted, but there are probably a fair number of 50 line PR follow ups that end up modifying the original implementation, and using up more total time and effort than if it was just a larger PR implementing the feature in its entirety.

I wonder how many of the large PRs they say take longer to merge and get fewer comments are actually feature branches into which the smaller PRs are merged, and which we expect to hang open for long periods of time with little interaction.

This analysis seems to take a set of data without much investigation into what the data actually represents and makes an immense effort to find some conclusions about the context-less data.


Reversion is an unreliable metric, because some PRs will be large enough that they NEVER get reverted, even if they should because it's easier to just fix the broken PR with another PR (about 50 lines should do it).


I’m reading this article on a break from reviewing a PR with more than 300 lines changed over 20+ files and I wonder if some people simply live in some sort of alternative wonderland universe.

The “ideals” are almost never a thing I see in my daily job, and I’ve worked for countless companies as a contractor.


Good luck.

For me, I feel like the difficulty score of performing an effective review is slightly greater than n log n but less than n * sqrt(n).

300 is a bit beyond my favorite limit of 200.


I file this under the category of "needless rules created without sufficient context". The author's experience is not sufficient to generalize to the diversity of development environments where pull requests are utilized. Hopefully, OP is a good listener and can compromise when their strong opinions meet resistance from other developers at their organization.


Graphite is a code review tool (to organize PRs in stacked diffs), their data probably comes from their userbase, not just their internal team.


Yes, it says as much.

> Our sample set

> All of the data-based statements in this piece are made using private and public PRs and repos that have been synced with Graphite.


Also file under “product advertisement”.


For me personally, it shouldn't be about line count. A PR should have the scope of a single feature/piece of functionality. Or if it is a PR for a bug fix, 1 bug fix per PR. If it's 1 line, 2 lines, 50 lines, 500 lines, I don't care. Each PR should address one single problem.


If I need to make a new thing - for example, introduce an end-to-end benchmark suite for a service - it may take something like 2000 lines. If I split that up into 50 line PRs, it’s 40 PRs. I’ll also need to write a pre-read document explaining the overall design because PR descriptions for each change individually won’t be enough context for a reviewer to understand.

If CI takes 15 minutes and I create PRs one by one, I’ll need to wait a total of 10 hours just for CI, which is impossible, I’ll need to use stacked PRs. If I’m using stacked PRs, that means I do all the work up front, and then try to painstakingly split up my final code into little 50 line chunks that build incrementally. I’m sure someone will say “oh just write incremental small commits to begin with”, but that never seems viable to me. Code and design evolve as I build in ways that would make the review of the first 50 lines I commit on the project pointless since none were retained in the final product. So not only do I have to build this feature, and write 40 PR descriptions, and write an overall meta-PR-description for the change in aggregate, I also need to cut my finished code up into puzzle pieces for reviewers to mentally re-assemble.

Anyways, maybe for maintaining mostly-done features 50 lines is nice, but I’d much rather a new feature or larger change be a handful (1-3) of larger (500-2000 line) PRs possibly accompanied by pair-review/walkthrough discussion.


The correct PR is the one that solves the problem the best. The correct PR's size has nothing to do with the # of lines it is. Tracking time to merge is not that meaningful because merging code does not in itself say anything about the value of that code in production. That is the type of metric that may become a target for a scrum master or project manager despite it being fairly divorced from the reality of moving the right code into the production environment.


Lots of commenters seem to be arguing against straw men. The article doesn't say "you need to follow this exact rule on all your PRs"; it doesn't frame any of this as a hard limit or a "best practice", it's just saying that small PRs are easier to understand and review. This doesn't seem terribly controversial to me. The fact that the article actually includes data in support of this thesis is icing on the cake.

Now, yes, it can be a bit more work to split up big PRs into smaller standalone changes that can be merged in isolation, but it seems plausible that the benefits might outweigh the extra work in many cases. The whole idea of a version control system is to create a useful history of changes over time; making each individual changeset simpler and easier to understand seems like an admirable goal.


There's no context in this to say what kind of change that is referring to.

I can make hundreds of 2-line PRs fixing typos, reformatting text, or whatever other low-stakes changes are that small.

So, yeah, obviously those low-stake changes that don't really do much will be quick to review and merge. Does that make them any better? Why are we optimizing for review speed alone in the first place?

It's just silly statistics being thrown around. There's no correlation between lines of code or review speed and value added.

A larger PR that actually does something and requires more thorough review will inevitably take longer. Does that mean it would be better to have 10 separate PRs with no connection between them, different reviewers and 10x the feedback loops? This for me is missing the forest for the trees.

This honestly reminds me of way back when Agile/Scrum was the new hot thing, and my company implemented it where we ended up having 10x the stories taking twice the time.

Local optimization on these sorts of metrics is pretty much always going to lead to perverse outcomes.


Okay, let's try for a more concrete example. Suppose you're writing a new feature, and the bulk of your PR is a one-off 1,000-line function that does ten different things in sequence.

If you can factor out that big long function into ten smaller functions that are each useful in isolation, then your code is probably going to be a lot easier to understand, regardless of how your structure your commit history. Future code reuse is also easier. Testing each function in isolation may be simpler as well.

Once you do all of that, putting each of the standalone functions in its own commit is trivial, and each commit can have a more focused commit message. Reviews are easier too, since you can easily view each function and its associated docs/tests/etc separately.
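
As a toy illustration (TypeScript, hypothetical names, nowhere near the real 1,000-line case), the factored version might look like this, with each small function able to land in its own commit alongside its tests:

  type Order = { id: string };

  // Each step below can be its own commit with its own tests; the original
  // long function shrinks to a short composition of the pieces.
  function parseOrders(raw: string[]): Order[] {
    return raw
      .map((line) => line.trim())
      .filter((line) => line !== "")
      .map((id) => ({ id }));
  }

  function dedupeOrders(orders: Order[]): Order[] {
    const seen = new Set<string>();
    return orders.filter((o) => {
      if (seen.has(o.id)) return false;
      seen.add(o.id);
      return true;
    });
  }

  function importOrders(raw: string[]): number {
    return dedupeOrders(parseOrders(raw)).length; // importOrders(["a", "a ", "b"]) === 2
  }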

> Local optimization on these sorts of metrics is pretty much always going to lead to perverse outcomes.

Again, this is a straw man. Nobody is saying to blindly optimize to the metrics, or that this is a black-and-white thing where every commit needs to fit within a hard line limit. But, all else being equal, I've personally found that my code quality improves when I try to break things into smaller independent changes. I also find it much easier to review PRs where other people do that.


Much more common in my experience is a 10 line function that relies on 30 lines of new SQL schema, both of which are used in an 80 line report.

Context matters. Reviewing any one of the 3 without simultaneously looking at the other 2 is folly.


I'm not arguing that big PRs are better. I'm arguing that being smaller also doesn't make it better. Neither has any real meaning without context.

Take your same 1000 line change example, but make that change span multiple areas of the code that are related, but not immediately apparent. If you blindly split that up, you can very easily let bugs slip in even if in isolation each change looked fine.

My whole point is: there is no "ideal" size. It depends. It always depends. We should never substitute thinking with mere rules of thumb and reductionist claims.

On another point, notice you said: commits, not pull requests.

That's an important distinction. Pull requests have a whole feedback loop on top of them, and serve exactly as a way to tie multiple smaller changes required to achieve one meaningful change. That means most of the time we can get all the benefits of small code changes and the benefits of the full context by having a slightly larger PR and separating changes by commits.

> Again, this is a straw man. Nobody is saying to blindly optimize to the metrics

Maybe. But this is an advertisement blog post, made for a company that targets managers that will say "let's buy this and it'll make our developers faster. It gives me a pretty graph to say who's good".

That's why some of it might look like a straw man at first sight, but if you take in the context in which this blog post was written, it's a valid argument in my view.

It's way too easy to blur real code concerns (splitting functions, naming, separation of concerns, etc) with silly meaningless numbers (PR size, line counts, PR numbers) that managers use as a proxy to actually knowing the subject matter.


I wish. My small PRs die by a thousand bikesheds. My large refactors are followed by LGTMYOLO


LGTMYOLO is the inverse of the bike shed.

Same rat-brain muscle that made me buy an extra 200,000 dollars of stuff I didn't really need on my home, while I spent 6 days at Home Depot fighting over which shade of mustard to paint the guest bathroom.


You missed your parent's point: LGTMYOLO is the response to large PRs; where small PRs get bikeshedding.

And this is the classic theme behind bikeshedding. People can understand that small PR at a glance, and there's always multiple ways of doing everything. There's a perverse desire to demonstrate productivity, and nagging at small PRs is a way to do that without a lot of effort; ironically reducing actual productivity, see: Cobra Effect.

That big PR though? Saying something intelligible about it requires serious effort. In a low-skill environment, nobody will touch the PR for fear of demonstrating that they don't know what they're talking about.

I disagree with the premise of the article. The ideal PR is a well-encapsulated unit of work. Sometimes you need to fix a typo or handle an edge case, and you make a 1-line PR. Sometimes you need to add a new module and you add 3 thousand-line files and put little hooks in 30 more files.


That's exactly what happens. At some point a PR gets too big and the only thing you can do is run the tests, try the functionality, and LGTMYOLO approve it.

You leave it open longer in hopes someone else will look at it, and maybe one or two people look at places they've had issues in the past, but it gets through and blows up spectacularly and those get fixed.

Anyone who says their development is not the above is probably confused or lying ;)


I've participated in month-long reviews where nobody's time was wasted. That's extremely rare, mind you, but if you're accustomed to low-quality code and a "break shit fast" attitude, you might be inclined to mistake that for a universal rule of software development.

In fact, if you put quality first, the "only thing" you can do with a long PR is actually do the work. And the more accustomed you and your team become with actually doing work, the more productive you'll be.


In this thread: software engineers all of a sudden put some value on counting lines of code.


> 50-line code changes are reviewed and merged ~40% faster than 250-line changes.

Reviewing a 50-line code change in 60% of the time it takes to review a 250-line code change means the shorter code review takes _three times_ as long to review per line.
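
Working that out (calling the 250-line review time T, and taking "40% faster" to mean 60% of that time, as above):

  per-line time ratio = (0.6 * T / 50) / (T / 250) = 0.6 * 250 / 50 = 3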


Also splitting the 250-line change into 50-line parts will probably involve more total lines - to ensure it can be done piece-by-piece then fit together later.

So not only is it more time per line, it's also more lines.

Basically the only benefit is PRs per day, which is... not useful in itself.


There are two ways of looking at this, I think the author took the positive way, and you've taken the negative.

You could argue that shorter PRs are easier to review and therefore people are doing more review, and that should result in better changes (indeed the 15% fewer reverts supports this).

Or you could argue that shorter PRs are harder to review as they have less context, and therefore more time is taken but that a better output is not necessarily being reached.

I think both of these are entirely plausible, and the truth probably depends on other process factors.


It is also going to be more reverts.

If you split 200 lines of code into 4 x 50, you are actually going to have 0.85 x 4 = 3.4x the reverts with this strategy.

But overall the dataset is terrible and infuriating, because the changes in it are likely very different depending on the LOC.


To say it another way, you should not expect more code to be merged in the same amount of time as less code. 250 lines is 5x more code, so what is the purpose of a statistic showing 1/5 of the code gets merged faster? That's not a good or a bad thing.

Unless the metric is actually 40% faster per LOC but that's not specified.


Which seems to indicate fairly clearly that, at least as far as time investment, 50 line PRs are objectively worse than 250-line PRs.


My comments alone are 50 lines, but thanks, I'll never stop finding these "functions must be this long, classes must have that many properties, and PRs should be so many lines" etc. rules hilarious.

It cheers me up that people so naive exist.

Size is contextual to the thing the PR regards. Maybe it's one char. Maybe it's 500 lines. Maybe it's 10k lines. Screw this.


Seems like a skill issue, honestly. I feel like many of these arbitrary rules end up being made to account for low skill developers to curb their behaviors which seems like the wrong way to tackle that problem.


Precisely


I just think the people who whole-heartedly believe in such "best practices" (and much more so the people who succumb to them) help justify my salary.


If someone sends me a multi-thousand-line CR, I'll review it and give them impactful feedback. I may miss some details, but I guarantee I'll miss close to all of the details if that is split over 40 CRs. I wouldn't want to work with someone who has time to review like that unless it's a very niche domain.


We merged a PR a few days ago that was 40,000 lines added and about 13k lines removed. Tbf, there was quite a bit of config and generated code in there, but the actual changes were at least half of that.

It was tested in smaller chunks in a feature branch, then the larger feature was merged. There was some testing done on the huge feature branch, but not everything that was present in the smaller PRs.

In our case, this new "feature" was a complete overhaul of our business logic engine. Over time, we had essentially developed 3 or 4 different API versions for various business tasks, with a lot of logic being duplicated between them. We want to bring it all under one umbrella, and after about 8 years, we think that we have a good idea of what sort of abstraction can work for the next 8 years.

Now that the engine is merged, we need to start the grueling task of actually moving logic from the old APIs into it and removing them from our codebase. We spent about 6 months (between 2 devs) writing the engine and I suspect it will take 2 years to move all of the logic into the new engine.

Idk where I was going with this comment anymore, but needless to say, I don't often see 50 line PRs at work!


This is one of those cases of where we're focusing on the wrong metric for a very generic problem.

There are too many facets to what makes a PR good; boiling it down to line count is silly.

Ex. Small code changes are fine when that's all the solution requires - forcing a 5-line change to become 50 lines would be a wildly bad idea. On the other hand, a 500-line PR that solves one problem, suddenly broken down into 10 PRs, makes the review far harder depending on the context.


We went through a relatively controversial process w.r.t. enforcing this at my last co. What it really came down to was intent vs. impact - the intent of the change was to enforce better development practices, especially across junior engineers that had shipped monster prs containing multiple bugs that were extremely hard to catch ("the easiest way to ship a one line change is in a thousand line pr").

This worked reasonably well in terms of forcing junior engineers to reason harder about what an atomic change should look like, but also impacted senior folk that were making atomic changes that encompassed broad areas of the codebase. We eventually reverted the change and relied on implementing CODEOWNERs to enforce senior devs to request changes when PRs got out of hand.

There's definitely nuance to enforcement of best practices across an entire organization; haven't really seen it done well although I've primarily worked at startups.


PR line count is such a silly metric. For example, my recent PR was +600/-900 lines. What can you deduce from that? Is it complex?

The changes were quite simple. 600 of the lines are about rearranging a big module into multiple smaller modules. The other 300 removed lines could technically be in a separate PR, but they're cleanup aftermath related to the rearrangement.


If you change enough at once GitHub gives up and displays "infinity files changed". Likewise for lines. I've gotten such a PR merged in a finite amount of time. Thus, by GitHub math, infinitely large PRs merge faster per line changed than any other size.


What a weird claim.

PRs that are <50 lines are more often than not documentation, typos and small fixes in my experience.

Most real value add PRs end up bigger due to tests, documentation changes, and the actual change.

This for me is the typical claim that'll become an example of Goodhart's law when implemented. Random people that are not engineers will use this to claim "your PR should be 50 lines long" as some sort of absolute truth, when in reality the number of lines of code is absolutely irrelevant most of the time.

Context and details matter. Sometimes multiple small PRs are better. Sometimes a larger PR is better. It always depends. These absolutist claims that "this is better" is what leads to atrocious management practices that always end up getting summarized to a number, losing all of its meaning in the process.

Hadn't we already moved on from lines-of-code metrics decades ago? :facepalm:


Large and small PRs aren't the only options.

For a large PR, I usually regroup my changes into smaller unit commits which can be reviewed commit by commit.

For an even larger (or more complex) PR, I'll make a set of stacked PRs which can be reviewed in pieces, each with a good description of what/why/how stuff is happening in that piece. Once all PRs have been approved, the lot of them can be deployed together.

I've at times had 3 PRs for a total of 3 line changes, because they were for a single table's schema migration picking new indexes and each had many factors to consider. The PR descriptions were of course much longer.


ITT: company that sells product for breaking PR into smaller PRs claims smaller PRs are better.

Not saying the claim is necessarily wrong, but it's also exactly what I'd expect given the source.


This all depends on whether you believe in merging unfinished work to master, or whether you believe features should be completed and tested before merging. Personally I'm in the latter camp, but some of the big tech companies seem to have a style of merging commits representing partial progress toward a feature.


I personally mostly care about changes per commit. But Bitbucket is terrible about showing commit context to reviewers. And apparently GitHub is not any better.

Showing some context per commit makes all the difference when I decide to do a first commit which fixes all formatting in the file that I’m visiting (for the real/substantive change).


I'm seeing some pretty surprising claims in this thread about, "I prefer reviewing larger PRs," and, "Teams that make larger PRs go more quickly." That all seems like rubbish to me. I've seen data indicating smaller PRs having lower error rates and such and then we have this article with data recommending a 50 line PR size. I've *never* seen data show large PRs are better except for some developers saying, "I prefer it that way."

Anyways, I don't so much care about the size of a PR as much as I care about how understandable it is. That generally ends up being a few rules of thumb like:

1. If you change a common function name, change it everywhere but please don't do anything else in the same commit.

2. If you need to make a change in code you are depending on to accomplish a goal in your own code, try to break the change to the dependency into another commit. There is a little bit of an art here to both understand how to break things up and to not break them up too small.

3. Try to make your history meaningful rather than just having a bunch of random commits like "fix", "forgotten file", etc.


Yeah. Last team I joined I started doing reviews and immediately I was dealing with, like, 500 line PRs. They’d do things like add three completely new API endpoints (one of which is broken), or set up an entirely new service with all sorts of associated infrastructure.

At first I faced a lot of pushback from team members when I wanted them to break up the commits. So I caved. I just tried to give the feedback that I could give, and approved the PRs when they were “good enough, I guess”. They kept breaking in production and it would take a week to fix them. It was hard to tell what, in the commits, was broken. They were just so large.

And it’s not like these PRs were getting written quickly. People would spend two weeks or four weeks working on one PR, because they wanted to make one massive change that did everything and closed out a ticket.

At this point, I’ve made inroads and the team is sending out much smaller PRs, much more frequently. The PRs are getting reviewed, merged, and tested within an hour, and one developer can submit three or four PRs if they’re being productive that day.

I think the problem is that the cognitive load of making large changes is superlinear. A 2x larger change is not 2x as hard, but maybe 2.5x or 3x as hard. With large PRs, you spend way too much time just trying to understand what you are doing, as an author, or what you are looking at, as a reviewer.


I too am very surprised by what I'm reading. For contrast, here's a recent thread of people explaining how beneficial it is to work in small commits: https://news.ycombinator.com/item?id=39187487


I always try to care about the reviewers. Not too large PRs so they don't LGTM it instantly. Not too small PRs so they don't have to switch contexts multiple times a day or search through many smaller PRs to get context why new PR changes specific lines.


sounds like the ideal PR includes no testing


Had the same thought reading this, maybe it’s because I work in OSS and write tests with changes and fix tests for fixes that weren’t caught. That adds 20-40 lines very fast alone. Guessing that is not common.


This has never been, and probably never will be, anything more than a theory. In my experience this is only true if:

1. The "unit of work" is small enough to decompose into a 50 line PR. This is possible in larger codebases and less verbose languages. It can enforce over-DRY code, which often leads to a lower universal ability to understand what is going on.

2. The team responsible for reviewing is responsive. I have never worked at a company where PR reviews happened quickly. Everyone is busy, everyone is overworked, and often times PRs are a chore that can be put off until later. Of course you can make some labyrinthine chain of small PRs all merging into one big PR but the problem still exists. I'd personally rather review a 300 line complete PR written well than 6 PRs that I have to constantly recall context on.

This advice follows the same logic as the asinine cyclomatic complexity rules. Yes, in theory they are sound. In practice, it's more of a flexible benchmark rather than a hard rule. But since someone inevitably writes YASL (yet another stupid linter) that enforces these types of things, an enterprising work-focused engineer will end up spending either more time fixing code to please the linter or more time exploiting loopholes in the rules.

Just write Good Code (TM) - whatever that looks like.


Forget about refactoring then


FTA:

  Once PRs start to exceed 10k lines of code, they seem to become slightly “safer.”
  I suspect this is because the extreme end of PR sizes includes refactors, which
  maybe start including less functionality change and therefore have a slightly
  lower chance of breaking. Alternatively, engineers may become progressively more
  reluctant to revert PRs after 10k lines because of emotional anchoring and merge
  conflicts.


I would speculate that PRs of > 10k line changes are more likely to be deletions of entire modules or assets that were dead code already.


Big massive refactoring PRs have killed many projects. Sometimes you can't, but if you can split a big refactoring PR into multiple smaller PRs, it usually works better. The first PR introduces a feature flag that lets you pick between the old and new code.
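
A minimal sketch of what that first flag PR could look like (TypeScript, hypothetical names, not from the comment itself). Later PRs fill in the new path behind the flag; a final PR flips the default and deletes the legacy path:

  const USE_NEW_PRICING_ENGINE: boolean = false; // would come from config or an env var in practice

  function priceOrderLegacy(quantity: number, unitPrice: number): number {
    return quantity * unitPrice; // existing behavior, untouched in this PR
  }

  function priceOrderNew(quantity: number, unitPrice: number): number {
    return quantity * unitPrice; // the refactored implementation lands here in follow-up PRs
  }

  export function priceOrder(quantity: number, unitPrice: number): number {
    return USE_NEW_PRICING_ENGINE
      ? priceOrderNew(quantity, unitPrice)
      : priceOrderLegacy(quantity, unitPrice);
  }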


So if we fix a > that should have been >= we should add 49 lines of comments explaining this? Or perhaps the PR is doomed to never be ideal due to being short.


I think this might just highlight how simple changes are indeed simpler, instead of showing that slicing one big PR into several smaller ones is actually good.


Making pasta is easy. Eat pasta 3 times a day, every day.


Counterclaim: the ideal PR is the one that's never opened. Instead, have high test coverage and do trunk based development.


Claim: the ideal PR is exactly as long as it needs to be to achieve its goal.

The same goes for functions

Please stop listening to this garbage advice.


Is that still valid regardless of the verbosity (or lack thereof) of the programming language used in that PR? Where is the proof of that?


This is a fine example of how misguided measuring for the sake of measuring really is.


> If what you care about is maximizing the amount of engagement and feedback on your code over the long run, you’re best off writing as small of PRs as possible.

> The highest volume coders and repositories have a median change size of only 40-80 lines.

This claim is rational under the Facebook axioms, so to speak.


It's times like these when I question the quality of HN. How can such shit get so many upvotes?

> “We flew down weekly to meet with IBM, but they thought the way to measure software was the amount of code we wrote, when really the better the software, the fewer lines of code.” — Bill Gates


Your ideal PR has no tests?


OP needs to discover the <acronym> HTML tag or define their acronyms, IMO. I thought they were talking about press releases. Sheesh.


The response to seeing 10k lines of code in a PR is not to skim it; that harms your company. The response is to request that the author justify its size, and/or reject the PR and request that it be delivered in smaller pieces. If the author cannot, then the author needs their responsibilities reduced and additional mentorship.

Skimming a PR is pure cognitive dissonance; don't do this and then complain when your app turns into a dumpster fire.


I would kill myself if I worked somewhere where they actually did this type of analysis. I cannot wait for AI to replace me and I become homeless but free from a horrible choice to get involved with computers many years ago, damning me to these types of just asinine bullshit conversations and posts. I wish I had made different choices.


Agrees with gut check.


Just use your brain to PR a feature or some sub-piece of a feature instead of this arbitrary shit.

Who wants to spend hours bothering with hardline process - you’re the programmer, you’re going to be the guy answering any questions on your pull; therefore you’re going to have to look at it yourself.


PR - pull request


These types of arbitrary rules are why projects end up depending on deprecated packages. Nobody can do it so everyone whistles around a slowly smoldering dumpster fire.


Correlation is not causation. What if changes that are inherently more complex — which take longer to review, are more likely to have bugs, etc — typically require 250 lines or more?


... or is constrained to doing a single logical thing (perhaps many times).


Typical rookie stats mistake of confusing correlation with causation.




