I gave commit rights to someone I didn't know (2016) (davis-hansson.com)
382 points by love2read on May 30, 2023 | 159 comments



Counterpoint: gorhill and uBlock. IIRC, gorhill (Raymond Hill), creator of the popular uBlock adblock extension, wanted to step down and hand the reins off to a contributor. The contributor then promptly removed all references to gorhill, started charging for the plugin, and turned the extension into an affiliate marketing product.

This reddit comment covers it pretty well: https://reddit.com/r/ublock/comments/32mos6/_/cte0a3n/?conte...


In my view, browser extensions are the biggest example of projects where this approach is a terrible idea. Most of them have enormous access to their users' browsing data and it's easy for them to inject ads, and so there's a lot of incentive for bad actors to try to take over popular extensions. From what I understand it's fairly common for publishers of popular extensions to get offers to purchase them.


For what it's worth, access to user data is what Manifest v3 tried to fix, albeit with limited success (and thus support).


There's no compelling reason to do that with an open source project. A link to someone else's fork and a note on what you're doing is good enough. You don't need to give anyone else the project's brand.


i'm interested in this but that link doesn't work, what happened with ublock?


The link works fine for me. The quick version is that when gorhill handed the repository over, the new maintainer immediately took actions that didn't exude confidence, so gorhill forked it into uBlock Origin, which he still maintains and is the one you should be using.


"exude" isn't the right word here. In this context "exude confidence" mean "exhibits a high level of confidence".

"Instill" or "inspire" would have the meaning you're going for here.


Looks like style to me.

Like saying "their actions weren't exemplary" when you really mean "their actions were bad"; maybe a British-English style?

When one uses this format, like in the phrase "[something] wasn't ideal" (eg 'I crashed my car and missed my own wedding, which wasn't ideal') the claim to less-than-perfection is really an implicit statement that the object/situation was the opposite of perfection.


It's deeper than style.

Presumably the new maintainer took some rather bold actions which would indeed suggest they had a high degree of confidence themselves, confidence in their own actions. But those actions did not inspire confidence in the product, among the user base.


Saying the new maintainer did not inspire confidence is an understatement. (Since the new maintainer destroyed the trust in the project.)

Saying that the new maintainer did not exude confidence means something completely different.


Hey shkkmo, "isn't" isn't the right word here. In this context, the right word is subjective and everybody else understood what they're saying so the problem is with your perception of what their meaning was, not their comment!


Language does change over time and trying to ignore that after it has happened can make you a pedant. However, words still have specific meanings, even if they can change. If you decide that "isn't" has a totally different meaning than the established one and start using that new meaning without explaining it to anyone, you'll just confuse people and fail to actually communicate. If you have a group of people that all use that new meaning of "isn't" then you can use it among yourselves without clarification but should still be aware of the potential for confusing others.

Generally, when people use words they are trying to communicate something so they try to use meanings that their audience will understand. If you can clarify an incorrect use politely and without condescension, it is often appreciated and helpful.


Good to know, thanks


Try changing the link to use old.reddit.com


I know this is not the point of the article but:

> The PR was bigger than what I felt I could sensibly review and, in honesty, my desire to go through the hours of work I could tell this would take for a project I no longer used was not stellar.

The PR: https://github.com/django-money/django-money/pull/2/files?di...

Do others share this sentiment?

This doesn't look like a particularly big PR to me, judging solely by the amount of code changed and the nature of the changes at first glance.

Are most of your PRs at work tiny, a couple of lines of code at most? Am I sloppy for not even considering reviewing this for "hours"? Are all the code bases I have worked on sloppy because features often require changing more code than this?


When I wrote this, almost all my engineering experience was in OSS database development. That environment has several forces pushing towards very detailed reviews and clean commit histories, like others have hinted at in the thread here.

The PR review is in public and heavily scrutinized by paying customers and passionate community members. APIs cannot be broken, and even with automated tooling it's very easy to accidentally introduce a change that breaks tens of thousands of deployments. And the code itself is really sensitive. If a bug gets in and released, it can be several days of grind to get a patch out, and after that many months of new tickets for the bug from customers that won't move to the latest patches.

Now I work somewhere where the code I write runs in-house. If a bug sneaks in, it's usually a 5-minute redeploy to resolve and the cost is borne primarily by my own team.

So I think the answer to your question is: it really depends on the environment you're writing code in. In some setups the cost of introducing mistakes is very high, so it makes sense to pay a lot at the review stage; in others the correct balance is less strict review and fast fixes/rollbacks instead.


That makes sense, thanks for the clarification.


The PR isn't very large, diff-wise, but IMO it isn't well made, which makes it hard to review.

Half of the commits are merges from other fork branches into the contributor's master, and the PR name and description don't reflect that in the least.

Then (eyeballing) 90% of the diff is whitespace changes, which would be fine in its own PR ("Formatting changes") because it's easy to eyeball that it's just that, but when you mix it with other changes, it's hard again.


FWIW, whitespace changes can easily be filtered out in GitHub's PR view.


It's a mistake to filter out whitespace changes in Python diffs, where indentation is significant. These days it's best to use `black --check` or similar in CI to make sure contributions match your formatting settings, which minimises these changes in the first place.
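
To illustrate with a made-up snippet (nothing from the PR in question): an indentation-only change like the one below is exactly what a whitespace-hiding diff suppresses, and the two versions don't compute the same thing.

    def total_with_fee(prices, fee):
        total = 0
        for p in prices:
            total += p
            total += fee  # inside the loop: fee charged per item
        return total

    # After a "whitespace-only" dedent of one line:
    def total_with_fee_v2(prices, fee):
        total = 0
        for p in prices:
            total += p
        total += fee  # outside the loop: fee charged once
        return total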


Sure. The link above is in "hide whitespace" mode (`?w=1`), and there are still visible whitespace changes (new or removed whitespace lines).


Like 3 of them. Is that problematic enough that you cannot review the rest of the code? My eye just slides over them.


I am working on a GitHub pull request viewer that displays changes using a semantic diff, and therefore has some more advanced whitespace handling behavior than just ignoring leading or trailing whitespace. I tried it with this PR:

https://app.semanticdiff.com/django-money/django-money/pull/...

It doesn't make a huge difference, but it filters out changes like the added line break in "if value: value = str(value)" nicely. I haven't announced the project yet, but maybe someone will find it useful :-)


I don't think this was the case in 2016 though.


They added it to the UI in 2018 but you could do it via URL since 2011.

> A diff view with reduced white space has been available since 2011 by adding ?w=1 to the URL.

https://github.blog/2018-05-01-ignore-white-space-in-code-re...


Also, Git has had it since forever, and since GitHub's diff view isn't particularly convenient for browsing multi-commit PRs, you usually review the changes using Git anyway.

That said, I'd ask the contributor to tidy up the branch first. It's kinda disrespectful to ask others to review branches in such a state.


After a brief scan I’d call the full change reviewable enough I could do it in a sitting. Most of it looks reviewable on my phone. But seeing >30 commits, I’d pause. Partly because I’ve become a lot more sensitive to the impact of commit history itself, partly because the quick scan of such a small change set doesn’t seem to line up with so many commits, but mostly because it implies much more context exists than the attention I’d pay if it came pre-squashed.

That kind of implication stops me in my tracks to learn more. I’ve spent literal days tracking down the meaning of single line code changes through multiple dozens of commits, sometimes across repo boundaries (ahem the original author’s suggestion of deprecating in favor of a fork comes to mind).

The size of this particular PR only becomes a factor when any one of those numerous commits can become that rabbit hole. How many humans’ days are going to be spent tracing history through this particular merge? For how many different reasons? I didn’t even look at the changes midway, but how many nuances are buried in there and lost unless this weird bundle of changes is preserved?


I bet you're overthinking it in this case. In general, the expectation on GitHub is that PR commit history doesn't matter, and owners should simply squash on acceptance. I think most contributors don't even consider that all their commits are visible or would be of interest, and only think about the final product.

It's certainly simpler when the contributor does the squashing, but GitHub makes squash-on-merge so simple that in practice it doesn't matter.


People wonder why Linus Torvalds dislikes github.

Imagine writing a highly performant and featureful relational database, successfully using it with large projects for a while without the database itself becoming particularly popular, and then having a company come along and popularise your database by telling lots of people about how good it is as a flat key-value store.

Then people are really confused and annoyed as to why their key-value store has this complicated and confusing relational database attached to it, so they write lots of guides skimming over the details to help people get better at using the database to just store keys and values in one table.

If I was Linus I would be pretty pissed too.


That story strikes home for me. I worked at a company years ago where I designed and implemented a custom event-queue-like architecture using lua scripts in redis. The data would eventually end up in another database, but it would start its journey by being sent to redis. Once it was in redis the application servers considered it to be "committed".

Of course, sometimes bugs showed up in our system. One of the engineering leads would often say "oh let's just clear the redis cache". Every time I told him no, and once again slowly explained how we weren't using redis as a cache, and how deleting everything in redis would delete user data and be a terrible idea. He would have this far away look in his eye while nodding along and pretending he understood. I guess in his mind he was just thinking - why on earth would it be unsafe to clear the "redis cache"?

Months later I went on holidays. They ran into some bug. He reacted by wiping everything in redis. And, predictably, all hell broke loose. User data rollbacks happened, which caused cascading failures in the UI (which assumed that rollbacks would never happen). The team ended up reconstructing some lost data from some JSON which accidentally ended up in web request logs. Users had downtime as the whole app broke. It was a disaster.

When I got back to the office, I was hit with some strange combination of "why weren't you here" and "why didn't you tell us". Ugh. I still think about it sometimes. I have no idea how I could have handled that better. But I can tell you one thing for sure: I lost a lot of respect for that engineer.


As far as I understand, Linus doesn’t write code. He reviews code other people have written, and even other deputies review the code before it gets to him.

Additionally, many people are paid to work on his project by other companies. Linus doesn’t pay them, yet he’s their boss.

All of this is to say that Linus is very insulated from externalities. He can insist on his platonic ideal of a commit and SCM, if it makes his life easier. He’s like the editor at a publishing house, rejecting countless manuscripts yet never writing a word himself. And that’s fine.

However, most people do not use an SCM like Linus does. If you’re maintaining an open source project on GitHub you’re probably working for free, as are the people submitting PRs. The more difficult you make their lives, the fewer people will be willing to submit PRs and the more work you’ll have to do eventually.


It's really irrelevant what Linus does now, or what your personal opinion of version control is. The point is, git does certain things well, and a large portion of its users don't seem to need or want those things. Now, it might be because those users don't understand what they're missing out on, or whatever, but git was written by Linus from endless experience dealing with both contributing and maintaining code. It's simply wrong to suggest that Linus has no idea what he's talking about from a contributor perspective. Moreover, there are still plenty of people, who actually understand git, who are happy with the workflow it was designed around. Yes, it's a complex tool, but it's not as nonsensical or confusing as the people insisting on not using it to its full extent always claim it is.


I disagree pretty hard with this – for instance I've recently needed to dig into the code for the Gradio library, and when PRs are like https://github.com/gradio-app/gradio/pull/3300 (and the merge commit's message is what it is) it's hard to understand why some decisions have been made when doing `git annotate` later on.


I don't think you're in disagreement. Parent is saying that it is customary nowadays to have crap commit history. You're saying that crap commit history is problematic from a s/w engineering perspective. Both are true.


I’ve worked on a project with devs that write crap commit messages. It’s far easier to just commit to squash merging and requiring good PR titles and descriptions (which become the commit message following the squash).

Especially when you have multiple people working on a shared branch rebasing can be quite painful. The most common example of this is when sharing a branch with a QA.


I disagree, commits are mehh

Pull requests, patch notes, documentation and comments should be the source of truth.

Git is not a project management tool; it just manages the history of what I've typed.


Curious use of `annotate` instead of `blame`. Former user of cvs, bk, hg, bzr, fossil, etc?


My favourite tool for this is the blame command in Magit (the Git client for Emacs). You can cycle between styles; one shows the commit message as a kind of header in-line with the code for instance. Another just shows a faint rule between lines that were changed in different commits. Then, one can press return to show the specific commit that the line was changed in. Well worth a try if you haven't already!


No-blame culture :grin:

(Also, JetBrains tools use "Annotate".)


> In general, the expectation on GitHub is that PR commit history doesn't matter, and owners should simply squash on acceptance.

This varies greatly from project to project, and is by no means a general expectation.


One sitting is a lot for a project you've forgotten about, lost context on, and are no longer excited by, especially assuming you have a lot of other stuff going on in your life. And then ya, everything else you said. And there are no tests.


This is it entirely - imagine suddenly being asked to clean and organize a house you haven’t lived in for ten years. It may take you a day just to get back up to speed on the code.

(Interestingly enough, Knuth praised literate programming in TeX as the reason he could get back in and fix bugs after an almost ten-year hiatus, with parts of the code he had not looked at in 40 years.)


Some professional cleaners do indeed do their job on properties they might never have seen before; maybe the programming equivalent is the external consultant who is expected to quickly identify bugs in unfamiliar codebases.


>> I’ve become a lot more sensitive to the impact of commit history itself

This was something @whitequark taught me. Having a clean commit history is very important when looking back, and we look back more often than most people think. Self-contained commits with good messages are important. I'm still not great at concise descriptions, but I'm trying.


+1 to this. I let my team pick one: very small PRs that get squashed when merging, or bigger PRs with well-separated commits and linear history that get merged normally.

Having a bunch of “fix” and merge commits in the main branch history is terrible.


So with libraries that are consumed by third parties, it's rarely about the number of lines of code. Since you don't really have a definitive list of all usages of your code, things like the changes to `_money_from_dict` making things nullable mean you have to consider all ways in which that might blow up.

And like "the public API is the public API, the private API everything goes" is easy to say, but it's so easy to just break other projects with these kinds of changes. This makes it very hard to move forward with certain kinds of "bug fixes" in projects.
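
To make that concrete, here's a hypothetical sketch of the failure mode (the helper below is illustrative, not django-money's actual code): a "private" function quietly becomes nullable, and downstream code that never touched the public API breaks at runtime.

    from decimal import Decimal

    # Hypothetical private helper: after the change it can return None
    # where it previously always returned a value.
    def _money_from_dict(d):
        if not d or d.get("amount") is None:
            return None  # new behaviour for missing data
        return Decimal(d["amount"])

    # Downstream code written against the old always-a-value behaviour:
    total = Decimal(0)
    for row in [{"amount": "9.99"}, {}]:
        total += _money_from_dict(row)  # TypeError on the second row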

That being said, it's not that that PR is "hard", but it's hard to say "merging this is A-OK" instantly. Hours? I dunno, but I would definitely add some tests and try to figure out how to write code that breaks with those changes.

If I were managing that project at the time, and had motivation, I'd definitely do a lot of cherrypicking, get all the "obviously won't break anything" changes merged in, to leave the problematic bits in for a focused review. Again, this all might be under an hour, but sometimes you look at a thing and are like "I don't really want to deal with this, I have my own life to deal with."

At a higher level, the biggest problem with these kinds of libraries is having a single person who can merge things in, who can hit the "release" button. There are a lot of projects that interact with Django that have heavy usage, and survive mainly thanks to random people making forks and adding patches that properly implement support for newer Django. At $OLD_JOB we used a fork of django-money (including a lot of patches to fix stuff like "USD could be ordered with JPY", pure bug generators). It was very easy to add patches because, well, we had our usage and our test suite and no external users. It's great, but it's also important to try and get patches upstreamed when possible (and we did for a lot of projects).


Echoing your experience, I too worked on a project that was stuck on angular.js 1.3 when 1.6 was about to be released. We took a look at some migration guides and didn't see that many changes we had to make, so we tried updating it.

Everything broke.

Investigating, I realised that one of my long-departed predecessors had forked angular-bootstrap and made a few small changes to it. The problem was that the library was tied to angular.js 1.3. To update angular.js, we had to update the library. To update the library, we had to remove all the changes in it, which would break large parts of our UI. The project was already in maintenance mode by that time and we decided to just leave it as is. I spent the next month converting it from coffeescript to es6.


I had exactly the same thing last year: a long-departed staff member had forked django-flex-fields into their personal GitHub and made substantial changes. Porting to Django 3.2 then became a huge and costly project in itself as a result.


>So with libraries that are consumed by third parties, it's rarely about the number of lines of code. Since you don't really have a definitive list of all usages of your code, things like the changes to `_money_from_dict` making things nullable mean you have to consider all ways in which that might blow up. And like "the public API is the public API, with the private API anything goes" is easy to say, but it's so easy to just break other projects with these kinds of changes.

Strongly typed languages, a well-designed API, and semantic versioning can make this problem largely disappear.


I like typing and am a big fan.

If you fix a bug, this can break existing code. This is a fact of life. Changing performance characteristics can generate downstream problems! You have to consider this stuff seriously.

Here the change introduced nullability. In another universe the function would already be nullable, but the conditions under which a value becomes None change. A spec can be changed, for the better, and still be bug-generating if people just upgrade. That is not captured by most type systems, and there aren't that many great production web apps running on Idris.
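
A hypothetical sketch of that point (made-up functions, not django-money's code): both versions below carry the identical `Optional[Decimal]` signature, so a type checker sees no difference between them, yet the conditions that produce None have changed.

    from decimal import Decimal
    from typing import Optional

    # v1: None only for a missing value; "" raises InvalidOperation
    def parse_amount_v1(raw: Optional[str]) -> Optional[Decimal]:
        return Decimal(raw) if raw is not None else None

    # v2: same signature, but empty strings now also map to None
    def parse_amount_v2(raw: Optional[str]) -> Optional[Decimal]:
        if raw is None or raw == "":
            return None
        return Decimal(raw)

    # Callers that relied on "" failing loudly now silently get None.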

But ultimately the reality is that people have a lot of flexibility with Python projects in general. It’s great, and libraries that are aware of this, well… they write it in release notes. They also have open repositories to enable actual code diffs. It’s non-zero amounts of work but it’s there.

There is a theoretical universe in which a static language with well-designed libs provides good aesthetics to make developing certain software easy. Meanwhile, even as a big functional programming lover, I still reach for Python, because I can get work done: the libraries are in fact well designed, the code is easy to work with, and I can fix issues quickly. As a user it's great; as a library maintainer I gotta apply some more care. Could be better, but it's alright.


Largely, but not quite all. E.g. there's only so much information about runtime behavior you can capture in types, and then almost no one captures even half of it in practice. Doing this right requires a lot of experience, plus good knowledge of the problem domain, and it tends to both bloat and ossify the code.

It's easier if you're doing version 2 of a well-tested, scope-limited library, and thus can afford to do some holistic design. But the trend in our industry is to develop iteratively, small releases done often, everything kept at 0.x.y perpetual beta, and on the off chance your project grows old enough to warrant 1.x.y version, it has so much evolutionary baggage that you'd have to rewrite it from scratch to get proper typing in (which, of course, is against the zeitgeist).


How so? The API still breaks if it's strongly typed; the breakage is just easier for downstream consumers to spot.

And note that saying "release as a new major version" is not making the problem disappear, it is simply choosing not to solve the problem, but with a warning on top.


That has not been my experience. Nor Hyrum Wright's (of Hyrum's Law fame [1]), nor Randall Munroe's [2].

Note how things like performance characteristics leak through strong-typing, well designed APIs, and semantic versioning, in spite of non-guarantees around such characteristics.

1. https://medium.com/se-101-software-engineering/what-is-the-h.... 2. https://xkcd.com/1172/


One of the stories shared with me while working on OpenJ9 was of a customer that used stack overflows as a scheduling mechanism, and at some point they were compelled to update their JVM to a version that incorporated tail-call optimization, which caused their Rube Goldberg scheduler to stop working (by spinning indefinitely).

I would however argue that the existence of a user relying on behaviour that has explicitly been reserved as subject to change must not preclude development from rendering improvements to products. At some point the consumer needs to be on the hook for relying on a positive externality that they do not have a right to.


According to the article, the author no longer used this project. It was no longer front of mind. There is a massive difference between reviewing a PR when you are actively involved in a project and reviewing one when the project is in your past.

In that case, it’s perfectly reasonable to spend a couple of hours getting back into code you wrote a long time ago. If anything, taking that time is a big win for overall project stability.


It's not too bad, but in the context of reviewing for a project in your spare time, it can be draining. I totally understand why he didn't feel like doing it.


I've committed bigger with just a "minor changes" message :D

But seriously: that code seems to be touching bits that really should have automated tests attached. If the tests pass, then I would feel more comfortable accepting the PR.
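
For instance, a regression test along these lines (hypothetical names, nothing from the actual repo) pinning down the nullable behaviour would make accepting the change much more comfortable.

    from decimal import Decimal

    # Hypothetical stand-in for the PR's changed conversion code
    def money_from_dict(d):
        if not d or d.get("amount") is None:
            return None
        return Decimal(d["amount"])

    def test_money_from_dict():
        assert money_from_dict({}) is None
        assert money_from_dict({"amount": "9.99"}) == Decimal("9.99")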


Depends on the bits they don't do. I get contributions from time to time. They can look small, but I need to add tests, documentation, and so on. It can easily take an hour for just one method added, or whatever.

If the code is just fire and forget, then fine. If it's part of a bigger system with rigorous standards then 300 lines can take a day or more to "merge" in.


It's certainly not hours of work to review it... or maybe it is since it's financial? Either way, "it was in the script" as my wife and I say about corny movie moments. It made for a good article.


Critical Drinker has a phrase for nonsensical things in movies: "X occurred... so the movie can happen".


>This doesn't look like a particularly big PR to me, judging solely by the amount of code changed and the nature of the changes at first glance.

Well the author specified that this was their subjective take on it.


For this PR in particular, seems like a lot of it is formatting changes, so you’re right, may not have been a big deal in practice. But I wouldn’t take the statement so literally. For an open source project, every PR can contain unbounded toil for no pay.


It isn't that big at all, but when you maintain something you really don't want to maintain, you're over anything but the tiniest changes such as updating the copy. Ask me how I know. lol


One issue that makes it more inconvenient is that there's no CI set up for this.

Some typo could break everything.


That's literally 300-something lines. I'm baffled; 5k PRs aren't that rare at work.


You genuinely review 5k LOC PRs?

If I were doing a proper review of a PR that big, making sure I understood how everything works, it would easily take multiple days and generate tens to hundreds of comments.

In reality I would flat out refuse to review it. Even 1k is very large without being broken down.

The only time I see PRs that big at work are cleanups (E.g. deleting whole directories), automated linting changes across the whole codebase and large structural refactors (E.g. changing directory structure).


Almost every time I’ve reviewed a 1k+ LOC PR, even if it’s from a really experienced and good engineer, it has introduced a bug. Obviously I’m not gonna say it at work, but changes that big are too hard to properly review and consider all side effects and gotchas


It's also more difficult to bisect to find the actual bug. Small commits have a lot of advantages.


What is being changed matters.

To use an analogy: if you wanted to reupholster my car seats, or spice up the radio panel, sure knock yourself out - I'll come check when it's done. But if you were to as much as think about tightening or loosening a single screw anywhere near the engine block, believe me I will be paying very close attention to what you're changing.


That would be a reason to never accept the PR; not to auto-accept it.


I’d hate to work there. I’d rather review small chunks and merge often than review 5k lines that could be built on a shaky foundation/idea


Best team I've ever worked with. 5k was on the larger side, sure, but technically difficult tasks often just can't be split into a series of smaller pull requests.


I am always amazed that many people with significant experience are so resistant to this idea. For what it's worth, my experience matches yours entirely: many significant changes can't be meaningfully committed in small chunks; they only make sense as an all-in-one.

And even more so, I've often seen the opposite problem: people committing small chunks of a big feature that individually look good, but end up being a huge mess when the whole feature is available. I hate seeing PRs that add a field or method here and there (backwards compatible!) without actually using them for now, only to later find out that they've dispersed state for what should have been one operation over 5 different objects or something.


What changes can't be committed in <5k LOC? That's a shit ton of code. If you can't break that down into smaller shippable chunks there's probably something wrong, or you're building something extraordinarily complex.

It's definitely overall quicker to ship like this, but there are tradeoffs. You are effectively working independently from the rest of your team, there is no context sharing and everything is delivered at once after a longer period of time.


Here's a personal example from work:

We're performing atomistic simulations. The first edition of the code stored each atom on the heap and had a vector full of pointers to the individual atoms. Obviously this would obliterate the cache, so I crafted a PR to simply store all the atoms in a single vector. On its own, that was a one line change, but it was also a very fundamental change to the type system. Everything as simple as

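    // before: atoms is a std::vector<Atom*>, each Atom allocated on the heap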
    Atom* linker = atoms[index];
    linker->x += 1.57;
Suddenly had to be

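    // after: atoms is a std::vector<Atom>, storing the Atoms by value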
    Atom& linker = atoms[index];
    linker.x += 1.57;
If I didn't make those corresponding changes, the code wouldn't type check and the build would fail. I think the final PR came out to about 17 kLOC.


The OP is talking about feature work, as am I.

Obviously if you make a change to something like your type system it's going to generate a very large diff, but you also aren't going to review the full diff.

You're just going to make the change with find+replace or some other automation then write in the PR description "I made this change to the type system". No one is actually reviewing 17k LOC.


Actually the example is pretty good for the kind of problem I'm talking about.

Let's imagine that instead of optimizing the pointers to in-place structs, we were taking the optimized program and adding support for dynamically allocated atoms because of some new feature for dynamically adding/removing atoms.

We could of course split the value->pointer 17k-line change into its own PR. But that PR would only be a pessimization of the code. On its own, it makes no sense and should be rejected. It only makes sense if I know it will be followed by the other feature, and even then, I would have to see the specific changes being made to know if this pessimization is worth it.

And if it got committed to the main branch, in preparation for other PRs that depend on it, the main branch is no longer in a releasable state, since it would be crazy to release with a performance penalty and no feature gain.

So, the right way to push this is as a single PR with a 17k-changed-LoC commit + the commits that actually use it. Of course, people would only manually review the other changes, but that's easy to do even if it's all in a single PR. And anyone looking back at history would clearly see that the pessimization was only a part of the dynamically-allocated atom feature, not a crazy change that someone did.


You specifically mention "big feature" in your original comment so it's confusing that this is a good example of the kind of problem you're talking about.

This is a very different situation than large PRs containing feature code. I think most people would agree that one large PR is the correct approach for this kind of situation.


Usually new features require modifications of old code, at least in my experience. If I came across as claiming it's likely for a feature to require 5k new lines of code, then I clearly communicated badly. But a feature coming with 5k lines of modified code, while rare, still happens several times a year in a large project in my experience.


This isn't what the op is talking about. They're talking about a net new feature that would span 5k lines. Your change is trivial compared to it, and frankly would earn approvals immediately without much thought (assuming the changes were already planned and talked about)


A new feature can easily involve the kind of modifications that they are mentioning. It's pretty rare for a new feature to exclusively involve new code, in my experience. And when it needs modifications in a deep part of the stack, the new feature will easily spiral into small modifications to thousands of lines of code.


In that case I also see "large" PRs, but I point the reviewer to "file X and then 4999 copies of file Y." When doing large no-change refactors, I submit standalone PRs and merge them quickly because no one will review 1k identical changed lines (e.g. if I change the location of /lib/ imported 600 times in the project)


What if the refactoring makes the code harder to understand when looked at in isolation, but is necessary for the rest of the feature to work? Why submit it in a separate PR without the context of why it's necessary?


I've experienced this changing the API for a primary memory allocator in a frequently updated code-base. Each location updated was perilous and it needed to be changed in bulk to avoid an endless war of attrition.


We have a small feature that was put together in a rush. It "works", but only works correctly for some simple cases; otherwise there are bugs everywhere. There were very few tests. We must redo the whole thing, update the UX, and add tests. Tell me how we can achieve that without replacing all the existing code and adding new tests that cover every use case in the same change.


Why do you have to do it all at once? You can't improve the codebase piece by piece?

Shipping all the changes in a monster PR is usually not a good option. One big reason why is that you do not create any value until the whole thing ships. If you ship piece by piece you can create small amounts of value every time you ship.

Also, if it's a "small feature", why is it 5k+ LOC?


Upgrading versions of a framework with breaking changes is one. I did a 12k PR a few weeks ago that touched almost every file of the app.

It was an all or nothing thing, as almost every third-party dependency we used had to be swapped by something else.

At least we have amazing test coverage, so it was easy to find bugs.

But there wasn't much I could say other than "trust me".


I think if a review is very large, the owner of the code should do a code walkthrough for the whole team.


>many significant changes can't be meaningfully committed in small chunks

They almost always can. The exceptions are stuff like autogenerated code or updating a dependency.


We as a team sometimes decided to PR into a PR branch, not as meticulous as a PR to develop but there'd still be eyes on the code entering the branch. Especially useful when there are dependencies and/or different disciplines contributing to the feature.


As if you are reviewing 5k lines, though. It's either 90% whitespace changes that you completely skim over (probably missing the one bit that actually changed) or you just skim over it looking for a few patterns you don't like such as using a loop instead of Array.map or something.


When they say 5k lines, I'm assuming they mean 5k lines of meaningful code. 5k lines with whitespace changes, linting etc. would be impossible to meaningfully review


This is so true.


> at work

Level of trust with colleagues will hopefully be higher! And individual ownership of and responsibility for the code probably lower.


Which is the philosophy behind Gerrit as opposed to Github.


Well, I feel like "at work" is a keyword here.


What language/stack?


Typescript, React, Monaco, Treesitter, etc.


Life is gonna life


This is not at all a huge PR. Sometimes I make thousand line changes (or way over that), although in most cases it is just me working on a project. Changing a couple files (like in this PR) should be OK.


Do others share the sentiment that your reply is, sad to say, quite arrogant?

What exactly are you trying to achieve by comparing the guy's "I'm busy, this is long" with yours or anybody else's? Moreover, what on earth does his job have to do with the topic?

Bad day...?


Trust goes a long way. Microsoft gave me read/write access to the full Windows source code on my first day, even though I was only hired to work on certain parts of the kernel and drivers. I'd never been shown this kind of trust before; other companies I'd worked for always had layers around trust, so this kind of "you're 100% one of us now" message on my first day made me extra happy and motivated at Microsoft. I was extra careful protecting source code too. Loved working there until the day I quit.


Not to worry, it's in source control and mistakes can be reverted.


If detected.


Exactly.

I worked for an (unnamed) company once. For some reason I needed to update the gender of a user on the production server, with two million users on it. (There was no interface to change gender, I expect.)

I accidentally forgot the WHERE clause on the SQL UPDATE and made everyone male.

A fellow developer was watching over my shoulder and we decided to just use the Title column (e.g. Mr/Ms/Mrs) to repopulate the Gender of all users without letting anyone know.

I want to apologize to all the female Drs. in that database.


honestly the security of that makes me nervous. I would frankly expect nation state levels of paranoia around the windows source. Was there any investigation into whether you could be compromised by someone? or maybe I'm underestimating the amount of checks between you and code in windows updates?


In "the days of yore," security was secondary.

Apple's operating system used to have "hooks," where you could register to intercept almost anything that went through the system. Basically, any app could intercept the execution thread of another app (or the operating system), and insert its own code.

Doesn't that sound fun?

One of my favorite MacHack projects was the "Energizer Bunny" hack (I think Dean Yu did it). You installed the hack on multiple Macs, and, randomly, the Energizer Bunny would start banging across their screen. When it was done with one machine, it would move to another machine on the network.

These days, security geeks would defecate masonry, if they came across that.

One of my favorite apps, was something called "Kaleidoscope"[0], which allowed you to select custom "themes," for the operating system, bypassing the Appearance Manager[1], which was an API over QuickDraw. The themes could have executable components.

Some of the themes were ghastly (but fun).

[0] https://en.wikipedia.org/wiki/Kaleidoscope_(software)

[1] https://en.wikipedia.org/wiki/Appearance_Manager


> I would frankly expect nation state levels of paranoia around the windows source.

Why? They give that source away under NDA to non-employees. Handing it to someone who passed all the employment hurdles and has signed a more restrictive NDA shouldn't be a problem.


You can get access as a third party too with the right NDA.

> maybe I'm underestimating the amount of checks between you and code in windows updates

Access doesn't mean force push rights to master (-:


And even "force push to master" would be okay. It's not like you can force push and not have anyone notice: you're going to break everyone's pull.

The biggest reason to disable this is more to prevent accidental mistakes: you think you're force pushing your feature branch but you accidentally force pushed master. We've all done something like that.


Non-force push rights to master are the true horror - if no one notices one, that commit gets pulled with no errors or warnings.

Of course MSFT probably requires x code reviews for PR/patch merges.


In defense of the concerned comment, they do say "read/write" access. I'm not 100% sure what they mean by "write" though; surely adding some sneaky evil code where it doesn't belong would set off some alarm bells at least.


Why? That's security by obscurity. On the contrary, the more people look at the code, the bigger the chance to find and fix any issues earlier rather than later. Linux is widespread and nobody freaks out about it being open source. And as of the changes - they are reviewed and tracked in version control.


it's the write access that worries me, not the read


A good system should have backups and monitoring of commits. I.e. the worst thing that can happen when giving out write rights is that others spend some time fixing a bad commit, that's all.


Another possible outcome of "I gave commit rights to someone I didn't know": https://github.com/dominictarr/event-stream/issues/116


"he emailed me and said he wanted to maintain the module, so I gave it to him. I don't get any thing from maintaining this module, and I don't even use it anymore, and havn't for years"

This is the risk of open source not figuring out funding.


Listening to https://podcast.sustainoss.org/157 yesterday, a great point was made about two distinct roles: the "author" (the original person creating the library) and the "maintainer" (the person continuing updates, feature requests, etc.). It sounds like in this case an author was nudged into being a maintainer, and funding them may still not have been beneficial.


I think authors usually give admin access to active contributors who seem reliable, instead of some random guy who asks for permission without having done much previously...


> This is the risk of open source not figuring out funding.

Not necessarily. There is software I built for myself, thought could be useful to others, used for some time, and then moved on from. Sure, if it brought me 1M€/month I would work on it hard. But that was not the primary goal anyway.


Not really. Even if it was somehow funded, the original author may still want to stop maintaining it and hand it over to someone else.

So it's an entirely different issue.


I don't understand why he didn't tell them to fork. The risk of him making new changes was so low.


You cannot compare an email asking for rights to someone who has contributed a massive amount of work, where you can randomly pick a section and see that it is correct.

This commentator puts it well - https://news.ycombinator.com/item?id=36121561

This is simple spammer entropy.

Now with GPT, spammers can create entropy very easily, so it's trickier.


An attacker would be willing to write a 40-line commit like this guy did in order to get the access they needed.


I've given commit and other "dangerous" access (moderator or admin privileges) to people I don't personally know or barely know for a long time now and I don't think I've ever regretted it.

My criterion is usually just a willingness to improve the situation. I can observe this over time via pull requests, forks and general community participation.

I'm very reluctant to give access to someone asking for it. I firmly believe this is something that should be given and not to be expected.


I gave moderator and admin rights over my Ingress Google+ community, which I had worked hard to build, to someone else because I didn't play anymore. He promptly removed my admin rights and pursued his own agenda with regard to some community drama. That was an education for me in the human appetite for meaningless petty power plays when all that is needed is a bit of good will and cooperation.


hah! i _think_ i remember that? i was an avid ingress player in downtown LA for a time. was definitely part of a few ingress circles/communities and do remember some BS happening in one, basically caused me to stop bothering with google+ after that, i wasn't part of many communities other than ingress. i think g+'s death knell was not soon after?


Is this sort of different, since it is a game or game-adjacent space? I've heard that, for example, Eve Online players occasionally do stuff that would normally be pretty Not Cool if it were done for some non-game reason.


I did this with every first-time committer to https://github.com/zladx/LADX-Disassembly : giving them commit rights immediately (so that they can merge their first PR themselves).

It did wonders to foster a community of contributors and get more patches coming. The CI ensures nothing breaks, and there has never been any trust incident.


I think Graeber used to say that if you tell most adults "Here's your power, I dare you to be responsible with it", they will be.

(The notable exception are people who specifically seek power. Somehow they seem to be the least responsible with it.)


I think Graeber's principle works better in meatspace. Online, anonymity/pseudonymity, distance and ability to block communications tends to erode responsibility somewhat.


This is true. I've run a couple of co-ops and there's a huge difference in engagement when you mostly discuss tasks remotely and when you actually get together and do stuff in the same room.

I still haven't figured out why that is[1], and if there are aspects of it that can be recreated remotely.

----

[1]: Some ideas I've had is that it's about sunk costs ("Now that I got my ass here anyway, I may as well contribute") or that there's a visual component (perhaps seeing a face triggers some responsibility chemistry in our brains?) or that it's harder to avoid persistent questions when you share a room with someone.

I've also speculated that eating together may enhance engagement, but I don't know if that acts as some sort of Skinnerian reward mechanism or if people feel cared for and that triggers their desire to care in return.


Only somewhat related, but in my experience people are also much quicker to be rude over the phone than when you meet them face to face (something which I paid close attention to, as someone who used to have irrational anxiety about interacting with people).


> I've also speculated that eating together may enhance engagement

Definitely definitely.

Even without eating. If it's just "seeing a face", then zoom might suffice -- and zoom might improve things.

But I think it's clear that actually being in the same physical space with other people is an important ingredient of building relationships of trust and respect. I imagine neuroscientists could do a lot of research into why, and I imagine it's not just one thing (like "seeing a face"), but fairly complicated. But from many people's experience, it seems pretty clear that it's true.

(And I agree eating together is special, extremely powerful "magic" here -- which btw is/was another severe cost of people's covid pandemic habits of not eating with people they aren't already intimate with...)

Still, I've built relationships of respect and trust with people online too.

I think another thing going on is that when someone shows up with an agenda to abuse your trust, it's somewhat easier to detect face-to-face (which doesn't come close to meaning universally reliable; of course it is quite possible for manipulative and sociopathic people to show up face-to-face with agendas of abuse and get away with it).


I wanted to contribute a fix for a bug that I ran into all the time to https://curlconverter.com/ . The author gave me commit access while I was still working on my first PR (or shortly after), without me asking. I appreciated the trust and I ended up contributing way more than I originally intended to.


See also [0]. Rutger argues that if you have never trusted someone like this (somewhat) blindly, you may have indeed protected yourself from some misery, but you are also overwhelmingly denying yourself the better parts of life. Because "most people are kind" (which is the original Dutch title of the book btw).

[0] https://en.wikipedia.org/wiki/Humankind:_A_Hopeful_History


Discussed at the time:

I gave commit rights to someone I didn't know - https://news.ycombinator.com/item?id=12522654 - Sept 2016 (100 comments)


My personal rule is that you get commit access after some number of good PRs, depending on the project. Has worked out quite well:

- instar. I had two guys basically rewrite the entire thing and make it WAY better. I had a good vision for the API but my implementation was pretty bad.

- mutmut. I would never have gotten windows support going without help. (Although I am thinking of abandoning windows anyway soon...)

- iommi. This project is much more complex and has a certain philosophy, but we gave commit access to one developer pretty fast as it was super obvious from the first PR what kind of deep thinking he did.

All in all, great success.


These days getting commit rights to inject malware into an already popular gem has become a real threat.

In 2016, I think, it either wasn't happening yet or wasn't recognized.

I am very sympathetic to the suggestion in the OP, at least prior to the recognition that there may be people actively trying to abuse your trust to intentionally inject malware.


I've never had giving away commit rights backfire on me.

And if it did, sorting out the mess and reverting a malicious commit wouldn't be the end of the world.


If the malicious committer is able to publish releases, that malicious commit might have already reached users (and stolen/deleted their data, or whatever it was meant to do) by the time it's detected, and reverting won't help.


I created/maintained a popular project for years[^1], and recently passed ownership to someone else. It's been great seeing issues resolve, PRs merge, etc, after languishing for a while :)

[^1]: https://github.com/nccgroup/sobelow


Man, I remember having to monkey patch that library to do some caching of the exchange rates. Otherwise it would do dozens of requests for the same exchange rate.
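
Something like this minimal sketch, presumably (the module and function names are hypothetical stand-ins, not django-money's real internals):

    import functools

    import rates_backend  # hypothetical module standing in for the real one;
                          # its fetch_rate() did one HTTP request per call

    _original_fetch_rate = rates_backend.fetch_rate

    @functools.lru_cache(maxsize=None)
    def _cached_fetch_rate(currency):
        return _original_fetch_rate(currency)

    # The monkey-patch: repeated lookups of the same currency now hit the cache
    rates_backend.fetch_rate = _cached_fetch_rate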


I maintained the pjax library at one point. This is what the author posted at the time:

https://twitter.com/MoOx/status/955903710617620482?t=BvPIWQ-...


Another similar article, "the pull request hack":

https://felixge.de/2013/03/11/the-pull-request-hack/


This happened with clojupyter for me. I just gave everyone who submitted something good commit access, and a handful of people who are way better clojure coders than me made it way better in every way.


If the maintainer of a package trusts everyone on the Internet, then users who trust that maintainer (by installing their package) transitively trust everyone on the Internet, which they might not have expected if the maintainer didn't declare this position upfront.

Maybe we need a way to declare in the package and repository metadata that the maintainer considers it world-writable and it shouldn't be installed or updated without very carefully reviewing the code of every new version.


> Spoiler: trusting your contributors works

I'm glad it worked for him, but just want to remind people of survivorship bias: https://xkcd.com/1827/


I was around when pugs[0] was a thing, and the "commit bit to everyone who wants it" policy was kind of magical.

I am not sure you'd want this for everything, but for quick paced experimental work it seemed to be incredibly effective.

[0] https://en.wikipedia.org/wiki/Pugs_(compiler)


Glad it wasn't the case, but that could have worked out very differently for him and all the users of that project.


Giving commit rights to some random person is risky, and will only happen to "celebrity" contributors, with enough social validation. That means not everyone will get it, regardless of how good a job they do contributing to a fork or proposing solutions.

So I really appreciate projects like Jazzband [1], which gather like-minded contributors and individuals who want to harbour open source repos around an ecosystem (e.g. Django), while giving some assurance on governance. If Jazzband had been around in 2016, django-money would have been a very good candidate to be harboured by the org.

On a meta level, I really would love it if more OSS devs would use orgs, rather than personal accounts and repos, so that they can grow their projects with a team, rather than becoming the bottleneck and gatekeeper for development.

[1]- https://jazzband.co/


> Giving commit rights to some random person is risky, and will only happen to "celebrity" contributors, with enough social validation.

Meh. I once stumbled upon a repository the maintainer had abandoned following a change of employment. There were a few things to fix which didn't seem too hard, so I figured I'd ask (thankfully this repo was part of an org whose owner I could ask).

I was able to get access to the repo and have been low-key maintaining it (updating the infra, etc…), it’s small and simple so it ain’t much work anyway.

I can assert that they’d never heard of me, because I never actually used the package. Still don’t.

> On a meta level, I really would love that more OSS devs would user orgs, rather than personal accounts and repos, so that they can grow their projects with a team, rather than becoming the bottleneck and gatekeeper for development.

A personal repo doesn’t preclude “growing your project with a team”, you can give write access to a personal repository. An org means extra overhead and complexity, it doesn’t just pay for itself when you create it. Do you create a new org every time you create a new project? Because that’s essentially what you’re suggesting.


Jazzband was mostly a graveyard though. And like you said, it's pretty much dead too. So it seems like it would be bad to give something quite alive like Django to an org that died.


Strange that all of this happened without a single bit of ongoing communication. I always talk with people continuously whenever there is a collaboration.


This would be a godsend for a popular repo that's abandoned. But other packages still use it, and they may or may not migrate to a new package.


Careful there, some people made one mistake like this and still have to support it 20 years later.


Very positive article, thank you! I wish I had similar experience!


Glad it worked out that one time, but, could never be me


Your mileage may vary



