"In addition, rewriting code is a way of transferring knowledge and a sense of ownership to newer team members. This sense of ownership is crucial for productivity: engineers naturally put more effort into developing features and fixing problems in code that they feel is “theirs”."
Perhaps their metrics tell them that all of this is fine, I don’t know. But then, you have to realize Google has so much money they don’t really have to spend it very efficiently. For everyone else, this approach to rewrites seems like an extremely expensive way to produce software that’s not even all that great.
The user experience ranges from good to mediocre to bad, depending. Google is simply too rich and powerful to care. So long as the goose keeps laying the golden eggs, they can just keep going along and patting themselves on the back.
But someone got to tick some boxes on their promotion track.
Bug fixes and incremental features will generally not get you promoted, and for good reasons: we expect senior engineers to have system design skills, and there is simply no way to demonstrate that without having your engineers design systems. If you only assign tasks based on the needs of your product and not on the needs of your workforce, you could easily find yourself with a critical skill shortage.
There's a certain inefficiency to this, but only if you put your blinders on. If you have engineers churning out meaningless work then you certainly need to address that problem, but if you prioritize short-term product success over team health you are only trading one problem for another.
This is exceptionally bad, but sadly true.
If you have an engineer who can unblock teams and fix issues in an hour that others take a week or cannot fix at all, they are gonna jump ship if they can't be recognized.
At that point you've lost a valuable resource.
If I can do that without it being a fluke, they’d better bump my salary.
Even if the product/library/framework never changed much, rewrites would still be necessary to keep it going as new generations of programmers shuffle through. Otherwise we wind up with a Vernor Vinge-style dystopian culture of software archeology.
An added kicker: I have actually gone down this road and later realized the only thing I needed to do was make a small change to the existing system, so I did just that and discarded the new work.
Business got their new features, delivered on time, the system was still stable and reliable, everyone was happy, and now my domain knowledge has increased significantly so that new features and development or bug fixes will be even faster.
Out of date dependencies can be seen as code rot, regardless of quality/state of the actual code.
(Not saying that it should be rewritten because of that, but I do consider outdated dependencies to be code rot.)
Trying to make the software "theirs" seems to be an issue at Google, at least with their open source software, and seems to have led to it being less reliable.
For example, Angular Material v1 was one of the most complete and stable front-end packages on the market a few years ago. Then, back in 2017 the lead developer for the project was replaced with a new dev.
This new dev then went about assigning every issue and pull request to himself, modifying or rejecting PRs that had previously been approved, closing issues that had in-progress PRs as "won't fix", locking discussions, and just generally breaking stuff. (I've personally had to peg my project to v1.4 because everything since 2016 has been a regression.)
If you go to the Angular Material v1.x Github page today, you'll see the same dev on pretty much everything.
This isn't productive ownership, as it prioritizes the engineer over the customer and has led to a generally broken system from what was one of the most stable properties out there. Not to mention, these open-source projects are most people's first exposure to Google's code... Having them be unpredictable regarding the functionality of their software, with little concern for users/contributors, in the name of making their devs feel special seems like a bad model to learn from.
I also don’t think it’s laziness. Developer hubris is real and it can get in the way of actual business value.
You pretty much nailed everything else. I share the sentiment because I’m currently working on splitting up a monolith into microservices. Most of the “quick wins” are things that didn’t need to be rewritten anyway, and the stuff that actually sucks is difficult to address. It’s tricky to get right.
Nobody said it is good general advice. There is so much you need to do well to have a good chance at a successful rewrite that "rewrites are a bad idea" becomes good advice. Nobody said rewrites cannot be done. If you have a company that knows how to do rewrites (and does them constantly, which, I guess, helps a lot), then a lot of problems can be solved by starting from scratch, and it may well be worth it.
A sense of ownership and responsibility over a codebase is fundamental and essential to proper stewardship and maintenance of that code, and refactoring and rewriting are the most effective way to inculcate that feeling. It’s not always feasible, and sometimes it’s not necessary, but the end state is essential. Group or shared ownership of code is a manager’s wet dream but pragmatically impossible, a swamp of mediocrity.
What is the current programming language used?
Is the project mostly legacy (few or no changes in the last few years)?
Is the project critical to the company?
How many people are working on it today? Are they fully assigned to it or is it a "touch when it breaks" kind of situation?
Is the project following modern CI/CD practices? If not, how hard would it be to adapt in the current software stack? Would it be easier in a new stack?
Are developers spending more time than expected understanding the code base?
Is it a political situation? Is the current owner refusing changes? Could it be that rewriting it is just a costly way for removing ownership from that developer/manager?
The fact that "write-only code" is apparently considered a part of sensible software-engineering practice speaks volumes about what their technical culture is like more generally. One would think that code should be much easier to read and survey than it was to write.
In fact, I'll often sketch out a block diagram or some pseudocode if I'm having trouble grokking something I'm reading just to help me get into the proper mindset. I agree this is a problem with the current state of the field, but I haven't seen any good solutions, only hacks and workarounds.
If the intention is for a uniquely skilled human to give a computer instructions, and the computer to execute them - then just write it in machine code.
We need to be precise about what the problem is, exactly. Here is a more salient formulation: "Not Invented Here" is a form of bias that distorts cost/benefit analysis when people are deciding whether to reuse existing code.
They get ticked off by one imperfection here and another over there and immediately run for the exit shouting “I could do so much better”.
Given the formulation of NIH as a form of bias that distorts cost/benefit analysis, what can we do at this point? How about quantifying the cost of the "one imperfection here and another over there" instead of going by the subjective feeling of being "ticked off"?
Instead of understanding why things are the way they are, they fantasize about how they can one-up the original authors and claim their own hero title. They go on to thoroughly underestimate the time needed to recreate what has taken years of learning.
To complete the cost/benefit analysis, we can then quantify the development cost of the library they are considering re-implementing.
Meanwhile the competition has moved on to V2, laughing their way to the bank, and customers scratch their heads over why you are still stuck in the same place for so long. Then our new “owners” get their promos after massive marketing of how much better everything is now.
This needs to be quantified and put into the cost/benefit analysis.
But to everyone’s surprise they soon leave the project because working on bugs and incremental features has become boring and, BTW, the new stuff is just as complex as the old stuff. New devs roll in and we start the whole cycle again.
This also needs to be quantified and put into the cost/benefit analysis. Instead of doing cost/benefit analysis attached to individuals, it should be done by product or by project, so that these turnover costs are also accounted for.
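To make "put it into the cost/benefit analysis" a bit more concrete, here is a rough Python sketch. The class name, the field names, and every number are hypothetical, and it optimistically assumes the rewrite removes all of the friction of the old code; it only shows the shape of the arithmetic, done per project rather than per individual.

```python
from dataclasses import dataclass


@dataclass
class RewriteEstimate:
    """Hypothetical per-project estimate; all figures in the same unit (e.g. engineer-weeks)."""
    annual_cost_of_imperfections: float  # time lost each year to the existing code's flaws
    rewrite_cost: float                  # effort to re-implement what already exists
    opportunity_cost: float              # features not shipped while the rewrite is under way
    turnover_cost: float                 # re-onboarding when the new "owners" move on
    horizon_years: float                 # how long the rewritten code is expected to live

    def net_benefit(self) -> float:
        # Assumption: the rewrite removes all of the old code's friction.
        savings = self.annual_cost_of_imperfections * self.horizon_years
        costs = self.rewrite_cost + self.opportunity_cost + self.turnover_cost
        return savings - costs


# Made-up numbers: 10 weeks/year of friction versus a 30-week rewrite.
estimate = RewriteEstimate(
    annual_cost_of_imperfections=10,
    rewrite_cost=30,
    opportunity_cost=20,
    turnover_cost=10,
    horizon_years=3,
)
print(estimate.net_benefit())
```

With these made-up inputs the rewrite does not pay for itself within three years. The point is only that the "ticked off" feeling becomes a number that can be argued about.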
It's perfectly reasonable (for the reasons in the submission) to rewrite code that exists already.
Hell, if they really care about claiming that hero title, why not spend that effort to understand and refactor the parts of the code that were initially hard to deal with. What you wrote above is a distillation of what a "tech bro", "Type-A player", ego-based culture looks like. It is the opposite of true professionalism.
A system such as Gmail is composed of many smaller parts. If one of those parts was written years prior for a world that has since changed, it may be accruing technical debt as it's continually extended to fit new requirements. An occasional rewrite helps address this type of decay.
Without the rewrite you may find yourself 10 years later with a system that's both critical and kludgy, and at that point the rewrite will be a much larger project.
I like how you state this like it's an objective fact. I've always been happy with the gmail UI and the latest iteration is great too. Outlook on the other hand...
Be too quick and you're constantly chasing shadows… wait too long and you're immobile.
To get on-topic: Gensim could use such a rewrite as well… The ML world changed, expectations and requirements changed, ecosystem and APIs changed. I changed too.
Apparently me spending half a day reading code is no big deal but cleaning it up is a waste of time.
I find this is a particularly common mistake among junior programmers (not saying this applies to you), presumably because they aren't used to reading other people's code. Frustratingly, this is often coupled with an attitude that missing out large chunks of existing user functionality is acceptable if it makes the code a bit simpler.
Of course, sometimes rewrites/refactors really are an improvement. Sometimes code really is fragile and confusing, either because of who wrote it or because it has had many small changes tacked on in the easiest places. Or perhaps the last person that understood that code has left the company, so it's OK that you find the code clearer only because you just wrote it! But in any case, it is fair to ask for real justification for a rewrite.
You can say it disrupts others' understanding of shitty old code; frankly, I don't care. Code is not immutable, and maintaining the status quo helps nobody except the old guard stay relevant.
> Of course, sometimes rewrites/refactors really are an improvement
If that's what you were talking about in the first place, fair enough and my apologies.
Those perfect one line fixes are great but they approach their limits and eventually need to be refactored. This is natural and should not be frowned upon. Likewise you can't just refactor all the time, one line fixes are faster and less costly in all sorts of ways. The key is to pick and choose when to use each strategy.
I think we're too often looking for simple rules. There are none. Just a bunch of guidelines.
I'll try to give an example I hope is realistic:
One of the heaviest things an application can get is a complete theme system. Suppose you don't have one. It's not a requirement.
Adding a theme system when there is none is months of work and might impact basically every line of code that displays anything.
So you're not doing it. Now suppose that, for some exceptional system, there is just one case where you are displaying something under an external widget that doesn't respect its size constraints or whatever. Long story short, your text is invisible, and you want to invert the font color to get it working. You don't have any code like that.
Do you think it is OK to add it to the easiest possible place: in this case perhaps you add an optional argument called "need_to_invert_color" (this awkward phrasing tells you it's a hack) to a single function, default it as false, comment it as: //invert the color of the font. This is needed where an external graphing widget with a black background leaks onto our canvas due to not respecting our pixel boundaries, so that our text displays over it.
And then where you call it, you comment the same thing: //currently a bug in the widget code makes the widget leak xyz pixels below its bottom border. As a temporary fix we introduce an argument need_to_invert_color into our display function. As of this writing 3 Jan 2019 we are just using it from here. The correct fix would be for the widget to stop leaking instead, and when that is done white text is unnecessary - and we might not notice. So we start by testing whether the area we will be overlaid over is indeed the wrong color.
ETC. In other words a quick hack for a corner case, that doesn't fix the underlying bug (workaround) and even as a hack makes use of something that doesn't exist (a theme system), instead adding and documenting a half-assed thing tacked on.
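A minimal Python sketch of what such a documented hack could look like. Only the `need_to_invert_color` argument and the gist of the comments come from the description above; `draw_label`, `draw_caption_under_widget`, the `Color` type, and the darkness threshold are made up for illustration.

```python
from typing import Tuple

Color = Tuple[int, int, int]

BLACK: Color = (0, 0, 0)


def invert(color: Color) -> Color:
    """Return the RGB inverse of a color."""
    r, g, b = color
    return (255 - r, 255 - g, 255 - b)


def draw_label(text: str, color: Color = BLACK,
               need_to_invert_color: bool = False) -> dict:
    # HACK: invert the color of the font. This is needed where an external
    # graphing widget with a black background leaks onto our canvas, so
    # that our text still displays over it.
    if need_to_invert_color:
        color = invert(color)
    # A real implementation would render here; the sketch just returns
    # the draw command so the behavior can be checked.
    return {"text": text, "color": color}


def background_is_dark(background: Color) -> bool:
    """Rough brightness check (hypothetical threshold)."""
    return sum(background) / 3 < 128


def draw_caption_under_widget(text: str, background: Color) -> dict:
    # Temporary fix: a bug in the widget makes it leak below its bottom
    # border. The correct fix is for the widget to stop leaking. Until
    # then, test whether the area we draw over really is the wrong color,
    # so the inversion switches itself off once the widget is fixed.
    return draw_label(text, need_to_invert_color=background_is_dark(background))
```

Every existing caller is unaffected because the argument defaults to false, and the one caller that needs the hack carries the explanation right next to it.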
When exactly is the right time can be hard to figure out. Especially because, if enough hacks have accumulated that code really does need a reshuffle, then the job of refactoring is harder, which actually increases the temptation to put it off. But I wouldn't make a big sweeping change just because of one small hack that's a bit ugly.
This was my thinking on a lot of projects. But you know what? That rewrite never became necessary!
So not only did I not start with the "right" architecture - I didn't end with it either!
That's what "many small hacks in the easiest of places" reminded me of and I wondered if you in fact do approve of it. It has always seemed fine for me. Just no problem at all. But with documentation right there and the worse the hack, the clearer the documentation right there explaining and justifying it, up to and including "I don't know why this works but this system call makes the next line succeed, whereas removing it causes the next line to fail sometimes - this is tested in testxzy." Obviously a hack, a terrible hack if you don't know why it works. And a project can end up with a lot of these.
You refactor it.
Now one person understands the system.
Now it takes 2 hours.
I’ll take that.
Sometimes you gotta sit down and bite the bullet and read and ask questions.
I think Google partially does this in order to keep its engineers happy, as you are happier when you develop something from the ground up compared to just maintaining something already built. By going down this route those engineers are kept in a “happy state” so there’s less risk of them flying off to other pastures, where they could potentially build the next product that could “kill” Google. Sort of invisible golden handcuffs, if you will. I personally find it tremendously wasteful at a societal level but I can see the value of this strategy for Google as a company.
I mean, I guess the duplicate code/class is now coupled to the two places that use it, but I have a hard time seeing how that is worse than two duplicate instances of the code.
Two pieces of code in different parts of the codebase are very similar. They have nothing to do with each other - even semantically. But the code is very similar. So someone thinks this is code duplication and creates a function/class/whatever that both pieces of code can use. Repeat all over the place.
Then one day, one of those two places needs custom behavior. I can either change that function/class and create complexity (have to now support two use cases). Or I can stop using that function/class in that place and go back to the old solution. Sometimes, this is quite a lot of work as aggressive "DRY" leads to a fair amount of coupling - there could be a few layers of DRY'd code there to untangle.
I put "DRY" in quotes because none of this really is DRY. DRY originally was about requirements - not code. No requirement should show up in multiple places in the code base. In this example, even though the code was almost identical in both places, there was little else common. They dealt with different requirements, for completely different reasons. They should never have been refactored to use a common function/class.
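A small Python sketch of the situation described here. The two business rules, the function names, and the numbers are all hypothetical; the point is only that identical code can come from unrelated requirements.

```python
# Two unrelated requirements that happen to produce identical code today.

def shipping_fee(order_total: float, is_member: bool = False) -> float:
    """Hypothetical rule: orders under 50 pay a flat shipping fee of 5.

    The is_member parameter is the later "custom behavior": members ship
    free. Because this function stands alone, the change stays local.
    """
    if is_member:
        return 0.0
    return 5.0 if order_total < 50 else 0.0


def late_payment_penalty(invoice_total: float) -> float:
    """Hypothetical rule: overdue invoices under 50 incur a flat penalty of 5."""
    return 5.0 if invoice_total < 50 else 0.0


# What the over-eager "DRY" refactor ends up with: one shared helper that
# must now carry a flag which is meaningless for half of its callers.
def small_amount_charge(total: float, is_member: bool = False,
                        members_exempt: bool = False) -> float:
    if members_exempt and is_member:
        return 0.0
    return 5.0 if total < 50 else 0.0
```

Before the membership rule existed, `shipping_fee` and `late_payment_penalty` had the same body, which is what invites the merge. Once merged, every new shipping rule has to be threaded through `small_amount_charge` without disturbing the penalty callers.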
These days people keep talking about over-use of DRY, but they're really complaining about overabstraction of disparate code - not the DRY in the requirements sense.
It takes a fair amount of work to design a good class where one can easily apply the open-closed principle. And one should where it's needed. But tying together two completely unrelated parts of your code base with such a class just because the two pieces of code are almost identical is the wrong approach. Then going ahead and designing it for the open-closed principle merely adds complexity.
Sometimes this is beneficial. Sometimes it isn't.
My point is — more often than not — conventional wisdom is that DRY is always preferable, whereas the reality is not that simple.
In my honest opinion, frequent rewrites are by-and-large a disastrously bad idea, for several reasons. If there is one thing I would change about Google, it would be to slow down the frenetic pace of change inside. Rewrites just make the pace of change untenable. And I say this as one who is totally part of the problem: I helped rewrite significant parts of V8, the JS VM in Chrome, particularly the optimizing JIT compiler, TurboFan. (Don't get me wrong--I am not knocking any one specific project, my coworkers, even my leadership, etc). I've been at Google 9 years, and I barely know how any of it works anymore.
1. The assumption that requirements and environment around software change so frequently that it must be burned down to the ground and rewritten is a big part of the problem. Why do the requirements of software change? A. Scale. B. Because the software around it changed. Bingo.
2. Rewrites actively destroy institutional expertise. Instead of learning more as time goes on, engineers' knowledge becomes obsolete as old systems are constantly changing and rewritten for unclear benefit. Experts can no longer rely on their knowledge for more than a couple of years. This is extremely bad for critical pieces of infrastructure. In short, no one ever masters anything. This is due not just to incentives but due to change itself.
3. No one ever has time to do an in-depth followup study on whether the rewritten artifact was better than the original. Instead people go on their gut feeling of having rewritten something they often did not write themselves (and did not fully understand) with something new and shiny of their own creation. The justification of the outcome is done, in short, by the people who have a big vested interest in declaring success. (And yes, me too).
4. The idea that the software requirements keep changing around software is promulgated by the exact same people who never spend any time up front simply writing requirements down. Well, no fracking wonder the requirements seem to change somewhere in the middle or years later: they were never anticipated in the first place! We'd do better overall if the industry in general did some good ole requirements engineering. Most people I talk to have never even heard of this. Instead, we never have any time to stop and think about doing things right, but we always find time to rewrite from scratch.
5. Zero incentive to do things right. As a field, as industry, we are actually not very serious about writing good software. Instead, we're just going to trash it after 5 years. So software is constantly bad. But the next rewrite!
The drive for rewrites is mostly a swindle in my opinion. The reality is that some software needs to be shot in the head, some needs to be rewritten, but most software needs to be just maintained. That means bugfixes, performance improvements, scalability improvements, and sometimes, yes, refactoring too. But bugfixes and incremental performance improvements don't get anyone promoted. Even more cynically, but very realistically, "old" software that is maintained by experts means a dependency on those experts, and they end up being expensive. Corporations hate when their employees have job security! Rewrites are A.) driven by an influx of young talent who want to make their mark, B.) incentivized by the promotion process and C.) driven by a corporate pressure (everywhere, not just Google) to make sure that programmers and software are commoditized to avoid dependencies, bus factor, and job security.
So much 3, 4, and 5.
I was so spoiled at my last employer. Leaving was a mistake, but I can't go back because I relocated. Seems like nobody in this entire town "gets it". Seriously.
What I have noticed from the travel industry is that backend developers tend to write a ton of original code and rarely refactor anything as though they are scared an improvement is always a regression. Frontend developers tended to not refactor anything either, but then they were scared to write any code at all (note: I am a frontend developer).
Part of this fear was justified because their automation was shitty. Often things were properly code reviewed, but changes would only be accepted with the smallest possible footprint. It hurts when the diff is a static text comparison counting every character, which means a bunch of comments explaining things or white space changes looks like a ton of code changes. It also keeps you from removing unnecessary code or reorganizing things. The killer though were frameworks for everything including test automation in various different flavors for the same sorts of things, which is like writing tests for testing of tests. In this case testing became a block to check off that had little or no real value.
But both, search and adwords, have been rewritten multiple times since their launch.
You can have a look at the papers created by Jeff Dean (1) and Sanjay Ghemawat (2) which mention some of the new concepts/technologies/features used in those products.
Disclosure: I co-author the Python client library and have written a few of the docs on the site listed above.
"Almost all development occurs at the 'head' of the repository, not on branches."
Googler Rachel Potvin made an even stronger statement in her presentation about "The Motivation for a Monolithic Codebase" :
"Branching for development at Google is exceedingly rare [..]"
In the related ACM paper she published with Josh Levenberg there is the statement that:
"Development on branches is unusual and not well supported at Google, though branches are typically used for releases."
In my world, when we have to make a bigger change we create a branch and only merge it into the trunk when it is good enough to be integrated. The branch enables us to work on that change together.
I don't understand how they do this at Google. As far as I understand, in their model they either have to
- give up on collaboration and always have just a single developer work on a change.
- share code by other means.
- check in unfinished work to the trunk for collaboration and constantly break trunk.
What is more common is that very large changes are checked in as a series of individually compatible changes, and often broken up across the repository (there are of course tools to help with this). It's relatively rare for multiple developers to work on a single changelist; it's much more common to break the work into separate changelists.
Haven't worked there for some years now so I'm a bit rusty on some of the detail.
Git and its model were the best thing a few years back. Now, since Google is doing all its dev in the main trunk/master, it must be correct and more intelligent.
Couldn't it be that they went with what they had at a certain time and continue to use it because everyone is used to it and it still works? I'm not sure Google analysed whether branching was bad and then chose trunk-based development.
I cannot understand how a company that has a well-defined process using branches is doing it wrong, or how that is so suboptimal. I guess it is a matter of processes and culture. None of the great companies are great because their source control strategy (or code) was excellent.
We developers always over analyse everything and come up with excellent logic and some of us are gifted with words more than others.
Instead of branching you would just create a `changelist` (a commit, a set of changes to files) and work on that.
You can show it to your colleagues. You can build and test it. You can send the id to anyone to have a look at it, or test it themselves.
You can have multiple changelists depending on each other, without being committed.
This might be an interesting read for you: https://paulhammant.com/2014/01/08/googles-vs-facebooks-trun...
"Branches & Merge Pain
"TL;DR: the same
"They don’t have merge pain, because as a rule developers are not merging to/from branches. At least up to the central repo’s server they are not. On workstations, developers may be merging to/from local branches, and rebasing when the push something that’s “done” back to the central repo.
"Release engineers might cherry-pick defect fixes from time to time, but regular developers are not merging (you should not count to-working-copy merges)"
For example: How do they untangle a wad of code that is large enough that it takes longer than a few days and more than a single developer to get the code back into a state that is acceptable for trunk?
The changes required for this kind of refactoring can be all over the place, regardless of any organizational boundaries in your code. I can't see how changes of this nature can be put behind feature flags.
- make all code that uses said code use the interface instead
- build the feature switch into the interface
(i.e. you create the new API, migrate stuff to use it, deprecate the old one, migrate the rest, retire the old one).
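The steps above can be sketched in Python roughly like this. The `Tokenizer` interface, both implementations, and `make_tokenizer` are invented for illustration; the only thing taken from the comment is the idea of routing every caller through an interface and putting the switch there.

```python
from abc import ABC, abstractmethod
from typing import List


class Tokenizer(ABC):
    """The interface that all callers use instead of the concrete code."""

    @abstractmethod
    def tokenize(self, text: str) -> List[str]:
        ...


class LegacyTokenizer(Tokenizer):
    """The tangled old code, now hidden behind the interface."""

    def tokenize(self, text: str) -> List[str]:
        return text.split(" ")


class NewTokenizer(Tokenizer):
    """The replacement, which can live on trunk while still unfinished."""

    def tokenize(self, text: str) -> List[str]:
        return text.split()  # also collapses repeated whitespace


def make_tokenizer(use_new: bool = False) -> Tokenizer:
    """The feature switch lives at the interface, not in the callers."""
    return NewTokenizer() if use_new else LegacyTokenizer()


# Callers never name a concrete class, so they can be migrated one by one
# and the old implementation can be retired at the end.
tokens = make_tokenizer().tokenize("a  b")
```

The refactoring itself can still touch code all over the place, but each step (introduce the interface, migrate a caller, flip the switch) is a small change that leaves trunk working.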
Well, you could actually call that "unfinished" because in the beginning the code doesn't accomplish the task, but progressively it will become more useful.
You can when you can but you can't when you can't.
I 100% agree with you that we should work this way whenever possible and we should work hard to keep our code in a state that lets us cleanly divide work.
In my experience it is not always possible to split up work that way. Think of untangling dependencies of a larger part of the code as an example.
Other times it's genuinely necessary to make a long standing branch. In those cases, you just do it. Trunk based development should not be a dogma, just a different default choice.
I only scanned through it, but it seems similar to the de facto way of doing things before distributed version control systems became popular (in the late 2000s?).
Is a cleaner and more obvious guide.
The idea is you have a constantly usable master, and your branches should be short lived so you don't hit a brick wall trying to get reviews and merge on your massive change sets.
Ultimately it means you want to test and review your change before it goes into master as opposed to creating "production", "staging" and "develop" branches, which largely just kick the can down the road and is a different way to solve that "what's deployed where" issue.
1) Team A checks in their code to provide Feature X. Their code is not used anywhere in the codebase yet, however full unit test coverage exists for the public API; this is required for code review.
2) Team B checks in their code to turn on Feature X in their product, gated under a command-line flag which by default uses the old behavior.
3) Team B checks in an integration test that flips the flag and makes sure everything works as planned.
4) If Team B requires changes to Feature X to get expected behavior, they communicate those changes to Team A and someone from either team (using available human resources) makes the changes.
5) Team B checks in a small change to flip the flag by default.
6) Team B monitors their product. If things go awry, only the very latest change is reverted and repeat (4).
7) Once stability is achieved, Team B checks in a change to remove the flag.
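A rough Python sketch of steps 1-7, using the standard `argparse` module. The flag name `--feature_x`, the functions, and the sorting behavior are hypothetical stand-ins, not anything from an actual Google codebase.

```python
import argparse
from typing import List, Sequence


def feature_x(items: Sequence[int]) -> List[int]:
    """Step 1: Team A's new code, unit tested but unused at first."""
    return sorted(items)


def old_behavior(items: Sequence[int]) -> List[int]:
    """What Team B's product did before Feature X."""
    return list(items)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Step 2: the flag defaults to the old behavior.
    # Step 5 is a one-line change of this default to "on".
    # Step 7 deletes the flag and old_behavior() altogether.
    parser.add_argument("--feature_x", choices=["on", "off"], default="off")
    return parser


def run_product(items: Sequence[int], argv: Sequence[str]) -> List[int]:
    args = build_parser().parse_args(argv)
    if args.feature_x == "on":
        return feature_x(items)
    return old_behavior(items)


def integration_test() -> None:
    """Step 3: flip the flag and check that the product still works."""
    assert run_product([3, 1, 2], ["--feature_x=on"]) == [1, 2, 3]
```

Because step 5 is a one-line default change, reverting it in step 6 is equally small, and none of Team A's code has to be backed out.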
There are (multiple) tools in Facebook and Google which are an abstraction on top of their VCS.
(e.g. which feels more like git, where you can work on a stream of changes which depend on each other without actually pushing anything to head)
Don't see why branching should be needed for this.
It's visible in code review UI, has a description, has tests run on it, it can be merged by other people and it can be referenced from anywhere. Eventually it's merged into the head or dropped.
If every branch was always merged back into head before doing anything else, and always had its commits flattened into one, and someone forking off of your branch was basically opening it up, copying the changes in your clipboard, and pasting it into a new branch with no attribution or history, then sure.
With branches, if someone updates the branch you depend on, your work is based on stale stuff, and it can get ugly. Just try to do it on github :-)
A "patch" is actually just a commit (actually a changelist) which can be viewed, commented and edited in the browser based code review and IDE tool.
Imho I find it much easier to get an url of a "patch" and comment on it inline, instead of having to checkout a branch etc.
If you have a question about a specific example, I'm happy to answer it in the way it would've been done within Google/Facebook.
One thing I infer from your answer is that it seems that there is an established process and dedicated tooling for working with patches at Google. I think a lot of my pain with patches stems more from the lack of process and lack of an agreement on formats and standards in my environment than from the use of patches per se.
Where I still see an advantage of branches is that they facilitate documentation of what has been done by whom and when. All of this documentation is in the same place and form as the documentation of changes in the trunk. It all is in commit messages whereas patches are only documented somewhere else, possibly in the Email or IM used to send the patch. Even if most of the branch documentation does not survive on trunk when we squash the final merge it is still there and easy to find as long as the branch doesn't get deleted. When I want to look up why I applied a certain patch I'll have to dig through my messages. I think that makes it harder to work with patches than with branches.
Google's system is derived from Perforce, which has the concept of a changelist (think: commit), which can be "pending" and stored on the server for review/cloning by other developers: https://www.perforce.com/perforce/doc.051/manuals/p4guide/07...
This allows you to share work without (in Git terms) pushing to master. Branches in Perforce-like systems tend to be more heavyweight and permanent (IIRC you have to branch an entire path of files, it is not the same as the Git concept of "branch" which is just a commit that points to another parent commit).
You can think of the system as enabling you, in Git terms, to create pull requests without the creation of an underlying branch.
You basically work on a "patch" (changelist), get feedback from others and send it out to review at the end.
Before you can submit (commit) it, you'll have to sync to "head" (to have the latest changes) and run all tests.
^ most of this happens automatically, and as most changelists ("patches") are small, this happens very fast and async in the background.
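For readers who think in code, here is a toy Python model of that flow. It is not how Google's or Perforce's tooling is implemented; `Repo`, `Changelist`, `sync`, and `submit` are names made up for this sketch, and conflicts are detected per whole file for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Repo:
    files: Dict[str, str] = field(default_factory=dict)
    head: int = 0  # revision number of the latest submitted change


@dataclass
class Changelist:
    base: int                   # head revision the author last synced to
    edits: Dict[str, str]       # path -> new content
    base_files: Dict[str, str]  # content of the edited paths at `base`


def new_changelist(repo: Repo, edits: Dict[str, str]) -> Changelist:
    """A pending change: shareable and reviewable, but not yet at head."""
    snapshot = {path: repo.files.get(path, "") for path in edits}
    return Changelist(repo.head, dict(edits), snapshot)


def sync(repo: Repo, cl: Changelist) -> None:
    """Bring a pending changelist up to head; fail if an edited file moved."""
    for path, old in cl.base_files.items():
        if repo.files.get(path, "") != old:
            raise RuntimeError(f"conflict in {path}: resolve before submitting")
    cl.base = repo.head


def submit(repo: Repo, cl: Changelist,
           tests: Callable[[Dict[str, str]], bool]) -> int:
    """Sync to head, run the tests against the result, then commit."""
    sync(repo, cl)
    candidate = {**repo.files, **cl.edits}
    if not tests(candidate):
        raise RuntimeError("tests failed at head")
    repo.files = candidate
    repo.head += 1
    return repo.head
```

In this model two pending changelists can coexist for as long as they like; the conflict, if any, surfaces at the moment the second one tries to reach head.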
The PDF explicitly calls out the time consuming part:
"Almost all development occurs at the “head” of the repository, not on branches. This helps identify integration problems early and minimizes the amount of merging work needed. It also makes it much easier and faster to push out security fixes."
Anyway, I feel sad that so much effort was put into really nice VCS concepts and almost no one uses them in enterprise development.
Example issue, note that public ones are not associated with commits:
Once a month? In an averagely well run company even that may be towards the higher end.
Should your entire development strategy be based on a once a month occurrence?
Closer to once a week for me.
If 2+ people are working on the same file, which might result in a conflict, you can either:
- handle the conflict when you merge your branches at some point in the future
- handle it when trying to commit your change to head
Only difference is whether you handle the conflict now, or in the future.
This whole conversation the last day or two on HN has been kind of nuts. Like everybody agrees you shouldn’t put all your code in a single file, right? Why not? It would let everyone see all of the source code in one place! But it would be huge and hard to avoid conflicts. So we split things into files. Then “trees”, etc...
Basically it sounds like Google's monorepo is really a bunch of repos glued together, with changes in one triggering changes in others. The difference, it seems, is that Google does not get to benefit from the things OS developers like about git. It's like Google developed custom versions of GitHub, CircleCI, and other tools and is marketing that as a better solution (just build several billion-dollar solutions to manage your monorepo!).
And even after all that, google has a bunch of separate repos for important open source or secret work.
a = 1
a = 5
a = 0
Both contributors created a pull request and submitted it. In the description they both state that the new value should be the one they put in. How would you resolve this issue in a timely fashion, making sure you do not take down a service accidentally and do not slow down development too much? I intentionally gave you a very simple example, but if you want we can go into rolling out new features, fixing security bugs and a lot more where such issues arise. And no, git will never be able to solve these issues.
I don't think HN is going nuts (except for a few zealots); these problems come from the nature of software development in general. We have seen how Google solves them (monorepo, custom CI/CD, etc.), and other companies solve them in different ways (maybe with a branching model, using GitHub). People are just sharing their experience here and, based on that and their level of understanding, the solutions they perceive.
Someone’s changes get committed first. That’s a business decision, not a code tooling one. Second pr has to adjust. Same on both mono and poly repo, just using different words.
At least branches let you have the choice, which cannot be said for branchless.
Effectively this workflow means everyone is working on the same branch and the first commit to pass review gets in. The next guy will have to rebase.
In the end one wins as you said, but the level of detail is rebasing individual commits, not merging entire branches.
So this ends up being the age-old rebase vs merge discussion.
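To make the `a = 1` / `a = 5` / `a = 0` example concrete, here is a rough sketch of a three-way merge on a single value. I am assuming `a = 1` is the common base and the other two are the competing changes; `merge3` and `Conflict` are made-up names for illustration, not any VCS's actual API.

```python
from typing import Optional


class Conflict(Exception):
    """Both sides changed the value, and changed it differently."""


def merge3(base: Optional[str], ours: Optional[str],
           theirs: Optional[str]) -> Optional[str]:
    """Three-way merge of one value against a common base."""
    if ours == theirs:
        return ours      # both sides agree (or neither changed anything)
    if ours == base:
        return theirs    # only the other side changed it
    if theirs == base:
        return ours      # only our side changed it
    # Both changed it differently: the tool can detect this, not decide it.
    raise Conflict(f"base={base!r} ours={ours!r} theirs={theirs!r}")
```

With base `"1"`, ours `"5"` and theirs `"0"`, this raises `Conflict` whether the two changes meet in a branch merge or in a rebase onto head. Which value wins is still a human decision; the workflow only changes when you are forced to make it.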
If the monorepo contains, let's say, five different products and in a day only one of them gets a merge, does Blaze still build all five, and are all five released (based on successful integration testing)? Or is only the changed product released (along with any others that depend on it)?
EDIT: Also, is the "canary" server still for testing? Could there, in practice, be a set of canaries running very different versions? Is there any correlation or any version "roll-up" constraint between the various canaries?
Canaries are live/production traffic only. When a release is deployed, it goes to the canary instances of a job first, and it will take a small subset of traffic. This allows the job owners to see if the new binary has any adverse effects before rolling out more widely. More details about canaries can be found here.
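As an illustration of "a small subset of traffic", here is one simple way to pick that subset deterministically. This is only a sketch under my own assumptions: `is_canary` and its parameters are hypothetical, and real setups often just give the canary instances their share through the load balancer rather than hashing request keys.

```python
import hashlib


def is_canary(request_key: str, canary_percent: float) -> bool:
    """Deterministically route a stable fraction of keys to the canary binary."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # buckets 0..9999
    # canary_percent is in percent, so 5 means buckets 0..499 go to the canary.
    return bucket < canary_percent * 100
```

Hashing a stable key (a user or session id, say) means the same caller keeps hitting the same version, which makes it easier to compare error rates between the canary and the rest before rolling out more widely.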
One important thing to think about with Google's source control is that it is closer to SVN as far as versioning goes. There are (for the most part) no feature branches or anything like that. Everyone is always working on HEAD. When you do a release, you will start near HEAD when cutting the release.
Releases are done by each team. Look into blaze/bazel. It allows me to say "I depend on these sources only". So a day with only one change, you might only build and release the changed artifact (in practice this never happens).
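The "I depend on these sources only" part is what makes it possible to build just what changed. A minimal sketch of the idea, assuming a hand-written dependency map (the target labels and file names below are invented, and this is not Bazel's actual algorithm or API):

```python
from collections import deque
from typing import Dict, List, Set


def affected_targets(deps: Dict[str, List[str]], changed: Set[str]) -> Set[str]:
    """Return every target that transitively depends on a changed file or target."""
    # Invert the graph: input -> targets that list it as a dependency.
    rdeps: Dict[str, List[str]] = {}
    for target, inputs in deps.items():
        for inp in inputs:
            rdeps.setdefault(inp, []).append(target)

    affected: Set[str] = set()
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for dependent in rdeps.get(node, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical dependency map: target -> declared sources and dependencies.
deps = {
    "//lib:core": ["core.py"],
    "//app:a": ["a.py", "//lib:core"],
    "//app:b": ["b.py"],
}
print(sorted(affected_targets(deps, {"core.py"})))
```

Here a change to `core.py` touches `//lib:core` and `//app:a` but leaves `//app:b` alone, so only the first two need rebuilding and retesting.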
More mature teams do a lot of complex stuff. Large teams may have multiple stages of canary, multiple canaries, feature experiments, etc.
I'm not as well-versed in canarying though I've set it up for a team or two. I've only ever seen a single canary version for any particular binary.
Canarying is done. I haven't seen canaries running multiple versions of the same binary. Though teams will often guard new behavior behind experiment flags.
Being able to write and use a mapreduce with a high level of confidence that my code would continue to work years later was another nice benefit. MRs I wrote in the first year at Google still compiled and ran with minimal changes almost 10 years later(!) which is amazing given the amount of environmental change that occurred.
That said, somebody could still halt development across the company by changing and checking in a core file (like proto defs for websearch) without testing.
Whatever social system led to google3/borg and the amazing productivity associated with it, it was a special moment that hasn't been replicated many times.
I'll give google credit for one thing: it can change backends a lot without too much user visible pain.
Amusingly, this comment may explain how someone in this thread can be incredulous about the idea that things get rewritten every 2-3 years (which yes is an exaggeration).
Google is very good at making sweeping infrastructural changes (generally improvements, I might add) without significant user pain.
Ps thanks for getting scipy into third_party all those years ago.
It's worth noting that this is only viable at Google because they don't use git. Git's insistence on every client having a full copy of all history of every file in the repository makes monorepo much more expensive.
I see conflicting reports over whether google use Perforce or something proprietary called "piper"?
You might be happy to know Microsoft created a Virtual File System for Git so you do not have to have every file checked out in your working directory. Microsoft uses Git in a monorepo (for Windows, and it's 2.5 million files/300GB).
(It may still be backwards-compatible, but it's been years since they turned off the last real Perforce and I haven't worked there for years myself. So it may have diverged.)
Note that there's nothing forbidding you from writing a virtual git filesystem that fetches objects from some centralized repo as files are open()ed. Git on cloud steroids.
The underlying assumption is that the project will fold in 2 years anyway, so one may get away with never doing a dependency upgrade.
The idea of wanting to make a change to a third party library and then see all the downstream consumers who would be broken by that change if they updated to start consuming that change is an incredibly stupid thing to want, and it’s no measure of success whatsoever to build something that gives you that information.
It’s like the most giant case of coupling you can imagine (letting the statuses of thousands of consumer apps act as any type of constraint on the developer choices of the third party library, as opposed to all those consumer apps opting in to changes on their own terms by updating their dependencies).
Imagine if I have shared a bunch of copies of my resume with a bunch of recruiters. They are out there selling me as a candidate or whatever. Now I decide I want to change my resume, but I don’t know if it’s going to upset the approach some recruiter is taking.
If I can’t update my resume unless I first consult a big oracle that tells me which recruiters will be negatively impacted, that’s a problem, and not at all some type of live-with-able “customer service” positive thing. It’s just plain old bad coupling.
Creating such a system that could automatically diff the old resume’s usage constraints against my proposed changes would be a gigantic waste of time. The exact opposite of something to celebrate.
I say this as someone who routinely writes in-house software libraries used by dozens or hundreds of other apps, various teams, and even a few that are open source.
The primary thing gauging the health of our development is that we are decoupled from any consumers. We are free to make whatever changes we want, and whether downstream teams would like to receive those changes is wholly an opt-in process with versioned dependencies and easy rollbacks controlled by those consumers.
It's also useful information in the edge case where consumers are relying on undocumented or unintentional behavior in your package.
Yes, you don't want a hard-constraint of no-breaking changes ever, but knowing immediately when a change is breaking change (especially if you didn't intend it to be) is useful.
> “in practice its important to remember that your package only exists to be consumed by its dependencies. That is its sole purpose. If your changes aren't serving those consumers, then they're the wrong changes.”
I agree completely and that’s exactly why you want downstream consumers to opt-in to your changes.
As the library writer / maintainer, nobody knows better than you how to implement the behaviors downstream consumers want. Sure, those other folks know what they want, but are not at all a trustworthy signal for how to solve it for them.
If you are constrained by what breakages your new approach would introduce, this is backwards, exactly from the “in practice” perspective you described. That means you are not able to actually solve your consumers’ problems, create new solutions, refactor old bad ways of working, because you are coupling the what with the how.
The fact that you are beholden to your consumers is all the more reason to decouple the development process from the delivery process. It only makes this idea of wanting a big oracle all the more egregious: an oracle that tells you what would be broken not because of functional incorrectness but because of a consumer's failure to accommodate the new changes.
This seems attractive for other large organizations. Any positive or negative experiences from readers?
When it gets tied into career progression, you end up with not very productive goals (but easily measurable!) like "Reduce eslint warnings in legacy project X by 50%", because business value is either not easily measurable (how do you quantify better knowledge of the overall system architecture? Bugs not caused? But how do you know the developer didn't just stay in their comfort area/have easy projects this quarter/year? I saved Joe 4 hours on Friday since he didn't have to investigate what the system does with foobars as I had the answer? That just sounds petty.), or not directly under the developer's control (revenue).
I've come to the conclusion it's folly to pursue any one company's career progression maze, because you get corralled into all sorts of silliness like this: vying for projects with your peers, acrimonious code reviews, chasing silly metrics, etc. I find it's much more effective in time, money and title (and work-life balance, and mental health) to simply switch jobs for the higher title.
In other words don't fall for the "work your ass off for a possible future bump in pay and title" game many companies play.
I've worked for a few. In my experience they end up getting pushed out by the bureaucracy after a few years.
If I recall correctly we had to state goals for each of 5 major categories and another set of 5 supplementary categories. The categories were things like: deliver customer value, enhance teamwork, collaborate well across teams, and various other enterprise buzzwords; I forget exactly.
As a coder, my goal was pretty much to write good code and avoid sitting through pointless meetings as much as possible (a very hard task in that place). I would basically have to spend half a day coming up with various different ways to word this so that it would fit each of the five goals.
Every six months (one mid year review, and then the final review) I would have to meet with my manager to provide evidence that I was achieving my goals. That would be another half a day twisting words around to try to fit what I had done to the goals. My manager would have to do the same for me. I was then scored on each of the goals. Then the score was totalled up and was used to determine the salary increase that I would get at the end of the year.
We all despised the system. A collective groan would go around the meeting room when it was announced that it was review time again.
1) While writing up the goals, you don't have a full view of the problem. Goals change, but once written down, there is a strong pressure to implement what has been written down.
2) It selects for people who are good at writing convincing design docs. Often these people write sub-optimal code and the designs only look good on paper.
Actually, the products (aside from search+ads) that come out of Google look exactly like they have been produced using this methodology; and that's not a good thing.
In my experience, the ones who write good design docs are the ones who write good code.
Design doc writing is not simply overhead and marketing - it is concisely describing what and how you want to do something, and inviting feedback and other ideas.
The exercise of writing a good design doc brings you through the process, thinking of every non-trivial aspect.
It also typically only takes a day or two (or maybe a week for something more complicated) - far less time than the corresponding code takes. And if a colleague points out something that could be done better, you won't have wasted weeks or months writing the wrong code - only hours writing the wrong design. Much less costly to fix, and much easier to move on from, emotionally.
The best way to make more money is to change jobs. Google may be different.
If you are at an appropriate tier company for your skill level/aptitude then you likely will have to work a lot of extra hours to be (and to be seen as) an 80th percentile performer.
You won't know if that time investment gets you anything until the end of the year. You won't even know what the bonus pool will be or if you'll even have the same manager by the end of the year.
All for an extra 10-30k a year (pre-tax)?
Your time is probably better spent building your reputation in the industry and trying to get a higher paying job.
But working at regular non SV companies, I’ve learned just to focus on doing as well as possible while still having a sane work life balance, keeping on top of industry trends and job hopping when my salary and the market were out of whack.
Google pays such above-market salaries that the strategy would be different.
But seriously, you are evaluated against the role description (software eng? product mgmt?) and your level. If you exceed expectations consistently over several review cycles, you are encouraged to apply for a promotion.
The goal is to get you promoted into a role and level where you can consistently meet expectations.
Roles/levels are calibrated so that expectations at L+1 are generally speaking aligned with strong performance at L.
Instead you have to have this checklist of your quarterly goals, which you can go through with your engineering manager in biweekly 1-on-1 meetings! And of course you should make a nice spreadsheet and a Confluence page documenting your progress, since we're data driven :)
Did you fix a major fuckup in some legacy component? Where are the numbers? Ah, then it's not visible enough for a promotion, here's your 1% raise instead. Do you see Paul over there? He made great progress this quarter! One of his goals was to write a blogpost each week, and guess what, he did! He's on a great growth trajectory, and well deserves his promotion.
Long story short: it's a great way to drive away your best talent while keeping the confluence page and blogpost writers.
What if that legacy component was no longer in use? Is there some bigger picture in play?
I've been at bad companies before, so I understand being cynical. But, having people just do whatever they feel like also does not work. From the outside it looks like Google has quite a bit of this, so they are clearly trying to get people on some path. The messaging app situation shows it hasn't quite worked yet (from the outside anyway).
Said legacy component was (and is) making a substantial chunk of the company's revenue.
He was actually assigned by his manager to investigate the problem. Source: am a programmer.
If your manager asks you to do something, ask them how the thing they are assigning will help advance your career. Ask the manager how -- if you deliver on what they ask -- they will go to bat for you when they are sitting in the room with their peers justifying your evaluation score. Be willing and able to simply state, "I don't know how you can expect me to spend my time and energy on something that won't help me advance my career." If your manager can't understand or respect that, then that's great! You have a clear warning sign that it's time to fire your manager.
I'm telling you, fixing a serious issue in a legacy component is anything but amusing. I'd be happy writing blogposts about the current framework of the week, but if I uncover an issue while working on my regular tasks, I'm gonna try to fix it instead of sidestepping it.
Goals could be "Pick up language X in order to help development on project Y" or "Get formally introduced to all R&D team leaders, and get introduced to their roadmaps" or "Facilitate 10 job interviews together with team leaders in marketing"
Have you ever read a white paper? You see how they manage to say that the product will solve every problem you have and nothing specific at all? Same idea.
Goal: get promoted/get higher salary
Steps to the goal: did my job well
Most orgs that adopt OKRs only make them visible to the person's manager, and without transparency and good feedback, the other problems mentioned here proliferate.
Not anymore. Let me talk startup anti-Google pattern here.
* Most of Google’s code is stored in a single unified source-code repository, and is accessible to all software engineers at Google
This can be the worst nightmare from a management POV in a startup. Sure, it sounds wonderful that everyone can see/fix anyone else's code, but 99% of people shouldn't have time to do so (if they do, their workload is not full; increase the load). As for the other 1%, I guarantee all your codebase has just been stolen by an ex-employee with malicious intent. Instead, divide your codebase into different projects/roles so that people only gain access when needed.
* The next step is to usually roll out to one or more “canary” servers that are processing a subset of the live production traffic.
Not necessary when your miserable, not-yet-product-market-fit website only gets 100 users. Just roll-the-f-out, let it break and fix it later. Building the canary system is huge overkill in the early stage.
* All changes to the main source code repository MUST be reviewed by at least one other
Same as above. Just build and RTFO.
That's a sweatshop mentality. Startup, big corp, whatever: your engineers should have flexibility to work on things that they recognize the importance of. That's not the same as free time or a lack of tasking; that's treating people like adults who might notice things you do not.
And if you're intentionally trying to manage your engineers in a way that keeps them from having time to notice bugs in others' work, you're doing yourself a fucking massive disservice: when people have collaborative ownership of something, they get invested in its quality/growth/feature-set, and productivity goes up. "Don't look at other people's code, just keep your head down and churn out the feature in time, ignore the larger picture!" is a recipe for low-output, uninvested, hard-to-reassign, burnt out engineers. Doesn't matter if it's a team of 2 or 2000.
Did you just say that all startups should never use Python? This is such a ridiculous statement I could hardly imagine where to begin with it.
...and we all know how horribly they failed.
Agree about the monorepo thing though, it just seems like people are optimizing for the wrong things with monorepos.
This comparison makes absolutely no sense.
Being a startup does not excuse this kind of cavalier attitude.
Not a single startup was killed by software bugs or design faults. Never. Many other things do.
You have way more important things to worry about in a startup than optimising uptime.
> Same as above. Just build and RTFO.
Deploying without review is not only a development nightmare (if you keep deploying without review, you'll eventually break something or introduce security vulnerabilities, unstable code etc.), but it can also get you in massive trouble with your compliance audits.
Peer review is very important in production code.
A developer shouldn't have time to fix other people's bugs?
Go through this tutorial and gobyexample.com and they should have enough knowledge to work on a project at the end of day three, because the language is so simple. If not, fire your new hires.