Software Engineering at Google (2017) (arxiv.org)
685 points by weinzierl 4 months ago | 309 comments

Buried in the "2.11 Frequent rewrites" section, but a great hack for "productivity via a sense of ownership":

"In addition, rewriting code is a way of transferring knowledge and a sense of ownership to newer team members. This sense of ownership is crucial for productivity: engineers naturally put more effort into developing features and fixing problems in code that they feel is “theirs”."

I’m not convinced this works so great for Google. Some rewrites are very noticeable as a user, and things in the UI frequently shift around for no discernible reason. Perhaps worse, they seem unable to get below a certain level of bugginess in products like Google Maps and Gmail. Perhaps because a new round of rewrites always introduces new bugs before all of the old ones ever get fixed.

Perhaps their metrics tell them that all of this is fine, I don’t know. But then, you have to realize Google has so much money they don’t really have to spend it very efficiently. For everyone else, this approach to rewrites seems like an extremely expensive way to produce software that’s not even all that great.

The user experience ranges from good to mediocre to bad, depending. Google is simply too rich and powerful to care. So long as the goose keeps laying the golden eggs, they can just keep going along and patting themselves on the back.

As jwz put it, "It hardly seems worth even having a bug system if the frequency of from-scratch rewrites always outstrips the pace of bug fixing. Why not be honest and resign yourself to the fact that version 0.8 is followed by version 0.8, which is then followed by version 0.8?"

I saw this at BT, where one system was rewritten in OWS (Oracle Web Services) using 15 person-years and around a million quid. Not the best use of shareholders' money.

But someone got to tick some boxes on their promotion track.

Many enterprises are rewriting their apps for the cloud stacks for no good reason, at great capex and, later, opex cost.

Well, but that money was certainly not needed for someone else if it was so readily available.

Like pay rises, funding more roles to allow more career progression, or (gasp) returning it to the shareholders?

If you have engineers with the physiological problem of “not invented here”, you have a very serious issue. I am currently seeing this in real time on one of our projects, and I was told almost exactly the same words as the “reason” to recreate what we already have working beautifully. It was clear to me that some developers are just too lazy to dive into a complex system. They get ticked off by one imperfection here and another over there and immediately run for the exit shouting “I could do so much better”. Instead of understanding why things are the way they are, they fantasize about how they can one-up the original authors and claim their own hero title.

They go on to thoroughly underestimate the time to recreate what has taken years of learning. So they spend the next many months sweating it out, copying as much code from the “old” stuff as they can, dropping important features here and there, adding new and old bugs, and very often arriving at more or less the same place they started. Meanwhile the competition has moved on to V2, laughing all the way to the bank, and customers scratch their heads wondering why you are still stuck in the same place for so long.

Then our new “owners” get their promos after massive marketing of how much better everything is now. But to everyone’s surprise they soon leave the project, because working on bugs and incremental features has become boring and, BTW, the new stuff is just as complex as the old stuff. New devs roll in and we start the whole cycle again.

You can call it a psychological problem if you want but calling names is not a solution, nor does it really provide a good path to finding a solution. The labor market being what it is, people will leave steady jobs with good pay for more exciting work with riskier prospects and less pay. This happens all the time.

Bug fixes and incremental features will generally not get you promoted for good reasons, we expect senior engineers to have system design skills and there is simply no way to demonstrate that without having your engineers design systems. If you only assign tasks based on the needs of your product and not on the needs of your workforce you could easily find yourself with a critical skill shortage.

There's a certain inefficiency to this, but only if you put your blinders on. If you have engineers churning out meaningless work then you certainly need to address that problem, but if you prioritize short-term product success over team health you are only trading one problem for another.

> Bug fixes and incremental features will generally not get you promoted for good reasons.

This is exceptionally bad, but sadly true.

If you have an engineer who can unblock teams and fix issues in an hour that others take a week or cannot fix at all, they are gonna jump ship if they can't be recognized.

At that point you've lost a valuable resource.

Recognition is stupidly cheap.

If I can do that without it being a fluke, they’d better bump my salary.

Would you settle for a certificate?

My landlord doesn't take certificates.

Or a "kudos"? (page 18 of the PDF)

The fundamental nature of a tech company is near-unlimited appetite for new stuff to build and new people to build it. If there are zero projects on your backlog, such that the only way to do interesting work is to retrace old projects, you’re in serious trouble.

Code rots, even the best code rots, because the people and cultural environment that the code was written under change and fade away. Refreshing it occasionally, even if it isn’t improved in a significant way, is one way to deal with that rot.

Even if the product/library/framework never changed much, rewrites would still be necessary to keep it going as new generations of programmers shuffle through. Otherwise we wind up with a Vernor Vinge-style dystopian culture of software archaeology.

Exactly this. I view rewriting a code base, even if I am copying and pasting some things from the existing one, as a way of either learning or refreshing my knowledge of how the system works.

As an added kicker: I have actually gone down this road, and later realized the only thing I needed to do was make a small change to the existing system, so I did just that and discarded the new work.

Business got their new features, delivered on time, the system was still stable and reliable, everyone was happy, and now my domain knowledge has increased significantly so that new features and development or bug fixes will be even faster.

Code does not rot. It gets complicated and starts to look rotten, but that's the hard won complexity of features and bug fixes. To think you can start over and not re-introduce bugs that have already been fixed seems foolhardy.

The underlying dependencies (OS, libraries, runtime, compiler, etc) slowly get out of date.

Out of date dependencies can be seen as code rot, regardless of quality/state of the actual code.

(Not saying that it should be rewritten because of that, but I do consider outdated dependencies to be code rot.)


Trying to make the software “theirs” seems to be an issue at Google, at least with their open-source software, and seems to have led to it being less reliable.

For example, Angular Material v1 was one of the most complete and stable front-end packages on the market a few years ago. Then, back in 2017 the lead developer for the project was replaced with a new dev.

This new dev then went about assigning every issue and pull request to himself, modifying or rejecting PRs that had previously been approved, closing issues that had in-progress PRs as won't-fix, locking discussions, and just generally breaking stuff. (I've personally had to peg my project to v1.4 because everything since 2016 has been a regression.)

If you go to the Angular Material v1.x Github page today, you'll see the same dev on pretty much everything.

This isn't productive ownership, as it prioritizes the engineer over the customer and has led to a generally broken system from one of the most stable properties out there. Not to mention, these open-source projects are most people's first exposure to Google's code... Having them be unpredictable regarding the functionality of their software, with little concern for users/contributors, in the name of making their devs feel special seems like a bad model to learn from.

One thing I think is important to point out is the “working beautifully” part. I agree that rewriting stuff just because is a waste of time and money. There are plenty of solutions out there that may solve a problem and may be working fine, but have now hit their scaling ceiling, and people have become frustrated. Rewriting stuff is a part of software, but should almost always be done incrementally.

I also don’t think it’s laziness. Developer hubris is real and it can get in the way of actual business value.

You pretty much nailed everything else. I share the sentiment because I’m currently working on splitting up a monolith into microservices. Most of the “quick wins” are things that didn’t need to be rewritten anyway, and the stuff that actually sucks is difficult to address. It’s tricky to get right.

The only reason given was exactly what was quoted by the parent. Also, refactoring is different from rewriting.

Remember, we are talking about a bunch of good, well-supported engineers working within a rather well-oiled machine, with a lot of other teams doing similar work and sharing experience and advice.

Nobody said it is good general advice. There is so much you need to do well to have a good chance at a successful rewrite that it becomes good advice to call it a bad idea. Nobody said rewrites cannot be done. If you have a company that knows how to do rewrites (and does them constantly, which, I guess, helps a lot), then a lot of problems can be solved by starting from scratch, and it may well be worth it.

I know this is a battle as old as time, but the other extreme does exist: a system so complex that it directly contributes to attrition and decay, but too business-critical to get rid of. Yes, it's a "serious issue", and yes, it's common at many overgrown startups.

There are always good technical reasons to rewrite things, but the reason quoted by the parent, i.e. devs not “feeling” ownership and not working wholeheartedly on code unless it's their own code: those reasons are evil.

No. Absolutely wrong.

A sense of ownership and responsibility over a codebase is fundamental and essential to proper stewardship and maintenance of that code, and refactoring and rewriting is the most effective way to inculcate that feeling. It's not always feasible, and sometimes it's not necessary, but the end state is essential. Group or shared ownership of code is a manager’s wet dream but pragmatically impossible, a swamp of mediocrity.

Once upon a time, there was a core software engineering principle of "egoless programming". I'm glad we've thrown that piece of idiocy out the window.

Bollocks. There are tons of people doing maintenance programming on code that they didn't write.

Does the developer not work on any systems unless they feel they are theirs, or is it a particular system that causes this phenomenon? If it is the latter case, then the dev's reasoning is really a symptom of a deeper problem.

If you feel ownership over some system, you feel as if the whole system reflects on yourself, whereas if you do not feel that sense of ownership, then you feel as if only the work you do on that system reflects on yourself. If you feel ownership then, you're likely to be more proactive in your maintenance and making sure that everything is up to standard rather than being reactive and fixing things that break. You want people to feel a sense of ownership, and rewriting a project is a very good way of doing that. Rewriting may not be the only way, of course, and it may not be the most optimal solution given other goals.

It's hard to relate without more information about the specific situation.

What is the current programming language used?

Is the project mostly legacy (few or no changes in the last few years)?

Is the project critical to the company?

How many people are working on it today? Are they fully assigned to it or is it a "touch when it breaks" kind of situation?

Is the project following modern CI/CD practices? If not, how hard would it be to adapt in the current software stack? Would it be easier in a new stack?

Are developers spending more time than expected understanding the code base?

Is it a political situation? Is the current owner refusing changes? Could it be that rewriting it is just a costly way for removing ownership from that developer/manager?

Yes, there are many legitimate reasons to rewrite, like dying frameworks/languages. However, we are discussing here the case where the sole reason proposed for the rewrite is to satisfy the physiological need of some developers to work on code they can call their own. A lot of things can be done by refactoring, but the devs with these issues often just start new projects of their own that do the same thing. In companies like Google, for example, there are at least 5 different products for messaging.

Ok, understood. I think the point I was trying to make (and apparently didn't write down correctly, hence the downvotes) is that sometimes the ownership argument could just be an easier, less politically charged way to engage the audience. Who would argue against letting some new developer take ownership of something (if they have the skills, of course)? Whereas discussing all the other questions I posed could open a can of worms.

Do you mean psychological or really physiological? I get the former, the latter seems dubious at best.

Indeed, "physiological" would mean developers die or suffer serious injury if they don't rewrite. Seems unlikely.

It’s not a physiological problem, it’s a technical problem. They do it so their work gets easier. It’s much easier to write code than to read code.

> It’s much easier to write code than to read code.

The fact that "write-only code" is apparently considered a part of sensible software-engineering practice speaks volumes about what their technical culture is like more generally. One would think that code should be much easier to read and survey than it was to write.

But even something like a CS101 assignment takes longer to read than to write, at least for me. This is a common enough statement that I assume I'm not alone.

In fact, I'll often sketch out a block diagram or some pseudocode if I'm having trouble grokking something I'm reading just to help me get into the proper mindset. I agree this is a problem with the current state of the field, but I haven't seen any good solutions, only hacks and workarounds.

The whole point of a computer language is so that another human being (or a team of human beings) can read it.

If the intention is for a uniquely skilled human to give a computer instructions, and the computer to execute them - then just write it in machine code.

> If you have engineers with a psychological problem of “not invented here”, you have a very serious issue.

We need to be precise about what the problem is, exactly. Here is a more salient formulation: "Not Invented Here" is a form of bias that distorts cost/benefit analysis when people are deciding whether to reuse existing code.

> They get ticked off by one imperfection here and another over there and immediately run for the exit shouting “I could do so much better”.

Given the formulation of NIH as a form of bias that distorts cost/benefit analysis, what can we do at this point? How about quantifying the cost of the “one imperfection here and another over there” instead of going by the subjective feeling of being “ticked off”?

> Instead of understanding why things are the way they are, they fantasize about how they can one-up the original authors and claim their own hero title. They go on to thoroughly underestimate the time to recreate what has taken years of learning.

To complete the cost/benefit analysis, we can then quantify the development cost of the library they are considering re-implementing.

> Meanwhile the competition has moved on to V2, laughing all the way to the bank, and customers scratch their heads wondering why you are still stuck in the same place for so long. Then our new “owners” get their promos after massive marketing of how much better everything is now.

This needs to be quantified and put into the cost/benefit analysis.

> But to everyone’s surprise they soon leave the project, because working on bugs and incremental features has become boring and, BTW, the new stuff is just as complex as the old stuff. New devs roll in and we start the whole cycle again.

This also needs to be quantified and put into the cost/benefit analysis. Instead of doing cost/benefit analysis attached to individuals, it should be done by product or by project, so that these turnover costs are also accounted for.
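To make the suggestion concrete, here is a back-of-envelope sketch of that project-level accounting. Every figure and parameter name below is a made-up assumption for illustration, not data from any real rewrite:

```python
# Toy project-level cost/benefit model for a rewrite, in engineer-months.
# All parameters and numbers are hypothetical placeholders.

def rewrite_net_benefit(
    dev_cost,                   # engineer-months to rebuild existing functionality
    regression_cost,            # engineer-months fixing reintroduced bugs
    turnover_cost,              # engineer-months of ramp-up when rewriters leave
    annual_maintenance_saved,   # engineer-months/year saved by the new design
    horizon_years,              # how long the product must live to pay it back
):
    """Net benefit (engineer-months) of a rewrite over the given horizon."""
    total_cost = dev_cost + regression_cost + turnover_cost
    total_benefit = annual_maintenance_saved * horizon_years
    return total_benefit - total_cost

# Hypothetical numbers: an 18-month rewrite that saves 6 engineer-months of
# maintenance per year is still slightly underwater four years later once
# regressions and turnover are charged to the project rather than ignored.
net = rewrite_net_benefit(dev_cost=18, regression_cost=4, turnover_cost=3,
                          annual_maintenance_saved=6, horizon_years=4)
print(net)  # -1
```

The point is not the numbers but the bookkeeping: regressions and turnover appear as explicit line items on the project, so they cannot be quietly externalized onto whoever maintains the result.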

A wide gap separates NIH syndrome and a healthy culture of collective code ownership (with the automated testing, shared style guidelines, frequent small commits, and continuous integration that usually go along with it). The situation you're describing sounds closer to the NIH side, but don't throw the baby out with the bathwater the next time someone wants to rewrite someone else's method to improve its readability!

To be clear, "Not invented here" isn't about "don't rebuild things", it's about "don't build things that exist outside of the company".

It's perfectly reasonable (for the reasons in the submission) to rewrite code that exists already.

I don't think it is laziness. Some developers simply can't ever shake off the need to be the smartest person in the room. They have been told all their lives how smart they are. Now among peers, they're ... well ... average and they haven't been given the tools to understand how to cope with that. Some developers never outgrow that, sadly.

> Instead of understanding why things are the way it is, they fantasize about how they can one up original authors and claim their own hero title.

Hell, if they really care about claiming that hero title, why not spend that effort to understand and refactor the parts of the code that were initially hard to deal with. What you wrote above is a distillation of what a "tech bro", "Type-A player", ego-based culture looks like. It is the opposite of true professionalism.

This makes sure there is never a 1.0, ever. I think this is one of the biggest mistakes in software. We just keep rewriting things that are already doing what they're supposed to. Like the Gmail UI: it has been rewritten 3 times already, and every iteration it gets shittier.

I think you're imagining much larger rewrites.

A system such as Gmail is composed of many smaller parts. If one of those parts was written years prior for a world that has since changed, it may be accruing technical debt as it's continually extended to fit new requirements. An occasional rewrite helps address this type of decay.

Without the rewrite you may find yourself 10 years later with a system that's both critical and kludgy, and at that point the rewrite will be a much larger project.

> every iteration it gets shittier

I like how you state this like it's an objective fact. I've always been happy with the gmail UI and the latest iteration is great too. Outlook on the other hand...

Outlook has always been shit, but at least it has been shit in the same way for the past 15 years.

It is not an objective fact, but sampling the Gmail users around me gives me an idea. Obviously it is not representative.

This is interesting to me because something like this only works if there are lots of tests and they can be run after every change. If you rewrite code constantly and potentially create new bugs by e.g. not understanding edge cases previous developers put in, then this isn't feasible. With a focus on testing this becomes practicable.

It's common for code at Google to be rewritten without reusing most of the existing tests -- instead, new tests are written for the new code. Yes, this risks not understanding edge cases. But not all of those edge cases are still important.

Maybe a compromise would be that if you rewrite old code you individually go through old tests and have to sign off on deprecating them, and say edge case X is no longer important, so you keep most tests while not using ones that don't matter.
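One way to make that sign-off mechanical is to keep the old suite running against the new code and require an explicit, attributed skip marker for every deprecated edge case. A sketch using pytest; the function under test, the test names, and the sign-off string are all hypothetical:

```python
import pytest

# Old tests keep running against the rewritten code by default. Dropping one
# requires an explicit skip marker, so every deprecated edge case carries a
# name, an owner, and a reason instead of silently disappearing.

def legacy_parse(s):
    """Stand-in for the rewritten function under test."""
    return s.strip().lower()

def test_basic_roundtrip():
    # Still relevant after the rewrite: kept as-is.
    assert legacy_parse("  Hello ") == "hello"

@pytest.mark.skip(reason="signed off by alice 2019-01-03: CRLF uploads "
                         "no longer reach this layer after the rewrite")
def test_crlf_edge_case():
    # Deprecated, but visibly so in every test report.
    assert legacy_parse("a\r\n") == "a"
```

Skipped tests show up as `s` in pytest output, so the deprecation decisions stay auditable rather than vanishing with the old code.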

After working with banks that still run production code on obscure and obsolete platforms written by people who retired decades ago, I totally endorse this practice ... as long as I’m not doing the rewrite.

The dance between death by ossification and death by excessive chaos is a delicate one.

Be too quick and you're constantly chasing shadows… wait too long and you're immobile.

Hey, I just wanted to say that I really love your gensim library. I've used it for my Master's thesis a couple of years back and it's been a tremendous help. Thanks so much!

Thank you! That's so nice to hear :)

To get on-topic: Gensim could use such a rewrite as well… The ML world changed, expectations and requirements changed, ecosystem and APIs changed. I changed too.

Thanks for highlighting this. To me it seems an important idea that contradicts conventional wisdom, similar in the way that most people over-encourage DRY, blind to the fact it increases coupling.

I frequently find myself drastically refactoring code to understand it. I don't commit those changes because it's not worth the effort to justify the cleanup to people who treat these rules as gospel.

Apparently me spending half a day reading code is no big deal but cleaning it up is a waste of time.


I think the objections to rewriting code are sometimes justified, rather than just blindly following rules. As you said, refactoring code helps you understand it. This means that you can end up feeling like your code is objectively clearer than before you started, but sometimes it's just an illusion caused by the fact that you just (re)wrote it. If there are other people in the company that already understand that functionality, you're disrupting the effort they put into understanding the old version of the code. Combine this with the risk of introducing bugs or missing some obscure bits of functionality, and there is valid reason to object.

I find this is a particularly common mistake among junior programmers (not saying this applies to you), presumably because they aren't used to reading other people's code. Frustratingly, this is often coupled with an attitude that missing out large chunks of existing user functionality is acceptable if it makes the code a bit simpler.

Of course, sometimes rewrites/refactors really are an improvement. Sometimes code really is fragile and confusing, either because of who wrote it or because it has had many small changes tacked on in the easiest places. Or perhaps the last person that understood that code has left the company, so it's OK that you find the code clearer only because you just wrote it! But in any case, it is fair to ask for real justification for a rewrite.

You're talking about rewrites. I'm talking about cleaning up messes left by lazy people long ago, so long ago that most of the juniors are scared to clean it up, despite it being impossible to understand without a day's reading.

You can say it disrupts others understanding of shitty old code, frankly I don't care. Code is not immutable and maintaining status quo helps nobody except the old guard stay relevant.

I said:

> Of course, sometimes rewrites/refactors really are an improvement

If that's what you were talking about in the first place, fair enough and my apologies.

No apologies needed. It was a fair assumption and I do agree with your points, although I think I've come to move more in the other direction the last few years.

Those perfect one line fixes are great but they approach their limits and eventually need to be refactored. This is natural and should not be frowned upon. Likewise you can't just refactor all the time, one line fixes are faster and less costly in all sorts of ways. The key is to pick and choose when to use each strategy.

I think we're too often looking for simple rules. There are none. Just a bunch of guidelines.

(Genuine question.) Do you think "small changes" shouldn't be "tacked on in the easiest places"?

I'll try to give an example I hope is realistic:

One of the heaviest things an application can get is a complete theme system. Suppose you don't have one. It's not a requirement.

Adding a theme system when there is none is months of work and might impact basically every line of code that displays anything.

So you're not doing it. Now, for some exceptional case, somewhere you are displaying something under an external widget that doesn't meet its size constraints or whatever; long story short, your text is invisible, and you want to invert the font color to get it working. You don't have any code like that.

Do you think it is OK to add it to the easiest possible place: in this case perhaps you add an optional argument called "need_to_invert_color" (this awkward phrasing tells you it's a hack) to a single function, default it as false, comment it as: //invert the color of the font. This is needed where an external graphing widget with a black background leaks onto our canvas due to not respecting our pixel boundaries, so that our text displays over it.

And then where you call it, comment the same thing: //currently a bug in the widget code makes the widget leak xyz pixels below its bottom border. As a temporary fix we introduce an argument need_to_invert_color into our display function. As of this writing, 3 Jan 2019, we are only using it from here. The correct fix would be for the widget to stop leaking, and when that is done white text becomes unnecessary, and we might not notice. So we start by testing whether the area we will be overlaid on is indeed the wrong color.

Etc. In other words: a quick hack for a corner case that doesn't fix the underlying bug (a workaround), and that even as a hack stops short of building something that doesn't exist (a theme system), instead adding and documenting a half-assed thing tacked on.
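For what it's worth, the hypothetical hack from this example might look something like the following sketch; the function name, the list-based canvas, and the widget are all stand-ins invented here, not real code:

```python
# Workaround, not a fix: an external graphing widget with a black background
# leaks a few pixels below its bottom border onto our canvas, making black
# text invisible there. There is no theme system, so we thread one awkward
# flag through the single display function affected. Used from exactly one
# call site; the correct fix is for the widget to stop leaking, after which
# this argument should be deleted.

def draw_label(canvas, text, x, y, need_to_invert_color=False):
    # The awkward flag name is deliberate: it marks this as a hack.
    color = "white" if need_to_invert_color else "black"
    canvas.append((text, x, y, color))  # stand-in for real rendering

canvas = []
draw_label(canvas, "total", 10, 20)                                 # normal path
draw_label(canvas, "overlaid", 10, 300, need_to_invert_color=True)  # the hack
print(canvas[1][3])  # white
```

The documentation burden lives next to the flag and its one call site, which is exactly what makes the hack cheap to find and cheap to remove later.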

Of course this is just my opinion, but in many situations I think tacking one small change on to the easiest place is fine. My comment earlier wasn't meant as a criticism of that practice. It's just I would consider refactoring after this happens multiple times: at some point, all those little tweaks can add up enough that the original design of the "core" code gets lost in the noise.

When exactly is the right time can be hard to figure out. Especially because, if enough hacks have accumulated that code really does need a reshuffle, then the job of refactoring is harder, which actually increases the temptation to put it off. But I wouldn't make a big sweeping change just because of one small hack that's a bit ugly.

Thanks, this answers my question. I had always thought documenting half-assed solutions was particularly important not just for anyone looking at the code (to understand the whole code quickly, if they're new to it, make changes, etc - or even if the author comes back 6 months later, to be reminded of the assumptions that led to the hacks) but because whoever finally sits down to architect the rewrite will have some indications of the "hacks" (degrees of freedom) that must be made possible in the proper version. I thought they (or I) would sit down, have all the documented hacks on one side of their table, then use them to architect a nice solution that the hacks can then use instead, kind of like a requirements document.

This was my thinking on a lot of projects. But you know what? That rewrite never became necessary!

So not only did I not start with the "right" architecture - I didn't end with it either!

That's what "many small hacks in the easiest of places" reminded me of and I wondered if you in fact do approve of it. It has always seemed fine for me. Just no problem at all. But with documentation right there and the worse the hack, the clearer the documentation right there explaining and justifying it, up to and including "I don't know why this works but this system call makes the next line succeed, whereas removing it causes the next line to fail sometimes - this is tested in testxzy." Obviously a hack, a terrible hack if you don't know why it works. And a project can end up with a lot of these.

Working Effectively with Legacy Code recommends just this idea. They call it scratch refactoring. Refactor without a lot of forethought to see how the system works, then just throw away your changes but keep your newfound understanding.

This is effectively "it." But I do like to branch off the more useful refactors into their own commits.

I often see some bit of code, or a dependency in a project that I think is over engineered and too complex and start rewriting; first I get something basic working, then I discover some edge case, then another, and another, and eventually I realize that I have reimplemented the original code. It always makes me feel silly, but at the same time those have been the best learning experiences I've ever had.

The proper response to that is to actively document these edge cases (within the original code) so that they don't get missed as the code is maintained in the future.

I do, so I can remember how it works when I see it again in six months. :)

Five people understand the system.

You refactor it.

Now one person understands the system.

Initially the system took 8 hours of reading and 2 days of refactoring to grasp.

Now it takes 2 hours.

I’ll take that.

Take a moment to consider that someone else in your team might actually understand the code in question, and when you rewrite it (and commit), you destroy their knowledge about how it works. Sure, from your perspective, you made the code better, easier to understand. From their perspective, even if they do the code review themselves, they will not understand it as thoroughly as you do, and they will be burdened with both the knowledge of how it used to be, and how it is now. The net result can be negative, and eventually no one understands any code that they didn't refactor in the last ~6 months, because if it's any older than that, someone's rewritten it.

Sometimes you gotta sit down and bite the bullet and read and ask questions.

Writing code is hard, so there's nothing wrong with reading it being hard too. (I'm only half kidding).

Any company that can rely on an ads cash cow and a large, competent engineering team can probably afford to rewrite most of their software periodically. This won't apply to most companies, hence the conventional wisdom.

> team can probably afford to rewrite most of their software periodically

I think Google partially does this in order to keep its engineers happy, as you are happier when you develop something from the ground up compared to just maintaining something already built. By going down this route those engineers are kept in a “happy state”, so there’s less risk of them flying off to other pastures, where they could potentially build the next product that could “kill” Google. Sort of invisible golden handcuffs, if you will. I personally find it tremendously wasteful at a societal level, but I can see the value of this strategy for Google as a company.

Like most things, it depends. There are happy mediums between rewriting absolutely everything and making the fewest possible changes. This is true and viable in organisations of any size.

Wait, huh? How does DRY increase coupling?

I mean, I guess the duplicate code/class is now coupled to the two places that use it, but I have a hard time seeing how that is worse than two duplicate instances of the code.

Here is what I've seen (and been guilty of):

Two pieces of code in different parts of the codebase are very similar. They have nothing to do with each other - even semantically. But the code is very similar. So someone thinks this is code duplication and creates a function/class/whatever that both pieces of code can use. Repeat all over the place.

Then one day, one of those two places needs custom behavior. I can either change that function/class and create complexity (have to now support two use cases). Or I can stop using that function/class in that place and go back to the old solution. Sometimes, this is quite a lot of work as aggressive "DRY" leads to a fair amount of coupling - there could be a few layers of DRY'd code there to untangle.

I put "DRY" in quotes because none of this really is DRY. DRY originally was about requirements - not code. No requirement should show up in multiple places in the code base. In this example, even though the code was almost identical in both places, there was little else common. They dealt with different requirements, for completely different reasons. They should never have been refactored to use a common function/class.

These days people keep talking about over-use of DRY, but they're really complaining about overabstraction of disparate code - not the DRY in the requirements sense.
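A toy Python sketch of the trap described above (all names and the tax example are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Item:
    price: float
    weight: float
    qty: int

# Two unrelated requirements that happen to look alike today:
# billing wants a money total, logistics wants a weight total.
def invoice_total(items):
    return sum(i.price * i.qty for i in items)

def shipment_weight(items):
    return sum(i.weight * i.qty for i in items)

# The tempting "DRY" refactor: one shared helper for both.
def weighted_sum(items, attr):
    return sum(getattr(i, attr) * i.qty for i in items)

# Later, billing needs a tax rate. Now the shared helper grows a
# parameter that logistics must know to ignore -- two unrelated
# requirements are coupled through one function.
def weighted_sum_v2(items, attr, tax_rate=0.0):
    return sum(getattr(i, attr) * i.qty for i in items) * (1 + tax_rate)
```

The code was nearly identical, but the requirements never were, so the shared helper has to serve two masters as soon as either one changes.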

This gets back to the https://en.wikipedia.org/wiki/Open%E2%80%93closed_principle. I should be able to override just the behavior I want to change, rather than permanently edit the shared implementation eliminating the behavior you want.
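A minimal sketch of that idea in Python (hypothetical names): the shared implementation exposes a hook, and the caller that needs different behavior overrides only that hook instead of editing the shared code.

```python
class ReportFormatter:
    """Shared implementation: open for extension, closed for modification."""

    def render(self, rows):
        return "\n".join(self.format_row(r) for r in rows)

    def format_row(self, row):
        # The hook subclasses may override.
        return ", ".join(str(v) for v in row)


class TabbedReportFormatter(ReportFormatter):
    """One caller needs tab-separated rows; it overrides just that
    behavior, leaving the shared class (and its other users) alone."""

    def format_row(self, row):
        return "\t".join(str(v) for v in row)
```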

A better solution is "Just don't couple the two pieces of code!"

It takes a fair amount of work to design a good class where one can easily apply the open-closed principle. And one should where it's needed. But tying together two completely unrelated parts of your code base with such a class just because the two pieces of code are almost identical is the wrong approach. Then going ahead and designing it for the open-closed principle merely adds complexity.

It increases coupling in exactly the way you just described.

Sometimes this is beneficial. Sometimes it isn't.

My point is that, more often than not, conventional wisdom holds DRY to always be preferable, whereas the reality is not that simple.

Lots of mental gymnastics to dress up the fact you can’t get promoted without shipping new code.

Disclaimer: I work for Google. I speak only for myself.

In my honest opinion, frequent rewrites are by-and-large a disastrously bad idea, for several reasons. If there is one thing I would change about Google, it would be to slow down the frenetic pace of change inside. Rewrites just make the pace of change untenable. And I say this as one who is totally part of the problem: I helped rewrite significant parts of V8, the JS VM in Chrome, particularly the optimizing JIT compiler, TurboFan. (Don't get me wrong--I am not knocking any one specific project, my coworkers, even my leadership, etc). I've been at Google 9 years, and I barely know how any of it works anymore.

1. The assumption that requirements and environment around software change so frequently that it must be burned down to the ground and rewritten is a big part of the problem. Why do the requirements of software change? A. Scale. B. Because the software around it changed. Bingo.

2. Rewrites actively destroy institutional expertise. Instead of learning more as time goes on, engineers' knowledge becomes obsolete as old systems are constantly changing and rewritten for unclear benefit. Experts can no longer rely on their knowledge for more than a couple of years. This is extremely bad for critical pieces of infrastructure. In short, no one ever masters anything. This is due not just to incentives but due to change itself.

3. No one ever has time to do an in-depth followup study on whether the rewritten artifact was better than the original. Instead people go on their gut feeling, having replaced something they often did not write themselves (and did not fully understand) with something new and shiny of their own creation. The justification of the outcome is done, in short, by the people who have a big vested interest in declaring success. (And yes, me too).

4. The idea that the software requirements keep changing around software is promulgated by the exact same people who never spend any time up front simply writing requirements down. Well, no fracking wonder the requirements seem to change somewhere in the middle or years later: they were never anticipated in the first place! We'd do better overall if the industry in general did some good ole requirements engineering. Most people I talk to have never even heard of this. Instead, we never have any time to stop and think about doing things right, but we always find time to rewrite from scratch.

5. Zero incentive to do things right. As a field, as industry, we are actually not very serious about writing good software. Instead, we're just going to trash it after 5 years. So software is constantly bad. But the next rewrite!

The drive for rewrites is mostly a swindle in my opinion. The reality is that some software needs to be shot in the head, some needs to be rewritten, but most software needs to be just maintained. That means bugfixes, performance improvements, scalability improvements, and sometimes, yes, refactoring too. But bugfixes and incremental performance improvements don't get anyone promoted. Even more cynically, but very realistically, "old" software that is maintained by experts means a dependency on those experts, and they end up being expensive. Corporations hate when their employees have job security! Rewrites are A.) driven by an influx of young talent who want to make their mark, B.) incentivized by the promotion process and C.) driven by a corporate pressure (everywhere, not just Google) to make sure that programmers and software are commoditized to avoid dependencies, bus factor, and job security.


So much 3, 4, and 5.

I was so spoiled at my last employer. Leaving was a mistake, but I can't go back because I relocated. Seems like nobody in this entire town "gets it". Seriously.

I have learned this to be true from work on my open source projects. I tried to explain this concept at my previous employer and my boss looked at me like I was stupid. Stupid makes sense when you justify your existence according to story points.

What I have noticed from the travel industry is that backend developers tend to write a ton of original code and rarely refactor anything as though they are scared an improvement is always a regression. Frontend developers tended to not refactor anything either, but then they were scared to write any code at all (note: I am a frontend developer).

Part of this fear was justified because their automation was shitty. Often things were properly code reviewed, but changes would only be accepted with the smallest possible footprint. It hurts when the diff is a static text comparison counting every character, which means a bunch of comments explaining things or whitespace changes look like a ton of code changes. It also keeps you from removing unnecessary code or reorganizing things. The killer, though, was frameworks for everything, including test automation in various flavors for the same sorts of things, which is like writing tests for your tests. In this case testing became a box to check off that had little or no real value.

I am wondering if the rewriting rule applies to AdWords and search; they would probably get very upset if their cash cow stopped working all of a sudden.

I think we should not take "rewrite" too literally.

But both search and AdWords have been rewritten multiple times since their launch.

You can have a look at the papers created by Jeff Dean (1) and Sanjay Ghemawat (2) which mention some of the new concepts/technologies/features used in those products.

1: https://ai.google/research/people/jeff 2: https://ai.google/research/people/SanjayGhemawat

Thanks, very interesting articles!

The AdWords API is now being rewritten, the biggest change in ten years. It is migrating from XML to gRPC / protocol buffers.


Disclosure: I co-author the Python client library and have written a few of the docs on the site listed above.

Thanks for the link!

Productivity hack or make-work project?

What I don't understand is how they accomplish larger collaborative changes. The paper says:

"Almost all development occurs at the 'head' of the repository, not on branches."

Googler Rachel Potvin made an even stronger statement in her presentation about "The Motivation for a Monolithic Codebase" [1]:

"Branching for development at Google is exceedingly rare [..]"

In the related ACM paper she published with Josh Levenberg there is the statement that:

"Development on branches is unusual and not well supported at Google, though branches are typically used for releases."

In my world, when we have to make a bigger change we create a branch and only merge it into the trunk when it is good enough to be integrated. The branch enables us to work on that change together. I don't understand how they do this at Google. As far as I understand, in their model they either have to

- give up on collaboration and always have just a single developer work on a change.

- share code by other means.

- check in unfinished work to the trunk for collaboration and constantly break trunk.

[1] https://youtu.be/W71BTkUbdqE?t=904

[2] https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...

Unfinished work is not typically checked into master (and it's certainly not regularly broken).

What is more common is that very large changes are checked in as a series of individually compatible changes, and often broken up across the repository (there are of course tools to help with this). It's relatively rare for multiple developers to work on a single changelist; it's much more common to break the work into separate changelists.

Haven't worked there for some years now so I'm a bit rusty on some of the detail.

It is amazing to see how we declare something as too obvious or natural. Like trunk based development, so obvious that questioning it makes one a fool :-)

Git and its model were the best thing a few years back. Now, since Google does all its development on the main trunk/master, that must be correct and more intelligent.

Wouldn't it be the case that they went with what they had at a certain time and continue to use it because everyone is used to it and it still works? I'm not sure Google analysed whether branching was bad and then chose trunk-based development.

I cannot see how a company with a well-defined branching process is doing it wrong, or how it is so suboptimal. I guess it is a matter of processes and culture. None of the great companies are great because their source control strategy (or code) was excellent.

We developers always over analyse everything and come up with excellent logic and some of us are gifted with words more than others.

You'll have to remember that all of those big companies have their tools and processes customized for their scale.

Example: Instead of branching you would just create a `changelist` (a commit, a set of changes to files) and work on that. You can show it to your colleagues. You can build and test it. You can send the id to anyone to have a look at it, or test it themselves. You can have multiple changelists depending on each other, without being committed.

This might be an interesting read for you: https://paulhammant.com/2014/01/08/googles-vs-facebooks-trun...

You can use git forks or whatever for development. This philosophy just says that you only push to production environment from one standard head trunk.

"Google use Perforce for their trunk (with additional tooling), and many (but not all) developers use Git on their local workstation to gain local-branching with an inhouse developed bridge for interop with Perforce.

"Branches & Merge Pain

"TL;DR: the same

"They don’t have merge pain, because as a rule developers are not merging to/from branches. At least up to the central repo’s server they are not. On workstations, developers may be merging to/from local branches, and rebasing when the push something that’s “done” back to the central repo.

"Release engineers might cherry-pick defect fixes from time to time, but regular developers are not merging (you should not count to-working-copy merges)"

You could also gate experimental code behind feature flags which aren't set in prod

I agree that feature flags can be a solution sometimes. The presentation and the paper I linked to in my question discusses this, but they also mention large-scale refactorings and this is where I don't see how feature flags can help.

For example: How do they untangle a wad of code that is large enough that it takes longer than a few days and more than a single developer to get the code back into a state that is acceptable for trunk?

The changes required for this kind of refactorings can be all over the place, regardless of any organizational boundaries in your code. I can't see how changes of this nature can be put behind feature flags.

- build an interface to the code to be changed

- make all code that uses said code use the interface instead

- build the feature switch into the interface

You don't do that - you do lots of small incremental refactors. I think Google also has tools to turn a large refactor into a series of small changes.

(i.e. you create the new API, migrate stuff to use it, deprecate the old one, migrate the rest, retire the old one).
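A sketch of the middle of that migration (hypothetical names): the old API becomes a thin deprecated shim over the new one, so each caller can be moved over in its own small change before the shim is finally retired.

```python
import warnings

def fetch_user_v2(user_id, *, include_profile=False):
    """The new API everything is migrating toward."""
    user = {"id": user_id}
    if include_profile:
        user["profile"] = {}
    return user

def fetch_user(user_id):
    """Old API, kept as a deprecated shim during migration;
    deleted once the last caller is gone."""
    warnings.warn("fetch_user is deprecated; use fetch_user_v2",
                  DeprecationWarning, stacklevel=2)
    return fetch_user_v2(user_id)
```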

Try like hell to use dependency injection in the first place.

There’s a wealth of reading available to you if you look up “trunk-based development” as a keyword. Likewise with “continuous integration” (the actual practice, not the build tooling). Jez Humble for instance has written extensively on this.

I know about this but haven't found an answer to my question. This was in part my motivation to post this question here.

You develop features behind compile or runtime flags and keep it off until it's ready to ship. This is what chromium.org does so that might be a more accessible way to see it in practice.

There are a bunch of ways to land code that doesn’t get run yet, by adding new functions or default parameters and conditional logic.

You don't have to check in unfinished work, you can break your work in parts and make people work on independent parts, every one of which moves progress forward incrementally.

Well, you could actually call that "unfinished" because in the beginning the code doesn't accomplish the task, but progressively it will become more useful.

> you can break your work in parts

You can when you can but you can't when you can't.

I 100% agree with you that we should work this way whenever possible and we should work hard to keep our code in a state that lets us cleanly divide work. In my experience it is not always possible to split up work that way. Think of untangling dependencies of a larger part of the code as an example.

Sometimes you think you can't because you either haven't learned the right tricks, or because it takes more effort, and so you opt to fork your work off into a separate long-standing branch to optimize your development velocity at the expense of possibly surprising costs during the merge (if other people also make the same choice).

Other times it's genuinely necessary to make a long-standing branch. In those cases, you just do it. Trunk-based development should not be a dogma, just a different default choice.

I found this for you:


I only scanned through it, but it seems similar to the de facto way of doing things before distributed version control systems became popular (in the late 2000s?).

I think you misunderstood what that link is arguing for. It's basically GitHub Flow with tagging of what you release; it just goes into a bit more detail, discusses alternatives, and suffers a bit from too much information.


Is a cleaner and more obvious guide.

The idea is you have a constantly usable master, and your branches should be short lived so you don't hit a brick wall trying to get reviews and merge on your massive change sets.

Ultimately it means you want to test and review your change before it goes into master as opposed to creating "production", "staging" and "develop" branches, which largely just kick the can down the road and is a different way to solve that "what's deployed where" issue.

Thanks - I should have read through it more.

I know about this and it doesn't answer my question. This was in part my motivation to post this question here.

None of the other replies try to explain specifics of how this works, so let me illustrate an example of two teams collaborating to add Feature X to the monorepo without branching:

1) Team A checks in their code to provide Feature X. Their code is not used anywhere in the codebase yet, however full unit test coverage exists for the public API; this is required for code review.

2) Team B checks in their code to turn on Feature X in their product, gated under a command-line flag which by default uses the old behavior.

3) Team B checks in an integration test that flips the flag and makes sure everything works as planned.

4) If Team B requires changes to Feature X to get expected behavior, they communicate those changes to Team A and someone from either team (using available human resources) makes the changes.

5) Team B checks in a small change to flip the flag by default.

6) Team B monitors their product. If things go awry, only the very latest change is reverted and repeat (4).

7) Once stability is achieved, Team B checks in a change to remove the flag.
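As a rough illustration of steps 1, 2 and 5 above (hypothetical names; a real flag system is more elaborate than argparse):

```python
import argparse

def old_ranking(results):
    # Existing behavior, kept intact until step 7.
    return sorted(results)

def feature_x_ranking(results):
    # Step 1: Team A's Feature X, landed and unit-tested, but dark.
    return sorted(results, reverse=True)

def make_parser():
    parser = argparse.ArgumentParser()
    # Step 2: the flag exists but defaults to the old behavior.
    # Step 5 is then a one-line change: flip the default to True.
    # Step 7 deletes the flag and old_ranking entirely.
    parser.add_argument("--enable_feature_x", action="store_true",
                        default=False)
    return parser

def rank(results, args):
    if args.enable_feature_x:
        return feature_x_ranking(results)
    return old_ranking(results)
```

Because every step is a small, individually revertible change at head, step 6's rollback only ever touches the latest flag flip, never the feature code itself.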

I've been at FB for a few years now (similar style), and this model of a monolithic repo, no branching, and simply submitting 'diffs' (changesets / patches) which get merged directly into master after the diff is accepted and you land it seems much easier to me. Maybe it's just because I got used to it, but now whenever I have to work with branch-based development I find it confusing.

It might be a practical thing. I've heard from a Googler (a couple of years back) that getting changes in can take ages, and by the time the change lands, there's a good chance that there are merge conflicts, and the cycle starts over. Branches would make this even more painful.

Depends on the size of the change. Small changes are preferred in most trunk-based-dev companies.

FYI: There are (multiple) tools in Facebook and Google which are an abstraction on top of their VCS. (e.g. which feels more like git, where you can work on a stream of changes which depend on each other without actually pushing anything to head)

Branches actually make this much less painful. Just reverse merge from the trunk back to each branch on a frequent basis.

Can you clarify why branching helps collaboration? In other words, why is it harder to commit to trunk when you have several developers working on a feature?

Branching enables me to share unfinished work with my collaborators. Sometimes I don't want to commit to trunk yet but still share code and collaborate on a part of the code base.

If they're using Perforce one can "Shelve" a CL (changelist, similar to a commit in git) to make it available for others to unshelve. This can be used as a workaround, albeit limited, to share work-in-progress stuff.

So you can prepare a change for trunk, and share the change with your colleagues to collaborate on.

Don't see why branching should be needed for this.

That's what I meant in my question with "share code by other means". It works but in my opinion it is a large pain and I can't believe people at Google work by sending patches back and forth.

It's not like we (I am a Googler) email patch files around. Everything is integrated into the system. You create a CL (change list), it automatically gets a number. People can review it, test it, or fork it (make a new CL using your CL as a starting point) as much as they want, all from that CL number.

So basically it's a branch?

Think of a CL like a Pull Request that has (and can only have) a single commit.

It's visible in code review UI, has a description, has tests run on it, it can be merged by other people and it can be referenced from anywhere. Eventually it's merged into the head or dropped.

So this is identical to squashing and rebasing your feature branch?

Technically yes, but people don't use it or think about it this way. Changelists are supposed to be small, a couple hundred lines changed at most. You don't develop a complete feature of thousands of lines in a single CL; that would be insanely hard to review. What happens is that work gets split into small chunks, and each one is submitted separately, not to a feature branch, but straight to head.

This sounds identical to our workflow with git for all practical purposes. 1) New story gets a branch. 2) Branch gets squashed and rebased on most recent master before PR. PRs are generally under 1,000 lines changed. 3) PR is merged to master after code review.

It's missing a lot of what people expect in git branching, like history within the branch and arbitrary digraphs for forking and merging.

If every branch was always merged back into head before doing anything else, and always had its commits flattened into one, and someone forking off of your branch was basically opening it up, copying the changes in your clipboard, and pasting it into a new branch with no attribution or history, then sure.

No, it's more like a patch. With a branch, you drag along all the dependent changes, while with a patch you have only the actual change, plus information about which CL -- or PR in GitHub terms -- it depends on.

With branches, if someone updates the branch you depend on, your work is based on stale stuff, and it can get ugly. Just try to do it on github :-)

Functionally, this may be different but I am struggling to see how conceptually this makes any difference in the development process.

What makes this different from a branch?

It's actually really easy to create a "patch", so people usually create small "patches" and send them to people if they need any feedback on those.

A "patch" is actually just a commit (actually a changelist) which can be viewed, commented and edited in the browser based code review and IDE tool.

Imho I find it much easier to get an url of a "patch" and comment on it inline, instead of having to checkout a branch etc.

If you have a question to a specific example I'm happy to answer it in the way it would've been done within Google/Facebook.

Thank you, I appreciate your effort to help me understand this better and our exchange helped me to make progress.

One thing I infer from your answer is that it seems that there is an established process and dedicated tooling for working with patches at Google. I think a lot of my pain with patches stems more from the lack of process and lack of an agreement on formats and standards in my environment than from the use of patches per se.

Where I still see an advantage of branches is that they facilitate documentation of what has been done by whom and when. All of this documentation is in the same place and form as the documentation of changes in the trunk. It all is in commit messages whereas patches are only documented somewhere else, possibly in the Email or IM used to send the patch. Even if most of the branch documentation does not survive on trunk when we squash the final merge it is still there and easy to find as long as the branch doesn't get deleted. When I want to look up why I applied a certain patch I'll have to dig through my messages. I think that makes it harder to work with patches than with branches.

I think what you are missing is that Google's SCM has different concepts and terms than Git.

Google's system is derived from Perforce, which has the concept of a changelist (think: commit), which can be "pending" and stored on the server for review/cloning by other developers: https://www.perforce.com/perforce/doc.051/manuals/p4guide/07...

This allows you to share work without (in Git terms) pushing to master. Branches in Perforce-like systems tend to be more heavyweight and permanent (IIRC you have to branch an entire path of files, it is not the same as the Git concept of "branch" which is just a commit that points to another parent commit).

You can think of the system as enabling you, in Git terms, to create pull requests without the creation of an underlying branch.

A "patch" in Google/Facebook/Twitter is the same as a commit. It has a (mostly) descriptive commit message, references to bug tickets and might contain links to documentation, screenshots and mocks.

You basically work on a "patch" (changelist), get feedback from others and send it out to review at the end. Before you can submit (commit) it, you'll have to sync to "head" (to have the latest changes) and run all tests. ^ most of this happens automatically, and as most changelists ("patches") are small, this happens very fast and async in the background.

FWIW coreboot is an open-source project that uses a similar style, where you need to upload your change to the review tool (https://review.coreboot.org/, which is using gerrit https://www.gerritcodereview.com/) and people comment and LGTM in there and then it gets committed to the master branch once everything looks good.

Your changes aren't sitting on your machine. They are hosted on a server or in a git fork. After test and code review, you merge to head before deploying to production or other people make follow-on changes.

Make the changes smaller and more valuable and do it on trunk.

This should always be Plan A but in my experience it is not always possible. Think of untangling dependencies between a large number of components as an example.

Untangle one dependency at a time.

I think they don't branch code because Perforce branches are horrible, and merging them back to HEAD is an extremely painful process for a monorepo.

Even git or hg branches are horrible. Once you have multiple people working on the same codebase and touching the same files it is pretty horrid to manage. I know several companies not using branches because the merge conflict resolution takes too much time.

The PDF explicitly calls out the time consuming part:

"Almost all development occurs at the “head” of the repository​, not on branches. This helps identify integration problems early and minimizes the amount of merging work needed. It also makes it much easier and faster to push out security fixes."

Rebased branches in git are nice (for one developer only, unfortunately); there is some pain of course, but when the rebase is performed often (once a day) it doesn't consume much time. The real pain begins when some huge commit is pushed to HEAD, but even that is manageable.

Anyway, I feel sad that so much effort was put into really nice VCS concepts and almost no one uses them in enterprise development.

Google hasn't been using p4 since around 2013.

Piper interface and workflow is heavily influenced by perforce.

Except it's now highly scalable, which takes care of branching performance. Nothing technically makes it difficult to branch, AFAIK. In fact rapid (grape) used it pretty heavily to track rollouts, if I remember correctly.

So, use a better tool?

Or, don’t branch. Is branching so essential?

Isn't it essential for mental organisation? How do you think about what's different about a set of changes without some sort of DAG?

They are grouped by linking them to issues in the issue tracker. All commits will then get a link to the issue and the issue gets a link back to the commit. This way you can easily track and read the full context of old changes.

Example issue, note that public ones are not associated with commits: https://issuetracker.google.com/issues/122326181

Just do one thing at a time? Today, I am working on X; my commits are for X, and details are in the commit message.

That breaks as soon as you have to interrupt working on Nice To Have Feature X to work on Important Bugfix/CVE Y.

How often does that happen to an individual developer though?

Once a month? In an averagely well run company even that may be towards the higher end.

Should your entire development strategy be based on a once a month occurrence?

> Once a month?

Closer to once a week for me.

Make one change at a time.

If you don't want to be stuck while a colleague does changes that conflict with what you're changing then yes, you need branching.

You don't need branching for this.

If 2+ people are working on the same file, which might result in a conflict, you can either:

- handle the conflict when you merge your branches at some point in the future, or

- handle it when trying to commit your change to head

Only difference is whether you handle the conflict now, or in the future.

You need to communicate with your colleagues to coordinate your work.

What do you mean? Branching is not a solved problem once people are editing the same files.

Git is actually pretty good at automatically resolving conflicts within files; unless you edit the same lines, it’s easy. If you do edit the same lines, merging is pretty straightforward.

This whole conversation the last day or two on HN has been kind of nuts. Like everybody agrees you shouldn’t put all your code in a single file, right? Why not? It would let everyone see all of the source code in one place! But it would be huge and hard to avoid conflicts. So we split things into files. Then “trees”, etc...

Basically it sounds like Google's monorepo is really a bunch of repos glued together, with changes in one triggering changes in others. The difference, it seems, is that Google does not get to benefit from the things open-source developers like about git. It's like Google developed custom versions of GitHub, CircleCI, and other tools and is marketing that as a better solution (just build several billion-dollar solutions to manage your monorepo!).

And even after all that, Google has a bunch of separate repos for important open source or secret work.

Sometimes one-line commit could cause merging conflict that requires hours of communications. Or even days in distributed teams.

Sure, so here is a challenge for you.

Original file:

a = 1

Contributor X:

a = 5

Contributor Y:

a = 0

Both contributors created a pull request and submitted it. In the description they both state that the new value should be the one they put in. How would you resolve this in a timely fashion, making sure you do not take down a service accidentally and do not slow down development too much? I intentionally gave you a very simple example, but if you want we can go into rolling out new features, fixing security bugs and many more cases where such issues arise. And no, git will never be able to solve these issues.

I don't think HN is going nuts (except for a few zealots); these problems come from the nature of software development in general. We have seen how Google solves them (monorepo, custom CI/CD, etc.) and there are other companies solving them in different ways (maybe with a branching model, using GitHub). People are just putting their experience out here, and the perceived solutions are based on that experience and their level of understanding.

What does Google do? You’re working on a line of code and the trunk changes. Your local no longer aligns with it. You have a conflict.

Someone’s changes get committed first. That’s a business decision, not a code tooling one. Second pr has to adjust. Same on both mono and poly repo, just using different words.

At least branches let you have the choice, which cannot be said for branchless.

I think there's a bit of confusion about what exactly Google is doing. IIUC they use (and develop) gerrit.

Effectively this workflow means everyone is working on the same branch and the first commit to pass review gets in. The next guy will have to rebase.

In the end one wins as you said, but the level of detail is rebasing individual commits, not merging entire branches.

So this ends up being the age-old rebase vs merge discussion.

So, the thing that is still buzzing in my head now and not mentioned in the article (maybe I didn't read carefully enough) is what actually gets released into prod after a change is reviewed and merged.

If the monorepo contains, let's say, five different products and in a day only one of them gets a merge, does Blaze still build all five, and are all five released (based on successful integration testing)? Or does it only release the changed product (and any others that depend on it)?

EDIT: Also, is the "canary" server still for testing? Could there in practice be a set of canaries running very different versions? Are there any correlation or version "roll-up" constraints between the various canaries?

Releases are controlled by each team. For example, my team has multiple binaries that run in production. Some of them are stable and don't receive any active code changes but are pushed every time we do a release. Those that don't get direct code changes will still pick up any changes to shared libraries.

Canaries are live/production traffic only. When a release is deployed, it goes to the canary instances of a job first, and it will take a small subset of traffic. This allows the job owners to see if the new binary has any adverse effects before rolling out more widely. More details about canaries can be found here[0].
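The traffic-splitting idea behind a canary is simple enough to sketch. This is a toy model of weighted request routing, not Google's actual job infrastructure, and the 5% canary fraction is invented for illustration:

```python
import random

def pick_backend(rng: random.Random, canary_fraction: float = 0.05) -> str:
    """Route one live request: a small fixed slice goes to the canary job."""
    return "canary" if rng.random() < canary_fraction else "stable"

# Simulate 10,000 live requests during a rollout.
rng = random.Random(42)
counts = {"canary": 0, "stable": 0}
for _ in range(10_000):
    counts[pick_backend(rng)] += 1

print(counts)  # roughly 5% of requests exercise the new binary
```

If error rates or latency on the canary slice look bad, the rollout stops before the other ~95% of traffic ever sees the new binary.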

One important thing to think about with Google's source control is that it is closer to SVN as far as versioning goes. There are (for the most part) no feature branches or anything like that. Everyone is always working on HEAD. When you do a release, you cut it from near HEAD.

[0] https://landing.google.com/sre/sre-book/chapters/testing-rel...

>If the monorepo contains let's say five different products and in a day only one of them gets a merge

Releases are done by each team. Look into Blaze/Bazel. It allows me to say "I depend on these sources only". So on a day with only one change, you might only build and release the changed artifact (in practice this never happens).
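To make that concrete, here is a hypothetical BUILD file (all target and package names invented) using Bazel's built-in C++ rules. Because each target declares its exact sources and dependencies, a change anywhere else in the repo never invalidates these targets:

```python
# Hypothetical BUILD file (Bazel/Starlark). "bazel build //maps:maps_server"
# rebuilds only what a change actually touches.
cc_library(
    name = "routing",
    srcs = ["routing.cc"],
    hdrs = ["routing.h"],
    deps = ["//common/geo"],   # invented shared library elsewhere in the repo
    visibility = ["//maps:__subpackages__"],
)

cc_binary(
    name = "maps_server",
    srcs = ["main.cc"],
    deps = [":routing"],
)
```

Conversely, if `//common/geo` changes, Bazel knows `:routing` and `:maps_server` (and every other declared dependent) need rebuilding and retesting, which is how company-wide breakage from a shared-library change gets surfaced.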


More mature teams do a lot of complex stuff. Large teams may have multiple stages of canary, multiple canaries, feature experiments, etc.

Like kyrra said, releases are handled independently by each team. Teams are in various states of maturity in their release practices. Some are fully automated; some involve manual QA. I've supported teams across that continuum as a SETI at Google.

I'm not as well-versed in canarying though I've set it up for a team or two. I've only ever seen a single canary version for any particular binary.

Canarying is done. I haven't seen canaries running multiple versions of the same binary. Though teams will often guard new behavior behind experiment flags.

Thanks for all the replies above (kyrra, joshuamorton and ASinclair); these have been very helpful.

The protocol buffers stuff seems pretty cool. At my last job (small web dev shop), we had constant headaches over the class definitions of endpoints changing and some js file not having its model updated to correspond. I thought we should be writing XML files that both ends could be reading in - we never got up the political will to make that big change though.


Yep, protocol buffers are like a cross-language type system for the data that moves between systems with the side benefit of compact serialization. They're awesome and definitely a big productivity boost above a certain scale.
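For anyone who hasn't seen one, here is a small made-up .proto definition; both a backend service and a JS client generate typed bindings from this single file, so the "class definition changed but the js model didn't" problem above becomes a compile-time issue:

```proto
// Hypothetical shared contract (names invented). Field numbers, not
// field names, go on the wire, which is what keeps old and new
// binaries able to talk to each other during a rollout.
syntax = "proto3";

package shop;

message LineItem {
  string sku = 1;
  int32 quantity = 2;
}

message Order {
  int64 id = 1;
  string customer_email = 2;
  repeated LineItem items = 3;
}
```

Renaming a field is safe for the wire format; reusing or changing a field *number* is the breaking change, and reviewers can see that in a one-file diff.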

Something like this would have indeed saved us a lot of time in the long run. Oh well.

I was not a big fan of monorepos until I worked at a large company that uses them. The use of a monorepo and protocol buffers makes communication between systems so much easier.

I was the maintainer of a third-party library used by thousands of dependent applications at Google. I have to admit, I still have not seen on the outside a system that allows me to change the version of numpy, and know that thousands of dependent applications either work or break, within an hour of making my change.

Being able to write and use a mapreduce with a high level of confidence that my code would continue to work years later was another nice benefit. MRs I wrote in the first year at Google still compiled and ran with minimal changes almost 10 years later(!) which is amazing given the amount of environmental change that occurred.

That said, somebody could still halt development across the company by changing and checking in a core file (like proto defs for websearch) without testing.

Whatever social system led to google3/borg and the amazing productivity associated with it, it was a special moment that hasn't been replicated many times.

Are you allowed to share if the internal repository a custom version control system or is it one of the open source ones?

It's already been published. Google used Perforce for at least a decade and then cloned the Perforce wire protocol when it was clear Perforce wouldn't keep scaling (there's a great presentation online about Google using RAMSANs to store the Perforce index), but backed the repo with something like Bigtable or Spanner.

I'll give Google credit for one thing: it can change backends a lot without too much user-visible pain.

>I'll give Google credit for one thing: it can change backends a lot without too much user-visible pain.

Amusingly, this comment may explain how someone in this thread can be incredulous about the idea that things get rewritten every 2-3 years (which yes is an exaggeration).

Google is very good at making sweeping infrastructural changes (generally improvements, I might add) without significant user pain.

It is, like most of the tools in Google/Facebook/..., a custom solution. These companies need their custom solution because of their scale.

This raises the question: are big companies like Atlassian, GitHub or, say, Volkswagen required, by their scale, to have a slick custom solution in order to perform in the international market?

People who like multirepos are always saying how easy it is to pin dependencies but like you I haven’t seen anyone doing it right since I left Google. The monorepo third-party system works well in practice.

Ps thanks for getting scipy into third_party all those years ago.

And for anyone that wants to learn more about how Google manages third-party code, all of our docs are public at https://opensource.google.com/docs/thirdparty/

> The monorepo third-party system works well in practice.

It's worth noting that this is only viable at Google because they don't use git. Git's insistence on every client having a full copy of all history of every file in the repository makes monorepo much more expensive.

I see conflicting reports over whether google use Perforce or something proprietary called "piper"?

> Git's insistence on every client having a full copy of all history of every file in the repository makes monorepo much more expensive.

You might be happy to know Microsoft created a Virtual File System for Git[1] [2] so you do not have to have every file checked out in your working directory. Microsoft uses Git in a monorepo (for Windows, and it's 2.5 million files/300GB[2])

1. https://github.com/Microsoft/VFSForGit

2. https://vfsforgit.org/

This paper has details and history regarding Perforce and Piper: https://ai.google/research/pubs/pub45424

They used to use Perforce and then actually outgrew it (!), implementing an initially backwards-compatible backend called Piper.

(It may still be backwards-compatible, but it's been years since they turned off the last real Perforce and I haven't worked there for years myself. So it may have diverged.)

I used a git wrapper for the google3 repo. It wasn't great. There are a number of semantic differences between Piper/Perforce and git that made it awkward, especially code review: git doesn't handle code review well (I still find this to be an issue with GitHub and other sites that have code review). But it was not an officially supported solution, and I believe its replacement is based on another DVCS, Mercurial, for some silly software engineering reasons I don't like.

I used the git wrapper for a few days until I hit a day-ending `git gc`. That's when I knew git was terrible.

Afaik it's a custom system with download-and-cache-on-demand.

Note that there's nothing forbidding you from writing a virtual git filesystem that fetches objects from some centralized repo as files are open()ed. Git on cloud steroids.

Microsoft actually had done it and pushed the whole Windows repo into git.


AFAIK they used Perforce in the past and then built their own later.

FWIW, there is [used to be?] a git interface for the centralized monorepo. Really handy for managing multiple dev branches at the same time.

VFS for Git lets you do that.

Pinning dependencies is easy, and gets one stuck on Java 1.3 and IE 6. Since the library maintainer doesn't know it broke dependent code, the dependent code is unlikely to have a smooth upgrade path. It's just a matter of having the luxury of picking a time to pay the cost of dependency upgrades. That cost may be high in a world where your dependencies also pin dependencies, likely at different versions.

The underlying assumption is that the project will fold in 2 years anyway, so one may get away with never doing a dependency upgrade.

That's a good point. Do we have any solutions from companies who use git/multirepos for their libraries? Is there any way to compile refactored libraries against existing code?

For various public package managers, there are tools that automatically send e.g. a GitHub PR with an updated dependency file when a dependency updates, which is then tested by CI. I guess something like that could be made for private repos and test versions of packages too, reporting back to the authors of the dependency.
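For instance, GitHub's Dependabot works this way and is driven by a small config file; the ecosystem and schedule below are just illustrative:

```yaml
# .github/dependabot.yml: when a dependency publishes a new version,
# a PR bumping the pin is opened automatically and CI runs against it.
version: 2
updates:
  - package-ecosystem: "pip"   # also supports npm, gomod, maven, cargo, ...
    directory: "/"             # location of the manifest/requirements file
    schedule:
      interval: "weekly"
```

That gives a per-repo, opt-in approximation of the monorepo signal: the library author doesn't see breakage directly, but each consumer's CI flags it soon after release.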

What’s really strange is how many people in that recent monorepo discussion were advocating for it in the context of a small org. I’ve only encountered monorepo in a large org, where I’m pretty convinced it’s more efficient than the alternative.

I think the problem is more just a reversal of perspective. It’s up to each different app to detect and respond to a breaking change from a dependency.

The idea of wanting to make a change to a third party library and then see all the downstream consumers who would be broken by that change if they updated to start consuming that change is an incredibly stupid thing to want, and it’s no measure of success whatsoever to build something that gives you that information.

It’s like the most giant case of coupling you can imagine (letting the statuses of thousands of consumer apps act as any type of constraint on the developer choices of the third party library, as opposed to all those consumer apps opting in to changes on their own terms by updating their dependencies).

Imagine if I have shared a bunch of copies of my resume with a bunch of recruiters. They are out there selling me as a candidate or whatever. Now I decide I want to change my resume, but I don’t know if it’s going to upset the approach some recruiter is taking.

If I can’t update my resume unless I first consult a big oracle that tells me which recruiters will be negatively impacted, that’s a problem, and not at all some type of live-with-able “customer service” positive thing. It’s just plain old bad coupling.

Creating such a system that could automatically diff the old resume’s usage constraints against my proposed changes would be a gigantic waste of time. The exact opposite of something to celebrate.

I say this as someone who routinely writes in-house software libraries used by dozens or hundreds of other apps, various teams, and even a few that are open source.

The primary thing gauging the health of our development is that we are decoupled from any consumers. We are free to make whatever changes we want, and whether downstream teams would like to receive those changes is wholly an opt-in process with versioned dependencies and easy rollbacks controlled by those consumers.

I agree in principle, but in practice it's important to remember that your package only exists to be consumed by its dependents. That is its sole purpose. If your changes aren't serving those consumers, then they're the wrong changes.

It's also useful information in the edge case where consumers are relying on undocumented or unintentional behavior in your package.

Yes, you don't want a hard-constraint of no-breaking changes ever, but knowing immediately when a change is breaking change (especially if you didn't intend it to be) is useful.

When you say,

> “in practice it's important to remember that your package only exists to be consumed by its dependents. That is its sole purpose. If your changes aren't serving those consumers, then they're the wrong changes.”

I agree completely and that’s exactly why you want downstream consumers to opt-in to your changes.

As the library writer / maintainer, nobody knows better than you how to implement the behaviors downstream consumers want. Sure, those other folks know what they want, but are not at all a trustworthy signal for how to solve it for them.

If you are constrained by what breakages your new approach would introduce, this is backwards, exactly from the “in practice” perspective you described. That means you are not able to actually solve your consumers’ problems, create new solutions, refactor old bad ways of working, because you are coupling the what with the how.

The fact that you are beholden to your consumers is all the more reason to decouple the development process from the delivery process. It makes this idea of wanting a big oracle to tell you what would be broken, not because of functional incorrectness but because of a consumer's failure to accommodate the new changes, all the more egregious.

"Individuals and teams at Google are required to explicitly document their goals and to assess their progress towards these goals"

This seems attractive for other large organizations. Any positive or negative experiences from readers?

On an individual level, this feels like a box-ticking exercise at the company I work at. Project goals are hard to set out a year in advance as a developer, as there's a good chance that within 6 months the direction will be changed yet again by product. Personal career development goals are not really considered.

When it gets tied into career progression, you end up with not very productive (but easily measurable!) goals like "Reduce eslint warnings in legacy project X by 50%", because business value is either not easily measurable or not directly under the developer's control (revenue). How do you quantify better knowledge of the overall system architecture? Bugs not caused? But how do you know the developer didn't just stay in their comfort zone or have easy projects this quarter/year? "I saved Joe 4 hours on Friday since he didn't have to investigate what the system does with foobars, because I had the answer" just sounds petty.

When it gets tied into career progression,

I've come to the conclusion it's folly to pursue any one company's career progression maze, because you get corralled into all sorts of silliness like this: vying for projects with your peers, acrimonious code reviews, chasing silly metrics, etc. I find it's much more effective in time, money and title (and work-life balance, and mental health) to simply switch jobs for the higher title.

In other words don't fall for the "work your ass off for a possible future bump in pay and title" game many companies play.

Or find a good company. I've worked hard and gone from mid level (actually slightly lower) to principal over several years at the same, growing company. My responsibilities have drastically changed over time towards greater impact and my pay is nearly 3x from where I started.

The same could simply be accomplished with good people management. A good 10X people manager is worth their weight in gold.

> A good 10X people manager

I've worked for a few. In my experience they end up getting pushed out by the bureaucracy after a few years.

I had this when I worked at a large enterprise several years ago.

If I recall correctly, we had to state goals for each of 5 major categories and another set of 5 supplementary categories. The categories were things like: deliver customer value, enhance teamwork, collaborate well across teams, and various other enterprise buzzwords; I forget exactly.

As a coder, my goal was pretty much to write good code and avoid sitting through pointless meetings as much as possible (a very hard task in that place). I would basically have to spend half a day coming up with various different ways to word this so that it would fit each of the five goals.

Every six months (one mid year review, and then the final review) I would have to meet with my manager to provide evidence that I was achieving my goals. That would be another half a day twisting words around to try to fit what I had done to the goals. My manager would have to do the same for me. I was then scored on each of the goals. Then the score was totalled up and was used to determine the salary increase that I would get at the end of the year.

We all despised the system. A collective groan would go around the meeting room when it was announced that it was review time again.

My first "goal-setting" experience (five jobs and 20 years ago) was like this: upper management would define company-wide goals (that ended up being incredibly vague) and their direct reports would define their own goals that supported the company-wide goals, and their direct reports would define _their own_ goals, ad nauseam, until it finally trickled all the way down to me, the lowly programmer.

So I took it seriously: I read my manager's goals, his manager's goals, his manager's manager's goals, all the way up the chain, so I could try to define some goals of my own that a) I thought I could actually achieve and b) supported everybody else's goals. I had a lot of things like "increase unit test coverage" and "speed up build times" in there. My manager reviewed my proposed goals (remember, we were supposed to be defining our own) and rejected all of them, giving me a set of completely unachievable and meaningless goals - mostly related to the "flavor of the day" project that I was already waist-deep in. Things like "reported bugs are down 30%", "my peers consistently rate me as a solid team player" and "project fleebleflub is in production and is consistently producing $100,000/month in revenue" (I'm not kidding). Back then I was young and naive, so I argued with him that these goals were effectively impossible for me to achieve alone, and he said, "well, these are 'stretch goals' and that's good".

Of course, two months later project fleebleflub was cancelled, I was redirected to a troubled project with tons of bugs, and my peers hated me because I was always turning away their requests for help while frantically trying to meet my own goals. Performance review time started to loom large and I was starting to have an existential crisis - I had printed out my goals and pinned them to the wall of my cube to keep them in mind, and I knew that I hadn't come anywhere close to achieving any of them.

I was five years out of college and starting to panic: my degree was in computer science, so programming was the only career option I had, but it was starting to look like I was terrible at this. I updated my resume the night before the performance review because I knew for sure I was going to be fired. So I went into the review, my manager looked over my goals, said, "well, let's see if we can get this up in the next six months" and that was it; I got a "meets expectations" and kept working there. Lather, rinse, repeat for every single song-and-dance goal-setting/performance-review exercise I've ever been through.

Sounds like that was a classic case of confusing "lag measures" with "lead measures." I'd suggest to anyone in a situation with a manager who tries to over-focus on lag measures to explicitly bring up the concept with them.


That's unfortunate (and all too common). Goals should be meaningful to you and the company and be few. Achieving them over time should feel good. And reviews should be an opportunity to show off a bit and learn where you can improve a bit.

Mostly negative experiences:

1) While writing up the goals, you don't have a full view of the problem. Goals change, but once written down, there is a strong pressure to implement what has been written down.

2) It selects for people who are good at writing convincing design docs. Often these people write sub-optimal code and the designs only look good on paper.

Actually, the products (aside from search+ads) that come out of Google look exactly like they have been produced using this methodology; and that's not a good thing.

> people who are good at writing convincing design docs. Often these people write sub-optimal code and the designs only look good on paper.

In my experience, the ones who write good design docs are the ones who write good code.

Design doc writing is not simply overhead and marketing - it is concisely describing what and how you want to do something, and inviting feedback and other ideas.

The exercise of writing a good design doc brings you through the process, thinking of every non-trivial aspect.

It also typically only takes a day or two (or maybe a week for something more complicated) - far less time than the corresponding code takes. And if a colleague points out something that could be done better, you won't have wasted weeks or months writing the wrong code - only hours writing the wrong design. Much less costly to fix, and much easier to move on from, emotionally.

Most large companies operate this way. It's great as long as the powers that be actually act on these assessments and validate and confirm them with other members of the team.

My experience is that the difference in raises between an employee that got an “Exceeds Expectations” and one that “Meets Expectations” isn’t significant enough to be worth wasting the time worrying about it.

The best way to make more money is to change jobs. Google may be different.

10s of thousands of dollars per year difference in bonus, stock refresh, forget about salary. From personal experience having been at both meets and exceeds as a senior engineer.

How many 10s?

If you are at an appropriate tier company for your skill level/aptitude then you likely will have to work a lot of extra hours to be (and to be seen as) an 80th percentile performer.

You won't know if that time investment gets you anything until the end of the year. You won't even know what the bonus pool will be or if you'll even have the same manager by the end of the year.

All for an extra 10-30k a year (pre-tax)?

Your time is probably better spent building your reputation in the industry and trying to get a higher paying job.

That’s what I expected and why I left a lot of wiggle room in my post between my experience and reality at FAANG companies.

Google is indeed different. While you might not find the delta in compensation for one performance review cycle to be worth the effort, employees with a track record of exceeding expectations can expect to come out significantly ahead of where they would be if they had only met expectations.

Well, if you exceed expectations, doesn't that mean you didn't set your goals high enough?


But working at regular non SV companies, I’ve learned just to focus on doing as well as possible while still having a sane work life balance, keeping on top of industry trends and job hopping when my salary and the market were out of whack.

Google pays such above market salaries, the strategy would be different.

At Google you are expected to exceed expectations.

But seriously, you are evaluated against the role description (software eng? product mgmt?) and your level. If you exceed expectations consistently over several review cycles, you are encouraged to apply for a promotion.

The goal is to get you promoted into a role and level where you can consistently meet expectations.

Specifically the opposite. The Peter principle implies you'll make it to a point where you flounder and can't manage. This is the opposite: you make it to a position where you do well, but aren't spectacular (compared to your peers in the same position).

Roles/levels are calibrated so that expectations at L+1 are generally speaking aligned with strong performance at L.

Ideally expectations are relative to your position at the company, not what people think you can accomplish.

Ah yes! It's not enough if you're great at your job, or even if you do other people's jobs...

Instead you have to have this checklist of your quarterly goals, on which you can go through with your engineering manager on biweekly 1-on-1 meetings! And of course you should make a nice spreadsheet and a confluence page documenting your progress, since we're data driven :)

Did you fix a major fuckup in some legacy component? Where are the numbers? Ah, then it's not visible enough for a promotion, here's your 1% raise instead. Do you see Paul over there? He made great progress this quarter! One of his goals was to write a blogpost each week, and guess what, he did! He's on a great growth trajectory, and well deserves his promotion.

Long story short: it's a great way to drive away your best talent while keeping the confluence page and blogpost writers.

> Did you fix a major fuckup in some legacy component?

What if that legacy component was no longer in use? Is there some bigger picture in play?

I've been at bad companies before, so I understand being cynical. But, having people just do whatever they feel like also does not work. From the outside it looks like Google has quite a bit of this, so they are clearly trying to get people on some path. The messaging app situation shows it hasn't quite worked yet (from the outside anyway).

>What if that legacy component was no longer in use? Is there some bigger picture in play?

Said legacy component was (and is) making a substantial chunk of the company's revenue.

> What if that legacy component was no longer in use?

He was actually assigned by his manager to investigate the problem. Source: am a programmer.

All of life is a series of negotiations. Those who can negotiate more effectively come out ahead. This is true whether you are discussing your compensation when joining a new company or whether you are hashing out what you work on.

If your manager asks you to do something, ask them how the thing they are assigning will help advance your career. Ask the manager how -- if you deliver on what they ask -- they will go to bat for you when they are sitting in the room with their peers justifying your evaluation score. Be willing and able to simply state, "I don't know how you can expect me to spend my time and energy on something that won't help me advance my career." If your manager can't understand or respect that, then that's great! You have a clear warning sign that it's time to fire your manager.

It's OK if we lose people who don't care about contributing to the organizational goals and coordinating with teammates, who just want to mess around on whatever amuses them.

W-hat? Where did you get that from?

I'm telling you, fixing a serious issue in a legacy component is anything but amusing. I'd be happy writing blogposts about the current framework of the week, but if I uncover an issue while working on my regular tasks, I'm gonna try to fix it instead of sidestepping it.

We do this at a smaller scale (~1500 employees) with success. It enables everyone to work towards a goal and to discuss with their superior any issues that get in the way, as well as every success story along the way. Both are very important parts of the journey.

Could you bring a any examples of individual goals? How are they worded for devs and how different they are from the product goals?

It's not easy to list goals off the top of my head as it's not my strong suit, especially if they have to be measurable (which good goals should be, rather than your boss evaluating whether or not you did something), but here goes:

Goals could be "Pick up language X in order to help development on project Y" or "Get formally introduced to all R&D team leaders, and get introduced to their roadmaps" or "Facilitate 10 job interviews together with team leaders in marketing"

Thank you. I asked because in the previous company I worked for we had difficulties setting measurable goals for devs and ended up tracking only product goals.

Product can be highly variable (or uncertain), so dev goals would focus on the skills necessary to achieve said product. Ideally these skills are general and transferable.

Have you ever read a white paper? You see how they manage to say that the product will solve every problem you have and nothing specific at all? Same idea.

What goals/steps are there? Is it anything more than something like

Goal: get promoted/get higher salary

Steps to the goal: did my job well


You essentially set arbitrary metrics for yourself (create X widgets, approve Y new hires). And then you do those things. And then you pat yourself on the back for doing those things. And then you get passed over for promotion anyway because "maybe next time". It's a way for companies to not promote you based on nits they picked with your own write-ups.

"Did my job well" might be one, but it's hard to measure... otherwise see above :D

My understanding is that OKRs at Google are visible throughout the org, and colleagues are able to (and do) provide feedback on them.

Most orgs that adopt OKRs only make them visible to the person's manager, and without transparency and good feedback, the other problems mentioned here proliferate.

This is simply part of how they've implemented Jack Welch's Vitality Curve[1]. It's pretty common practice for most large companies, although I understand more and more are starting to move away from this approach. As it turns out, most people don't enjoy playing Game of Thrones with their co-workers every quarter.


OKRs are straight out of Andy Grove at Intel. If you read further along, the author indicates that getting fired from Google is very rare, so it's not a fire-the-bottom-10%-every-year system.

Andy and Jack knew each other well. Where do you think Andy's strategy came from? It's the same thing as Jack's Vitality Curve, only without firing the bottom 10%.

I used to be proud of many things in the article when working at the G.

Not anymore. Let me talk startup anti-Google pattern here.

* Most of Google’s code is stored in a single unified source-code repository, and is accessible to all software engineers at Google

This can be the worst nightmare from a management POV in a startup. Sure, it sounds wonderful that everyone can see/fix anyone else's code, but 99% of people shouldn't have time to do so (if they do, their workload is not full; increase the load). As for the other 1%: I guarantee your codebase has just been stolen by an ex-employee with malicious intent. Instead, divide your codebase into different projects/roles so that people only gain access when needed.

* Software engineers at Google are strongly encouraged to program in one of five officially-approved programming languages at Google: C++, Java, Python, Go, or JavaScript.

We use Go for backend programming and Vue (JavaScript) for the frontend. Don't use Java if possible, keep away from C++, and definitely never Python, which is a maintenance nightmare.

* The next step is to usually roll out to one or more “canary” servers that are processing a subset of the live production traffic.

Not necessary when your measly not-product-market-fit-yet website only gets 100 users. Just roll-the-f-out, let it break, and fix it later. Building a canary system is huge overkill at the early stage.
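For context on what "canary" means here: the idea is to route a small, deterministic slice of live traffic to the new build before a full rollout. A minimal sketch of the cohort-assignment part in Go (the `canaryPercent` threshold and user-ID scheme are illustrative assumptions, not anything from the paper):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// isCanary deterministically assigns a user to the canary cohort.
// Hashing the user ID (rather than picking randomly per request)
// keeps each user pinned to the same backend across requests.
func isCanary(userID string, canaryPercent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32()%100 < canaryPercent
}

func main() {
	// Rough check: with a 5% threshold, about 5% of users land
	// in the canary cohort.
	canary, total := 0, 10000
	for i := 0; i < total; i++ {
		if isCanary(fmt.Sprintf("user-%d", i), 5) {
			canary++
		}
	}
	fmt.Printf("canary cohort: %d of %d users\n", canary, total)
}
```

In a real setup this decision would sit in a load balancer or reverse proxy rather than application code, but the point stands: the mechanism itself is small; the operational machinery around it (monitoring, automated rollback) is where the overkill for a 100-user site comes in.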

* All changes to the main source code repository MUST be reviewed by at least one other engineer.

Same as above. Just build and RTFO.

> Sure, it sounds wonderful that everyone can see/fix anyone else's code, but 99% of people shouldn't have time to do so (if they do, their workload is not full; increase the load).

That's a sweatshop mentality. Startup, big corp, whatever: your engineers should have flexibility to work on things that they recognize the importance of. That's not the same as free time or a lack of tasking; that's treating people like adults who might notice things you do not.

And if you're intentionally trying to manage your engineers in a way that keeps them from having time to notice bugs in others' work, you're doing yourself a fucking massive disservice: when people have collaborative ownership of something, they get invested in its quality/growth/feature-set, and productivity goes up. "Don't look at other people's code, just keep your head down and churn out the feature on time, ignore the larger picture!" is a recipe for low-output, uninvested, hard-to-reassign, burnt-out engineers. Doesn't matter if it's a team of 2 or 2000.

Your words are sweet, sir.

> definitely never Python which is a maintenance nightmare

Did you just say that all startups should never use Python? This is such a ridiculous statement I could hardly imagine where to begin with it.


But you also argued that things like canary servers are overkill for a startup - just get the thing built and worry about it later. Well, why not just build it in Python if that gets it built, and worry about it later?

Dropbox started in Python (and is still using it a lot).

...and we all know how horribly they failed.

Whether a startup will succeed has nothing to do with the language used. Google started with Python and TikTok with PHP (wtf). However, when you start, go with the better choice, since every line of code becomes a liability later.

Well this makes me feel good, because I apparently write better Python than the average Googler, since my Python is entirely maintainable.

Agree about the monorepo thing though, it just seems like people are optimizing for the wrong things with monorepos.

How does it make sense to compare the processes of a company with >100k people touching code and products with more than 1B qps, to a startup with <5 people touching code?

This comparison makes absolutely no sense.

Because there are people who think things that work at Google must magically work for their company. There are many patterns that seem perfect in theory but fail miserably in the real world. Always think differently and take nothing for granted.

"good enough for Google" == "nobody got fired for buying IBM"

> Not necessary when your misery not-product-market-fit-yet website only gets 100 users. Just roll-the-f-out , let it break and fix later. Building the canary system is a huge overkill in the early stage.

Being a startup does not excuse this kind of cavalier attitude

It does, when you face life-or-death scenarios every day as a startup founder.

Not a single startup was ever killed by software bugs or design faults. Never. Many other things kill them.

It actually does - having only 100 users means it's impossible to implement an effective canary system. It also means that your entire system being down affects way fewer people than an issue on the canary system at Google.

You have way more important things to worry about in a startup than optimising uptime.

> * All changes to the main source code repository MUST be reviewed by at least one other engineer.

> Same as above. Just build and RTFO.

Deploying without review is not only a development nightmare (if you keep deploying without review, you'll eventually break something or introduce security vulnerabilities, unstable code etc.), but it can also get you in massive trouble with your compliance audits.

Peer review is very important in production code.

Why are you comparing stuff that works at a big company with a lot of users, and is pretty much required at a company like that, to a startup?

I would like to know more about your reasons for thinking Python is a maintenance nightmare.

>Sure it sounds wonderful everyone can see/fix anyone else code but 99% people shouldn't have time to do so (if they do their work load is not full, increase the load).

A developer shouldn't have time to fix other people's bugs?

Unless you work on the project, report the bug to the owner, who has better knowledge of the code, and ask them to fix it. Otherwise you are not helping them; you just end up using their time to teach you how the code really works. Not good for the total productivity of the whole team.

The environment you're describing is more siloed than many 5000+-engineer companies I've worked at or read about. Which is concerning, given that you describe it as a startup.

What resources do you recommend to new SWE hires with no Go experience in order to ramp them up?


Go through this tutorial and gobyexample.com, and they should have enough knowledge to work on a project by the end of day three, because the language is so simple. If not, fire your new hires.
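For a rough sense of what "simple" means here, this is the kind of day-one exercise those tutorials cover (the `wordCount` function is an illustrative example, not taken from either resource): maps, slices, and `range` loops are most of what a new hire needs for typical backend glue code.

```go
package main

import (
	"fmt"
	"strings"
)

// wordCount tallies whitespace-separated words in a string,
// exercising three core Go features at once: strings.Fields,
// maps with a zero-value default, and range iteration.
func wordCount(s string) map[string]int {
	counts := make(map[string]int)
	for _, w := range strings.Fields(s) {
		counts[w]++ // missing keys read as 0, so no existence check needed
	}
	return counts
}

func main() {
	fmt.Println(wordCount("go is simple and go is fast"))
}
```

The language surface really is small enough that this covers a surprising fraction of everyday code; whether three days is enough for concurrency and error-handling idioms is more debatable.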

Whoa, are you THE thetechlead?

