
Software Engineering at Google (2017) - weinzierl
https://arxiv.org/abs/1702.01715
======
Radim
Buried in the "2.11 Frequent rewrites" section, but a great hack for
"productivity via a sense of ownership":

 _" In addition, rewriting code is a way of transferring knowledge and a sense
of ownership to newer team members. This sense of ownership is crucial for
productivity: engineers naturally put more effort into developing features and
fixing problems in code that they feel is “theirs”."_

~~~
sytelus
If you have engineers with the psychological problem of “not invented here”, you
have a very serious issue. I am currently seeing this in real time in one of
the projects and I was told _almost exactly_ the same words as the “reason” to recreate
what we already have and is working beautifully. It was clear to me that some
developers are just too lazy to dive into a complex system. They get ticked off
by one imperfection here and another over there and immediately run for the exit
shouting “I could do so much better”. Instead of understanding why things are
the way they are, they fantasize about how they can one-up the original authors and
claim their own hero title. They go on to thoroughly underestimate the time to
recreate what has taken years of learning. So they spend the next many months
sweating it out, copying as much code from the “old” stuff as they can, dropping
important features here and there, adding new and old bugs - very often
arriving at more or less the same place they started off. Meanwhile the competition
has moved on to V2, laughing their way to the bank, and customers scratch their
heads over why you are still stuck in the same place for so long. Then our new “owners”
get their promos after massive marketing of how much better everything is
now. But to everyone’s surprise they soon leave the project because working on
bugs and incremental features has become boring and, BTW, the new stuff is just
as complex as the old stuff. New devs roll in and we start the whole cycle again.

~~~
klodolph
You can call it a psychological problem if you want but calling names is not a
solution, nor does it really provide a good path to finding a solution. The
labor market being what it is, people will leave steady jobs with good pay for
more exciting work with riskier prospects and less pay. This happens _all the
time._

Bug fixes and incremental features will generally not get you promoted, for
good reasons: we expect senior engineers to have system design skills and
there is simply no way to demonstrate that without having your engineers
design systems. If you only assign tasks based on the needs of your product
and not on the needs of your workforce you could easily find yourself with a
critical skill shortage.

There's a certain inefficiency to this, but only if you put your blinders on.
If you have engineers churning out meaningless work then you certainly need to
address that problem, but if you prioritize short-term product success over
team health you are only trading one problem for another.

~~~
emperorcezar
> Bug fixes and incremental features will generally not get you promoted for
> good reasons.

This is exceptionally bad, but sadly true.

If you have an engineer who can unblock teams and fix issues in an hour that
others take a week or cannot fix at all, they are gonna jump ship if they
can't be recognized.

At that point you've lost a valuable resource.

~~~
Novashi
Recognition is stupidly cheap.

If I can do that without it being a fluke, they’d better bump my salary.

~~~
mcguire
Would you settle for a certificate?

~~~
wbl
My landlord doesn't take certificates.

------
weinzierl
What I don't understand is how they accomplish larger collaborative changes.
The paper says:

 _" Almost all development occurs at the 'head' of the repository, not on
branches."_

Googler Rachel Potvin made an even stronger statement in her presentation
about _" The Motivation for a Monolithic Codebase"_ [1]:

 _" Branching for development at Google is exceedingly rare [..]"_

In the related ACM paper she published with Josh Levenberg [2] there is the
statement that:

 _" Development on branches is unusual and not well supported at Google,
though branches are typically used for releases."_

In my world, when we have to make a bigger change we create a branch and only
merge it into the trunk when it is good enough to be integrated. The branch
enables us to work on that change together. I don't understand how they do
this at Google. As far as I understand, in their model they either have to

- give up on collaboration and always have just a single developer work on a
change.

- share code by other means.

- check in unfinished work to the trunk for collaboration and constantly
break trunk.

[1] [https://youtu.be/W71BTkUbdqE?t=904](https://youtu.be/W71BTkUbdqE?t=904)

[2] [https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...](https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext#)

~~~
sigsergv
I think they don't branch code because Perforce branches are horrible and
merging them back to HEAD is an extremely painful process for a monorepo.

~~~
raverbashing
So, use a better tool?

~~~
philosopher1234
Or, don’t branch. Is branching so essential?

~~~
lordnacho
Isn't it essential for mental organisation? How do you think about what's
different about a set of changes without some sort of DAG?

~~~
yjftsjthsd-h
Just do one thing at a time? Today, I am working on X; my commits are for X,
and details are in the commit message.

~~~
dharmab
That breaks as soon as you have to interrupt working on Nice To Have Feature X
to work on Important Bugfix/CVE Y.

~~~
addicted
How often does that happen to an individual developer though?

Once a month? In an averagely well run company even that may be towards the
higher end.

Should your entire development strategy be based on a once a month occurrence?

~~~
idontpost
> Once a month?

Closer to once a week for me.

------
stillworks
So, the thing that is still buzzing in my head now and not mentioned in the
article (maybe I didn't read carefully enough) is what actually gets released
into prod after a change is reviewed and merged.

If the monorepo contains, let's say, five different products and in a day only
one of them gets a merge, then does Blaze still build all five and are all five
released (based on successful integration testing)? Or does it only release the
changed product (and any other ones which depended on the changed one)?

EDIT: Also, is the "canary" server still for testing? Could there exist, in
practice, a set of canaries running very different versions? Is there any
correlation or any version "roll-up" constraint between various canaries?

~~~
ASinclair
Like kyrra said, releases are handled independently by each team. Teams are
in various states of maturity in relation to their release practices. Some are
fully automated. Some involve manual QA. I've supported teams across that
continuum as a SETI at Google.

I'm not as well-versed in canarying though I've set it up for a team or two.
I've only ever seen a single canary version for any particular binary.

Canarying is done. I haven't seen canaries running multiple versions of the
same binary. Though teams will often guard new behavior behind experiment
flags.
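
To illustrate what guarding new behavior behind an experiment flag can look
like, here is a minimal sketch. It is not Google's tooling: `ExperimentFlags`,
the flag name `new_greeting`, and the percentage-based bucketing are all
hypothetical, chosen only to show old and new code paths living side by side
in the same binary.

```python
import hashlib


class ExperimentFlags:
    """Hypothetical in-memory flag store (a real system would load config)."""

    def __init__(self, rollout_percent):
        # Maps flag name -> percentage of users (0..100) that get the new path.
        self._rollout = dict(rollout_percent)

    def is_enabled(self, flag, user_id):
        percent = self._rollout.get(flag, 0)  # unknown flags default to off
        # Hash flag + user so each user lands in a stable bucket per flag.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent


def render_greeting(flags, user_id):
    # The new behavior ships in the same binary but stays dark until enabled.
    if flags.is_enabled("new_greeting", user_id):
        return "Welcome back!"
    return "Hello."
```

With something like this, one binary version can be canaried while the new
behavior is ramped up separately by changing the rollout percentage.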

~~~
stillworks
Thanks for all the replies above (Kyrra, Joshuamorton and ASinclair), these have
been very helpful.

------
lexpar
The protocol buffers stuff seems pretty cool. At my last job (small web dev
shop), we had constant headaches over the class definitions of endpoints
changing and some js file not having its model updated to correspond. I
thought we should be writing XML files that both ends could be reading in - we
never got up the political will to make that big change though.

[https://developers.google.com/protocol-buffers/](https://developers.google.com/protocol-buffers/)

~~~
gradys
Yep, protocol buffers are like a cross-language type system for the data that
moves between systems with the side benefit of compact serialization. They're
awesome and definitely a big productivity boost above a certain scale.
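
A rough sketch of the "one shared definition, checked on both ends" idea,
assuming nothing about the real protobuf API: the `UserProfile` message, its
fields, and the `encode`/`decode` helpers are hypothetical, and JSON stands in
for protobuf's compact binary encoding. In practice the schema would live in a
`.proto` file and the per-language classes would be generated from it.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class UserProfile:
    # Hypothetical message; stands in for a schema shared by client and server.
    user_id: int
    display_name: str
    email: str = ""  # optional field with a default


def encode(msg):
    """Serialize a message (JSON here, purely for illustration)."""
    return json.dumps(asdict(msg), sort_keys=True)


def decode(payload):
    """Parse a payload, checking every known field against the schema."""
    raw = json.loads(payload)
    kwargs = {}
    for f in fields(UserProfile):
        if f.name not in raw:
            continue  # fall back to the default, if the field has one
        value = raw[f.name]
        if not isinstance(value, f.type):
            raise TypeError(
                f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
            )
        kwargs[f.name] = value
    # Unknown fields in the payload are ignored, so senders can add fields
    # without breaking older readers.
    return UserProfile(**kwargs)
```

The point is that a drifted model fails loudly at the boundary instead of
silently producing bad data somewhere downstream.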

~~~
lexpar
Something like this would have indeed saved us a lot of time in the long run.
Oh well.

------
dekhn
I was the maintainer of a third-party library used by thousands of dependent
applications at Google. I have to admit, I still have not seen on the outside
a system that allows me to change the version of numpy, and know that
thousands of dependent applications either work or break, within an hour of
making my change.

Being able to write and use a mapreduce with a high level of confidence that
my code would continue to work years later was another nice benefit. MRs I
wrote in the first year at Google still compiled and ran with minimal changes
almost 10 years later(!) which is amazing given the amount of environmental
change that occurred.

That said, somebody could still halt development across the company by
changing and checking in a core file (like proto defs for websearch) without
testing.

Whatever social system led to google3/borg and the amazing productivity
associated with it, it was a special moment that hasn't been replicated many
times.

~~~
shereadsthenews
People who like multirepos are always saying how easy it is to pin
dependencies but like you I haven’t seen anyone doing it right since I left
Google. The monorepo third-party system works well in practice.

Ps thanks for getting scipy into third_party all those years ago.

~~~
pjc50
> The monorepo third-party system works well in practice.

It's worth noting that this is only viable at Google because they don't use
git. Git's insistence on every client having a full copy of all history of
every file in the repository makes monorepo _much_ more expensive.

I see conflicting reports over whether google use Perforce or something
proprietary called "piper"?

~~~
dekhn
I used a git wrapper for the google3 repo. It wasn't great. There are a number of
semantic differences between piper/perforce and git that made it awkward,
especially code review: git doesn't handle code review well (I still find this
to be an issue with github and other sites that have code review). It was
not an officially supported solution, and I believe the replacement for it is
based on another DVCS, Mercurial, for some silly software engineering reasons
I don't like.

~~~
shereadsthenews
I used the git wrapper for a few days until I hit a day-ending `git gc`.
That's when I knew git was terrible.

------
growtofill
Previous discussion:
[https://news.ycombinator.com/item?id=13619378](https://news.ycombinator.com/item?id=13619378)

------
MeteorMarc
"Individuals and teams at Google are required to explicitly document their
goals and to assess their progress towards these goals"

This seems attractive for other large organizations. Any positive or negative
experiences from readers?

~~~
scarface74
My experience is that the difference in raises between an employee that got an
“Exceeds Expectations” and one that “Meets Expectations” isn’t significant
enough to be worth wasting the time worrying about it.

The best way to make more money is to change jobs. Google may be different.

~~~
webmaven
Well, if you exceed expectations, doesn't that mean you didn't set your goals
high enough?

~~~
steelframe
At Google you are expected to exceed expectations.

But seriously, you are evaluated against the role description (software eng?
product mgmt?) and your level. If you exceed expectations consistently over
several review cycles, you are encouraged to apply for a promotion.

The goal is to get you promoted into a role and level where you can
consistently meet expectations.

~~~
sjg007
Ahh the Peter Principle.
[https://en.wikipedia.org/wiki/Peter_principle](https://en.wikipedia.org/wiki/Peter_principle)

~~~
joshuamorton
Specifically the opposite. The peter principle implies you'll make it to a
point where you flounder and can't manage. This is the opposite: You make it
to a position where you do well, but aren't spectacular (compared to your
peers in the same position).

Roles/levels are calibrated so that expectations at L+1 are generally speaking
aligned with strong performance at L.

------
thetechlead
I used to be proud of many things in the article when working at the G.

Not anymore. Let me talk startup anti-Google pattern here.

* Most of Google’s code is stored in a single unified source-code repository, and is accessible to all software engineers at Google

This can be the worst nightmare from a management POV in a startup. Sure, it
sounds wonderful that everyone can see/fix anyone else's code, but 99% of people
shouldn't have time to do so (if they do, their work load is not full; increase
the load). As for the 1%, I guarantee your whole codebase has just been stolen by an
ex-employee with malicious intent. Instead, divide your codebase into different
projects/roles so people only gain access when needed.

* Software engineers at Google are strongly encouraged to program in one of five officially-approved programming languages at Google: C++, Java, Python, Go, or JavaScript.

We use Go for backend programming and Vue (JavaScript) for frontend. Don't use
Java if possible, keep away from C++ and definitely never Python, which is a
maintenance nightmare.

* The next step is to usually roll out to one or more “canary” servers that are processing a subset of the live production traffic.

Not necessary when your misery not-product-market-fit-yet website only gets
100 users. Just roll-the-f-out , let it break and fix later. Building the
canary system is a huge overkill in the early stage.

* All changes to the main source code repository MUST be reviewed by at least one other engineer.

Same as above. Just build and RTFO.

~~~
elmozyz
> Not necessary when your misery not-product-market-fit-yet website only gets
> 100 users. Just roll-the-f-out , let it break and fix later. Building the
> canary system is a huge overkill in the early stage.

Being a startup does not excuse this kind of cavalier attitude.

~~~
thetechlead
When you face life-or-death scenarios every day as a startup founder.

Not a single startup was killed by software bugs or design faults. Never. Many
other things do.

------
cpeterso
I'm curious to learn more about Google's approach to project planning and
management, beyond OKRs for personal or team goals.

Do all teams use the same standard process or are teams allowed to do their
own thing? What tools and cadence do they use to track project milestones and
progress? With Google's monorepo and a centralized bug tracker, they could
make some nice project dashboards.

~~~
lucasmullens
Each team does their own thing. There are some project management tools that are
popular, but no Google-wide standard.

------
patothon
Not worth a long discussion, but is the Blaze build system they are referring
to in fact Bazel?

~~~
laurentlb
Yes. Blaze is the name of the binary inside Google. 90% of the code is in
common. Differences are 1. integration with Google-internal tools, and 2.
legacy misfeatures we plan to remove.

(I work on Bazel)

~~~
jokh
What kind of internal integrations, if I may ask? Does it automatically deploy
code for you on a successful build or something like that?

~~~
netheril96
For example, blaze has integration with Google's build cache. Most of the time
it takes 0 seconds to compile a given file, when someone else has compiled it
before with exactly the same contents, dependencies and compiler flags.
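
The idea can be sketched as a cache keyed on a hash of everything that
affects the output. This is only an illustration, not how Blaze or Bazel
implement it: `action_key`, `BuildCache`, and the fake "compiler" are
hypothetical names, and the store is a local dict where a real setup would use
a shared remote service.

```python
import hashlib


def _feed(hasher, chunk):
    # Length-prefix each chunk so ("ab", "c") and ("a", "bc") hash differently.
    hasher.update(len(chunk).to_bytes(8, "big"))
    hasher.update(chunk)


def action_key(source, dep_digests, flags):
    """Hash the source bytes, dependency digests, and compiler flags."""
    hasher = hashlib.sha256()
    _feed(hasher, source)
    for digest in sorted(dep_digests):  # dependency order should not matter
        _feed(hasher, digest.encode())
    for flag in flags:  # flag order is kept, since it can change the result
        _feed(hasher, flag.encode())
    return hasher.hexdigest()


class BuildCache:
    """Hypothetical cache; the dict stands in for a shared remote store."""

    def __init__(self):
        self._store = {}
        self.compile_count = 0  # how many times real work was done

    def compile(self, source, dep_digests, flags):
        key = action_key(source, dep_digests, flags)
        if key not in self._store:
            self.compile_count += 1
            # Stand-in for invoking the real compiler.
            self._store[key] = b"obj:" + hashlib.sha256(source).digest()
        return self._store[key]
```

Any change to the file, one of its dependencies, or a flag produces a new key
and therefore a real compile; everything else is a lookup.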

------
auiya
Page 17 - "any employee can nominate any other employee for a 'peer bonus' -
a cash bonus of $100 - up to twice a year..."

Is there a hard cap/limit on this policy now? I might have to take the rap for
that, having peer bonus'd my entire office on my last day there back in 2012.
My bad.

~~~
joshuamorton
I think that's "you can nominate a specific other person twice a year".
There's a soft cap on how many people you can nominate per year, but
it's pretty high.

------
bsvalley
Other than the 20 percent free time, compensation and offices, the engineering
culture at Google doesn’t seem to have anything special based on this well
written pdf. It is in fact very similar to other large tech companies. Just in
case you were still wondering what really attracts great engineers.

~~~
hknd
I think it attracts a lot of great engineers that they can (mostly) work on
something which is used by billions of users. You can basically improve the
life of 1B ppl by writing code.

Also: having a FAANG company in your CV makes it super easy to join any other
tech company.

~~~
bsvalley
Reality is that most of the engineers at Google don't work on Google Search
or Google Maps. They work on "smaller" projects that don't necessarily reach
billions of users. It doesn't change the fact that everything has to be
engineered in order to work for a large number of users, true, but do you
really get that reach? Nope, unless you're on a very hot and selective team at
G.

Also, your statement is valid for companies like Facebook, Uber, Apple,
Amazon, etc. It is not a valid argument anymore in my opinion. But I totally
get your point.

~~~
jogjayr
Google has 7 products with 1+ billion users.[1]

1. [https://www.popsci.com/google-has-7-products-with-1-billion-...](https://www.popsci.com/google-has-7-products-with-1-billion-users)

~~~
bsvalley
Now let's look at the ratio of billion-user products versus the total number of
products. That'll give you the ratio of employees having a huge impact:

[https://en.wikipedia.org/wiki/List_of_Google_products](https://en.wikipedia.org/wiki/List_of_Google_products)

And we're not even looking at the number of active users per product versus
the total amount of gmail ID's enabled for all google products by default
(e.g. Google+)

~~~
jogjayr
> Now let's look at the ratio of billion-user products versus the total number
> of products. That'll give you the ratio of employees having a huge impact

That logic doesn't make any sense. No company allocates staff equally to all
their products. More popular products (or higher revenue-generating products)
are always better-staffed than less popular products.

~~~
bsvalley
You can't say "always". This doesn't make sense either because it's on a case
by case basis. You could have a simple product used by billions of people,
which doesn't require many devs, or you could have a complicated
infrastructure no one knows about that gets billions of requests from 10 other
products per day. This could be handled by an army of engineers.

Is your hot product in a maintenance mode? If so, just a few devs can handle
it.

Then you have companies with groups like retail, legal, hardware, etc. They
require tons of software engineers to build internal tools. They definitely
don't reach billions of users and you see a lot of these teams. What I found
out while working at some of the FAANGs is that the hottest teams are usually
very lean. You'd be surprised how much one single rock star engineer can handle.
When you start having 10k+ engineers in your company, only a minority of folks
will end up working on the hot stuff.

------
d_burfoot
I opened this document expecting to have my mind expanded by learning about
amazing new techniques for high-level software engineering. I was
disappointed. A lot of it just seems to be about release engineering. Yeah,
sure, release engineering is really hard and important for a company like
Google, but it's of limited general interest. I was particularly disappointed
by Google's indifference to the use of smarter languages than the Big 5 (C++,
Java, Python, etc).

As an example of something that _did_ expand my repertoire as a software
engineer, check out this article by Jane Street:
[https://blog.janestreet.com/testing-with-
expectations/](https://blog.janestreet.com/testing-with-expectations/)

------
mbrodersen
Software developers that can't work with code they didn't write themselves are
not skilled Software developers.

------
alexandercrohde
>>Most software at Google gets rewritten every few years.

>> This may seem incredibly costly. Indeed, it does consume a large fraction
of Google’s resources. However, it also has some crucial benefits that are key
to Google’s agility and long-term success.

Uh... No? Chrome, gmail, search, youtube, analytics, android, documents all
usually have 2 or 3% functionality changes each year. What a ludicrous
proposition stated so matter-of-factly.

~~~
throwawaymath
Is it possible most software at Google is not Chrome, Gmail, Search, Youtube,
Analytics, Android or Documents?

Also do you work at Google? If you don't, you might not be able to perceive
all the changes to those applications on the backend. The user interface may
change at a more constrained pace, but it can still be true that most backend
code is in a constant state of revision.

For example, what defines "Search" in your example? The google.com search
interface rarely changes (logo notwithstanding), but what do we have to go on
for how often search infrastructure or processing code changes?

~~~
alexandercrohde
Let me be clear (and I don't work at google). I'm not saying google doesn't
spend a lot of engineering time doing rewrites. I'm saying if google DOES
spend a lot of the engineering time of its 20,000 engineers doing rewrites
that is BAD by all measures.

Maybe it would explain how a company with 1,000 startups worth of "top-talent"
engineers could not produce any new functionality in the last 5 years though
(small exception of AI)

~~~
wolco
This is a good point worth exploring. If Google has 1000 startups' worth of
talent (which is another point worth discussing), the corporate structure has
clearly slowed down new functionality or new product ideas. It seems like it
is 1000 times harder at Google, which levels the playing field.

------
dashmug
Where's Dart?

------
thatfrenchguy
This all looks nice and perfect, but what's the catch ?

~~~
Izmaki
That at the end of the day, this is all controlled and used by human beings,
who are notoriously known for sometimes being mean and selfish?

Examples: the 80/20% project work that is described is a really good idea
IMHO, however if somebody chooses to work 50/50 for a period, because that
person is "almost finished and longing to present it", then it might turn into
a problem for the team leader and the employee.

Software sometimes fails, even highly sophisticated build automation.

Rewriting software may introduce a sense of ownership to employees, but it may
also lower the quality of the code that is being rewritten - worst case to the
extent that the code needs another rewrite to even function as intended.

...but why let yourself get blocked by the negatives, instead of being
inspired by the positive?

------
diaktifkan
Which part discusses what Google does with your IP when you're not sitting in the
office?

