
Scaling Engineering Teams via Writing Things Down and Sharing – Aka RFCs - jbredeche
https://blog.pragmaticengineer.com/scaling-engineering-teams-via-writing-things-down-rfcs/
======
prepend
This is one of those things that sounds simple to do, but is rarely done. And
there are lots of cultural forces that work against it.

I think it addresses a big problem with important tech decisions: how to weigh
engineering value and focus energy on the technical merits, not on political
buy-in.

I work within a decent size government org where leadership is almost
exclusively non-tech (makes sense because we’re not an org with a tech
mission), and almost all tech management is program mgmt, business analysis,
strategy. And the building is almost exclusively contractors.

This means a technical decision is hard to evaluate, because the tech manager
doesn’t know whether MySQL or Postgres is better, and contractors have
incentives not to invite review by other contractors and to make it “good
enough” for contract acceptance.

This means if a developer picks MySQL, and the project manager is happy
because it’s projected to be delivered on time, then there are two big forces
that don’t want comment on the design.

We just started to try a request for comment process, but the extra effort to
review is challenging. I think that on the surface it’s because letting large
groups see your design before you start introduces organizational risk.

And then a bit deeper is that it requires a greater level of technical depth
in decision makers.

~~~
mikaraento
Many people call this "Enterprise architecture", and although painful, it
tends to be worth doing.

You can think of it (simplified) as 4 steps: 1) figuring out and deciding
which technologies to use for which domains and problems (these tend to be
called capabilities) 2) getting rid of the ones you don't want to use 3a)
enforcing use of existing, approved capabilities for new projects 3b) building
out new capabilities

1) Tends to have the problem you pointed out: "it requires a greater level of
technical depth in decision makers". Something that may work is to have some
senior engineers in-house and embed them in the outsourced projects doing part
of the real work.

2) This can be a multi-year program that needs to be separately funded. Think
of it as enterprise-level technical debt.

3a) This can be the easiest part: your architecture should be part of the RFP
process, with a well-defined escape hatch to 3b).

3b) Building out new capabilities needs to be i) funded separately but best
ii) done as part of a real project. Otherwise you'll i) have crossed
incentives with the projects as you pointed out, or ii) architecture
astronauts.

HTH. There's plenty of literature on enterprise architecture, but there's no
silver bullet, it's just hard work.

~~~
prepend
Yes, I agree, and that's why I'm trying to take this approach.

Interestingly I work in a world where enterprise architecture means something
completely different. PM me for more details.

------
doublerebel
This sounds like a description of a Design Document, which are very valuable
when the problem space is well understood. It can serve as an alternative to
sprints and sprint planning for a very small team, or when working with
contractors.

When a problem is not as well understood, e.g. can't be solved just by
engineering planning upfront and needs user input, I like Google's Design
Sprints which uncover the critical features using a 1 week process.

With my experience leading engineering teams and growing startups, I agree
completely that documentation is the #1 blocker in scaling engineering teams,
all else being equal. If the design, planning, and execution process are well-
documented then engineers can be onboarded as soon as they are hired without
slowing down the team too much.

~~~
derefr
To me, a design document is something you write for yourself, once you’ve
decided on the design; whereas an RFC is something you write for others, and
others write for you, to _communicate_ a potential design when a final design
hasn’t yet been “selected.”

It’s effectively a way of brainstorming that doesn’t get quashed by “that’ll
never work” half-way through the 1000-ft abstract, because all the details are
already there on the page to prove it _will_ work. It’s just a question of
which of the proposed designs works _best_ (and then of attempting to maybe
incorporate some of the alternatives’ ideas into the winner, though not
necessarily.)

Or, in short: it’s a debate where everyone is doing an “argument by
constructive proof” for their POV.

The cryptographic-primitive standardization “competitions” for things like
SHA are RFC processes under another name.

------
faster
The most successful company I ever worked for (late 90's, the founder was a
billionaire for a few months after the IPO, 99.9% of HN readers use an
evolution of what we built every day) had a policy of rejecting any proposal
for a tech feature or significant improvement unless it was properly
documented.

We had a collection of templates for the types of documents we used. I could
create a feature proposal in an hour or two, and the template ensured that all
first-round questions were answered out of the box, saving many hours of staff
time.

The tech founders set this up and forced everyone else to use it and once we
saw the value, peer pressure kept it going. It worked incredibly well.

We weren't writing RFCs, but we had a doc process that respected (and saved)
everyone's time and attention. That helped us move fast together. It really
worked.

~~~
username3
_99.9% of HN readers use an evolution of what we built every day_

Electricity?

~~~
quickthrower2
Browser? 0.1% use curl maybe.

------
fogetti
I am sorry but I will be cynical here. Does the author know how many shits
people give about clarifying and planning things these days? Let alone
documenting. ZERO.

I have worked in many companies, from small to huge and in between, and
generally the process is the exact opposite. Small-minded managers and
insecure engineers want to build things as fast as possible, thinking this
will be their big chance to leave a mark on history. So they implement some
crap (it's not uncommon to even see different teams building the same thing at
the same time!!!) at the speed of light. Then 6 months later they keep
patching bugs every week! If you ever suggest that they slow down, or try to
find a working example before building, or at least organize the work by,
let's say, starting to explore the problem domain, they just bully you into
oblivion...

Case in point: Facebook. 'Move Fast and Break Things'

One thing I learned is that being successful in an IT engineering company in
the 21st century means letting the idiots do things their idiotic way while I
concentrate on my work, and if they propose stupid things, then so be it.
Politics are always stronger than engineering considerations.

Maybe Uber got it right from the beginning, I cannot confirm that claim. But I
am very skeptical that any established organization would change in this
direction proposed by the author.

~~~
wjossey
As someone who runs a business with a co-founder, we both are constantly
making trade-offs. How do we move quickly while remaining stable? How do we
increase our feature set without getting over our skis? How do we grow our
business quickly, but not take on the wrong customers?

Incentives can win, and should win, on both sides of the equation at different
moments in time. The most mature organizations can feel the pain from one
direction or another, and adjust accordingly.

~~~
anon49124
Shortcuts are great when they buy survival time. This can turn into a
self-defeating problem when there is extreme habituation to recurrently and
disproportionately pushing effort, cost and/or burdens onto others in the
future for the sake of the now, such that the shortcut wastes vastly more net
time and money than it could've ever saved. I've seen founders shoot
themselves in both feet and become unable to move fast, or anywhere, because
they knowingly stuck themselves with keeping 50 plates spinning while trying
to swim through quicksand... especially solitary founders, because they lack
the check and balance of a second or third founder to keep them honest with
reality and on track.

------
madrox
This isn't bad advice, but I would add to it. If you're a large organization
and building in such a way that large groups need to understand the details of
what you're doing, then you're doing it wrong.

_Ideally speaking_, implementations can be worked through with small groups.
Only the interfaces need to be exposed and documented. Unless you're doing
something beyond CRUD, drowning teams in unnecessary details often results in
distraction.

The arguments against this I usually hear are about things like security
audits, architecture reviews, and other internal processes designed to ensure
engineering quality. However, I'd encourage these teams to also think like
platforms that have machine interfaces. People make terrible APIs.

~~~
vageli
> This isn't bad advice, but I would add to it. If you're a large organization
> and building in such a way that large groups need to understand the details
> of what you're doing, then you're doing it wrong.

> Ideally speaking, implementations can be worked through with small groups.
> Only the interfaces need to be exposed and documented. Unless you're doing
> something beyond CRUD, drowning teams in unnecessary details often results in
> distraction.

This is assuming that attrition is not a thing. The set of engineers that are
on your team is likely smaller than the total number of engineers that will
ever work on your project. When people want to know six months from now why
you chose RabbitMQ over other tech at the time, having an RFC lets you point
to an artifact versus conjecture on past motivations.

~~~
toast0
Do the reasons for the choice six months ago even matter today?

Today you have six months of history of it being a good choice or not. And six
months of development assuming it was the choice, that may or may not apply to
something else.

And the choices available today have likely changed from six months ago, too.

(I'm a little concerned about your team that has no connection to decisions
from six months ago, but we can adjust the time frame and the rest still
applies)

~~~
seb1204
Of course they matter. Ideally the decision that was made 6 months ago under
the conditions/constraints of the time was the right one back then. Since then
circumstances might have changed in ways that would have led to a different
decision 6 months ago if you had known. Only if you have a record of some sort
(ideally with background information and not just the time of the decision)
will people be able to understand and re-evaluate the past.

Money and time have been invested since then, and the decision to keep the
course or to change again should consider the history of the decision.

------
dkhenry
One of the things I love about open source projects is they are almost forced
to do this and it works great. Having designs written down and iterated on
helps in planning and designing, and helps when it's time to make changes.
Being able to see why something was done and the arguments at the time can
really help prevent people from making the same mistakes twice as they make
revisions on a project.

------
dominotw
> Have a few, select people approve this plan before starting work.

How are these people selected?

This contradicts org structures where "architects" are supposed to give
"guidance" and enforce a "uniform vision".

~~~
qznc
Your use of quotes suggests some grudge against architects. I am an architect
and I consider my role as one to give guidance and enforce a uniform vision
(although "enforce" sounds like I would actually have significant power over
this chaos here). For example, last week I learned about a situation which had
been frustrating our developers for months. After a few hours I could
confidently give them guidance on what they should do in the short term, what
will happen long term, and how we should handle it.

I agree that architects sometimes solve too general problems instead of
addressing the immediate problem. I'm guilty of this myself. An architect has
the responsibility to think further and wider than a developer while also
being able to consider the details. I sometimes switch to the wrong gear.

Back to the actual topic: The relevant architect is just one of the select
people to approve the plan.

I would love it if I would hear about new plans in written form.
Unfortunately, it is usually in some meeting. This modus operandi forces me to
sit in lots of meetings which are rarely relevant to me. Even worse, sometimes
it is only through rumors which means lots of followup emails to clarify.

We actually do have a similar process to the one in the article, but without
step 4 "Send this planning document out to all engineers". I suggested
something like this a few times but most people complain that they get too
much mail anyways. I also mailed everybody a few times and it was helpful. So
I fully agree with the article, but so far I lack the power to implement
step 5 "Have everyone follow the above steps".

------
ahuth
One twist on this concept that we've used effectively at Mavenlink is that
there is no approval step.

RFCs function more as a communication and participation process before an
effort starts, and approval just hasn't felt like a necessary part of that.

Our org is around 50 engineers and has a very collaborative culture already,
and maybe approval would be necessary for other environments.

Another benefit from the RFC process in general is that it's very easy for
technical leadership and management (as well as everyone at the company,
really) to see all the technical efforts underway.

~~~
weliketocode
I guess it depends on what you consider approval. Are all comments being
addressed before the RFC is worked on, and are all RFCs receiving regular
comments?

In a small team with decent communication, RFCs can be prioritized somewhat
easily. But they should still be tracked so that you can see which RFCs are
being written, which are currently receiving/addressing comments, and which
are being implemented by the team.

~~~
ahuth
Great question!

There's no enforcement mechanism, but I can't think of any examples of
comments not being addressed before being worked on. Also, so far all RFCs
receive comments (between 5 and 50).

Tracking RFCs is a good point, and we do it by putting them in a git repo. The
document itself is markdown, and comments happen on a pull request (which also
helps with notifying the entire team).

So RFCs with an "open" PR are the ones currently soliciting feedback.
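
To make the tracking idea concrete, here is a minimal sketch of how RFC status could be derived from pull request state. Everything in it is hypothetical: the `Rfc` and `PrState` names, the file paths, and in particular the mapping of "merged" to "being implemented" and "closed" to "withdrawn" are assumptions for illustration, not a description of any company's actual setup.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class PrState(Enum):
    """Hypothetical pull request states for an RFC document."""
    OPEN = "open"
    MERGED = "merged"
    CLOSED = "closed"


@dataclass
class Rfc:
    path: str                           # hypothetical markdown file in the RFC repo
    pr_state: Optional[PrState] = None  # None: no pull request opened yet
    comment_count: int = 0


def rfc_status(rfc: Rfc) -> str:
    """Map PR state to a tracking status (the mapping itself is an assumption)."""
    if rfc.pr_state is None:
        return "being written"
    if rfc.pr_state is PrState.OPEN:
        return "soliciting feedback"
    if rfc.pr_state is PrState.MERGED:
        return "being implemented"
    return "withdrawn"


def group_by_status(rfcs: List[Rfc]) -> Dict[str, List[str]]:
    """Group RFC paths by their derived status."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rfc in rfcs:
        groups[rfc_status(rfc)].append(rfc.path)
    return dict(groups)


if __name__ == "__main__":
    rfcs = [
        Rfc("rfcs/0001-example-a.md", PrState.MERGED, comment_count=12),
        Rfc("rfcs/0002-example-b.md", PrState.OPEN, comment_count=5),
        Rfc("rfcs/0003-example-c.md"),
    ]
    for status, paths in group_by_status(rfcs).items():
        print(f"{status}: {paths}")
```

In practice the PR states would come from whatever hosting service holds the repo; the point is only that the three stages mentioned upthread (being written, receiving comments, being implemented) fall out of data the repo already has.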

------
karmakaze
On the previous Platform Team I worked in, we called these Tech Docs and used
a template for anything that was moderate-sized, required new software/service
selection, or was judged to be complex.

In every case, we had one person responsible for putting together the doc, but
the team or a subset of the team participated in brainstorming, whiteboarding,
and submitting additional input prior to the first draft. The draft was shared
and commented on/amended, with a target finalize date by which time it had
been signed off by identified key people as well as others who reviewed it.

------
seb1204
I'm curious, does anyone know of documented cases where RFCs have been used in
non-software related engineering environments? E.g. manufacturing, equipment
design, chemical process design etc.

~~~
cjalmeida
Yes, but they're called ISO, ANSI, DIN, etc. standards

~~~
seb1204
I'm aware that these standards exist and are being used. My question was more
about the engineering process that uses them. E.g. a process design that is
developed into flow sheets, then P&IDs, until it is built. Development of PFDs
and P&IDs is covered in international standards and procedures. But the
underlying knowledge of what is designed and how is part of the process
knowledge the engineering company has. My question was related to that
knowledge and whether there are examples of RFCs being used out there. The RFC
or equivalent would then discuss the reasons behind e.g. the selection of a
specific pressure vessel head for a certain application. While there are many
head options that are code or standard compliant, the engineering company has
reasons to choose one. The documentation of these reasons and the history
behind them is what interests me.

------
the_arun
What are the collaborative steps followed by other big companies to move fast
- concept to seeing something in production?

~~~
gregdoesit
OP of the post here. I’ve been talking to a few people at other large
companies and here’s the information I have. Most of this is anecdotal so
treat it as such. Appreciate input or corrections from people working at the
companies.

- Facebook is the most lightweight on docs. Code is/was still king there and
even planning docs might be written after the fact. The downsides I’ve heard
are tech/architecture debt building up fast and lots of throwaway stuff built.

- Amazon is quite rigid and requires a concise planning doc. Depending on the
org you work in, there might be a few levels of more formal approvals required.

- Google has a process similar to that described in this post, with planning
docs being circulated. Due to the large size of the company, docs are routed
to specific committees within orgs who give feedback on them.

- For smaller companies it will very much vary. Interestingly, some do follow
something similar; apparently CockroachDB has a process close to this
one:
[https://twitter.com/vivekmenezes/status/1047827698956079104?...](https://twitter.com/vivekmenezes/status/1047827698956079104?s=21)

Note that the process I described works well when you have a clear idea of
what you are building and have few dependencies. For prototyping or for
large/complex projects, planning can get way too slow. That’s when a “war
room” with a small team building a prototype, skipping all the docs part, will
work a lot faster. All bigger companies I know use this when it is a better fit.

------
tnolet
It’s fascinating to see people reinvent some form of ITIL and genuinely think
they stumbled on some unique or new way of doing things.

~~~
lvh
Do you feel like the process described in this post is a facsimile of ITIL? Do
you feel like ITIL is generally associated with repeated shipped product?

~~~
tnolet
I can see my tone being snarky. Sorry for that. But yes, I can draw a ton of
parallels between how the change management and RFC process (request for
change in this case) is used by more traditional companies and the OP’s post.
ITIL is no holy grail, far from it, but some things invented in enterprise IT
in the 1980’s have a tendency to resurface in some guise or form years later.

~~~
quickthrower2
Just googled ITIL, it's an interesting word in Indonesia

------
Animats
They just re-invented the waterfall method.

