I think it addresses a big problem with important tech decisions: how to weigh engineering value and focus energy on the technical merits, not on political buy-in.
I work within a decent size government org where leadership is almost exclusively non-tech (makes sense because we’re not an org with a tech mission), and almost all tech management is program mgmt, business analysis, strategy. And the building is almost exclusively contractors.
This means a technical decision is hard to evaluate, because the tech manager doesn’t know whether MySQL or Postgres is better, and contractors have incentives to avoid review by other contractors and to make it “good enough” for contract acceptance.
This means if a developer picks MySQL, and the project manager is happy because it’s projected to be delivered on time, then there are two big forces that don’t want comments on the design.
We just started to try a request for comment process, but the extra effort to review is challenging. I think that on the surface it’s because letting large groups see your design before you start introduces organizational risk.
And then a bit deeper is that it requires a greater level of technical depth in decision makers.
You can think of it (simplified) as 4 steps:
1) figuring out and deciding which technologies to use for which domains and problems (these tend to be called capabilities)
2) getting rid of the ones you don't want to use
3a) enforcing use of existing, approved capabilities for new projects
3b) building out new capabilities
1) Tends to have the problem you pointed out: "it requires a greater level of technical depth in decision makers". Something that may work is to have some senior engineers in-house and embed them in the outsourced projects doing part of the real work.
2) This can be a multi-year program that needs to be separately funded. Think of it as enterprise-level technical debt.
3a) This can be the easiest part: your architecture should be part of the RFP process, with a well-defined escape hatch to 3b)
3b) Building out new capabilities needs to be i) funded separately but best ii) done as part of a real project. Otherwise you'll i) have crossed incentives with the projects as you pointed out, or ii) architecture astronauts.
HTH. There's plenty of literature on enterprise architecture, but there's no silver bullet, it's just hard work.
Interestingly I work in a world where enterprise architecture means something completely different. PM me for more details.
What is the risk, to be specific? Just curious.
Maybe the easiest to deflect is "You're developing in Ada, and we prefer Lisp, etc. etc."
When a problem is not as well understood, e.g. can't be solved just by engineering planning upfront and needs user input, I like Google's Design Sprints which uncover the critical features using a 1 week process.
With my experience leading engineering teams and growing startups, I agree completely that documentation is the #1 blocker in scaling engineering teams, all else being equal. If the design, planning, and execution process are well-documented then engineers can be onboarded as soon as they are hired without slowing down the team too much.
It’s effectively a way of brainstorming that doesn’t get quashed by “that’ll never work” half-way through the 1000-ft abstract, because all the details are already there on the page to prove it will work. It’s just a question of which of the proposed designs works best (and then of attempting to maybe incorporate some of the alternatives’ ideas into the winner, though not necessarily.)
Or, in short: it’s a debate where everyone is doing an “argument by constructive proof” for their POV.
The cryptographic-primitive standardization “competitions” for things like SHA are RFC processes under another name.
Design docs get a bad rap for “big design up front” because, I think, they get fetishized as risk management rather than accomplishing their functional task.
For this post, I think the best part is being able to gather good review and edits, and track changes as decisions are made and why.
We had a collection of templates for the types of documents we used. I could create a feature proposal in an hour or two, and the template ensured that all first-round questions were answered out of the box, saving many hours of staff time.
The tech founders set this up and forced everyone else to use it and once we saw the value, peer pressure kept it going. It worked incredibly well.
We weren't writing RFCs, but we had a doc process that respected (and saved) everyone's time and attention. That helped us move fast together. It really worked.
I've worked at many companies, from small to huge and in between, and generally the process is the exact opposite. Small-minded managers and insecure engineers want to build things as fast as possible, thinking this will be their big chance to leave a mark in history. So they implement some crap (it's not uncommon to even see different teams building the same thing at the same time!!!) at the speed of light. Then 6 months later they're still patching bugs every week! If you ever suggest that they slow down, try to find a working example before building, or at least organize the work by, let's say, first exploring the problem domain, they just bully you into oblivion...
Case in point: Facebook. 'Move Fast and Break Things'
One thing I learned is that being successful in an IT engineering company in the 21st century means letting the idiots do things their idiotic way and concentrating on my own work, and if they propose stupid things, then let it be. Politics is always stronger than engineering considerations.
Maybe Uber got it right from the beginning, I cannot confirm that claim. But I am very skeptical that any established organization would change in this direction proposed by the author.
Incentives can win, and should win, on both sides of the equation at different moments in time. The most mature organizations can feel the pain from one direction or another, and adjust accordingly.
Technical debt is like the national debt: interest gets paid along the way, but sooner or later there will be a reckoning.
There is often too much ego-investment, NIH and sunk costs bias for people to diverge much from whichever direction they were moving, even if it's not a good direction to go.
There is no perfect; in fact, the perfect is the enemy of the shipping, and there's an infinitely massive continuum from fragile to awesome.
Ideally speaking, implementations can be worked through with small groups. Only the interfaces need to be exposed and documented. Unless you're doing something beyond CRUD, drowning teams in unnecessary details often results in distraction.
The arguments against this I usually hear are about things like security audits, architecture reviews, and other internal processes designed to ensure engineering quality. However, I'd encourage these teams to also think like platforms that have machine interfaces. People make terrible APIs.
Unfortunately, it is too easy to actually talk of a shared infrastructure design. As soon as you have a shared message queue/repository or some other piece of infrastructure, you no longer have a solid contract between teams where each can operate independently of the other.
But I've seen large orgs that at least strive for this ideal vs orgs that have thrown their hands in the air and given up. One is definitely a lot easier to operate in.
> Ideally speaking, implementations can be worked through with small groups. Only the interfaces need to be exposed and documented. Unless you're doing something beyond CRUD, drowning teams in unnecessary details often results in distraction.
This is assuming that attrition is not a thing. The set of engineers that are on your team is likely smaller than the total number of engineers that will ever work on your project. When people want to know six months from now why you chose RabbitMQ over other tech at the time, having an RFC lets you point to an artifact versus conjecture on past motivations.
Today you have six months of history of it being a good choice or not. And six months of development assuming it was the choice, that may or may not apply to something else.
And the choices available today have likely changed from six months ago, too.
(I'm a little concerned about your team that has no connection to decisions from six months ago, but we can adjust the time frame and the rest still applies)
Money and time have been invested since then, and the decision to stay the course or to change again should consider the history of the original decision.
RFCs function more as a communication and participation process before an effort starts, and approval just hasn't felt like a necessary part of that.
Our org is around 50 engineers and has a very collaborative culture already, and maybe approval would be necessary for other environments.
Another benefit from the RFC process in general is that it's very easy for technical leadership and management (as well as everyone at the company, really) to see all the technical efforts underway.
In a small team with decent communication, RFCs can be prioritized somewhat easily. But they should still be tracked so that you can see which RFCs are being written, which are currently receiving/addressing comments, and which are being implemented by the team.
There's no enforcement mechanism, but I can't think of any examples of comments not being addressed before being worked on. Also, so far all RFCs receive comments (between 5 and 50).
Tracking RFCs is a good point, and we do it by putting them in a git repo. The document itself is markdown, and comments happen on a pull request (which also helps with notifying the entire team).
So RFCs with an "open" PR are the ones currently soliciting feedback.
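As a rough illustration of that tracking idea, here is a minimal Python sketch. Only the "open PR means soliciting feedback" rule comes from the description above; the `PullRequest` record, the `rfc_status` helper, and the mapping of "no PR" to draft and "merged PR" to accepted are assumptions made up for the example, not how any particular team or git host does it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RfcStatus(Enum):
    DRAFT = "being written"              # assumption: no PR exists yet
    IN_REVIEW = "soliciting feedback"    # from the thread: PR is open
    ACCEPTED = "accepted"                # assumption: PR was merged


@dataclass
class PullRequest:
    """Hypothetical record; a real one would come from your git host."""
    rfc_path: str   # markdown file the PR adds, e.g. "rfcs/0007-queue.md"
    state: str      # "open" or "merged" (hypothetical values)


def rfc_status(rfc_path: str, pull_requests: List[PullRequest]) -> RfcStatus:
    """Derive an RFC's status from the state of the PR that carries it."""
    for pr in pull_requests:
        if pr.rfc_path != rfc_path:
            continue
        if pr.state == "open":
            return RfcStatus.IN_REVIEW
        if pr.state == "merged":
            return RfcStatus.ACCEPTED
    # No matching PR: treat the RFC as still being drafted.
    return RfcStatus.DRAFT


prs = [
    PullRequest("rfcs/0007-queue.md", "open"),
    PullRequest("rfcs/0006-auth.md", "merged"),
]
in_review = [pr.rfc_path for pr in prs
             if rfc_status(pr.rfc_path, prs) is RfcStatus.IN_REVIEW]
```

With the sample data, `in_review` holds only the RFC whose PR is still open, which is the list you'd want to glance at to see what currently needs comments.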
In every case, we had one person responsible for putting together the doc, but the team or a subset of the team participated in brainstorming, whiteboarding, and submitting additional input prior to the first draft. The draft was shared and commented on/amended, with a target finalize date by which time it had been signed off by identified key people as well as others who reviewed it.
How are these people selected?
This contradicts org structures where "architects" are supposed to give "guidance" and enforce a "uniform vision".
I agree that architects sometimes solve too general problems instead of addressing the immediate problem. I'm guilty of this myself. An architect has the responsibility to think further and wider than a developer while also being able to consider the details. I sometimes switch to the wrong gear.
Back to the actual topic: The relevant architect is just one of the select people to approve the plan.
I would love to hear about new plans in written form. Unfortunately, it is usually in some meeting. This modus operandi forces me to sit in lots of meetings which are rarely relevant to me. Even worse, sometimes it is only through rumors, which means lots of followup emails to clarify.
We actually do have a similar process to the one in the article, but without step 4 "Send this planning document out to all engineers". I suggested something like this a few times but most people complain that they get too much mail anyway. I also did mail everybody a few times and it was helpful. So I fully agree with the article, but so far I lack the power to implement step 5 "Have everyone follow the above steps".
- Facebook is the most lightweight on docs. Code is/was still king there and even planning docs might be written after the fact. The downsides I’ve heard are tech/architecture debt building up fast and lots of throwaway stuff being built.
- Amazon is quite rigid and requires a concise planning doc. Depending on the org you work in, there might be a few levels of more formal approvals required.
- Google has a process similar to that described in this post, with planning docs being circulated. Due to the large size of the company, docs are routed to specific committees within orgs who give feedback on them.
- For smaller companies it will very much vary. Interesting that some do follow something similar; apparently CockroachDB has a process close to this one: https://twitter.com/vivekmenezes/status/1047827698956079104?...
Note that the process I described works well when you have a clear idea of what you are building and have few dependencies. For prototyping or for large/complex projects, planning can get way too slow. That’s when a “war room” with a small team building a prototype, skipping all the docs part, will work a lot faster. All the bigger companies I know use this when it's a better fit.
I worked at a big, regulated company that had a very strict change control process. Too many RFCs for implementations a week out were met with a response like "what! why did you buy/build a system to do X? We already have three systems that do that!!", but it was too late by then to rethink X. The problem was that it's difficult to deny a plane in the air permission to land: it's coming down eventually, like it or not. So we implemented an architecture review board that was "permission to take off": hopefully before sinking a lot of money and time into something, you run the idea past some connected folks. It was default approve: someone could raise an objection and ask for more information, but if there were collective shrugs, you were good to go.
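To make the "default approve" rule concrete, here is a small Python sketch. The `Proposal` class, the `may_take_off` function, and especially the fixed review deadline are assumptions for illustration; the board described above may not have used an explicit deadline at all.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Proposal:
    """Hypothetical 'permission to take off' request."""
    title: str
    review_deadline: date                      # assumption: a fixed review window
    objections: List[str] = field(default_factory=list)


def may_take_off(proposal: Proposal, today: date) -> bool:
    """Default approve: silence counts as a yes once the window closes."""
    if proposal.objections:
        # Someone asked for more information; the proposal is held until resolved.
        return False
    return today >= proposal.review_deadline


idea = Proposal("Buy a system to do X", review_deadline=date(2024, 3, 8))
shrugged = may_take_off(idea, today=date(2024, 3, 8))

idea.objections.append("We already have three systems that do that")
held = may_take_off(idea, today=date(2024, 3, 8))
```

The point of the design is that the burden sits with the objector: nobody has to actively sign off, so a proposal nobody cares about doesn't stall.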