It's a bit like eating well and working out. We all know we should do it, but most people don't. Then, when you see somebody who does it and looks great, you ask them, "What's your secret?" But it's not a secret, it's just that most people don't do it, because it's hard :)
One story: when I was at FB, I happened to know that a team of size S conducted X experiments in 6 months (I can't disclose the number). As it happens, I have worked in similar-size teams at other companies, and there the number was ~X/20, and sometimes 0. It doesn't matter how good your software architecture is if you're trying out just 1 thing instead of 20 things... I call this velocity.
Another good story: many semi-successful companies end up in a place where there is an initial product/software which gets them a lot of growth, and then ~5 years into the startup, the bigger, more mature team decides that the old legacy code is holding them back, and they're going to REWRITE IT. Maybe in some fancy new language, or a fancy new architecture like micro-services. The estimate is 6-9 months to get to first light. But it will probably end up taking 3-5 years, because the legacy code had a lot of fine-tuning in it, many of the problems turn out to be hard to fix, moving production to a new thing is REALLY HARD, and all those fancy new technologies are actually far from perfect, plus the current team doesn't have a lot of experience with them.
Compare this to what Facebook did with its PHP codebase: at some point it became a bottleneck (the crappy language and the runtime speed), but there was never a from-the-ground-up rewrite. Instead they ended up writing several iterations of better runtimes, and since they were changing the runtime anyway, they "fixed up" the language (while keeping it mostly backward compatible). The new language is called Hack and the new runtime is called HHVM. The cool thing is, in all this time, there wasn't a rewrite, so they were able to keep shipping new features on Facebook, run A/B tests, and iterate on the product. Compare this to one of the companies I worked at, where they did a rewrite, and now customers have to choose between the old thing and the new thing; it's not transparent, because <software issues>. There's a book about Hack/HHVM, one of the Facebook guys wrote it, iirc the first chapter is about this whole story. See this blog post for links: http://bytepawn.com/hack-hhvm-second-system-effect.html
In general, the principles I've seen work really well:
- write good code, but don't do big rewrites
- continuous delivery: always be shipping to master and production in small increments (don't keep big git branches that aren't in production; in the end it will be scary to merge them and put them into production)
- continuous integration: have tests and run them on every commit (you probably need some testing hardliners to enforce this...)
- if you (your team) can't write a good monolith, you (your team) also can't write a good microservice architecture (MSA)
- invest heavily in linting and other automated ways to catch problems and enforce conventions when code is committed
- programming language doesn't matter that much, just pick one for each domain (e.g. web, mobile, etc.), stop thinking about it, and don't let people waste their time arguing over it; instead invest heavily in tooling that supports all the other aspects (like code reviews, perf, experimentation, etc.)
- one other person should review and okay the code before it goes into production
- aggressively remove obstacles to shipping stuff to production
- make it easy to run experiments, and run a lot of experiments
- don't hire people who just want to write code, or think their job stops there
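To make the "make it easy to run experiments" point a bit more concrete, here is a minimal sketch of deterministic A/B bucketing in Python. It is not how Facebook's tooling works (I can't speak to that); the function name `assign_variant`, the experiment name, and the variant labels are all made up for illustration. The idea is just that if assigning a user to a variant is a one-line call, people will run more experiments.

```python
import hashlib


def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Map a user to a variant of one experiment, deterministically.

    Hypothetical helper for illustration only. Hashing the experiment
    name together with the user id means the same user always gets the
    same variant within an experiment, while different experiments
    bucket users independently of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an integer, then pick a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    # "new_checkout_button" is an invented experiment name.
    for uid in ("user-1", "user-2", "user-3"):
        print(uid, assign_variant(uid, "new_checkout_button"))
```

No state has to be stored to remember who is in which group, which removes one more obstacle to starting an experiment. A real system would also need exposure logging and metrics, which this sketch leaves out.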
Exactly! FB has the most productivity per developer of any company I've seen, and this efficiency is a direct result of management systematically lowering barriers to writing code and investing in developer productivity tools. Other companies have elaborate rules and procedures and committees and style guides built around stopping code. Facebook encourages writing code.
(And before you say it: no, the codebase doesn't take a quality hit. Turns out that this ceremony around writing code is just unnecessary, contrary to popular opinion.)
Which part of the codebase? Facebook's back-end systems do a very impressive job given the scale involved. On the other hand, its front-end systems appear to be mediocre in most important respects. A business without Facebook's advantages that wrote a UI as slow and buggy as Facebook's main or advertiser UIs often are could be in serious trouble.
But I would say software architecture matters more for infrastructure like storage systems, operating systems, and programming languages than it does for products. I think a lot of the literature on software architecture is about those domains.
Those things are harder to do iteratively. And language choice matters more in those domains.
Programming is a huge field now, and a lot of choices are domain-specific, including how much you should care about architecture and programming languages.