
The System Design Primer - donnemartin
https://github.com/donnemartin/system-design
======
contingencies
High level observations:

1\. Business level constraints (time, human, fiscal and other resources,
stakeholders) trump technical constraints _every time_. Identifying these
should be _step zero_ in any design process.

2\. A business-level risk model assists with appropriate design with respect
to both security and availability and should ultimately drive component
selection.

3\. Content seems very much focused on public IP services provided through
multiple networked subsystems. While this is a very popular category of modern
systems design, not all systems fall into this category (eg. embedded), and
even if they do, many complex systems are internal, and public-facing
interfaces are partly shielded/outsourced (Cloudflare, AWS, etc.).

4\. Existing depth in areas such as database replication could perhaps be
grouped in a generic fashion as examples of fault tolerance and failure /
issue-mitigation strategies.

5\. Asynchronicity and communication could be grouped together under
architectural paradigms (eg. state, consistency and recovery models), since
they tend to define at least subsystem-local architectural paradigms. (Ever
tried restoring a huge RDBMS backup or performing a backup between major RDBMS
versions where downtime is a concern? What about debugging interactions
between numerous message queues, or disparate views of a shared database (eg.
blockchain, split-capable orchestration systems) with supposed eventual
consistency?)

6\. Legal and regulatory considerations are often very powerful architectural
concerns. In a multinational system with components owned by disparate legal
entities in different jurisdictions, potential regulatory ingress (eg.
halt/seize/shut down national operations) can become a significant
consideration.

7\. The new/greenfield systems design perspective is a valid and common one.
However, equally commonly, established organizations' subsystems are
(re-)designed/upgraded, and in this case system interfaces may be internal or
otherwise highly distinct from public service design. Often these sorts of
projects are harder because of downtime concerns, migration complexity and
organizational/technical inertia.

~~~
donnemartin
Thanks for the feedback! I'll see if I can work in some of these suggestions.
Pull requests are welcome :)

------
phamilton
I very much want to hear the words "failure isolation" during a systems design
interview. Usually as the answer to "Why did you break that functionality out
into a separate service?". The answer should involve "independent scaling" and
"failure isolation".
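
A minimal sketch of what "failure isolation" can look like from the caller's
side, assuming a hypothetical optional recommendations service sitting behind
a core page. The names `fetch_recommendations` and `render_page` are
illustrative only and do not come from the primer:

```python
from typing import Callable, List


def fetch_recommendations(user_id: int) -> List[str]:
    """Hypothetical call to a separate service; here it always fails."""
    raise ConnectionError("recommendations service is down")


def render_page(
    user_id: int,
    recommend: Callable[[int], List[str]] = fetch_recommendations,
) -> dict:
    """Build the page; a failure in the optional service degrades only that part."""
    page = {"user_id": user_id, "core": "order history"}
    try:
        page["recommendations"] = recommend(user_id)
    except Exception:
        # Isolate the failure: the core content still renders,
        # only the optional extra is dropped.
        page["recommendations"] = []
    return page


page = render_page(42)
```

Because the dependency is behind its own boundary, it can also be scaled or
redeployed without touching the core path, which is the "independent scaling"
half of the answer.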

~~~
gravyboat
I tried to explain this once in an interview and the interviewer was not
happy. They didn't like the idea of deploying a dozen different services. They
also didn't like my ideas around keeping the services relatively simple and
scaling out as needed. I can say I'm glad I didn't take that job.

~~~
riffraff
to be fair, it is quite possible an app does not need a dozen different
services, and using them would just introduce unnecessary complexity.

E.g. slashdot used to run with, IIRC, only 4 roles: memcached, reverse proxy,
apache, and mysql.

Your interviewer might have just failed in explaining the problem constraints
properly.

~~~
javajosh
_> Your interviewer might have just failed in explaining the problem
constraints properly._

The simpler and more common explanation is that the interviewer was looking to
have his biases reflected by the interviewee. In my experience it is difficult
for a certain kind of mind to distinguish between "This person is
stupid/crazy/wrong for the job" and "This person knows something I don't".

------
alkonaut
Does "system" here mean "system of internet services"? I'm designing large
systems and hope to learn more - but none of my systems have servers.
Anywhere.

~~~
dwringer
Indeed, I think the headline is far too general. I came here hoping to find
something from the perspective of systems theory or cybernetics.

------
arslanahmad
I recently gave interviews and did my preparation from
[https://www.educative.io/collection/5668639101419520/5649050...](https://www.educative.io/collection/5668639101419520/5649050225344512).
It was pretty useful.

------
reacharavindh
Haven't read the entire guide yet. But I hope it has a few lines somewhere
about over-engineering a solution. Yes, fault tolerance, asynchronicity, and
individual scalability are virtues you want, but not for a super simple
problem that needs functional work. I've been in so many discussions with
people who talk about all these virtues and spend too little time on
making that core function do what it is supposed to do.

------
movedx
Brilliant work. I may convert this into an MkDocs-formatted project using the
Material theme. I've done the same thing for the Open Guide to AWS, which I'm
still working on. It vastly improves the readability and accessibility of the
information.

~~~
Dangeranger
Can you please post a link to your Open Guide to AWS if possible?

~~~
movedx
Most of the sections are in DRAFT mode as I move them away from just being a
list of bullet points to something more readable and formatted.

[http://opsfactory.com.au/wisdom-aws-guide/](http://opsfactory.com.au/wisdom-aws-guide/)

------
0x54MUR41
That's amazing. Thank you for creating this. It's very useful for preparing
for a system design interview.

I think you missed "Show HN" on your post.

------
javajosh
I wonder what this would look like for Erlang system designers.

~~~
cholantesh
I am totally ignorant of Erlang (though it is in my top 5 of languages to
learn); why do you think it would be different?

~~~
javajosh
I am only learning Erlang now (through the futurelearn course[1], posted here
about a month ago now). One of the reasons it interests me is that it was
designed from the start to support high-availability, concurrent, distributed
processes. It's a functional, dynamic language, meaning that you can reprogram
a system at runtime if you want (see: gen_server).

Like I said, I'm only starting and I don't know how real Erlang systems are
built. However, I suspect that they tend to eschew the orthodoxy of treating a
relational database as a single source of truth, with stateless app servers
(these two features are the core of all the systems in the OP's thing) and
embrace distributed, redundant statefulness. If this can be done (without
becoming impossible to reason about) I suspect it represents an optimal server
system, in terms of resource usage, availability, and probably even in
performance in a world where most organizations' datasets can fit entirely in
RAM, from when they are a twinkle in someone's eye to their eventual
dissolution.

I should stress that "questioning orthodoxy" is something of a hobby, which
probably biases me.

[1] [https://www.futurelearn.com/courses/functional-programming-e...](https://www.futurelearn.com/courses/functional-programming-erlang)

[2] [http://learnyousomeerlang.com](http://learnyousomeerlang.com)
keeps coming up as a good learning resource.

------
crudbug
Great resource.

Which tool is used for creating diagrams?

~~~
donnemartin
Thanks! I'm using OmniGraffle for the diagrams.

------
pfista
This looks like a great guide, thanks! Makes me wonder how effective things
like Google's App Engine are in autoscaling your web apps. "Serverless" code
seems too good to be true.

------
abraae
Nice concept!

A missing area is identity management. Most likely this should be separated
from your system (e.g. don't have a table somewhere with username, password in
it).

In consumer-facing systems, OpenID Connect (better) is practiced by Google;
OAuth is used by most others.

In enterprise software, SAML is the common parlance.

That leads naturally to questions about API authorization (are API calls made
on behalf of system users? If not, start probing further).

Always enlightening to start asking questions about identity management very
early on in designing systems.
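
As a rough sketch of the "no username/password table" idea, assuming an
external OpenID Connect provider has already authenticated the user and the
ID token has already been verified elsewhere: the system then only needs to
key its own records on the standard `iss` (issuer) and `sub` (subject)
claims. `LocalUser` and `UserStore` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class LocalUser:
    issuer: str        # which identity provider vouched for the user
    subject: str       # the provider's stable identifier for the user
    display_name: str  # profile data only; no credentials are stored


class UserStore:
    """In-memory stand-in for a user table that holds no passwords."""

    def __init__(self) -> None:
        self._users: Dict[Tuple[str, str], LocalUser] = {}

    def get_or_create(self, claims: Dict[str, str]) -> LocalUser:
        # Assumption: `claims` comes from an already-verified ID token.
        key = (claims["iss"], claims["sub"])
        if key not in self._users:
            self._users[key] = LocalUser(
                issuer=claims["iss"],
                subject=claims["sub"],
                display_name=claims.get("name", ""),
            )
        return self._users[key]


store = UserStore()
claims = {"iss": "https://idp.example", "sub": "abc123", "name": "Ada"}
user = store.get_or_create(claims)
```

The same user presenting the same `iss`/`sub` pair again maps to the same
local record, and a credential leak from this store is not possible because
it never holds any.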

~~~
donnemartin
Good point, the security section could be beefed up some more. Pull requests
are welcome :)

------
chid
This looks awesome. Is there a collection of these around other fields?

------
oferzelig
Beautiful document. Clearly written and clearly drawn.

------
sna1l
Great idea, thanks for doing this!

