Ask HN: Learning Distributed Systems as a Junior Engineer

mapme · on April 30, 2021

The easiest path is to read Designing Data-Intensive Applications by Martin Kleppmann. It builds from the ground up, and by the end (or after a second read) you will have a deep understanding. The book hits on all of the tech you mentioned except Kube. For that, read one of the original Google papers on cluster management systems, e.g. Omega.

ocdbg · on April 30, 2021

Martin Kleppmann has also uploaded distributed systems course lectures on his channel

https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_H...

ingvul · on May 1, 2021

While I recommend reading DDIA, I think buzzcut_diet may end up disappointed. Reasons:

- it takes a while to read DDIA. Probably around 6 months of focused reading. Perhaps more

- one can learn a really good chunk of theoretical stuff... but probably not applicable to day to day work

- zero practical experience will be gained regarding Kubernetes, Spark, Kafka, EMR, Redis

So, I would recommend a more practical approach:

- start reading the documentation of K8s, Kafka, Spark, etc. right away. Choose one and go for it. I would recommend Kafka since its documentation is well written

- while reading the documentation of the tooling above, one will inevitably stumble upon theoretical stuff that is not explained in detail: that's exactly when you pick up DDIA (or similar books), find the topic in the index, and read it.

bwh2 · on April 30, 2021

Designing Data Intensive Applications was surprisingly detailed in terms of data storage.

I also enjoyed Release It! by Michael Nygard to learn about making distributed systems more resilient.

jitl · on April 30, 2021

I want to hire Martin Kleppmann!!!! URGH!!!

otras · on April 30, 2021

MIT's 6.824: Distributed Systems (taught by Robert Morris, of both Morris worm and Viaweb/Y Combinator fame) is completely open and available online. It includes video lectures, notes, readings, and programming assignments from as recently as Spring 2021 or 2020 (including half of the lectures, recorded from home as the pandemic struck). The assignments even include auto-graded testing scripts, so you can verify your solutions.

https://pdos.csail.mit.edu/6.824/

alexpetralia · on April 30, 2021

I tell everyone, just read this!

https://github.com/donnemartin/system-design-primer

rramadass · on April 30, 2021

Seconded! Very good starting point.

immnn · on April 30, 2021

Actually, you could go for Erlang programming. Erlang has built-in distribution: nodes running on multiple hosts work in parallel and exchange messages.

http://erlang.org/doc/reference_manual/distributed.html

Then, once you understand the concepts, you can dive into its internals. Not by studying Erlang's compiler, but by trying to solve distributed computing problems yourself. For this, you should first learn about RPC, then take a deeper look at sockets. You should definitely go with C to understand these technologies. High-level languages abstract RPC and sockets away too much.
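
As a rough illustration of what an RPC layer adds on top of a raw socket, here is a minimal sketch. It is written in Python rather than C for brevity, so it hides some of the detail the comment above recommends learning. The wire format (a 4-byte length prefix followed by JSON) and the names `send_msg`, `recv_msg`, `serve_one`, `call`, and `demo` are assumptions made for this example, not any real framework's protocol.

```python
import json
import socket
import struct
import threading


def send_msg(sock, obj):
    """Send one JSON message, prefixed with its length as a 4-byte big-endian int."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may return fewer than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def serve_one(sock, handlers):
    """Server side: handle a single request by looking up the named procedure."""
    request = recv_msg(sock)
    func = handlers.get(request["method"])
    if func is None:
        send_msg(sock, {"error": "unknown method"})
    else:
        send_msg(sock, {"result": func(*request["params"])})


def call(sock, method, *params):
    """Client stub: reads like a local call, but the work happens on the peer."""
    send_msg(sock, {"method": method, "params": list(params)})
    reply = recv_msg(sock)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


def demo():
    # socketpair() gives two connected sockets without needing a port.
    client, server = socket.socketpair()
    handlers = {"add": lambda a, b: a + b}
    worker = threading.Thread(target=serve_one, args=(server, handlers))
    worker.start()
    try:
        return call(client, "add", 2, 3)
    finally:
        worker.join()
        client.close()
        server.close()
```

Calling `demo()` sends an `add` request across the socket pair and returns the server's answer. The sketch ignores timeouts, retries, and partial failures, which is where most of the real difficulty lies.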

pestatije · on May 1, 2021

No F way I need C to understand Kafka

nesarkvechnep · on April 30, 2021

^ This!

Buttons840 · on April 30, 2021

Well, for starters, if you don't know what Redis is, enjoy the following link. It's fun, and you'll leave knowing what Redis is at least.

https://try.redis.io/

ipnon · on April 30, 2021

The best way to learn is to have skin in the game. Doing something yourself will force you to do what actually works. So it seems your current professional employment is excellent in that regard.

Formal study seems to work best after real experience. I read Martin Kleppmann's Designing Data-Intensive Applications based on its inclusion in teachyourselfcs.com.[0] I did not find it useful because I had nothing to apply it to once I finished. However I don't think this will apply to you as it seems you already have some problems in mind to consider.

[0] https://teachyourselfcs.com/#distributed-systems

juancn · on April 30, 2021

Besides reading a lot, I always found it useful to build toy parts of distributed systems. At least the building blocks: RPC frameworks, locking and transactions, leader election algorithms, consensus algorithms, etc.

Even if incomplete and many times broken, the effort in attempting at least a PoC or a toy implementation has helped me immensely in understanding the challenges and limitations of each.

It’s also important to play with real, production quality software, and stress it and understand its limitations under load or failure. Distributed systems fail in surprising ways.
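
As one example of the kind of toy worth building, here is a minimal sketch of a simplified bully-style leader election, simulated inside a single process. The `Node` class and `run_election` function are hypothetical names for this illustration. The sketch assumes messages are never lost or delayed and that crashes are known instantly, which is exactly what a real network does not guarantee.

```python
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.alive = True
        self.leader = None


def run_election(nodes, starter_id):
    """Simplified bully election: the highest-id live node becomes leader.

    nodes maps node_id -> Node. Returns the elected leader's id.
    """
    starter = nodes[starter_id]
    if not starter.alive:
        raise ValueError("a crashed node cannot start an election")
    candidate = starter
    while True:
        # "Send" an election message to every node with a higher id.
        higher_alive = [n for n in nodes.values()
                        if n.node_id > candidate.node_id and n.alive]
        if not higher_alive:
            break  # nobody outranks the candidate, so it wins
        # A live higher node answers and takes over the election.
        candidate = min(higher_alive, key=lambda n: n.node_id)
    # The winner announces itself to every live node.
    for n in nodes.values():
        if n.alive:
            n.leader = candidate.node_id
    return candidate.node_id


nodes = {i: Node(i) for i in range(1, 6)}
nodes[5].alive = False          # the old leader has crashed
winner = run_election(nodes, 2)  # node 2 notices and starts an election
```

Here `winner` is 4, the highest id still alive. Replacing the in-memory checks with real messages and timeouts is where the surprising failure modes start to appear.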

austincheney · on April 30, 2021

None of the technologies you mentioned speak to distribution. They speak to automation in various forms.

Distribution means different things depending on who you ask, but it largely means being separated across one or more networks. If you really want to understand distribution, model for third-order consequences: A gives a thing to B, who gives it to C. What does that mean to A if A and C share no relationship or connection? The answer to that question differs based upon the thing that is shared.

For a more practical example, I wrote this point-to-point file distribution application. The point-to-point nature of the application forces privacy, in that the endpoints know and trust each other. The application is also built around a Windows-like GUI, so user B can easily drag and drop a file from user A to user C, which could violate A's privacy or compromise C.

andrewf · on April 30, 2021

Exposure to production systems is a new advantage you have. You can learn a lot from the failures and problems.

If your workplace has written postmortems, read them. Ask more senior engineers about the times things went wrong. Figure out how to politely spectate, or productively involve yourself, when emergencies/incidents happen.

Try to understand the pros/cons of the choices your workplace has made (people will certainly be whining about the downsides!). How could different choices have avoided the cons? What different downsides would those choices have introduced? (Be careful not to come off as critical. It's common to see, with hindsight, that different choices would have worked out better, and sometimes anti-social to point it out).

rahimnathwani · on April 30, 2021

If your company has design documents that describe not only how systems have been designed, but also _why_ they've been designed that way, then you can:

1. Read the problem statement (or context, or motivation) in a particular system's design doc.

2. Flip through DDIA (the book others mentioned) to find relevant material, and try to decide how you would solve the problem.

3. Go back to the design doc and see what your colleagues decided.

By trying to solve problems yourself, and seeing what colleagues did in the same situation, you will learn to improve your thinking.

The good thing about doing this within your existing environment is that you can ask colleagues when you don't understand "why didn't we just use X for this system?".

Olreich · on May 1, 2021

Good advice for the system design level of abstraction in the other comments. But do try to remember, no matter what, everything is just a computer running some code and shoving some data around to various other computers or devices. We lose sight of that a lot and consider the systems we abstract as somehow different from a computer. Avoid it as much as you can.

mnkmnk · on April 30, 2021

The best way imo is to read the design docs and code of the projects you like.

financialize · on April 30, 2021

All those technologies will be gone in less than a decade. Learn fundamentals, don't worry about the current trends. Here's a good starting point: https://columbia.github.io/ds1-class/. I've also found "Operating Systems: Three Easy Pieces" to be surprisingly relevant to system design: https://pages.cs.wisc.edu/~remzi/OSTEP/?source=techstories.o....