Hacker News
Ask HN: Learning Distributed Systems as a Junior Engineer
61 points by buzzcut_diet on April 30, 2021 | 20 comments
Hi, I am currently a junior backend developer working on a lot of big data stuff, and the technologies we use, such as Kubernetes, Spark, Kafka, EMR, and Redis, are all very overwhelming.

I was wondering what is the best way to learn more about these distributed systems and form better mental models when dealing with them. Any resources or tips would be appreciated.

Thank you in advance! :)




The easiest path is to read Designing Data-Intensive Applications by Martin Kleppmann. It builds from the ground up, and by the end (or after a second read) you will have a deep understanding. The book touches on all of the tech you mentioned except Kube. For that, read one of Google's original papers on cluster management systems, e.g. Omega.


Martin Kleppmann has also uploaded his distributed systems course lectures to his channel:

https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_H...


While I recommend reading DDIA, I think buzzcut_diet may end up disappointed. Reasons:

- it takes a while to read DDIA: probably around 6 months of focused reading, perhaps more

- one can learn a really good chunk of theoretical material, but much of it is probably not applicable to day-to-day work

- zero practical experience will be gained regarding Kubernetes, Spark, Kafka, EMR, Redis

So, I would recommend a more practical approach:

- start reading the documentation of K8s, Kafka, Spark, etc. right away. Choose one and go for it. I would recommend Kafka, since its documentation is well written

- while reading the documentation of the tooling above, one will inevitably stumble upon theoretical concepts that are not explained in detail: that's exactly when you pick up DDIA (or a similar book), look the topic up in the index, and read about it.


Designing Data-Intensive Applications was surprisingly detailed in terms of data storage.

I also enjoyed Release It! by Michael Nygard to learn about making distributed systems more resilient.


I want to hire Martin Kleppmann!!!! URGH!!!


MIT's 6.824: Distributed Systems (taught by Robert Morris, of both Morris worm and Viaweb/Y Combinator fame) is completely open and available online. It includes video lectures, notes, readings, and programming assignments from as recently as Spring 2021 or 2020 (with half of the lectures recorded from home after the pandemic struck). The assignments even include auto-graded testing scripts, so you can verify your solutions.

https://pdos.csail.mit.edu/6.824/



Seconded! Very good starting point.


Actually, you could try Erlang programming. Erlang has a built-in distribution mechanism that aims to run on multiple hosts in parallel.

http://erlang.org/doc/reference_manual/distributed.html

Then, once you understand the concepts, you can dive into its internals: not by studying Erlang's compiler, but by trying to solve distributed computing problems yourself. For this, first learn about RPC, then take a deeper look at sockets. You should definitely use C to understand these technologies; high-level languages abstract RPC and sockets away far too much.
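To make the "learn RPC, then sockets" suggestion concrete, here is a minimal sketch of a single RPC call built directly on TCP sockets. It uses Python rather than the C recommended above; Python's socket module is a thin wrapper over the BSD sockets API, so the same bind/listen/accept/connect/send/recv steps stay visible. The length-prefixed JSON wire format, the PROCEDURES table, and the "add" procedure are hypothetical choices for illustration, not the protocol of any real RPC framework.

```python
import json
import socket
import struct
import threading

# Hypothetical wire format for this sketch: a 4-byte big-endian length
# prefix followed by a UTF-8 JSON body. Real RPC frameworks define their own.

def send_msg(sock: socket.socket, obj) -> None:
    body = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(body)) + body)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    # TCP is a byte stream: one recv() may return fewer bytes than requested.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def recv_msg(sock: socket.socket):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))

# Procedures the server exposes; names are illustrative only.
PROCEDURES = {"add": lambda a, b: a + b}

def serve_one(listener: socket.socket) -> None:
    # Server side: accept one connection, dispatch one call, send the result.
    conn, _ = listener.accept()
    with conn:
        req = recv_msg(conn)
        result = PROCEDURES[req["method"]](*req["params"])
        send_msg(conn, {"id": req["id"], "result": result})

def call(addr, method: str, *params):
    # Client stub: marshal the call, send it, block until the reply arrives.
    with socket.create_connection(addr) as sock:
        send_msg(sock, {"id": 1, "method": method, "params": list(params)})
        return recv_msg(sock)["result"]

if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    t = threading.Thread(target=serve_one, args=(listener,))
    t.start()
    print(call(listener.getsockname(), "add", 2, 3))  # prints 5
    t.join()
    listener.close()
```

The recv_exact loop is exactly the kind of detail an RPC framework hides: TCP delivers bytes, not messages, so framing has to be done by hand before a "remote call" can look like a local one.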


No F way I need C to understand Kafka


^ This!


Well, for starters, if you don't know what Redis is, enjoy the following link. It's fun, and you'll leave knowing what Redis is at least.

https://try.redis.io/


The best way to learn is to have skin in the game. Doing something yourself will force you to do what actually works. So it seems your current professional employment is excellent in that regard.

Formal study seems to work best after real experience. I read Martin Kleppmann's Designing Data-Intensive Applications based on its inclusion in teachyourselfcs.com.[0] I did not find it useful because I had nothing to apply it to once I finished. However, I don't think this will apply to you, as it seems you already have some problems in mind to consider.

[0] https://teachyourselfcs.com/#distributed-systems


Besides reading a lot, I have always found it useful to build toy versions of distributed systems components, or at least the building blocks: RPC frameworks, locking and transactions, leader election algorithms, consensus algorithms, etc.

Even when incomplete and often broken, attempting at least a PoC or a toy implementation has helped me immensely in understanding the challenges and limitations of each.
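As an example of the kind of toy implementation described above, here is a minimal in-process sketch of the classic bully leader-election algorithm: a node that notices the leader is gone sends ELECTION to every higher-ID node; if any live one answers, it takes over the election; if none answers, the initiator declares itself leader and broadcasts COORDINATOR. The Cluster class and its method names are hypothetical, and there is no real network: messages are function calls and a dead node's silence is known instantly.

```python
from typing import Dict, Iterable, Optional


class Cluster:
    """In-process simulation of the bully election algorithm (a sketch only)."""

    def __init__(self, node_ids: Iterable[int]):
        self.alive: Dict[int, bool] = {i: True for i in node_ids}
        self.leader_of: Dict[int, Optional[int]] = {i: None for i in self.alive}

    def crash(self, node_id: int) -> None:
        self.alive[node_id] = False

    def start_election(self, initiator: int) -> int:
        if not self.alive[initiator]:
            raise ValueError("only a live node can start an election")
        # ELECTION goes to every higher-ID node; only live ones answer OK.
        responders = [i for i in self.alive if i > initiator and self.alive[i]]
        if responders:
            # A responder takes over the election. Following the lowest one is
            # enough here: the chain of takeovers ends at the highest live ID.
            return self.start_election(min(responders))
        # No higher node answered, so the initiator wins and broadcasts
        # COORDINATOR to every live node.
        for node_id, up in self.alive.items():
            if up:
                self.leader_of[node_id] = initiator
        return initiator


if __name__ == "__main__":
    cluster = Cluster([1, 2, 3, 4, 5])
    cluster.crash(5)                  # the highest-ID node fails
    print(cluster.start_election(2))  # prints 4
    print(cluster.leader_of)          # {1: 4, 2: 4, 3: 4, 4: 4, 5: None}
```

The simplification worth breaking on purpose is the instant "no answer": a real implementation needs timeouts to decide a higher node is dead, and during a network partition two nodes can each conclude they are the leader.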

It’s also important to play with real, production quality software, and stress it and understand its limitations under load or failure. Distributed systems fail in surprising ways.
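One classic surprising failure: a client cannot tell a lost request from a lost reply, so it retries, and a request that already succeeded gets applied twice. The sketch below simulates that in-process under stated assumptions: a hypothetical network that drops every third reply and a hypothetical deposit operation. It compares a server that applies every request with one that deduplicates by request ID, which makes the retry idempotent.

```python
class LossyNetwork:
    """Hypothetical network stand-in: every `loss_every`-th reply is dropped."""

    def __init__(self, loss_every: int):
        self.loss_every = loss_every
        self.calls = 0

    def reply_lost(self) -> bool:
        self.calls += 1
        return self.calls % self.loss_every == 0


class Server:
    def __init__(self, dedupe: bool):
        self.balance = 0
        self.dedupe = dedupe
        self.applied_ids = set()

    def deposit(self, request_id: int, amount: int) -> None:
        if self.dedupe:
            if request_id in self.applied_ids:
                return  # retry of a request already applied: ignore it
            self.applied_ids.add(request_id)
        self.balance += amount


def client_deposit(server, net, request_id, amount, max_attempts=5) -> bool:
    for _ in range(max_attempts):
        server.deposit(request_id, amount)  # the request always arrives here...
        if not net.reply_lost():            # ...but the acknowledgement may not
            return True
    return False


def run(dedupe: bool) -> int:
    server, net = Server(dedupe), LossyNetwork(loss_every=3)
    for request_id in range(10):
        client_deposit(server, net, request_id, amount=1)
    return server.balance


print(run(dedupe=False))  # prints 14: four lost replies caused four double deposits
print(run(dedupe=True))   # prints 10
```

Stressing real software means hunting for exactly this class of behavior: retries, timeouts, and redelivery that are each reasonable on their own but combine into double effects.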


None of the technologies you mentioned speak to distribution. They speak to automation in various forms.

Distribution means different things depending on whom you ask, but it largely refers to things being separated across one or more networks. If you really want to understand distribution, model it for third-order consequences: A gives a thing to B, who gives it to C. What does that mean to A if A and C share no relationship or connection? The answer to that question differs depending on the thing that is shared.

For a more practical example, I wrote this point-to-point file distribution application. The point-to-point nature of the application enforces privacy, in that the endpoints know and trust each other. The application is also built around a Windows-like GUI, so user B can easily drag and drop a file from user A to user C, which could violate A’s privacy or compromise C.


Exposure to production systems is a new advantage you have. You can learn a lot from the failures and problems.

If your workplace has written postmortems, read them. Ask more senior engineers about the times things went wrong. Figure out how to politely spectate, or productively involve yourself, when emergencies/incidents happen.

Try to understand the pros/cons of the choices your workplace has made (people will certainly be whining about the downsides!). How could different choices have avoided the cons? What different downsides would those choices have introduced? (Be careful not to come off as critical. It's common to see, with hindsight, that different choices would have worked out better, and it is sometimes anti-social to point it out.)


If your company has design documents that describe not only how systems have been designed, but also _why_ they've been designed that way, then you can:

1. Read the problem statement (or context, or motivation) in a particular system's design doc.

2. Flip through DDIA (the book others mentioned) to find relevant material, and try to decide how you would solve the problem.

3. Go back to the design doc and see what your colleagues decided.

By trying to solve problems yourself, and seeing what colleagues did in the same situation, you will learn to improve your thinking.

The good thing about doing this within your existing environment is that, when you don't understand a decision, you can ask colleagues "why didn't we just use X for this system?"


The other comments give good advice at the system-design level of abstraction. But do try to remember that, no matter what, everything is just a computer running some code and shoving some data around to various other computers or devices. We often lose sight of that and treat the systems we abstract as somehow different from a computer. Avoid that as much as you can.


The best way imo is to read the design docs and code of the projects you like.


All those technologies will be gone in less than a decade. Learn fundamentals, don't worry about the current trends. Here's a good starting point: https://columbia.github.io/ds1-class/. I've also found "Operating Systems: Three Easy Pieces" to be surprisingly relevant to system design: https://pages.cs.wisc.edu/~remzi/OSTEP/?source=techstories.o....
