
Ask HN: How to learn about distributed systems when not possible at work? - askertoday
Hello,

I work for a startup, and have worked for many before. All of them had monolithic, mid-sized code bases that do just fine on a conventional stack (2 or 3 big servers and simple rerouting).

My question is: how do people learn about distributed systems if they are never given the chance in a professional setting? I am sure there is a book or two to go over, but would that give me the skill set I need to convince a company asking for distributed systems experience to hire me?
======
ianleeclark
It seems I'm a few weeks ahead of you on this same problem. What I did was
notice a discussion in an HN thread and reach out to the guy in the
discussion who seemed the most knowledgeable: joke's on me, that guy's the CTO
of HashiCorp.

Anyway, I just asked him how I should go about learning and he linked me to
(presumably) his alma mater's course list for distributed systems:
[https://courses.cs.washington.edu/courses/cse452/16wi/calend...](https://courses.cs.washington.edu/courses/cse452/16wi/calendar/lecturelist.html)

Moreover, he linked me some fundamentals:

* Lamport Time
* CAP Theorem
* FLP Impossibility Theorem
* Bimodal Multicast
* Paxos
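
To make the first of those fundamentals concrete, here is a minimal Python sketch of a Lamport logical clock. The class and method names (`LamportClock`, `tick`, `send`, `receive`) are made up for illustration, and it runs two "processes" inside one interpreter rather than over a network. It only shows the core rule: every event bumps a counter, and a receiver jumps past the timestamp carried by the message.

```python
class LamportClock:
    """Logical clock: a counter that orders events, not wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is itself an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Move past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


# Two hypothetical processes, a and b, exchanging one message.
a, b = LamportClock(), LamportClock()
a.tick()                     # local event on a
stamp = a.send()             # a sends a message carrying its timestamp
b.tick()                     # unrelated local event on b
received = b.receive(stamp)  # b receives a's message
```

The useful property is that `received` is always greater than `stamp`, so a send is ordered before its matching receive even though the two clocks never agree on real time.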

What I would advise doing with those fundamentals is going to Google Scholar,
finding the original papers, and adding them to your watch list. Then you
can see wherever they're referenced, go through the list, pick out papers,
and read one a day (that's what I'd advise regardless, but it's not feasible
for everyone, so just keep reading!)

I can't say I'm a guru at all in this sphere, but this is how I'm approaching
it. Hopefully you can take some of my advice and tailor it to best suit your
needs and how you learn best. Beyond that, I'd say build, build, build: start
out with your naive solution and improve over time. I've been working on a
distributed key-value store
([https://github.com/GrappigPanda/Olivia](https://github.com/GrappigPanda/Olivia))
and once you get deeper entrenched into a problem like this, a lot of the
problems really make themselves apparent and you can learn exactly where your
naive solution differs from more optimal solutions.

~~~
elfuego
Interesting approach. I'd like to know some of the specific lessons learnt
using the naive approach and things to look out for.

------
cjbprime
I got started by doing the online MIT labs linked here:

[http://nil.csail.mit.edu/6.824/2015/labs/lab-3.html](http://nil.csail.mit.edu/6.824/2015/labs/lab-3.html)

You write a MapReduce/Paxos implementation in Golang, and then their test
suite tests your implementation. There are many tricky edge cases, so having
an automated test suite is seriously helpful.
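
For anyone who hasn't seen the MapReduce model before, here is a rough single-process sketch in Python of the map, shuffle, and reduce phases, using word count as the job. This is not the lab's code (the lab is in Go and distributes work across workers), and `map_fn`, `reduce_fn`, and `map_reduce` are names invented for this example.

```python
from collections import defaultdict


def map_fn(document):
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def reduce_fn(word, counts):
    # Reduce: combine all values that were emitted for the same key.
    return word, sum(counts)


def map_reduce(documents):
    groups = defaultdict(list)
    for doc in documents:                  # map phase, one call per input split
        for key, value in map_fn(doc):
            groups[key].append(value)      # shuffle: group values by key
    return dict(reduce_fn(k, v) for k, v in groups.items())  # reduce phase


result = map_reduce(["the quick fox", "the lazy dog", "The end"])
```

The hard parts the lab actually exercises (assigning splits to workers, handling worker failures, re-running tasks) are exactly what this sketch leaves out.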

Looks like since last year they've switched from Paxos to Raft (pretty great
idea, the Paxos papers are terrible). Here's the latest URL:

[https://pdos.csail.mit.edu/6.824/index.html](https://pdos.csail.mit.edu/6.824/index.html)

I would guess that any company doing distsys work would be impressed by you
having a Raft or Paxos implementation on GitHub, enough to get an interview
etc.

------
jondubois
I would suggest looking at Rancher [http://rancher.com/](http://rancher.com/).
It's a great way to learn distributed container orchestration.

I was in the same boat; I started getting interested in distributed systems
about 5 years ago but I never had the opportunity to play with those cool
tools while at work so in my own time I built an open source project with a
focus on scalability: [http://socketcluster.io/](http://socketcluster.io/)

More recently, I've been implementing open source stacks to run and auto-scale
on Docker + Kubernetes (using Rancher) and it's been pretty mind-blowing. I
highly recommend playing around with those technologies - I don't have any
doubt that this is where the whole software industry is heading.

I've already noticed some Kubernetes-related job postings coming up on various
online job portals and they tend to pay REALLY well, so it looks like a good
area to work towards.

I think it's also important to read stuff online about the CAP theorem and
the various popular patterns and algorithms for building distributed systems,
like Pub/Sub, Message Queues, Raft (consensus), data sharding (e.g. consistent
hashing) and others... I think if you start with those, you will stumble upon
new material as you go along and you can gradually build up your knowledge.
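
As an example of the data sharding idea, here is a minimal consistent hashing sketch in Python. It is an illustration under assumptions, not how any particular system does it: the `HashRing` class, the node names, the choice of MD5, and the `replicas=100` virtual-node count are all picked just for this example.

```python
import bisect
import hashlib


def _hash(key):
    # Stable hash; Python's built-in hash() is randomized per process for strings.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node, smooths the distribution
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            h = _hash(f"{node}#{i}")
            self._owner[h] = node
            bisect.insort(self._points, h)

    def remove(self, node):
        for i in range(self.replicas):
            h = _hash(f"{node}#{i}")
            del self._owner[h]
            self._points.remove(h)

    def lookup(self, key):
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove("node-c")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the ring is that removing `node-c` only remaps the keys that lived on `node-c`; with plain `hash(key) % num_nodes`, most keys would move.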

I think playing around with the advanced networking features offered by IaaS
platforms like Amazon EC2 is also a good way to put your theories (and code)
to the test.

------
pejrich
This starts in two weeks. I'm taking it too!

[https://www.edx.org/course/reliable-distributed-algorithms-p...](https://www.edx.org/course/reliable-distributed-algorithms-part-1-kthx-id2203-1x)

------
doug1001
Here's a complete tutorial--from concepts to step-by-step command-line
scripting. E.g., you provision EC2 instances, deploy a Hadoop cluster, and
build a simple data processing pipeline on the cluster.

This is intended primarily for data scientists; two mid-level devs in my shop
worked through it and said it was solid.

[https://blog.insightdatascience.com/introducing-pegasus-one-...](https://blog.insightdatascience.com/introducing-pegasus-one-does-not-simply-pip-install-hadoop-423f1d521d29#.p5ockru3s)

