
Ask HN: How to *really* learn concurrency/parallelism? - bonobo3000
Hello,

I've been interested in concurrency for a while now, and I've been picking up bits and pieces by stumbling across great posts/videos once in a while or working with them at my job. So far I'm sort of comfortable with threads (not the gory JVM details, though) and Akka, and a tiny bit of futures/promises and co-operative concurrency with Python's Tornado.

I want to go deeper: really grok all the different models, be able to make informed decisions about which model is appropriate when, understand how they map to lower levels, and generally gain a comprehensive understanding. I'm currently going through Java Concurrency in Practice.

My question: what resources will help me get a comprehensive understanding of concurrency & parallelism? What toy project would you recommend where I can implement the same thing with lots of different models to understand the differences?

Thanks in advance!
======
nostrademons
_Book rec_:

[https://www.amazon.com/Concepts-Techniques-Models-Computer-Programming/dp/0262220695](https://www.amazon.com/Concepts-Techniques-Models-Computer-Programming/dp/0262220695)

_Language & library recs_:

Java is actually a pretty shitty language to learn concurrency on, because the
concurrency primitives built into the language & stdlib are stuck in the
1970s. There've been some more recent attempts to bolt more modern concurrency
patterns on as libraries (java.util.concurrent is one; Akka is another; Quasar
is a third), but you're still very limited by the language definition. Some
other languages to study:

Erlang, for message-passing & distributed system design patterns. Go has a
similar concurrency model, but not as pure.
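The core idea shows up even in Python: threads that share no mutable state and coordinate only by passing messages over channel-like queues. A minimal sketch (the worker, the sentinel shutdown, and the string-uppercasing "work" are all illustrative stand-ins):

```python
import queue
import threading

# Channel-like queues: the worker receives tasks and sends back results,
# sharing no mutable state with the main thread -- all coordination is
# via messages, as in Erlang mailboxes or Go channels.
tasks: "queue.Queue[str | None]" = queue.Queue()
results: "queue.Queue[str]" = queue.Queue()

def worker() -> None:
    while True:
        msg = tasks.get()
        if msg is None:           # sentinel message: shut down cleanly
            break
        results.put(msg.upper())  # stand-in for real work

t = threading.Thread(target=worker)
t.start()
for word in ["erlang", "go"]:
    tasks.put(word)
tasks.put(None)
t.join()
```

Erlang adds selective receive, supervision, and distribution on top of this; the sketch only shows the share-nothing, message-first discipline.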

Haskell for STM.

Python 3.5+/ES2017/C#, for async/await & promises. For a purer
implementation of promises, check out E or the Cap'n Proto RPC framework.
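In Python's flavor of this model, coroutines suspend at each `await` and a single event loop interleaves them. A small sketch, with `asyncio.sleep` standing in for network latency and made-up URLs:

```python
import asyncio

# Two "fetches" that sleep instead of doing real network I/O;
# gather() runs both concurrently on one event loop, no threads needed.
async def fetch(url: str) -> str:
    await asyncio.sleep(0.01)   # stand-in for network latency
    return f"body of {url}"

async def main() -> list[str]:
    # gather() returns results in the order the coroutines were passed
    return await asyncio.gather(fetch("a.example"), fetch("b.example"))

bodies = asyncio.run(main())
```

The two sleeps overlap, so total wall time is roughly one latency, not two.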

Rust, for mutable borrowing. Rust's concurrency story is fairly unique; they
try to prove that data races _can't_ exist by ensuring that only one mutable
reference to a value exists at a time.

JoCaml for the join calculus. Indeed, learning formal models like CSP, the pi-
calculus, or the join-calculus can really help improve your intuitions about
concurrency.

Hadoop for MapReduce-style concurrency. In particular, learning how you might
represent, say, graph algorithms on a MapReduce system is a great teacher.
Also look at real-time generalizations of MapReduce paradigms like Storm or
Spark.

Paxos & Raft for the thorny problems in distributed consensus.

Vector clocks, operational transforms and CRDTs. One approach to concurrency
is to make it not matter by designing your algorithms so that each stage can
be applied in arbitrary order (or can compensate for other operations that
have occurred in the meantime). That's the idea behind all three, and it
perhaps has the most long-term promise.
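The simplest CRDT makes the "order doesn't matter" property concrete: a grow-only counter where merging is an element-wise max, so merges commute and converge regardless of delivery order. An illustrative sketch (the class and replica IDs are hypothetical, not from any particular library):

```python
# A grow-only counter (G-Counter) CRDT: each replica increments only its
# own slot, and merge takes the element-wise max. Merges are commutative,
# associative, and idempotent, so replicas converge in any merge order.
class GCounter:
    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + 1

    def merge(self, other: "GCounter") -> None:
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(); a.increment()   # two increments on replica "a"
b.increment()                  # one increment on replica "b"
a.merge(b); b.merge(a)         # either merge order converges to the same state
```

Deletion-capable structures (PN-counters, OR-sets) layer more bookkeeping on the same merge-by-max idea.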

_Project & job recs_:

The best way to really learn concurrency is to take a job at a company that
has to operate at significant scale. Google or Facebook are the most
prominent, but any of the recent fast-growers (AirBnB, Uber, Dropbox, probably
even Instacart or Zenefits) will have a lot of problems of this type.

Failing that, I've found that implementing a crawler is one giant rabbithole
in learning new concurrency techniques. The interesting thing about crawlers
is that you can implement a really simple, sequential one in about 15 minutes
using a language's standard library, but then each step brings a new problem
that you need to solve with a new concurrency technique. For example:

You don't want to wait on the network I/O, so you create multiple threads to
crawl multiple sites at once.

You quickly end up exhausting your memory, because the number of URLs found on
pages grows exponentially, and so you transition to a bounded thread pool.
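These first two steps together look roughly like this in Python: a fixed-size worker pool caps concurrency no matter how fast the URL frontier grows. A sketch with a fake `fetch` standing in for a blocking HTTP request:

```python
from concurrent.futures import ThreadPoolExecutor

# A bounded pool: at most 4 "fetches" run at once, so an exponentially
# growing URL frontier can't spawn an unbounded number of threads.
def fetch(url: str) -> str:
    return f"fetched {url}"   # stand-in for a blocking HTTP request

urls = [f"http://example.com/{i}" for i in range(10)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves input order even though fetches overlap in time
    pages = list(pool.map(fetch, urls))
```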

You add support for robots.txt and sitemaps. Now you have immutable data that
must be shared across threads.

You discover some URLs are duplicates; now you need shared mutable state
between your fetch threads.
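The canonical shape of that shared mutable state is a seen-set guarded by a lock, so the check-and-insert is atomic across fetch threads. A minimal sketch (function name is illustrative):

```python
import threading

# Shared mutable state: a seen-set guarded by a lock, so two fetch
# threads can never both decide to crawl the same URL. The check and
# the insert must happen under one lock acquisition to be race-free.
seen: set[str] = set()
seen_lock = threading.Lock()

def should_crawl(url: str) -> bool:
    with seen_lock:
        if url in seen:
            return False
        seen.add(url)
        return True

first = should_crawl("http://example.com/")
second = should_crawl("http://example.com/")   # duplicate, rejected
```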

You start getting 429 and 403 request codes from the sites, telling you to
back off and stop crawling them so quickly. Now you need a feedback mechanism
from the crawl threads to the crawl scheduler, probably best implemented by
message queues.
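That feedback channel can be as simple as a queue the workers write to and the scheduler drains: workers report back-off requests as messages instead of reaching into scheduler state. A sketch (the handler and the 60-second back-off are made-up):

```python
import queue

# Feedback from crawl workers to the scheduler: on a 429/403, a worker
# posts a (host, backoff_seconds) message; the scheduler drains this
# queue and pauses that host, so no scheduler state is shared directly.
feedback: "queue.Queue[tuple[str, float]]" = queue.Queue()

def handle_response(host: str, status: int) -> None:
    if status in (429, 403):
        feedback.put((host, 60.0))   # ask the scheduler to pause this host

handle_response("example.com", 429)   # rate-limited: report it
handle_response("example.org", 200)   # fine: no message needed
```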

You want to process the results of the crawl. Now you need to associate the
results of multiple fetches together to run analyses on them; this is what
MapReduce is for.
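The MapReduce shape is easy to see in miniature: map each page to key/value pairs, shuffle by key, then reduce each key independently. A toy word count (the inline "pages" stand in for crawl results; a real system would run each reduce on a different machine):

```python
from collections import defaultdict

# Toy MapReduce: map pages to (word, 1) pairs, shuffle by key, reduce
# by summing. Each key's reduce is independent of every other key's,
# which is exactly what lets the framework parallelize it.
pages = ["to be or not", "to be"]

def map_phase(page: str):
    for word in page.split():
        yield word, 1

shuffled: dict[str, list[int]] = defaultdict(list)   # the "shuffle" step
for page in pages:
    for word, count in map_phase(page):
        shuffled[word].append(count)

word_counts = {word: sum(counts) for word, counts in shuffled.items()}
```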

You need to write out the results to disk. This is another source of I/O, but
with different latency & locking characteristics. You either need another
thread pool, or you want to start looking into promises.

You want to run this continuously and update a data store. Now you need to
think about transactions.
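At its smallest, "thinking about transactions" means grouping related writes so they commit or roll back together. A sketch using SQLite's connection-as-context-manager idiom (schema and data are made up):

```python
import sqlite3

# A transactional write: the insert either commits or, if an exception
# is raised inside the with-block, rolls back -- the store never holds
# a half-updated crawl record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, body TEXT)")

with conn:  # commits on success, rolls back on exception
    conn.execute(
        "INSERT INTO pages VALUES (?, ?)",
        ("http://example.com/", "<html>"),
    )

row = conn.execute("SELECT body FROM pages").fetchone()
```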

~~~
PaulHoule
Java's concurrency model may be stuck in the 1970s, but it actually works,
unlike a lot of newer frameworks...

~~~
nostrademons
Most concurrency models work if you understand both your problem domain and
your tools thoroughly, and fail if you don't. Event-driven callback spaghetti,
for all of its flaws, has been used successfully in services with billions of
simultaneous users.

OP was asking for ways to broaden his concurrency knowledge, and studying
message-passing, transactional models, promises, etc. will certainly do that.

~~~
rnovak
Why are you advising him to run before he can crawl? All of those paradigms
are built on top of the 1970s-era threading model used by the JVM (and,
behind the scenes, by every other language).

How can you understand these tools thoroughly if you don't understand the
foundation that they're built on?

Yes, they probably have been used in services with billions of users, but I
guarantee those engineers understood the basics of threading just as well, if
not better, than the tools built on top of them.

OP: This video is pretty good at explaining some core concepts:
[https://www.youtube.com/watch?v=MCs5OvhV9S4](https://www.youtube.com/watch?v=MCs5OvhV9S4)

------
ruslan_talpa
Parallel and concurrent programming is easy in Haskell (because of the amazing
work at the lower levels):
[http://chimera.labs.oreilly.com/books/1230000000929](http://chimera.labs.oreilly.com/books/1230000000929)

Switch to Haskell and you will no longer view "parallel/concurrent" as a hard
problem that you need to solve.

------
dudul
There is a book, "Seven Concurrency Models in Seven Weeks". I haven't read it
(yet), but I assume it's a good fit for what you want.

