
Concurrency is the New Memory Management - luu
http://arschles.github.io/2014/06/25/concurrency-mem-mgmt.html
======
overgard
Probably not a popular opinion, but I'm not really a fan of all the user-space
async stuff that's become popular (i.e. the nodejs/tornado style of single-
threaded apps with non-blocking IO).

Remember Windows 3.1 or old Mac OS with its cooperative multitasking, where
you had to explicitly yield or risk freezing up the entire computer? It's
basically the same thing reinvented with a nicer brand. I know I'm dating
myself here, but it's really the same model. We came up with preemptive
multitasking for a reason: because it's more robust and efficient.

Threads are dangerous, but you can avoid most of the danger by not sharing
objects across threads. If you do that, then with blocking IO your code flows
linearly and you don't have to deal with callback hell.
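
For illustration, here is the shape of that argument as a minimal echo server
(a sketch in Go for brevity; the same structure works with one OS thread per
connection in any language with blocking IO): each connection gets its own
goroutine, nothing is shared across them, and the handler reads top to bottom
with no callbacks.

```go
package main

import (
    "bufio"
    "log"
    "net"
)

// handle serves one connection. Blocking reads and writes let the code
// flow linearly; there is no callback chain to follow.
func handle(conn net.Conn) {
    defer conn.Close()
    r := bufio.NewReader(conn)
    for {
        line, err := r.ReadString('\n') // blocks until a full line arrives
        if err != nil {
            return
        }
        if _, err := conn.Write([]byte(line)); err != nil { // echo it back
            return
        }
    }
}

func main() {
    ln, err := net.Listen("tcp", ":8080")
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := ln.Accept() // blocks until a client connects
        if err != nil {
            log.Fatal(err)
        }
        go handle(conn) // no objects are shared across this boundary
    }
}
```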

~~~
pron
> It's basically the same thing reinvented with a nicer brand.

Not at all. First, don't confuse preemptive scheduling with time-slice
scheduling. All "true" user mode threads are preemptive (Erlang, Go, Quasar).
Not all of them employ time-sliced preemption. In fact, when we implemented
Quasar on the JVM we had time sharing but then took it out because it didn't
gain us anything (other than increased implementation complexity). The reason
is that fibers in interactive applications follow a certain pattern that
always entails frequent blocking. You are right, though, that some types of
computations do not fit well with the user-mode threading model -- long, CPU-
intensive computations. Those are best left for kernel threads, which do time
sharing well. Languages that don't give you access to kernel threads might
implement time sharing for their lightweight threads in user mode (e.g.
Erlang).

~~~
jeffdavis
The parent was comparing the event model (e.g. callbacks everywhere) with a
thread/process model, not user-mode with kernel-mode threads, so I didn't
follow your point.

Also, Erlang doesn't use time slices; it counts "reductions" (essentially
function calls, though I don't think it's always exactly one function call).
It also penalizes processes more if they send to a mailbox that is already
very large.

Erlang is considered truly pre-emptive, because (assuming you aren't writing
your own functions in C or something) a function can't loop or use any
operators or really do anything without potentially being pre-empted.

Go is generally considered partially pre-emptive, because it only pre-empts on
a function call or during a memory allocation. You can write a simple loop
that doesn't terminate, and it will never get pre-empted.
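
A sketch of that behavior, as the Go runtime worked at the time (later
releases added asynchronous preemption, which fixes this): with a single OS
thread, a loop with no function calls or allocations starves every other
goroutine.

```go
package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    runtime.GOMAXPROCS(1) // one OS thread makes the starvation visible

    go func() {
        for {
            // No function calls, no allocations: a scheduler that only
            // preempts at calls never gets a chance to stop this loop.
        }
    }()

    time.Sleep(100 * time.Millisecond)
    fmt.Println("woke up") // never printed on a runtime without async preemption
}
```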

~~~
pron
> Erlang is considered truly pre-emptive... Go is generally considered
> partially pre-emptive...

Well, in Quasar we started taking the "fully preemptive" route, but we saw
that threads fall into two categories: those that block very often, and those
that don't. Because the JVM, unlike Erlang and Go, gives you access to kernel
threads, too, Quasar will simply warn you if you're using a fiber for a CPU-
intensive operation that doesn't block often. Using reductions didn't work out
so well because a "forcefully preempted" fiber still wants more CPU, which
normally means it's doing something wrong.

~~~
TheBenjaneer
Go actually does give you access to OS threads in the runtime package, in the
sense that you can easily reserve a thread for the current goroutine via
[http://golang.org/pkg/runtime/#LockOSThread](http://golang.org/pkg/runtime/#LockOSThread)
and can similarly release it back to the scheduler with
runtime.UnlockOSThread().
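
A minimal sketch of that API (the counting loop is just a stand-in for
whatever work wants a dedicated thread):

```go
package main

import (
    "fmt"
    "runtime"
)

// sumTo pins the calling goroutine to its current OS thread while it
// runs, then releases the thread back to the scheduler.
func sumTo(n int) int {
    runtime.LockOSThread()         // reserve this OS thread for the goroutine
    defer runtime.UnlockOSThread() // hand it back when we're done

    sum := 0
    for i := 0; i < n; i++ {
        sum += i
    }
    return sum
}

func main() {
    fmt.Println(sumTo(1000000))
}
```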

------
scott_s
There is an interesting - but very readable! - academic paper that is kinda
related: "The Transactional Memory / Garbage Collection Analogy":
[https://homes.cs.washington.edu/~djg/papers/analogy_oopsla07...](https://homes.cs.washington.edu/~djg/papers/analogy_oopsla07.pdf)

I say "kinda" related because even TM is a lower-level thing than the author
of this post is talking about. The treatment of the subject, however, is much
more in-depth. Lambda the Ultimate discussion:
[http://lambda-the-ultimate.org/node/2990](http://lambda-the-ultimate.org/node/2990)

~~~
seanmcdirmid
Wow, I wasn't aware of this paper although I went to that OOPSLA. Thanks for
the link!

------
seanmcdirmid
Yep, we also need our version of garbage collection for concurrency; e.g.,
[http://research.microsoft.com/apps/pubs/default.aspx?id=2112...](http://research.microsoft.com/apps/pubs/default.aspx?id=211297)

~~~
woah
Wow, this looks interesting. No time to grok the full paper, but do you have
any examples of a simple app written with this technique?

~~~
seanmcdirmid
There are videos linked in Section 4 of the paper.

It is still in the small-program stage, although Glitch as a C# framework is
usable now (indeed, I've written an editor, UI, and so on, all very
concurrent). As a framework, it is not really convenient without being a new
language, but it could be interesting to do something like ReactJS using
replay and rollback.

If you like Bret Victor-style demos, I'm working on a web essay that should be
done sometime this summer.

~~~
sitkack
Need.

------
johnpmayer
You know, the embedded systems community has a heck of a lot of tools and
formal logics for designing this stuff in the abstract. Lots of little DSLs
and auto-checkers for thinking about processes and messages, that sort of
thing. I'm not so sure about model -> code generation; I think that is mostly
rolled by hand.

For concurrency in the large, I wonder if there will be greater adoption of
these sorts of engineering techniques.

~~~
seanmcdirmid
I assume you are referring to the concurrency DSLs like Esterel? These are
quite low level, with a different emphasis; I'm not sure they would scale to
larger non-embedded systems, and they lack many of the niceties that
programming for larger systems can afford.

------
jeffdavis
"Languages and frameworks like Go, Akka and Erlang have come up now because
they help solve the hard networking and concurrency problems that we need to
build clusters."

What does Go offer for a cluster environment? As far as I can tell, it's still
targeted at single machines (though perhaps with many cores).

------
unoti
If you're interested in concurrency, you owe it to yourself to spend some time
working through this guide on ZeroMQ[1]. It could change the way you think
about software architecture forever, because it makes it so straightforward to
change your app into a multi-node, multi-language fabric of machines. It's
also a fun way to get your feet wet with a new language, because every
language under the sun is supported, and you can easily integrate the code in
your new language with systems written in your old language.

Which is another reason it might change how you think about architecture:
ZeroMQ makes it practical to bring new languages and libraries into your
ecosystem.

[1] [http://zguide.zeromq.org/page:all](http://zguide.zeromq.org/page:all)
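
As a taste, here is the guide's opening request/reply example sketched in Go,
assuming the github.com/pebbe/zmq4 binding (an assumption; there are several
Go bindings, and libzmq must be installed). The REQ peer on the other end can
be written in any language with a ZeroMQ binding.

```go
package main

import (
    "fmt"

    zmq "github.com/pebbe/zmq4"
)

func main() {
    // A REP socket answers requests one at a time, in order.
    responder, err := zmq.NewSocket(zmq.REP)
    if err != nil {
        panic(err)
    }
    defer responder.Close()
    if err := responder.Bind("tcp://*:5555"); err != nil {
        panic(err)
    }

    for {
        msg, err := responder.Recv(0) // wait for the next request
        if err != nil {
            break
        }
        fmt.Println("Received:", msg)
        responder.Send("World", 0) // reply; the REQ peer is blocked on this
    }
}
```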

------
stcredzero
I am not a hardware expert, but from what I understand, hardware architects
didn't get around to supporting VMs and garbage collection as well as they
might have. Systems continued to be optimized for workloads that looked like
scientific computing in Fortran from the 80s or desktop applications written
in C++ from the 90s. Hardware architectures to support video games and media,
on the other hand, seem to have taken huge strides over the same timeframe.
Now, it seems like they are behind in terms of supporting distributed systems
on multi-core machines. The kind of contortions required to detect and avoid
problems like false sharing indicate that today's hardware could be a good
ways from optimal for building these systems.
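
To make the false-sharing contortion concrete, here is a hedged Go sketch:
two goroutines increment "independent" counters, once adjacent in memory and
once padded out to separate cache lines (a 64-byte line is assumed). The
adjacent version is typically several times slower because the cores
ping-pong the shared line between them.

```go
package main

import (
    "fmt"
    "sync"
    "time"
)

// padded puts each counter on its own cache line (64 bytes assumed).
type padded struct {
    n uint64
    _ [56]byte // padding: 8-byte counter + 56 bytes = one full line
}

// bench runs two goroutines that each bump their own counter via inc.
func bench(inc func(i int)) time.Duration {
    start := time.Now()
    var wg sync.WaitGroup
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            for j := 0; j < 50000000; j++ {
                inc(i)
            }
        }(i)
    }
    wg.Wait()
    return time.Since(start)
}

func main() {
    var adjacent [2]uint64 // both counters share one cache line
    var apart [2]padded    // one cache line per counter

    fmt.Println("adjacent:", bench(func(i int) { adjacent[i]++ }))
    fmt.Println("padded:  ", bench(func(i int) { apart[i].n++ }))
}
```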

~~~
anon4
> Hardware architectures to support video games and media, on the other hand

I'll have to disagree about the video games bit. Today's consoles are two
machines with a completely regular AMD x64 CPU and a completely regular AMD
GPU, one of them running a version of Windows, plus one that uses PowerPC.

Yesterday's were two with multi-core PowerPC and an AMD GPU, one running a
version of Windows, plus one with that really weird PowerPC together with 7
tiny PowerPClets and an nVidia GPU that was famously awful to program for.

Going back even further, we have an Intel x86 together with an nVidia GPU,
running a version of Windows, a PowerPC-based CPU and an ATI-developed GPU
(minor note: the company, ArtX, that was doing the GPU got bought by ATI and
their designs did end up in ATI GPUs, so I'm listing it as such), a Hitachi
SuperH CPU together with a PowerVR GPU, running a version of Windows, and this
weird design from Sony.

And going back a further iteration things get even weirder and so on and so
forth.

I'll agree that the further back we go, the more these systems are made from
specialised components (hello Saturn) and the less they resemble what you'd
get from e.g. a Dell. But that is mostly because you needed to use specialised
components to get really cutting-edge performance for those tasks. Today you
can more or less stick in 4 x64 cores and forget about it.

~~~
stcredzero
_Today you can more or less stick in 4 x64 cores and forget about it._

That only works for embarrassingly parallel tasks. Start requiring
coordination, and today's hardware makes efficient parallelism plus
concurrency hard.

------
quarterwave
The first question to ask may well be "why does my problem need computation on
a commodity cluster?". A massive multi-player spaceship battle may be best
served by an HPC environment, using C++ and MPI. There's probably only a
limited class of distributed problems that map nicely to a commodity cluster.
Several worthy commentators have pointed out that even if the systems
programming environment offered the same verbs for both, commodity network
latency makes "distributed concurrency" very different from "in-box
concurrency". To loosely borrow a term from physics, time delays can break the
scaling symmetry of a renormalization group.

A systems architect with a sound background in mathematics may be able to
reason out the cluster performance of a social algorithm operating on an
Erdos-Renyi random graph. Plodders such as myself will try to prototype. I'll
insist
that a prototype has business value if only because otherwise I have no way of
appearing to be busy.

For prototyping a distributed system on a commodity cluster, my personal
preference would be Erlang. As rvirding commented with great insight in
another post, Erlang has an OS feel to it. From my limited perspective, Erlang
spares me the trouble of knowing Unix and networking (for example: I don't
need to know what a TCP port is). Erlang gives me a minimal & consistent set
of verbs, and that's all I need for prototyping.

------
TheMagicHorsey
I feel like the author of this piece doesn't really have personal knowledge
about what he is talking about.

I use Go and I use Erlang. I've never used Akka.

Go is not in the same class as Erlang when it comes to Cloud Computing. In
Erlang the state and behavior of your entire datacenter can be contained in
Erlang itself. Erlang will take care of bringing up new processes when your
jobs break. Erlang will handle messaging between processes executing on
different machines.

Go doesn't do any of that out of the box. There are some interesting projects
like Go Circuit, which sort of kinda want to emulate Erlang's OTP and bring
your datacenter behavior into Go, but Go Circuit isn't being run in production
by anyone I have heard of. Maybe it's ready to go, but it looks more like a
research project to me.

As such, Go has nothing to do with managing a cluster, other than that it
gives you some nice single-node concurrency features. You will need to build
or import everything else you need.
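
For a concrete sense of the gap: the cross-machine messaging Erlang gives you
out of the box is something you wire up by hand in Go, for example with the
standard net/rpc package (a minimal sketch, with both ends in one process for
brevity; supervision, restarts, and node discovery remain your problem):

```go
package main

import (
    "log"
    "net"
    "net/rpc"
)

type Echo struct{}

// Say has the exact shape net/rpc requires: two arguments, the second a
// pointer for the reply, and an error return.
func (Echo) Say(msg string, reply *string) error {
    *reply = "echo: " + msg
    return nil
}

func main() {
    // Server side (in a real cluster this runs on another machine).
    if err := rpc.Register(Echo{}); err != nil {
        log.Fatal(err)
    }
    ln, err := net.Listen("tcp", "127.0.0.1:4040")
    if err != nil {
        log.Fatal(err)
    }
    go rpc.Accept(ln)

    // Client side.
    client, err := rpc.Dial("tcp", "127.0.0.1:4040")
    if err != nil {
        log.Fatal(err)
    }
    var reply string
    if err := client.Call("Echo.Say", "hello cluster", &reply); err != nil {
        log.Fatal(err) // nothing restarts or resends for us
    }
    log.Println(reply)
}
```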

I can't think of anything else out there which is quite like Erlang/OTP. Maybe
Julia has some tools that are similar but I don't know too much about it.

Edit: For those Go programmers that are interested in what I'm talking about,
check out this concise explanation from the Erlang docs:
[http://www.erlang.org/doc/getting_started/conc_prog.html](http://www.erlang.org/doc/getting_started/conc_prog.html)

Erlang isn't as user friendly as Go, but it has a lot of stuff that can save
you serious headaches if you think you will need to scale. Premature
optimization is a waste of time ... blah blah etc., so maybe you don't need it
for your project. Also Go on App Engine allegedly scales pretty well with very
little programmer effort.

------
marcosdumay
I really cannot agree.

\- non-blocking I/O

\- user-space concurrency

\- easy messaging

\- advanced scheduling

Of those, only easy messaging actually helps with creating clusters, and it
can be implemented in a library without any problem.

The others may make the language better, or help one get more throughput from
a node, but they offer no help in maintaining the consistency of a cluster.

------
CmonDev
"Languages and frameworks like Go, Akka and Erlang have come up now ... keep
lots of the syntax and semantics from the "languages of yesterday"."

Erlang is really, really old-school actually - 1986. The language of the day
before yesterday.

~~~
CmonDev
I would also look outside of the JVM box and consider C#. It stays familiar,
while having proper parallelism and concurrency constructs: async+await, TPL,
TPL DataFlow, Parallel.

Alternatively there is also F# with its Erlang-inspired
[http://en.wikibooks.org/wiki/F_Sharp_Programming/MailboxProc...](http://en.wikibooks.org/wiki/F_Sharp_Programming/MailboxProcessor)

------
ianstallings
Seems pretty accurate. So this is probably a great time to mention my favorite
new(ish) platform for handling concurrency in a polyglot way:
[http://vertx.io/](http://vertx.io/)

~~~
signa11
isn't this the same as Tibco's event-bus offering?

------
derengel
STM is not appropriate for distributed systems, but why hasn't it taken off
more in the concurrency world?

~~~
AaronFriel
Coordination with the real world is and always has been the pain point with
STM. It's fine if your STM universe lives entirely in a subset of your data on
a single machine [1], or within a single database [2], or with a single
decision-making unit [3].

Most languages have extremely poor support for separating interactions with
the real world from interactions that occur entirely within one serializable
context (a thread working with thread-local memory, for example). Can you
safely replay arbitrary C, C++, etc.? No, because side-effecting code could
run at any time and occur in any context. So that is one problem that has to
be solved first.
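
A toy Go sketch of the hazard (atomically here is a hypothetical stand-in,
not a real STM engine: it just replays its block once, the way a real engine
retries on conflict; rollback of memory writes is not simulated):

```go
package main

import "fmt"

// atomically simulates an STM retry: the block runs, "conflicts", and is
// replayed. A real engine would undo the block's memory writes between
// attempts, but it cannot undo effects that escaped to the real world.
func atomically(block func()) {
    block() // first attempt, aborted by a (simulated) conflict
    block() // retry after rollback
}

func main() {
    atomically(func() {
        fmt.Println("charging credit card") // printed twice: the side effect replayed
    })
}
```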

Suppose you've solved that problem. Well, now for a distributed system, you
need to make STM talk to STM. Make one module talk to another over a network,
or a filesystem, or another database, or a client browser. Do you have STM
working in the JavaScript running on clients' machines? And even if you
managed that feat, do you have end-to-end STM from your datastore to your
clients' actions?

Distributed STM that wasn't painful to use, either to write or in terms of
performance, would be a sort of holy grail of distributed computing. I don't
think any language or toolchain is there yet.

[1] Haskell, Clojure, et al STM engines.

[2] SQL-compliant relational databases such as DB2, Oracle, SQL Server, as
well as distributed databases like HyperDex that support true transactions.

[3] Paxos, Raft, and other decision-making algorithms only ever externally
appear to be consistent, but are internally complex and might have an internal
tug-of-war.

~~~
seanmcdirmid
> Can you safely replay arbitrary C, C++, etc.? No, because side-effecting
> code could run at any time and occur in any context. So that is one problem
> that has to be solved first.

I spend my time solving this problem. You can do it with a programming model
especially designed for it. Functional programming is not required, though it
can be convenient.

> Well, now for a distributed system, you need to make STM talk to STM. Make
> one module talk to another over a network, or a filesystem, or another
> database, or a client browser. Do you have STM working in the JavaScript
> running on clients' machines? And even if you managed that feat, do you have
> end-to-end STM from your datastore to your clients' actions?

STM is the wrong way of thinking about this problem, mainly because replay is
not an intrinsic part of the paradigm (instead, users manage that themselves).
Rather, go back further to Jefferson's virtual time/Time Warp system [1],
which was designed specifically in the context of distributed systems.

[1]
[http://dl.acm.org/citation.cfm?id=3988](http://dl.acm.org/citation.cfm?id=3988)

There are also many related systems like Concurrent Revisions, LVars, Bloom,
and so on...

