
Sane Concurrency with Go - tzz
https://blog.mozilla.org/services/2014/03/12/sane-concurrency-with-go/
======
pron
When we implemented the Go and Erlang lightweight-thread approach for the JVM
in Quasar[1], we quickly realized that CSP/actors, while going a long way
towards sane concurrency, are not sufficient. Every application eventually
needs some shared mutable state (even if it's implemented at the database
layer). One approach is to have a single fiber (or goroutine) in charge of
managing state (Clojure does that with agents), but that doesn't scale for
multiple writers. Neither does a global lock over the entire data store.

Good concurrent data structures that allow multiple concurrent writers are
necessary. Java has some good basic ones like ConcurrentHashMap,
ConcurrentSkipListMap, and ConcurrentLinkedQueue. Clojure takes that a step
further with transactional refs. For more interesting kinds of data it's
better to have a good in-memory transactional database.

So for truly "sane concurrency" you need both lightweight threads with CSP
and/or actors, _plus_ a transactional, concurrent (preferably in-memory) data
store[2].

[1]:
[https://github.com/puniverse/quasar](https://github.com/puniverse/quasar)

[2]:
[http://blog.paralleluniverse.co/2013/10/16/spaceships2/](http://blog.paralleluniverse.co/2013/10/16/spaceships2/)

~~~
axman6
Sounds like you've basically explained several of the common concurrency
features of Haskell. Haskell has the advantage that any pure data structure
(i.e., not using mutable references, which covers almost all the data
structures we use) can be used concurrently and atomically via IORefs and
atomicModifyIORef. If you need transactional guarantees, then software
transactional memory (STM) is just as easy to use. In both situations, it's
laziness and purity that give us these things essentially for free.

Of course you could create more advanced concurrent mutable structures as
other languages have, but IORefs and STM get you a very long way when coupled
with GHC's extremely lightweight threading and high-performance IO (which is
getting even better in GHC 7.8, to be released soon).

To learn more, Simon Marlow's excellent book is available for free online [1]
and in dead tree and eBook forms from O'Reilly. It's very clear, with a great
balance of how to use things practically, how they work, and how to make them
work their best.

[1]
[http://chimera.labs.oreilly.com/books/1230000000929/index.html](http://chimera.labs.oreilly.com/books/1230000000929/index.html)

~~~
tel
There are also things like acid-state [0] if you want transactional, in-
memory, and persisted data stores.

[0] [http://hackage.haskell.org/package/acid-state](http://hackage.haskell.org/package/acid-state)

------
shanemhansen
Go has some great high level concurrency primitives with channels and
goroutines, but honestly the first function could have been written with
sync.Mutex.Lock/defer sync.Mutex.Unlock.

~~~
voidlogic
I was about to say this: using channels when what you want is a mutex will
result in awkward code.

>One of Go’s selling points is that there are some very useful concurrency
primitives baked right into the language.

Let's state this, not use any of said primitives, and then complain the
language is broken...

~~~
rakoo
Although more verbose, I find this pattern much better than mutexes.

The problem with mutexes is that they are a cheap way for the developer to say
"stop the world while I think", but they require you to be quite aware of what
happens inside an object; too aware, I think.

In this example you'd have a mutex for the Account, so each time you want to
change the amount you need to lock and unlock it. That's trivial for an object
like that, but it can quickly become complex: one mutex is not enough because
it blocks the whole object, so you start having multiple mutexes. But then you
don't remember which mutex guards what, so you need good documentation for
that; but the documentation is not as well maintained as the rest of the code,
so it starts rotting. And then your locking logic is spread across so many
lines that you can't quite grasp all the ways parallelism can kill you.

In the given example, the mindset is different: _all_ methods called from the
outside are non-modifying, they only stack the change to be made. The actual
data manipulation is done in a very tight loop that does a very simple thing,
and nothing outside of this loop ever deals with memory. As a developer, you
can fit all the code that can be "dangerous" in a mere 7 lines of code,
instead of having it spread among the file.

I concur it's a different mindset, quite different at that, but it's a pretty
good one (and Go makes it very pleasant).

~~~
tomsthumb
This is a nifty perspective. Thank you for the useful info.

------
AaronFriel
This is very, very familiar. I'm working on a library in which I'll be
implementing something quite similar very soon. It actually reminds me quite a
lot of the actor model - as each event loop is essentially an actor which
receives and sends messages through channels. Background follows:

I'm the author of a Haskell library[1] for interfacing with HyperDex[2], a
distributed database produced by some folks at Cornell.

The client-side library for HyperDex uses an event loop, and is "thread-safe"
to call into as long as you synchronize access to the pointer. This is an
awkward area to work with in Haskell-land, as the code I write has to deal
with the intersection of the three uglies:

* Foreign function interfaces (and marshalling)

* Mutable resource management (allocating and freeing C-structs)

* Concurrency (synchronizing access to an object)

I've tried a variety of implementations. To speed up testing, I wrote a "fake
HyperDex" client and test harness in which I can insert chaotic behaviors to
test resilience.[3] While the code isn't as clean as I'd like right now, it
lends itself to the smallest, most compact implementations of what I want.
Each HyperDex connection pointer is paired with an event loop which receives
requests for calls into HyperDex and processes them. When a response is
demanded, the loop begins a busy-wait (will soon be replaced by an
epoll/select on an fd) on results from the database. Each asynchronous request
is an event loop too - a free floating closure sitting in memory that sends
off a message and waits for responses.

GHC's garbage collector will determine when waiting on an MVar or Chan is
deadlocked, and keeping everything in a ResourceT monad ensures that the whole
system gracefully closes when portions fall out of scope and are unreachable.

The approach has a certain elegance to it. I am curious to hear what other
people's thoughts are on such implementations though, because there are
naturally performance implications.

[1] [https://github.com/aaronfriel/hyhac](https://github.com/aaronfriel/hyhac)

[2] [http://hyperdex.org](http://hyperdex.org)

[3] [http://lpaste.net/101084](http://lpaste.net/101084) - a hodgepodge of
code I wrote to test implementations; ill-documented, this is just for
personal exploration and testing

~~~
rdtsc
> The client-side library for HyperDex uses an event loop, and is "thread-
> safe"

That is a common misconception. Or rather, it is a tautology: no threads =
thread-safe. But it is not concurrently-modifying-data-structures safe, which
is the main pain point.

One can get just as easily tangled over a set of callbacks.

Here is a set of callbacks all started from some select/poll/epoll loop. Some
call it a reactor (namely Glyph's own Twisted Python).

cb1 -> cb2 -> cb3|eb3 then cb3->cb4 and eb3->cb5

Processing starts with cb1 and ends with cb4 or cb5. Notice that at some point
the cb2 function could generate an errback (eb3), which then ends up calling
another callback, cb5.

Understanding that the above, in a large system, is just a messier, uglier
concurrency structure than a thread/goroutine/task/actor is crucial.

It doesn't necessarily save you from simultaneous access to same shared data.

Imagine processing starts at cb1 and, by the time it reaches cb3 (say cb2
calls some IO or sleep operation), cb1 gets called again. cb1 through cb3 end
up modifying some shared data (hey, no need for a lock, we are using
callbacks, remember!). Now there are two callback chains modifying shared
data.

Yes, you need locks and semaphores with the above, just as you do with
threads.

For example, this exists -- Twisted's own Semaphore:

[http://twistedmatrix.com/documents/10.1.0/api/twisted.internet.defer.DeferredSemaphore.html](http://twistedmatrix.com/documents/10.1.0/api/twisted.internet.defer.DeferredSemaphore.html)

I had to use it, and not just for throttling concurrency, but also to protect
critical data from being modified concurrently.

Asynchronous/callback/promise/future-based concurrency looks really good in
small examples. In large applications it gets messy quickly.

Threads/actors/goroutines etc. are still nicer from a logical, application
point of view. You can even build them on top of the same epoll/select/kqueue
system calls if the language can support some kind of coroutine structure
(which is what Python's gevent/eventlet do, for example).

~~~
AaronFriel
I am unsure how to map your reply to my concept of HyperDex's event loop (and
C API). You are of course correct, you have to synchronize access to some
object which is a pain point.

What I am doing in my implementation of a thread-safe wrapper around HyperDex
is similar to what you describe at the very end of your post.

~~~
rdtsc
> I am unsure how to map your reply to my concept of HyperDex's event loop
> (and C API).

I just generalized about "thread safety" and async programming. So it was
nothing specific to your particular code.

------
cronos
I believe you have a race in that code on the error channel. You can't
guarantee that the value you send on it will be received by the correct
receiver. If you have two concurrent transfers and one errors while the other
succeeds, the second one may receive an error intended for the first one.

~~~
zeeboo
Since the channels are unbuffered, there's no way to read from the error
channel until the loop has accepted your request for the transaction. Also,
since it won't accept another transaction request until it has sent an error
(it always sends an error value), you always read the error for your own
transaction.

Any buffering in the request channels would have the race you've described
though.

~~~
cronos
You're right. Thanks for the explanation.

------
stcredzero
I've come up with a way, using Clojure, to have a discrete game loop, but
still have multicore concurrency and parallelism. This gives one the
simplicity of a cooperative multitasking model, without having to do explicit
yields. It comes at the price of not being able to achieve full use of all of
a machine's cores in a single instance, but I don't think this is necessarily
a problem. There's this thing called an OS, which would let me achieve nearly
full utilization with only a few instances. And by a few, I really do mean
single digit numbers. For me, this is preferable, since it eliminates a single
point of failure. (To be fair, Erlang also eliminates this if one uses its
distribution capabilities intelligently.)

~~~
minikomi
Care to discuss this in more detail? I'm intrigued.

~~~
stcredzero
The world is divided into sub-grids, which are all processed in parallel,
minus movements that cross sub-grid boundaries. At the end of the parallel
processing, all of the sub-grids are reassembled into a world-grid (which
takes O(n*log32(n))). The world-grid acts as a "global" processor and
processes all remaining movements.

Basically, everything is parallel, except that which has to be processed
globally, which is just a tiny minority of everything. Then there is a non-
parallelized step that knits together and coordinates the global grid. If you
plotted parallelism over time, it would probably look like a saw tooth or a
square wave.

Right now, there are only two tiers, but this could be generalized into a
structure like an R-tree, which would probably make the algorithm cache
oblivious.

It's the "knitting" step that keeps it from utilizing 100% of the cores in
parallel with just one instance. I still hope to be able to support 1000's of
entities with just one instance, however.

~~~
minikomi
Very interesting, thank you.

------
lazyjones
The race detector is great for such issues. But first and foremost, people
ought to read the very enlightening presentations about Go's concurrency
primitives and conventions:

[http://talks.golang.org/2012/concurrency.slide#1](http://talks.golang.org/2012/concurrency.slide#1)

[http://talks.golang.org/2013/advconc.slide#1](http://talks.golang.org/2013/advconc.slide#1)

~~~
kyrra
For reference, here's an article explaining the race detector[0].

[0]
[http://golang.org/doc/articles/race_detector.html](http://golang.org/doc/articles/race_detector.html)

------
kylebrown
Adventurous web developers can try out goroutine-like concurrency patterns in
javascript, with ES6 generators. I've enjoyed experimenting with these two
libraries:

[https://github.com/ubolonton/js-csp](https://github.com/ubolonton/js-csp)

[https://github.com/odf/ceci-core](https://github.com/odf/ceci-core)

~~~
wcummings
[https://github.com/visionmedia/co](https://github.com/visionmedia/co)

------
jaekwon
This is insane.

Go provides you with locks for those cases where you actually want to use a
lock. And transactional balance logic absolutely screams for locks.

------
Intermernet
"The spaces between each command become endless voids of darkness from which
frightening Heisenbugs arise."

This just joined my favourite quote list.

------
redbad
The code is super awkward, and I'm pretty sure buggy; unfortunately it
illustrates that the author has only a superficial understanding of Go idioms
:(

~~~
nolok
Without an actual example of what he does wrong and how it should be done,
your post has very little value to a reader like me.

~~~
Jabbles
e.g.

sync.Once() instead of init()

unhelpful named return parameters

didn't use range over channels

~~~
namelezz
sync.Once() instead of init(): having the Do close to where the channels are
being used makes sure we do not run into a deadlock. It's a nice pattern for
lazy initialization of channels and goroutines too.

