
Concurrency in Python: CSP and Coroutines - yingw787
https://bytes.yingw787.com/posts/2019/02/09/concurrency_with_python_csp_and_coroutines/
======
deckarep
About 6 or 7 years ago I was searching for a language that had great
concurrency built in and a sane enough model of concurrency that would allow
me to get things done.

Eventually I found and fell in love with gevent because I could write fairly
straightforward code in a synchronous fashion and reason about the code a lot
easier than other models. But then I found some warts with gevent: what about
good community support? What about exploiting parallelism? Monkey patching 3rd
party libs is a bit ugly and not 100% without faults.

So imagine how I felt when I found that Go offers this CSP model of
concurrency, with support for parallelism thanks to its M:N threading model.
It also has a not-quite-but-close-enough Python-looking syntax, and at that
point I never looked back.

Sure, there are other models of concurrency and other languages that are also
proving to be useful. But Go has been a good bet...at least for now.

And I know this is controversial...but Go’s time is running out...other
languages are getting there with more advanced support for asynchronous
programming...so I imagine in due time I’ll be onto the next thing.

~~~
yingw787
Hey thanks for commenting and sharing!

My experience with golang was pretty short; this was back in the days of
`go get` and I just couldn't get used to the dependency management system. I
think others have had the same problem:
[https://bluxte.net/musings/2018/04/10/go-good-bad-ugly](https://bluxte.net/musings/2018/04/10/go-good-bad-ugly)

IMHO a language is a combination of different factors: the community, the
spec, the libraries, the support, the toolchain, etc., and the concurrency
model is only one small part of that. That may be even truer when your
concurrency model is CSP, because CSP allows you to write highly stateful
code and doesn't really enforce a whole lot, and because given time all user
code trends towards the properties of the language.

------
anaphor
Check out [https://www.pykka.org/en/latest/](https://www.pykka.org/en/latest/)
as well which is based on the actor model. My understanding is that it's used
by [https://www.mopidy.com/](https://www.mopidy.com/) and actively developed
by them.

~~~
yingw787
Hey thanks for commenting! I checked out pykka in a previous blog post about
actor models:
[https://bytes.yingw787.com/posts/2019/02/02/concurrency_with...](https://bytes.yingw787.com/posts/2019/02/02/concurrency_with_python_actor_models/)

I did a deep dive on thespian instead, I think it’s used at scale by GoDaddy.
One of the issues I ran into with both libraries is that there’s not much of
an OTP-like library built on top of them, which makes them difficult for
production usage.

~~~
anaphor
By "OTP-like" library, do you mean basically supervision trees and things like
that? I think one of the issues Pykka (not sure about Thespian) would run into
is that they're using python threads as one of the backends, and I can't see
how you could implement this safely on top of that.
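
For readers who haven't met the term, the core of a supervision strategy can be sketched roughly like this. It is only an illustration: it uses asyncio coroutines rather than the thread backend mentioned above, and `supervise` and `make_flaky_child` are hypothetical names, not part of Pykka, Thespian, or OTP.

```python
import asyncio


async def supervise(child_factory, max_restarts=3):
    """Run child_factory() and restart it on failure, up to max_restarts times.

    Returns a (result, restarts) tuple once the child finishes cleanly.
    """
    restarts = 0
    while True:
        try:
            return await child_factory(), restarts
        except Exception:
            # Give up and propagate the error once the restart budget is spent.
            if restarts >= max_restarts:
                raise
            restarts += 1


def make_flaky_child(failures):
    """Build a child coroutine function that crashes `failures` times, then succeeds."""
    state = {"left": failures}

    async def child():
        await asyncio.sleep(0)  # yield to the event loop, as real work would
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("child crashed")
        return "ok"

    return child


def run_supervisor_demo():
    # The child fails twice, so the supervisor restarts it twice.
    return asyncio.run(supervise(make_flaky_child(2)))
```

A real supervision tree would also nest supervisors and choose between restart strategies; this sketch only covers the "restart one child" case.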

~~~
yingw787
Yeah, some framework to encourage reactive design patterns.

At least for Thespian, it allows you to use different system bases like
multiprocTCPBase or multiprocUDPBase to run your actors on, which all inherit
from a common systemBase class, so you can swap out your base without changing
your user code. It's nice for testing b/c you can explicitly split debugging
overhead into different parts based on the systemBase you use.

But yeah, one of the difficulties of using these Python concurrency libraries
is that the language and runtime don't really help you out. You have to
retrofit a lot of your own stuff on the language to enforce certain properties
of the model.

------
ngrilly
Great article!

Reminds me of a discussion with Guido van Rossum about adding Go's goroutines
to Python:

[https://groups.google.com/d/topic/python-tulip/BO3KPIgQ_x4/d...](https://groups.google.com/d/topic/python-tulip/BO3KPIgQ_x4/discussion)

~~~
yingw787
Thanks for commenting and sharing! While I was writing this up I saw this
Google Groups discussion about the difference between Python's coroutines and
goroutines:
[https://groups.google.com/forum/#!topic/golang-nuts/Onswx7Fp...](https://groups.google.com/forum/#!topic/golang-nuts/Onswx7FpdxY)

Russ Cox also makes Guido's point of how channels are baked into golang's
runtime, whereas a Python channel would be implemented as an object on top of
the runtime. Very interesting to see how things similar at face value differ
underneath the hood.
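
To make the "channel as an object on top of the runtime" idea concrete, here is a minimal sketch, assuming a thin wrapper over `asyncio.Queue`. The `Channel` class and its `send`/`close` methods are hypothetical names for illustration, not an existing library API, and the close-by-sentinel trick only works for a single consumer.

```python
import asyncio


class Channel:
    """Hypothetical Go-like channel: an ordinary object wrapping asyncio.Queue."""

    _CLOSED = object()  # sentinel marking the end of the stream

    def __init__(self, capacity=1):
        self._queue = asyncio.Queue(maxsize=capacity)

    async def send(self, item):
        # Suspends the sender while the buffer is full.
        await self._queue.put(item)

    async def close(self):
        # Assumption: exactly one consumer, so one sentinel is enough.
        await self._queue.put(self._CLOSED)

    def __aiter__(self):
        return self

    async def __anext__(self):
        item = await self._queue.get()
        if item is self._CLOSED:
            raise StopAsyncIteration
        return item


async def producer(ch, n):
    for i in range(n):
        await ch.send(i)
    await ch.close()


async def consumer(ch):
    # Receive until the channel is closed, squaring each value.
    return [item * item async for item in ch]


async def main():
    ch = Channel(capacity=1)
    _, squares = await asyncio.gather(producer(ch, 5), consumer(ch))
    return squares


def run_demo():
    return asyncio.run(main())
```

Everything here is user-level code scheduled by the event loop, which is the contrast with Go, where the channel operations are part of the language and runtime.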

------
iso-8859-1
I don't understand the goal of this post; it seems like it is just a list of
short summaries of whatever libraries have CSP in their name. What problem
are you trying to solve?

Are you trying to number-crunch across many machines? In that case, I guess
MPI is standard, but how is CSP related to this?

If you're trying to concurrently download ten files, why not just use
asyncio/trio and async/await? You're mentioning await/async, but you're not
explaining how it is relevant to either MPI or CSP, or whatever this blog post
is about.
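
For reference, the asyncio approach to that case looks roughly like the sketch below. It is only a sketch: `fake_download` is a hypothetical stand-in that sleeps instead of touching the network, and the file names are made up.

```python
import asyncio


async def fake_download(name, delay=0.01):
    # Stand-in for a real network fetch; asyncio.sleep simulates the I/O wait.
    await asyncio.sleep(delay)
    return f"{name}: done"


async def download_all(names):
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_download(n) for n in names))


def run_downloads():
    names = [f"file{i}.bin" for i in range(10)]
    return asyncio.run(download_all(names))
```

All ten waits overlap on a single thread, so the whole batch takes about as long as one simulated download rather than ten.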

You claim that CSP helps "efficiently abstracting away dependencies on
underlying hardware", but how is this relevant to CSP? Any decent piece of
software is portable, it isn't a property of CSP. "CSP provides [abstraction
from hardware primitives] by multiplexing and scheduling coroutines on top of
a CPU thread pool". Well, which concurrency frameworks don't do that?

~~~
yingw787
Hey thanks for commenting!

This is mostly an academic exercise for me personally. I’m not familiar with
different concurrency models as I’m starting off in my career and don’t get a
whole lot of exposure to it in production. I was inspired by Paul Butcher’s
book on seven concurrency models in seven weeks, where he actually does solve
problems with different concurrency models in languages that have good support
for them. So I thought that if I could write down my thoughts and explain them
to other people, I would gain some sticky understanding myself.

I think Paul took a year or more in order to research and write his book, and
it shows. Conversely it shows how my posts are sort of link aggregations
because this series has taken about two months so far. Python’s also not
terribly great for CPU-bound concurrent work written in pure Python.

As to your points, I saw MPI and async await as primitive constructs on top of
which you could build something like CSP. Not all software is portable; for
example, if you want MKL acceleration in numpy, you have to use Intel CPUs.

The next blog post I’m writing is about hardware-based parallelism. IMHO this
is Python’s bread and butter, and I should be able to
demonstrate how Python comes in handy with code samples.

------
melan13
Great article. mpi4py is the ideal solution presented, but I am wondering how
hard the error tracking will be on Python since MPICH2 doesn't help much
natively.

~~~
yingw787
Hey thanks for commenting! Yeah I think with CPU bound work a lot of the
popular effective Python libraries are going to be mostly C-based with Python
bindings. I’m not sure what error handling between Python and C currently
looks like, but for Python and Java it is difficult to capture even logs
emitted on one side from the other, for various reasons.

I was reviewing gevent (greenlets plus event loop) but that didn’t make it
into the post, I’ll take another look at mpi4py!

