
Celery, non-blocking code and quest against coroutines - mdomans
http://blog.domanski.me/how-celery-fixed-pythons-gil-problem/
======
asksol
Author of Celery here (not of the article).

Celery solves a different problem: it's a distributed system that will help
you run your tasks on multiple machines, not an alternative to
asyncio/twisted/tornado etc.

The GIL is often easily worked around by starting N instances of your app, but
that doesn't work for all applications (games, video, audio). Celery won't
help you in those cases, but you can write C extensions. 99.99% of the time
you shouldn't even consider using posix threads in Python, as most libraries
(including popular ones) are not thread-safe, resulting in spending precious
time fixing tricky bugs.

Celery is also not at odds with coroutines; actually it's quite common to use
Celery as a distributed layer on top of async I/O frameworks (top tip:
routing CPU-bound tasks to a prefork worker means they will not block your
event loop).
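
As a sketch of that tip (the task names, queue names, and app module below
are hypothetical; `task_routes` and the `-Q`/`-P` worker flags are real
Celery options):

```python
# Sketch: route CPU-bound tasks to a queue consumed by a prefork worker,
# so they never run inside (and block) an event-loop-based worker.
# Task and queue names are made up for illustration.
task_routes = {
    'tasks.transcode_video': {'queue': 'cpu'},  # heavy computation
    'tasks.fetch_url':       {'queue': 'io'},   # network-bound work
}
```

You'd then run one worker per queue, e.g. `celery -A proj worker -Q cpu -P
prefork` and `celery -A proj worker -Q io -P gevent`.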

Another thing: the article says `scrape_url.subtask(args=(url,))` is not very
readable, but the idiomatic way to write this is `scrape_url.s(url)` (yup, we
have a one-letter method name; Django has Q, we have .s). See more examples
here:
[http://docs.celeryproject.org/en/latest/userguide/canvas.htm...](http://docs.celeryproject.org/en/latest/userguide/canvas.html)
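
For example, in a larger workflow the short form keeps the shape of the
pipeline visible (a sketch: `scrape_url` and `summarize` are hypothetical
tasks, while `chain`/`group`/`.s()` are Celery's canvas primitives):

```python
from celery import chain, group

# Sketch: scrape a set of URLs in parallel, then combine the results.
# scrape_url and summarize are hypothetical @app.task functions.
urls = ["http://example.com/a", "http://example.com/b"]
workflow = chain(
    group(scrape_url.s(url) for url in urls),  # fan out
    summarize.s(),                             # fan in
)
workflow.delay()  # compare with spelling out .subtask(args=(url,)) each time
```

This requires a configured Celery app and broker, so it's illustration only.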

~~~
mdomans
In all honesty, .subtask() is more readable to me than .s().

Also, I don't want Celery to replace asyncio and/or threading.

I was trying to point out that the programming model can be ported back to
Python as a way of approaching concurrency which, since it doesn't require
"locks everywhere", is simpler to implement with the GIL.

~~~
asksol
I think it's more readable in isolation, but in a workflow it distracts from
what the code is actually doing, making it harder to see the purpose at a
glance. An API probably can't have many shortcuts like this, but you learn
what .s does once, and then you barely notice it.

But that's my opinion, and I have come to learn that not everybody values
succinctness in code, so you have the choice of both :o)

Had I not been constrained by backward compatibility, I might have made it so
that task(arg) only defines the signature of a task invocation, and you'd
need task(arg).delay() to call it remotely and task(arg)() to call it as a
function locally.
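
A toy, Celery-free sketch of that hypothetical API (all names here are made
up to illustrate the idea, not Celery's actual interface):

```python
# Sketch: calling a task only builds a signature; .delay() would send it to
# a worker, and calling the signature runs it locally.
class Signature:
    def __init__(self, fn, args, kwargs):
        self.fn, self.args, self.kwargs = fn, args, kwargs

    def __call__(self):
        # local, synchronous execution
        return self.fn(*self.args, **self.kwargs)

    def delay(self):
        # stand-in for remote execution; a real system would serialize the
        # signature and publish it to a broker
        return self.fn(*self.args, **self.kwargs)


class task:
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *args, **kwargs):
        return Signature(self.fn, args, kwargs)


@task
def add(x, y):
    return x + y

sig = add(2, 3)   # only defines the invocation, runs nothing yet
print(sig())      # call locally -> prints 5
```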

~~~
mdomans
It's not merely my choice to make - working in a team actually makes
.subtask better.

------
garethrees
The article makes claims that are way too broad: "fixed Python's GIL problem",
"obsoletes GIL and coroutines". Maybe the claims are reasonable when
addressing the author's particular use case, but they are not true in general
and phrasing them in this way seems designed to provoke.

It's important to be clear about which use cases are addressed by a given
system. For example, if you have compute-bound tasks that operate on non-
shared data then it makes a lot of sense to distribute them via a
multiprocessing system like Celery.

But if you have I/O-bound tasks that operate on shared data then it doesn't —
implementing the necessary locking and communication is very hard. In this
case, coroutines are simple to program and reliable, and because you're
I/O-bound you're not losing anything by being single-threaded.

~~~
mdomans
Here's the deal: most tasks I've seen are both CPU- and I/O-bound, just in
different parts of the same task. Of course you can optimise the design very
aggressively, but sometimes it becomes so unreadable that the cost of the
added complexity makes it not worth the effort.

What I'm trying to advocate is that the distinction between CPU-bound and
I/O-bound makes sense in a CS classroom, but most tasks in production code
are a mix of both.

Therefore, the distinction is really between blocking and non-blocking, and
Celery enables "almost readable" non-blocking coding.

~~~
garethrees
Thinking about where the computation is bound is only one half of the space of
use cases: you also have to consider the shared/non-shared axis. There's a 2x2
matrix of task types:

    
    
                       non-shared       shared
        compute-bound  multiprocessing  ???
        I/O-bound      either           coroutines
    

I put ??? in the upper right because as far as I know Python doesn't have any
general solutions here.

Also, if "most tasks you've seen" are compute-bound at some points, then of
course it makes sense to use Celery! But you should bear in mind that other
people may have seen other kinds of task.

~~~
mdomans
Well, the point I'm making is that I don't see why we can't have a simple API
in Python to define work to do, e.g. access something over the web, without
making 30 decisions about the implementation details.

And it's not that I think Celery is the best. It has huge drawbacks.

I'm merely suggesting that the model Celery and GCD enforce is much better
for writing CPU-bound, I/O-bound and mixed CPU+I/O code.

~~~
garethrees
You're not addressing the point about shared data. How do I program my Celery
tasks to operate safely on shared data?

Well, I need a system that is responsible for maintaining the shared data, and
I have to program my tasks to send it queries and updates, and receive the
results, and so probably I'm going to need serializers and deserializers for
all my data structures, and if the tasks have transactions (which they almost
certainly do) then I'm going to have to implement locks to maintain the data
integrity. Whatever it is, this system looks a lot like a database.

Whereas with coroutines each task just updates ordinary data structures in
memory. No need for communication, queries, serialization, or locks.
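
That point can be shown with a small stdlib-only sketch: cooperative tasks
mutating one shared dict, with no locks at all (the counter example here is
mine, not from the article):

```python
import asyncio

# Sketch: cooperative tasks mutate a shared dict directly. Since there is
# only one thread and switches happen only at await points, the
# read-modify-write below cannot be interleaved, so no lock is needed.
counts = {}

async def record(key):
    counts[key] = counts.get(key, 0) + 1   # safe: no await in between
    await asyncio.sleep(0)                 # now yield to the event loop

async def main():
    await asyncio.gather(*(record("hits") for _ in range(1000)))

asyncio.run(main())
print(counts["hits"])   # -> 1000, with no locks or serialization
```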

------
forgottenpass
Design patterns are just that, design patterns: with upsides, downsides,
suitable use cases and unsuitable ones. I originally started going through
and pointing out all the nonsense that was outright wrong or just ruffled my
feathers, but this whole article is punching above its weight. It boils down
to the following:

The author doesn't like using concurrent programing techniques in a single-
threaded process.

The author does like a job queue that manages task execution across multiple
processes.

This has fuck all to do with the GIL, beyond the fact that the presence of
the GIL might have contributed to the popularity of the former in Python and
that celery uses multiprocessing over multithreading.

Designing a piece of software is more than cargo-culting the latest fads. Who
cares if asyncio exists? Yes, we "really need" it to the extent that people
want to design their programs like that. No, you don't "really need" to use
it.

~~~
markbnj
The title is off, I agree - it's not like Celery fixes CPython's problematic
data structures. I don't think it was a bad article, though.

~~~
forgottenpass
I wouldn't have minded reading someone's process of learning a new design
pattern if it weren't asserting so many things that are silly or flat-out
wrong along the way. That's why I originally wrote, "this is a bad article."
I edited that bit out because it detracts from my point unless I take the
time to go through and fisk the article.

~~~
mdomans
Well, I'm more about advertising the lockless programming pattern :)

Point out the silly, please - I like it when someone actually makes an honest
effort to teach me something.

~~~
scott_s
The title, mainly. Most readers will take that to mean that you fixed the GIL
itself - that is, removed it - rather than are presenting a pattern that
avoids it. Your title sets up an adversarial relationship with such readers,
because their attitude when they start is going to be "no, you didn't", which
will cause them to miss your actual message.

~~~
mdomans
Fair point.

------
sametmax
The TL;DR ("this explains how Celery obsoletes GIL and coroutines") is only
telling half of the story.

Celery solves some cases for Python:

\- background tasks;

\- task queues;

\- cron tasks.

And it's nice, I used it extensively. But in NO WAY does it totally solve the
GIL problem, nor does it render coroutines obsolete.

Here are some celery limits:

\- You need to calibrate the number of workers you have to match your
workload.

\- You are limited to x number of blocking operations, where x is the number
of workers. It means you cannot do massively parallel I/O such as network
operations with it (e.g. a web server).

\- Tasks don't have access to your main process memory, and vice versa.

\- Tasks cannot communicate with each other;

\- You must juggle the workflow of your tasks (is it ready? is it dead?).
With asyncio you can just use await stuff() with a try/except;

\- Celery is an additional process to setup and start, with backends to choose
from and tuning to do. It's a lot more work than just importing an async lib.

Granted, celery is a very useful piece of software, but not the silver bullet
this article depicts.

~~~
asksol
> \- You are limited to x number of blocking operations, where x is the
> number of workers. It means you cannot do massively parallel I/O such as
> network operations with it (e.g: a web server).

Celery can use eventlet/gevent instead of multiprocessing for executing
tasks, so this should be possible (granted, I'm not sure using it as a web
server is a great idea).

> \- Tasks cannot communicate with each other;

This is not true, they can send messages to each other

> \- You must juggle with the workflow of your tasks (is it ready? is it
> dead?). You can use await stuff() with a try/except;

If you have to juggle, it means your workflow design is not good enough. I.e.
you shouldn't wait for other tasks; you should have callbacks and errbacks.

Also, note that a try/except does not guarantee the operation will be
completed (e.g your asyncio app can be killed).
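
A sketch of that callback style (the tasks are hypothetical; `link` and
`link_error` are Celery's callback and errback options on `apply_async`):

```python
# Sketch: instead of waiting on the result, attach a callback and an
# errback. fetch_page, parse_page and report_failure are hypothetical
# @app.task functions.
fetch_page.apply_async(
    args=(url,),
    link=parse_page.s(),            # called with fetch_page's result
    link_error=report_failure.s(),  # called if fetch_page fails
)
```

As with any Celery example, this needs a configured app and broker to run.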

I'm not sure there's much point in comparing these, they are wildly different
concepts: Celery is a distributed system, asyncio is for async I/O, the GIL is
only a problem if you can't start n instances of your app.

~~~
sametmax
> This is not true, they can send messages to each other

This is a very limited type of communication. Asyncio with stuff like
crossbar.io allows pub/sub.

> I.e. you shouldn't wait for other tasks, you should have callbacks and
> errbacks.

This is the old way of doing async. Callbacks are hard to reason about,
compared to async/await.

> (e.g your asyncio app can be killed)

So can your celery workers. It happened to me many times. What's more, celery
setups can cause latency problems that raise errors as if something has died
when it hasn't, which is very hard to debug.

> I'm not sure there's much point in comparing these, they are wildly
> different concepts: Celery is a distributed system, asyncio is for async
> I/O, the GIL is only a problem if you can't start n instances of your app.

Well, the main point of the article is that celery is solving the GIL. It's
not: it's bypassing it, with very important benefits and drawbacks, and it
can be used nicely for a specific set of tasks. I just want to highlight
that.

Asyncio does help to live with the GIL since it unifies threads, coroutines
and multiprocessing under the same await/async interface. And asyncio can be
used to create distributed systems even better than a celery setup because of
this. Again, crossbar.io comes to mind.

~~~
asksol
> This is a very limited type of communication. Asyncio with stuff like
> crossbar.io allow pub/sub.

Celery also supports pub/sub, and other topologies.

>So can your celery workers. It happened to me many times.

With the major difference that your tasks can be redelivered to a different
worker, and so will complete anyway.

> Well, the main point of the article is that celery is solving the GIL.
> It's not, it's bypassing it

I was agreeing with you there, but I guess my reply was not clear on that. I
just wanted to point out some inaccuracies in your reply.

>coroutines and multiprocessing

Be careful using the multiprocessing module, it has some very serious bugs.
I've spent the last 4 years rewriting parts of it for Celery.

~~~
asksol
>Be careful using the multiprocessing module, it has some very serious bugs.
I've spent the last 4 years rewriting parts of it for Celery

I regretted this as soon as I submitted it. I would hate for someone to do the
same thing to my projects so I should know better. I've written about it
before, but realize that you probably have not read it :)

I really like the multiprocessing library, it helped me start Celery in the
first place. What it tries to solve is actually very very complicated, and you
would have to test it on production systems for years to be sure it works, and
I think Celery was the app that did that testing. I contributed some fixes
back into Python, but most of it is not merged upstream.

The most complicated issue I had to solve was that multiprocessing.Pool uses
POSIX semaphores to share pipes between processes (that's how the pool
processes receive jobs, and the parent receives results). If a child process is
killed before releasing that semaphore you have a deadlock that's tricky, if
not impossible to solve. So I rewrote the pool to use async I/O instead, which
also had the side effect of drastically improving performance (no locks).
Sadly I'm not sure how to implement that on Windows, so it's unlikely to be
merged upstream. Other fixes and features used by Celery are available in our
billiard (on PyPI) fork of multiprocessing, but the async pool is not part of
that yet as it currently depends on code in celery that does not fit in
billiard (it should be rewritten to use asyncio now).

You can claim to replace Celery using a small layer on top of async I/O, or
claim to replace Celery with a simple Redis list operation, but I think that's
unfair to all the work that went into Celery, and the other features Celery
implements like monitoring, workflows, and a large list of other things that
you don't immediately think of when starting a project. It keeps a repository
of these patterns for the Python community, and even something like
crossbar.io could be supported as a transport.

~~~
sametmax
I would never claim that you can use "a small layer on top of asyncio" to
replace celery. I've read the celery codebase; it's very, very thorough.
Also, in the future I may even try to integrate celery into the asyncio event
loop so I don't have to start a separate process.

------
zzzeek
> this explains how Celery _obsoletes_ [emphasis mine] GIL and coroutines

> Celery is a whole project and there's tons of coordination before you fire
> your first task - it's true.

well that's why it's not really something that "obsoletes" threading (or your
fancy-pants you-kids-today concurrency system of choice). Geez can people no
longer write without hyperbole as a default position?

~~~
coldtea
> _Geez can people no longer write without hyperbole as a default position?_

Oh, the irony.

(One or a few articles are hyperbolic != "people can no longer write without
hyperbole as the default")

~~~
JustSomeNobody
One or a few? Come on. Most articles today are this way. And programming
today is hyperbole-induced cargo-cultism, if it's anything at all.

------
wacowz
Every time I read an article like this, all I can think is, this is what
Elixir and Erlang are built for. I love Python and use it on a daily basis,
but developing concurrent systems in Python is beginning to look like a
questionable choice, while there exists tools better suited for this. Celery
works great, and the integration with django is superb. However, it's
consistently been a memory hog for us. Hell, we had to write a shell script to
regularly check for memory use by celery and take necessary precautionary
steps.

~~~
brianwawok
What level of concurrency is required though? 99.9% of web requests only need
concurrency at the user level, which standard Python + fork will do just
nicely.

Making a realtime system to crunch some massive numbers in 50 micros? Then you
might want some C or Java concurrency. But much of the world is not coded at
that level.

~~~
wacowz
It's often the case that some work needs to be done outside of the
request/response lifecycle, e.g. external API calls, data aggregation,
sending notifications etc. Most of it is essentially IO-bound. We happen to
have lots of such tasks and therefore use celery, but there's only so much
celery can do, given its concurrency level is bound by the number of cores
available since it uses multiprocessing. An actor-based approach or something
like Go's goroutines would be a better fit.

------
sciurus
Celery boils down to a nice abstraction over forking another process to do
your work. You still have the scalability problems that might make you want to
write concurrent single-threaded code, just at a different layer of your
stack.

Take the author's example of scraping 1000 webpages. Let's say their computer
running celery has enough memory to run 50 celery processes. This means they
can request 50 web pages concurrently. Most of the time those celery processes
are going to be sitting idle, waiting on the remote web host to respond. This
is terribly inefficient compared to coroutines with asyncio/aiohttp, or even
using a thread pool since urllib will release the GIL and let another thread
run when it's blocked on network i/o. You could have a single process
performing the work just as fast as 50 celery processes.

~~~
asksol
Celery can use async I/O too! Using the gevent pool you can have e.g. 4 celery
worker instances with 1000 threads each.
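
The worker invocation for that setup might look like this ("proj" is a
placeholder app module; `-P` selects the pool implementation and `-c` the
concurrency):

```shell
# One worker process running a gevent pool of 1000 green threads (sketch).
celery -A proj worker -P gevent -c 1000
```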

~~~
sciurus
Right, but as you mention elsewhere in the thread there are caveats, like
those discussed at
[http://docs.celeryproject.org/en/latest/userguide/concurrenc...](http://docs.celeryproject.org/en/latest/userguide/concurrency/eventlet.html#concurrency-eventlet).
At this point the blog post author's contention that celery is a better
alternative to doing async I/O breaks down.

------
denom
Celery's pretty nice. I've used it in several projects with RabbitMQ and it's
useful for IO-bound workloads. The main drawback is the overhead of
dispatching the job and waiting for the result; with threads, there's no
overhead of starting up a separate process. That said, I'd recommend celery
to anyone where latency is not a problem.

~~~
kerkeslager
Could you explain what you mean by "The main drawback with it is the overhead
of dispatching the job and waiting for the result."?

~~~
denom
Well, there are a couple of drawbacks. The author is describing celery as a
solution to the GIL thread-lock problem. However, the time it takes to
dispatch the job[1] and fire up the worker is much greater than the time it
takes to create a thread. So while celery is _a_ solution to the problem,
it's not an equivalent solution. Just something to be aware of.

The type of computation you're modeling is limited by the async/dispatched
nature of celery. Threads can communicate data structures to each other. With
celery, that's awkward with the latency and async semantics.

[1] I've used rabbitmq to dispatch jobs over to celery in the past; here's a
good overview of latency in that process. TLDR: requests that hit the wire
are on the order of milliseconds:
[https://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory...](https://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/)

~~~
kerkeslager
I guess I'm just not sure what people are expecting here. The point of Celery
_isn't_ to parallelize and synchronize, it's to fire off a task, probably on
an entirely different machine, and forget about it. If you're trying to sync
up afterward, you're going to run into problems because it's not designed for
that.

------
omni
> In the last 10 years I've never met a case where someone pointed out that
> "this code needs to be concurrent, but single threaded".

I wish I could have slept through the circa 2012 Node.js hype cycle like this
person clearly did.

~~~
omni
To follow up on my own snark:

> Do people use coroutines? Yes, but not in production code.

Now I'm wondering if I'm the one who's confused. Either the author has also
never heard of Go or goroutines don't meet his/her definition of coroutines.

~~~
weberc2
I'm guessing it's the latter, and perhaps not without cause. Traditionally,
coroutines require the programmer to explicitly yield. With Go, you don't have
to think about that much; you can pretty much treat them as lightweight
threads and trust the scheduler to do the right thing. Goroutines aren't
pre-empted, but the yield points aren't explicit (though they are well
defined and thus easy to reason about).

~~~
jerf
"Traditionally, coroutines require the programmer to explicitly yield."

When terms move from academia into common usage, they inevitably blur.
Academia had a specific definition of coroutine that includes that.
Programmers now use "coroutines" in a way that I have a hard time
distinguishing from "threads", Pythonic "generators", and in the worst cases
"something vaguely concurrent". Any given author may have a clear idea of what
they mean (and in particular, I'm not making any specific accusations about
the author of this piece) but the term is rapidly approaching uselessness in
the general programming community.

------
ploxiln
> Do people use coroutines? Yes, but not in production code.

Are you serious?

Well, first, surely in all this talk of cooperative threading, you also
include callback-oriented event-based programming. This is HUGE.

Have you heard of nginx?

But also, I've personally worked on very high volume api servicing and
analytics, largely written in python using tornado (which is very similar to
twisted) for a very popular web service: Bitly. There were also a couple
significant components written in c using libevent, another similar event-ed
i/o mini-framework, involving lots of callbacks.

It can be done right and be readable and debug-able, and it can be super
efficient, more efficient than spawning threads for hundreds or thousands of
parallel i/o tasks (or managing a thread pool or forking processes or
whatever).

It's not for everything but it's definitely in serious production
environments. The more serious, the more likely to use event-ed i/o.

~~~
mdomans
Cooperative concurrency and event-based programming are different concepts.

You need events for the coroutines, since the scheduler needs to base its
trampolining on something.

Alas, you can do event based programming without coroutines.

Yes, I used libevent and libev and I do love nginx :)

------
cyberpanther
Celery does solve the GIL problem, and in some use cases it's better suited
even if the GIL weren't a problem. However, it is an additional "system" that
you have to add to a code base. If you're writing a small program that is
easily distributed and needs "non-blocking" features, Celery brings a whole
host of extra requirements. This becomes additionally problematic for Python
as other languages build these "non-blocking" features into the language
itself: there you can implement these things in a few lines of code, whereas
with Python you must learn a new framework and have a complicated install to
use a simple feature.

If Python wants to compete with other languages in the future it will have to
keep up with these "non-blocking" features without a huge system overhead
expense.

I see this same problem with Django Channels which lets you implement web
sockets and such with Django. Django Channels relies on Redis and solves more
problems than just web sockets. However, if I just want a web socket, doing it
in Django Channels produces too much overhead. Writing a web socket in Tornado
can be done in a few lines of code without the extra "system" overhead.
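
For illustration, a minimal Tornado websocket echo server might look like
this (a sketch assuming Tornado is installed; the handler name and port are
arbitrary):

```python
# Sketch of a minimal websocket echo server in Tornado. No extra broker or
# backend "system" is involved.
import tornado.ioloop
import tornado.web
import tornado.websocket

class EchoSocket(tornado.websocket.WebSocketHandler):
    def on_message(self, message):
        self.write_message("echo: " + message)

app = tornado.web.Application([(r"/ws", EchoSocket)])
app.listen(8888)
tornado.ioloop.IOLoop.current().start()   # blocks, serving websockets
```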

------
TheGuyWhoCodes
Celery is great and works well but it definitely doesn't fix the GIL problem.

Having shared memory is a must for some workloads, or can save you a ton of
money, especially when talking about caching. Sometimes caching per process
isn't feasible but caching for all threads is, and even multiprocessing isn't
good enough, because you can only share the main process's memory if it's
simple arrays.

~~~
mdomans
Talking about shared-memory benefits and overheads in Python is a pretty
abstract problem, if you account for how many objects the Python interpreter
creates and what the OS does under certain load levels.

I'm merely advocating a different model for concurrency in Python :)

------
dikaiosune
I recently opted to use Celery for a similar-ish task to the author's
update_metrics job. Having generally been OK with Python's single-threaded
performance and simplicity, I was not particularly pleased with a solution
that involved an additional backend service (RabbitMQ), separate logfiles,
fighting with the Python module system during setup, and ultimately a lot less
available documentation than for items which live in the standard library. All
of this just to kick off a background process. I haven't had my coffee yet and
am feeling a bit grumpy this morning, but I'm really not sure I would be
willing to use it again for a project where I had to manage the code and the
deployment. Regardless of my mood, celery definitely isn't a panacea for these
problems -- it just happens to ameliorate ones where there's already a larger
deployment to manage.

------
MollyR
It's an interesting use case the author solved with celery.

I've used celery with a few django and flask projects. It's nice, but it added
a lot of overhead for me. So I've used
[http://flower.readthedocs.io/en/latest/](http://flower.readthedocs.io/en/latest/)
to help handle the monitoring.

For some really small sites and very specific use cases, I actually wrote some
small custom helper scripts to just have cron jobs.

If you are using django, I found this library helpful.
[https://bitbucket.org/wnielson/django-
chronograph/](https://bitbucket.org/wnielson/django-chronograph/)

~~~
brianwawok
The biggest overhead is really setting up RabbitMQ right? Because Django and
Celery play nice, you can just deploy one codebase and run 1 Django process
and 1 Celery + Celery Beat process pretty easy...

Bummer you lose so many Celery features when you use Amazon or Google Cloud
Queues instead of RabbitMQ....

~~~
mdomans
AFAIK you can get RabbitMQ as SaaS these days.

And yes, RabbitMQ is very good.

~~~
brianwawok
cloudamqp? I guess, not terribly cheap though.

~~~
mdomans
I think there are a few other options. But deploying RabbitMQ isn't that hard.
I can actually write something about that.

Is Linode cheap enough for you?

~~~
brianwawok
I mean last time I needed RabbitMQ I just used

[https://github.com/Mayeu/ansible-playbook-
rabbitmq](https://github.com/Mayeu/ansible-playbook-rabbitmq)

and it took 3 minutes. But I hear a lot of people saying not to use Celery
because RabbitMQ is hard... maybe people just have different standards for
hard?

------
methodover
Celery is neat. Something that sort of bothers me with it though is -- and
this is maybe totally something that is my fault, I just can't figure it out
-- sometimes workers will just stop processing things. For unknown reasons.
I've set up some alerts to automatically slack me when this happens, so I can
go in and see why they're stuck and restart them. Still haven't been able to
figure it out. Happens maybe once every month or two.

~~~
harel
Put a hard time out on the worker and it will be killed and restarted if
hanging for X period of time.
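
A sketch of that (the task and app are hypothetical; `time_limit` and
`soft_time_limit` are Celery's actual task options):

```python
# Sketch: after soft_time_limit seconds the task receives
# SoftTimeLimitExceeded (a chance to clean up); after time_limit seconds
# the worker child is killed outright and replaced with a fresh one.
@app.task(soft_time_limit=240, time_limit=300)
def send_email(recipient):
    ...
```

The same can also be set worker-wide with the `--time-limit` command-line
option.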

~~~
methodover
Yeah, I've thought about it. I guess I've been holding out hope that I could
figure out the underlying cause. What's weird is that it always affects just
one type of worker - the workers that handle sending email. And it happens on
all workers of that type simultaneously, on all servers. Just that type.

Meh. I should just put a harakiri timeout like you suggest.

------
aduitsis
I have used Gearman for a long time and I like it a lot, very simple to use,
lightweight, does its job and doesn't get in the way. It's one of the few
cases where something literally just works.

If one would just want a job queue to fanout jobs in a network transparent
fashion, would moving to Celery provide any advantages? With regard to
Celery's broker, which is most preferable, Redis or RabbitMQ?

~~~
mdomans
RabbitMQ - it's simply better. I could elaborate, but it'd take an article of
its own.

Unfortunately I never really used Gearman in a high-load environment, so I
can't comment on the advantages of Celery over Gearman.

------
fideloper
So if I'm reading this correctly, the author is using a queue system (Celery,
presumably with a backend such as RabbitMQ?) to run background jobs, but then
waiting for them all to complete before finishing a task.

In effect, you sorta get concurrency (assuming multiple workers are processing
jobs).

Or does Celery also provide a way to do threading without using a backend
service (e.g RabbitMQ, beanstalkd, sqs)?

Basically, I'm confused.

~~~
brianwawok
You could totally do concurrency through Redis or something super lightweight
right on your webserver... could make sense. It really seems better to throw
your task back to the backend pool, though, and let the web request finish.

------
seomis
Why is it that Python's single-threadedness is always singled out over every
other language in the general family? Ruby, PHP, Perl, and JS are all single-
threaded (Perl ithreads and JRuby/Jython caveats notwithstanding). Celery
itself is just a framework for fork management, something that has been
available to these languages since their inception.

~~~
nawitus
JavaScript's "logic" may be single-threaded, but its asynchronous event loop
solves a lot of problems that a single-threaded synchronous model (like what
Python has by default) has.

~~~
seomis
Asynchronous event loops aren't a core part of JavaScript: they come from the
DOM API and the Node standard library. Though it is significant that nothing
comparable exists for Python that has reached the same robustness and
support.

------
btilly
There is opinionated, and then there is stupidly opinionated. This article
crosses that line then stays there. It offers one useful point, but that one
is hardly news.

For example we have claims like this, _Do people use coroutines? Yes, but not
in production code._ This claim is trivially false. There is lots of
production code that has the word "yield" in it. Heck, plenty of production
code uses Tornado, which at its heart is nothing BUT coroutines. And most
programmers have no trouble reading it.

Moving on we see, _Also, cooperative concurrency aka coroutines was abandoned
a fair amount of time ago. I wonder why?_ I'm sure that people who use Redis
and Node.js would be surprised to discover that cooperative concurrency has no
place in our modern software world.

He even concludes that people don't need concurrency. Which is obviously
ridiculous - just look at any distributed system.

But let's go back to why people move from cooperative concurrency to
preemptive. The answer will make it clear why we'll always have cooperative
concurrency in the programming mix.

Cooperative concurrency means that you don't need locks and don't have race
conditions. However one poorly programmed routine can block your whole system.
Preemptive concurrency means that a bad routine can no longer block your
whole system, but now you have opened the door to complex race conditions.
Programmers are really, really bad at understanding race conditions. So bad
that almost all of our software has them. So bad that people who write
automated code verification tools have learned the hard way to not report on
them because programmers won't understand what you are talking about. (Sadly
true. In [http://web.stanford.edu/~engler/BLOC-coverity.pdf](http://web.stanford.edu/~engler/BLOC-coverity.pdf) look for the
phrase, "...for many years we gave up on checkers that flagged concurrency
errors.")

Therefore if you want concurrency with no locking overhead, and no race
conditions, then coroutines are great. But once your application becomes big
and messy enough, it becomes better to accept the problems that come with
preemption.
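The tradeoff is easy to demonstrate. Here's a sketch using modern Python's asyncio (the two tasks are hypothetical, not anyone's production code): one routine that forgets to yield stalls every other coroutine on the loop, which is exactly the failure mode cooperative concurrency trades for its lock-freedom.

```python
import asyncio
import time

async def well_behaved():
    # Cooperative: awaiting hands control back to the event loop,
    # so other coroutines make progress while this one "waits".
    await asyncio.sleep(0.1)

async def badly_behaved():
    # One poorly programmed routine: a synchronous call never yields,
    # so it blocks every other coroutine on the loop until it returns.
    time.sleep(0.1)

async def main():
    t0 = time.monotonic()
    await asyncio.gather(*(well_behaved() for _ in range(3)))
    cooperative = time.monotonic() - t0   # ~0.1s: the three sleeps overlap

    t0 = time.monotonic()
    await asyncio.gather(*(badly_behaved() for _ in range(3)))
    blocking = time.monotonic() - t0      # ~0.3s: each blocks the loop in turn
    return cooperative, blocking

cooperative, blocking = asyncio.run(main())
```

Note there are no locks anywhere in the first version, and no race conditions either: that's the upside the paragraph above describes. The downside is the second version.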

Now if you're going to preempt, there is a ton of advice to be offered on how
to do it well. For example you'll generally have fewer problems with coarse-
grained processes than with threads. Queues and worker pools are a great
solution for some types of problem, and I first did that about 15 years ago.
It was hardly an original idea when I first did it - it was suggested to me by
someone who had been using that technique since the 1980s. And it wasn't
original with him either.
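A minimal sketch of that queue-and-worker-pool pattern in Python, assuming nothing beyond the standard library: the parent process feeds jobs to a pool of worker processes, and because each worker is its own interpreter with its own GIL, CPU-bound work runs in parallel.

```python
from multiprocessing import Pool

def transform(n):
    # Stand-in for a CPU-bound job. Each worker is a separate process
    # with its own interpreter and its own GIL, so jobs run in
    # parallel; the parent just feeds the queue and collects results.
    return n * n

def run_pool(items, workers=4):
    # The pool is the "worker pool": N processes drain the job queue.
    with Pool(processes=workers) as pool:
        return pool.map(transform, items)

if __name__ == "__main__":
    print(run_pool(range(10)))
```

This is essentially what Celery does too, just with the queue living in a broker so the workers can be on other machines.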

However there are a _ton_ of use cases that this doesn't cover. For example it
doesn't do much to help you handle high volumes of traffic to your webserver,
and it won't let you create a distributed data store for low-latency access to
terabytes of data. So put Celery in your tool bag, know what it is good for,
and don't make the mistake of thinking that it solves all problems.

~~~
mdomans
I personally quite dislike Redis, which is highly unconcurrent. And don't get
me started on Node.js.

Of course people do use coroutines - I simply don't think they're a good idea.
I think they're a bad idea. They're neither more readable nor do they deliver
better performance.

And yes, I really don't think people think they need concurrency - they just
want this_A and this_B to happen ASAP and with no blocking.

On your explanation of coroutines and preemptive - I agree, though I consider
both equally bad.

The point I was making is that Celery runs lockless - and that'd be an
interesting idea to backport to Python.

And I know queues are an old idea. What I like about them is the ability to
get rid of "locks everywhere". An example of what I consider a good
concurrency model is Grand Central Dispatch, with atomic ops running via queues.

~~~
btilly
This is your response to having some of your glaring mistakes pointed out? You
ignore having been proven wrong, then proceed to state your personal opinions
as if people should care?

I have news for you. Redis and Node.js are widely used because they are able
to solve real problems that real people have. They work in practice despite
your personal opinions to the contrary.

Moving on, you're dead wrong about concurrency. How many programs are running
at once on your computer? While you are loading a web page, do you want your
browser to continue to be able to respond to you? Those are two simple cases
where people want concurrency. Those are also two cases that Celery doesn't
handle. At one point they would have been written with cooperative
multitasking. Today these are generally written with preemptive multitasking instead. (Though
there are some exceptions. PuTTY is a fun one. That implemented coroutines in
C with an impressive preprocessor hack. See
[http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html](http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html)
for details.)

And back to Celery: things that don't require locking can and should get by
without locking. This is a useful observation, but hardly a universally
applicable one. And waving the magic Celery wand doesn't change that. If I
have 500 files to process, and that data needs to wind up nicely summarized in
a database, Celery only lets me get rid of locking in my code because I
rely on locking being implemented in the database.

If you care about data and race conditions, at some point you have to solve
hard problems. And you'll need to lock. But only do it when you need to.

~~~
mdomans
I disagree with you, and here's why:

My opinions were formed after trying both Redis and Node.js and finding
them in a poor state. You can read about Redis's problems here:
[http://redis.io/topics/latency](http://redis.io/topics/latency) and here
[http://redis.io/topics/transactions](http://redis.io/topics/transactions)

Node.js is better in regard to handling high load. And the concept of having
the heavy lifting happen mostly away from the programmer is very sensible. I
only find the implementation lacking, and a fair number of people now seem to
be migrating away from Node.js.

In regard to your point about two programs running "concurrently" - that's
actually multitasking, a feature of the OS, not the programming language. And
if your OS doesn't support it, you can't write a web browser that lets the
email client keep working concurrently.

Concurrency only means that two different things seem to be happening at once.

Cooperative multitasking and cooperative concurrency are two different things.
Cooperative multitasking means that if your browser is poorly written and
blocks - your whole machine blocks.

On the other hand cooperative concurrency is a paradigm under which if your
coroutine blocks, your program blocks.

The argument over whether to choose threads or coroutines is an argument over
whether it is better to risk blocking or race conditions.

So, having traversed "why I'm wrong", let's consider if there are better
models.

Arguably the best language in the class of highly concurrent ones is Erlang.
Erlang for years dominated the space of stable concurrency.

And how does it work? Message passing. You can still deadlock an Erlang
program - it's just much harder.

Another good example I like is Grand Central Dispatch and its underlying
library, libdispatch.

The whole point there is to have blocks or operations that, and this is
guaranteed by the programmer, mutate the state of memory in an atomic way.

Both of those examples are really about having a queue of changes and applying
them. And you could, quite sensibly, argue that a mutex is nothing more
internally than an ordered queue, so why bother.

Well, my problem is really with the API a language gives to the programmer.
The point I'm trying to make is that burdening the programmer with how his
work is executed is wrong.

If we can avoid locks inside the Python interpreter - and we can, by having a
different syntax to describe what needs to happen - I'd consider it a much
better solution.

It's much easier to say: "drive me to the Station" than to drive there.

~~~
btilly
I am not sure what you tried. However, I have found use cases that
both Redis and Node.js were great for. And I'm well aware of how to make both
suck. Every programming tool has limitations. Use them within the right
domains, and they are likely to work well. Use them for something they are
poorly suited to, and your experience is likely to be horrible.

That is why Redis talks honestly about what the potential problems and
strengths are. Those aren't problems in the right use case, but they are
things you need to know to help figure out if you have the right use case.

Moving on, cooperative multitasking is a form of cooperative concurrency. It
has all of the same strengths and weaknesses as cooperative concurrency in any
other setting. Hence a single badly coded network call could freeze up Mac OS
9, or Win 3.1. That said if every application is coded correctly, then you can
do a lot of things concurrently. I remember using email, Usenet and a browser
at the same time on Mac OS 9, and it worked fine... most of the time.

Once the system gets complex enough, the fact that any mistake can lock
everything becomes unacceptable. This is why operating systems do not
generally use cooperative multitasking. An individual application may
reasonably choose to go either way. But as they get more complex, there is
pressure to go preemptive.

Moving on, you only have half the story for Erlang. Erlang uses message
passing AND immutable memory. The fact that you can't modify memory in place
is a huge limitation on the programmer. However it also eliminates large
classes of race conditions.

If Python made all data immutable, it could also get rid of most of the uses
of the GIL.

For a less extreme approach, look at Go. In Go data is mutable. But by
convention, you pass objects around using channels, and only one goroutine
owns an object at a time. Since only the owner accesses it, that eliminates
most race conditions. However if you violate the convention, a Go program
can dump core very quickly!
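That ownership convention can be sketched in Python with a plain queue standing in for a Go channel (the names here are illustrative, not Go's or Python's channel API): only the owning thread ever touches the state; everyone else sends messages.

```python
import threading
import queue

def owner(inbox, outbox):
    # Only this thread ever touches `state`. Other threads never share
    # it; they pass values over the "channel" (a queue), Go-style, and
    # ownership of the final result is handed back the same way.
    state = []
    while True:
        msg = inbox.get()
        if msg is None:            # sentinel: hand the state back
            outbox.put(state)
            return
        state.append(msg * 2)      # mutation happens only here

inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=owner, args=(inbox, outbox))
worker.start()
for i in range(5):
    inbox.put(i)                   # "send on the channel"
inbox.put(None)
worker.join()
final = outbox.get()
print(final)  # [0, 2, 4, 6, 8]
```

As with Go, nothing in the language enforces the convention: a second thread could reach into `state` directly, and then the races come back.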

~~~
mdomans
Eventually we come to a point where I agree with you, almost.

One exception is Redis/Node.js - I don't like them because, while they do
indeed work in certain cases, that set of cases is much smaller than what the
authors of Redis/Node.js advertise.

For example, I very fondly remember antirez being adamant that Redis is
awesome as a cache, yet it took me 10 minutes to show that a single Redis
instance performs worse than a single Memcached instance. And no, I don't
consider "run more Redis instances" a valid answer. You design software to
have features that solve problems, not to replace old problems with new ones.

And yes, Erlang uses message passing and has locks and immutable memory.

I prefer the Go approach of mutating through channels/queues. Both Go and GCD
in C/ObjC are good examples of this approach.

~~~
btilly
If your use case says that it is OK for data to disappear on you at random,
then Memcached is great. Otherwise it sucks. If the amount of data to cache is
big enough to use up available RAM, then Redis is going to suck. If your use
case involves complex data structures populated over here that you want
replicated there, then Redis might really be good for you.

Know what each tool is good for, and don't believe unwarranted hype.

And yes, coroutines have their place. Python without generators would be a
much weaker language.

~~~
mdomans
It's not hype, really - it's me getting fed up with authors arguing that their
software solves every problem. Redis is very good at being a data-structures
store.

In regard to Python - generators make sense; coroutines, much less so. That's
my opinion, but over the span of 10 years most programmers I've met
professionally either never used coroutines outside of pet projects or used
them sparingly.

I'd actually like to see a big project with coroutines implemented using
async/await.

~~~
btilly
I disbelieve.

Take working in Tornado. I've been writing in Tornado, and all I've needed to
do is decorate functions with @gen.coroutine then use generator syntax. All
the detailed plumbing to implement the event loop is done for me, and I never
interact with it directly. Nor would there be any value in my doing so. It
needs to be implemented once, and implemented well. Once that is done there is
little to no value in messing with it. You'd just be creating the opportunity
for disaster with little to no corresponding value.
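For the curious, the plumbing being hidden is roughly this (a toy trampoline, not Tornado's actual implementation): a runner drives the generator, resolves whatever was yielded, and sends the result back in, so the decorated function reads like straight-line code.

```python
# A toy version of the runner behind a @gen.coroutine-style decorator.

def run(gen_fn, *args):
    gen = gen_fn(*args)
    result = None
    while True:
        try:
            yielded = gen.send(result)
        except StopIteration as stop:
            return stop.value      # the coroutine's return value
        # A real event loop would schedule a callback and wait for the
        # yielded future; this sketch "resolves" the thunk immediately.
        result = yielded()

def fetch(url):
    # Hypothetical stand-in for an async HTTP call: returns a thunk
    # the runner will resolve.
    return lambda: "body of " + url

def handler():
    # Reads top-to-bottom, but each `yield` is a suspension point.
    first = yield fetch("http://example.com")
    second = yield fetch("http://example.org")
    return [first, second]

pages = run(handler)
print(pages)
```

The point stands: this machinery needs to be written once and written well; application code should only ever see the `handler`-shaped side of it.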

I actually experienced this exact disaster. In this project, Tornado has to
interact with Redis. So I used tornado-redis, which tried to implement the
protocol natively on the event loop. Unfortunately it was not well
implemented, and one coroutine could
easily get a message meant for another. Untangling the mess promised to be a
lot of work.

However it was literally the work of minutes to switch my class to the
synchronous Redis library, write a ThreadPoolExecutor wrapper for my class,
and have Tornado interact with that. It would be slightly better if the native
version had worked, but this got 95% of the benefit for under 1% of the work.
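The same wrapper pattern, sketched with stdlib asyncio rather than Tornado (`BlockingClient` is a stand-in for the synchronous Redis library, not a real API): blocking calls run on a thread pool, so the event loop itself is never blocked.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

class BlockingClient:
    """Hypothetical stand-in for a synchronous library (e.g. a blocking
    Redis client); sleeps to simulate network I/O."""
    def get(self, key):
        time.sleep(0.05)
        return "value:" + key

class AsyncWrapper:
    """Run the blocking client's calls on a thread pool so the event
    loop stays responsive while the calls are in flight."""
    def __init__(self, client, max_workers=4):
        self._client = client
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    async def get(self, key):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._pool, self._client.get, key)

async def main():
    wrapper = AsyncWrapper(BlockingClient())
    # Four blocking calls overlap on the pool instead of serializing.
    return await asyncio.gather(*(wrapper.get(str(i)) for i in range(4)))

values = asyncio.run(main())
```

The wrapper is a few lines, the blocking library stays untouched, and the only cost is a handful of threads: the "95% of the benefit for 1% of the work" tradeoff described above.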

You could describe this as "using coroutines sparingly". I'd describe it as
using them sensibly. Coroutines are a useful bit of plumbing to enable a
programming style, but their flexibility is also a burden. You should use them
to build a sensible abstraction, then use that.

For another example of this, look at Scheme. Scheme internally implements
continuations, and function calls are just a special case of that.
Continuations are, of course, a building block for coroutines, generators, and
many other programming constructs.

However a sensible Scheme programmer just implements functions. They may be
using a project that somewhere does coroutines, continuations, and all sorts
of other fun stuff. But the day to day code you write _doesn't_ do that.
There is no need to add the mental overhead from using the construct all of
the time "just because it is there".

~~~
mdomans
To a certain (high) degree I agree.

First of all, when I wrote this article, by coroutines I meant specifically
Python's async/await pattern, which I don't find easy to use or read.

And I agree on Tornado. Over the years I've seen many APIs that abstract away
the implementation details of cooperation yet are in fact coroutines -
e.g. goroutines in Go, tasklets in Stackless. Even Erlang's processes are, at
the VM level, cooperatively scheduled green threads.

For me there are 3 problems when you design concurrency APIs. One is
performance and language internals. The second is the API that you expose to
the end programmer. The third is what you can achieve as a programmer using
those APIs, based on how deeply concurrency is integrated into the language.

All the examples of higher-level platforms I know that are good for highly
concurrent apps - Erlang, Go, Node.js, GCD - had concurrency designed in and
thoroughly integrated into the language.

In that context threading in Python really feels bolted on.

And for something different: seeing how some people reacted, I'm diving deep
to write a 2-3 article tour-de-concurrency. What do you think?

------
abotsis
To all the haters: the GIL is just a constraint that can be worked around.
Cleverly starting more than one interpreter is a solid way to do that.
If you can't think of a way to do that, then maybe you shouldn't be working on
distributed systems.

For example, I wrote a high-speed distributed database load tool, a perfect
Python use-case. It did some transforms, but nothing heavy. The database
(Vertica) could generally keep up. For large files (multiple TB) I'd divide
the file into a number of chunks and each interpreter would just seek into an
offset, read to EOL, then load until it reached the next interpreter's offset.
I built it in a few hours.

Just trying to say that there are very few real-world workloads (where Python
is extremely useful) in which the GIL really gets in the way. It certainly
doesn't offset the time savings and ease of developing a solution in Python.

~~~
dang
> _To all the haters_

This is a form of name-calling, which the HN guidelines ask you not to do in
arguments. And the "if you can't think of... maybe you shouldn't be" is also a
jab. Please don't do these things in HN comments. They add no information,
just provocation. Consider how much better your otherwise fine comment would
be without them.

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

[https://news.ycombinator.com/newswelcome.html](https://news.ycombinator.com/newswelcome.html)

~~~
abotsis
The point was to use a colloquialism to address people who disagree and
encourage them to stop, think, and remember that everything has tradeoffs and
compromises. Then to think about those before simply saying "this isn't a GIL
workaround". Maybe it had the opposite effect. :)

I mean, I'm a Python fan, and this has tradeoffs too (external dependency,
task dispatch latency), which means it isn't suitable every time the GIL gets
in the way.

In any case, I don't believe either of these were any more sensational than
some of the content in this post.

~~~
scott_s
Returning-in-kind is actively discouraged on HN.

------
armitron
Terrible post, worst I've read today by far.

Not only does the author completely NOT understand why the GIL is a problem,
but he also misunderstands asynchronicity which is hilarious given his
recommendations.

~~~
coldtea
At least he gives arguments one can refute...

~~~
mdomans
This comment and nick together ... flawless victory

------
JustSomeNobody
Python doesn't have a GIL "problem". Just stahp.

~~~
mdomans
Have you seen the program of any Python conference? :D

~~~
JustSomeNobody
So? Calling it a "problem" is crap.

The GIL was a decision that has repercussions on certain types of programming,
sure, but it's not a "problem".

~~~
mdomans
It is for the community.

Personally I consider the GIL to be an awesome idea, and it'd work if we went
with a concurrency model that avoids implicit locking.

