Hacker News
Structured Concurrency in High-Level Languages (250bpm.com)
130 points by rumcajz on April 28, 2018 | 35 comments



The control flow issues mentioned there have been addressed many times. Coroutines and their discontents are well known. But the author has some good points about cancellation.

Cancellation isn't well-studied. It has two main cases - canceling a wait, and interrupting actual computation. The first case comes up more often.

If anything that can wait can be aborted, then things unwind better. Language support for this needs to be block-structured, as in Python's

    import threading

    lok = threading.Lock()
    with lok:
        ...  # do something
Python uses exceptions, so the appropriate mechanism there would be to indicate lock failure with an exception, and not execute the "with" block. In error code languages, the locking block should return an error code.
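In Python terms, that could look something like the sketch below: acquisition that fails raises, so the body of the "with" block never runs. (CancellableLock and LockTimeout are made-up names; the timeout parameter on threading.Lock.acquire is real.)

    import threading

    class LockTimeout(Exception):
        pass  # hypothetical exception signalling an aborted wait

    class CancellableLock:
        def __init__(self, timeout=1.0):
            self._lock = threading.Lock()
            self._timeout = timeout

        def __enter__(self):
            # Give up instead of blocking forever; the "with" body
            # is skipped entirely when this raises.
            if not self._lock.acquire(timeout=self._timeout):
                raise LockTimeout("gave up waiting for the lock")
            return self

        def __exit__(self, exc_type, exc, tb):
            self._lock.release()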

Once a thread/coroutine/whatever is in cancellation, all waits should abort. So you rapidly fall through the code until the concurrent construct reaches its exit point. Unwinding still takes place. Non-blocking concurrency primitives should be allowed in this mode, so you can add things to a list, but you can't wait on I/O. RAII-style cleanup such as file-closing on block exit has to be non-blocking too, because the canceled construct can't wait for it.
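asyncio already behaves roughly this way: once a task is cancelled, every subsequent await raises CancelledError, so the coroutine falls through its remaining waits to the exit point while still unwinding. A sketch:

    import asyncio

    results = []

    async def worker():
        try:
            await asyncio.sleep(3600)          # a wait: aborts once cancelled
        except asyncio.CancelledError:
            results.append("partial result")   # non-blocking work is still fine
            raise                              # keep unwinding to the exit point

    async def main():
        task = asyncio.create_task(worker())
        await asyncio.sleep(0)                 # let the worker reach its wait
        task.cancel()                          # every wait in the task now aborts
        await asyncio.gather(task, return_exceptions=True)

    asyncio.run(main())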

QNX real-time messaging works something like this. QNX's basic interprocess communication primitive is MsgSend, which sends a message and waits for a reply. MsgSend can be canceled, or can time out; an error code is returned. All I/O requests are MsgSend operations and thus can be canceled. In QNX, everything that can block can be timed out or canceled. This is used to place upper bounds on how long things can take.
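The shape of that primitive, sketched in Python (the names and error codes are illustrative, not the actual QNX API):

    import queue

    def msg_send(server_inbox, payload, timeout=None):
        reply_q = queue.Queue(maxsize=1)
        server_inbox.put((payload, reply_q))         # send the message
        try:
            return 0, reply_q.get(timeout=timeout)   # wait for the reply, bounded
        except queue.Empty:
            return -1, None                          # timed out: error code, no reply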

Canceling a compute loop is harder. If the compute loop is operating only on local data, it's not so bad. You need some data flow analysis to determine that the function being forced to error out doesn't have non-local side effects. If, say, you're inverting a matrix, and that needs to be aborted, you need to be sure that the matrix is never looked at again. This is probably easier in functional languages.
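In practice this usually means cooperative cancellation: the compute loop polls a flag between steps, and the contract is that a cancelled result is never read again. A sketch (eliminate_column is a stand-in for the real per-step work):

    import threading

    cancel = threading.Event()

    def invert(matrix):
        for step in range(len(matrix)):
            if cancel.is_set():
                # Abort mid-computation; callers must treat `matrix`
                # as garbage from here on and never look at it again.
                raise RuntimeError("inversion cancelled")
            eliminate_column(matrix, step)   # hypothetical per-step work
        return matrix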

Once wait cancellation is under control, so the thing you're canceling can't block, having to wait for the other thread/coroutine/whatever to reach its join point isn't so bad. You don't need to kill it; you can wait for it to unwind. So solving cancellation moots the problem of how to handle thread joins for failed threads.


The idea of looking at things as a call tree (or graph) rather than a plain old stack is how Schemers think about things. We specifically avoid terminology like "stack frame" and refer to them as "continuation frames" because we don't tie the abstract feature to how it is implemented. A branch in the tree is an invocation of call/cc.

These recent complaints about structured concurrency in popular languages appear to mirror the complaints about call/cc vs delimited continuations. More info here: http://okmij.org/ftp/continuations/against-callcc.html.


Oz's declarative concurrency is worth taking a look at. In Oz, a thread implicitly goes to sleep or wakes up based on data dependencies. For example,

    declare 
      FOO BAR % FOO and BAR are not bound to any value yet.
    in 
    thread
      {print FOO} % the thread blocks here until FOO is bound to a value.

      BAR = 456   % Binds BAR, which wakes up any thread waiting for it.
    end

    thread
      FOO = 132   % Binds FOO, which wakes up any thread waiting for it.

      {print BAR} % the thread blocks here until BAR is bound to a value.
    end
Because of this, ordering operations across threads is easy: the dataflow dependencies impose the order.
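For those more at home in Python, roughly the same dataflow behavior can be sketched by (ab)using concurrent.futures.Future as a one-shot dataflow variable, since result() blocks until a value is set:

    import threading
    from concurrent.futures import Future

    FOO, BAR = Future(), Future()    # unbound dataflow variables

    def a():
        print(FOO.result())          # blocks until FOO is bound
        BAR.set_result(456)          # binds BAR, waking any waiter

    def b():
        FOO.set_result(132)          # binds FOO, waking any waiter
        print(BAR.result())          # blocks until BAR is bound

    for f in (a, b):
        threading.Thread(target=f).start()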


This sounds like an easy recipe for race conditions. Consider:

    declare 
      FOO BAR % FOO and BAR are not bound to any value yet.
    in 
    thread
      {print FOO} % the thread blocks here until FOO is bound to a value.

      BAR = 456   % Binds BAR, which wakes up any thread waiting for it.
    end

    thread
      {print BAR} % the thread blocks here until BAR is bound to a value.

      FOO = 132   % Binds FOO, which wakes up any thread waiting for it.
    end
I swapped the instructions in the second thread. Does Oz detect that, or otherwise protect the programmer from such mistakes?


If that's the complete program, or alternatively if it's the entire scope of the FOO and BAR variables, such that the language guarantees no other thread (or process, or anything else) may mutate or even access FOO or BAR, then the implementation should throw an error saying as much the moment the scope is left (e.g. by reaching end-of-input).

However, if one is still free to do the moral equivalent of BAR = 1 or FOO = argc in a third thread, or even at compile/link time, then why should it be a mistake to have these two threads blocking on such a (possible) event?


It would deadlock, but there is a debugger that shows the state of any thread and where it is blocked.


So it's race free.... If you test every single possible system state, and remove all the race conditions. No thanks.


A race condition is a situation where the result depends on the ordering/interleaving of threads. And that example isn't actually one.

It's a deadlock, which is also bad, but much more deterministic and easier to debug.


Just a guess, but it sounds like it could be detected via static dependency analysis and the compiler could throw an error.


Great follow up to the original argument. I certainly hope some language designers start to look really hard at these arguments.

    Off the top of my head:

    * Cancel all threads in the bundle.
    * Block until all threads in the bundle exit.
    * Block until at least one thread in the bundle exits. Cancel all the other threads.
I think there is also a 'restart any exiting thread' variant, and a variant of the block-until-all rule where only failed threads are restarted. Hmm, actually, perhaps there should be variants specifying the behavior of failed threads in all cases?
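The third variant maps fairly directly onto concurrent.futures; a sketch (work is a stand-in, and note that Future.cancel() only stops tasks that haven't started running yet, so cancelling running threads still needs cooperation):

    from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

    with ThreadPoolExecutor() as pool:
        bundle = [pool.submit(work, i) for i in range(10)]
        # Block until at least one thread in the bundle exits...
        done, pending = wait(bundle, return_when=FIRST_COMPLETED)
        # ...then cancel all the other threads.
        for f in pending:
            f.cancel()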


> perhaps there should be variants specifying the behavior

It feels like Erlang/OTP is being reinvented right here again.

Once you get deep enough into concurrency, the ideas start to converge on killable isolated processes with asynchronous message passing and programmable behaviors. Because with shared memory killing a thread has unpredictable consequences, you need to either enforce isolation or avoid the consequences some other way, like by not having multithreading and using an event loop.

Isolated processes need to communicate with other processes somehow, so you need message passing. But it has to be asynchronous to make it possible to implement various waiting behaviors. With event loops you get there via cancellable contexts and higher-order programming, but on a single CPU. Getting into multi-core or distributed systems yet again requires asynchronous message passing. In the end it's all actors everywhere.
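A toy version of that endpoint, with threads standing in for isolated processes (real isolation would need multiprocessing or separate nodes):

    import queue
    import threading

    class Actor:
        def __init__(self, behavior):
            self._mailbox = queue.Queue()
            self._behavior = behavior
            threading.Thread(target=self._loop, daemon=True).start()

        def send(self, msg):
            self._mailbox.put(msg)     # asynchronous: never blocks on the receiver

        def _loop(self):
            state = {}                 # private state; the only way in is a message
            while True:
                self._behavior(state, self._mailbox.get())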


> It feels like Erlang/OTP is being reinvented right here again.

I thought the same thing. There are some cases, like computational tasks with little message passing, where you can get away with simple async futures + fetch_future (being a computationally oriented language, Julia encourages this model), but in the end actors are pretty great at generalizing concurrency.


What would be some usecases for which synchronous is insufficient and one has to reach for the hard to reason about asynchronous?


Synchronous messaging is insufficient for making multiple requests at once and waiting for the responses. Also asynchronous messaging allows for optimistic notifications -- for example to clear a cache when coherency isn't required.

With asynchronous messaging it's simple to build synchronous interactions if desired --- send a message and wait for a response; but you can't build asynchronous interactions on top of synchronous messages.
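Both halves of that claim in miniature (a sketch; the queues stand in for mailboxes):

    import queue

    def notify(cache_inbox, key):
        # Asynchronous, fire-and-forget: the sender never blocks, which is
        # what makes optimistic notifications like cache clearing cheap.
        cache_inbox.put(("invalidate", key))

    def call(service_inbox, request):
        # Synchronous interaction built from asynchronous sends: attach a
        # private reply queue, send, then wait on the reply.
        reply_to = queue.Queue(maxsize=1)
        service_inbox.put((request, reply_to))
        return reply_to.get()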


Let's say you are writing a parallel crypto hasher... which, if you're part of a pool, is the case at the macro level. The first to get to the answer should probably cancel the rest (though strictly speaking it's not necessary).


A proposal: what's being referred to as a 'bundle' or a 'nursery' should be called a 'loom'.

A loom would relate to software (green) threads precisely as an actual loom relates to warp threads.


How about a "knot", which implies both a structured organization of threads (like a knot of rope) and a messy tangle of threads (like a knot in hair)? Plus, it's funny that it's a homophone for "not".


What about yarn?


For those interested in the various approaches in this space, may I recommend:

* Haskell and in particular Marlow's async library.

* The Erlang language's Actor system along with the OTP framework as a different way of thinking about orphaned processes (via supervision trees).

* The Pony language, which can safely pass ownership of objects (and thus coroutines) around.


Isn't this really well trodden ground? Coroutines were known by the 60s, according to Wikipedia, and I have a really hard time believing that just about everything worth saying about them hasn't been said. I feel a little bit of scholarship would help avoid creating problems that have already been solved.


This comment doesn't add anything to the discussion. You could say this about anything, like say parsing, which was also "known by the 60's".

It's a "middlebrow dismissal": http://www.byrnehobart.com/blog/why-are-middlebrow-dismissal...

It ignores the difference between theory and practice. Whether you do coroutines in C, Python, or another language matters. Language features interact, and those interactions affect how you write specific programs, so treating it abstractly has limits. Python actually has 2 completely different notions of coroutine, both of which were implemented after the 60's.
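For concreteness, the two notions side by side:

    # Notion 1: generator-based coroutines -- values are pushed in with send().
    def averager():
        total, count = 0, 0
        while True:
            x = yield (total / count if count else 0.0)
            total += x
            count += 1

    avg = averager()
    next(avg)        # prime it
    avg.send(10)     # -> 10.0

    # Notion 2: native coroutines (async def) -- driven by an event loop.
    import asyncio

    async def fetch():
        await asyncio.sleep(0.1)
        return 42

    asyncio.run(fetch())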

The ground isn't really well trodden, because most languages don't have coroutines. Probably 99% of programs don't use coroutines (nearly every program in C, C++, Java). Pre-emptive threads are more common.

You also didn't mention any literature which would have "said all the things".

That said, here is a good paper from 2004 by the authors of Lua: Revisiting Coroutines. Among other things, it points out that "coroutine" doesn't mean one thing.

https://scholar.google.com/scholar?cluster=14280883920152261...


It's hard to give constructive criticism of the article, beyond suggesting background reading, because it seems both ignorant of existing research and naive as to the issues that arise.

Here are some of the relevant ideas and papers:

- "A coroutine is like a control construct"---a coroutine is a control construct. There is no "like". But there is a fundamental difference from a goto because it captures execution state (the stack and so on, also known as the continuation) which a goto does not. A continuation can only jump to program states that make sense (for some definition of "sense"---you can at least count on variables being initialised, for example). A goto can jump anywhere.

- "But coroutine is also like a variable"---a coroutine closes over an environment. I.e. it is a closure. This doesn't really have much to do with being "like a variable". Representing closures is well studied in programming language research and there are many techniques for handling it E.g. http://matt.might.net/articles/closure-conversion/

- "A coroutine can't be orphaned"---those are your continuations. You now need to worry about either heap allocating continuations or copying the stack. E.g. http://lampwww.epfl.ch/teaching/archive/advanced_compiler/20... And when do you free continuations? You probably need to GC them, as your control flow is now more complicated.

- "A coroutine can be orphaned"---this is where you need to think about semantics. What does it mean for a coroutine to be abruptly exited? It probably means leaking resources and hard to track down bugs. A lot of thought needs to go into this. (Somewhat relevant is this paper: https://www.cs.indiana.edu/~dyb/pubs/LaSC-7-1-pp83-110.pdf)

The rest of the article isn't so interesting to me, because I don't think thread cancellation should be possible at arbitrary times.

So that's some of the existing work in the field, but the biggest issue is the need to think about the semantics, or programming model, of the concurrent tools you're introducing. The article does very little of this. Can you actually understand programs written using some proposed concurrency primitives? That's where it really gets interesting, and where recent work like http://www.csc.kth.se/~phaller/doc/haller16-scala.pdf is making some thought provoking contributions.


>- "A coroutine is like a control construct"---a coroutine is a control construct. There is no "like".

Pedantic. A coroutine is a control construct in the same way an Exception is a control construct, or in the way a tomato or a cucumber is a fruit.

Adopting a particular naming depends on the categorization and the intention, not merely on the capabilities of a construct.

Even more so in a post, where the notion needs to be introduced to people who only associate the concept of a "control construct" with the more explicit and tamer constructs usually referred to as such: if and for loops, goto, etc.

>- "But coroutine is also like a variable"---a coroutine closes over an environment. I.e. it is a closure. This doesn't really have much to do with being "like a variable". Representing closures is well studied in programming language research and there are many techniques for handling it E.g. http://matt.might.net/articles/closure-conversion/*

The first part is just a formal definition, which doesn't negate the statement, and the latter part is just a generic reference to research on closures.

One could use the same kind of saying-nothing responses to all kinds of posts or papers.


"One could use the same kind of saying-nothing responses to all kinds of posts or papers."

You misunderstand.

If you know the name of the concept you can search for more information.

If you read the section on "But a coroutine is also like a variable" you'll see the author is struggling with how to represent coroutines. Closure conversion deals with representation.


Given the recent discussions around coroutines in Rust, even if everything has been said, there's still a wealth of stuff in this space, and figuring out what makes sense for a given language is a lot of work, regardless of whether it's been in some paper somewhere already.


I agree. The interaction of features within a particular language requires a lot of thought. However this particular article (at least how I read it) was talking in generalities, and wasn't tied to a particular language.


In my experience, there's an academic world, and a practical world, and never the 'twain shall meet.

The academic world knew without experience that higher-level structured concurrency was the way to go. It took the practical industry decades of unmaintainable pthread messes to learn the same lesson, but they ultimately learned it.


Any interesting links to academic research of higher-level structured concurrency?


Here's one interesting paper: http://www.csc.kth.se/~phaller/doc/haller16-scala.pdf

Here's the general algorithm:

1. Find something interesting

2. Chase the references and read them.

3. Chase the citations (use Google Scholar) and read them, if they're relevant.

You'll pretty soon have an understanding of the general approaches and issues.

(If you throw "coroutine concurrency" into Google Scholar you'll very quickly find citations from the 80s discussing this model.)


You forgot the step where you beg, borrow and steal to get around the journal paywalls.


Most researchers in programming languages publish their papers on their home pages. In CS Ed I'm discovering it's very different.


Sci-hub makes it too easy now!


>I feel a little bit of scholarship would help avoid creating problems that have already been solved.

Anything particular, besides a vast generalization?


Isn't this the same as Notes on structured concurrency, or: Go statement considered harmful [0], except using C rather than Python?

One has a "nursery" while the other has a "bundle" with both hoping to control the lifetime of coroutines.

[0]https://news.ycombinator.com/item?id=16921761


Did you read the preface? It is a direct follow-up to that post, written by one of the people who inspired it in the first place. The post goes through some parts in a bit more depth than the previous post you are referring to did.

But yes, the author has a very similar approach to the problem.



