
Co-routines as an alternative to state machines - gthank
http://eli.thegreenplace.net/2009/08/29/co-routines-as-an-alternative-to-state-machines/
======
silentbicycle
An _excellent_ paper on coroutines is "Revisiting Coroutines" by Ana Lucia de
Moura and Roberto Ierusalimschy
(<http://www.inf.puc-rio.br/~roberto/docs/MCC15-04.pdf>).

Lua has had excellent support for co-routines for a couple years, and, since
its implementation is "stackless" (in the same sense as Stackless Python), its
coroutines don't have the limitations of standard Python's. (It doesn't have a
GIL either...) There are a couple papers about coroutines in Lua, but
"Revisiting Coroutines" is the best overview. They're surprisingly versatile.

The paper shows how to efficiently implement one-shot continuations on
asymmetric coroutines. (There is also a small C patch by Mike Pall that adds a
coroutine.copy function to the VM, making multi-shot continuations possible:
<http://lua-users.org/lists/lua-l/2006-01/msg00652.html>, and note the fix in
his follow-up post.) Implementing continuations in terms of coroutines rather
than the reverse (as is commonly done in Scheme) often leads to clearer code,
IMHO.

------
chadaustin
IMVU makes heavy use of this technique:

<http://thespeedbump.livejournal.com/63798.html>
<http://thespeedbump.livejournal.com/64053.html>

It's the only way we've found to write understandable, maintainable flows that
touch network requests and user input. Explicit continuations are way too
error-prone, in our experience.

------
tnovelli
Oh yeah, software state machines are silly, considering that microprocessors
are very efficient state machines. It's funny how algorithms that make perfect
sense in Assembler can become so slow and convoluted in an HLL. Coroutines,
continuations, and GOTOs are some remedies for this high/low-level disconnect.

I wonder, are software FSMs ever really necessary?

~~~
Hexstream
I recently used an FSM approach to match multiple patterns simultaneously
against a tree in one walk, to determine which rules match at each node and
with what pattern-variable values (a dictionary). Specifically, to associate
the ordered list of suitable error renderers with each specific validation
error that can occur.

The way I normally do pattern-matching is simply to walk the pattern and the
tree to match against simultaneously, but of course this approach breaks down
if you want to match multiple patterns against one tree in one pass. So I
compiled each pattern into a list of closures, each closure representing one
step of pattern-matching, a state. Then I walk the input tree normally and at
each node, I call "push" on each of the patterns. If the current state of the
pattern matches, I push the next state and the new dictionary variable values
that were matched at this step. If the match is invalid, I push :invalid on
the stack. After visiting the node I call pop on all the patterns.
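A rough Python sketch of that scheme (all names and details here are my own guesses at the design, not the original code): each pattern compiles into one closure per step, and the walker pushes a (state, bindings) pair per pattern at every node and pops it on the way back up.

```python
INVALID = object()  # stand-in for the :invalid marker described above

def compile_pattern(path):
    """Compile a label path like ['a', 'b'] into one closure per step.
    A '?' step matches any label and binds it as a pattern variable."""
    steps = []
    for depth, label in enumerate(path):
        def step(node_label, bindings, label=label, depth=depth):
            if label == '?':
                new = dict(bindings)
                new['var%d' % depth] = node_label
                return new
            return bindings if node_label == label else None
        steps.append(step)
    return steps

def walk(tree, patterns, stacks, matches):
    """One pass over the tree; every pattern advances (or dies) per node."""
    label, children = tree
    for i, steps in enumerate(patterns):
        state, bindings = stacks[i][-1]
        if state is INVALID or state >= len(steps):
            stacks[i].append((INVALID, None))
            continue
        new_bindings = steps[state](label, bindings)
        if new_bindings is None:
            stacks[i].append((INVALID, None))      # push :invalid
        else:
            stacks[i].append((state + 1, new_bindings))
            if state + 1 == len(steps):            # pattern fully matched
                matches.append((i, new_bindings))
    for child in children:
        walk(child, patterns, stacks, matches)
    for stack in stacks:                           # after visiting: pop
        stack.pop()

tree = ('a', [('b', []), ('c', [('b', [])])])
patterns = [compile_pattern(['a', 'b']), compile_pattern(['a', '?'])]
stacks = [[(0, {})] for _ in patterns]
matches = []
walk(tree, patterns, stacks, matches)
# matches pattern 0 at a/b, and pattern 1 at both a/b and a/c with var1 bound
```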

Sorry if this is confusing. That's just to illustrate that learning the
concept of a FSM let me implement something I wouldn't have known how to do
otherwise.

Anyway, understanding the concept of an FSM is valuable in itself, even if you
never implement one, because thinking about all possible states, inputs
(symbols), and transitions helps you reason more formally about some problems.

And even if the CPU itself is a state-machine, what if you want to make an
emulator for that CPU?

~~~
stcredzero
I've seen good patterns for state machines in Smalltalk with states as objects
and states as methods (on an FSM object).

With states as methods, you have an instance variable for "state" and you just
run a loop:

    
    
        [ self state == #final ]
            whileFalse:
                [self getNextInput.
                self state: (self perform: self state)]
    

Each method then becomes a concise description of the state transitions:

    
    
        code
            self atEndOfInput 
                ifTrue: 
                    [^#final].           "Go to #final State"
            self input == $"
                ifTrue: 
                    [ self writeToOutput.
                    ^#insideComment].    "Go to #insideComment"
            ^#code                       "Stay in #code"
        
        insideComment
            self input == $" 
                ifTrue: 
                    [^#code].            "Go to #code"
            ^#insideComment              "Stay in #insideComment"
    

Then debugging just becomes a matter of putting breakpoints in methods.
#perform: is almost as fast as a regular message send, and the methods
themselves are mostly #ifTrue: statements, which are bytecode optimized, so
this is mostly JIT-ed to machine code.
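The same states-as-methods pattern translates directly to Python (a loose sketch of my own, not from the thread): the state is the *name* of a method, and the driver loop dispatches on it with getattr, playing the role of Smalltalk's #perform:.

```python
class CommentStripper:
    """States-as-methods FSM: strips Smalltalk-style "comments"."""

    def __init__(self, text):
        self.input = iter(text)
        self.output = []
        self.state = 'code'     # current state = name of a method
        self.char = None

    def run(self):
        # The driver loop: fetch input, dispatch to the state method,
        # and store whichever state name it returns.
        while self.state != 'final':
            self.char = next(self.input, None)
            self.state = getattr(self, self.state)()
        return ''.join(self.output)

    def code(self):
        if self.char is None:
            return 'final'              # go to the final state
        if self.char == '"':
            return 'inside_comment'     # go to inside_comment
        self.output.append(self.char)
        return 'code'                   # stay in code

    def inside_comment(self):
        if self.char is None:
            return 'final'
        if self.char == '"':
            return 'code'               # closing quote: back to code
        return 'inside_comment'         # stay in inside_comment

result = CommentStripper('foo "a comment" bar').run()
# result == 'foo  bar'
```

As in the Smalltalk version, debugging is just a matter of putting a breakpoint in the method for the state you care about.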

------
enqk
My favorite example of that is "protothreads": stackless "threads" in C
introduced by Adam Dunkels in connection with the Contiki project.

The implementation fits in a header of about 130 lines, and protothreads
conveniently replace state machines with straightforward sequential code.

<http://www.sics.se/~adam/pt/>

------
mcav
For cool Python coroutines with networking code, have a look at Eventlet. I'm
using it for a lot of my server code; it makes server-side socket programming
dead-simple to understand.

~~~
btw0
Also don't forget cogen, a library for network oriented, coroutine based
programming using the enhanced generators from python 2.5. I recently used it
as a front-end concurrent WSGI server in my Google Summer of Code project.

Besides simulating state machines, coroutines have many other uses (e.g.
implementing concurrency); they are a very powerful tool that not many
programmers know about. For those who are interested, I suggest a tutorial:

<http://www.dabeaz.com/coroutines/>
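For the flavor of what that tutorial covers, here is a minimal example (my own, not from the tutorial) of an "enhanced generator" coroutine from PEP 342: a consumer that receives values via send() instead of producing them via next().

```python
def grep(pattern, hits):
    """Consumer coroutine: receives lines via send(), collects matches."""
    while True:
        line = (yield)          # suspend until someone send()s a line
        if pattern in line:
            hits.append(line)

hits = []
g = grep('python', hits)
next(g)                         # prime the coroutine to its first yield
g.send('state machines are fine')
g.send('python coroutines are finer')
g.close()
# hits == ['python coroutines are finer']
```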

------
abecedarius
Nice explanation, but the Python example could be expressed just as naturally,
and with a bit less code, using a generator instead of a coroutine. I'd have
added a note on when to use which style: prefer generators since Python
supports them better, but use coroutines when the producer of the data stream
has to be 'on top' in the control flow.
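A small illustration of the two styles (my sketch): a generator puts the *consumer* on top of the control flow, pulling values out; a send()-driven coroutine puts the *producer* on top, pushing values in.

```python
# Generator style: the consumer drives, pulling tokens out.
def tokens(text):
    for word in text.split():
        yield word.upper()

pulled = list(tokens('state machines considered verbose'))

# Coroutine style: the producer drives, pushing tokens in.
def collector(out):
    while True:
        word = (yield)          # wait for the producer to send a word
        out.append(word.upper())

pushed = []
c = collector(pushed)
next(c)                         # prime the coroutine
for word in 'state machines considered verbose'.split():
    c.send(word)

# Both end up with the same result; the difference is who holds the loop.
```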

~~~
silentbicycle
A "generator" is a subtype of coroutine. Python calls them "generators"
because 1) that's what Icon called them, and 2) Python's implementation of
coroutines is* restricted by being unable to yield from within called
functions, among other things, so they're not as generally useful.

* Was? Maybe Python 3 fixed this. I use Lua instead now -- it's cleaner, consistently gets these things (TCO, lambdas, coroutines, the GIL, etc.) right, and I'm not going to wait forever for Guido to change his mind.

~~~
cturner
Yielding from within called functions is the basic generator syntax, but I
think I know what you're talking about.

> Maybe Python 3 fixed this

Can you verify that this is what is being talked about at page 26 of this
link? <http://www.dabeaz.com/coroutines/Coroutines.pdf>

If so - Python 2.5.

~~~
silentbicycle
Not slide or page 26, no.

From PEP 342:

    
    
        Also, generators cannot yield control while other functions are
        executing, unless those functions are themselves expressed as
        generators, and the outer generator is written to yield in response
        to values yielded by the inner generator.  This complicates the
        implementation of even relatively simple use cases like asynchronous
        communications, because calling any functions either requires the
        generator to "block" (i.e. be unable to yield control), or else a
        lot of boilerplate looping code must be added around every needed
        function call.
    

It looks like 2.5 fixed that particular issue. (I'm not holding my breath on
tail-call optimization, though...)
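Concretely, the boilerplate the PEP describes looks like the first function below (my example): an outer generator can only "call" another generator by looping over it and re-yielding each value by hand. Python 3.3's `yield from` (PEP 380) later collapsed that loop into a single statement.

```python
def inner():
    yield 1
    yield 2

def outer_boilerplate():
    yield 0
    for value in inner():   # the boilerplate delegation loop
        yield value
    yield 3

# Python 3.3+: the same delegation in one statement.
def outer_delegating():
    yield 0
    yield from inner()
    yield 3

# Both produce 0, 1, 2, 3.
```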

------
silentOpen
Why not use mutual recursion and TCO? The function with the program counter is
the current state. Problem solved?

Mutable state is your enemy...

~~~
silentbicycle
It's not always just two coroutines (could be thousands), and not every
language has TCO.

Also: While mutable state is problematic, sometimes the alternatives are
worse.

~~~
silentOpen
That's a good point... mutual recursion lacks dynamism.

The lack of TCO or ability to do mutual recursion+TCO is really the fault of
the language. If you have a state machine with a small number (<20) states,
mutual recursion seems like a more natural fit (unless you have to have states
determined at run-time).
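A sketch of what states-as-mutually-recursive-functions looks like (my example). In a language with TCO each state would tail-call the next directly; Python lacks TCO, so a trampoline returns the next state as a thunk instead of calling it — the same shape, without stack growth.

```python
def trampoline(thunk):
    """Drive states-as-functions without growing the call stack."""
    while callable(thunk):
        thunk = thunk()
    return thunk

# Two-state FSM recognizing strings with an even number of 'a's.
def s_even(s, i):
    if i == len(s):
        return True                           # accepting state
    nxt = s_odd if s[i] == 'a' else s_even
    return lambda: nxt(s, i + 1)              # the "tail call"

def s_odd(s, i):
    if i == len(s):
        return False
    nxt = s_even if s[i] == 'a' else s_odd
    return lambda: nxt(s, i + 1)

# 100,000 input symbols, constant stack depth.
accepts = trampoline(lambda: s_even('ab' * 50_000, 0))
# accepts == True (50,000 'a's is an even count)
```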

------
nimrody
While coroutines may help in writing more maintainable/readable code, they
make for a very inefficient structure, since each coroutine has to keep a
separate stack. Having many concurrent coroutines is probably a bad idea.

This is the classic trade-off between event based processing (state machines)
and process based models.

~~~
jrockway
There is not _that_ much overhead. If you are writing an event-based network
server, you have to allocate one structure of some type per connection; that
structure is going to use memory regardless of whether it's a coroutine, a
closure, or an instance of a class. If your coroutine implementation is good,
these representations will be exactly equivalent. (In the real world it is not
as good, of course, but it is theoretically possible to make coroutines as
space-efficient as hand-crafted state machines.)

I did an experiment; I created 100,000 coroutines in Perl (with the Coro
module) each implementing a simple network protocol (echo). This used about
14k of RAM per "connection".

So sure, you could do it with less, but I know a lot of people who are happy
with using 1.4G of RAM handling three concurrent requests -- this handles 100
thousand. (Right now, my web browser is using half that amount of memory so I
can type this post.)

And BTW, this is also not too expensive CPU-wise; under the hood it uses epoll
on my machine, which scales linearly over the _active_ connections. With most
of the connections idle, this results in very good performance.

Edit: I did the same thing with GHC and Control.Concurrent's forkIO; the
overhead is about 3k per thread. Very reasonable.
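The experiment is easy to repeat for plain Python generators (coroutine frames only, no socket); exact numbers vary by interpreter version, so take the printed figure as a ballpark, not a benchmark.

```python
import tracemalloc

def echo():
    while True:
        line = (yield)      # a real server would write `line` back here

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
coros = [echo() for _ in range(100_000)]
for c in coros:
    next(c)                 # advance each generator to its first yield
after, _ = tracemalloc.get_traced_memory()
per_coro = (after - before) / len(coros)
print('~%d bytes per suspended generator' % per_coro)
```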

~~~
silentbicycle
As a data point, creating 100,000 coroutines in Lua uses about 183 MB*, which
works out to about 1.9K apiece. This seems independent of the function
included in the coroutine -- "return 3" and a function with several pages of
code use the same space. It's probably a small stack, a closure for
"upvalues", and a function pointer.

Note that this doesn't include memory for the IO connection, but the original
question was about the space overhead of coroutines themselves. How much do
Perl and Haskell use _without_ the IO handle?

* On an AMD64 running OpenBSD 4.6. On an i386 (32-bit) system, it took 142 MB.

~~~
jrockway
With empty suspended coros, we use about 169M for 300,000 of them. So that's
about .5K per coroutine.

In Haskell, 500,000 threads waiting for an empty mvar used about 1.4G of RAM.
But if they just spin forever (foo = foo), not waiting for anything, then it
uses 129M for 500,000 threads; .2K each.

So really, coroutines are not costly at all. Most of the price you pay you
would have to pay anyway. The overhead of the coroutine itself is tiny.

~~~
silentbicycle
Good to know, thanks for the quick response.

~~~
jrockway
And just for the record, my examples were on a 64 bit machine.

------
amichail
Full coroutine example in uC++:

[http://www.student.cs.uwaterloo.ca/~cs343/examples/FullProdC...](http://www.student.cs.uwaterloo.ca/~cs343/examples/FullProdCons.cc)

------
noss
Erlang programming is like this all the time.

------
skwiddor
C.A.R. Hoare wrote a whole book on CSP

<http://www.usingcsp.com/>

------
skwiddor
The Limbo programming language on the Inferno OS makes great use of co-routines.
<http://www.vitanuova.com/inferno/>

