
Coroutines: A Million Stacks (2017) - srean
http://www.mikemarcin.com/post/coroutine_a_million_stacks/
======
rdtsc
> Can you have 1 million entities and let them all think with stackful
> coroutines and achieve acceptable frame rates?

Yes. With Erlang (or Elixir). Even have them preempt each other, accept
connections and respond in a timely manner. If any crash, they can be
restarted without much of an issue. Just saw a few cluster nodes spike to
above 1 million processes recently, then recover, no big deal.

~~~
norswap
Well the above proves it's doable in C++ too. The issue hit upon is the time
required to run a million coroutines to completion (8s in the most
optimized scenario). Would Erlang/Elixir do better? I don't know, but I
wouldn't be so sure.

~~~
rdtsc
> Would Erlang/Elixir do better? I don't know, but I wouldn't be so sure.

Pretty sure it can. Let's give it a try:

    
    
        #!/usr/bin/env escript
        %%! +P 2000000
    
        %% run with ./spawn_n 1000000
    
        -mode(compile).
    
        main([NStr]) ->
            N = list_to_integer(NStr),
            T0 = erlang:system_time(millisecond),
            Parent = self(),
            Waiter = spawn(fun() -> Parent ! {final_result, wait(N, 0)} end),
            spawn_n(N, Waiter),
            Res = receive {final_result, R} -> R end,
            Dt = erlang:system_time(millisecond) - T0,
            io:format("Done in ~p sec result:~p~n", [Dt/1000.0, Res]),
            ok.
    
        wait(0, Acc) ->
            Acc;
        wait(N, Acc) ->
            receive {result, R} -> ok end,
            wait(N - 1, Acc + R).
    
        spawn_n(0, _Waiter) ->
            ok;
        spawn_n(N, Waiter) ->
            spawn(fun() -> Waiter ! {result, 1} end),
            spawn_n(N - 1, Waiter).
    

And it outputs:

    
    
        ./spawn_n 1000000
        Done in 1.613 sec result:1000000
    

In summary, it started 1M processes, all with isolated heaps, pre-emptable,
with only a few KB of stack each. They didn't execute much; they just sent
their result to a collector process and then exited. All about 5x as fast as
the C++ version.

~~~
norswap
A most excellent answer :)

~~~
alexeiz
If the C++ code in the comparison uses the Boost.Coroutine2 library, then it's
not surprising it performs poorly. Boost.Coroutine2 has an architectural
flaw which causes it to throw (and then catch) an exception on coroutine
completion.
[https://github.com/boostorg/coroutine2/issues/25](https://github.com/boostorg/coroutine2/issues/25)

------
thelazydogsback
Everything old is new again...

------
xhgdvjky
if the stacks are mostly the same and you have far fewer processors than
frames, you could use copy on write to share stacks

not useful but maybe interesting if you're into this kind of thing

~~~
kjeetgill
I'd imagine most stacks get written to almost immediately on the first
function call.

------
rapsey
If you need that many you must use either a higher level language (Erlang/Go)
or Rust futures.

~~~
kabdib
I've seen C++ based systems that regularly harbor hundreds of thousands of
stacks using quite minimal heap, 2M+ stacks when things get really busy (this,
on a machine with 64GB of RAM). The primitives are a few hundred lines of
pretty tame C++.

As much as I like Erlang/Go/Rust/etc., it's not a requirement.

~~~
cbetti
How quickly could the process perform a simple operation, say compare or move,
across all 2m+ stacks? I.e. how long is the latest stack frozen before it is
given an opportunity to work?

~~~
lossolo
Check out libdill[1] (in C): "Generally speaking, though, libdill's
concurrency primitives are only a bit slower than basic C flow control
statements. A context switch has been seen to execute in as little as 6 ns,
with coroutine creation taking 26 ns. Passing a message through a channel
takes about 40 ns."

Also check out libmill[2] (from the same author); it's more Go-ish:

"It can execute up to 20 million coroutines and 50 million context switches
per second."

1\. [http://libdill.org](http://libdill.org)

2\. [http://libmill.org](http://libmill.org)

