
Actors, Green Threads and CSP on the JVM - sigil66
http://boundary.com/blog/2014/09/03/no-you-cant-have-a-pony/
======
noelwelsh
The argument seems to be that the actor model implementations on the JVM
aren't all that fast and the language can't stop you from shooting yourself in
the foot.

That's not really the issue as far as I'm concerned. Message-passing
concurrency actually allows people to be productive writing concurrent code that
has a hope of working -- that is the main advantage. Occasionally things go
wrong. Occasionally you need more performance and have to drop down to a lower
level. But I can _easily_ write a web service running on the JVM that gets
1000 requests/s per CPU with a bit of message passing concurrency. That's the
point.

Heck, I don't even like the actor model -- I much prefer CSP -- but I'd attack
it for other reasons (like lack of type safety).

~~~
MCRed
Right, and when you "drop down to a lower level" to get more performance, the
entire house of cards comes crashing down with bugs that are extremely
difficult to debug.

The issue isn't the actor model on a virtual machine -- Erlang runs on a VM --
the issue is that the actor model alone is insufficient. You need most of OTP
to build reliable, stable, concurrent systems.

This is why goroutines, everything on the JVM, etc are not going to work.

The sad thing is, I think the primary reason people are not using Erlang is
that they are afraid of the syntax, but even that reason is obliterated by the
existence of Elixir (an even better language on the Erlang VM).

~~~
gilbertw1
What do you mean by:

"This is why goroutines, everything on the JVM, etc are not going to work."

I've used Akka very successfully on the JVM for non-trivial clustered
applications where performance is important, and I'm pretty sure the existence
of most of google's infrastructure says that goroutines work at least a little
bit.

Do you mean not work well? They won't gain traction?

~~~
jeremyjh
For example, Go has no means of monitoring or linking to its goroutines;
without supervision trees, the Erlang model for resiliency cannot be
implemented.

------
village-idiot
Let's break this down a bit. It's pretty clear to me that there are 3 reasons
why you'd want to use CSP channels or actors.

1) Reliability. This is mostly the "subscribe to other actors so you can make
decisions when they die" kind. It's a complete replacement for try/catch, and
probably isn't possible or desirable on the JVM due to mutability concerns. If
you need and want this, Erlang is your bet.

2) Performance. This is probably the most common reason, and a true fiber
system can hit this just fine on the JVM. So what if touching old code might
cause degraded performance due to blocking calls? It's not unreasonable at all
to say that high-performance code will require some care. And if the fiber
system warns you about blocking, so much the better. Quasar is your bet here.

3) Architectural niceness. Callbacks suck, and we all know it. CSP channels
can be seen as a nicer way to structure the flow of a large asynchronous
system. In this context I think core.async tends to be the best, because of
its support for transducers and JavaScript. Although Quasar/Pulsar would not
be a bad second choice here, because they work outside of a go macro, assuming
you only need them on the backend.

------
rdtsc
> The primary advantages center around the ergonomics of concurrency.

That is how most "frameworks" that try to copy this aspect of Erlang see it.
But the trick is that in Erlang this actor pattern is used for fault tolerance
just as much. That was a goal right alongside concurrency from the start. (The
third was prioritizing low-latency responses, I believe.)

That fault tolerance is harder to copy and that is why most libraries and
frameworks give up and copy the "class object+thread+queue" and call it "We
have a fast Erlang now".

The closest you can get to Erlang-style fault tolerance is to use OS processes
and IPC (via ZMQ), with some serialization. But it would be hard to run 2M of
those on a reasonable machine.

Plus, Erlang is not just the language (which is rather small and simple) but
the whole set of helper libraries and tools: a distributed database,
distributed application controllers, support for RPC, supervisor patterns you
can use, and so on.

------
heavenlyhash
I can't help but wonder if the author of the article has seen Quasar:
[http://docs.paralleluniverse.co/quasar/](http://docs.paralleluniverse.co/quasar/)
It seems to be a concrete refutation of his claims of impossibility. Quasar
successfully brings green threads to the jvm. It includes both channels as are
now popularized by golang, as well as higher level patterns like actors.
Despite the young nature of the framework, benchmarks show it comparing
reasonably well with both golang and erlang. Quasar also provides libraries
that step all the way up into the OTP realms of supervision trees (though I
haven't myself used this, yet).

The article mentions bytecode weaving, but dismisses it with very hand-wavy
justifications. Bytecode manipulation tools are a successful part of the jvm
ecosystem. Frankly, they're part of _why_ I consider the jvm ecosystem so
successful: bytecode manipulation has allowed things like:

\- third party tree-shakers/minifiers and obfuscators (i.e. proguard)

\- cross compilers (i.e. robovm)

\- concurrency libraries that DO have real green threads and continuations
(i.e. Kilim, Quasar, and others)

\- code coverage and complexity analysis tooling (i.e. jacoco)

\- scala

\- clojure

\- groovy

\- kotlin

\- [... more languages ...]

There are two critical points about the above:

\- All of these tools were built without direct cooperation with the compiler
and core tool chain. That means experimentation and growth were possible from
the community.

\- Everyone's tools play nice with each other! You can use Quasar as a library
in Clojure and then feed that bytecode into Proguard for minification, and
then add code coverage instrumentation, and then feed it into Robovm!

Given the wild success of bytecode and bytecode manipulators, I have no idea
how the article can so whimsically poo-poo the entire field.

(Yes, I'm well aware Erlang has a VM that allows alternate languages as well.
And yes, Elixir is pretty. OT, no, I won't be making investments of my time
into Elixir, because I like strong compile-time type systems, and Elixir
doesn't have one.)

It is true that even in the presence of a full greenthreading tool like
Quasar, code can call legacy APIs that still block a full thread, but this is
not sufficient cause to dismiss the possibilities. To quote back part of the
article, blocking will always be an issue in any cooperatively multitasked
environment: "There’s no real way to limit what that code can do, unless it is
explicitly disallowed from [...] looping." And yet I wouldn't claim Erlang
fails to give me concurrency just because it still allows loops! Part of the
compromise of cooperative multitasking is the very premise that in exchange
for the higher performance possible from cooperative code, yes, poorly written
code can suck up arbitrary amounts of CPU before yielding. If this were a
practical concern, it would also be entirely possible for a bytecode
instrumenting library to inject cooperative rescheduling points even into
loops; and yet I have no real desire to see this feature.

Furthermore, I strongly object to the claim ForkJoin is "notorious for its
overhead". All thread synchronization is notorious for its overhead. That's
completely known to any programmer with experience in this area, and in no way
unique to ForkJoin.

For an excellent, in-depth coverage of what exactly ForkJoin is and the
problems it solves for you, see
[https://www.youtube.com/watch?v=sq0MX3fHkro](https://www.youtube.com/watch?v=sq0MX3fHkro)
. I highly recommend watching the entire thing despite its length, even if
you are not a JVM programmer -- even if you've been doing concurrent
programming for years, you will almost certainly walk away knowing
significantly more about concurrent scheduling, from the (relatively) high
levels of memory fencing all the way down to CPU architecture choices and
their impacts.

I'm not going to claim there are no issues with something like Quasar. In
particular, I find that it _is_ harder to operate in an ecosystem where very
few existing libraries understand what your application is trying to do with
green threads. Mostly, this doesn't faze me if my application is calling out
to other libraries, because I control the scheduling one step above them (just
like I would in a plainer actor framework without green threads like Akka).
The problem is more with "hollywood" style frameworks -- the "don't call me,
I'll call you" type -- so far it feels like these are very hard to use when
your application is using green threading, but the calling framework has no
clue about it. Some sort of interfacing code is required and usually has
thread handoffs of its own, which can be moderately unpleasant, and limits
your scalability at that juncture. But this is a present-tense bummer, and can
be solved by patching (or outright replacing) these hollywood frameworks, or
simply avoiding frameworks of that kind altogether.

But in short, I still think it's a bit unreasonable to dismiss the existence
of ponies.

~~~
pron
Author of Quasar here and apparently the target of the criticism in the
article. It's kind of hard to make out the main claim the author has, but let
me respond to the few more specific claims:

1\. ForkJoin is not "notorious for its overhead". In fact, it is among the
best implemented, best performing work stealing schedulers out there.
Scheduling a task with ForkJoin takes a few nanos, and is almost as cheap as a
plain method call. Don't take my word for it: go ahead and benchmark it.

2\. Like Go, Quasar doesn't constrain the running code from mutating shared
state -- if you're using Quasar from Java, that is. But it's still just as
useful as Go, and when used from Clojure, it's even more flexible than Erlang,
and actually quite safe.

3\. My macbook isn't cruddy.

4\. The stuff possible with Quasar, like running a plain Java RESTful service
on fibers to gain a 4x increase in server capacity -- without changing the
code and without even starting to parallelize the business logic with
actor/CSP -- speaks for itself.

5\. I'm not spreading FUD on threads -- you can watch my talk at JVMLS (linked
in the article) to see my precise point: kernel threads cannot be used to
model, one-to-one, domain concurrency, because the concurrency requirements of
modern application (and the capabilities of modern hardware) exceed by several
orders of magnitude the number of threads supported by the kernels. Fibers
keep the (excellent) abstraction provided by threads as the unit of software
concurrency, while making the implementation more suitable for modern
soft-realtime workloads. When your average programmer can spawn a
(lightweight) thread without thinking about it -- say, one for each request,
or even many more -- concurrency becomes a lot easier.

6\. The linked Paul Tyma slides are completely irrelevant. I've got nothing
against doing kernel-thread-blocking IO. The problem becomes writing simple,
yet scalable code to process incoming requests. Modern hardware can support
over a million open TCP sockets, but not nearly as many active kernel threads.
Asynchronous libraries give you the scalability but fail on the simplicity
requirement; fiber-blocking IO gives you both the performance and the
simplicity of blocking code.

7\. As to the "strawman benchmark" with "too many threads", the author is
welcome to repeat the experiment using a thread pool with as few or as many
threads as he'd like -- the result would be the same: switching kernel threads
costs about 10-20us, while task-switching fibers costs 0.5us (and can be
improved).

> few existing libraries understand what your application is trying to do with
> green threads

That's exactly the purpose of the Comsat project, which integrates existing
third-party libraries with Quasar fibers. You're right, integrating "inverted"
frameworks does require more work, but so far Comsat integrates servlets,
JAX-RS services, and Dropwizard.


~~~
hrjet
A tangential question:

Does any of this benefit a desktop app? I realize that most of the green-
thread interest lies in async I/O and I/O-bound workloads. But can a desktop
app with a couple of dozen threads (i/o + cpu mix loads) gain something from
Quasar?

~~~
heavenlyhash
I'd say Yes.

A) Frankly, channels result in prettier, more maintainable code. I've seen
enough questionable uses of LinkedBlockingQueue to last me a lifetime.
Inability to so much as "close" a BlockingQueue in the face of multiple
concurrent consumers is an unbelievable cramp -- it won't bother you until it
does, but when it does, it's just a bellyflop-onto-concrete sort of sensation.

B) I'm even more pessimistic than pron's sibling response about the
scalability of threads. As an anecdote, a Minecraft server with even a few
dozen concurrent players starts to feel the limitations of naively scheduled
threads. Part of this comes down to the choice of concurrent data structures,
how interaction with shared data structures is batched, the resolution of
locks, the devil is in the details, etc., but I'd venture that the
abstractions of green threads and channels make good code a heck of a lot
easier.

Truly trivial apps with one "compute" thread and one "UI" thread are unlikely
to see serious performance gains. Similarly, applications with highly
parallel workloads (say, somewhere around $num_cpus threads which exchange
information only once every few hundred million cycles -- spitballing a bit,
but for context, the Doug Lea talk I linked earlier
([https://www.youtube.com/watch?v=sq0MX3fHkro](https://www.youtube.com/watch?v=sq0MX3fHkro))
mentions that a thread unpark can take up to a million cycles in a worst-case
scenario) are unlikely to see serious performance gains. So there are
situations where green threading can't help you from a purely performance
perspective, yes. But in practice, it's my observation that it's startling how
quickly "simple" apps end up doing enough concurrent UI or network operations
that naive threading starts getting unpleasant.

------
pkinsky
I use Akka, the Scala/Java JVM actor framework, daily, and it manages to get
by without bytecode weaving. The lack of type safety is irritating, though.

~~~
jeremyjh
It's very easy to mess this up though - either by blocking the thread or by
capturing a reference to the mutable state of the actor in a closure.

~~~
jshen
How often does it happen in practice?

------
InfiniteRand
Very interesting article and analysis, although it would be nice if it
explained what exactly a "Green Thread" is. From the article, I am guessing
that a "Green Thread" is related to the lightweight low-level concurrency
mechanism that he is referring to as the alternative to normal threads, but it
is not exactly clear what that really means.

~~~
noelwelsh
A green thread is a "lightweight" thread, meaning a thread that is managed by
a user-level process, not by the OS. The main advantage is that you avoid OS
thread context-switch time, which is comparatively very large. The
disadvantages are:

\- you have to balance load across true OS threads to take advantage of
multiple CPUs. (You often pin an OS thread per CPU.)

\- if you make a call to a blocking OS function you have no way to pre-empt
your lightweight thread.

HTH.

Update: "you have to balance load" \--> I mean the green thread library
implementer. Users of a lightweight threading library typically don't concern
themselves with this, though they might if performance becomes an issue.

~~~
MCRed
Neither of those disadvantages exists in the Erlang system, as the scheduler
spreads processes across OS processes. I can't speak for "Actor Model"
libraries, though.

~~~
Cr8
Sure they do.

Erlang spreads processes across OS threads, like most other green-threading
impls, and it's not always great at it. (I don't know what you mean by "across
OS processes." Processes on different Erlang VMs can communicate, but the
scheduler isn't going to move processes between them.)

Calling into native code isn't an easy problem in Erlang either. NIF calls
will block a scheduler thread, but the scheduler knows nothing about how long
a NIF call is expected to take and will happily queue up processes to be run
on a thread that is blocked inside a NIF call.

"Regular" erlang I/O is done by queueing up requests to be fulfilled by .. you
guessed it, a pool of threads that spend most of their time sleeping in
blocking i/o calls.

------
playing_colours
Can anyone please recommend books / blogs / videos for learning the theory
behind concurrency, threading, and green threads, and how they are
implemented?

~~~
mjstahl
To see the earlier work by Rob Pike on the ideas that would eventually turn
into golang take a look at this paper:

[http://www.cs.bell-labs.com/who/rsc/thread/newsquimpl.pdf](http://www.cs.bell-labs.com/who/rsc/thread/newsquimpl.pdf)

A good overview (of CSP) document is written by Russ Cox:

[http://swtch.com/~rsc/thread/](http://swtch.com/~rsc/thread/)

Both the above articles are more focused on implementation as opposed to
theory.

~~~
playing_colours
Thanks!

------
jerven
Wonder what the original author thinks of
[http://erjang.org/](http://erjang.org/) or erlang on the jvm.

~~~
jallmann
Erlang on the JVM as anything more than a toy is fundamentally
misunderstanding Erlang -- which basically reinforces the article's central
thesis. The article mentions the need for lightweight concurrency support to
be baked into the platform -- the Erlang VM is the epitome of this. Idiomatic
Erlang spawns a large number of isolated, concurrent processes, and lets them
crash when things go wrong, with supervision trees to recover and restart
processing. If a single JVM thread crashes, the whole VM goes. Additionally,
you also lose secondary benefits such as per-process heaps/GC, etc. These
things are impossible to cleanly graft on to the JVM.

~~~
mike_hearn
Er, you can easily write a Java thread that just terminates or restarts itself
if it crashes ...

------
mateuszf
Anyone using core.async knows how this article relates to it?

~~~
mossity
The points about the JVM not being able to guarantee that your code won't
block the thread apply; it's left up to you to do it. This doesn't come as any
surprise to me, nor I would guess to most users of the library, so I'm not
sure this is really that damning. Core.async doesn't use bytecode weaving or
fork/join, so those criticisms don't specifically apply.

~~~
MCRed
It's damning because, rather than outsource the issue like the library makes
you think you are doing (or like anyone writing erlang code actually is doing)
you still have to deal with the hassle and the risk, so you're not really
buying much.

~~~
village-idiot
Except you're using it in Clojure, which makes it easier to avoid that kind of
stuff and easier to spot when you call Java APIs directly.

------
_random_
So, CLR and Mono would be better?

PS: at least not the V8/Node.

