
Implementing, Abstracting and Benchmarking Lightweight Threads on the JVM - dafnap
http://blog.paralleluniverse.co/2014/02/06/fibers-threads-strands/
======
justinsb
I noticed that you send the "native threading" case through your library as
well. Have you compared to just using "naive" Java - Threads and a
BlockingQueue?
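
Roughly what I have in mind by "naive" Java (a hypothetical sketch, not the benchmark's actual code): two plain threads round-tripping messages over ArrayBlockingQueues.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class NaiveBaseline {
    // Round-trips n messages between two plain threads over BlockingQueues;
    // returns the number of replies received.
    static int roundTrips(int n) throws InterruptedException {
        BlockingQueue<Integer> requests = new ArrayBlockingQueue<>(1024);
        BlockingQueue<Integer> replies  = new ArrayBlockingQueue<>(1024);

        // "Server" thread: take a request, send back a reply.
        Thread server = new Thread(() -> {
            try {
                while (true) {
                    int msg = requests.take();
                    if (msg < 0) break;          // poison pill: stop the server
                    replies.put(msg + 1);
                }
            } catch (InterruptedException ignored) { }
        });
        server.start();

        int received = 0;
        for (int i = 0; i < n; i++) {
            requests.put(i);
            replies.take();
            received++;
        }
        requests.put(-1);
        server.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 100_000;
        long start = System.nanoTime();
        roundTrips(n);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(n + " round-trips in " + elapsedMs + " ms");
    }
}
```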

Also: if the Google patches for the user-mode threading are adopted, will
Quasar have any advantages over a JVM that uses the same syscalls? Can you
explain where this would come from?

I think what you've done is genuinely cool, I'm just trying to better
understand what the 10x advantage actually comes from.

~~~
pron
The channels used are just like BlockingQueue. They're a queue with a
synchronized condition variable.
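
In spirit it's no more than this (an illustrative sketch; Quasar's actual channels are considerably more sophisticated):

```java
import java.util.ArrayDeque;

// A queue guarded by an intrinsic lock, with wait/notify as the
// condition variable -- the basic shape of a blocking channel.
public class SimpleChannel<T> {
    private final ArrayDeque<T> queue = new ArrayDeque<>();

    public synchronized void put(T item) {
        queue.addLast(item);
        notifyAll();                    // wake any blocked takers
    }

    public synchronized T take() throws InterruptedException {
        while (queue.isEmpty())
            wait();                     // block until a producer signals
        return queue.removeFirst();
    }
}
```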

If user-mode threading is adopted, Quasar could work without instrumentation,
but instrumentation is a very small part of the Quasar code base.

Quasar gives you the scheduler, channels, actors, etc.

The 10x performance boost comes from the fact that there can be non-negligible
latency from the time you unpark a thread to the time it starts running, while
for fibers that latency is much shorter.
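
You can see that wake-up latency with a toy measurement like this (a rough sketch; the absolute number varies a lot with OS and load, it only shows the effect exists):

```java
import java.util.concurrent.locks.LockSupport;

public class UnparkLatency {
    static volatile long wokeAt;

    public static void main(String[] args) throws InterruptedException {
        // A thread that parks, then records the instant it resumes.
        Thread sleeper = new Thread(() -> {
            LockSupport.park();          // block until someone unparks us
            wokeAt = System.nanoTime();
        });
        sleeper.start();
        Thread.sleep(100);               // give the sleeper time to park

        long unparkedAt = System.nanoTime();
        LockSupport.unpark(sleeper);     // make it runnable again
        sleeper.join();

        System.out.println("unpark->run latency: " + (wokeAt - unparkedAt) + " ns");
    }
}
```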

~~~
justinsb
I looked and your wrapper did look similar (and not obviously incorrect), but
if you plan on publishing this benchmark, I would suggest including a
comparison against raw Java code, for credibility's sake.

I guess I'm really wondering: let's say Java 9's mutexes use Linux 4's user-mode
threading syscall; in that case, do we need Quasar's scheduler, channels,
actors, etc.? Or can we just use "good-old" threads and mutexes? Where would
Quasar's benefits come from?

It sounds to me like the subtext to Google's patches is that rather than
accepting the conventional wisdom that "threads don't scale", they've instead
just fixed threads.

~~~
pron
That's not how those kernel modifications work. You can't just use them with a
mutex. The idea is that a thread will be able to say, I'm yielding the CPU to
this other thread. When you unlock a mutex you don't necessarily want to park
yourself. These changes require either an app-level scheduler, or the use of
synchronization mechanisms that can better specify what you want in terms of
scheduling. An example of such a mechanism would be an API that says: I'm
sending a message to this other actor, but I'm going to wait for it to reply.
In this case, the implementation would tell the OS, switch me out and switch
that other guy in instead.
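
At the library level, the closest analogue available today is a direct handoff (a sketch of the pattern only, not of the proposed syscall API):

```java
import java.util.concurrent.SynchronousQueue;

public class SendAndWait {
    // "Send a message and wait for the reply." A SynchronousQueue has no
    // capacity: put() blocks until another thread take()s, so each transfer
    // is a rendezvous between exactly two threads. (With the proposed kernel
    // support, an implementation could additionally tell the OS *which*
    // thread to run next, instead of leaving that choice to the scheduler.)
    static String sendAndWait(String msg) throws InterruptedException {
        SynchronousQueue<String> toActor = new SynchronousQueue<>();
        SynchronousQueue<String> fromActor = new SynchronousQueue<>();

        Thread actor = new Thread(() -> {
            try {
                String m = toActor.take();     // rendezvous with the sender
                fromActor.put("re: " + m);     // reply, handing control back
            } catch (InterruptedException ignored) { }
        });
        actor.start();

        toActor.put(msg);                      // blocks until the actor takes it
        String reply = fromActor.take();       // blocks until the actor replies
        actor.join();
        return reply;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sendAndWait("hello"));
    }
}
```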

~~~
justinsb
OK, I was being sloppy in my phrasing (and probably thinking also)!

Trying again: Taking your example benchmark, you aren't really calling any
special methods that provide any hints for cooperative threading (to my
untrained eye). That's great - you've got a great abstraction. But then, what
opportunities for optimization does Quasar have, that are not also available
to a JVM using the magic syscall?

I'm sure there's something here, but I'd appreciate a hint!

~~~
pron
In theory? Absolutely none. But Quasar is here today (and also has an
excellent actor system, a nice Clojure API and more).

~~~
justinsb
Well, I appreciate the honesty!

I'm excited by the idea that threads are going to be "the right way", once
these improvements make it out of the 'plex.

I also like that I can get a similar API today with Quasar :-)

~~~
pron
Just to clarify: it's not that easy. The syscalls are the first step, and then
you'll need a scheduler. Once you have those two, you still need new
synchronization mechanisms and APIs.

Quasar doesn't just provide lightweight threads. It has rich libraries that
help you make the best of them.

~~~
justinsb
Yes, I'm thinking of the big picture. Might be more like Java 12 than Java
9... Or Quasar today!

------
donjigweed
"because it uses macros, the suspendable constructs are limited to the scope
of a single code block, i.e. a function running in a suspendable block cannot
call another blocking function; all blocking must be performed at the topmost
function. It’s because of the second limitation that these constructs aren’t
true lightweight threads, as threads must be able to block at any call-stack
depth"

Can you elaborate on this a bit? Let's say I have a function called 'fetch-
url' which takes a core.async channel as an argument and makes a non-blocking
http request (say, using http-kit), and in the callback handler i put the
result onto the channel. If I'm in some other function, in whose body I open a
core.async go block and call fetch-url from within that go block, everything is
still asynchronous, is it not?

~~~
pron
If you're using callbacks at all, then you're not blocking. The main advantage
threads (and lightweight threads) have is that they can block.

What you can't do is this:

    
    
      (defn foo [ch]
        (go
          (bar ch)))

      (defn bar [ch]
        (<! ch))
    

foo starts a go block which calls bar, which then blocks on the channel. For
threads that's ok:

    
    
      (defn foo [ch]
        (thread ; not sure about syntax here
          (bar ch)))

      (defn bar [ch]
        (<!! ch))
    

So a function running in a thread can call another function that blocks. A go
block can't; that's why go blocks aren't lightweight threads.

BTW, in Pulsar's implementation of core.async, the first example is ok, too.

------
RyanZAG
Any chance of someone putting together a benchmark for
http://www.techempower.com/benchmarks/ for Quasar? It would be nice to see how
it compares to other techniques.

------
Fasebook
Wouldn't this kind of development target be better served by optimizing small
C/++ programs instead of trying to optimize to some abstract virtual machine
implemented on top of the hardware? I mean if speed really is your goal, why
not do it correctly instead of hitting yourself in the face with an extra tree
before starting?

~~~
pron
Why do you assume that the JVM adds overhead? While in some cases a program is
better served by C/C++'s manual memory management and fine-tuned memory
alignment, that is not usually the case.

You can think of the JVM as a very good optimizing compiler that compiles your
program when you load it in a way that's tailored to your environment.

Also, when it comes to concurrency support, the JVM is usually years ahead of
C++ (lock-free data structures, etc.). If you're doing concurrency, the JVM is
usually a better target than C++.
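
For example, the lock-free structures mentioned above ship in java.util.concurrent out of the box; ConcurrentLinkedQueue is based on Michael and Scott's non-blocking queue algorithm:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class LockFreeDemo {
    // Several producers push into a lock-free (CAS-based) queue at once;
    // returns the total number of elements that made it in.
    static int produceConcurrently(int threads, int perThread) throws InterruptedException {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        Thread[] producers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            producers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) queue.offer(i);
            });
            producers[t].start();
        }
        for (Thread t : producers) t.join();
        return queue.size();   // no locks taken, yet no element is lost
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(produceConcurrently(4, 10_000));
    }
}
```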

~~~
kasey_junk
Not to mention that the kinds of programs that would benefit most from
lightweight threads are high-connection-count servers: precisely the kinds of
applications where the JVM's weaknesses (startup time, baseline latency, etc.)
are most hidden.

