
No More Callbacks: 10,000 Actors, 10,000 Threads, 10,000 Spaceships - pron
http://blog.paralleluniverse.co/post/64210769930/spaceships2
======
ChuckMcM
Nice exemplar. Back when Java was being created, James Gosling was pretty
insistent that concurrency be lightweight and scalable. When I ported it from
SunOS 4 to Solaris 2.0 I had to move from the really lightweight
setjmp()/longjmp() threads that he had implemented to the thread system
that Solaris had defined. There was a huge negative impact on performance (as
I recall about 15x slower). That sucked because one of the coolest demos at
the time had a little world in it where 'Fang' (the Java mascot) lived and a
bunch of things in that world were all animated with threads. Looking at the
'fiber' model for threads I think they are much closer to what we should have
done in the first place.

The thought was to have a billion threads on a SPARCStation 10 (that is like
an old Pentium machine now). We never got close but it was a great goal.
Definitely going to have to go back and revisit this topic now. Thanks for the
excellent demo to play with!

~~~
rdtsc
I got to about 500K processes on Erlang's VM on an i7 with lots of memory.

These people got up to 2M concurrent TCP connections:

[http://blog.whatsapp.com/index.php/2012/01/1-million-is-
so-2...](http://blog.whatsapp.com/index.php/2012/01/1-million-is-so-2011/)

And on top of it, it is using isolated heaps. That is beautiful I think.
Completely concurrent GC is beautiful.

I know Erlang syntax is not to everyone's taste (I do like it though).
There is also Elixir ([http://elixir-lang.org/](http://elixir-lang.org/)). But
the underlying BEAM VM is an awesome piece of technology.

> The thought was to have a billion threads on a SPARCStation 10

There are some green-thread C libraries I've been playing with, like
Protothreads (
[http://dunkels.com/adam/pt/index.html](http://dunkels.com/adam/pt/index.html)
) and libconcurrency (
[https://code.google.com/p/libconcurrency/](https://code.google.com/p/libconcurrency/)
) I think they use the setjmp/longjmp trick. Some use the POSIX
setcontext()/getcontext() calls.

I still like Erlang VM best.

~~~
HowardMei
Erlang is really good, especially for engineers with a hardware background,
who may find FP & message passing more intuitive than OOP & context switching.

But it's quite difficult to hire.

~~~
reginaldjcooper
Difficult as in you are looking for Erlangers but cannot find them? I haven't
seen many postings looking for Erlang programmers.

~~~
yapcguy
Seems to be the same thing with Haskell and other languages, including Golang,
that many groups on HN are passionate about. There aren't many people hiring
for skills in these languages. When I ask some start-up founders I know, who
have built their companies on Python and JavaScript, whether they have looked
at Golang, they reply 'What's that?'.

~~~
skriticos2
I'd be happy if someone would hire for Python. Most of it is Java, PHP and C#
around here. So sad.

~~~
wtracy
Where's "here"? I constantly get emails from recruiters looking for Python
devs.

(I'm in the SF area, but a lot of the ads I see are for New York and parts of
the Midwest.)

~~~
yen223
Quite frankly, "here" could be anywhere that's not near the Bay Area.

~~~
skriticos2
I think so too. I'm located in Germany, but I think this situation applies to
most of the world except the west coast of the U.S.

------
jaimefjorge
Well written, good description and nice demo.

Would love to see more on how this is different from (or better than) Akka.
The programming model is actually close to Akka's (with actor systems,
supervision, a receive method, message passing, etc.).

The article states that Akka has no true lightweight threads. The guys behind
Akka have had it running at 50M messages/second [1], and performance vs.
Erlang seems to be good as well [2][3].

Perhaps a benchmark would be great.

Thanks for sharing.

[1] [http://letitcrash.com/post/20397701710/50-million-
messages-p...](http://letitcrash.com/post/20397701710/50-million-messages-per-
second-on-a-single-machine)

[2] [http://uberblo.gs/2011/12/scala-akka-and-erlang-actor-
benchm...](http://uberblo.gs/2011/12/scala-akka-and-erlang-actor-benchmarks)

[3] [http://musings-of-an-erlang-
priest.blogspot.pt/2012/07/i-onl...](http://musings-of-an-erlang-
priest.blogspot.pt/2012/07/i-only-trust-benchmarks-i-have-rigged.html)
(discussing millions of messages is a good signal IMHO).

~~~
pron
The main capability provided by Quasar is the fiber, or the lightweight
thread. It is the same as a normal Java thread in the sense that it can block
– on IO, on a DB call, or on a synchronization mechanism. This makes the
programming experience very natural. The actor and the channel abstractions
build upon fibers.

Akka doesn't have lightweight threads at all. You implement a message-handling
method, but it must not block on, say, a DB call, lest it block the entire
thread it runs in. An Akka actor simply must not issue a DB call: it's as
simple as that.

With Quasar things are different: you pull messages rather than implement a
callback; you can block: on IO, DB, lock or anything else. The programming
then is not only simpler, but also more powerful. For example, Quasar supports
selective receive - just like Erlang.
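
To make the contrast concrete, here is a rough stdlib sketch of the pull-style, blocking receive described above, with a plain Java thread and a `BlockingQueue` standing in for a Quasar fiber and its mailbox (the class and names here are illustrative, not Quasar's API):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PullStyleActor {
    public static String run() {
        // The mailbox: the actor *pulls* messages and blocks when it is
        // empty, instead of registering a callback to be invoked on arrival.
        BlockingQueue<String> mailbox = new ArrayBlockingQueue<>(16);
        StringBuilder log = new StringBuilder();
        // A plain thread stands in for a fiber; Quasar's point is that this
        // same blocking style stays cheap when the thread is a fiber.
        Thread actor = new Thread(() -> {
            try {
                while (true) {
                    String msg = mailbox.take(); // blocking receive
                    if (msg.equals("stop")) return;
                    log.append("got:").append(msg).append(' ');
                }
            } catch (InterruptedException ignored) { }
        });
        actor.start();
        try {
            mailbox.put("a");
            mailbox.put("b");
            mailbox.put("stop");
            actor.join();
        } catch (InterruptedException ignored) { }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "got:a got:b "
    }
}
```

With fibers the shape of the loop is unchanged; the difference is that tens of thousands of such blocked receivers remain cheap.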

~~~
mark242
_An Akka actor simply must not issue a DB call_

That's untrue. With Akka's pipe pattern, you can take the results of any
future and pipe it back to the sender, including using a map on the future if
you like. This is how we do reactive database calls in Akka. For example:

    
    
      def receive = {
        case msg => {
          val f = future { myDatabaseResult(msg) }
          f map { result => myTransformResult(result) } pipeTo sender
        }
      }
    

At no point does this actor block. Assuming you even have something like Play
calling this actor, you wouldn't be blocking there, either, you'd take the
result from the actor, likely map it to a result, and Play would
asynchronously return that. My basic rule is that if you're typing
Await.result anywhere in your Play/Akka code, you're doing it wrong.
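
For comparison, roughly the same non-blocking shape can be sketched with the JDK's own `CompletableFuture` (available since Java 8); `myDatabaseResult` and `myTransformResult` are stand-ins, as in the Scala snippet above:

```java
import java.util.concurrent.CompletableFuture;

public class PipeSketch {
    // Stand-in for a database call that runs off the caller's thread.
    static int myDatabaseResult(String msg) { return msg.length(); }

    // Stand-in for the transformation applied to the result.
    static int myTransformResult(int n) { return n * 10; }

    // The actor-side logic: kick off the call, map it, and "pipe" the
    // result onward -- without ever blocking this thread.
    public static CompletableFuture<Integer> receive(String msg) {
        return CompletableFuture
                .supplyAsync(() -> myDatabaseResult(msg))
                .thenApply(PipeSketch::myTransformResult);
    }

    public static void main(String[] args) {
        System.out.println(receive("hello").join()); // 50
    }
}
```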

~~~
pron
There are many ways to do asynchronous programming employing functional
approaches. The difference is that with Quasar you can use them if you like
but you don't have to. You can issue a plain-old JDBC call, and at no point
will the thread block, either, but the actor will: because it's simple,
familiar and intuitive. You don't need to learn so many unfamiliar patterns.
You program as you normally would with a single thread.

~~~
dxbydt
Ok, so issuing a blocking call is "simple, familiar and intuitive". Invoking a
Future or a Promise is "so many unfamiliar patterns".

Yes Sir, with this attitude I hope to make remarkable progress in my tech
career :) Seriously, there is nothing mysterious or magical about shoving a
"plain-old JDBC call" into a Future.

[http://en.wikipedia.org/wiki/Future_(programming)](http://en.wikipedia.org/wiki/Future_\(programming\))

Remarkable demo, btw. But let's not run down other approaches simply because
one might, god forbid, have to "learn so many unfamiliar patterns".

~~~
MrBuddyCasino
Well, he is kind of right. I'm familiar with enterprise IT, and there are very
mediocre programmers at work there. I'll bet you most of them have never heard
of "futures". Sad, but that's how I experienced it.

~~~
rdtsc
> I'm familiar with enterprise IT

I am also familiar with snobby wannabe functional programmers who, instead of
opening the goddam file and reading it, are creating homomorphic endofunctors
wrapped in futures with double memoization and distributed locks, so that
nobody on the fucking team knows what's going on.

These people are 10x more dangerous than mediocre programmers who just find
the simplest way to get the work done and ship the product.

Eventually 1% of the wannabes might get enlightened and realize that simple
basic code is usually better than using every single programming concept
wrapped in 100 lines of code that nobody (including themselves 2 weeks later)
can understand.

~~~
jacquesm
That's quite a feat of mind-reading you performed there. The fascination with
technology, rather than just solving the problem at hand via the shortest
critical path, is a thing that has puzzled me for a long time. At some level
technology is so fascinating in its own right that the temptation to lose
sight of the goal is ever present, and many people succumb to that
temptation.

Imo it's just another variation on the Yak Shaving theme with a dose of
procrastination thrown in for good measure.

~~~
discreteevent
“Well, Mr. Frankel, who started this program, began to suffer from the
computer disease that anybody who works with computers now knows about. It's a
very serious disease and it interferes completely with the work. The trouble
with computers is you _play_ with them. They are so wonderful. You have these
switches - if it's an even number you do this, if it's an odd number you do
that - and pretty soon you can do more and more elaborate things if you are
clever enough, on one machine.

After a while the whole system broke down. Frankel wasn't paying any
attention; he wasn't supervising anybody. The system was going very, very
slowly - while he was sitting in a room figuring out how to make one tabulator
automatically print arc-tangent X, and then it would start and it would print
columns and then bitsi, bitsi, bitsi, and calculate the arc-tangent
automatically by integrating as it went along and make a whole table in one
operation.

Absolutely useless. We _had_ tables of arc-tangents. But if you've ever worked
with computers, you understand the disease - the _delight_ in being able to
see how much you can do. But he got the disease for the first time, the poor
fellow who invented the thing.”

― Richard P. Feynman, Surely You're Joking, Mr. Feynman!

------
CookWithMe
My first thought was "why don't they use Akka"?

> Akka has no true lightweight threads (the actors are actually callbacks)

Would you care to elaborate? I'm not too familiar with the internals of Akka,
but they definitely don't use "heavyweight" threads (which I assume are
threads that are 1:1 mapped to OS threads).

Also, I didn't get "the actors are actually callbacks". Yes, there may be
callbacks involved internally (why not?), but there is a big difference
whether I am sending a message to an actor (which may be processed at any
time) vs. calling a callback (which is immediately executed on the very same
thread that I'm running on).

Sorry if this sounds dismissive, but I'd really like to learn why you chose
to implement your own solution, because you've obviously put some time into
evaluating what is out there.

~~~
hp
[https://github.com/scala/async](https://github.com/scala/async) is the
syntactic sugar for writing sequential nonblocking code in Scala (no
callbacks). Functional-style code also works well, if you know it.

~~~
pron
Essentially, Quasar provides async and await for all JVM languages. async is
called `Fiber.start()`, and await is called `Fiber.park()`. Other than working
for all JVM languages, Quasar fibers are more general in that they can span
many functions (they have a stack), while async is limited to a single
expression block. Because of this, we can hide the "await" deep inside the
JDBC call stack.

Under the hood, they are similar: both instrument your code. Only async does
this at the language level (it's a Scala macro), while fibers do it at the
bytecode level.

------
Morgawr
I'm going to be "that" guy and ask... why actors? Why not agents?

The concept of agents (as defined by Rich Hickey in a lot of his Clojure
talks) is all about globally shared, immutable, persistent state upon which
you can act.

With actors you still need the actor to manage its own mailbox of requests
and handle them; the actor has to define its behavior.

With agents you don't have to ask for the world to stop to communicate, you
can read the current snapshot of the world (aka no request to view the state,
no database queries) and send transformation functions on the data of that
specific agent, which will be then processed by the agent's thread in an
ordered way.

I'd love to see more insight on the choice for this, it's interesting as I am
currently working on a similar project.

~~~
rdtsc
> I'm going to be "that" guy and ask... why actors? Why not agents?

Because the actor is an established paradigm that has been around for a
while. I haven't heard Hickey's talk on "agents", but based on your
description, how are agents radically different?

What stops actors from reading snapshots of the world? They can 1) subscribe
to a "world" actor and get a publication when it changes, or 2) if the
database is immutable (I guess you are hinting at Datomic or Clojure's data
structures here?) an actor can also call a function. Remember, actors in
practice are there to help isolate concurrency contexts. Reading truly
immutable data is safe, so actors could just periodically read this immutable
data (just think of the database as a function). It would be awkward having to
process messages from the mailbox, time out, then read world state, process
world state, and go back to processing messages, etc. I like 1) better.

> the actor has to define its behavior.

How does an agent bypass defining its behavior? Doesn't an agent have a piece
of code that specifies what that agent does?

> and send transformation functions on the data of that specific agent, which
> will be then processed by the agent's thread in an ordered way.

So this basically centralizes the state of all the agents in one central
location that is an immutable database? Hmm, interesting. It is a different
way of looking at it, I guess. Each actor usually handles its own internal
state privately. I guess we also assume that there is something underneath
that constantly distributes this incrementing tree of states across the whole
system. I don't know; I would rather think of actors explicitly choosing to
send their state to a system halfway across the world than rely on another
layer for distributing state. Maybe it is just a matter of mental model
here...

~~~
saryant
The best summary of actors vs. agents I've heard is Jonas Bonér's, one of the
head Akka engineers:

"With actors you send state to the behavior, with agents you send behavior to
the state"
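
That one-liner can be made concrete with a minimal agent sketch: a single worker thread owns the state, and callers send functions (behavior) to be applied to it in order. This mirrors the spirit of Clojure's agent semantics only; it is not any particular library's API:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

public class Agent<T> {
    private volatile T state;
    // One dedicated thread serializes all updates, so no locks are needed.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public Agent(T initial) { state = initial; }

    // "Send behavior to the state": the function travels to the agent,
    // which applies it to its state in submission order.
    public void send(UnaryOperator<T> fn) {
        worker.execute(() -> state = fn.apply(state));
    }

    // Drain all pending updates and return the final state.
    public T await() {
        worker.shutdown();
        try { worker.awaitTermination(5, TimeUnit.SECONDS); }
        catch (InterruptedException ignored) { }
        return state;
    }

    public static void main(String[] args) {
        Agent<Integer> counter = new Agent<>(0);
        for (int i = 0; i < 1000; i++) counter.send(n -> n + 1);
        System.out.println(counter.await()); // 1000
    }
}
```

An actor, by contrast, would keep the integer private and receive "increment" messages (state sent to the behavior).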

------
IgorPartola
> Writing correct and efficient multi-threaded code is at once necessary and
> extremely difficult.

I do not agree with this. The original statement he is quoting says "can be
very challenging". Yes, if you are designing something very state heavy and
your design is somehow flawed or too complex then you can run into issues.
However, in most cases threads are no more complex than callbacks, actors,
etc. In fact, from what I've seen, concurrent code eventually converges to
some semblance of the actor model anyway.

Where the actors/green threads/etc. really shine is having huge numbers of
them. OS threads still have very large overhead compared to lighter weight
green threads, so you can spin up many magnitudes more of them than you have
CPU cores.

Also, in lots of languages multi-core != concurrent. You can have 10,000
actors using a single core. In fact writing a scheduler that can efficiently
distribute actors between different cores is probably where the complexity
Doron Rajwan refers to lies.

~~~
chriswarbo
> However, in most cases threads are no more complex than callbacks, actors,
> etc. In fact, from what I've seen, concurrent code eventually all converges
> to some semblance of the actor model anyways.

Threads are the WorseIsBetter approach to concurrency; they're incredibly
simple to implement, but that just means that the difficulties are pushed on
to the users (ie. developers using the framework/library).

Threads may be a good idea for code which has no 'design flaws' and is not
'too complex', but as we all know, everything has bugs and everything is more
complex than it seems. The arguments in favour of higher-level concurrency
models are basically the same as for tests and version control: if you don't
use them, you're making a dangerous gamble which may exact a large price down
the road.

Concurrency models like callbacks and actors can make dangerous things more
difficult; if we use the callback examples from the article:

> It’s hard for a programmer to reason about which line of code executes on
> which thread, and passing information from one callback to another is
> cumbersome as well.

Of course, this is the point of callbacks. The callback model tells us to
reason using function arguments and function calls, so of course we can't map
lines of code to threads, since neither lines of code nor threads have any
place in a callback model. Likewise for passing data between callbacks; the
problem with threads is that everything is shared all of the time, which
makes it incredibly difficult to enforce invariants. When using callbacks,
everything is local by default and transferring data between threads requires
explicit channels, eg. free variables.

In the actor model the safety comes from messages having no ordering or
latency guarantees, so we can't assume that our data is always up to date.

With higher-level concurrency models we end up screaming at our IDEs as we try
to contort our code to fit the paradigm. This is how it should be, since this
means nothing's gone wrong.

With low-level concurrency models, the machine gladly accepts our dangerously
broken code; the number of interleavings is so huge that our tests never hit
an error case (or more likely, some of the bugs are so obscure that it never
occurred to us to test them). Six months later the application explodes, and
as we sift through the pieces we find the true extent of the problem, and
discover that subtly corrupt output has permeated every aspect of the
business and we can't trust anything that's been done since that code went
live.

~~~
IgorPartola
> the problem with threads is that everything is shared all of the time, which
> makes it incredibly difficult to enforce invariants.

Your entire comment comes down to this, and my point is that this is not a
problem. Design your threaded code around a simple principle: one thread's
code must never touch another thread's data. Now you have safe threaded code.
If you want to add some limited, well-documented cases where you break that
golden rule, go for it and reap the performance benefits.

There are some things that some threading models can be criticized for. For
example, POSIX threads cannot be killed if they get stuck. However, threads
are a powerful tool. The idea that you can share the in-memory code between
all your threads is great. Additionally, you can share state and _you_ control
how and when it is shared. Want complete isolation? Communicate via queues!
Want some shared state for performance reasons? Go for it! Want complete and
utter chaos that will blow up as soon as you look at it funny? Let threads
access other threads' data at will.

Your argument is similar to one that table saws are terrible because one
cannot guarantee that they will never cut off your fingers.

Edit: one other problem with callbacks. AFAIK, no implementation of callback-
based concurrency is able to take advantage of multiple hardware cores for
true parallelism. Meanwhile, OS schedulers already take care of distributing
OS threads between CPU cores, and some green-thread implementations do this
as well.

~~~
JabavuAdams
So, you're not wrong ... but the table-saw argument is a straw-man.

There's a company that sells a revolutionary table saw with intelligent saw
stop precisely because experienced, skilled practitioners regularly cut off
their fingers.

In general "be smarter / do better" is not a reasonable prescription for large
numbers of people. Empirically, if people are fucking up, it makes sense to
analyze why and to give them automatic solutions to their fuck-ups.

~~~
IgorPartola
I don't see it as a straw-man as I see threads as a tool. Existence of the
actor model does not detract from the value that OS threads provide, the same
way that existence of Common Lisp does not detract from the value that C
provides. They are both tools. It's just that some tools are more dangerous
than others. In other words, I don't believe that threads are a "worse is
better" approach. There are things that can be improved about the specific
implementations of threading, but on the whole, the paradigm is far from
broken.

> Empirically, if people are fucking up, it makes sense to analyze why and to
> give them automatic solutions to their fuck-ups.

The problem is that other implementations of concurrency are not as widely
adopted and people tend to fall back on threads (especially OS threads) when
they really don't need them. But when you really do need threads, very few
things are a good substitute.

P.S.: I am aware of the table saw you refer to, and this is the kind of
improvement that tooling around threads could use. Note that this new table
saw does not completely re-design how you interact with the blade in order to
provide the safety.

------
regi
Interesting. I'm attempting to do pretty much the same thing in C:
[http://github.com/reginaldl/librinoo](http://github.com/reginaldl/librinoo)

~~~
sramsay
Well done, man. Seriously. Every time someone starts talking about how x
language makes it "easy" to do some kind of backflip, I start peering over the
fence. Then someone almost immediately implements it in a C library -- or
indeed, gets there first.

But have you given any thought to the critical and urgent problem of running
10,000 Actors, 10,000 Threads, and 10,000 Spaceships?

~~~
regi
We often forget that even though this common problem of replacing callbacks
is getting more critical and urgent, people have already thought about it and
offered some solutions. Maybe not in higher-level languages (although I think
Go does a great job there). In C, I have in mind glibc's ucontext, for
example. I'm trying to improve on that through rinoo. So to answer your
question, if you look at the wiki section you'll see test results from
running 20,000 actors. Of course, once you handle "actors" correctly (they
should really be called fibers) you shouldn't use that many threads (with too
many, you'll end up spending most CPU cycles scheduling them). However, rinoo
handles multi-threading as well. I'm currently writing docs about it.

------
auvrw
concurrency --- albeit not at this scale --- is something that you sometimes
have to deal with at a low level when writing android apps.
animating custom views, for example, often winds up involving direct use of
Runnable s rather than (what i assume are) system-level AsyncTask s. a lot of
the die-callbacks-die neatness on the java side of this relies on a coroutine
library, but that library doesn't run on android. there is a continuation
library that does: [http://commons.apache.org/sandbox/commons-
javaflow/](http://commons.apache.org/sandbox/commons-javaflow/), which could
be used to create coroutines and, from there, user-level threads.

... but if we just want some generic kind of concurrency-niceness on a java
virtual machine, might it make more sense to use scala rather than write your
own lightweight thread library? is the user-space thread implementation really
necessary or even helpful if you're abstracting toward actors anyway? do these
questions even make sense to anyone?

~~~
pron
Quasar gives you fibers. On top of them you can build actors, Go-channels, or
data flow variables.

Scala gives you no advantage here. None of its concurrency constructs really
require Scala. There is no reason not to implement them in Java and use them
in any JVM language. More specifically, Quasar actors are more general and
powerful than Scala actors because they run in true lightweight threads and
can block. Also, a lot of people don't like Scala.

------
newobj
Title: "...10,000 Threads..."

Post: "...10,000 Fibers..."

sigh

~~~
rdtsc
Edit "...10,000 Actors..."

Comments "...10,000 Co-routines ... "

;-)

------
ericHosick
We are working on a fully composable framework, and concurrency is done as
follows (upper-case = Object, lower-case = property):

AsyncRun ( part SomeObject )

multiple items can run in parallel like this:

AsyncRun ( part SomeObjectA SomeObjectB .. )

synchronization:

AsyncSync ( part AsyncRun ( part SomeObjectA SomeObjectB .. ))

locking a property:

AsyncRun ( part AsyncLock ( lockName = "someName", part = SaveUser ( ... ) ) )

On main thread (for UI/UX):

MainThreadRun ( part SomeObject )

------
vendakka
Looks very nice!

Does this play well with existing JVM threading support? More specifically, if
there is a call to a synchronized method inside of a fiber and another JVM
thread has entered the monitor, will this block the entire fiber scheduling
thread?

The reason I ask is I'd like something that plays well with legacy code.

~~~
pron
A synchronized method would block the entire thread, but calls to
ReentrantLock.lock, or any other java.util.concurrent class, can be turned
from thread-blocking to fiber blocking.

------
meowface
Is this similar to green threads / "greenlets" in Python? They look to be the
same concept.

~~~
fzzzy
One thing required for an Actor model that is missing from greenlets and
Python in general is the ability to have isolated contexts. Basically, each
Actor should have its own global state and shouldn't be able to share state
with any mechanism other than message passing.

In Python with greenlets, state can leak between green threads through module
globals and other module state.

~~~
meowface
Well, greenlets can and do have their own isolated contexts, but you're right,
they can indeed leak state. Thanks for the clarification.

------
mpweiher
And we nowadays have the hardware resources to run this on one CPU per
spaceship, at least theoretically:

[http://blog.metaobject.com/2007/09/or-
transistor.html](http://blog.metaobject.com/2007/09/or-transistor.html)

Needs some interconnect, of course...

------
stevefturner
I think I'm being dense... can someone explain the difference between a
'blocking' fiber and Ada's task/rendezvous constructs? Both seem like
synchronous message passing mechanisms?

------
EGreg
How does this compare with Grand Central Dispatch on the Mac?

~~~
roryokane
I can't compare the principles by which they work, but in terms of which to
choose, I think you will never have to make that decision, since GCD is only
for (Objective-)C programs, while Quasar is only for the JVM.

------
knodi
I'm not a fan of this approach. I like what Go does with channels and I like
what D does with synchronized functions. It's simple and powerful, and no
magic. Fuck magic.

~~~
pron
Quasar gives you channels just like Go: you can have primitive channels, you
can select from several channels at once, or anything else you'd do with Go.
As a bonus, it performs better than Go.
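
For readers unfamiliar with Go channels, the buffering-and-blocking half of the idea can be approximated in plain Java with a bounded `BlockingQueue`: sends block when the buffer is full, receives block when it is empty. This sketch leaves out select, and uses OS threads where Quasar and Go would use fibers/goroutines:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ChannelSketch {
    public static int run() {
        // Capacity-1 queue: roughly a buffered channel of size 1.
        BlockingQueue<Integer> ch = new ArrayBlockingQueue<>(1);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 5; i++) ch.put(i); // blocks when full
            } catch (InterruptedException ignored) { }
        });
        producer.start();

        int sum = 0;
        try {
            for (int i = 0; i < 5; i++) sum += ch.take(); // blocks when empty
            producer.join();
        } catch (InterruptedException ignored) { }
        return sum; // 1+2+3+4+5
    }

    public static void main(String[] args) {
        System.out.println(run()); // 15
    }
}
```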

~~~
tuxychandru
Have you published the benchmark source code used for the comparison
anywhere? I am interested in figuring out the bottlenecks that make Go
perform worse.

------
dschiptsov
The other day some guys proudly re-implemented jemalloc in pure Java -
[https://blog.twitter.com/2013/netty-4-at-twitter-reduced-
gc-...](https://blog.twitter.com/2013/netty-4-at-twitter-reduced-gc-overhead)
- and now these guys have re-implemented half of Erlang.)

Isn't it better (and bitter) to face reality and just use Erlang or Go, or at
least to ask oneself why everything should be stuffed into the JVM in 2013?)

~~~
pron
Go has nothing to do with it. The JVM is a superset of Go, and Go's strengths
lie mostly in a short startup time.

Erlang is a different matter. We love Erlang. But the JVM ecosystem is not
only two or three orders of magnitude bigger, but the JVM serves other
requirements as well like excellent performance (performance is not Erlang's
strongest suit, and more than a few Erlang projects require C code to meet
performance requirements).

Some projects will be best served by Erlang, but many will benefit from
Erlang's capabilities on the JVM.

In short – Erlang is awesome, the JVM is awesome, _extremely_ popular and very
successful. Why not combine their strengths? We already have a full Erlang
implementation for the JVM, and we think that Pulsar (Quasar's Clojure API)
really brings together the best of both Erlang and Clojure.

There are many technical advantages to using the JVM, too. Because it has
really good low-level concurrency constructs, you can implement state-of-the-
art concurrent data structures in Java. This is downright impossible in
Erlang, as in all pure functional languages. This is as it should be, because
these languages work at a higher level. The problem is that BEAM, Erlang's VM,
operates at the same level, too, so if you want to write a concurrent DS for
Erlang you'll need to do that in C. It's a lot harder than it may seem,
because many of these data structures require a good GC, and BEAM's GC can't
help because it only manages process-private heaps.

~~~
dschiptsov
I really appreciate your drive and effort; thank you for the reply.

In my opinion, however, as I could gather from the writings of Mr. Armstrong,
(one of) the fundamental problems with the JVM is that it lacks process
isolation, and when it crashes, everything crashes completely. He explicitly
pointed this out in his thesis - the JVM cannot provide fault tolerance,
being a mere user-level multi-threaded process.

As a person who has had the experience of running huge Java crapware like
Business Objects, I can tell you that yes, it crashes, and it crashes often,
and when it crashes there are situations in which there is no way to preserve
data integrity and a plain re-installation is required.

I am also not quite sure about any superior concurrency constructs that
aren't based on OS primitives, but I am not a Java guy.

Go is a way of doing things without a VM.)

~~~
pron
> In my opinion, however, as I could gather from the writings of mr.
> Armstrong, (one of) the fundamental problem with JVM is that it lacks a
> process isolation, and when it crashes, everything crashes completely. He
> explicitly pointed out this in his thesis - JVM cannot provide fault-
> tolerance due to being a mere user-level multi-thread process.

This is true in general, but not entirely accurate. When a Java thread crashes
it doesn't bring down the whole JVM any more than when an Erlang process does.
Just the one thread dies. With Quasar you get the same isolation for fibers.

It is true, however, that one thread in Java could negatively impact the
performance of another by triggering a GC, while in Erlang each process has
its own private heap. The Erlang approach (or the BEAM approach, rather, as
it's a feature of the VM - not the language) provides this isolation because
Erlang was designed for systems where fault-tolerance is the number one
concern. But it has its cost, too. The lack of a global heap makes it
impossible to implement useful shared data structures, so Erlang provides some
simple shared data-structures (like ETS) implemented in C, but those aren't
garbage collected.

Also, the JVM has a big performance advantage over BEAM. That's why quite a
few Erlang projects need to code some performance critical functions in C. But
once you do that, you lose Erlang's isolation guarantees: a failed C function
could bring down the entire application, and one that's stuck in an infinite
loop _will affect_ the performance of other processes.

> I am also not quite sure about any superior concurrency constructs which
> aren't based on OS primitives, but I am not Java guy.

You can start by looking here:
[http://docs.oracle.com/javase/7/docs/api/java/util/concurren...](http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/package-
summary.html)

None of these classes uses kernel mutexes or other synchronization mechanisms.

~~~
dschiptsov
> When a Java thread crashes it doesn't bring down the whole JVM any more than
> when an Erlang process does. Just the one thread dies.

I think this is inaccurate also.) Technically there is no memory protection
between pthreads, so a "crashed" pthread could damage shared data or the
common stack. It is, however, not the JVM's problem but a problem with
pthreads as a concept, and Armstrong argued that only a share-nothing
(process-based) architecture can be fault-tolerant, and that pthreads are
just "broken by design".

~~~
pron
True, but this is not black-and-white, but a matter of degree. Erlang
processes also share memory: ETS. A crashed process could well leave an ETS
table in an applicatively illegal state. So isolation is a scale. With Quasar
we try to tip the scale closer to Erlang's isolation levels, but, as I've
said, shared data structures could be extremely useful, too.

If fault-tolerance is your most important requirement, that far exceeds in its
importance any other requirement, then by all means use BEAM. It was designed
for precisely that kind of application.

If, however, fault-tolerance is just one of several important requirements,
then the JVM will be the better choice in many circumstances.

------
frozenport
I think the approach is interesting, but I don't understand how this is
considered theoretical. 10,000 elements for an N-body problem is expected.

What I am more confused about is how this is considered peak optimization.

Assuming they are utilizing doubles and doing both read and write I get the
following computation:

(10,000 x 10 x 8 x 2 bytes per second), or roughly 12.8 Megabits per second,
vs. the theoretical bandwidth of PCIe of 40 Gb/s?
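
Spelled out, and taking the commenter's numbers at face value (10,000 ships x 10 cycles/s x 8-byte doubles x 2 for read+write), the arithmetic is:

```java
public class BandwidthEstimate {
    public static double megabitsPerSecond() {
        long ships = 10_000;
        long cyclesPerSecond = 10;
        long bytesPerDouble = 8;
        long readAndWrite = 2;
        long bytesPerSec = ships * cyclesPerSecond * bytesPerDouble * readAndWrite;
        return bytesPerSec * 8 / 1_000_000.0; // bytes/s -> megabits/s
    }

    public static void main(String[] args) {
        System.out.println(megabitsPerSecond()); // 12.8
    }
}
```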

Are they computationally limited and what is their memory access pattern?

~~~
pron
This is far from an optimal simulation, because the framework is so general.
The spatial database gives you true isolated transactions that require (fiber-
blocking) locks.

In fact, the code is very naive, and that's our main point. Even naive code
can scale well with this approach. We care more about scaling than sheer
performance.

So your calculation is wrong, as we're not trying to approach a theoretical
limit (which would require optimizing the algorithm) but to demonstrate
scaling of a naive algorithm. For example, instead of a single spatial join,
each spaceship queries its surroundings: this is asymptotically (n^2 vs n)
worse than a single join.

------
perlgeek
> On my 4-core (8 virtual cores) i7 MacBook, with 10,000 spaceships, I get
> close to 10 simulation cycles per second. [...]

> When running the simulation synchronously, i.e. with a phaser, performance
> drops to about 8 cycles per second on my development machine.

> Performance – we are able to fully exploit the computing power of modern
> multi-core hardware.

So, 25% faster with 8 cores is "fully exploit the computing power of modern
multi-core hardware". WTF?

~~~
wtetzner
When he says he's using a phaser, he means that updates happen in lockstep.
Each update still happens on multiple cores, but each fiber will not move on
to the next update until all of the other fibers have finished the current
update.

So it's not synchronous in the sense that it's running everything
sequentially.
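
That lockstep scheme can be sketched directly with `java.util.concurrent.Phaser`: workers update in parallel within a cycle, but none starts cycle n+1 until all have arrived at the end of cycle n:

```java
import java.util.concurrent.Phaser;

public class LockstepSketch {
    public static int[] run() {
        final int workers = 4, cycles = 3;
        int[] done = new int[workers];       // cycles completed per worker
        Phaser phaser = new Phaser(workers); // one registered party per worker

        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                for (int c = 0; c < cycles; c++) {
                    done[id]++;                      // this cycle's "update"
                    phaser.arriveAndAwaitAdvance();  // barrier: wait for all
                }
            });
            threads[w].start();
        }
        try { for (Thread t : threads) t.join(); }
        catch (InterruptedException ignored) { }
        return done;
    }

    public static void main(String[] args) {
        for (int d : run()) System.out.print(d + " "); // 3 3 3 3
    }
}
```

In the simulation the "update" would be a spaceship's physics step; the barrier is what makes the cycles synchronous while the work within each cycle stays parallel.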

~~~
pron
Exactly. Still parallel, but don't start the next cycle until the previous one
has completed.

