

Scala vs. Clojure: Concurrency - swannodette
http://bestinclass.wordpress.com/2009/09/17/scala-vs-clojure-round-2-concurrency/

======
dkarl
_If you think that the Barber problem looks simple, odds are you haven’t tried
to implement it. There are quite a few things which can go wrong when we have
several agents working on shared data, primarily deadlocks and race
conditions._

A bit off-topic, but I see statements like this all the time, and I always
wonder: why is it assumed that you're going to throw together an aggressive
design that shoots for theoretically optimal performance, and then try to
ferret out the bugs? Ideally, the solution will be both correct and optimally
performant, but if you're worried that you aren't going to get it right, you
should start thinking about the best way to fail. It seems wiser to start with
a simple dumb solution you know is correct and work carefully towards a more
complex, better-performing solution. In that case you would list "crappy
performance" as the primary thing that can go wrong in a concurrent system --
but nobody ever does. They're worried about bugs. That means that when they're
done with their first attempt, they'll be confident in its performance but not
in its correctness.

Here's why I think it's better to start with a simple dumb solution and work
from there: you can stop short of an ideal solution. Maybe you only have to
optimize it well enough so that it isn't the scaling bottleneck. Or maybe you
can just deploy it and tolerate the performance limitations. If you start with
code that has the optimal performance characteristics but isn't _correct_, you
can't deploy it until you fix all the bugs. You've committed yourself to
creating an ideal solution, even though an ideal solution might not be
necessary.
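
A minimal sketch of the "simple dumb solution" for the barbershop, assuming Python threads: all shared state sits behind one coarse lock, so it is easy to check for correctness even if it serializes everything. The names `CoarseShop` and `simulate` are made up for illustration and do not come from the article.

```python
import threading
from collections import deque


class CoarseShop:
    """Barbershop state guarded by a single lock (hypothetical example class)."""

    def __init__(self, seats):
        self._lock = threading.Lock()
        self._seats = seats
        self._waiting = deque()
        self.served = 0
        self.turned_away = 0

    def arrive(self, customer_id):
        """Seat the customer if a chair is free, otherwise turn them away."""
        with self._lock:
            if len(self._waiting) >= self._seats:
                self.turned_away += 1
                return False
            self._waiting.append(customer_id)
            return True

    def cut_next(self):
        """Serve one waiting customer, or return None if nobody is waiting."""
        with self._lock:
            if not self._waiting:
                return None
            self.served += 1
            return self._waiting.popleft()


def simulate(seats=3, customers=200, arrival_threads=4):
    shop = CoarseShop(seats)
    arrivals_done = threading.Event()

    def arrivals(start):
        for customer_id in range(start, customers, arrival_threads):
            shop.arrive(customer_id)

    def barber():
        # Busy-polls the chairs; wasteful, but trivially correct.
        while not arrivals_done.is_set():
            shop.cut_next()

    barber_thread = threading.Thread(target=barber)
    arrival_list = [threading.Thread(target=arrivals, args=(i,))
                    for i in range(arrival_threads)]
    barber_thread.start()
    for t in arrival_list:
        t.start()
    for t in arrival_list:
        t.join()
    arrivals_done.set()
    barber_thread.join()
    # Drain whoever is still seated after the last arrival.
    while shop.cut_next() is not None:
        pass
    return shop
```

The exact split between served and turned-away customers depends on thread scheduling, but every customer ends up in exactly one of the two counters. The only thing left to improve from here is performance.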

~~~
gchpaco
The chief problem with this is that most programmers' intuitions are
calibrated for single threaded processes; when they think of "what could
happen" race conditions and deadlocks aren't there. This, I think, is because
they are usually emergent global properties, not localized errors, and we're
not used to looking for those or, frankly, especially good at finding them.

There's a very simple way to prevent all deadlocks and race conditions. It's
called in some circles the "Global Interpreter Lock" which converts the system
from concurrent to single threaded, and is usually the wrong way to deal with
the problem. These locks become performance hurdles over time, and people try
to replace them with finer grained locks, and what happens _every time_ is
that mysterious and weird concurrency errors happen, sometimes for years,
until things finally get ironed out. Happens in Python, Linux kernel, and
probably a million other places I'm not immediately familiar with.

~~~
joe_the_user
Yes, most programming is a matter of solving a sequential design problem -
what comes next in the process of building this structure? Threaded programming
is more like solving a logic or algebra problem - given certain initial
conditions, what could or could not be the case?

I'm not sure whether we aren't good at threaded programming or whether
threaded programming is inherently harder.

~~~
gchpaco
I think the model we use is wrong. My current belief is that the probability
of races and deadlocks occurring is a function of the amount of shared data
and that conventional shared-everything threading implementations make this
far more common than it ought to be. Taken to its extreme, this points to a
distribution-capable asynchronous message passing architecture à la Erlang, or
at least something like Hoare's CSP.

The related criticism that there's too much mutable data is correct in many
ways, but can also be seen as a way of reducing the "data frontier"; if it's
immutable then there are no concurrency implications.

Now, that said, transactional models don't fit well into that mental model.
There can be problems with them--write sync was mentioned in the article--but
they seem to be less _inherently_ buggy.
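
As a rough illustration of the shared-nothing idea, here is a sketch in Python where one thread owns a piece of state and everyone else can only reach it by sending immutable messages. `queue.Queue` stands in for an Erlang-style mailbox, and `counter_actor` and `run` are hypothetical names.

```python
import queue
import threading


def counter_actor(mailbox):
    """Owns `total` privately; the only way to touch it is to send a message."""
    total = 0
    while True:
        message = mailbox.get()
        kind = message[0]
        if kind == "add":
            total += message[1]
        elif kind == "get":
            message[1].put(total)   # reply on the queue the sender supplied
        elif kind == "stop":
            return


def run(senders=4, per_sender=1000):
    mailbox = queue.Queue()
    actor = threading.Thread(target=counter_actor, args=(mailbox,))
    actor.start()

    def sender():
        for _ in range(per_sender):
            mailbox.put(("add", 1))  # tuples are immutable: nothing is shared

    threads = [threading.Thread(target=sender) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    reply = queue.Queue()
    mailbox.put(("get", reply))      # queued behind every "add" sent above
    total = reply.get()
    mailbox.put(("stop",))
    actor.join()
    return total
```

No sender ever reads or writes `total` directly, so the "data frontier" is just the mailbox itself.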

~~~
sandGorgon
Doesn't this inherently lower performance?

Message-passing rather than shared data structures implies you need to pass
around heavy copies of data.

Are Lock-free data structures (supported probably by atomic hardware
operations) the middle ground?

<http://www.audiomulch.com/~rossb/code/lockfree/>

~~~
gchpaco
Not necessarily, even on a uniform memory access machine; perhaps you pass
around a reference to an object that is located close to the data you want. On
a non-uniform memory access machine (which is to say all of them these days
when you take cache into account) message passing and the message passing
formalism can actually save you time.

Anyway, I don't a priori object to shared data structures, but it is important
to remember that they are inherently dangerous and the amount of shared memory
should be sharply limited to that which is actually necessary, rather than
blindly making everything shared.

Lock free data structures won't protect you from deadlocks or races
necessarily; they mean you won't have to lock, but you can easily accidentally
write a situation with two processes mutually waiting on each other.
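
A small sketch of that last point, assuming Python threads: neither worker takes an explicit lock, yet each waits for the other to speak first, so neither makes progress. `queue.Queue` is only a stand-in for whatever channel is used (it is not itself lock-free), and the timeout exists purely so the demonstration terminates. The function names are hypothetical.

```python
import queue
import threading


def wait_then_reply(inbox, outbox, name, results, timeout):
    """Wait for the peer to speak first, then answer."""
    try:
        inbox.get(timeout=timeout)
        outbox.put("reply from " + name)
        results[name] = "got message"
    except queue.Empty:
        results[name] = "timed out"


def run_pair(timeout=0.2):
    a_inbox, b_inbox = queue.Queue(), queue.Queue()
    results = {}
    a = threading.Thread(target=wait_then_reply,
                         args=(a_inbox, b_inbox, "a", results, timeout))
    b = threading.Thread(target=wait_then_reply,
                         args=(b_inbox, a_inbox, "b", results, timeout))
    a.start()
    b.start()
    a.join()
    b.join()
    return results
```

Both workers time out, because the waiting cycle lives in the protocol rather than in any lock. Without the timeout they would wait forever.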

------
gchpaco
It seems to me that part of the problem here is that the Scala solution
misuses mailboxes--not terribly surprising. I've been unable to find an
idiomatic Erlang solution, but I wouldn't be terribly surprised to find that
one a) exists and b) works.

~~~
amalcon
A not-terribly-idiomatic Erlang implementation I slapped together to test this
theory, and also to practice on a silly problem:

    
    
      -module(test).
      -export([run/0]).
      %% timer:sleep/1 takes milliseconds, so the second counts below elapse as ms.
      
      -define(SEATS, 3).
      -define(DAY, 8*60*60).
      -define(PERIOD, 20*60).
      -define(CUTTIME, 15*60).
      
      barber(Shop, Count) ->
          receive
              {customer, P, N} ->
                  Shop ! accept,
                  io:format("(B) Cutting hair of customer ~w~n", [N]),
                  P ! haircut,
                  timer:sleep(?CUTTIME),
                  barber(Shop, Count+1);
              endtime ->
                  io:format("(B) Closing the shop ~n"),
                  Shop ! close,
                  barber(Shop, Count);
              closed ->
                  io:format("(B) Closed. ~n"),
                  io:format("(B) Served ~w customers. ~n", [Count])
          end.
      customer(N) ->
          receive
              haircut ->
                  io:format("(C) Customer ~w got haircut ~n", [N]);
              leave ->
                  io:format("(C) Customer ~w leaving ~n", [N])
          end.
      shop(Seated, Barber) ->
          receive
              accept when Seated == 0 ->
                  io:format("Flagrant Error~n");
              accept ->
                  shop(Seated-1,Barber);
              close ->
                  Barber ! closed;
              endtime ->
                  Barber ! endtime,
                  shop(Seated, Barber);
              {customer, P, N} when Seated >= ?SEATS ->
                  io:format("(S) Incoming customer ~w turned away ~n", [N]),
                  P ! leave,
                  shop(Seated,Barber);
              {customer, P, N} ->
                  io:format("(S) Incoming customer ~w seated ~n", [N]),
                  Barber ! {customer, P, N},
                  shop(Seated+1,Barber)
          end.
      shop_helper() ->
          receive
              P ->
                  shop(0, P)
          end.
      run() ->
          S = spawn(fun() -> shop_helper() end),
          B = spawn(fun() -> barber(S,0) end),
          S ! B,
          make_customers(S, 0, ?DAY).
      make_customers(Shop, _, Time) when Time<0 ->
          Shop ! endtime;
      make_customers(Shop, Id, Time) ->
          Shop ! {customer, spawn(fun() -> customer(Id) end), Id},
          timer:sleep(?PERIOD),
          make_customers(Shop, Id+1, Time-?PERIOD).
    

This implementation is 68 lines long: twice the Clojure, but still less than
the Scala. A big portion of this difference is because the Clojure version
cheats, modeling the system as straight-up producer/consumer with queue size
constraints. The other two implementations spend about 20 lines dealing with
Customer actors, which the Clojure completely punts. The Clojure still wins,
but it's not as big a margin as it looks.

~~~
Avshalom
Actually 68 is about the same as scala. About 30 of the lines are stylistic:
one closing bracket to a line and some blank lines for visual grouping, as
well as several bits of printed status that the Clojure version elides.

------
jwhitlark
Joshua Bloch,
<http://www.scribd.com/doc/33655/How-to-Design-a-Good-API-and-Why-it-Matters>,
I believe had as one of his rules: "Example code should be Exemplary" (I don't
see it in the slides, but it was in the video as I recall).

Sample code lives forever; make sure you get it right!

Of course, I didn't go to the primary sources for the article, so I could be
completely wrong about the situation.

edit: While I really like clojure, that isn't all its syntax, less macros.
There's stuff like ` #() _ etc., although two out of three of those examples
are shorthand for things you can do with the syntax he listed (quote and fn).
I think you could easily fit it on a page, though.

------
jongraehl
The use of actors with mailboxes that annoys the author is:

1. Take a message from the mailbox (that may have been queued for a long
time).

2. If the mailbox still has more than N items, then throw away the message.

This is a really dumb way to approximate a fixed size mailbox, and doesn't
allow for an immediate refusal.
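
The two steps above can be sketched as follows, assuming for simplicity that a burst of messages has already piled up before the actor starts processing. This is a Python model of the pattern, not the Scala code from the book, and `drain_then_discard` is a hypothetical name.

```python
from collections import deque


def drain_then_discard(arrivals, limit):
    """Accept every message, and decide to drop only when dequeuing it."""
    mailbox = deque(arrivals)        # unbounded: every arrival gets queued
    served, dropped = [], []
    while mailbox:
        message = mailbox.popleft()  # step 1: take a message
        if len(mailbox) > limit:     # step 2: still more than `limit` behind it?
            dropped.append(message)
        else:
            served.append(message)
    return served, dropped
```

With six queued messages and a limit of 3, `drain_then_discard(range(6), 3)` returns `([2, 3, 4, 5], [0, 1])`: the messages that get thrown away are the ones that waited longest, and each learns of its rejection only when it reaches the head of the queue.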

The questionable Scala code is pulled from a recently written O'Reilly book on
Scala. I haven't read it myself, but it's probably not as good as "Programming
in Scala" (Odersky et al), which covers Actors+Concurrency in Chapter 30.

Needless to say, Scala also has access to all the Java concurrency primitives
and libraries. I believe there's a Scala STM library as well.

~~~
michaelneale
Yes, I like the look of the Clojure one, but actors are just one of many
(flawed) libraries in Scala; I am sure a similar solution could work there.

------
dschobel
Definitely one of the better postings I've seen on HN.

In very little space it teaches you a lot about clojure + scala + concurrency.
And it seems to me (coming from a scheme + java background, so with no
particular horse in the race) to be a rather honest comparison.

I'm eager to see what the Scala community responds with (and hoping they take
the time to show code).

------
JulianMorrison
That design is poor. There should be an actor modeling the seats (passed a
customer and as a non-blocking operation either seats them or ejects them) and
an actor modeling the barber (loops doing a blocking poll of the seats for a
customer and doing the haircut).

Mailboxes are not queues!
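
One way to read that design, sketched in Python rather than with real actors: the seats are an explicit bounded structure with a non-blocking "seat or eject" operation, and the barber blocks polling them. Here a bounded `queue.Queue` stands in for the seats actor; `run_shop`, `barber`, and `CLOSE` are hypothetical names.

```python
import queue
import threading

CLOSE = object()  # sentinel telling the barber to go home


def barber(seats, served):
    while True:
        customer = seats.get()        # blocking poll of the seats
        if customer is CLOSE:
            return
        served.append(customer)       # the "haircut"


def run_shop(customers=50, seat_count=3):
    seats = queue.Queue(maxsize=seat_count)
    served, ejected = [], []
    worker = threading.Thread(target=barber, args=(seats, served))
    worker.start()
    for customer in range(customers):
        try:
            seats.put_nowait(customer)  # non-blocking: seat them...
        except queue.Full:
            ejected.append(customer)    # ...or eject them immediately
    seats.put(CLOSE)                    # blocking: waits for a free seat
    worker.join()
    return served, ejected
```

How many customers are ejected depends on scheduling, but each one is either served in arrival order or refused at the door, and the capacity limit lives in the seats rather than in anyone's mailbox.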

