

2012: The year Rubyists learned to stop worrying and love threads - bascule
http://tonyarcieri.com/2012-the-year-rubyists-learned-to-stop-worrying-and-love-the-threads

======
tinco
Ruby's big problem with concurrency is the mutability of everything. Ruby just
loves mutable state; it is in its blood. Not even the constants are really
constant, and not even the classes make any promises about the future.

Embracing concurrency would mean compromising there: in your code you would
have to acknowledge that there are variables you could reference but shouldn't,
because they aren't thread-safe. This goes against the idea that Ruby is a
beautiful abstract garden where everything is possible.

deep_dup and deep_freeze make it easy for the programmer to create safe
objects, but they don't make it any harder to use unsafe ones. I think this is
why they haven't been accepted into Ruby yet, and perhaps never will be: they
solve a problem Ruby does not want to get into, for the same reason Ruby
won't have a memory model that takes concurrency into account.
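
To make concrete what I mean by a safe object, here's a rough sketch of what
a deep_freeze could look like; it handles cycles but ignores things like
singleton classes, Structs, and other special cases:

    def deep_freeze(obj, seen = {})
      return obj if seen[obj.object_id]   # don't loop forever on cyclic graphs
      seen[obj.object_id] = true
      case obj
      when Hash  then obj.each { |k, v| deep_freeze(k, seen); deep_freeze(v, seen) }
      when Array then obj.each { |e| deep_freeze(e, seen) }
      end
      obj.instance_variables.each do |ivar|
        deep_freeze(obj.instance_variable_get(ivar), seen)
      end
      obj.freeze
    end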

In my opinion, the only way Ruby should ever integrate threads into the
language is by introducing a way to start a second thread that executes
either a string or a file. It could return an object that allows sending
messages to the spawned thread. The message-send method itself might perform
deep_dup or deep_freeze on the objects it receives (without needing to expose
a deep_dup/deep_freeze method at all).
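
Something along these lines (hypothetical API, nothing like this exists today):

    # All names here are made up; the point is that the worker gets a fresh
    # top-level scope and only ever receives copies or frozen objects.
    worker = Thread.spawn_isolated("worker.rb")  # runs the file, closes over nothing
    worker.send_message(job)                     # job is deep_dup'd/deep_frozen on the way in
    result = worker.receive                      # replies come back the same way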

You might complain that evalling a string or loading a file seems like an
evil way of going about things, but this is the only way of introducing code
into Ruby that does not close over its scope.

An alternative to evalling would be to introduce non-closure blocks, but I
think their existence might violate the principle of least surprise.

edit: by the way, this idea of spawning a second thread that returns an object
that can be used to send objects to the other thread could already be
implemented using Ruby's fork method and a handle to some shared memory or a pipe.
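
Something like this, with Marshal standing in for deep_dup (the child only
ever sees a copy of what goes through the pipe):

    reader, writer = IO.pipe

    pid = fork do
      writer.close
      job = Marshal.load(reader.read)    # a copy, never a shared object
      # ... do the work with job here ...
    end

    reader.close
    writer.write(Marshal.dump({ task: "resize", id: 42 }))
    writer.close
    Process.wait(pid)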

edit: is there something particularly untrue about what I'm saying? is it
worth a downvote?

~~~
fzzzy
Not sure why somebody would downvote you? You seem correct to me.

The ability to spawn new global contexts and communicate between them only via
immutable objects (message passing) is fundamental to actor systems.

Unfortunately, most modern scripting languages do not make it easy or cheap to
spawn new global contexts. I hope this changes in the near future. (Lua is an
exception, I believe.)

~~~
bascule
JRuby makes it easy to start as many scripting containers as you want:

[http://jruby.org/apidocs/org/jruby/embed/ScriptingContainer....](http://jruby.org/apidocs/org/jruby/embed/ScriptingContainer.html)
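
A minimal sketch, from inside JRuby itself (ScriptingContainer is a Java
class, so this assumes a JRuby runtime):

    require 'java'

    # Each ScriptingContainer wraps its own Ruby runtime, isolated from the caller's.
    container = org.jruby.embed.ScriptingContainer.new
    container.run_scriptlet("GREETING = 'hello from another runtime'")
    puts container.run_scriptlet("GREETING")  # defined in that runtime...
    defined?(GREETING)                        # ...and nil out here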

~~~
fzzzy
Awesome. What's the minimum overhead required for each global context?

------
jeremyjh
I really like Tony's article and appreciate all the work he has done on
Celluloid. I am in the early stages of writing a multi-threaded server app
using Celluloid and Hamster as the basic libraries for dealing with
concurrency. So far I have found them to be idiomatic and pleasurable to use.
It may actually be somewhat of a drawback, but Celluloid really can get out of
the way to the extent that, as the client of a particular object's API, you
would not even realize there is a message-based proxy in the middle of things
(see the sketch below). Still, I like that I don't have a lot of
infrastructure and ceremony in my code just to be safely concurrent.
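
A condensed example of what I mean (not my actual app, just README-style
Celluloid usage):

    require 'celluloid'

    class Counter
      include Celluloid   # instances now live behind an actor proxy

      def initialize
        @count = 0
      end

      def increment
        @count += 1
      end
    end

    counter = Counter.new                 # returns the proxy, not the raw object
    counter.increment                     # looks like a plain call, is really a message send
    counter.async.increment               # fire-and-forget message
    puts counter.future.increment.value   # future-based call, blocks on .value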

The GIL is like a boogeyman hanging over our heads still. It's important to
remember, though, that we still get a lot of concurrency in MRI; if you are
I/O-bound you may not see a difference. My core app is not I/O-bound, and I
predict I'll see enough benefit from JRuby to use it. I find JRuby to be slow
in development, but library support is good, and I'm presently planning to
unit-test in JRuby in parallel and deploy with it from the beginning.

------
3amOpsGuy
Do we really need threads? From my limited Ruby experience, it'll happily fork
new interpreters, and it has connectivity with pretty much all the major
message queue implementations as well as various serialisation and networking
libraries. In short, talking to other processes is easy, even if they are a
bit slower than threads (but if speed is such an issue, it's unlikely Ruby
would be your implementation language anyway).

Threads only ever scale so far; when you need more processor cycles you'll
have to go off-host eventually. By adopting a multi-process model with data
shared over the network (with or without a broker queue in between) you can
greatly improve the app's ability to scale, and its robustness.

For the non-compute-intensive reasons to parallelise (e.g. chatty networking
code), non-blocking code often performs better than threads anyway.

If threads aren't great (they aren't in Python), forget about them and move
on. There are other tools in the toolbox, with the bonus that the other tools
are actually better (in most if not all cases on Unix-like platforms).

~~~
tinco
Although you are right that threads only ever scale so far, you need to
remember that network I/O has a rather large overhead.

If you always assume your code is going to run over a network, you might miss
an opportunity to efficiently solve problems that could be handled on a single
machine with a bunch of cores.

I think frameworks like Celluloid allow you to deal with this elegantly, but
they need help from the language to realize their full potential, which is why
bascule requests these features.

An example: a computer game might be built concurrently by having the
rendering system, the physics engines, the AIs, and the main game loop execute
on separate threads. Obviously there is a bunch of information to be shared
between these systems with as little delay as possible.

~~~
sliverstorm
Simply put, if you map out storage levels like this:

L1 -> L2 -> (L3) -> Memory -> Disk/Network

These are each orders of magnitude apart in performance. Network can be faster
than disk, but generally not by an order of magnitude.

So everything you know about memory vs. disk performance ought to translate
fairly well to memory vs. network.

It's a good observation that extremely performance-bound jobs might want to
look to other languages, but avoiding a level of that data storage hierarchy
is no meager 2-3x speedup.

------
tel
I don't want to be "that Haskell guy", but that's all I could read from this.
Realistic multithreading and immutability are deeply tied. I'm very interested
to see how far the MRI community can get toward decent multithreading by
implementing suggestions such as these... since my learned intuition is to
just throw out mutability and plan within that much simpler and more limited
sandbox.

~~~
fzzzy
It's funny, the article doesn't really even contain anything about threading.
It just has a bunch of band-aid solutions for tacking immutable message
passing on top of global mutable shared state.

~~~
bascule
Only one of the proposals had anything to do with immutability.

~~~
fzzzy
Huh? Deep Freeze, Deep Dup, and Ownership Transfer are all strategies to avoid
multiple concurrent actors mutating the same objects at the same time.

Even the last proposal, which I think has to do with fine-grained locking, is
still a strategy for avoiding issues with mutable shared state.

------
lazzlazzlazz
I'm not an expert in this domain, but wouldn't the threading issues that have
impeded Python (and the removal of the Python GIL) also impede Ruby in the
same way? I've heard solutions like "freezing" and ownership transfer before,
but they're always more complex than they seem.

Thanks

~~~
chimeracoder
In short, yes.

The longer answer is linked in a post above[1]: it describes the problems
with Python (CPython), many of which would apply to Ruby (CRuby/MRI) as well.

[1] [http://dabeaz.blogspot.com/2011/08/inside-look-at-gil-remova...](http://dabeaz.blogspot.com/2011/08/inside-look-at-gil-removal-patch-of.html)

------
radiospiel
I would love that, but I don't see a sensible way to get there.

A deep_dup or deep_freeze solution would have to dup/freeze the entire object
graph of the object in question, and this would have to include classes and
modules as well, including the Object, Class, and Module classes. That would
probably become a _very_ large object graph.

One way to prevent this could be to explicitly freeze such objects at some
point during startup. That would still break a lot of code in the Rails world,
where dynamically adding methods to a class is just standard.

Another way could be to implement copy-on-write semantics for such (and other)
objects: if two threads share, say, a Class object, and one thread modifies
it, the modification should then only manifest itself in that thread's copy of
the class.

~~~
bascule
There's no reason you would need to freeze anything but the state. Things
like classes represent the function associated with that state. I'd generally
say runtime modifications to the class hierarchy are BAD BAD BAD and you
should never do them and you should feel bad when you do them, but that's a
separate concern from concurrent state mutation. Detractors of OOP might wave
their hands and say OOP conflates function and state, but really they're
cleanly separated: it's the difference between (meta)class and instance.

Concurrent languages like Erlang allow you to swap function at runtime even
though their state is immutable.

------
danso
This was a much more thorough article than I was expecting; I'll have to
bookmark it for later.

From the OP: > _At the end of the conference, Evan Phoenix sat down with Matz
and asked him various questions posed by the conference attendees. One of
these questions was about the GIL and why such a substantial “two dot oh”
style release didn’t try to do something more ambitious like removing the GIL
and enabling multicore execution. Matz looked a bit flustered by it, and said
“I’m not the threading guy”._

The fact that a lot of Ruby's development (at least MRI's) happens in a
language totally incomprehensible to me is part of my fascination with Ruby...
I remember there being some discussion a while back about translating Matz's
original Ruby documentation for historical purposes... as it is now, some of
that design and thought process is probably still locked away in Japanese. I'm
sure he's discussed it in postings and at conferences since, but did Matz have
any kind of intractable philosophical objection to threading, other than the
ton of work involved? That is, did he or any of the MRI team think that it
would take Ruby too far away from its original design goals?

~~~
1qaz2wsx3edc
I don't think the GIL will change, especially with JRuby as a viable option. I
think Matz is interested in linguistic expression (ways to write code) and not
in GIL/performance issues. I'm not judging the decision either way. We might
see it someday, I hope.

~~~
zem
Also, people experimented with removing the GIL in Python and did not get any
benefit from it.
[http://dabeaz.blogspot.com/2011/08/inside-look-at-gil-remova...](http://dabeaz.blogspot.com/2011/08/inside-look-at-gil-removal-patch-of.html)
looks at some of the issues involved.

~~~
quux
That was really interesting, especially this part:

'Reference counting is a really lousy memory-management technique for
free-threading. This was already widely known, but the performance numbers put
a more concrete figure on it. This will definitely be the most challenging
issue for anyone attempting a GIL removal patch.'

If ref counting is so bad with threads, how does Objective-C do it
performantly?

~~~
bdash
While I've not measured the performance of the approaches, from reading the
Python patch discussed in the article it would appear that Objective-C uses a
more intelligent approach to maintaining the reference count in the face of
concurrent manipulation.

The patch to Python involves guarding every increment and decrement of a
reference count with a single pthread mutex. This pthread mutex would become a
major source of contention if multiple threads are attempting operations that
manipulate the reference count. Pthread mutexes are also a relatively
heavyweight synchronization mechanism, and their overhead would impact
performance even when the single mutex was uncontended.

In contrast, Objective-C uses more efficient means of managing the reference
count. The implementation of -[NSObject retain] uses spinlocks to guard the
side tables that hold the reference counts. There are multiple such side
tables, each with its own spinlock, in order to reduce contention when
multiple threads manipulate the reference counts of different objects.
CoreFoundation, which provides the implementations of many common types such
as strings and arrays, uses an inline reference count that is manipulated
using atomic compare-and-swap operations. This reduces contention at the cost
of increasing the storage size of every object of this type.
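
To make the contention difference concrete, here is a toy model in Ruby. It
has nothing to do with the real CPython or Objective-C internals; it just
contrasts a single global lock with striped side tables:

    class GlobalLockCounts
      def initialize
        @lock   = Mutex.new       # every thread contends on this one mutex
        @counts = Hash.new(0)
      end

      def incref(obj)
        @lock.synchronize { @counts[obj.object_id] += 1 }
      end
    end

    class StripedCounts
      def initialize(stripes = 16)
        # several side tables, each with its own lock, so unrelated objects
        # rarely contend with each other
        @stripes = Array.new(stripes) { [Mutex.new, Hash.new(0)] }
      end

      def incref(obj)
        lock, counts = @stripes[obj.object_id % @stripes.size]
        lock.synchronize { counts[obj.object_id] += 1 }
      end
    end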

------
dexcs
Great post, great explanations and my quote of the day:

"Well Matz, I’m a “threading guy” and I have some ideas ;)"

