

Ruby’s GIL and transactional memory - ricny046
http://www.mikeperham.com/2013/12/31/rubys-gil-and-transactional-memory/

======
huxley
PyPy has a (currently software-only) transactional memory branch that is
trying to remove the Python GIL without changing the semantics

[http://pypy.org/tmdonate.html](http://pypy.org/tmdonate.html) (has the best
executive summary of what the branch intends to accomplish)

------
memracom
Just like with Python, why would you even care about the GIL?

Writing single-process multithreaded apps with low-level locking hardcoded everywhere
is now quite clearly NOT the right way to build software. If you don't use
locks, i.e. only use lock-free data structures and immutable state, then you
won't care about the GIL. And you can use multiple processes and interprocess
communications in place of threading. On Linux, the difference in performance
between threads and processes is very small. Most people who complain about
the GIL have not even profiled multithreading versus multiprocessing. They are
just bound and determined to reinvent the wheel in their own code base.

There is no reason why you can't leverage C++ (ZeroMQ) or Erlang (RabbitMQ) to
do the hard bits and write the rest of your app in nice simple Ruby (or
Python) scripts that are designed according to the Actor Model.
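
As a minimal sketch of the multi-process approach described above (not anything from the linked article), the following forks one worker process per input and collects results over pipes. It assumes a platform where `fork` is available, such as Linux; `parallel_map` is a hypothetical helper name, and `Marshal` is used only as a convenient way to pass results back.

```ruby
# Hypothetical helper (illustrative name): run the block once per input,
# each in its own forked process, and return the results in input order.
# Assumes Kernel#fork is available (e.g. Linux); it is not on Windows.
def parallel_map(inputs, &block)
  children = inputs.map do |input|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode

    pid = fork do
      # Child: compute, serialize the result, and send it to the parent.
      reader.close
      writer.write(Marshal.dump(block.call(input)))
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end

    # Parent: close its copy of the write end so reads see EOF later.
    writer.close
    [pid, reader]
  end

  children.map do |pid, reader|
    result = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    result
  end
end

# Each range is summed in a separate process, so no GIL is shared.
p parallel_map([1..1000, 1001..2000]) { |range| range.sum }
```

Because each worker is a separate process with its own interpreter, the workers never contend for one GIL; the cost is the serialization step and the per-process memory discussed in the replies.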

~~~
pmahoney
> Most people who complain about the GIL have not even profiled multithreading
> versus multiprocessing.

This is a fair point, but the issue might be about memory usage, not speed. A
unicorn setup might have two to eight worker processes to service HTTP
requests. Even with copy-on-write-friendly garbage collection, the memory
usage of each additional process is significant. On the other hand, a thread-
based solution (using JRuby, for example) can maintain a threadpool with
hundreds of worker threads because the cost of an additional thread is nearly
negligible.

~~~
nostrademons
Why would you need hundreds of worker threads? If you're using an event-loop
in each process (which you probably want if only to minimize context-switching
overhead, and is how Unicorn does it), then you need only one process per
physical core in the machine. Anything else will just sit in the runqueue and
cause context switches.

There is definitely annoying memory overhead with multiprocess (vs.
multithreaded) architectures, but it's on the order of 2x-8x, not 100x. And
that's 2x-8x the _code size_ of the application, not _data size_ - you only
need duplicate interpreter objects, anything at the app or framework level
(like templates or data files) can be stored in read-only shared memory or
just COW'd with no writes. (It's technically not even every interpreter object
- a number of function objects are completely static data that will never have
additional references made, and so COW means they'll be shared perpetually
between processes.)

~~~
rubiquity
> If you're using an event-loop in each process (which you probably want if
> only to minimize context-switching overhead, and is how Unicorn does it)

Unicorn is a pre-forking multiprocess server so I don't know why it would be
using an event loop.

Why threads over processes? Because memory isn't cheap when you don't own it
yourself.

------
NatW
Moore's law-scale performance improvements have increasingly come from
harnessing multiple cores and much of Ruby is not taking advantage of this.
The GIL can limit performance on multi-core systems. This is a serious problem
worth tackling. Many on this thread defending just status-quo solutions seem
to miss this point.

------
twoodfin
I haven't read the linked paper yet, but if transactional memory can be used
in a feedback loop with the developer to identify sources of resource
contention (where, for example, more intricate locking could usefully replace
both TM and the GIL) that sounds like a real win.

~~~
memracom
To me locking means waiting which means greatly reduced performance. That is
the whole reason why event-based systems like NGINX and node.js are so
popular.

Look at the first slide from this talk by Professor Michael Stonebraker to see
why more locking is not such a good idea.
[http://blog.jooq.org/2013/08/24/mit-prof-michael-
stonebraker...](http://blog.jooq.org/2013/08/24/mit-prof-michael-stonebraker-
the-traditional-rdbms-wisdom-is-all-wrong/)

Just because we can do it doesn't mean that we should do it.

~~~
viraptor
> To me locking means waiting which means greatly reduced performance. That is
> the whole reason why event-based systems like NGINX and node.js are so
> popular.

I think you've got it slightly wrong. What do you think happens when one
request is being worked on in effectively one green-thread-equivalent in
node.js? Everything else is blocked until the current operation yields. That's
an equivalent of a GIL, yet it doesn't "reduce performance". If you switched
to multithreading + locking (making it run in an M:N scheduling model), you'd
gain performance, not lose it.

------
ksec
Ruby needs JIT, then Incremental GC much more urgently than getting rid of GIL
and getting TM.

------
geoffroy
I thought the Ruby 2 standard library was thread-safe now?

~~~
adamtj
GILs are at a lower level than that. A GIL is used to protect the internal
state of the interpreter.

You (or your standard library) need ruby-level locks in ruby code when you do
things like increment a counter. E.g.: "obj.count = obj.count + 1".

The interpreter needs interpreter-level locks when doing things like looking
up attributes from an object's dictionary or hash table or whatever ruby calls
it. That's why you don't always need a lock around a simple assignment to
avoid interpreter crashes: "obj.count = 0" is safe. Without a ruby-level lock
around that, there is a race condition with the previous example, and
resetting the count to zero may not have an effect if you're unlucky.
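
To make the counter example concrete, here is a minimal sketch of a Ruby-level lock around the read-modify-write, using the standard `Mutex` class. The `Counter` class and `run_counter_demo` method are hypothetical names chosen for illustration.

```ruby
# Hypothetical Counter class; names are illustrative only.
class Counter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  # Read the current value under the lock.
  def count
    @lock.synchronize { @count }
  end

  # "count = count + 1" is a read followed by a write; holding the lock
  # for both steps keeps another thread from slipping in between them.
  def increment
    @lock.synchronize { @count += 1 }
  end

  # The reset takes the same lock, so it cannot be overwritten by an
  # increment that read the old value just before the reset.
  def reset
    @lock.synchronize { @count = 0 }
  end
end

# Spawn several threads that all increment the same counter.
def run_counter_demo(threads: 4, per_thread: 1_000)
  counter = Counter.new
  workers = Array.new(threads) do
    Thread.new { per_thread.times { counter.increment } }
  end
  workers.each(&:join)
  counter.count
end

puts run_counter_demo # prints 4000
```

The GIL only guarantees that the interpreter's own structures stay consistent; it is the `Mutex` that guarantees no increment is lost.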

Without a GIL, however, you could have problems like the interpreter
segfaulting (instead of throwing a nice exception that you can handle). You
may also have problems like that assignment turning into an infinite loop,
depending on ruby's implementation of hash tables.

And if accessing a hash table isn't safe, you can't even use ruby-level locks.
How would you create a lock? You'd need to access classes or functions in a
module. The module stores those things in a hash table, which you can't access
without a lock. That's what the GIL is for.

