

About concurrency and the GIL - telemachos
http://merbist.com/2011/10/03/about-concurrency-and-the-gil/

======
nupark2
I'm really tired of these disingenuous justifications and arguments to 'just
use processes!'

A GIL is an unnecessary limitation that precludes a huge swath of architecture
optimizations. It's there because it's difficult to remove once you've made
the incorrect choice to rely on it, not because a GIL is a good idea.

~~~
lacker
Is it really coincidence that two of the most popular scripting languages,
Python and Ruby, made this language design choice that you think is incorrect?

It seems more like a case of "worse is better". The GIL lets programmers
ignore a lot of parallelism bugs, so it lets people get their regular work
done faster at the cost of long-term scalability. That certainly seems to be
the "Rails way" and it is a philosophy that has led to popularity.

~~~
rbanffy
> it lets people get their regular work done faster at the cost of long-term
> scalability.

More like it lets them get their work done at the cost of having to use
processes or co-routines calling non-GIL code instead of threads.

------
jbert
With most web architectures, your state is outside the process (RDBMS, NoSQL,
filesystem, whatever persistent store you're using).

So the main benefit of using threads (easy sharing of in-process state) goes
away. For web architectures, concurrent request handling seems to me to be
best achieved by multi-process, rather than multi-thread.

There are some minor benefits to threading (e.g. an in-memory cache of global
state can be shared amongst multiple threads, rather than being replicated
among multiple processes) but:

- there are out-of-process caching technologies (memcached, redis, etc.)

- threading doesn't come "for free", in that you need to pay a performance
cost in terms of locking in your interpreter and/or a code complexity cost in
terms of access to your shared data structures.
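That trade-off is easy to see in miniature. A minimal Python sketch (a hypothetical counter, not from the thread; Python chosen since the GIL is the topic): once a structure is shared between threads, every access has to pay for a lock to stay correct.

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment(n):
    """Increment the shared counter n times, taking the lock on every step."""
    global counter
    for _ in range(n):
        with lock:  # a per-access cost, paid even when uncontended
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000: correct only because every increment was serialized
```

Drop the lock and `counter += 1` becomes a read-modify-write race that can silently lose updates — which is exactly the complexity cost: someone has to notice that the counter is shared.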

~~~
buff-a
_threading doesn't come "for free", in that you need to pay a performance cost
in terms of locking in your interpreter and/or a code complexity cost in terms
of access to your shared data structures_

vs

 _there are out-of-process caching technologies (memcached, redis etc)_

If you subject "out-of-process caching technologies" to the same measurements
(performance cost, code complexity) as threading solutions, what do you find?

~~~
jbert
> If you subject "out-of-process cacheing technologies" to the same
> measurements (performance cost, code complexity), as threading solutions,
> what do you find?

Yes, I think you find that shared state is difficult.

So it's the old process (private by default, shared as the exception) versus
threads (shared by default) split. imho, "less shared state" == "simpler". And
imho, "private by default" == "less shared state".

And you have an easier time of it if the shared state helps you with
concurrent access. This is one benefit of RDBMSs (transactions) and also of
systems like redis. The fact that your state is external means it is more
likely to have been designed to be concurrency-safe.

~~~
antrix
I don't get this argument. If I replace my memcache logic with an in-memory
thread-safe hashmap, it will always be faster. I can't see how it can be slow.

As for 'simpler', as long as you aren't actually implementing the thread safe
map, it is simpler than using memcache too.

So as long as you use well written shared state implementations, (e.g. Java's
concurrent collections), shared state isn't as hard as you make it out to be.
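The thread-safe map antrix describes can be sketched in a few lines. This is a hypothetical Python equivalent of a locked hashmap (Java's `ConcurrentHashMap` is far more sophisticated, using lock striping rather than one global lock):

```python
import threading

class ThreadSafeMap:
    """A minimal thread-safe hashmap: a plain dict behind one lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

cache = ThreadSafeMap()
cache.put("user:1", {"name": "alice"})
print(cache.get("user:1"))  # served from process memory, no network round-trip
```

Every lookup here is an in-process lock acquire plus a dict probe, versus a socket round-trip for memcached — which is antrix's speed argument in a nutshell.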

~~~
jbert
> If I replace my memcache logic with an in-memory thread-safe hashmap, it
> will always be faster. I can't see how it can be slow.

Because in a threaded application context, you have to either:

- have locking/synchronization on all data structures (performance cost)

OR

- manage which of your data structures are shared and which are not
(complexity cost/race bugs)

With external, shared state (RDBMSs, memcached, etc.) you get fast in-process
access to your (private) data structures (no performance cost due to locking)
and a (mostly) concurrency-safe, explicit datastore.

And as pointed out elsewhere, to scale beyond a single process you need the
external state _anyway_.

There are other approaches to this: software transactional memory, functional
programming, Clojure's model, etc. But it's a hard problem and it's not clear
these are the right solution at scale.

~~~
buff-a
_manage which of your data structures are shared and which are not_

That is _exactly_ what you are doing when you decide which of your data
structures go in memcache and which don't.

~~~
jbert
> That is exactly what you are doing when you decide which of your data
> structures go in memcache and which don't.

Yes, apart from the fact that the ones you don't think about are private. i.e.
processes are "private by default" and threads are "shared by default".

Processes give you a safe default, threads give you a dangerous one.

This is the main (only) difference between threads and processes - and it is
the important one.

~~~
buff-a
So you concede Antrix's point that multithreading is not slower?

I'd just like to confirm that you are even able to determine when your
argument has been refuted, as you seem to think that the proper response is to
just pretend it didn't happen and come up with a new reason instead. That is
to say, are you a reason factory in support of an internally held belief
despite evidence to the contrary, or a rational sentient who is in a discourse
for the purpose of determining a common truth?

~~~
jbert
> So you concede Antrix's point that multithreading is not slower?

Slower than what?

I said (and you quoted): threading doesn't come "for free", in that you need
to pay a performance cost in terms of locking in your interpreter and/or a
code complexity cost in terms of access to your shared data structures

For your reference, I stand by that comment and don't think I've said anything
to contradict it. For clarity, this is what I think:

1) a multithreaded (fine-grained locking) interpreter will run a single-
threaded workload slower than an interpreter with a GIL (references from
elsewhere in this thread):
<http://www.artima.com/weblogs/viewpost.jsp?thread=214235>
<http://mail.python.org/pipermail/python-dev/2001-August/017099.html>

2) the threading programming model imposes a complexity burden on the
programmer, since all data structures are shared by default and so they must
think about every data structure and whether it can become shared in practice
(and so must be concurrency-safe or not)

Basically - either your interpreter locks everything for you (perf cost) or
you have to worry about it (complexity cost) or a bit of both.
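The per-operation locking cost in (1) can be felt even from user code. A rough illustration (not a benchmark of any real interpreter): uncontended lock acquire/release around every step, which is roughly what a fine-grained-locking interpreter must do internally, is measurably slower than the unlocked path.

```python
import threading
import timeit

lock = threading.Lock()

def unlocked():
    x = 0
    for _ in range(10_000):
        x += 1
    return x

def locked():
    x = 0
    for _ in range(10_000):
        with lock:  # uncontended, but still paid on every operation
            x += 1
    return x

t_plain = timeit.timeit(unlocked, number=100)
t_locked = timeit.timeit(locked, number=100)
print(t_locked > t_plain)  # the per-operation locking overhead shows up directly
```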

I don't think that threaded access to an in-process data structure is slower
than multi-process access to memcached.

I do think that defaulting to private data and having explicitly shared data
is wise and is an easier programming model.

I hope that's clear. Please let me know if you think I've been inconsistent,
rude or done anything other than espouse these points in this thread.

ps. I found your last reply rude. Also I'm not trying to say "threads are bad
and you are bad for using them". I'm also not attacking your (or anyone
else's) integrity.

~~~
buff-a
Ok, its the former then.

 _Slower than what?_

The great thing about HN, and similar discussion systems you'll find on the
internet, is that you can read the conversation. So when I say "slower", and
reference "antrix", a sentient (or even a reasonable AI) could infer that I
was referring to this statement, by antrix:

> If I replace my memcache logic with an in-memory thread-safe hashmap, it
> will always be faster. I can't see how it can be slow.

And that _you replied to by quoting it_.

Clearly, by reading the English therein, antrix is comparing the memcache
logic with a thread-safe hashmap. So in case you still aren't getting it, the
answer to your question "Slower than what?" is "than a thread-safe hashmap".
That you are not aware that this was antrix's assertion would explain why your
subsequent posts fail to refute it.

So your behavior is not merely that of a response factory, but of a response
factory with a 1-deep context buffer.

~~~
jbert
The reason I asked "slower than what", is because at no point have I claimed
that going to memcached would be faster than a local hash (with locking).

Whereas I have (in this thread) claimed that an interpreter without a GIL (and
with fine grained locking) would be slower than one with a GIL (for single-
threaded workloads).

I wanted to know which you meant.

You haven't shown (I believe because it's not there) where I claimed that
going to memcached would be faster.

And you're being rude and trying to provoke a reaction.

And HN is hiding the reply link because its heuristics have determined that
the signal/noise of these posts is likely to be low.

And I agree and so won't reply further in this thread.

------
drnicwilliams
Three quotes, hopefully not out of useful context, that seem incongruous:

"Rubinius is about to join JRuby and MacRuby in the realm of GIL-less Ruby
implementations"

"I spend my free time working on an alternative Ruby implementation which
doesn’t use a GIL (MacRuby)"

"I respect Matz’ decision to keep the GIL even though, I would personally
prefer to push the data safety responsibility to the developers. However, I do
know that many Ruby developers would end up shooting themselves in the foot"

So developers using GIL-less MacRuby, JRuby and Rubinius are prone to foot
shooting? I wish they'd blog about this more; I've never once heard a MacRuby
or JRuby developer blog saying "I went back to MRI because I needed my Ruby
code to run more safely".

~~~
jballanc
I can't speak for JRuby or Rubinius, but in MacRuby one of the main reasons
the lack of a GIL is not a more serious problem is that MacRuby has the
Dispatch library (based on libdispatch, a.k.a. Grand Central Dispatch), which
makes working with multiple threads safe again.

If you were to invoke Ruby's Thread library directly in MacRuby, you would
find that things get crashy rather quickly!

