

'We are no longer as optimistic about removing the GIL completely. ' - nailer
http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#Global_Interpreter_Lock

======
smikhanov
I don't want to cast doubt on the professional qualities of Google's engineers,
but could someone explain to me why it's so complicated to remove the GIL in
Unladen Swallow? As far as I remember, UnSw targets LLVM, which is an advanced
VM with JIT compilation (i.e. not fully interpreted). Jython does (I'm not a
Jython developer, so I may not know the details) nearly the same thing, but
targets the JVM, and as a result Jython has no GIL. What's the key difference
between LLVM and the JVM in this regard?

~~~
cconstantine
The simple answer is that LLVM does not come with a garbage collection system,
and the JVM does.

~~~
sandGorgon
_snigger_ \- first clojure and now jython. And the news about high profile
websites moving to Lift/Scala from <whatever>.

Is the JVM the answer to life, the universe and everything ?

~~~
fauigerzigerk
Well, the JVM is a high-quality, mature, very fast, cross-platform runtime.
However, I'm always slightly concerned that letting the JVM in through the
front door might open the back door to the kind of SOP (Soviet Oriented
Programming) exemplified by Java EE. JSR hell is lurking behind the most
innocent-looking server-side library :-)

~~~
smikhanov
I can't see how the JSR argument applies here.

Java suffers from JSR approval hell because it tries to standardize
everything. That's a requirement for doing well in the enterprise world,
along with the programmer certification mechanism and the inclusion of top
industry players in the standards-approval process.

If your language is just a hackers' toy (Clojure and Scala are exactly that),
then why do you care about JSRs in the first place? The JSRs usually don't
cover the JVM itself, only libraries or Java language extensions.

~~~
sandGorgon
now if only java had a package manager (not maven) - something like "gem" for
jar files.

That would be golden.

~~~
bokchoi
There is maven. Oh, and jpackage for rpm. And debian creates packages for
quite a few jars.

Hopefully this will get easier in Java 7 with the modularity work in Project
Jigsaw: <http://openjdk.java.net/projects/jigsaw/>

------
axod
How about teaching programmers to program without using threads?

edit: sure downmod me. It's crazy talk! How could programmers do without
threads and concurrency issues and all of the other blocking problems.
Hardware should handle multiple cores. Not programmers.

~~~
andrew1
I think you're being downvoted because people disagree with what you're
saying. In my experience you need multiple threads when you want multiple
things to happen at the same time. i.e. if I have a client/server architecture
and one client instructs the server to perform a long running task then I
don't want the server to appear frozen to all my other clients, which it would
if the server ran in a single thread. I don't really see how you can get
around this. Do you have a solution?

~~~
axod
>> "you need multiple threads when you want multiple things to happen at the
same time"

Computers don't work that way. Unless you have many CPUs, nothing happens at
the same time.

Rewrite your 'long running task' to do things bit by bit. By effectively doing
your own timeslicing, you remove the need for any locking or concurrency
issues. Once you get into the habit of programming like this, you wouldn't
believe how much easier things are.

FWIW this is how Mibbit backend works - thousands of connections handled in a
single thread.

Javascript doesn't have threads (thankfully). There is no need for threads.
They _look_ like magic, but they cause more issues than they solve IMHO. The
mapping of 'work' onto physical CPUs should be done silently by the hardware
IMHO (If you have more than one CPU).
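The "do things bit by bit" idea above can be sketched in Python. This is only an illustration of cooperative timeslicing, not anything from Mibbit's actual code; the task (summing a big range) and all names here are made up:

```python
# A long-running task broken into small steps, interleaved with other work
# in a single loop: no threads, no locks, just voluntary yielding.
def chunked_sum(numbers, chunk=1000):
    """Sum `numbers`, yielding control back to the loop after each chunk."""
    total = 0
    for i, n in enumerate(numbers, 1):
        total += n
        if i % chunk == 0:
            yield None          # give the main loop a turn
    yield total                 # final result

def main_loop():
    ticks = 0                   # stands in for ui.update()-style housekeeping
    task = chunked_sum(range(1_000_000))
    result = None
    for step in task:
        ticks += 1              # other work runs between task slices
        if step is not None:
            result = step
    return result, ticks

result, ticks = main_loop()
print(result, ticks)            # → 499999500000 1001
```

The same structure scales to many tasks: keep a list of generators and advance each one per loop iteration, which is effectively scheduling by hand.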

~~~
andrew1
I appreciate that nothing happens at the same time on a single core but CPU
time is shared between threads so it 'appears' as if more than one thing
happens at once. A good example of this is a web browser on a single core
machine - the browser does not freeze up while it is downloading data. That is
because the CPU time is shared between the UI thread and the other worker
threads.

~~~
axod
Possibly in some browsers.

An alternate (better) model would be to simply have a single thread with a
main loop, have async networking, and UI updates periodically in the same
thread.

    
    
      while(true) {
         networking.check(); // Check if any sockets are ready for read/write/connect
         ui.update();        // Update the UI a bit if needed
      }
    

The only case this would be a terrible idea is if you don't have control of
all the code, or need to interface to things that may block/crash/etc.

~~~
scott_s
No modern web browser controls all of the code, since it must execute
arbitrary JavaScript - which can block and crash.

Everyone doing something doesn't make them right. But when all major instances
of an application are implemented differently than you think is best, perhaps
you don't understand the problem as well as you think you do. Chrome, I think,
is the best browser architecture, and it looks like IE and Firefox will adopt
something similar. I think they use separate processes to manage tabs instead
of threads, but it's still parallel.

~~~
axod
>> "No modern web browser controls all of the code, since it must execute
arbitrary JavaScript - which can block and crash."

That's a silly argument. The following can't block or crash.

    
    
      while(true) {
        jsRuntime.executeInstruction();
        // Other stuff.
      }

~~~
scott_s
Execute a single JavaScript instruction at a time? I doubt the performance of
that would be acceptable. But if you're aware of any browsers doing that, I'd
like to know.

~~~
axod
If I were to write a browser right now, it's how I'd do it. You would more
likely execute a few js instructions per loop, depending on what else you have
to do in that loop also - network check, ui update, etc.

Why would there be a performance hit in doing js instructions one by one
though ;) A loop isn't expensive.

~~~
scott_s
You would have terrible cache performance, since you would constantly bounce
back and forth between the JavaScript VM and other browser code.

~~~
axod
Code is code. If you can explain that a bit more I'd be interested.

Obviously you wouldn't update the UI _every_ time you execute a js
instruction. That would be insane. I just put 1 js instruction in the loop to
have the minimum unit, in case anything else needs to be updated very quickly
at the same time - eg some animation etc

~~~
scott_s
The JavaScript VM will have a significant amount of state associated with it.
Executing a virtual instruction will require accessing that state. If that
data is not in the CPU's cache, it will cause cache misses, which stall code
progression.

If you then use that data in the cache for a while, then the cost of the cache
miss will be amortized. But what you're proposing is going back and forth
quickly between the JavaScript VM and the rest of the browser code. The
browser code will also need to bring its data into the cache, which will kick
out the JavaScript VM's data.

Since you're proposing that the JavaScript VM should do a very small amount of
work at each time, and it will likely need to bring all of its data back into
the cache each time, you will see a lot of CPU stalls.

~~~
axod
Yeah I think we have a _long_ way to go before js performance is affected by
CPU caches.

------
antirez
Instead of dealing with all this complexity, I don't understand why a simpler
approach is not used, like having a single interpreter per thread and a very
good message passing strategy between interpreters.
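A rough sketch of the idea with what's already in the stdlib: one interpreter per OS process (multiprocessing sidesteps the GIL the same way per-thread interpreters would) and message passing over queues. The worker and function names are made up for illustration:

```python
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # Runs in a separate interpreter (process); no shared state, only messages.
    for item in iter(inbox.get, None):   # None is the shutdown sentinel
        outbox.put(item * item)

def square_via_worker(values):
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    for v in values:
        inbox.put(v)
    results = sorted(outbox.get() for _ in range(len(values)))
    inbox.put(None)                      # tell the worker to exit
    p.join()
    return results

if __name__ == "__main__":
    print(square_via_worker([1, 2, 3, 4]))   # → [1, 4, 9, 16]
```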

~~~
mahmud
You could already achieve that with OS processes and IPC. The whole point of
having multi-threading is to be able to write compact, shared-memory code with
minimal use of synchronization operators, and sharing as much code and data as
possible.

One interpreter per thread means all side effects have to be migrated to the
other threads to keep a consistent view of memory: guess what you will need to
do that? Yep, a global lock (except this time it's across all interpreters,
instead of just one.)

~~~
yummyfajitas
If you restricted shared memory to objects explicitly declared as shared, you
wouldn't need a GIL. You'd simply need per-object locks.

For scientific computing purposes, you can often accomplish this with multiple
processes and a numpy array allocated by shmget/shmat. But I'm not sure how to
share complex objects in this way.
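The "explicitly shared objects with per-object locks" idea can be sketched with the stdlib's `multiprocessing.Array`, a rough stand-in for the shmget/shmat approach (for real numpy work you would wrap a raw shared buffer with `numpy.frombuffer` instead; all names here are illustrative):

```python
from multiprocessing import Process, Array

def fill(shared, start, stop):
    with shared.get_lock():              # per-object lock, not a global one
        for i in range(start, stop):
            shared[i] = i * 2

def parallel_fill(n=8):
    shared = Array('d', n)               # doubles, backed by shared memory
    half = n // 2
    procs = [Process(target=fill, args=(shared, 0, half)),
             Process(target=fill, args=(shared, half, n))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return list(shared)

if __name__ == "__main__":
    print(parallel_fill())   # → [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
```

Only the explicitly shared array needs locking; everything else in each process is private, so no GIL-like global lock is required.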

~~~
nailer
I'm not quite sure if I'm right here (and I'd appreciate it if another HN
reader corrected me).

But I _think_ that's how the Queue object in Python 2.6 works. The Queue
instance is locked, you _seem_ to be free to do whatever within the threads
that are consuming the queue.

The reason I'm not sure is that having a single object being locked seems to
contradict the GIL concept...
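For what it's worth, the usual Queue pattern looks like this (written against Python 3 naming, where the module is `queue`; the GIL and the Queue's own internal lock are separate things, so there's no contradiction). The worker logic here is invented for illustration:

```python
import queue
import threading

def consume(q, results, lock):
    # The Queue does its own locking; the thread is otherwise free to do
    # whatever it likes, as the comment above suggests.
    for item in iter(q.get, None):       # None is the shutdown sentinel
        with lock:                       # protect the shared results list
            results.append(item * 10)

def run_workers(items, n_workers=3):
    q, results, lock = queue.Queue(), [], threading.Lock()
    workers = [threading.Thread(target=consume, args=(q, results, lock))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for item in items:
        q.put(item)
    for _ in workers:
        q.put(None)                      # one sentinel per worker
    for w in workers:
        w.join()
    return sorted(results)

print(run_workers([1, 2, 3, 4]))         # → [10, 20, 30, 40]
```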

------
j_baker
This prompted me to ask this question on stackoverflow:
[http://stackoverflow.com/questions/1914605/what-does-
pythons...](http://stackoverflow.com/questions/1914605/what-does-pythons-gil-
have-to-do-with-the-garbage-collector)

I'd like it if someone could show me how the garbage collector is related to
removing the GIL.

~~~
ig1
I'm no expert in the GIL, but pretty much every widely adopted Garbage
Collection algorithm requires a "stop-the-world" phase where object references
can't be changed. Every VM has some concept of "stop points" where all user
code is suspended, but Python's GIL is much more wide-ranging than that found
in say the JVM or .NET.

~~~
Tuna-Fish
Python doesn't really have real GC, it refcounts. See the stackoverflow
answer.

~~~
jemfinch
Python refcounts _and_ it has a real mark+sweep collector for collecting
cycles. It's not a dichotomy, you know.
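The split can be demonstrated directly with the `gc` module: plain refcounting frees acyclic garbage immediately, while the cycle collector is needed for self-referencing garbage. A small sketch:

```python
import gc

class Node:
    pass

gc.disable()                  # rely on refcounting alone for a moment
a, b = Node(), Node()
a.partner, b.partner = b, a   # a reference cycle
del a, b                      # refcounts never reach zero: the cycle leaks
leaked = sum(isinstance(o, Node) for o in gc.get_objects())

collected = gc.collect()      # the cycle detector runs and frees them
remaining = sum(isinstance(o, Node) for o in gc.get_objects())
gc.enable()

print(leaked, remaining)      # → 2 0
```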

~~~
silentbicycle
The reference counting causes the problem here. When multithreading means that
incrementing/decrementing a reference count is no longer deterministic, you
_need_ locks, or all hell breaks loose. Adding a mark&sweep GC isn't going to
fix that.

While I'm not familiar with the specifics of Python's GC, a mark&sweep phase
is usually added to reference counting so that if there's garbage which
contains references to itself but has no external references, it will
eventually be collected. (_Garbage Collection_ by Richard Jones and Rafael
Lins is an excellent resource on GC details, btw. There's also a decent
overview in the O'Reilly OCaml book ([http://caml.inria.fr/pub/docs/oreilly-
book/html/book-ora082....](http://caml.inria.fr/pub/docs/oreilly-
book/html/book-ora082.html) )). In other words, it plugs the worst memory
leaks caused by reference counting.

How to do multiprocessor / multithread GC well is still an area of active
research. In the meantime, one simpler solution is to have several
independent VM states, each running in their own thread (or process), and
communicating via message passing. Lua makes this easy, but its VM is
considerably lighter than Python's.

------
euroclydon
If Python can't get this threading thing worked out, isn't the language going
to get left behind as parallel architecture marches onward?

~~~
cdavid
There are many ways to exploit multiple cores; multi-threading is just one of
them. Also, one thing to realize is that if speed really matters (as in
scientific apps), you will get a much bigger speedup by rewriting some parts
in C than by using all the cores from Python (at least with only a couple of
cores).

Finally, a point which is not often brought up but is crucial in my opinion is
C extensions: the GIL makes C extensions much easier to write. That's one big
reason for Python's success in the first place.

------
zepolen
Why are real threads so important? Does anyone have an example where threads
would be much better than using the multiprocessing module?

~~~
mahmud
With native threads, all your threads have their own identity and they're
known to the OS task scheduler. So they can all block or run independently.
But with green threads, the OS doesn't know about your "threads"; when the
parent process is blocked, so are all the threads.

~~~
barrkel
One of the basic assumptions behind green threads is that you aren't going to
make blocking OS calls from your green threads - the interpreter should
intercept those, run them in a different thread (or ideally, run with async
I/O) so that it can schedule a different thread until the call returns.
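The interception described above can be sketched with asyncio (a later, but analogous, cooperative scheduler in Python itself): a blocking call is pushed onto a worker thread via `run_in_executor` so the event loop can keep scheduling other tasks until the result is ready. The task names and timings here are made up:

```python
import asyncio
import time

def blocking_call():
    time.sleep(0.1)           # stands in for a blocking OS call
    return "done"

async def main():
    loop = asyncio.get_running_loop()
    ticks = 0

    async def other_work():
        nonlocal ticks
        for _ in range(5):    # keeps running while blocking_call sleeps
            ticks += 1
            await asyncio.sleep(0.01)

    result, _ = await asyncio.gather(
        loop.run_in_executor(None, blocking_call),  # off the loop's thread
        other_work(),
    )
    return result, ticks

result, ticks = asyncio.run(main())
print(result, ticks)          # → done 5
```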

------
ghshephard
March 2009, presuming the comments followed the creation of that article.

~~~
ZeroGravitas
No, I remember reading those comments before this revision.

I was under the impression that these Google Code wiki pages were kept under
source control, so you should be able to view histories etc., but I can't see
any obvious link.

Found it, the change to this section was done 33 hours ago, a diff can be
found here:

[http://code.google.com/p/unladen-
swallow/source/detail?spec=...](http://code.google.com/p/unladen-
swallow/source/detail?spec=svn937&r=935)

------
nihilocrat
At least they want to get rid of the horrible abomination known as reference
counting. It's the primary reason why I've moved on to other languages for the
sake of creating games.

