
Python's Hardest Problem (2012) - hartleybrody
http://www.jeffknupp.com/blog/2012/03/31/pythons-hardest-problem/
======
pekk
This was FUD when it was posted and now it is age-old FUD which there's no
point in re-posting.

Jython and IronPython do not have a GIL. Multiprocessing avoids the GIL.
Blocking on I/O gives up the GIL. There are all kinds of techniques used
instead of throwing threads naively at every problem. And, conveniently, none
of this is mentioned in the article. Either the author was not aware of these
basic facts, or suppressed them.
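To make the multiprocessing point concrete, here is a hedged sketch (the task and numbers are my own, not from the thread) of CPU-bound work fanned out across processes, each with its own interpreter and its own GIL:

```python
# Hedged sketch (illustrative example): CPU-bound work fanned out with
# multiprocessing. Each worker process has its own interpreter and its
# own GIL, so the chunks really do run in parallel.
from multiprocessing import Pool

def count_primes(limit):
    """Naive CPU-bound task: count primes below `limit`."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Four chunks, four processes: the GIL never serializes them.
        results = pool.map(count_primes, [10_000] * 4)
    print(results)  # [1229, 1229, 1229, 1229]
```

Swapping `pool.map` for the builtin `map` gives the single-process baseline to compare against.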

It is blatantly false that "no single issue has caused more frustration or
curiosity for Python novices and experts alike than the Global Interpreter
Lock." The author may consider it important, but this does not mean that
author is speaking for everyone else.

Novices would have good reason to avoid shared-everything threading, which
introduces piles of race conditions and difficulty controlling runaway
threads, and should try simpler tools first and see whether they can get good
results instead of prematurely optimizing with techniques they don't know how
to use.

Experts will know that the GIL is often not a primary concern, and where it
actually is a concern they'll be conversant with other tools like
multiprocessing and task queues.

The people with the most to say about the GIL are mediocre programmers who
want to show off that they are so good Python is limiting them, and people not
very familiar with Python (possibly with background in languages which try to
make threads the answer to everything) who have an axe to grind.

Instead of asking how to do what they want to do, they just assume that the
problem is the GIL and there is no solution, then expect to be praised for
their technical acumen. People with actual technical acumen just solve the
problem in any of the available ways, instead of bitching in public that it's
the tool's fault they can't solve a problem they have defined incorrectly and
insist on solving in some arbitrary way.

~~~
jknupp
I'm the original author (and didn't post this) but I fail to see how the
article is FUD. In fact, the article makes the exact same point as your fourth
paragraph ("Novices would..."). Controlling access to shared data often does
lead to issues for many programmers, and just throwing threading at a problem
is rarely a good idea.

I'm sorry you didn't find the article useful, but perhaps you're not in the
article's target audience. I simply wanted to give Python novices some
information and background about the GIL.

That said, the fact that other implementations do not have a GIL isn't
relevant to the article; it specifically refers to the CPython implementation.
And your observation that multiprocessing avoids the GIL is explicitly
mentioned in the article. To say "blocking on I/O gives up the GIL" is true in
a very narrow sense but not very interesting. Import any third party package
using C extensions and you now need to worry about how well the author manages
the GIL.

~~~
goostavos
Ignore the negativity train on Hacker News as of late. I thought it was a very
well written and interesting article.

I was once one of the very noobies that your article describes. In fact, I
even made the same Stack Overflow post asking why threading made my program
slow to a crawl. I did then (thanks to a friendly answer) learn about
multiprocessing rather than threading, but I never actually got around to
learning -- or even thinking about!-- what exactly the GIL is and why it makes
threading terrible for CPU bound tasks.

Point being, I enjoyed it.

~~~
Luyt
_"I never actually got around to learning -- or even thinking about!-- what
exactly the GIL is"_

David Beazley does a good job of explaining the GIL. There are multiple videos
of his GIL talks on YouTube. This one, for instance:
<http://www.youtube.com/watch?v=Obt-vMVdM8s>

He has a page about this at <http://www.dabeaz.com/GIL/>

------
revelation
I've written native C modules interacting with EVE online, probably one of the
biggest Python systems in deployment, and they run on stackless python with a
bazillion of threads for every little thing. While still needing to hit that
50fps window.

The GIL was not a problem.

~~~
jknupp
...because they run on Stackless Python.

~~~
kbutler
This is a common misconception about Stackless Python.

Stackless still has the GIL - it facilitates concurrent, not parallel
programming. Stackless Python programs run on a single core, with cooperative
task switching between microthreads.

See <http://stackoverflow.com/questions/377254/stackless-python-and-multicores>

~~~
asperous
It's not really concurrent either. It's about asynchrony: tasks that don't
finish before another one starts and runs.

Why would you want this? From the website:

  * Improved program structure.
  * More readable code.
  * Increased programmer productivity.

~~~
kbutler
This is "concurrent" in the increasingly common Erlang sense of the word:

"The real world is ’concurrent’. It is made up of a bunch of things (or
actors) that interact with each other in a loosely coupled way with limited
knowledge of each other." -- Grant Olson, "Why Stackless", 2006

"Concurrent computing is a form of computing in which programs are designed as
collections of interacting computational processes that may be executed in
parallel." -- <http://en.wikipedia.org/wiki/Concurrent_computing>

------
estebank
> "And why has no one attempted something like this before?"

"However, if you've been around the Python community long enough, you might
also know that the GIL was already removed once before--specifically, by Greg
Stein who created a patch against Python 1.4 in 1996." (Also mentioned in the
OP)

More info can be seen at <http://dabeaz.blogspot.nl/2011/08/inside-look-at-gil-removal-patch-of.html>

~~~
georgemcbay
I don't know a lot about the history of Python and the GIL, but did they ever
consider optionally removing the GIL or leaving it in depending upon some
switch? Go has GOMAXPROCS which defaults to 1 to avoid the overhead of locking
when only a single thread will be used for similar purposes.

I guess back in 1996 a system like that may have been considered overkill
because multithreading was still pretty exotic.

~~~
somejan
Having a look at the source, you get this for free in Python. If you don't use
any threads, the overhead is just the checking of one (non-atomic) variable in
the main eval loop, so that is at least as cheap as GOMAXPROCS=1 in Go. There
is a small overhead from dropping/retaking the GIL around I/O functions, but
it isn't worth optimizing away the few nanoseconds that takes, given how much
slower the I/O calls themselves are.

But if even this overhead is too much, you can compile Python without
threading support.

~~~
somejan
Ok, the above is not really relevant, I misread the parent comment. Why the
gil is not completely removed or made optional can be read at
<http://wiki.python.org/moin/GlobalInterpreterLock>

------
gizmo686
>using multiple threads to increase performance is at best a difficult task.

This isn't completely true. If you are doing anything that isn't CPU bound,
using threads is trivial, as the GIL will allow you to perform I/O in parallel.
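A minimal illustration of that claim, with `time.sleep` standing in for a blocking network call (the timing threshold is my own choice, not from the comment):

```python
# Hedged illustration: blocking calls such as time.sleep release the
# GIL, so I/O-bound threads overlap freely instead of running serially.
import threading
import time

def fake_io(seconds):
    time.sleep(seconds)  # stands in for a blocking network/disk call

start = time.monotonic()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
# Four 0.2s waits overlap: total is ~0.2s, not the serial ~0.8s.
print(f"elapsed: {elapsed:.2f}s")
```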

~~~
lmm
Performing I/O already releases the GIL.

(Not to mention that Python has many event-driven I/O options available which
are generally more efficient than threading)

------
rdtsc
> Due to the design of the Python interpreter, using multiple threads to
> increase performance is at best a difficult task. At worst, it will decrease
> (sometimes significantly) the speed of your program.

Nope. The writer sounds misinformed and is spreading FUD.

I have successfully used Python's threads to perform concurrent database
fetches, HTTP page fetches, and file uploads in parallel. Yes, there was
almost linear speedup.

If you listen to this story, it sounds like Guido and the other talented and
smart Python contributors added threads to Python just to fuck with people's
heads -- "threads don't work, but let's add them anyway! just to mess with
them!". Nope, they added them because there are many cases where they work.

The answer is: if you handle concurrent I/O, Python's threads will give you a
good speedup. Threads are real OS threads and come with nasty side effects if
you use shared data structures, but make no mistake, you will get the speedup.

Your mileage may vary, and everyone is probably biased and has a different
perspective, but where I am coming from in the last 10+ years I have written
mostly I/O bound concurrent code. There were very few cases where I hoped to
use extra CPU concurrency.

Now, I did have to do that a couple of times, and if you have that issue, most
likely you'd want to descend to C anyway, which is what I did. Once in C, you
can release the lock so that Python and your C extension/driver can each run
in parallel.

Now, wouldn't it be nice if Python had CPU-level concurrency built in? Yes, it
would be great. But I don't think that is the #1 issue currently. We still
don't have 16 cores on most machines.

    
    
       #define RANT
    

What worries me is library fragmentation and Python 3 adoption (or the lack
thereof), now coupled with the introduction of the new async I/O
Future/Promise/Deferred framework. That will harm Python faster and worse
than the GIL ever did.
Adopting and standardizing a Twisted-like approach to async I/O will put the
nail in Python's coffin. And Guido is certainly marching in that direction.
This will fragment the existing (already rather fragmented) libraries. Now
we'll have Twisted, Tornado, gevent, eventlet, asyncore, threads, and the new
Promise/Future thingie (anyone know of more?) as ways of doing concurrent I/O,
and every time you pick a library (unless you use threads, gevent, or eventlet
+ monkey patching) you will end up choosing a whole new _ecosystem_ of
frameworks.

I remember scouring the web for years for a Twisted version of an already
existing library, because I had made the mistake of picking Twisted as the I/O
concurrency framework. A regular library module is available, oh, but of
course I need it to return a Deferred in order to use it.

    
    
       #undef RANT

~~~
lmm
The idea of including an async API ( _not_ an implementation) in the standard
library is that it will enable compatibility by having
twisted/tornado/gevent/etc. implement this interface, and async libraries to
conform to this interface. It's trying to solve the same problem you identify.
(Personally I'm not terribly hopeful that it will work, but what would you
suggest?)

------
aba_sababa
This is a great overview - nicely done! It would have been nice to also
mention other implementations of Python, like Jython, that _don't_ have the
GIL, and how they managed to do it.

As for why it hasn't been solved yet... the API for threads and processes is
pretty much identical. Since you're just as well off using a process in the
majority of cases, that's what we go with.

~~~
jknupp
Author here.

Actually, the reason it hasn't been "solved" yet has much more to do with the
CPython implementation than with the fact that we can just use
multiprocessing. There is a ton of globally shared data in the CPython
implementation. Retrofitting a locking scheme granular enough to obviate the
need for the GIL while at the same time not negatively impacting single-
threaded performance is decidedly non-trivial.

The PyPy guys are making decent progress by attacking the problem from another
angle: using software transactional memory to automatically resolve data
conflicts that arise from multiple threads mutating data simultaneously.

------
montecarl
This article is written as if shared memory were the only parallel programming
paradigm. While it is true that threads are a very useful construct for
writing high performance parallel software, distributed memory programming is
also a valid approach.

If your problem can map to distributed memory techniques, then you have
multiple advantages over shared memory programming. Most importantly you can
parallelize over multiple machines. Other advantages include decoupling of
each parallel task from each other (fewer race conditions and other hard to
debug problems).

There are several ways to achieve distributed memory parallelism in Python:
multiprocessing, zeromq, raw tcp/ip sockets and mpi4py. Which approach makes
sense to use will depend on your problem.
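As a hedged sketch of the shared-nothing style, here is `multiprocessing` used purely for message passing through queues; the function and queue names are my own, but the same shape scales out to zeromq or mpi4py across machines:

```python
# Hedged sketch: distributed-memory style with multiprocessing. Workers
# share nothing and communicate only through message queues, the same
# pattern you would use with zeromq or mpi4py across machines.
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # Shared-nothing worker: its only coupling to the rest of the
    # program is the messages it receives and sends.
    while True:
        item = inbox.get()
        if item is None:          # sentinel: shut down
            break
        outbox.put(item * item)

def run(values, nworkers=2):
    inbox, outbox = Queue(), Queue()
    procs = [Process(target=worker, args=(inbox, outbox))
             for _ in range(nworkers)]
    for p in procs:
        p.start()
    for v in values:
        inbox.put(v)              # scatter the work
    for _ in procs:
        inbox.put(None)           # one sentinel per worker
    results = sorted(outbox.get() for _ in values)  # gather the results
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(run([0, 1, 2, 3, 4]))  # [0, 1, 4, 9, 16]
```

Because nothing is shared, there are no locks in user code at all; ordering is the only thing you give up, which is why the results are sorted on collection.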

------
ChuckMcM
Neat discussion on this. It's interesting to compare languages where threading
was front and center (Go, Java, etc.) with those where it was not (Python,
Perl, etc.). The trade-offs are something you should at least get an
introduction to with a CS degree, and one of the things people without that
explicit teaching develop by feel.

Concurrency is the 'tricky bit' of the 'algorithms' pillar [1].

[1] The four arbitrary pillars of computing (algorithms, languages, systems,
and data structures).

~~~
eru
Aren't algorithms and data structure one and the same? (If you squint just
right..)

------
peripetylabs
I'm glad I came across this article. I'm learning Python and was given that
advice to use multiprocessing rather than threading, but hadn't researched
why. Very informative, thanks for sharing.

~~~
pjscott
Well hold on now; there are a lot of times when using threading is easier and
faster than using multiprocessing. It depends on what you're doing.

Threading creates new OS-level threads, but whenever your code is being run by
the bytecode interpreter, Python holds the global interpreter lock. This is
released during I/O operations and a lot of the built-in functions, and you
can release it in any C or Cython extension code you write. If you're running
into Python speed bottlenecks, you can usually get significant speedups with
very little effort by moving the bottlenecky code to Cython and maybe adding a
few type declarations.

Multiprocessing spawns a pool of worker processes and doles out tasks to them
by serializing the data and using local sockets for IPC. This naturally has a
lot of overhead, and there's some subtlety with the data serialization. So, be
aware of that. The nice part, though, is that you don't have the GIL, which
can sometimes speed things up.

------
brass9
What about Google's Unladen Swallow project? I'm aware their attempt to remove
the GIL was aborted. Did any enhancements from that project find their way
into mainline CPython 3?

------
conover
Is there really that much Python code out there that is not I/O bound (and
thus actually constrained by the GIL)? Scientific computing is the only area
that comes to mind where I can imagine problems.

~~~
pjscott
I've written a fair amount of CPU-bound Python code. It quickly turned into
CPU-bound Cython code. The most CPU-heavy parts release the GIL. It hasn't
really been a problem.

------
Ihmahr
I _love_ to use the Python parallel map function.

------
MostAwesomeDude
Disagree; Python's hardest problem is packaging, followed closely by
bikeshedding over networking libraries and an ever-growing dichotomy between
the Python 3 and Python 2 universes of code.

You can _skip_ all conversation about the GIL and threads neatly by simply
preferring a different concurrency model. There are plenty of ways to do this
(see above re: bikesheds and their colors), but being permanently tied to
CPython and the threading module is increasingly uncommon for professional
Python, and it isn't as unavoidable as things like networking or even _which
language you're going to use_.

Edit: I see that the author's in this thread. Nicely written article, but a
tad hyperbolic.

~~~
parfe
The transition was always planned to take several years and I can't think of
any major project which decided not to port to python 3.
<http://py3ksupport.appspot.com/> and <https://python3wos.appspot.com/> both
list package compatibility.

What issues do you see with python2 to 3 transition?

~~~
thezilch
To be fair, the Python 3 transition clock shouldn't have started ticking until
the more recent versions of Py3. This won't be true for all Python
communities (eg. mathematics / research), but until the saner implementation
of bytes/strings/unicode, the transition for most web frameworks or protocols
was a mess.

However, from the latter list, there are some really damning packages. For one,
we're going to have to transition off Twisted, which is much more than just an
event loop. Py3 might not have enough advantage(s) to compel us to make that
transition.

~~~
parfe
<http://twistedmatrix.com/trac/milestone/Python-3.x>

So, just wait to migrate until twisted is ready?

~~~
thezilch
Honestly, I didn't check all dozen-plus packages pending update for us, and
then waiting for someone else to cut their teeth on them for a while, but
Twisted was clearly marked as not being updated for Py3. But yes, clearly
we're waiting; apparently Guido also wants to take a stab at the async
batteries, which could help or hurt, if the Twisted contributors think they
will be affected.

To the point of the comment you first responded to, there are a lot of issues not
being "solved" with respect to network libs, their upgrading to Py3, and Py3
looking to step into that arena. In other words, there is no horizon for
networking libs updating, and they may feel even less motivated if the Py3
crowd is saying, "don't worry about it, because we are going to standardize
our way." Does the Py3 transition-clock only start then?

~~~
parfe
I'm not really sure what your issue is. You use twisted. Twisted is currently
being updated to support python3. Gevent has a branch which passes tests for
python 3.3: <https://travis-ci.org/fantix/gevent/builds/3588293> Is there some
major library you are using that has no plans for python 3?

I'm not actually seeing an issue with python 2 -> 3 transition which is what
my original comment was asking about. I see a lot of negativity surrounding
python 3 and talks of fracture, but no actual evidence of it.

