

Threading in Python - nry
http://www.nryoung.org/blog/2013/2/28/python-threading/

======
exDM69
> Generally, you should only use threads if the following is true: - Sharing
> memory between threads is not an issue.

Here's the problem. Threads are really useful only if you can share memory
between threads. If you can't share memory, you're usually better off using
many processes.

Threads in Python (ie. CPython) can still be useful for I/O multiplexing or
executing native code in background worker threads via FFI and releasing the
GIL while doing so. For I/O multiplexing, there are better options than Python
threads (select/poll/kqueue/epoll system calls and frameworks like twisted
that use them).
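As a point of reference, the stdlib `selectors` module wraps those system calls in a portable API; a minimal, self-contained sketch, using a socketpair in place of real network connections:

```python
import selectors
import socket

# Sketch of select-style I/O multiplexing with the stdlib `selectors`
# module (a wrapper over select/poll/kqueue/epoll). A socketpair stands
# in for real network connections so the example is self-contained.
sel = selectors.DefaultSelector()
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

sel.register(a, selectors.EVENT_READ)
b.send(b"hello")                      # make `a` readable

received = []
for key, _events in sel.select(timeout=1):
    received.append(key.fileobj.recv(1024))

sel.close()
a.close()
b.close()
print(received)  # [b'hello']
```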

In most applications, threads probably should not be used in CPython/CRuby
code as they provide little performance gain compared to the complexity and
overhead they add.

~~~
lifeisstillgood
Thank you. I would even go so far as to say that, except in simple cases
(downloading 10,000 images goes much faster with 100 worker threads than
serially - which is, I think, the origin of "don't share memory"), _do not use
Python_ - or any other similar language.
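That download case can be sketched with a thread pool. Here `fetch_image`, the URLs, and the 0.1 s latency are stand-ins for a real network call; the point is that the GIL is released while each worker blocks on I/O, so the simulated fetches overlap:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_image(url):
    # Stand-in for a real HTTP request: sleeping releases the GIL
    # just as blocking network I/O would.
    time.sleep(0.1)
    return f"bytes-of-{url}"

urls = [f"http://example.com/img{i}.png" for i in range(10)]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=10) as pool:
    images = list(pool.map(fetch_image, urls))
elapsed = time.monotonic() - start

# Ten simulated 0.1 s fetches finish in roughly 0.1 s rather than 1 s,
# because the workers block concurrently.
print(len(images), f"{elapsed:.2f}s")
```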

Got parallel needs at your core? Look at Erlang or Haskell. If parallel or
distributed work is mission critical, go with a language that has such things
at its very soul. Python is a great language, but it is being enthusiastically
bent to do things it is not top of the class for.

Want to handle more concurrent connections per python web server? If WSGI in
Gunicorn is not enough, _stop trying_ and use a load balancer to spread work
between _more_ servers.

~~~
obviouslygreen
You are technically correct. However, there's at least one invalid assumption
at the core of this, which is that people who need to do things that fall into
the there's-a-better-language-for-this category always have the opportunity to
learn and implement a more appropriate tool.

On commercial projects they almost never do. Extremely few companies
and clients will be perfectly fine with "yes, I'm a Python expert, but this
would be best done in Erlang; I will need an extra week to research, learn,
and implement this on top of the month the project would otherwise take." In
most situations you either do it the way you know how to do it, eat the extra
time (not practical in most cases), or you lose the contract/job.

Of course this is specific to client work, but I think most of us are likely
doing that or something similarly limiting for at least half our waking hours,
making it fairly relevant when considering ideas like "using tool X for job A
is not a good idea when tool Y exists." It's correct but ignores too many
practical situations to be very useful advice.

~~~
lifeisstillgood
I used to think that - but I now believe we can find the clients who want it
done right more than right now.

To be fair, the best way of judging this is the reverse penalty clause. So
this job must be done by June 1. OK, and if it is three weeks late because we
used Erlang? A penalty of 1,000 dollars a day? Wow - OK, so if I am a month
early, will you pay a bonus of 20,000? No - so perhaps we are not as
time-critical as we feared? Would you rather save 20,000 in ongoing
maintenance costs and general uncertainty over how good the solution is, for
three weeks' delay that would likely creep in anyway?

Have I told you Erlang has an uptime of 99.999%, proven over twenty years?

------
seanp2k2
Some developers, when confronted with a problem, think "I know, I'll use
threads" have two Now problems they.

<http://regex.info/blog/2006-09-15/247>

~~~
rozap
Clever. But just like with regexps, the issue arises when a programmer applies
them to every situation he/she encounters. Obviously they need not be avoided
like the plague, but rather used when the situation calls for them.

------
ramidarigaz
Where does the GIL factor into this? I thought using threads in Python gives
basically no gains unless all the heavy work is being done outside the
interpreter. I've always gone with the multiprocessing module instead.

~~~
bvdbijl
Python threads are only useful if you use C modules that handle the GIL
correctly, or for I/O-bound work from the standard library; they give no
speed boost for pure Python code.
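A quick way to see this (timings below are illustrative, not from the thread above): a CPU-bound loop run in two threads takes about as long as running it twice serially, because the GIL lets only one thread execute Python bytecode at a time.

```python
import threading
import time

def burn():
    # Pure-Python CPU-bound work; holds the GIL throughout.
    total = 0
    for i in range(1_000_000):
        total += i
    return total

start = time.monotonic()
burn()
burn()
serial = time.monotonic() - start

start = time.monotonic()
threads = [threading.Thread(target=burn) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.monotonic() - start

# Expect the threaded run to take roughly as long as (or longer than)
# the serial run, despite using two threads.
print(f"serial {serial:.3f}s, threaded {threaded:.3f}s")
```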

~~~
tantalor
If not for performance, why would anybody use it? Scalability?

~~~
andrewguenther
It still provides concurrency.

~~~
tantalor
Concurrency itself is not desirable. It is a means to an end, such as
performance.

~~~
ColinWright
This turns out not to be the case - some problems and calculations are most
naturally expressed with concurrency.

~~~
asynchrony
Could you provide an example of a calculation that is more naturally expressed
with concurrency?

~~~
scott_s
I wouldn't say _calculations_, as that implies, to me, a mathematical kernel.
I can't think of any low-level calculations that are easier with concurrency.
But some problems are most naturally solved with concurrency - that is, it's
easier to architect the software with threads. Applications with GUIs are
sometimes like this: it may be easier to have a main application thread that
does the real computations, and a separate thread that handles all GUI work.

The pattern is that instead of one monolithic process that must know about and
do everything, sometimes it's easier to think about several independent
processes that only know about one domain, and make requests or give answers
to other processes that know about other domains.
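That pattern can be sketched with two queues: a compute thread that only knows its own domain, answering requests from the main (GUI-like) thread. The names and the squaring "computation" here are illustrative:

```python
import queue
import threading

request_q = queue.Queue()
answer_q = queue.Queue()

def compute_worker():
    # Knows only its own domain: take a request, produce an answer.
    while True:
        n = request_q.get()
        if n is None:          # sentinel: shut down
            break
        answer_q.put(n * n)    # stand-in for the "real computations"

t = threading.Thread(target=compute_worker)
t.start()

# The "GUI" side makes requests and collects answers.
for n in (2, 3, 4):
    request_q.put(n)
results = [answer_q.get() for _ in range(3)]

request_q.put(None)
t.join()
print(results)  # [4, 9, 16]
```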

------
eidorb
I've implemented something similar using Eli Bendersky's example [1] as a
guide. His example adds a stop event. I pass my worker thread lambdas, so that
arbitrary tasks can be carried out.

[1] <http://eli.thegreenplace.net/2011/12/27/python-threads-communication-and-stopping/>
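A rough sketch of that approach (not Eli Bendersky's actual code): a worker thread pulls arbitrary callables off a queue and runs them, checking a stop event so it can be shut down cleanly.

```python
import queue
import threading

tasks = queue.Queue()
stop = threading.Event()
results = []

def worker():
    # Run queued callables until the stop event is set.
    while not stop.is_set():
        try:
            task = tasks.get(timeout=0.1)
        except queue.Empty:
            continue
        results.append(task())
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

# Arbitrary tasks, passed as lambdas.
tasks.put(lambda: 1 + 1)
tasks.put(lambda: "hello".upper())

tasks.join()   # wait until both tasks have been processed
stop.set()
t.join()
print(results)  # [2, 'HELLO']
```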

------
ctoth
I find concurrent.futures to be a much nicer way of managing this sort of
thing in Python. It's in 3.2 I believe, and there's a backport for 2 at
<https://pypi.python.org/pypi/futures/2.1.3>. You can set up as many executors
as you like, each with a given number of threads to use for its thread pool.
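A minimal sketch of that style: an executor with a fixed-size thread pool, where `submit()` returns Future objects whose results can be collected as they finish.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(n):
    # Stand-in for a real task.
    return n * 10

# The executor owns a pool of (here) four threads.
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(work, n) for n in range(5)]
    # as_completed yields futures in completion order, so sort the results.
    results = sorted(f.result() for f in as_completed(futures))

print(results)  # [0, 10, 20, 30, 40]
```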

------
scott_s
_You are not looking for the best optimized performance since threads share
memory within a process._

That is a non-sequitur to me. The first half I'm on board with: generally, you
use threads to _improve_ performance, but because of the GIL in Python, you
may not get the parallelism you want. If you're calling into libraries that
don't hold the GIL, then great, but that means you have to be very aware of
what's going on below you.

The second half does not follow, though. Typically, that threads share the
same address space is the entire reason we use threads over processes. And the
reason comes from improved performance: if threads share an address space,
you don't need to copy the data. Copying data is expensive. (It also means
you're susceptible to a whole host of synchronization bugs.)
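A small illustration of both halves of that point: a worker thread appends to a list the main thread created, with no copying or IPC, and the lock is the price of that sharing.

```python
import threading

shared = []                 # lives in the one address space both threads see
lock = threading.Lock()

def producer():
    for i in range(5):
        # The lock guards the shared list against concurrent mutation;
        # forgetting it is exactly the class of synchronization bug
        # mentioned above.
        with lock:
            shared.append(i)

t = threading.Thread(target=producer)
t.start()
t.join()

# No data was copied between "worker" and "main"; they mutated one object.
print(shared)  # [0, 1, 2, 3, 4]
```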

~~~
pekk
Sharing all data between threads means you're susceptible to a whole host of
synchronization bugs (in the sense of thread synchronization, not data
synchronization). Unless you use synchronization primitives like locks to
protect the shared data, which can also easily kill concurrency. It's a trade-
off.

If avoiding copying is not a top problem, then you may be wasting your time;
there's nothing wrong with using abstractions more appropriate to your
environment.

If the program scales out, it should be less important to micro-optimize
inside each process because it's so much cheaper just to use another core or
another node.

It's getting boring to hear all discussions of concurrency reduced to threads,
and threads reduced to the GIL in CPython. It's really not that simple.

~~~
scott_s
Yes, it's a trade-off, which is why I brought it up.

But my point here is that the statement the author made, as far as I'm able to
understand it, makes no sense. That is, I think he _tried_ to discuss these
issues, but I don't think he understands them well enough to do so. I think
you and I are in agreement, unless you are saying that what the author stated
does make sense.

------
stefantalpalaru
This is Python so threads don't run concurrently because of the GIL (no, using
C/C++ code where you can release it is not in the scope of this article). Save
yourself the trouble and use multiprocessing.

~~~
pekk
It's oversimplifying to say "This is Python so threads don't run concurrently
because of the GIL".

This is not a Python-language issue, it is an implementation-specific issue.
Not all implementations of Python have the GIL.

In reality, threads can run concurrently. Even in CPython (itself written in
C), with its famous GIL, it is normal and realistic to do I/O and heavy
computation in C code that releases the GIL, enabling threads to work
concurrently. There's no reason this information shouldn't be part of
discussions on threads in Python.

That doesn't mean threads are great for everything, but the severity of the
case is easily and frequently overstated.

------
mctx

    for i in xrange(len(item_list)):

Could be more clearly written as:

    for _ in item_list:

------
laurenceputra
GIL? I've found that running the work in a single thread is slower than not
using a thread at all.

