

Multicore Programming in PyPy and CPython - disgruntledphd2
http://morepypy.blogspot.dk/2012/08/multicore-programming-in-pypy-and.html

======
mercuryrising
Although it adds another piece of software on top, I've been using Python with
Redis a lot lately, and it's wonderful. Here's a 'base' for building the
larger program. It doesn't make a single python instance multithreaded, but it
easily allows me to connect other computers on my network to join in the
processing fun.

    
    
        from multiprocessing import Process
    
        # readFile and mapper are defined elsewhere: readFile pushes work
        # items into Redis, mapper pulls them out and processes them
        filePath = "/home/user/file"
        p = Process(target=readFile, args=(filePath,))
        p.start()
    
        numProcesses = 6
        for x in range(numProcesses):
            p = Process(target=mapper, args=())
            p.start()
    

So I have a file reader to prevent a seek storm on a large file. I read
something (like the separations between tags that I want to process further),
put the binary data in a "ready:NAME" key with "NAME" going into a
"readyList", and things go wonderfully. It makes my data persistent without
writing to files or pickling, and I can easily share data between process
instances.

My question is - what advantage does multicore programming offer that my
approach doesn't? I can use multiprocessing and fan work out over my 8 cores,
but why would I do that if I can have persistent data with the ability to more
easily add processing power when I want to?

~~~
andrewcooke
well it allows you to share in-memory, mutable state between processes. if you
have problems that don't need that then yes, parallelization is easy (the
python-only version of the above would be to use multiprocessing). but maybe
i've missed the point (seem to be making error after error at the moment...)

[edit: to clarify a bit more, the example given in the article linked is
making each execution of a loop run in parallel, even if the loop affects some
variable. so you would hope to get speedups in code that isn't written
explicitly to be executed by multiple processes - without messaging etc. in
particular, it guarantees safety while giving the chance (sometimes it doesn't
work out) for speedups.]
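the python-only version mentioned above can be sketched with multiprocessing (a minimal sketch; `mapper` here is a stand-in for whatever per-item work the real program does):

```python
from multiprocessing import Pool

def mapper(item):
    # stand-in for the real per-item work
    return item * item

def run(items, processes=4):
    # each iteration is independent, so Pool.map can fan the work
    # out over several processes with no shared state and no redis
    with Pool(processes) as pool:
        return pool.map(mapper, items)

if __name__ == "__main__":
    print(run(range(6)))  # [0, 1, 4, 9, 16, 25]
```

this only works because the iterations don't share mutable state - which is exactly the case the article's approach is trying to go beyond.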

~~~
TillE
Message passing is usually a "better" method of IPC anyway. The situations
that genuinely require shared memory are rare.

~~~
scott_s
Perhaps in your line of work, but in scientific computing, it's quite common.
Scientific codes have parallelism at multiple levels, but loop-level
parallelism is extremely common. Simply, you have large vectors or matrices
that are operated on by a loop. The iterations of the loop are (mostly)
independent, so you can use multiple threads to do them in parallel. Assuming
shared memory, of course; the communication cost of transferring the vectors
or matrices to another process over typical IPC mechanisms would kill the
benefit of the parallelism.

OpenMP (<http://openmp.org/wp/>) for C, C++ and Fortran is the most common
means of exploiting such parallelism in the high performance computing world.
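A rough Python analogue of that loop-level pattern, using multiprocessing's shared-memory arrays so the vector is not copied to each worker (a sketch with illustrative names, not OpenMP):

```python
from multiprocessing import Array, Process

def scale_slice(shared, start, stop, factor):
    # each worker scales its own disjoint slice of the shared
    # vector in place; no copy of the data crosses a process boundary
    for i in range(start, stop):
        shared[i] *= factor

def parallel_scale(values, factor, workers=4):
    shared = Array('d', values)  # one shared vector, not one copy per worker
    chunk = (len(values) + workers - 1) // workers
    procs = []
    for w in range(workers):
        start, stop = w * chunk, min((w + 1) * chunk, len(values))
        procs.append(Process(target=scale_slice,
                             args=(shared, start, stop, factor)))
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return list(shared)

if __name__ == "__main__":
    print(parallel_scale([1.0, 2.0, 3.0, 4.0], 10.0, workers=2))
```

The point is the same as with OpenMP: the loop iterations are independent, so the only coordination needed is the fork/join around the loop.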

~~~
montecarl
I'm curious about what field of scientific computing you work in. I'm only
familiar with MPI and hybrid MPI/OpenMP codes; I've never come across an
OpenMP-only program in chemistry or physics.

If you have a loop with independent iterations then you can use message
passing or shared memory to approach your problem (as well as loop
vectorization). The main problem with message passing is that for very large
datasets you can run into memory limitations.

~~~
scott_s
I don't, really. I do system software for high performance and distributed
computing. The scientific applications are given to me, and are largely
benchmarks. (Although, now, I don't do much actual HPC, but high-throughput,
low-latency stream processing.)

I have a friend who has a galaxy simulation that uses OpenMP but not MPI. The
reason is simple: she doesn't have the expertise to make it distributed.
Slapping a few OpenMP directives on the most expensive loops is easier than
figuring out how to make it distributed using message passing. How much
parallelism you extract out of a program is often a function of how much time
and effort you can invest into it. Some people get "good enough" performance
improvements by scaling on a single node.

------
exDM69
It would be good to first make it possible to write Python code that can
execute concurrently using plain old locks-and-conditions-style
synchronization. Once that works efficiently, things like transactional memory
can add value, but not before.
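For reference, this is what "plain old locks and conditions" looks like in Python today - the synchronization is correct, the GIL just keeps it from using more than one core (a minimal sketch):

```python
import threading
from collections import deque

class BoundedQueue:
    """Classic locks-and-conditions synchronization: one condition
    (which carries its own lock), producers wait while the queue is
    full, consumers wait while it is empty."""
    def __init__(self, capacity):
        self.items = deque()
        self.capacity = capacity
        self.cond = threading.Condition()

    def put(self, item):
        with self.cond:
            while len(self.items) >= self.capacity:
                self.cond.wait()
            self.items.append(item)
            self.cond.notify_all()

    def get(self):
        with self.cond:
            while not self.items:
                self.cond.wait()
            item = self.items.popleft()
            self.cond.notify_all()
            return item
```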

As for automatic mutual exclusion ("AME"), that sounds pretty far-fetched to
me. I don't think there's a production-worthy implementation of anything like
it out there; all parallel programming environments need some kind of
annotations from the programmer to keep stuff in sync. Is anyone aware of any
research into this kind of system?

If anyone is interested in parallel programming features in languages,
Haskell's parallel programming facilities should be worth looking at. Not
because the language or functional programming would be some kind of magic
medicine, but simply because there's been years of research and hard work put
into making Haskell work in a concurrent environment.

~~~
zurn
So are you saying the current STM prototype should be put on hold until a
"plain old locks" Python comes along?

------
mmariani
Here's his EuroPython talk <http://m.youtube.com/watch?v=pDkrkP0yf70>

And here's another talk about an alternative approach by Robert Hancock
<http://lanyrd.com/2011/pygotham/shppw>

------
andrewcooke
this gives a route for AME on STM in CPython via the LLVM. it assumes (afaict
- it took me a while to grok the article and i'm still not sure i have it
right) that the CPython GIL will be removed. but we already have a failed
attempt to remove the GIL via LLVM (unladen swallow). so the obvious question
is - how meaningful is that failure? does it mean that removing the GIL via
LLVM is hard? how hard? because without removing the GIL, AME and STM are
pointless, right?

[edit: ok, thanks for the reply. it turns out it was dropped from the roadmap
for unladen swallow because the garbage collection library they thought would
help turned out not to - see
<https://code.google.com/p/unladen-swallow/wiki/ProjectPlan#Global_Interpreter_Lock>
\- so i guess this question
is much less interesting. sorry.]

unrelated note / question. this is the first time i've heard that HTM is
restricted to the cache(s) (although it makes a lot of sense in retrospect).
how serious a limitation is that for other languages? is haskell planning to
use it? what about clojure (JVM - guess not)? and what happens to HTM if the
cache overflows?

AME: automatic mutual exclusion, STM: software transactional memory, HTM:
hardware transactional memory, LLVM: itself [fixed, thanks], GIL: global
interpreter lock, JVM: java virtual machine, uff.

~~~
_delirium
Acronym nitpick: LLVM originally stood for _low-level virtual machine_, not
_little-language_. It doesn't really have a connection to the little-language
or domain-specific-language communities, but instead originates in a desire to
provide a compiler target that was lower-level than generating C code, but
less platform-specific than directly generating machine code (C-- is another
project in that space). But it's since been renamed to just LLVM, a brand that
doesn't officially stand for anything.

------
nubela
I find this incredibly interesting. I have a production server running a Flask
app, performance is not stellar, and I've been meaning to try running it on
PyPy with a web server, but I have yet to find a good resource for this.
Anyone have any tips? It seems kinda counterintuitive to see all these
performance numbers showing PyPy > CPython when I can't even run it in
production.

~~~
dalke
It's strange that you have a problem with Flask in production. From what I've
heard, there shouldn't be much of a problem with it. It should easily be able
to handle 1,000 basic requests per second.

Have you done any performance analysis to figure out where the bottlenecks
might be? Have you tried the flask-debugtoolbar
(<http://pypi.python.org/pypi/Flask-DebugToolbar>)? How many requests do you
get? What's the load average of the process? Is there an un-indexed SQL query
you don't know about? In the most extreme case, have you put a debugging
sleep(1) call in your code which you've forgotten to remove?

Without knowing more about where your Flask app is bogged down, it's hard to
say whether switching to PyPy is worthwhile.
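One cheap first measurement of where the time goes: wrap the WSGI app and log per-request wall time (a framework-agnostic sketch; `TimingMiddleware` is a made-up name, not a Flask API):

```python
import time

class TimingMiddleware:
    """Wraps any WSGI app (a Flask app is a WSGI app) and records
    how long each request takes - a cheap first look at bottlenecks."""
    def __init__(self, app, log=print):
        self.app = app
        self.log = log

    def __call__(self, environ, start_response):
        start = time.time()
        try:
            return self.app(environ, start_response)
        finally:
            self.log("%s took %.1f ms" % (environ.get("PATH_INFO", "?"),
                                          (time.time() - start) * 1000.0))
```

With Flask the usual pattern for installing middleware like this is `app.wsgi_app = TimingMiddleware(app.wsgi_app)`.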

------
batgaijin
Haha wtf. After learning Haskell and seeing stuff like lparallel for CL, I
just fundamentally can't pretend this is even worth learning. The GIL and the
lack of foresight in designing Python are problems that are too fundamental to
the language.

I love python, I've used it for a long time, but using it as anything other
than for readable shell scripting is a waste of time. It's using the wrong
tool for the job. If you are worrying about speed aside from algorithms, learn
a different language.

~~~
JulianWasTaken
Do you have examples of lack of foresight?

~~~
jerf
There are many examples of things that people now call "lack of foresight",
but I think people quickly gloss over just how long Python has been around. It
started in December 1989, when the first Pentium was still three years into
the future. If Guido had tried to "foresight" his way to a language with an
acceptable multicore solution at that point (on any hardware he could afford),
the only possible result is total failure. I don't think there are many
attributes the language currently possesses that were truly that avoidable.
Most people's snap suggestions result in incurring costs to the language that
it may not be able to recover from when it is young.

Personally, I think the truth is that Python is what it is: a very, very, very
good OO language that has reached the mature stage of its life. It probably
won't be able to make the leap to "true" multicore, and besides, even if it
does, it isn't likely to be very successful anyhow because it'll still be a
very slow-but-powerful language.

(There's very little point in taking a language very near the bottom of the
Shootout and adding 4 or even 8 way parallelism to it, when you could just
rewrite the target hotspot in C and get the same performance on one core. The
math just doesn't favor trying to add lots of "multicore" to such a _slow_
language. Python's one of my favorite languages, but that doesn't mean I can't
see where it is weak.)
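The back-of-the-envelope math behind that point, with purely illustrative numbers:

```python
def effective_slowdown(slowdown_vs_c, cores, efficiency=1.0):
    # how far behind a single C core the parallelized version still is
    return slowdown_vs_c / (cores * efficiency)

# if a pure-Python hotspot runs, say, 50x slower than C, then even
# perfect 8-way scaling leaves it well behind one C core
print(effective_slowdown(50, 8))  # 6.25
```

The specific slowdown varies wildly by workload, but as long as the single-core gap is much larger than the core count, the C rewrite wins.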

~~~
fijal
PyPy (together with LuaJIT and TraceMonkey) has been banned from the shootout,
partly because we complained about unfair rules. Using the shootout to assess
the relative performance of languages when you cannot compare high-performance
implementations of a language is a very, very bad idea. Using the shootout to
compare would very likely be a bad idea anyhow, but right now it's just
completely useless.

~~~
jerf
I was confining myself to CPython on the grounds that it is still what people
complain about in terms of adding multicore to it. People don't complain about
PyPy not having it since it's still developing, and it is not yet known
whether PyPy will be fast enough to be worth worrying about multicore. To be
honest I'm skeptical, but open to the possibility. But I rather suspect in the
end that Python will forever be a slower language than the competition; all
the dynamicness doesn't come for free.

Also the nicer primitives that IMHO the really-useful multicore languages are
building on are either impossible (Haskell) or too late (Go/Erlang) for Python
to deeply embrace due to massive legacy libraries and code. If you want to
make threading "work" in Python, go nuts, it will bring much benefit, but
Python will simply never be the first choice for tasks in which multicore
performance and safety is the first or second priority, rather than the fifth
or sixth. And that's _fine_. I'd really rather see Python become a better
Python than see it become a crappy Go or something.

~~~
fijal
PyPy (on its good day) is within 2x of the equivalent C. We're planning on
closing this gap, but we're also planning on making every day the good day
(right now you kind of have to know what you're doing to hit the sweet spot).
This might be an interesting read for you:
<https://www2.cisl.ucar.edu/sites/default/files/2cameron_sparr_FINAL.pdf>

