
PyPy Status Blog: Multicore Programming in PyPy and CPython - spdy
http://morepypy.blogspot.de/2012/08/multicore-programming-in-pypy-and.html
======
mattj
First off, I really, really like pypy. It's been slowly taking over my data-
massaging-type tasks, and consistently yields 5-10x speedups over cpython.

That being said, I'm a little nervous about all the energy being exerted on
novel multicore stuff. Pypy has always been somewhat of a research project,
but it's so close to being usable in a mainstream setting. I'd love to see
them get it over the final few hurdles (an out-of-the-box, as-fast-as-simplejson
json module and better docs for performance tuning being the two I've noticed).

~~~
dripton
You don't get to pick what other people work on, unless you pay them.

PyPy is quite friendly to new contributors, so if you want to help speed up
JSON or improve docs, visit #pypy on freenode and offer to pitch in, and I'm
sure some core developers will do their best to help you help them.

Or if you really want a feature but don't have time to work on it, see if any
PyPy devs are accepting contract work. I suspect speeding up JSON wouldn't
take one of the experts long, so it probably wouldn't cost you much.

Docs are more of a moving target: no matter how much you write, users will
always want more (and then some of those complaining users will fail to
actually read what you do write). There was a tutorial session at PyCon US
this year about using PyPy to speed up Python code, and the video is on
Youtube.

~~~
jeffdavis
"You don't get to pick what other people work on, unless you pay them."

Tangent: I donated a small amount of money to two PyPy projects (STM and
py3k). But I didn't see the donation bars move at all -- at least in the case
of STM, I'm almost positive that it didn't move a single dollar after my
donation. Do they update those things? If not, it's kind of hard to tell what
is sufficiently funded to succeed and what is not.

~~~
fijal
We don't have direct access to the funding bars (it would probably be doable,
but would require quite a bit of work), so we update them every 2 weeks
(roughly), when the money lands on the bank account. That way we don't have
bounced payments to deal with either (and it's easier). Sorry if you're used
to real-time donations, but banking is not real time.

~~~
jeffdavis
Hmm... I figured that at the beginning, but I made the donation around June 10
and I could swear the number for STM is exactly the same as back then.

Maybe I should contact them and make sure the donation wasn't lost? But that
seems strange; surely others have donated during that time as well.

~~~
fijal
updated today. sorry for the lag

------
monopede
As I understand this, this wouldn't automatically protect you from race
conditions. If you have a shared variable x, then a statement like

    x += 5

may be fine, depending on whether or not it is implemented as a single
bytecode instruction. However, more complicated updates are still subject to
races:

    x = somefunction(x)

It is only safe if you use:

    with thread.atomic:
        x = somefunction(x)

Having a serialisation doesn't mean it's the right serialisation.

That said, the fundamental problem is shared mutable state, and I don't see an
easy way around that in Python. In that sense, this is probably the easiest
approach to work with.
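
To make the lost-update race concrete, here is a minimal sketch (nothing
PyPy-specific; `somefunction` is the placeholder name from the comment, and
two `Event`s force the bad interleaving so the outcome is deterministic
rather than a matter of luck):

```python
import threading

x = 0

def somefunction(v):
    return v + 5

read_done = threading.Event()
write_done = threading.Event()

def loser():
    global x
    local = somefunction(x)   # reads x == 0
    read_done.set()           # let the other thread run in between...
    write_done.wait()         # ...which writes x = 5 meanwhile
    x = local                 # overwrites it: the other update is lost

def winner():
    global x
    read_done.wait()
    x = somefunction(x)       # x becomes 5
    write_done.set()

t1 = threading.Thread(target=loser)
t2 = threading.Thread(target=winner)
t1.start(); t2.start()
t1.join(); t2.join()

print(x)  # 5, not 10: one of the two updates vanished
```

Wrapping each update in `with thread.atomic:` (or an ordinary lock) is exactly
what makes the read and the write-back a single indivisible step.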

~~~
apendleton
Right, this is no different than it would be when programming current Python
with the GIL. There are two different levels of locking in current Python
implementations: locks within the implementation of the interpreter (the GIL
for cpython and current pypy, or all the micro-locks in Jython), and the locks
accessible from the Python environment by Python programmers (e.g., the locks
in the threading module: <http://docs.python.org/library/threading.html#lock-
objects> ). The pypy STM project attempts to replace the first set to allow
for simultaneous multi-core use while making it seem to the programmer like
the GIL is still there, but if the programmer needs things like a specific
execution order, it will need to be managed by hand, the same way it always
has been.
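
A sketch of that second, programmer-visible level of locking: an explicit
`threading.Lock` guarding a shared update by hand. Under the GIL or under
STM alike, this part stays the programmer's job (the `add_five` helper is
made up for illustration):

```python
import threading

x = 0
lock = threading.Lock()

def add_five():
    global x
    with lock:       # explicit lock from the threading module
        x = x + 5    # read-modify-write is atomic w.r.t. other lock holders

threads = [threading.Thread(target=add_five) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(x)  # 50: no updates lost
```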

------
radarsat1
> _We would have to consider all possible combinations of code paths and
> timings, and we cannot hope to write tests that cover all combinations._

Not to take away from this fantastic article, but I've felt for a while that
this particular point is bullshit. Surely it's possible to write tools that
do, indeed, test all possible code paths. I'm not up on the literature, but
don't there exist correctness proofs for unsynchronized lockless algorithms in
certain specific cases? If so, it would be interesting to think about how this
could be generalized. I imagine that much of the literature on atomic
transactions must touch on this kind of stuff, not to mention networking
protocols that may receive messages out of order or deal with unpredictable
timing, etc.

There should be tools that we automatically reach for, like gdb, as soon as we
introduce threading and need to check that we did it reliably. Moreover, such
tools should help you find problems that may not occur on your system but
might theoretically be a problem on other systems, just like compiler warnings
that warn of dependencies on pointer size.
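
As a toy illustration that exhaustively checking interleavings is at least
mechanically possible for tiny cases (everything here is made up for the
sketch; real model checkers do this far more cleverly): model each thread as
the two atomic halves of the racy update `x = somefunction(x)`, enumerate
every schedule, and collect the final states.

```python
from itertools import combinations

def interleavings(n, m):
    # Every way to merge n steps of thread A with m steps of thread B,
    # preserving each thread's own step order.
    for a_slots in combinations(range(n + m), n):
        a = b = 0
        schedule = []
        for i in range(n + m):
            if i in a_slots:
                schedule.append(('A', a)); a += 1
            else:
                schedule.append(('B', b)); b += 1
        yield schedule

def run(schedule):
    # Each thread does x = x + 5, split into its two atomic halves:
    # step 0 reads x into a per-thread temporary, step 1 writes it back.
    state = {'x': 0}
    for thread, step in schedule:
        if step == 0:
            state['tmp' + thread] = state['x']       # read
        else:
            state['x'] = state['tmp' + thread] + 5   # write back
    return state['x']

results = sorted({run(s) for s in interleavings(2, 2)})
print(results)  # [5, 10]: some of the 6 schedules lose an update
```

Two threads of two steps give only 6 schedules; the state space explodes
quickly as threads and steps grow, which is the real obstacle such tools
have to fight, not any impossibility in principle.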

------
justincormack
Surely there would be a huge benefit in using the hardware TM support, if the
implementation does not suck. It is integrated with the cache, so it should
work much better. Structuring stuff to make this work seems like a win.

