Hacker News
PyPy-STM: first “interesting” release (morepypy.blogspot.com)
242 points by rguillebert on July 5, 2014 | 19 comments



Very cool.

I am interested to see how this progresses. It seems to me there is a lot of room for optimisation to reduce the cost of contention as the number of threads grows. Single-thread performance will always have to suffer somewhat, but their cheap read barriers have already done wonders to bring that cost down to an acceptable level. It will reach a point, however, where the only way to speed it up further will be to let the programmer specify STM hints for variables.

I would like to see this in Java or C# where there is more room for extended optimisation.

Kudos to the PyPy guys for sticking to their guns on this.


Java and C# don't have a GIL though, and they have good support for threading, right?


A linked list for a single processor is a job for a first-year student.

A linked list for multiprocessors with mutexes and locks is a PhD-level job.

A linked list for multiprocessors with STM is a job for a first-year student again.

The quote is from one of Simon Peyton Jones's talks.
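
To make the quote a bit more concrete, here is a toy sketch of the idea in Python. Everything in it (`TVar`, `Transaction`, `atomically`) is a hypothetical illustration, not PyPy-STM's API or implementation: it uses a single global commit lock and simple version checks, and a transaction that loses a race is simply re-run. The point is only that the list code itself (`push`) contains no locking logic.

```python
import threading

# Toy optimistic-concurrency "STM". All names are hypothetical; this is
# NOT how PyPy-STM works internally, just an illustration of the model.
_commit_lock = threading.Lock()

class TVar:
    """A transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0

class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed on first read
        self.writes = {}  # TVar -> value to install on commit

    def read(self, tvar):
        if tvar in self.writes:          # read-your-own-writes
            return self.writes[tvar]
        with _commit_lock:               # consistent (value, version) pair
            value, version = tvar.value, tvar.version
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value        # buffered until commit

    def commit(self):
        with _commit_lock:
            # Validate: nothing we read may have changed in the meantime.
            for tvar, version in self.reads.items():
                if tvar.version != version:
                    return False
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True

def atomically(body):
    """Run body(tx) repeatedly until its transaction commits."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result

# A shared linked list stored as nested (item, rest) tuples; None is empty.
head = TVar(None)

def push(item):
    # No explicit locks here: just read the head and write the new head.
    atomically(lambda tx: tx.write(head, (item, tx.read(head))))

def to_list(node):
    items = []
    while node is not None:
        item, node = node
        items.append(item)
    return items

def worker(start):
    for i in range(start, start + 250):
        push(i)

threads = [threading.Thread(target=worker, args=(s,)) for s in (0, 250, 500, 750)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A real STM does a lot more (cheap barriers, finer-grained conflict detection, contention management), but the programming model for the person writing `push` is roughly this simple.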


Thanks - that quote immediately shifted my thinking.


STM is useful even if you have good threading support. You could easily build an STM library for Java or C#, and there are some, but without syntax support it's pretty cumbersome to use.

Clojure has good STM support because it's easy to add the necessary primitives to a Lisp.


Precisely. :)

STM's advantages are not so much in -enabling- threading, but instead allowing for new paradigms (possibly simpler ones).


Java (and probably also C#) already has mature STM implementations. Here is one, but there are certainly others:

https://sites.google.com/site/deucestm/

As noted in the other comments, you don't need STM for parallelism in Java and C# because they have access to native threads. But the programming model may still be nicer to use than the standard threads-with-shared-memory programming model.


Standard CPython also has access to native threads. The Python threading module is just a thin wrapper over pthreads. The issue is that the GIL prevents threads from exploiting multiple cores.
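
A minimal sketch of what that looks like in practice, assuming a made-up CPU-bound function `count_primes` (hypothetical, just for illustration). The code runs correctly on four native threads, but on standard CPython the GIL lets only one of them execute Python bytecode at a time, so this kind of work would not be expected to get faster with more cores.

```python
import threading

def count_primes(limit):
    """CPU-bound toy workload: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

results = {}

def worker(name, limit):
    # Each thread writes to its own key, so no lock is needed here.
    results[name] = count_primes(limit)

# Four real OS threads, created through the standard threading module.
threads = [threading.Thread(target=worker, args=(i, 5000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The same code is the kind of thing a GIL-free interpreter could, in principle, spread across cores without any changes to the source.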


That raises a few questions:

1- Will this take us closer to a Golang-like way of working in parallel (since that means no GIL)?

2- Why didn't Dropbox / Guido join forces with PyPy, since PyPy had already proved that its approach was very efficient performance-wise? Is it because PyPy doesn't deal with native C code at all, whereas the Python LLVM approach they chose may let the two work better?

3- Does anyone else feel like the Python world is a complete mess right now?

EDIT: I didn't want to sound harsh, on the contrary. Seeing so many smart people work on separate paths on great projects for so long feels like a lot of great energy is going to end up wasted.


1. Yes, thanks to Armin's great work, this improves language support for parallelism. It doesn't improve support for concurrency patterns (like coroutines and channels). Python now has concurrent.futures, yield from, and the (overly complicated) asyncio library, which provide a toolkit for concurrent programming patterns, but it's not really beautiful nor a joy to use. That's part 2 of Armin's proposal, and he's asking for donations to fund it (see the link above).

2. Until recently PyPy was still regarded as an experimental side project, and had memory use limitations that prevented it from being a drop-in CPython replacement. CPython is a simpler codebase than PyPy and supports more targets and environments. PyPy does deal with native C code via cffi, but there is a lot of code that uses CPython-specific C bindings that will need to be updated to run on PyPy.

3. I think the Python world is doing great. The Python 2 to 3 and CPython to PyPy migrations are largely orthogonal, and will coalesce to "PyPy 3000 with STM" soon enough. Once that happens, I think that will serve as a really powerful foundation for future improvements. (Writing code that runs on both 2.7 and 3.3+ is not that hard, by the way.)


>1- Will this take us closer to a Golang-like way of working in parallel (since that means no GIL)?

I doubt it. The asyncio module was an attempt at this but the API isn't that great and it also isn't closely integrated with the language like goroutines are in Golang. This is still a great improvement though, as is PyPy in general. Basically, currently existing programs should run faster, and people will use the multiprocessing module less and the threading module more. It may change the way lower level parallel code is written, and library maintainers may be able to simplify some things.

>2- Why didn't Dropbox / Guido join forces with PyPy, since PyPy had already proved that its approach was very efficient performance-wise? Is it because PyPy doesn't deal with native C code at all, whereas the Python LLVM approach they chose may let the two work better?

Probably a mix of "NIH" and also the belief that Dropbox has some of the best Python devs (and BDFLs) out there, so surely they can create the best solution. We'll see in the next few years if it plays out for them.

>3- Does anyone else feel like the Python world is a complete mess right now?

Not that much. It's a big mess if you're trying to maintain a large open source project and need to keep retaining compatibility from 2.6 to 3.4+, but as a regular developer it's not too terrible. Personally I am still going to stick with 2.7 for personal projects for years to come, though.


For 3, even that isn't terrible in a lot of cases. All of OpenStack (AFAIK) maintains py26, py27 and py33 support (and one of the projects I work on aims for pypy as well, but isn't gating on it yet).


It's not that terrible, but it can be annoying. And arguably worst of all, it can make code a lot uglier than it needs to be.


1. Well -- unless I misunderstand, they use STM to emulate a GIL (while actually letting stuff be done concurrently). Which means that it will still be a bit of a hodgepodge messy disaster.

2. I can't speak for Dropbox, but I worked in a shop that bet their future on Python -- and we spent a good bit of effort looking at PyPy ... it was a nightmare if anything went wrong -- a horrific impossible to debug mess.

3. Yep, I added Python to my "won't work on" list of languages (with Cobol, Progress and MUMPS).


What sort of problems did you run into? Did you submit bug reports? I don't think a bug in, say, the V8 JIT (or any advanced VM) would be easy to debug either.


Neat stuff.

Not sure why they have "(insert LLVM snark here)" when the docs say "At the LLVM IR level, the behavior of these special address spaces depends in part on the underlying OS or runtime environment, and they are specific to the target (and LLVM doesn't yet handle them correctly in some cases)."

Expecting that to work without fighting bugs seems "optimistic" to say the least :)


Awesome news. I'm kind of scared to think how much Python code is written to assume the GIL though. I have a feeling a lot of libraries and code will have very subtle race conditions and issues, but it's still great to have an option for getting around the GIL.
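
For anyone wondering what kind of code is at risk, here is a small sketch (the `Counter` class and `hammer` helper are hypothetical names for illustration). `self.value += 1` is a read-modify-write spread over several bytecodes, so it is not guaranteed to be atomic even with a GIL; code like `unsafe_increment` often appears to work only because thread switches are rare. Taking an explicit lock makes the intent independent of how the interpreter schedules threads.

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Load, add, store: another thread can run between these steps,
        # so updates can be lost. Shown for contrast; not exercised below.
        self.value += 1

    def increment(self):
        # The lock makes the whole read-modify-write step atomic.
        with self._lock:
            self.value += 1

def hammer(fn, n_threads=4, n_iter=10000):
    """Call fn n_iter times from each of n_threads threads."""
    def run():
        for _ in range(n_iter):
            fn()
    threads = [threading.Thread(target=run) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

safe = Counter()
hammer(safe.increment)
```

With the lock, `safe.value` always ends up at `n_threads * n_iter`; without it, the result depends on timing.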


I think the big idea is that this STM only attempts to commit at points where the GIL would traditionally have been released anyway: after so many bytecodes, or on some other condition. The linearisation of the memory writes of the different threads should be the same as if there were a GIL.


PyPy already has custom Python code for different bits, like the crypto library, because that code was written in C for CPython (and the C code gets around the GIL, funnily enough). So PyPy may need to write more custom code for it all to work, though if it tries to emulate the GIL as described in the other comment, probably not.



