People think optimistic STMs are faster than locks, because "locks are slow", but optimistic and pessimistic approaches are roughly equivalent when latency is low. Under low contention, locks have low overhead, just like optimistic STMs. Under high contention, locks have high overhead, and so do optimistic STMs (due to wasted work).
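To make the "wasted work" point concrete, here is a toy sketch of the optimistic pattern: read speculatively, compute outside any lock, then validate at commit time and retry on conflict. The `VersionedCell` class and all names here are invented for illustration; no real STM is implemented this way, but the retry loop shows where the wasted work goes under contention.

```python
import threading

class VersionedCell:
    """Toy optimistic cell: writers retry when the version moved under them."""
    def __init__(self, value):
        self._lock = threading.Lock()  # only guards the tiny commit step
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def try_commit(self, expected_version, new_value):
        # Succeed only if nobody committed since our speculative read.
        with self._lock:
            if self.version != expected_version:
                return False          # conflict: the caller's work is wasted
            self.value = new_value
            self.version += 1
            return True

def optimistic_increment(cell):
    while True:
        value, version = cell.read()      # speculative read
        new_value = value + 1             # compute outside any lock
        if cell.try_commit(version, new_value):
            return                        # committed
        # conflict -> redo the work (this is the "wasted work" under contention)

cell = VersionedCell(0)
threads = [
    threading.Thread(
        target=lambda: [optimistic_increment(cell) for _ in range(1000)]
    )
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.value)  # 4000: every increment eventually commits exactly once
```

Under low contention, `try_commit` almost always succeeds on the first attempt, so the overhead is comparable to taking a fine-grained lock; under high contention, the retries burn CPU redoing computations that are then thrown away.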
Pessimistic STMs are built on locks. They make code involving fine-grained locking composable. So you can have many more locks instead of a global one, and recover gracefully from read/write conflicts and deadlocks. When latency is high, e.g. in distributed systems, optimistic strategies are more efficient. When latency is low, e.g. in SMP memory, pessimistic strategies are more efficient. So on shared-memory hardware, pessimistic lock-based STMs are generally more efficient.
This implies that STM is not really a faster alternative to locking. It's simply a form of fine-grained locking that composes well. I believe STM will not eliminate the slowdown that results from fine-grained locking in freely-threaded Python interpreters, but it might just make them easier to write, which isn't a bad thing.
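The composability problem being referred to is easy to show with plain `threading` locks. With per-object locks, composing two individually thread-safe objects into one atomic operation forces a global convention on every caller, such as always acquiring locks in a fixed order; an STM-style atomic block would compose without it. This is an illustrative sketch, with the `Account` class and the ordering-by-`id` trick as assumptions, not anyone's production design:

```python
import threading

class Account:
    """One lock per account: fine-grained, individually thread-safe."""
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Composing two locked objects atomically needs a global lock-ordering
    # convention (here: sort by id). Without it, two threads transferring
    # in opposite directions can deadlock. An STM 'atomic' block would
    # compose these operations with no such convention.
    first, second = sorted((src, dst), key=id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

a, b = Account(100), Account(0)
threads = [threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(50)]
threads += [threading.Thread(target=transfer, args=(b, a, 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.balance, b.balance)  # 100 0: money is conserved and nothing deadlocks
```

The lock ordering works here because both locks are visible at the call site; it breaks down as soon as locking is hidden behind abstraction boundaries, which is exactly the case STM is meant to handle.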
> On a few smaller, more regular examples like richards, I did measure the performance. It is not great, even taking into account that it has no JIT so far. Running pypy-stm with one thread is roughly 5 times slower than running a regular PyPy with no JIT (it used to be better in previous versions, but they didn't have any GC; nevertheless, I need to investigate).
Oy. Armin is hopeful that STM will become for concurrency what garbage collection is for memory management. I'm really hopeful he's right. While I know there's still lots of work to be done, that kind of speed hit is discouraging :(
Many GC strategies were tried before we settled on the current mix, which seems to perform well in today's problem domains. I am sure that once the approach is proven, someone will build a faster way.
Not to mention that Hardware Transactional Memory is coming in the next generation of Intel chips (Q2 2013), and, as with virtualization, it may be that hardware support is what's needed to give it the required boost.
Either way, I think the research is important because it's based around a real language, with real production code. All too often CS research is based around idealized languages, so all the code tends to be written to the capabilities of the language, rather than the often-suboptimal code produced by a developer who was under the gun to meet a deadline.
Even if it never gets better: PyPy with the JIT enabled is over 5 times faster than CPython. PyPy with JIT and STM enabled might be the same speed as CPython, but with the advantage of being able to use multiple cores.
Even if it's 5 times slower on one core, it might make scaling to 16+ cores possible in some cases where it wasn't before. That at least gives you the option of throwing hardware at a problem to speed it up.
You'd likely be better off with multiple processes then.
There's no reason to suffer 16 cores performing like three (assuming no contention), when you can get much closer to linear with existing, mature, and well-understood technologies.
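The "multiple processes" alternative mentioned above is already available in the standard library: `multiprocessing` sidesteps the GIL by giving each worker its own interpreter, so CPU-bound work scales close to linearly on embarrassingly parallel problems. A minimal sketch (the `count_primes` workload is just an arbitrary CPU-bound stand-in):

```python
from multiprocessing import Pool

def count_primes(limit):
    # CPU-bound work; each worker process has its own interpreter and GIL.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    with Pool() as pool:  # defaults to one worker per CPU core
        totals = pool.map(count_primes, [20000] * 4)
    print(sum(totals))
```

The trade-off, of course, is that processes share nothing by default, so this only beats threads-plus-STM when the work decomposes into mostly independent chunks with cheap inputs and outputs.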