I’ve also had great success using PyPy to accelerate preprocessing steps (when they don’t rely on incompatible C libraries) for machine learning pipelines. It’s the most painless performance-enhancement trick in my toolbox before I reach for concurrency (at which point I turn to joblib or Dask).
The one oddity I’ve noticed is that using tuples (including things like namedtuples) often speeds up CPython by a lot, but even plain tuples can slow down PyPy on the same code—in some cases PyPy winds up slower than CPython.
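For concreteness, here’s a hypothetical micro-benchmark sketch of the kind of code where I see this: the same accumulation written with tuple packing/unpacking versus plain scalar locals. (The function names and workload are my own illustration, not from any benchmark suite.)

```python
import timeit

def with_tuples(n):
    """Accumulate using a tuple that's unpacked and repacked each iteration."""
    total = 0.0
    point = (1.0, 2.0)
    for _ in range(n):
        x, y = point              # unpack the tuple
        total += x * y
        point = (x + 1.0, y)      # repack a fresh tuple
    return total

def with_scalars(n):
    """Same arithmetic with plain scalar locals, no tuples involved."""
    total = 0.0
    x, y = 1.0, 2.0
    for _ in range(n):
        total += x * y
        x += 1.0
    return total

if __name__ == "__main__":
    # Run this under both CPython and PyPy and compare.
    print(timeit.timeit("with_tuples(10_000)", globals=globals(), number=100))
    print(timeit.timeit("with_scalars(10_000)", globals=globals(), number=100))
```

Both functions compute the same result, so any timing gap between them is purely the cost of the tuple churn on a given interpreter.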
In any case, I’m low-key in love with PyPy, even though I can’t use it for _everything_.
I'd love for this to be addressed. namedtuple should be lowered into a fixed-size struct—a great opportunity for PyPy to show wins in memory, memory bandwidth, and compute. Combine that with type stability and PyPy should approach native speeds just from namedtuple usage.
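A sketch of why that lowering seems plausible: a namedtuple's field set is fixed at class-creation time, so the layout is already a known, fixed-size record that a JIT could in principle map to a flat struct. You can see the fixed layout even in CPython:

```python
from collections import namedtuple

# Field names are frozen when the class is created.
Point = namedtuple("Point", ["x", "y"])
p = Point(1.0, 2.0)

# No per-instance __dict__: the instance is just a tuple with named slots,
# which is already a memory win over a plain class in CPython.
assert not hasattr(p, "__dict__")
assert isinstance(p, tuple)

# Fields are accessed by fixed position under the hood.
assert p.x == p[0] and p.y == p[1]
```

If every `Point` always holds two floats (the "type stability" part), a tracing JIT could, in principle, unbox the whole record rather than chasing pointers—that's the optimization opportunity being described.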