Hacker News

I have turned to pypy quite extensively for pure-Python text processing tasks, where I often get a 10-100x speedup just by changing the command-line invocation. For example, while working at Udacity around 2017, I wrote a proof-of-concept approximate string-matching algorithm based on Rabin-Karp hashing to perform plagiarism analysis. The system never went into production, but pypy was super helpful in crunching all historical user submissions for analysis.
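(Not the original Udacity code, which isn't shown here; this is just a minimal sketch of the rolling-hash idea behind Rabin-Karp, the kind of tight pure-Python loop that pypy's JIT tends to speed up dramatically.)

```python
def rabin_karp_find(text, pattern, base=256, mod=1_000_000_007):
    """Return the indices in `text` where `pattern` occurs, via a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # base^(m-1) mod mod, used to drop the leading character from the window.
    high = pow(base, m - 1, mod)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Verify the substring on a hash hit to rule out collisions.
        if t_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Slide the window: remove text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return matches
```

The rolling update is what makes this O(n) instead of O(n*m): each window's hash is derived from the previous one in constant time.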

I’ve also had great success using pypy to accelerate preprocessing steps (when they don’t rely on incompatible C libraries) for machine learning pipelines. It’s the most painless performance-enhancement trick in my toolbox before I reach for concurrency (in which case I reach for joblib or Dask).
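A hypothetical example of the kind of preprocessing step meant here (not taken from any actual pipeline): a pure-Python tokenize-and-count pass with no C-extension dependencies, so it runs unchanged under pypy and benefits fully from the JIT.

```python
import re

# Simple word tokenizer; assumes lowercased input.
TOKEN_RE = re.compile(r"[a-z0-9']+")

def count_tokens(docs):
    """Lowercase, tokenize, and count tokens across an iterable of documents.

    The hot path is an interpreter-level loop over Python strings and
    dicts, which is exactly where pypy tends to shine relative to CPython.
    """
    counts = {}
    for doc in docs:
        for tok in TOKEN_RE.findall(doc.lower()):
            counts[tok] = counts.get(tok, 0) + 1
    return counts
```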

The one oddity I’ve noticed is that using tuples (including things like named tuples) often speeds up CPython by a lot, but even plain tuples can slow down pypy on the same code—in some cases pypy winds up slower than CPython.

In any case, I’m low-key in love with pypy, even though I can’t use it for _everything_.




> The one oddity I’ve noticed is that using tuples (including things like named tuples) often speeds up CPython by a lot, but even plain tuples can slow down pypy on the same code—in some cases pypy winds up slower than CPython.

I'd love for this to be addressed. namedtuple should be lowered into a fixed-size struct, which is a great opportunity for PyPy to show wins in memory footprint, memory bandwidth, and compute. Add in type stability and PyPy should approach native speeds just by adopting namedtuple.


I created an issue [0] to address this. Could you add a microbenchmark that demonstrates the problem?

[0] https://foss.heptapod.net/pypy/pypy/-/issues/3373
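A microbenchmark of the kind being requested might look something like this (a hypothetical sketch, not taken from the linked issue): time attribute access in a hot loop over namedtuples versus plain tuples, and compare the numbers under CPython and PyPy.

```python
import timeit
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

def sum_namedtuple(pts):
    # Attribute access on namedtuple instances.
    total = 0.0
    for p in pts:
        total += p.x + p.y
    return total

def sum_tuple(pts):
    # Positional indexing on plain tuples.
    total = 0.0
    for p in pts:
        total += p[0] + p[1]
    return total

if __name__ == "__main__":
    nt_pts = [Point(float(i), float(i)) for i in range(1_000)]
    tp_pts = [(float(i), float(i)) for i in range(1_000)]
    # Run each variant repeatedly; compare output across interpreters.
    print("namedtuple:", timeit.timeit(lambda: sum_namedtuple(nt_pts), number=200))
    print("tuple:     ", timeit.timeit(lambda: sum_tuple(tp_pts), number=200))
```

Running the same script under `python` and `pypy` would show whether the namedtuple path regresses relative to plain tuples on PyPy, which is the behavior described upthread.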



