Ask HN: Is anyone using PyPy for real work?
573 points by mattip 4 months ago | 180 comments
I have been the release manager for PyPy, an alternative Python interpreter with a JIT [0], since 2015, and have done a lot of work to make it available via conda-forge [1] or by direct download [2]. This includes not only packaging PyPy, but improving an entire C-API emulation layer, so that today we can run (albeit more slowly) almost the entire scientific Python data stack. We get very limited feedback about real people using PyPy in production or research, which is frustrating. Just keeping up with the yearly CPython release cycle is significant work. Efforts to improve the underlying technology need to be guided by user experience, but we hear too little to direct our very limited energy. If you are using PyPy, please let us know, either here or via any of the methods listed in [3].

[0] https://www.pypy.org/contact.html [1] https://www.pypy.org/posts/2022/11/pypy-and-conda-forge.html [2] https://www.pypy.org/download.html [3] https://www.pypy.org/contact.html




I'm using PyPy to analyse 350M DNS events a day, using cached Python dicts to avoid DNS lookup stalls. I get a 95% dict cache hit rate, and use threads with queue locks.

Moving to PyPy definitely sped things up, though not as much as I'd hoped; it's probably all about string indexing into dicts and dict management. I may recode it as a radix tree. It's hard to work out in advance how different that would be: people have optimised the core data structures pretty well.
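A minimal sketch of the kind of locked, dict-backed cache described above; the resolve() callback and all names are hypothetical:

  import threading

  cache = {}                      # name -> resolved result
  cache_lock = threading.Lock()

  def lookup(name, resolve):
      # fast path: check the shared dict under the lock
      with cache_lock:
          hit = cache.get(name)
      if hit is not None:
          return hit
      # slow path: do the real DNS lookup outside the lock
      value = resolve(name)
      with cache_lock:
          cache[name] = value
      return value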

Uplift from normal python was trivial. Most dev time was spent fixing pip3 for PyPy on Debian, not knowing which apt packages to install, amid a lot of "stop using pip" messaging.


Debian is its own worst enemy with things like this. We eventually moved off it at a previous job because deploying Python server applications on it was dreadful.

I’m sure it’s better if you’re deploying an appliance that you hand off and never touch again, but for evolving modern Python servers it’s not well suited.


Yes, 1000x. What is it with them that makes them feel entitled to have a special "dist-packages" instead of the default "site-packages"? This drives me nuts when I have a bunch of native packages I want to bundle in our in-house Python deployment. CentOS and Ubuntu are vanilla, and only Debian (mind-bogglingly) deviates from the well-trodden path.

I still haven't figured out how to beat this dragon. All suggestions welcome!


> What is it with them that makes them feel entitled to have a special "dist-packages" instead of the default "site-packages"? This drives me nuts when I have a bunch of native packages I want to bundle in our in-house Python deployment. CentOS and Ubuntu are vanilla, and only Debian (mind-bogglingly) deviates from the well-trodden path.

Hi, I'm one of the people who look after this bit of Debian (and it's exactly the same in Ubuntu, FWIW).

It's like that to solve a problem (of course, everything has a reason). The idea is that Debian provides a Python that's deeply integrated into Debian packages. But if you want to build your own Python from source, you can. What you build will use site-packages, so it won't have any overlap with Debian's Python.

Unfortunately, while this approach was designed to be something all package-managed distributions could do, nobody else has adopted it, and consequently the code to make it work has never been pushed upstream. So, it's left as a Debian/Ubuntu oddity that confuses people. Sorry about that.

My recommendations are: 1. If you want more control over your Python than you get from Debian's package-managed python, build your own from source (or use a docker image that does that). 2. Deploy your apps with virtualenvs or system-level containers per app.


Dist-packages is the right way to handle Python libs. Would you prefer to have the distro package manager clashing with pip, never knowing who installed what and breaking things when updates are made?


I usually make a venv in ~/.venv and then activate it at the top of any python project. Makes it much easier to deal with dependencies when they're all in one place.


I am a big fan of .venv/ -- except when it takes ~45 minutes to compile the native extension code in question; then I want it all pre-packaged.


At this stage [0], uncompiled native extensions are not yet a bug, but a definite oversight by the maintainer. They should come as precompiled wheels.

[0]: https://pythonwheels.com


Honestly, I don't think I've ever used a precompiled package in Python. Every single C extension seems to take ages to build and requires all that fun stuff of installing native system dependencies.

Edit: skimming through this page, precompiling seems like an afterthought, and the linked packages don't even seem to mention how to integrate third-party libraries. So I guess I can see why it doesn't deliver on its promises.


Probably a function of the specific set of packages you use, or the pip options you specify. Pretty much all the major C packages come as wheels these days.


They all come as wheels; they just aren't precompiled.


I honestly can't remember the last time I had to compile anything, and I am on Windows.


Can you link one that comes as a wheel but is really a source distribution?


You can try pip install pillow for a good example of how it works. I suspect there's a strong survivorship bias here, as you'd only notice the packages that don't ship with wheels.


Yeah, perhaps. Ones I remember from last year are the cryptography and numpy packages, for instance. Now they do seem to ship binary wheels, at least for my current Python and Linux version.

Kerberos and Hadoop stuff obviously still doesn't, though. I guess the joke's on me for being stuck in this stack...


In order for a wheel to be used instead of a source distribution, there needs to be one that matches your environment. For numpy you can take a look at the wheels for their latest release [1]. The filename of a wheel specifies where it can be used. Let's take an example:

numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

This specifies CPython 3.9, Linux, glibc 2.17 or higher, and an x86_64 CPU. Looking through the list you will see that the oldest CPython supported is 3.9, so if you are running an older version of Python you will have to build from source.
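For reference, a small sketch of how those filename tags can be inspected programmatically, using the third-party `packaging` library (assumed installed):

  from packaging.utils import parse_wheel_filename

  name, version, build, tags = parse_wheel_filename(
      "numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
  )
  for tag in tags:
      # prints e.g. cp39 cp39 manylinux_2_17_x86_64 / manylinux2014_x86_64
      print(tag.interpreter, tag.abi, tag.platform)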

I just learned a bit more about this recently because I could not figure out why PyQt6 would not install on my computer. It turned out my glibc was too old. Finally upgraded from Ubuntu 18.04.

[1] https://pypi.org/project/numpy/1.25.2/#files


Try `--only-binary :all:` to force pip to ignore sdist packages, might help avoid those slow compilations.


It's a good idea to be caching sdists and wheels — for resilience against PyPI downtime, for left-pad scenarios, and even just good netiquette — and for packages that don't have a wheel for your environment, you can fairly easily build that wheel yourself and stick it into the cache.


Second this; it's what I do on all Linux distros: just run everything inside a .venv as the site installation.

If you need extra dependencies that pip cannot handle well in the .venv case, conda can help with its own, similar site-based installation.

I don't know how the Python installation differs between Ubuntu and Debian; they seem the same to me.


IMO, bespoke containers using whatever Python package manager makes sense for each project. Or make the leap to Nix(OS) and then still have to force every Python project into compliance, which can be very easy if the Python packages you need are already in the main Nix repo (nixpkgs), or very difficult if it depends on a lot of uncommon packages, uses poetry, etc.

Since PEP 665 was rejected, the Python ecosystem continues to lack a reasonable package manager, and the lack of hash-based lock files prevents building on top of the current Python project/package managers.


Dist-packages are a must for software written in Python that is part of the distribution itself.


You're not really answering why they are important?

Is it because .deb packages will install inside dist-packages and when you run pip install as root without a virtual env, it installs inside site-packages?

I don't really see how this helps, though. Sure, you won't get paths clashing between the two, but you still have duplicate packages, which is probably not what you want.


Debian ships packages with a coherent dependency structure that crosses language boundaries. You don't need to care what language something is written in to be able to "apt install" it. The expectation is that if it "apt installed" then it should Just Work because all the required dependencies were also pulled in from Debian at the same time.

Debian also tries to ship just one version of everything in a single distribution release to reduce the burden on its maintainers.

This is fundamentally at odds with pip. If you've pip installed something, then that'll likely be the latest version of that package, and in the general case won't be the version of the same thing that shipped in the Debian release. If there exist debs that depend on that package and they are shared between pip and debs, now the deb could be using a different version of the dependency than the deb metadata says is acceptable, leading to breakage.

Another way of putting this: it shouldn't be possible for you to pip upgrade a dependency that a deb shipped by Debian itself relies upon, because then you'd be creating a Frankenstein system where Debian cannot rely on its own dependencies providing what it expects.

This is fixed by having two places where things are installed. One for what the system package manager ships, and one for your own use with pip and whatever you want to do. In this sense, having duplicate packages is actually exactly what you want.
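A minimal sketch of how to see this split from inside an interpreter: on Debian/Ubuntu the system /usr/bin/python3 typically reports .../dist-packages here, while a self-built Python or a venv reports .../site-packages.

  import site
  import sysconfig

  # where pure-Python packages get installed for this interpreter
  print(sysconfig.get_path("purelib"))
  # all global package directories this interpreter puts on sys.path
  print(site.getsitepackages())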


Yep, I screw things up all the time with packages in homebrew that are written in python, when I forget to switch into a virtual env before doing stuff with pip. Debian's solution seems very sensible. And it is the same solution as homebrew, I suppose, as long as you don't interact with any of the homebrew-installed packages via pip. But I find it quite easy to accidentally do that.


  export PIP_REQUIRE_VIRTUALENV=1
has been quite helpful in the past as pip then refuses to just install things directly.


There is https://peps.python.org/pep-0668/ which suggests that in the future this kind of behaviour will be the default. I'm not sure of the specifics, but I have seen lots of conversation about it in Debian circles.


Nice!


OK, but... I get the same problem when I compile Python from source. I'm not talking about the distribution's base files. In fact, I am in the business of creating an /opt/gjvc-corp-tools/ prefix with all my packages under there. When I compile Python from source on Debian, the resulting installation (from make install) does not have a site-packages directory in place already. That is what is mind-boggling.


> This is fundamentally at odds with pip

It's at odds with everything. I leave the system versions of any language alone and use language-manager tools or Docker to run the exact version that any of my customers' projects require. asdf is my favorite because it handles nearly everything, even PostgreSQL.


Imagine you installed python3-requests (version x.y.z). Some of your distribution's packages depend on that specific package/version.

If you pip install requests globally, you just broke a few of your distribution's packages.


> I still haven't figured out how to beat this dragon. All suggestions welcome!

Docker


What distro did you move to? IME debian as a base image for python app containers is also kind of a pain.


We moved to stripped down Debian images in containers and made sure to not use any of the Debian packaging ecosystem.


It works completely fine in my experience.


Lucky you. Having gone through multiple Debian upgrades, a Python 2->3 migration on Debian, and a move from Debian Python packaging to pip/PyPI, it was a whole world of pain that cost us months of development time over the years, as well as a substantial amount of downtime.


If you have very large dicts, you might find this hash table I wrote for spaCy helpful: https://github.com/explosion/preshed . You need to key the data with 64-bit keys. We use this wrapper around murmurhash for it: https://github.com/explosion/murmurhash

There are no docs, so obviously this might not be for you. But the software does work, and it's efficient. It's been executed many, many millions of times now.


I'm using strings, not 64-bit keys. But thanks, nice to share ideas.


The idea is to hash the string into a 64-bit key. You can store the string in a value, or you can have a separate vector and make the value a struct that has the key and the value.

The chance of colliding on the 64-bit space is low if the hash distributes evenly, so you just yolo it.
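A minimal sketch of that approach, using the mmh3 murmurhash bindings as a stand-in for the wrapper linked above (mmh3 assumed installed; names are illustrative):

  import mmh3  # murmurhash bindings, assumed installed (pip install mmh3)

  counts = {}  # 64-bit key -> (original string, count)

  def bump(s):
      # take the first 64-bit half of the 128-bit murmur hash as the key
      key = mmh3.hash64(s)[0]  # signed 64-bit int, fine as a dict key
      entry = counts.get(key)
      if entry is None:
          counts[key] = (s, 1)
      else:
          counts[key] = (s, entry[1] + 1)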


> it's probably all about string index into dict and dict management

Cool. Is the performance here something you would like to pursue? If so could you open an issue [0] with some kind of reproducer?

[0] https://foss.heptapod.net/pypy/pypy/-/issues


I'm thinking about how to demonstrate the problem. I have a large pickle, but pickle load/dump times across gc.disable()/gc.enable() really don't say much.

I need to find out how to instrument the seek/add cost of threads against the shared dict under a lock.

My gut feeling is that if I inlined things instead of calling out to functions I'd probably shave off a bit more too. So saying "slower than expected" may be unfair, because there are limits to how much you can speed this kind of thing up. That's why I wondered if alternative data structures were a better fit.

It's variable-length string indexes into lists/dicts of integer counts. The advantage of a radix trie would be finding the record in time bounded by the length in bits of the string, and the strings do form prefix sets.


Would love to hear more. You can reach us with any of these methods https://www.pypy.org/contact.html


> Uplift from normal python was trivial.

By definition if you lift something it is going to go up, but what does this mean?


If you replace your python engine you have to replace your imports.

Some engines can't build and deploy all imports.

Some engines demand syntactic sugar to do their work. PyPy doesn't.


One should really consider using containers in this situation.


Can you describe what in this situation warrants it?

I'm very curious about where the line is/should be.


In my experience leaving the system python interpreter the way it was shipped will save you enormous headaches down the road. Anytime I find myself needing additional python packages installed I will almost always at minimum create a virtual env, or ideally a container.


I use it at work for a script that parses and analyzes some log files in an unusual format. I wrote a naive parser with a parser-combinator library. It was too slow to be usable with CPython. I tried PyPy and got a 50x speed increase (yes, 50 times faster). Very happy with the results, actually =)
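A rough sketch of the general shape of such parser-combinator code, with an invented log format and helpers (not the library the commenter used): many small function calls per input line, which is the kind of pattern PyPy's JIT tends to reward.

  import re

  def token(pattern):
      rx = re.compile(pattern)
      def parse(text, pos):
          m = rx.match(text, pos)
          return (m.group(0), m.end()) if m else None
      return parse

  def seq(*parsers):
      def parse(text, pos):
          out = []
          for p in parsers:
              r = p(text, pos)
              if r is None:
                  return None
              value, pos = r
              out.append(value)
          return out, pos
      return parse

  # "LEVEL timestamp message" style line, purely illustrative
  line = seq(token(r"[A-Z]+ "), token(r"\d+ "), token(r".*"))
  print(line("WARN 1699999999 disk nearly full", 0))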


Thanks for the feedback. It does seem like parsing logs and running simulations are a sweet spot for PyPy.


Simulations are, at least in my experience, numba’s [0] wheelhouse.

[0]: https://numba.pydata.org/
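For reference, a minimal sketch of the numba approach (numba and numpy assumed installed): a numerical loop becomes a compiled kernel by decorating it with @njit.

  import numpy as np
  from numba import njit

  @njit
  def step(positions, velocities, dt):
      # simple explicit Euler update, just to show the shape of a numba kernel
      for i in range(positions.shape[0]):
          positions[i] += velocities[i] * dt

  positions = np.zeros(1_000_000)
  velocities = np.ones(1_000_000)
  step(positions, velocities, 0.01)  # first call triggers JIT compilation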


What CPython version and OS was that? I'd be very surprised if modern Python 3.11 has anything an order of magnitude slower like that; things have gotten much faster over the years in CPython.


I put PyPy in production at a previous job, running a pretty high traffic Flask web app. It was quick and pretty straightforward to integrate, and sped up our request timings significantly. Wound up saving us money because server load went down to process the same volume of requests, so we were able to spin down some instances.

Haven’t used it in a bit mostly because I’ve been working on projects that haven’t had the same bottleneck, or that rely on incompatible extensions.

Thank you for your work on the project!


You're welcome.

> that rely on incompatible extensions.

Which ones? Is using conda an option? We have more luck getting binary packages into their build pipelines than getting projects to build wheels for PyPI.


I can't actually remember off the top of my head. I tried it out a year or two ago but didn't get too far, because during profiling it became clear that the biggest opportunities for performance improvement in this app were primarily algorithmic/query/IO optimizations outside of Python itself, so business-wise it didn't make too much sense, though if it had, I think using conda would have been on the table. We make heavy use of Pandas/NumPy et al., though I know those are largely supported now, so I'd guess it was not one of them but something adjacent.


This post is a funny coincidence, as I tried today to speed up a CI pipeline running ~10k tests with pytest by switching to PyPy.

I am still working on it, but the main issue so far is psycopg support: I had to install psycopg2cffi in my test environment, which will probably prevent me from using PyPy to run our test suite, because psycopg2cffi does not have the same features and versions as psycopg2. This means either we switch our prod to PyPy, which won't be possible because I am very new on this team and that would be seen as a big, risky change by the others, or we accept that the tests do not run on the exact same runtime as the production servers (which might cause bugs to go unnoticed and reach production, or failing tests that would otherwise work on a live environment).

I think if I ever started a Python project right now, I'd probably try to use PyPy from the start, since (at least for web development) there do not seem to be any downsides to using it.

Anyway, thank you very much for your hard work!


If you use a recent version of PostgreSQL (10+, I believe) you can use psycopg3 [1], which has a pure-Python implementation that should be compatible with PyPy.

[1]: https://www.psycopg.org/psycopg3/docs/basic/install.html
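For reference, a minimal sketch of psycopg 3 usage with the pure-Python install (`pip install psycopg`); the connection string here is hypothetical:

  import psycopg

  with psycopg.connect("dbname=test user=postgres") as conn:
      with conn.cursor() as cur:
          cur.execute("SELECT version()")
          print(cur.fetchone()[0])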


Second this - the lack of psycopg2 support, and to a lesser extent lxml, is a nonstarter and makes it pretty difficult to experiment with on production code bases. I could see a lot of adoption from Django deployments otherwise.


Yeah, we don't use PyPy on our small Django projects for those exact reasons.


I work on pg8000 (https://pypi.org/project/pg8000/), a pure-Python PostgreSQL driver that works well with PyPy. Not sure if it would meet all your requirements, but I just thought I'd mention it.


One compromise could be to run pypy on draft PRs and CPython on approved PRs and master?


I use CPython most of the time, but PyPy was a real lifesaver when I was doing a project that bridged EMOF and RDF; in particular, I was working with moderately sized RDF models (say 10 million triples) with rdflib.

With CPython, I was frustrated with how slow it was and complained about it to the people I was working with. PyPy was a simple upgrade that sped up my code to the point where it was comfortable to work with.
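A minimal sketch of the kind of rdflib workload described above, under the assumption of a Turtle file on disk (the file name is invented); a whole-graph SPARQL query like this is loop-heavy pure Python inside rdflib, which is where a JIT tends to help:

  from rdflib import Graph

  g = Graph()
  g.parse("model.ttl", format="turtle")  # load a Turtle file into an in-memory graph

  # count distinct predicates, a typical walk over the whole model
  results = g.query("SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?s ?p ?o }")
  for row in results:
      print(row.n)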


That is a great idea! I use rdflib frequently and never thought to try it with PyPy. Now I will.


Is your group still using it?


That particular code has been retired because, after quite a bit of trying things that weren't quite right, we understood the problem and found a better way to do it. I'm doing the next round of related work (logically modeling XSLT schemas and associated messages in OWL) in Java, because there is already a library that almost does what I want.

I am still using this library that I wrote

https://paulhoule.github.io/gastrodon/

to visualize RDF data so even if I make my RDF model in Java I am likely to load it up in Python to explore it. I don’t know if they are using PyPy but there is at least one big bank that has people using Gastrodon for the same purpose.


What do you use RDF models for?


So I wrote this library

https://paulhoule.github.io/gastrodon/

which makes it very easy to visualize RDF data with Jupyter by turning SPARQL results into data frames.

Here are two essays I wrote using it

https://ontology2.com/essays/LookingForMetadataInAllTheWrong...

https://ontology2.com/essays/PropertiesColorsAndThumbnails.h...

People often think RDF never caught on, but actually there are many standards that are RDF-based, such as RSS, XMP, and ActivityPub, which you can work with quite directly using RDF tools.

Beyond that, I've been on a standards committee for ISO 20022, where we've figured out, after quite a few years of looking at the problem, how to use RDF and OWL as a master standard for representing messages and schemas in financial messaging. In the project that needed PyPy we were converting a standard represented in EMOF into RDF. Towards the end of last year I figured out the right way to logically model the parts of those messages and the associated schema with OWL. That is on its way to becoming one of those ISO standard documents that unfortunately costs 133 Swiss francs. I also figured out that it is possible to do the same for many messages defined with XSLT, and I'm expecting to get some work applying this to a major financial standard; I think there will be some source code and a public report on that.

The techniques I use also address quite a few problems with the way most people use RDF, most notably that many RDF users don't use the tools available to represent ordered collections. A notable example where this causes trouble is Dublin Core metadata for documents (say, books), where you can't represent the order of the authors of a paper, something the authors usually care about a great deal. XMP adapts the Dublin Core standard enough to solve this problem, but with the techniques I use you can use RDF to do anything any document database can, though some SPARQL extensions would make it easier.
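As an illustration of the ordered-collections point, a hedged sketch using rdflib (all URIs and author names here are invented): an RDF Collection attached to dcterms:creator preserves author order.

  from rdflib import Graph, URIRef, Literal, BNode
  from rdflib.collection import Collection
  from rdflib.namespace import DCTERMS

  g = Graph()
  book = URIRef("http://example.org/book/1")
  authors = BNode()
  # build an rdf:List (ordered) of author names and hang it off dcterms:creator
  Collection(g, authors, [Literal("First Author"), Literal("Second Author")])
  g.add((book, DCTERMS.creator, authors))
  print(g.serialize(format="turtle"))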


Thanks for reminding me to look at PyPy again. I usually start all my new Python projects with this block of commands that I keep handy:

Create venv and activate it and install packages:

  python3 -m venv venv
  source venv/bin/activate
  python3 -m pip install --upgrade pip
  python3 -m pip install wheel
  pip install -r requirements.txt

I wanted a similar one-liner that I could use on a fresh Ubuntu machine so I can try out PyPy easily in the same way. After a bit of fiddling, I came up with this monstrosity which should work with both bash and zsh (though I only tested it on zsh):

Create venv and activate it and install packages using pyenv/pypy/pip:

  if [ -d "$HOME/.pyenv" ]; then rm -Rf $HOME/.pyenv; fi && \
  curl https://pyenv.run | bash && \
  DEFAULT_SHELL=$(basename "$SHELL") && \
  if [ "$DEFAULT_SHELL" = "zsh" ]; then RC_FILE=~/.zshrc; else RC_FILE=~/.bashrc; fi && \
  if ! grep -q 'export PATH="$HOME/.pyenv/bin:$PATH"' $RC_FILE; then echo -e '\nexport PATH="$HOME/.pyenv/bin:$PATH"' >> $RC_FILE; fi && \
  if ! grep -q 'eval "$(pyenv init -)"' $RC_FILE; then echo 'eval "$(pyenv init -)"' >> $RC_FILE; fi && \
  if ! grep -q 'eval "$(pyenv virtualenv-init -)"' $RC_FILE; then echo 'eval "$(pyenv virtualenv-init -)"' >> $RC_FILE; fi && \
  source $RC_FILE && \
  LATEST_PYPY=$(pyenv install --list | grep -P '^  pypy[0-9\.]*-\d+\.\d+' | grep -v -- '-src' | tail -1) && \
  LATEST_PYPY=$(echo $LATEST_PYPY | tr -d '[:space:]') && \
  echo "Installing PyPy version: $LATEST_PYPY" && \
  pyenv install $LATEST_PYPY && \
  pyenv local $LATEST_PYPY && \
  pypy -m venv venv && \
  source venv/bin/activate && \
  pip install --upgrade pip && \
  pip install wheel && \
  pip install -r requirements.txt
Maybe others will find it useful.


Just a note: these scripts are not comparable in monstrosity, as the first one just initialises a project, whereas the second one sets up a whole PyPy installation.

So if you already have PyPy on your machine:

  pypy -m venv venv && \
    source venv/bin/activate && \
    pip install --upgrade pip && \
    pip install wheel && \
    pip install -r requirements.txt
That was not so bad after all; my initial thought was "do I really need all of the above just to set up the project?" :D


That's true, but you can run the first block of commands on a brand new Ubuntu installation because regular CPython is installed by default. Whereas you would need to do the whole second block when starting on a fresh machine.


Given you'll want to activate a virtual environment for most Python projects, and projects live in directories.. I find myself constantly reaching for direnv. https://github.com/direnv/direnv/wiki/Python

    echo "layout python\npip install --upgrade pip pip-tools setuptools wheel\npip-sync" > .envrc
When you cd into a given project, it'll activate the venv, upgrade to non-ancient versions of pip etc. with support for the latest PEPs (i.e. `pyproject.toml` support in a new Python 3.9 env), and verify the latest pinned packages are present. It's just too useful not to have.

    direnv stdlib
This command (or this link https://direnv.net/man/direnv-stdlib.1.html) will print many useful functions that can be used in the `.envrc` shell script that is loaded when entering directories, ranging from many languages, to `dotenv` support, to `on_git_branch` for e.g. syncing deps when switching feature branches.

Check it out if you haven't.. I've been using it for more years than I can count and being able to CD from a PHP project to a Ruby project to a Python project with ease really helps with context switching.


If you have a system-level installed pypy, the pypy equivalent is:

  pypy3 -m venv venv
  source venv/bin/activate
  python3 -m pip install --upgrade pip
  python3 -m pip install wheel
  pip install -r requirements.txt
Not very different...


For a more apples to apples comparison, you would install pypy using your package manager, e.g. apt install pypy3 or brew install pypy3. On Linux, you might have to add a package repo first.


I find that much scarier to do personally since it seems a lot more likely to screw up other stuff on your machine, whereas with pyenv it's all self-contained in the venv. Also using apt packages tends to install a pretty old version.


No, installing a package with apt is not more likely to screw up your machine than installing it manually. Moreover, you seem to be completely fine using the apt-installed CPython, while you think PyPy needs to be installed manually.


I use pyenv myself, but that is beside the point. The two examples above are using different strategies to install python3 versus pypy. A valid comparison would use a package manager for both or pyenv for both.


We don't. To be honest, I didn't realize PyPy supported Python 3. I thought it was eternally stuck on Python 2.7.

So the good: It apparently now supports Python 3.9? Might want to update your front page, it only mentions Python 3.7.

The bad: It only supports Python 3.9, we use newer features throughout our code, so it'd be painful to even try it out.


Their docs seem perpetually out of date, but they recently released support for 3.10. I haven't been able to try it recently because our projects use 3.10 features but in the past it was easily a 10-100x speedup as long as all the project's libraries worked.

https://downloads.python.org/pypy/


It supports Python 3.10 now too. Thanks, I updated the site.


I think it supports up to 3.10, as there are official Docker images for that version; I saw them this morning.

Maybe the site is not up to date?


You should probably put "Ask HN:" in your title.

Personally I don't use PyPy for anything, though I have followed it with interest. Most of the things I need to go faster are numerical, so Numba and Cython seem more appropriate.


Cut him some slack, he's only been registered for 10 years


I read this as humor and I imagine mattip may have done also.


I don’t think it’s about being strict or condescending. In some HN readers the post will show up in a different catalogue and generally be easier for people to find, thus giving the post more visibility :)

Edit: typo


I use PyPy quite often as a 'free' way to make some non-numpy CPU-bound Python script faster. This is also the context for when I bring up PyPy to others.

The biggest blocker for me for 'defaulting' to PyPy is a) issues when dealing with CPython extensions and how quite often it ends up being a significant effort to 'port' more complex applications to PyPy b) the muscle memory for typing 'python3' instead of 'pypy3'.


For the b) part, you should consider creating an alias for that command, if muscle memory would otherwise keep you from using it.


I had the same thought. For years I have aliased ‘p’ for ‘python’ and after reading this thread I will alias ‘pp’ for ‘pypy’.


We use PyPy extensively at my employer, a small online retailer, for the website, internal web apps, ETL processes, and REST API integrations.

We use the PyPy-provided downloads (Linux x86, 64-bit) because it's easier to maintain multiple versions simultaneously on Ubuntu servers; the PyPy PPA does not allow this. I try to keep the various projects on the latest stable version of PyPy as they receive maintenance, and we're currently transitioning from 3.9/v7.3.10 to 3.10/v7.3.12.

Thank you for all of the hard work providing a JITed Python!


Cool. Would love to hear more about the successes and problems, or even get a guest blog post on https://www.pypy.org/blog/


Nice to meet you here, mattip. We have used PyPy for several years, and I have raised this several times: the only thing PyPy lacks is marketing (and correcting the wrong information about cpyext being unsupported). PyPy gave us an 8x performance boost on average, 4x minimum, and 20x especially on JSON operations in long loops.

PyPy should have become the standard implementation; it would have saved a lot of the investment in making Python faster.

I try to shill PyPy all the time, but thanks to the outdated website and the puzzling attachment to Heptapod (at least put something on GitHub for discovery's sake), devs who won't bother to look any further than a GitHub page frown upon me, thinking PyPy is an outdated and inactive project.

PyPy is one of the most ambitious projects in open-source history, and the lack of publicity makes me scream internally.


I use it for data transformation, cleanup and enrichment. (TXT, CSV, Json, XML, database) to (TXT, CSV, JSON, XML, database).

Speedups of 30x-40x, with the highest on transformations that require logic (lots of function calls, numerical operations, and dictionary lookups).


Similar. I was working on some ETL work with SQLite, and now PyPy is my regular tool for getting better performance at similar jobs.


Same. I have used it for many ETL jobs, usually with about a 10x speedup. It also reduced the latency of some Flask REST APIs.


Copying from an older comment of mine shilling Pypy https://news.ycombinator.com/item?id=25595590

PyPy is pretty well stress-tested by the competitive programming community.

https://codeforces.com/contests has around 20-30k participants per contest, with contests happening roughly twice a week. I would say around 10% of them use python, with the vast majority choosing pypy over cpython.

I would guesstimate at least 100k lines of PyPy-targeted Python are written per week just from these contests. This covers virtually every textbook algorithm you can think of, all automatically graded for correctness/speed/memory. Note that there's no special time multiplier for choosing a slower language, so if you're not within 2x the speed of the equivalent C++, your solution won't pass! (Hence the popularity of PyPy over CPython.)

The sheer volume of advanced algorithms executed on PyPy gives me a huge amount of confidence in it. There was only one instance where I remember a contestant running into a bug with the JIT, and it was fixed within a few days of being reported: https://codeforces.com/blog/entry/82329?#comment-693711 https://foss.heptapod.net/pypy/pypy/-/issues/3297

New edit since that previous comment: there's now a Legendary Grandmaster (Elo rating > 3000, ranked 33 out of hundreds of thousands) who almost exclusively uses PyPy: https://codeforces.com/submissions/conqueror_of_tourist


Really cool!

Competitive programming needs a lot of speed to compete with the C++ submissions; it's really cool that there are contestants using Python to win.


I do think it would be very useful to have an online tool that lets you paste in your requirements.txt and then tells you which of the libraries have been recently verified to work properly with PyPy without a lot of additional fuss.

Also, you might want to flag the libraries that technically "work" but still require an extremely long and involved build process. For example, I recently started the process of installing Pandas with pip in a PyPy venv and it was stuck on `Getting requirements to build wheel ...` for a very long time, like 20+ minutes.


I was experimenting with some dynamic-programming 0/1 knapsack code last week. The PyPy available through the distro (7.3.9) gave a reasonable speedup, but nothing phenomenal. Out of curiosity I grabbed the latest version through pyenv (7.3.12), and it looks like some changes between them suddenly put the code in a sweet spot: I saw a couple of orders of magnitude better performance. Good work.

I'm rarely using Python at work in places where it would suit PyPy (lots of Python usage, but more on the order of short-run tools), but I'm always looking for chances, and I always use it for random little personal things.
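For reference, a minimal sketch of the kind of loop-heavy 0/1 knapsack DP being described (the item data is invented); tight pure-Python loops like this are where JIT differences between PyPy versions tend to show up:

  import random

  def knapsack(values, weights, capacity):
      # dp[w] = best value achievable with total weight <= w
      dp = [0] * (capacity + 1)
      for v, wt in zip(values, weights):
          # iterate weights downwards so each item is used at most once
          for w in range(capacity, wt - 1, -1):
              cand = dp[w - wt] + v
              if cand > dp[w]:
                  dp[w] = cand
      return dp[capacity]

  random.seed(0)
  values = [random.randint(1, 100) for _ in range(500)]
  weights = [random.randint(1, 100) for _ in range(500)]
  print(knapsack(values, weights, 10_000))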


Yes. We have a legacy Python-based geospatial data processing pipeline. Switching from CPython to PyPy sped it up by a factor of 30x or so, which was extremely helpful.

Thank you for your amazing work!


Would love to hear more. Is it still being used?


When I worked at Transit App, I built a backend pre-processing pipeline to compress transit and OSM data in Python [1], and also another pipeline to process transit map data in Python [2]. Since the Ops people complained about how long it took to compress the transit feeds (I think London took 10h each time something changed), I migrated everything to PyPy. Back then that was a bit annoying because it meant I had to remove numpy as a requirement, but other than that there were few issues. It also meant we were stuck on 2.7 for quite a while, so long that I hadn't prepared a possible migration to 3.x; the migration happened after I left. AFAIK they still use PyPy.

Python is fun to work with (except classes…), but it's just sooo slow. PyPy can be a life saver.

[1] https://blog.transitapp.com/how-we-shrank-our-trip-planner-t... [2] https://blog.transitapp.com/how-we-built-the-worlds-pretties...


What kind of speed up did you get?


Some parts 10x and more. Overall a bit more than 5x.

