> threading doesn't give me any benefit due to the GIL
If your heavy lifting is done in C extensions, and those extensions release the GIL before doing that work, you can still get the benefit of parallelism within CPython.
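To make that concrete, here is a minimal sketch using only the standard library. It relies on `hashlib`, which is documented to release the GIL while hashing buffers larger than 2047 bytes. The helper names `digest` and `digest_all` and the worker count are made up for illustration, and whether your own extension behaves the same way depends on whether it releases the GIL around its heavy work.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(buf: bytes) -> str:
    # hashlib drops the GIL for inputs larger than 2047 bytes, so several
    # of these calls can run on separate cores at the same time.
    return hashlib.sha256(buf).hexdigest()


def digest_all(buffers, workers=4):
    # Plain threads: no pickling and no copying of data between processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, buffers))


if __name__ == "__main__":
    # Eight 16 MiB buffers; the sizes are arbitrary, chosen only so the
    # hashing work dominates the thread start-up cost.
    buffers = [bytes([i]) * (16 * 1024 * 1024) for i in range(8)]
    for hexdigest in digest_all(buffers):
        print(hexdigest[:16])
```

Timing `digest_all(buffers, workers=1)` against `workers=4` on a multi-core machine should show whether the threads actually overlap on your setup.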
Part of the heavy lifting is in C extensions, but part is in Python code that's hard to migrate to C, and that's the part that takes more time now that I've migrated the low-hanging fruit. I'd get a bit of benefit, but far from what I'd get if the GIL wasn't a thing.
Depending on the nature of the interpreted Python part of the code, you could get a 60-100 times speedup by using PyPy (no rewriting needed). Removing the GIL in CPython would only beat that if you could get 60-100 times parallelism.
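A back-of-the-envelope way to compare the two options is Amdahl's law. The sketch below assumes that only the interpreted Python part gets faster, that the C-extension part is unchanged, and that threads without a GIL would scale ideally with core count. The function name `overall_speedup` and the 70% figure are hypothetical, not measurements from this thread.

```python
def overall_speedup(python_fraction: float, python_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only the Python part gets faster.

    python_fraction: share of total runtime spent in interpreted Python (0..1).
    python_speedup:  factor by which that share is accelerated.
    """
    if not 0.0 <= python_fraction <= 1.0:
        raise ValueError("python_fraction must be between 0 and 1")
    if python_speedup <= 0:
        raise ValueError("python_speedup must be positive")
    return 1.0 / ((1.0 - python_fraction) + python_fraction / python_speedup)


if __name__ == "__main__":
    fraction = 0.7  # hypothetical: 70% of the runtime is interpreted Python
    print("60x faster Python part:", round(overall_speedup(fraction, 60), 2))
    print("8 ideal GIL-free cores:", round(overall_speedup(fraction, 8), 2))
```

Under these assumptions the two approaches break even only when the usable core count matches the interpreter speedup factor, and both are capped by the part of the runtime that stays the same.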
I tried using PyPy several times, but I always ran into problems due to the C extensions and other libraries I use. Also, last time I checked, it wasn't very up to date with respect to the mainstream Python version.