Pretty much the entire Data Science/Machine Learning landscape from numpy, etc. to tensorflow and alike are thin wrappers over C code and if you want performance, you better batch structure your operation beforehand and minimize back and forth from Python.
The GIL is only a problem if you’re trying to access the same memory.
If you take the time to split up your data into chunks you can avoid the GIL entirely with multiprocessing. Or by handing it off to a library that does it for you. just as long as they’re not using the same python objects.
Using multiprocessing adds some overhead for because of serialization, so it's slower than it would be to just hand off the Python objects directly to the workers (as you can with threads) because the process doing the parsing also has to spend time on serializing them again. So you can avoid the GIL, but it has a cost.
For example, if I parse an XML document with ElementTree, as a quick experiment, parsing the document takes ~1 second and serializing all the elements names and attributes to JSON takes an additional ~0.5 seconds. Serializing the whole ElementTree object using pickle takes ~4 seconds. Serializing it as XML takes roughly as long as parsing it.
Plus, [de]serialization often needs to be done under GIL (although under non-shared GIL per process, it interferes with any other async thing that may be done in each process).
It also screws up C extensions that don't support fork and does not solve the problem of increasing TLS handshake/fanout in outbound requests by the number of cores.
multiprocessing has more overhead even without serialization. I just brought it up to expand on why the GIL would force someone to thinking about being data-parallel.
What I was trying to say, I guess, is that the additional serialization overhead can't be parallelized, which means that parallelization with multiprocessing doesn't help much in some cases where GIL-free threading would.
Pyflink seems promising, I love vanilla flink but as soon as you need to debug your pyflink job pyflink becomes a hurdle. That translation layer between Python and Java can be opaque.