It's easy to drop Cython into an existing project where you need some performance, and start gradually "cythonizing" modules from the inside out. The rest of the code does not need to care.
With a bit of care (and benchmarking) you can get very respectable speed. The main drawback is that the further you go, the more C knowledge you need in order to not blast your own feet off.
If you're just after a bit more performance in general, a drop-in solution like PyPy might be enough.
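To make the "inside out" approach concrete, here is a minimal sketch (the function name and workload are made up for illustration): a pure-Python hot spot that callers use through an ordinary function call, which is exactly the kind of thing you can later move into a .pyx module with C types without the rest of the codebase noticing.

```python
# Hypothetical hot spot: sum of squared differences, pure Python.
# This is the shape of function you'd typically "cythonize" first.
def sum_sq_diff(xs, ys):
    """Sum of squared differences between two equal-length sequences."""
    total = 0.0
    for x, y in zip(xs, ys):
        d = x - y
        total += d * d
    return total

# A Cython version keeps the same name and call signature, so callers
# never notice the swap (sketch of what the .pyx might look like):
#
#   # fast.pyx
#   def sum_sq_diff(double[:] xs, double[:] ys):
#       cdef double total = 0.0, d
#       cdef Py_ssize_t i
#       for i in range(xs.shape[0]):
#           d = xs[i] - ys[i]
#           total += d * d
#       return total
```

Because the public interface is unchanged, you can benchmark the two implementations against each other and roll back trivially if the gain isn't worth the added build complexity.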
I can't speak for the parent commenter, but there is often code processing the input/output of machine learning models that benefits from high-performance implementations. To give two examples:
1. We recently implemented an edit tree lemmatizer for spaCy. The machine learning model predicts labels that map to edit trees. However, in order to lemmatize tokens, the trees need to be applied. I implemented all the tree wrangling in Cython to speed up processing and save memory (trees are encoded as compact C unions).
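For readers unfamiliar with edit trees, here is a simplified pure-Python sketch of what "applying a tree" means. The node types and field names below are illustrative, not spaCy's actual API; the real implementation encodes nodes as compact C unions in Cython rather than Python objects:

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class ReplaceNode:
    """Leaf: replace the covered substring with a fixed string."""
    replacement: str

@dataclass
class MatchNode:
    """Interior node: keep the infix, recurse into prefix and suffix."""
    prefix_len: int
    suffix_len: int
    left: Optional["EditTree"]   # applied to the prefix
    right: Optional["EditTree"]  # applied to the suffix

EditTree = Union[ReplaceNode, MatchNode]

def apply_tree(tree: Optional[EditTree], form: str) -> str:
    """Apply an edit tree to a surface form to produce a lemma."""
    if tree is None:
        return form  # no edit: copy the substring unchanged
    if isinstance(tree, ReplaceNode):
        return tree.replacement
    prefix = form[:tree.prefix_len]
    suffix = form[len(form) - tree.suffix_len:]
    infix = form[tree.prefix_len:len(form) - tree.suffix_len]
    return apply_tree(tree.left, prefix) + infix + apply_tree(tree.right, suffix)
```

For example, a tree that keeps the first three characters of "running" and deletes the "ning" suffix yields the lemma "run". Since every token in a document needs a tree application, doing this with Python objects and recursion is exactly the kind of overhead that moving to C-level structs removes.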
2. I am working on a biaffine parser for spaCy. Most implementations of biaffine parsing use a Python implementation of MST decoding, which is unfortunately quite slow. Some people have reported that decoding dominates parsing time (rather than applying an expensive transformer + biaffine layer). I have implemented MST decoding in Cython and it barely shows up in profiles.
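MST decoding here means finding the maximum spanning arborescence over the arc-score matrix, typically with the Chu-Liu/Edmonds algorithm. The pure-Python sketch below (function names are mine, not spaCy's) shows the algorithm's structure: greedily pick the best head per token, and if that creates a cycle, contract it and recurse. It is this dict- and recursion-heavy inner loop that makes a naive Python version slow for long sentences:

```python
def mst_decode(scores, root=0):
    """Chu-Liu/Edmonds maximum spanning arborescence (illustrative).
    scores[h][d] = score of attaching dependent d to head h.
    Returns {dependent: head} for every non-root node."""
    n = len(scores)
    nodes = set(range(n)) - {root}
    arcs = {(h, d): scores[h][d] for h in range(n) for d in nodes if h != d}
    return _cle(arcs, root, nodes)

def _find_cycle(best, nodes):
    """Return one cycle (as a set of nodes) among chosen arcs, or None."""
    for start in nodes:
        seen, v = [], start
        while v in best and v not in seen:
            seen.append(v)
            v = best[v]
        if v in seen:
            return set(seen[seen.index(v):])
    return None

def _cle(arcs, root, nodes):
    # Greedy step: best incoming arc for every node.
    best = {d: max((h for (h, d2) in arcs if d2 == d),
                   key=lambda h: arcs[(h, d)])
            for d in nodes}
    cycle = _find_cycle(best, nodes)
    if cycle is None:
        return best
    # Contract the cycle into a fresh node c and rescore its arcs.
    c = max(nodes | {root}) + 1
    new_arcs, entering, leaving = {}, {}, {}
    for (h, d), s in arcs.items():
        if h in cycle and d in cycle:
            continue
        if d in cycle:  # arc entering the cycle: score the gain of breaking it there
            gain = s - arcs[(best[d], d)]
            if (h, c) not in new_arcs or gain > new_arcs[(h, c)]:
                new_arcs[(h, c)] = gain
                entering[h] = d
        elif h in cycle:  # arc leaving the cycle: keep the best cycle-internal head
            if (c, d) not in new_arcs or s > new_arcs[(c, d)]:
                new_arcs[(c, d)] = s
                leaving[d] = h
        else:
            new_arcs[(h, d)] = s
    sub = _cle(new_arcs, root, (nodes - cycle) | {c})
    # Expand the contracted node back into real heads.
    heads = {d: (leaving[d] if h == c else h)
             for d, h in sub.items() if d != c}
    for d in cycle:
        heads[d] = best[d]          # keep the cycle's internal arcs...
    heads[entering[sub[c]]] = sub[c]  # ...except where the cycle is broken
    return heads
```

This sketch is O(n^2) per contraction and allocates dicts everywhere; a Cython version can keep scores and head indices in flat C arrays, which is why the decoder can all but vanish from profiles.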
We had to parse dozens of 20 GB files daily, each with a very complex, non-linear structure.
With Cython (eventually we migrated to PyPy) we gained around a 20-60x speedup.