How much time did the PyAST_mod2obj actually take? The rewritte is 16x faster but the article doesn't make it clear if most of the speedup came from switching to the ruff parser (specially because it puts the GC overhead at only 35% of the runtime).
That's a good question. I don't have an easy way to rerun the comparison since this happened actually a while ago, but I do remember some relevant numbers.
In the first iteration of the Rust extension, I actually used the parser from RustPython. Although I can't find it at the moment, I think the RustPython parser was actually benchmarked as worse than the builtin ast parse (when both returned Python objects).
Even with this parser, IIRC the relevant code was around 8-11x faster when it avoided the Python objects. Apart from just the 35% spent in GC itself, the memory pressure appeared to be causing CPU cache thrashing (`perf` showed much poorer cache hit rates). I'll admit though that I am far from a Valgrind expert, and there may have been another consequence of the allocations that I missed!