When I worked on a Python app that did real-time processing of (very large) lists of Python dicts, merely processing them in a list comprehension caused "leaks" in the form of unbounded virtual memory growth. The issue was that for small allocations, e.g. interned strings, malloc would use the "brk" call rather than "mmap". "brk" simply bumps the top-of-heap pointer, so if a longer-lived object gets allocated during the comprehension it creates a high-water mark in the heap that keeps the memory below it from being returned to the OS.
The solution at the time was to process the large lists in smaller chunks, making it less likely that a small, longer-lived object would be allocated in the middle and block later cleanup. However, a better solution would probably have been to rewrite parts of the app in a less allocation-happy language.
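A minimal sketch of that chunking workaround, assuming a made-up per-item transform (the function names and chunk size here are purely illustrative):

```python
def process_in_chunks(items, chunk_size=10_000):
    """Process a large list of dicts in fixed-size chunks so the
    temporaries from each comprehension are freed before much else
    is allocated, reducing the chance a long-lived object lands
    above them on the brk heap."""
    results = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # All temporaries from this comprehension become garbage at
        # the end of the iteration, keeping the heap's peak low.
        results.append([{**d, "processed": True} for d in chunk])
    return [d for sub in results for d in sub]
```

The chunk size is a tuning knob: small enough that a stray long-lived allocation pins little memory, large enough that the per-chunk overhead stays negligible.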
Wow, I've experienced an issue that's very similar to that one, if not the same. Using malloc/free directly worked just fine on Windows and older versions of glibc, but led to runaway RSS on at least some newer versions. I ended up writing a small custom allocator on top of mmap to work around it. It's incredibly frustrating when the memory leak is in the system allocator and not your code.
We at Crunch.io run Python using jemalloc because we have found some really bad behaviour in the default glibc malloc when using Python with many threads.
Using jemalloc we were able to reduce our memory pressure over time dramatically.
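For anyone wanting to try this, the common way to put jemalloc under an unmodified CPython is LD_PRELOAD. The library path below is an assumption (typical for Debian/Ubuntu) and `app.py` is a placeholder for your entry point:

```shell
# Preload jemalloc so it replaces glibc malloc for this process.
# Adjust the .so path for your distro and jemalloc version.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 python app.py
```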
I'd say it's a problem in the interaction of that specific version of python with that specific malloc. From a performance perspective we could not afford to simply force malloc to only use mmap for every small allocation (as it would thrash the TLB and/or balloon memory usage).
Something they could have done inside Python is manually implement a short-lived objects space that lives entirely in pre-allocated memory, preventing the repeated allocations.
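At the Python level, the closest cheap approximation is an object pool that recycles short-lived containers instead of repeatedly allocating and freeing them. This is a hypothetical sketch (the class and its methods are invented for illustration, and it only reduces, not eliminates, allocator traffic since dict internals still allocate in C):

```python
class DictPool:
    """Pool of pre-allocated dicts: short-lived dicts are recycled
    rather than repeatedly created and destroyed, keeping temporary
    objects out of the allocator's hot path."""

    def __init__(self, size):
        self._free = [dict() for _ in range(size)]

    def acquire(self):
        # Reuse a pooled dict if one is available, else fall back
        # to a fresh allocation.
        return self._free.pop() if self._free else {}

    def release(self, d):
        d.clear()
        self._free.append(d)
```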
I should have mentioned that the app in question was running Python 2.7. Some preliminary tests of porting to Python 3 seemed to indicate that the issue was significantly less severe there, though the problem occurred so infrequently that restarting the affected processes when they started exceeding a memory threshold was more cost-effective than spending ages debugging the interactions.
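The restart-on-threshold check can be a few lines on Linux by reading VmRSS from /proc. This is a generic sketch, not the app's actual code, and the 2 GiB limit is an arbitrary example:

```python
import sys

def rss_kib():
    """Current resident set size in KiB, read from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

MEMORY_LIMIT_KIB = 2 * 1024 * 1024  # 2 GiB; pick per workload

def exit_if_over_limit():
    # Exit cleanly and let the supervisor (systemd, supervisord, ...)
    # restart the process with a fresh heap.
    if rss_kib() > MEMORY_LIMIT_KIB:
        sys.exit(0)
```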
> I'd say it's a problem in the interaction of that specific version of python with that specific malloc. From a performance perspective we could not afford to simply force malloc to only use mmap for every small allocation (as it would thrash the TLB and/or balloon memory usage).
Another option might have been to switch to a malloc that uses mmap'ed size-class arenas, like jemalloc or somesuch?
As far as I know the garbage collector in the reference implementation is only a fallback for cycle detection. The various alternative implementations (pypy, IronPython, Jython, ... ) should have a modern GC in exchange for not supporting the same C API and predictable object disposal.
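That fallback role is easy to see with the gc module: a reference cycle is invisible to pure refcounting, and only the cycle collector reclaims it.

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

def make_cycle():
    a, b = Node(), Node()
    # a -> b -> a: refcounts never hit zero, so refcounting
    # alone would leak these objects.
    a.ref, b.ref = b, a

make_cycle()
# The cyclic collector (the "fallback") finds and frees the cycle;
# collect() returns the number of unreachable objects it found.
collected = gc.collect()
```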
The title seems to suggest that there's a memory leak in (C)Python, but while it's tedious to debug mixed Python+C code, this turns out to be a Numba memory leak rather than one in Python or numpy.