When I worked on a Python app that did real-time processing of (very large) lists of Python dicts, merely processing them in a list comprehension caused "leaks" in the form of unbounded virtual memory growth. The issue was that for small allocations, e.g. interned strings, malloc would use the "brk" call rather than "mmap". "brk" simply bumps the top-of-heap pointer, so if a longer-lived object gets allocated during the comprehension it creates a high-water mark in the heap that keeps the memory below it from being returned to the OS.
The solution at the time was to process the large lists in smaller chunks, making it less likely that a small, longer-lived object would be allocated in the middle and block later cleanup. However, a better solution would probably have been to rewrite parts of the app in a less allocation-happy language.
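A minimal sketch of that chunking workaround, assuming a made-up per-item transform (the function names and chunk size here are purely illustrative):

```python
def process_in_chunks(items, chunk_size=10_000):
    """Process a large list of dicts in fixed-size chunks so the
    temporaries from each comprehension are freed before much else
    is allocated, reducing the chance a long-lived object lands
    above them on the brk heap."""
    results = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # All temporaries from this comprehension become garbage at
        # the end of the iteration, keeping the heap's peak low.
        results.append([{**d, "processed": True} for d in chunk])
    return [d for sub in results for d in sub]
```

The chunk size is a tuning knob: small enough that a stray long-lived allocation pins little memory, large enough that the per-chunk overhead stays negligible.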
Wow, I've experienced an issue that's very similar to that one, if not the same. Using malloc/free directly worked just fine on Windows and older versions of glibc, but led to runaway RSS on at least some newer versions. I ended up writing a small custom allocator on top of mmap to work around it. It's incredibly frustrating when the memory leak is in the system allocator and not your code.
We at Crunch.io run Python using jemalloc because we have found some really bad behaviour in the default glibc malloc when using Python with many threads.
Using jemalloc we were able to reduce our memory pressure over time dramatically.
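For anyone wanting to try this, the common way to put jemalloc under an unmodified CPython is LD_PRELOAD. The library path below is an assumption (typical for Debian/Ubuntu) and `app.py` is a placeholder for your entry point:

```shell
# Preload jemalloc so it replaces glibc malloc for this process.
# Adjust the .so path for your distro and jemalloc version.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 python app.py
```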
I'd say it's a problem in the interaction of that specific version of python with that specific malloc. From a performance perspective we could not afford to simply force malloc to only use mmap for every small allocation (as it would thrash the TLB and/or balloon memory usage).
Something they could have done inside Python is manually implement a short-lived objects space that lives entirely in pre-allocated memory, preventing the repeated allocations.
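At the Python level, the closest cheap approximation is an object pool that recycles short-lived containers instead of repeatedly allocating and freeing them. This is a hypothetical sketch (the class and its methods are invented for illustration, and it only reduces, not eliminates, allocator traffic since dict internals still allocate in C):

```python
class DictPool:
    """Pool of pre-allocated dicts: short-lived dicts are recycled
    rather than repeatedly created and destroyed, keeping temporary
    objects out of the allocator's hot path."""

    def __init__(self, size):
        self._free = [dict() for _ in range(size)]

    def acquire(self):
        # Reuse a pooled dict if one is available, else fall back
        # to a fresh allocation.
        return self._free.pop() if self._free else {}

    def release(self, d):
        d.clear()
        self._free.append(d)
```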
I should have mentioned that the app in question was running Python 2.7. Some preliminary tests of porting to Python 3 seemed to indicate that the issue was significantly less severe there, though the problem occurred so infrequently that restarting the affected processes when they started exceeding a memory threshold was more cost-effective than spending ages debugging the interactions.
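The restart-on-threshold check can be a few lines on Linux by reading VmRSS from /proc. This is a generic sketch, not the app's actual code, and the 2 GiB limit is an arbitrary example:

```python
import sys

def rss_kib():
    """Current resident set size in KiB, read from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

MEMORY_LIMIT_KIB = 2 * 1024 * 1024  # 2 GiB; pick per workload

def exit_if_over_limit():
    # Exit cleanly and let the supervisor (systemd, supervisord, ...)
    # restart the process with a fresh heap.
    if rss_kib() > MEMORY_LIMIT_KIB:
        sys.exit(0)
```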
> I'd say it's a problem in the interaction of that specific version of python with that specific malloc. From a performance perspective we could not afford to simply force malloc to only use mmap for every small allocation (as it would thrash the TLB and/or balloon memory usage).
Another option might have been to switch to a malloc that uses mmap'ed size-class arenas, like jemalloc or somesuch?
As far as I know the garbage collector in the reference implementation is only a fallback for cycle detection. The various alternative implementations (pypy, IronPython, Jython, ... ) should have a modern GC in exchange for not supporting the same C API and predictable object disposal.
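That fallback role is easy to see with the gc module: a reference cycle is invisible to pure refcounting, and only the cycle collector reclaims it.

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

def make_cycle():
    a, b = Node(), Node()
    # a -> b -> a: refcounts never hit zero, so refcounting
    # alone would leak these objects.
    a.ref, b.ref = b, a

make_cycle()
# The cyclic collector (the "fallback") finds and frees the cycle;
# collect() returns the number of unreachable objects it found.
collected = gc.collect()
```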
The title seems to suggest that there's a memory leak in (C)Python, but while it's tedious to debug mixed Python+C code, this turns out to be a Numba memory leak rather than one in Python or numpy.