
Fixing a Tough Memory Leak in Python - jtd64
https://info.cloudquant.com/2018/12/numpyleaks/
======
Gladdyu
When I worked on a Python app that did real-time processing of (very large)
lists of Python dicts, merely processing them in a list comprehension caused
"leaks": unbounded virtual memory growth. The issue was that for small
allocations, e.g. interned strings, malloc would use the "brk" call rather
than "mmap". "brk" simply bumps the top-of-heap pointer, so if a longer-lived
object gets allocated while the comprehension is running, it creates a
high-water mark in the heap that cannot be lowered until that object is freed.

The solution at the time was to process the large lists in smaller chunks, to
make it less likely that a small longer-lived object would be allocated in the
middle, preventing later cleanup. However, a better solution would probably
have been to rewrite parts of the app in a less allocation-happy language.
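
A minimal sketch of the chunking workaround described above, assuming the
records are an iterable of dicts. The names iter_chunks, process_record and
process_in_chunks are made up for illustration and are not from the original
app. Chunking only keeps each batch of temporaries small; it does not
guarantee that the allocator hands memory back to the OS.

```python
from itertools import islice


def iter_chunks(items, chunk_size):
    """Yield successive lists of at most chunk_size items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def process_record(record):
    # Hypothetical per-record work; stands in for the real processing.
    return len(record)


def process_in_chunks(records, chunk_size=1000):
    """Run the comprehension over small slices instead of the whole list."""
    results = []
    for chunk in iter_chunks(records, chunk_size):
        # Temporaries created here die before the next chunk starts,
        # so fewer short-lived objects are alive at any one time.
        results.extend([process_record(r) for r in chunk])
    return results
```

The chunk size is a tuning knob: smaller chunks shorten the window in which a
long-lived allocation can land above a pile of short-lived ones, at the cost
of more loop overhead.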

~~~
glandium
Python uses its own allocator for small sizes (below 512 bytes if I remember
correctly). I've had a similar problem to yours. See
[https://glandium.org/blog/?p=3698](https://glandium.org/blog/?p=3698) and the
followup
[https://glandium.org/blog/?p=3723](https://glandium.org/blog/?p=3723) . I
have a glibc bug open about the issue
[https://sourceware.org/bugzilla/show_bug.cgi?id=23416](https://sourceware.org/bugzilla/show_bug.cgi?id=23416)
.

~~~
MrRadar
Wow, I've experienced an issue that's very similar to that one if not the
same. Using malloc/free directly worked just fine on Windows and older
versions of glibc but led to runaway RSS on at least some newer versions. I
ended up writing a small custom allocator on top of mmap to work around it.
It's incredibly frustrating when the memory leak is in the system allocator
and not your code.

~~~
X-Istence
We at Crunch.io run Python using jemalloc because we have found some really
bad behaviour in the default glibc malloc when using Python with many threads.

Using jemalloc we were able to dramatically reduce our memory pressure over
time.

------
sqrt17
The title seems to suggest that there's a memory leak in (C)Python, and while
it's tedious to debug mixed Python+C code it's a Nice Numba memory leak rather
than one in Python or numpy.

~~~
jtd64
Agreed - But part of the solution is always allowing iterators in numpy to run
to completion. Breaking in the middle causes a leak.
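
A rough sketch of what "run to completion" can look like in plain Python,
assuming the early exit is a break out of a for loop. find_first is a
hypothetical helper, not code from the article, and whether draining the
iterator actually avoids the leak depends on the Numba/numpy versions
involved.

```python
from collections import deque


def find_first(iterator, predicate):
    """Return the first matching item, then exhaust the iterator."""
    found = None
    for item in iterator:
        if predicate(item):
            found = item
            break
    # Drain whatever is left so the iterator reaches its natural end
    # instead of being abandoned mid-way. maxlen=0 discards every item.
    deque(iterator, maxlen=0)
    return found
```

Draining costs a full pass over the remaining items, so it trades time for
the cleanup that happens when the iterator finishes normally.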

