
A Practical Guide to Memory Leaks in Node.js - shakes
https://www.arbazsiddiqui.me/a-practical-guide-to-memory-leaks-in-nodejs/
======
Ecco
I've always felt like "memory leak" wasn't the proper term for GC'ed
languages.

If you forget to free a malloc'ed buffer, yeah, you're leaking memory.

Here, in the example given, well, the developer would be doing something
stupid (i.e. keeping track of something created at each request in a global
variable). This isn't a leak per se.

I mean, it's not a memory management bug you can fix: it's a behavior that's
inherently not sustainable and that will need to be changed.
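
To make the pattern concrete, here is a minimal sketch of that kind of code
next to a bounded variant. The names `seenRequests`, `recentRequests`,
`handleLeaky`, `handleBounded` and the cap `MAX_TRACKED` are invented for
illustration and are not taken from the article.

```javascript
// Hypothetical illustration: every name here is invented for this sketch.
const seenRequests = [];    // module-level, lives as long as the process
const recentRequests = [];  // same idea, but kept bounded below
const MAX_TRACKED = 1000;   // assumed cap, pick whatever fits the use case

// Grows forever: one entry per request, nothing is ever removed.
function handleLeaky(req) {
  seenRequests.push({ url: req.url, at: Date.now() });
  return seenRequests.length;
}

// Same bookkeeping, but the oldest entry is dropped once the cap is exceeded.
function handleBounded(req) {
  recentRequests.push({ url: req.url, at: Date.now() });
  if (recentRequests.length > MAX_TRACKED) recentRequests.shift();
  return recentRequests.length;
}
```

Nothing in the first handler is a memory management bug. The array is
reachable and the GC is doing its job, which is the point being made here.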

~~~
millstone
GC theory starts with "dead objects should be collected", and then immediately
retreats to "unreachable objects should be collected" since unreachable
objects are dead.

But some objects are reachable, and also dead. My progress spinner advances on
a timer, but nobody will ask me to draw. I'll invalidate my bitmap cache on
screen resolution changes, but nobody cares about my bitmap.

Most GCs would rather leak than introduce memory unsafety. But their design
biases towards leaks: weak references become this weird expensive path and
that's why you don't use them routinely.
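
A rough sketch of the spinner case in JavaScript, assuming a timer-driven
object like the one described. The `Spinner` class, its fields and `dispose`
are hypothetical names for illustration only.

```javascript
// Hypothetical sketch: Spinner and its fields are invented for illustration.
class Spinner {
  constructor(intervalMs = 100) {
    this.frame = 0;
    this.bitmapCache = new Array(1024).fill(0); // stand-in for a large cache
    // The callback closes over `this`, so the runtime's timer list keeps the
    // whole spinner reachable even if no other code references it anymore.
    this.timer = setInterval(() => { this.frame += 1; }, intervalMs);
  }

  // Explicit teardown: until this runs, the spinner stays reachable.
  dispose() {
    clearInterval(this.timer);
    this.timer = null;
    this.bitmapCache = null;
  }
}
```

The object is reachable through the timer but dead from the program's point
of view, and only the explicit `dispose()` call tells the runtime so.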

~~~
sweeneyrod
Collecting exactly the dead objects in all situations is not possible even in
theory:

    
    
        function f() {
            const x = Object(1);
            if (halts(some_program)) {
                return Object(2);
            } else {
                return x;
            }
        }
    

The contents of x will be dead or not after the assignment depending on
whether some_program halts. But a GC that could determine that could solve the
halting problem.

~~~
zamadatix
That code doesn't make any sense. Either halts() itself is a function that
solves the halting problem or the else branch is a dead branch because in
order to be selected halts() must have never returned.

Dynamic code is where the GC can't know whether something merely has a stale
reference or whether the code path that uses it just doesn't exist yet. It'd
be interesting to be required to tag such data appropriately, as that would
open up a lot of leak protection.

~~~
sweeneyrod
Yep, you're right. I should've just said that the condition of the if
statement can in general be undecidable.

------
nosianu
A "leak" we found was this:

Our code parsed a large HTML-like string read from a file and extracted a
small portion. Then we created an array with those extracted strings (many
files). The original large HTML-like string was no longer needed.

The problem: The (V8) runtime never created a new (very small) string and
copied the section. Instead, it kept the huge strings around. So while we
needed only 64 bytes from many kBytes of a string, we ended up keeping the
kBytes anyway. Since those were pretty big arrays we ended up with a _huge_
amount of wasted memory.

We ended up with a hack-function to do a substring creation that forced V8 to
create a new string, by using string addition, preventing V8 from "optimizing"
and using a pointer into the existing string (code shown is only the core part
of that function):

    
    
        s.substr(start, length - 1) + s.charAt(start + length - 1);
    

This was a process size difference of hundreds of megabytes, since we read a
lot of files and extracted a lot of values. Array(100000) of 64 byte strings
vs. Array(100000) of many kBytes of strings, just to give an idea of the
possible magnitude. The more long strings you extract small values from the
more of a problem you get.
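
For illustration, the core line above could be wrapped roughly like this. The
name `detachedSubstring` is made up, `slice` is used in place of the legacy
`substr`, and `start` is assumed to be non-negative. Whether the result really
stops referencing the original string depends on V8 internals and version, so
treat that part as an assumption and confirm it with a heap snapshot.

```javascript
// Hypothetical helper wrapping the one-liner above; the name is invented.
// Assumes start >= 0. Whether V8 actually drops the reference to `s` is an
// assumption that depends on the engine version; verify with a heap snapshot.
function detachedSubstring(s, start, length) {
  // The naive one-liner misbehaves for length 0, so handle that up front.
  if (length <= 0 || start >= s.length) return "";
  // Clamp so the index of the last character stays inside the string.
  const end = Math.min(start + length, s.length);
  // Take all but the last character, then append the last one separately,
  // so the result is built by concatenation instead of being a plain slice.
  return s.slice(start, end - 1) + s.charAt(end - 1);
}
```

The value returned is the same as a plain substring. Only the way it is
constructed differs.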

This could also be a response to @Ecco: this leak is caused by internal
runtime behavior. There actually is an open issue for this, and it has been
open for quite some time. I don't understand why. It only isn't a huge problem
because not many people have code that extracts tiny parts from lots of
strings and then keeps references to those tiny strings around. But that's
legit code, anyone who writes it runs into this problem, and it is not a
problem of the JS code. Maybe the optimization should force a copy if the
large string could otherwise be GCed, but sure, that's quite a bit of work.
Still, the current state of simply keeping references to the original string
for all substrings seems problematic to me.

The issue is this one I think (I only just googled quickly):
[https://bugs.chromium.org/p/v8/issues/detail?id=2869](https://bugs.chromium.org/p/v8/issues/detail?id=2869)

Somebody's blog post:
[https://rpbouman.blogspot.com/2018/03/a-tale-of-javascript-m...](https://rpbouman.blogspot.com/2018/03/a-tale-of-javascript-memory-leak.html)

~~~
KingOfCoders
It's interesting, as Java developers went through the same thing 15 years ago:
Java Strings are immutable and had the same behaviour back then (I was bitten
by this too).

------
keitmo
A few years ago we had a nightmarish resource leak in our server. The code in
question was reading and parsing HTML, looking for a handful of specific tags
(title, description, etc). Under heavy load the server would be stable for a
few hours, then memory would suddenly explode and kill the NodeJS process.

The problem was caused by the HTML parser we were using. The parsing results
appeared to be a POJO but apparently there was much lurking under the surface.

The fix: `parsedResults = JSON.parse(JSON.stringify(parsedResults))`
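
A sketch of why the round trip can help, under the assumption that the parser
attached hidden state to its results. The library was not named, so
`makeParsedResult`, `_source` and `toPlainData` are hypothetical stand-ins.

```javascript
// Hypothetical stand-in for a parser result; the real library is not named.
function makeParsedResult() {
  const result = { title: "Example", description: "Demo page" };
  // A non-enumerable back-reference: it does not show up in Object.keys or
  // in a default console.log, yet it keeps the large source reachable.
  Object.defineProperty(result, "_source", {
    value: "x".repeat(1024 * 1024),
    enumerable: false,
  });
  return result;
}

// The round trip keeps only own enumerable, JSON-representable data, so the
// copy carries no hidden references back into the parser's structures.
function toPlainData(value) {
  return JSON.parse(JSON.stringify(value));
}
```

Once only the copy is kept, the original result and everything hanging off it
can be collected.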

~~~
mekster
Which library was that?

------
swapsCAPS
I've found this to be rarely as simple as the post makes it out to be. The
leak might not be in one place, you might not know exactly what to look for,
and there is often no way of knowing where it is actually coming from. You
just know that some array or object is large. Also, instead of installing
packages, you can achieve the same with Chrome DevTools by running your
process with --inspect.

------
tedeh
Now I'd like to hear if anyone has any tips on how to go from the "Alloc.
Size" sorted heap dump where you can clearly see a huge array of waste, to the
actual location/initialisation point of the array in your code base. Trivial
when you have one file and two external libraries, maybe not so trivial when
you have hundreds of files and hundreds of libraries.

~~~
wonnage
Usually the best method is to take two dumps and diff them. Just because
something shows up big in a heap dump doesn't mean it's a leak, it needs to be
big and continually growing.
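
A minimal sketch of that workflow in Node, assuming the built-in `v8` module
is available. The helper names `takeSnapshot`, `sampleHeap` and
`isSteadilyGrowing` are invented here, and the "every sample larger than the
last" rule is just one simple way to express "continually growing".

```javascript
// Hypothetical helpers; names and the growth rule are invented for this sketch.
const v8 = require("v8");

// Writes a .heapsnapshot file that Chrome DevTools can load and compare
// against another one. The call is synchronous and can be slow on big heaps.
function takeSnapshot(label) {
  return v8.writeHeapSnapshot(`${label}-${Date.now()}.heapsnapshot`);
}

// Appends the current heap usage in bytes to the given array.
function sampleHeap(samples) {
  samples.push(process.memoryUsage().heapUsed);
  return samples;
}

// True only if each sample is larger than the one before it.
function isSteadilyGrowing(samples) {
  if (samples.length < 2) return false;
  return samples.every((value, i) => i === 0 || value > samples[i - 1]);
}
```

One possible use: sample the heap periodically under load, and once the
samples keep climbing, call `takeSnapshot("before")` and later
`takeSnapshot("after")`, then compare the two files in the DevTools Memory tab.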

------
sstephant
Memory leaks are not the only kind of leak I've had to deal with in my career.
I think one of the worst I have stumbled upon is a network connection leak in
a database connection pool.

