Hacker News
How JavaScript works: memory management and common memory leaks (sessionstack.com)
374 points by zlatkov on Sept 13, 2017 | 55 comments

> As of 2012, all modern browsers ship a mark-and-sweep garbage-collector. All improvements made in the field of JavaScript garbage collection (generational/incremental/concurrent/parallel garbage collection) over the last years are implementation improvements of this algorithm (mark-and-sweep), but not improvements over the garbage collection algorithm itself, nor its goal of deciding whether an object is reachable or not.

WebKit uses a constraint-based garbage collector that does not rely on reachability alone. This is an improvement over the classical garbage collection algorithm.

Nice article. Minor gripe about the static vs dynamic memory section. The requirement that data sizes be known at compile time for static memory (with the example of an array allocated to a user-inputted size) seems to be based on a past restriction of the C language. C has since removed the requirement that stack-based arrays be sized with a compile-time constant (C99 added variable-length arrays); there is nothing at the hardware/assembly level which prevents such arrays. So these stack-based non-compile-time-sized arrays don't fit into either the static or dynamic memory categories presented here.

I was bothered by saying that static memory is assigned on the stack (implied that it is only on the stack). As an embedded guy, at least in bare metal situations, local function statics, any const variables, and any globals are either in the read-only data section[1] or in the general data section[2], both before the heap and definitely not in the stack section.

[1] - .rodata section in gcc

[2] - I think this is in the .data and .bss sections, where .data is copied from the file and .bss is zero-initialized by the crt before main is called. If you peek into a linker script, or dump one by passing --verbose to ld, you can even see where it puts all the C++ bits and pieces. A dump I did on Debian with g++ is here: https://gist.github.com/Phyllostachys/0682a3bda13ef9c6b49d04...

After looking at that default linker script dump, I noticed it didn't have the stuff I saw in the ARM linker script that I'm used to. So I attached one for a Silicon Labs EFM32 part. It's below the x86-64 linker script and is a little easier to read.

After a little investigation[1][2], it seems things get weird with an OS, as would be expected with virtual memory, et al.

[1] - https://stackoverflow.com/questions/16360620/find-out-whethe...

[2] - http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in...

Except for the fact that the arrays you are talking about are still put on the stack.

Apologies if unclear, that is what I meant by stack-based. The article implies that you cannot create a dynamically sized object on the stack.

Wait, is #3 for real? If that's the case it seems like a huge oversight.

Yeah, of the four things listed, I think it's the one that would trip me up the most. I think the safety net here would be dead code elimination: `unused()` should be detected as never called and removed during compilation/transpilation so that this wouldn't be an issue.

I am sort of surprised that `unused` is not picked up by the garbage collector in the first place though. Since JS functions are objects, shouldn't it detect an object that's not referenced during the mark and sweep?

In general, I really hate having to debug memory leaks in JS or Python. The interpreter for both will randomly allocate additional memory as it runs, so using tools like Valgrind is next to impossible. The only reliable method I've found is to pepper my code with logging statements that show what the current memory usage is, run the code like 1000-10,000 times, and see the points between which the memory usage goes up without coming down on a consistent basis. Python's built-in `gc` module seems nearly useless for determining what's actually stuck in memory, and having a billion libraries that can have their own memory leaks is also not fun. These are the times I miss C: when you leak memory in C you know it because it becomes painful fast, and it's usually easy to find if your code is sane.
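For what it's worth, the "log memory usage over many runs" approach can be sketched in a few lines on Node. This is only an illustration: `sampleHeap`, `leaky`, and `retained` are made-up names, and `process.memoryUsage()` is Node-specific (in a browser you would lean on devtools instead).

```javascript
// Hypothetical helper: run `fn` repeatedly and record heap usage at intervals.
function sampleHeap(fn, iterations, sampleEvery) {
  const samples = [];
  for (let i = 1; i <= iterations; i++) {
    fn();
    if (i % sampleEvery === 0) {
      // globalThis.gc only exists when Node is started with --expose-gc;
      // without it the numbers are noisier but the trend is still visible.
      if (typeof globalThis.gc === "function") globalThis.gc();
      samples.push({ iteration: i, heapUsed: process.memoryUsage().heapUsed });
    }
  }
  return samples;
}

// Deliberately leaky workload (an assumption for the demo): an array that only grows.
const retained = [];
const leaky = () => {
  retained.push(new Array(1000).fill(0));
};

const samples = sampleHeap(leaky, 2000, 200);
for (const s of samples) {
  console.log(s.iteration, (s.heapUsed / 1024 / 1024).toFixed(1) + " MB");
}
```

If `heapUsed` keeps climbing across samples and never comes back down, the code between those sample points is a reasonable place to start looking.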

The fix is to implement the closures properly, by only closing over individual variable slots. It looks like the engines are implementing closures by closing over entire windows of slots--that is, if two functions have the same scope, they inherit the _union_ of the variables they reference as a single window/block of variable slots.
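For readers who haven't seen it, here is a rough sketch of the pattern being discussed. Only `unused` and `someMethod` are names from this thread; `theThing`, `replaceThing`, `originalThing`, and `longStr` are illustrative, and whether memory is actually retained depends on the engine.

```javascript
let theThing = null;

function replaceThing() {
  let originalThing = theThing;

  // Never called, but it references `originalThing`, so that variable has to
  // live in the context shared by the closures created during this call.
  const unused = function () {
    if (originalThing) console.log("hi");
  };

  theThing = {
    longStr: new Array(1000000).join("*"),
    // Uses nothing from the enclosing scope, yet in an engine that keeps one
    // context per scope it still holds that context alive, and with it
    // `originalThing`, which is the previous `theThing`.
    someMethod: function () {}
  };
}

// Each call can chain the new object to the previous one via the shared context.
for (let i = 0; i < 5; i++) replaceThing();
console.log(theThing.longStr.length); // 999999
```

The workaround usually cited is to assign `originalThing = null` at the end of `replaceThing`, which breaks the chain without relying on the engine to do it.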

The original article has a much simpler explanation and solution: https://blog.meteor.com/an-interesting-kind-of-javascript-me...

To learn more about closures than you ever thought possible, try reading this paper describing how closures are implemented in Lua: http://www.cs.tufts.edu/~nr/cs257/archive/roberto-ierusalims...

I haven't done JS in a while, but it sounds to me like it is referenced, just not in source code. `someMethod` references it implicitly.

Is my understanding correct?

Yes. The function's frame contains a reference to the object storing local variables of the parent frame. To do better you need to store a list of variables referenced (which you can only do if there's neither direct-eval nor a with statement).
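A small sketch of why direct `eval` gets in the way of tracking "variables referenced": the inner function below never mentions `secret` or `other` by name, so static analysis alone can't tell which ones it needs. The names `makeReader`, `secret`, and `other` are made up for illustration.

```javascript
function makeReader() {
  const secret = 42;
  const other = "kept";
  return function (name) {
    // Direct eval resolves `name` against the enclosing lexical scope at
    // runtime, so every variable in that scope has to stay reachable.
    return eval(name);
  };
}

const read = makeReader();
console.log(read("secret")); // 42
console.log(read("other"));  // kept
```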

> In general, I really hate having to debug memory leaks in JS or Python. The interpreter for both will randomly allocate additional memory as it runs, so using tools like Valgrind is next to impossible.

I don't know about Python, but for JavaScript can't you use the built-in devtools? Maybe try grabbing a heap snapshot, or recording an allocation profile, and go from there?

Yeah. I thought so too (and still think so).

SO post: https://stackoverflow.com/questions/19798803/how-javascript-...

Chrome bug report: http://crbug.com/315190

Meteor blog (linked in article): https://blog.meteor.com/an-interesting-kind-of-javascript-me...

Live example (will crash due to memory leak): https://s3.amazonaws.com/chromebugs/memory.html


The reason this exists in all JS engines is performance; it's easier to have one context record instead of several.

Other languages do not do this. Off the top of my head: Lua, Java, Scala

Yeah, there's a nice link in the comments on the chrome bug on how Lua does it with upvalues: https://bugs.chromium.org/p/chromium/issues/detail?id=315190...

> Live example (will crash due to memory leak): https://s3.amazonaws.com/chromebugs/memory.html

Happy to report that no crash in Nightly.

Edit: No crash in Edge either.

It'll crash Chrome because it puts stricter limits on JS memory. (Or something.)

Firefox and Edge won't crash, but you'll be using 3GB+.

Yup Firefox was at 3.6GB and edge at 2.8GB.

Would it be better to have to explicitly declare which variables you want to import into the closure, like PHP or C++ do? C# also captures everything by default and by reference, which has tripped me up quite a few times.

That might help but it seems like either an implementation or language spec fix is in order. There doesn't seem to be a reason for a function without free variables to turn into a closure at all, thus preventing the issue.


Currently, the ECMAScript specification says nothing about GC.

And it seems every major JS engine has decided that this type of memory leak is okay.

So it's rather unlikely something will change.

It's for real, in V8. Other JS implementations may not have the same problem.

The particular memory issue is new to me (though now I can watch out for it, yay) but I'm not surprised... JS lacking proper lexical scope causes many issues.

Lexical scope is a semantics concern - observable behaviour that is required for correctness. The case under discussion is an implementation concern - there's no necessity for it to leak by creating a linked list of activation records as further analysis could break the chain. The two are not related.

When you say "proper lexical scope", do you mean just block vs function level scoping of variables? If so, I wouldn't say JavaScript is wrong; it's just different.
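To make the block vs function scoping distinction concrete, a minimal sketch (the function names are made up): both versions are lexically scoped, and the only difference is whether the loop variable belongs to the function or to each iteration's block.

```javascript
function collectWithVar() {
  const fns = [];
  // `var i` is scoped to the whole function: every closure sees the same `i`.
  for (var i = 0; i < 3; i++) fns.push(() => i);
  return fns.map((f) => f());
}

function collectWithLet() {
  const fns = [];
  // `let i` gets a fresh binding per iteration: each closure sees its own `i`.
  for (let i = 0; i < 3; i++) fns.push(() => i);
  return fns.map((f) => f());
}

console.log(collectWithVar()); // [ 3, 3, 3 ]
console.log(collectWithLet()); // [ 0, 1, 2 ]
```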

Who is this article written for? There's a whole section on "What is memory?". If you are optimizing to remove memory leaks I really hope you already know what memory is.

It was helpful for me. I don't have a CS background and I have been coding javascript for around 4 years now. There are a ton of other devs who don't have a proper knowledge of basics. If you are well versed then you can skip over to the next section.

In my experience there are (I don't know how many) some programmers who, given how they were taught/learnt, can do productive work but generally don't know computing/programming in the abstract e.g. Memory at the machine level, or type systems.

> can do productive work but generally don't know computing/programming in the abstract

> e.g. Memory at the machine level,

I think it's the other way around --- their usual level of abstraction is too high to understand such things...

> or type systems

...and slightly too low to understand others.

If you program, you have some concept of what memory is. I found the review of basic terms helpful in understanding the common JS leaks. YMMV

I'm with you, pal. Memory leaks are the last topic you touch when optimizing a function: usually you start by lowering execution time, then tackle I/O blocking issues, then server-related issues and network latency, and only after all that is covered do you start looking for "memory leaks". Explaining memory is usually unnecessary at that point, since junior devs are usually more focused on producing software than on optimizing it.

IMHO memory leaks are important only in embedded software, because there you start with very little memory available for your software to run.

It's also very important when building long-running applications (e.g. electron applications).

Jump straight to "The four types of common JavaScript leaks" section:


Oops, it looks like medium removes the hash on load so my copy/paste didn't work. I fixed my link.

The main problem with JS being sluggish (in electron apps for example) is memory and especially the GC.

I can optimize CPU usage all I want, but the tiny yet noticeable lags now and then only disappeared once I optimized for minimal allocations.
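A toy sketch of what "optimizing for minimum allocations" can look like: write results into an existing object instead of returning a fresh one on every step. All names here (`addAlloc`, `addInto`, `sumAlloc`, `sumReuse`) are made up, and whether this helps in practice depends on the engine and the workload.

```javascript
// Allocates a new {x, y} object on every call.
function addAlloc(a, b) {
  return { x: a.x + b.x, y: a.y + b.y };
}

// Writes the result into a caller-supplied object instead of allocating.
function addInto(out, a, b) {
  out.x = a.x + b.x;
  out.y = a.y + b.y;
  return out;
}

function sumAlloc(points) {
  let acc = { x: 0, y: 0 };
  for (const p of points) acc = addAlloc(acc, p); // one short-lived object per point
  return acc;
}

function sumReuse(points) {
  const acc = { x: 0, y: 0 };
  for (const p of points) addInto(acc, acc, p); // no per-point allocation
  return acc;
}

const points = [{ x: 1, y: 2 }, { x: 3, y: 4 }, { x: 5, y: 6 }];
console.log(sumAlloc(points), sumReuse(points));
```

Both versions compute the same sum; the second just gives the GC less short-lived garbage to clean up in a hot loop.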

The average javascript-GC must be really simple/naive compared to seasoned workhorses like the JVM's various GCs.

There I can happily create millions of short-lived objects before getting problems in a single-user application.

Well, you can run JS on the JVM through Mozilla's Rhino (or Oracle's Nashorn, which replaced it in the JDK), but apparently perf is still largely worse than SpiderMonkey or V8. The language doesn't lend itself to optimization as much as Java does for JVM bytecode: https://blogs.oracle.com/nashorn/nashorn-architecture-and-pe...

Probably part of the problem is also the fact that JavaScript is a very dynamic language.

I think even the JVM team would struggle to improve on the state of the art in js vm tech. Their experience in making JVM might not be all that useful in the context of js.

Haven't read such an easy to read technical article in a while. Kudos!

Having stuff do stuff for you is useful until it doesn't.

I'm really glad that I use a garbage collected language. Unless I'm doing low level programming that requires controlling allocation and freeing, it's amazing. Yes, I still have to understand the basics of memory but I'm just very glad that most of the time, the basics are far more than enough.

I agree. I strongly dislike "magic" in programming. I prefer to call it "denial" because you need to know about the complexities anyway, and "magic" often means sweeping them all under the rug.

Exactly, you need to de-mystify the garbage collector so that it does what you want it to do.

At some point every abstraction above Assembly meets that criterion though. The lines are personal and mostly arbitrary.

Meh, if you try and split hairs, even assembly has magic in it nowadays. Not all instructions take the same amount of time. Some flush caches and thus cause unexpected memory behavior, etc.

However, I think it is fair that most people learn roughly what the side effects are of each line at a local level.

Ironically, this is an argument against many functional languages. There are no side effects in the logic, per se. However, there are massive implementation side effects that are not necessarily easy to reason about.

The saving grace for the vast majority of people is that typically you can get by without knowing all of this. The people that care, do care. But statistically you are not one of them. :)

Like I said - it's mostly arbitrary. ;)

That said I think the issues with assembly you mention aren't magic as such, they're just consequences of the commands. They don't really hide much (if anything) behind the scenes that you'd have access to anyhow.

It's just that CPUs do so much more than they used to.

It's not that simple though. Most modern CPUs that support the x86_64 instruction set don't actually run them as instructions on the hardware. They do all sorts of magic to queue operations, increase pipeline throughput, manage register access, make branch predictions, etc...

You can think of assembly on those cpus as a high level language. It has little correlation with what's actually happening in hardware.

This is EXACTLY the same type of "magic" that is getting complained about above. The real implementation details are hidden and unknown, but the abstraction is useful.

Ah interesting. Not really familiar with x86_64. Mostly 16 & 32 bit experience here.

I worded my post poorly. The "However, I think it's fair" was me agreeing with you. Pretty much completely.

I was just musing on how the arbitrary line is probably not as difficult to see as many other lines we have out there. I think this would fall into "systems languages" and related things.

True, in this context the side effects are not intentionally hidden under the pretext that "it works automagically."

It's certainly subjective, but for me, what I pejoratively refer to as "magic" is things that there's just no escaping knowing, yet are abstracted in a way that obfuscates what's going on. Often, it's presented as "it just works" which ends up being a hindrance since there's just no getting around the thing that it's hiding for you.

> To prevent these mistakes from happening, add 'use strict'; at the beginning of your JavaScript files. This enables a stricter mode of parsing JavaScript that prevents accidental global variables.

I don't think 'use strict' will prevent accidental global variables, such as this.var in globally scoped function calls. Strict mode's main goal is to prevent inadvertently misspelled variables from going unnoticed.

IIRC, in strict mode `this` is undefined in a plain function call instead of being the global object, so `this.var = ...` there throws rather than creating a global.
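A quick way to check both claims yourself. This is a sketch: `strictDemo`, `undeclaredCounter`, and `leaked` are made-up names, and the function-level `"use strict"` is there so the snippet behaves the same whether or not the surrounding file is strict.

```javascript
function strictDemo() {
  "use strict";
  const result = {};

  try {
    undeclaredCounter = 1; // assignment to a name that was never declared
    result.assignment = "created a global";
  } catch (e) {
    result.assignment = e.name;
  }

  // In a plain call, a strict function gets `this === undefined`.
  result.thisValue = (function () { return this; })();

  try {
    (function () { this.leaked = 1; })();
    result.thisAssignment = "created a global";
  } catch (e) {
    result.thisAssignment = e.name;
  }

  return result;
}

console.log(strictDemo());
// { assignment: 'ReferenceError', thisValue: undefined, thisAssignment: 'TypeError' }
```

So strict mode catches both the undeclared assignment and the `this.var` case, although the second one surfaces as a TypeError on `undefined` rather than a dedicated "accidental global" error.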

More extremely junior posts being voted onto the front page. Y Combinator is changing and I don't like its new junior tutorial level.
