How JavaScript works: inside the V8 engine (sessionstack.com)
481 points by zlatkov on Aug 21, 2017 | 110 comments



> How to write optimized JavaScript

This is all sensible advice if you're interested in writing fast-enough code, but I do find there's a lack of material for people who want to write fast Javascript. Pretty much the only thing I've found is the post by Oz[1], though I really don't want to have to compile Chrome.

For example, I have a method in Javascript that does a mix of (integer) arithmetic and typed array accesses; no object creation or other baggage. I want it to go faster, and with effort I managed to speed it up by a factor of 5. One of the things that helped was inlining the {data, width, height} fields of an ImageData object; just moving them to locals dropped the time by ~40%.

Yet after all this effort, mostly based on educated guesses since Chrome's tooling doesn't expose the underlying JIT code for analysis, the code is still suboptimal. There's a pair of `if`s where, if I swap their order, that part of the code allocates. How do people deal with these issues? A large fraction of this code is still allocating, and I haven't a clue where or why.

Perhaps I'm asking too much from a language without value types (srsly tho, every fast language has value types), but what I want is clearly possible: ASM.js does it! I don't really want to handwrite that, though.

[1]: https://www.html5rocks.com/en/tutorials/performance/mystery/ (E: This link is actually written from a different perspective than the one I read, but the content is the same.)


I have some experience with this stuff, and my go-to advice is:

  - Never construct or mutate objects in a loop or recursive call.
  - Preallocate the max of what you'll need at load time, then index into it. (See the sketch below.)
  - Prefer typed arrays over objects.
  - Don't use strings. Work with numbers.
  - Remove function calls.
  - Do not call a function with differently shaped arguments.
  - If the JS engine can't prove a number is a 32 bit int, you'll take a large perf hit.
  - Break out your algo into a worker. Not only does it keep things responsive, but it might actually be faster due to less GC pressure.
  - The web stack is full of frightening gotchas that you'll discover. Receiving large data on a WebSocket can halt your DOM rendering. Leaking a Promise callback can completely blow a function into interpreter mode. And other fun times.
Finally: If you're hand optimizing and not getting sufficient results on your preferred browser, accept that you're screwed. The next browser will break your assumptions. Rethink your algorithm and data flow, or move it elsewhere, like a worker or server.
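As a rough illustration of the first few bullets (a hypothetical numbers-only kernel; the names and sizes are made up, not from any real codebase):

    const MAX_ITEMS = 1 << 16;
    const xs = new Float64Array(MAX_ITEMS);   // preallocated once at load time
    const ys = new Float64Array(MAX_ITEMS);
    let count = 0;

    function addPoint(x, y) {   // numbers in, numbers out; no objects, no strings
      xs[count] = x;
      ys[count] = y;
      count++;
    }

    function sumX() {
      let total = 0;
      for (let i = 0; i < count; i++) total += xs[i];   // no allocation in the loop
      return total;
    }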

I do agree the tooling for this stuff isn't great, but I found Chrome's line-by-line perf counts to be excellent hints about where you could do better. It's often surprising but obvious in retrospect.


A lot of this exact advice was applicable when I was first learning C and C++ in college. Some of it was enforced by certain compilers' errors even. Interesting how the more things change, the more they stay the same.


A lot of this likely has to do with the fact that the underlying components of v8 are written in C++.


I agree with a lot of these bullets just as general design principles, and emphatic +1 to not hand-optimizing.

Just noticed this comment after posting https://news.ycombinator.com/item?id=15069116, which is specific to V8 but which I'd guess holds for other JS engines too. Relevant excerpt:

""" ...in just about every case, the real answer is "it depends". That's why V8 is ~1,000,000 lines of code, not 1,000.

Please don't try to follow some rule blindly, much less derive. More often than not, when you try to hand-tune for the last bit of performance, you'll actually trigger something that was introduced to streamline the other [95% of] cases/JS devs' code, and then you've spent hours on making your code less readable with no benefit.

There is no "Try This One Crazy Trick To Make Your JavaScript Fast!!!" (or even ten!) """

(And for the specific case of preallocation vs. growing as you go, right now, in most cases, the difference is negligible. In the others, it depends.)


I've searched for advice too. At work I'm maintaining a compiler from C# to Java and JS, and while the code emitted will by definition never do some things that are slow (e.g. call functions with differently shaped arguments or change the shape of objects), I'd be in a far better position to implement other things that improve performance, if only I knew what I could do. Of course, avoiding allocations helps other languages too and general algorithm changes should be done by a human, but things at the expression level that only change performance but not semantics could be easily applied over a whole codebase.

A large change we made recently was to extract private methods from classes and have them as normal functions in scope instead of looking them up from an object, but that was also mostly guesswork since "JS then no longer has to do a lookup for the function but instead can call it directly". In the end I still don't know whether or how much that changed, as it coincided with a major library version change and we're faster than the old version overall.

Catering not only to Chrome but also to Firefox and Edge makes such things generally difficult as well, without a bunch of solid performance tests in place to catch regressions on certain browsers.

Ideas I had, but not yet tested/implemented:

· Inline compile-time constants so instead of having some.namespace.Type.CONSTANT, we'd get 25 in-place.

· Merge multiple boolean fields in a class into a single int and access them via bitwise operations from the getter. Might mostly help in how much fits into the cache. (See the sketch after this list.)

· Append |0 to all sub-expressions of type int instead of only the outermost one. May help the JIT to not even emit code to check types in between.

· Using typed arrays instead of normal arrays if we have things like int[] or double[]. Needs a polyfill for IE 9 then.
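A hedged sketch of the bit-packing and |0 ideas above (all names are invented; whether either actually helps would need measuring per engine):

    // Pack several boolean fields into one int and read them with bitwise ops.
    class Node {
      constructor() {
        this._flags = 0;   // bit 0: visible, bit 1: dirty
      }
      get visible() { return (this._flags & 1) !== 0; }
      set visible(v) { this._flags = v ? (this._flags | 1) : (this._flags & ~1); }
      get dirty() { return (this._flags & 2) !== 0; }
      set dirty(v) { this._flags = v ? (this._flags | 2) : (this._flags & ~2); }
    }

    // |0 on the inner sub-expressions, not only on the outermost one:
    function mix(a, b, c) {
      return (((a * b) | 0) + ((c << 2) | 0)) | 0;
    }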


What this tells me is don't bother to try to write performant code in javascript, which is scary!


Not what I said. I said to avoid /hand-optimizing/ code.

Trying to write performant code is good. Some examples of advice on that front:

  - Use an appropriate algorithm, and make sure the logic is free of bugs. For example, accidental out-of-bounds array accesses can really hurt performance.
  - Avoid needless computations. Don't do something (complex) twice if doing it once is enough.

If this sounds like "use common sense/generally good design principles", that's because it is (and hence the agreement with many of parent post's points that are simply and straightforwardly good design principles).

Let's take the example of "remove function calls". It's true that each call costs a few instructions, but unless the function you're calling is tiny, that overhead won't be measurable. If it makes sense for readability/maintainability/testability to split your code into functions, then by all means do it!

Another example is the "trick" to write "for (var i = 0, len = array.length; i < len; i++)" instead of the simpler "for (var i = 0; i < array.length; i++)". As http://mrale.ph/blog/2014/12/24/array-length-caching.html (from the original V8 team) explains in great detail, what seems like a no-brainer can actually do the opposite of what you'd expect --- and at the same time, won't make any difference whatsoever in most real code (i.e. aside from microbenchmarks).

Yet another example is https://www.youtube.com/watch?v=UJPdhx5zTaw. You can spend all day on it, but at the end of the day, the best thing to do is code correctly and sanely, not hand-optimize the bejeezus out of the thing (mostly yourself, in many cases).


Only we've seen what happens when everyone follows this: you get to the state that we're in today, where everything sucks but everyone's OK with it.


In V8, objects are actually arrays (sort of, they have an indexed store) - so doing `this[0]` is about as fast as doing it with an array (certain array things like for... of iteration are optimized though - but only if you haven't changed the prototype).


In case it wasn't clear, when I said prefer typed arrays over objects, I literally meant Typed Arrays [1].

These are not like arrays _or_ objects. They're faster than either, since there are fewer corner cases and type ambiguities.

[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Type...
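For illustration, a minimal sketch of what "typed arrays instead of objects" can look like in practice (a struct-of-arrays layout; the field names are made up):

    // Instead of an array of {x, y} objects...
    const points = [];
    for (let i = 0; i < 1000; i++) points.push({ x: i, y: 2 * i });

    // ...store the same data as parallel typed arrays:
    const px = new Float64Array(1000);
    const py = new Float64Array(1000);
    for (let i = 0; i < 1000; i++) { px[i] = i; py[i] = 2 * i; }
    // In optimized code, px[i] / py[i] is a plain indexed load of a raw double,
    // with no hidden-class checks and nothing for the GC to trace per element.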


Is that definitely true? Some time ago I was tuning code that did lots of math on a big array of floats, and I found that using Float32Arrays was moderately but measurably slower than just using [].

This may have been before TurboFan, though.


You're right. In V8, values from a Float32Array are always converted to doubles upon access, even in cases like adding values from two Float32Arrays and storing them in another Float32Array where the result is provably the same. That costs a handful of cycles for each element.

Mozilla has a blog post about doing efficient float32 math: https://blog.mozilla.org/javascript/2013/11/07/efficient-flo...
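A minimal sketch of the pattern that post describes: if every intermediate value goes through Math.fround (or comes from a Float32Array), the engine may be able to use single-precision arithmetic.

    const f32 = Math.fround;
    function addF32(out, a, b, n) {
      for (let i = 0; i < n; i++) {
        // Each step's result is provably representable as a float32.
        out[i] = f32(f32(a[i]) + f32(b[i]));
      }
    }
    addF32(new Float32Array(4), new Float32Array(4), new Float32Array(4), 4);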

V8 bug: https://bugs.chromium.org/p/v8/issues/detail?id=5541

Using a Float64Array is faster at the expense of memory usage. (I've never seen a case where the code generated by V8 bottlenecks on memory bandwidth fwiw.)


Thanks for the details, very useful!

I don't suppose you know if similar things are true for Int arrays? I think I once had a similar case there, where I was doing integer operations on a large set of ints, and regular arrays were faster than Int32Arrays. But it's hard to imagine there as well what the slowdown would be.


Good question. I haven't looked at that case. V8 can detect and optimize small ints (smi, fit in uint32, i.e. all types of typedarrays), but it might still take the easy/slow path and convert to doubles. Relatively easy to look at with node.js and irhydra.


In a few different cases (one node, one browser-based) I've had medium-sized read-only datasets of a couple million objects where I've tried to conserve memory by using typed arrays. While it saved a good amount of memory, property access was significantly faster via objects than typed views. I'm curious to re-run the benchmarks with Turbofan though, maybe things have changed.


Indeed, it heavily depends on what your JS engine thinks your data types are, your allocation patterns, and more. If your engine thinks you're using floats to index into a typed array that might be slower than using floats to index into an object. Even if you know your numbers are really only integers. Which is why JS performance is so puzzling sometimes.

One thing you can do with typed arrays that you cannot do with objects, though, is zero-copy into workers. You can use that technique to get pretty much free parallel performance on any number-crunchy task.
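A sketch of that zero-copy hand-off (worker.js and the message shape are hypothetical): listing the buffer in the transfer list of postMessage moves it to the worker instead of copying it.

    const worker = new Worker('worker.js');        // worker.js is hypothetical
    const data = new Float64Array(1 << 20);
    // Listing data.buffer as a transferable moves it rather than copying it;
    // after this call the buffer is detached in this thread (data.length === 0).
    worker.postMessage({ buffer: data.buffer }, [data.buffer]);

    // worker.js (sketch):
    // onmessage = (e) => {
    //   const data = new Float64Array(e.data.buffer);   // same memory, no copy
    //   // ...crunch numbers, then optionally transfer the buffer back...
    // };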

These things change all the time though; all the more reason for always measuring before you cut and investing into better profiling tools.


This is incorrect. Arrays are implemented as a type of object, but you shouldn't worry about that. Hashmaps and arrays both have ~O(1) random indexing. If you need an array, don't use an object.


Objects have a length overhead. At Bluebird we used objects and used a bitmask on the length so it's fast. (We actually use fewer slots than an array.)

I was just pointing out that Objects weren't slow for indexed access.


> - Do not call a function with differently shaped arguments.

If you don't mind me asking, what did you mean by this?


This has to do with monomorphic versus polymorphic and megamorphic functions. Basically, the inline cache has finite capacity, and if every call of a given function takes objects of the same shape (they share a hidden class), then you don't need to worry about evicting your cache entries.

Once you start passing in objects of different shapes though, you're going to exceed the inline cache's capacity, and start losing out on the massive speedups the cache gains you.

A function that takes one inline cache entry is monomorphic, more than one is polymorphic, and more than the inline cache capacity is megamorphic. You want as many functions as possible to be monomorphic, polymorphic if you can't help it, and never megamorphic.
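A small sketch of what "same shape" means at a call site (names are made up):

    function getX(p) { return p.x; }

    // Monomorphic: every call sees objects with the same hidden class.
    getX({ x: 1, y: 2 });
    getX({ x: 3, y: 4 });

    // Polymorphic (and eventually megamorphic): the call site now sees several
    // different shapes, so the inline cache for p.x has to cover all of them.
    getX({ x: 1 });
    getX({ x: 1, y: 2, z: 3 });
    getX({ y: 2, x: 1 });   // same keys, different insertion order = different shape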

This post gets into a lot more detail: http://mrale.ph/blog/2015/01/11/whats-up-with-monomorphism.h...


If you call a function with a different number or type of arguments, or an object that has different keys, or even keys that you added in a different order, bets are off and the optimizer might not be able to reuse previously optimized versions of the function. And if you do this several times, the engine might give up optimizing your function entirely.


Worth noting that this is valid advice for nearly everything with a JIT (and many languages with generics without a JIT, since doing so likely bloats your dispatch logic, whether it's dynamic or built at compile-time).


> - Prefer typed arrays over objects.

Could you explain this (or provide a link to a resource that does)? I feel that this could be substantially useful but I have no idea how I'd leverage typed arrays over objects off the cuff.


README> Turbolizer is a HTML-based tool that visualizes optimized code along the various phases of Turbofan's optimization pipeline, allowing easy navigation between source code, Turbofan IR graphs, scheduled IR nodes and generated assembly code.

https://github.com/v8/v8/tree/master/tools/turbolizer is like IRHydra, but for Turbofan.

    svn export https://github.com/v8/v8/trunk/tools/turbolizer turbolizer
    cd turbolizer
    python -m SimpleHTTPServer 8000

    chromium-browser http://0.0.0.0:8000/

    chromium-browser --no-sandbox --js-flags="--allow-natives-syntax --code-comments --trace-turbo" testfile.html

    mkdir out
    cd out
    node --allow-natives-syntax --code-comments --trace-turbo ../testfile.js

    cat testfile.js
    function add(a,b) { return a + b; }
    add(23, 44);
    add(2, 88);
    %OptimizeFunctionOnNextCall(add);
    add(2, 7);

    ls out
    ... turbo-add-0.json ...


The current usefulness of turbolizer seems... mixed.

Turbofan generated code is highly variable. Changes in argument types and optimization order can result in very different code being emitted.

It's not like "all these cases were well optimized, but this one case wasn't, so let's fix/avoid that one". Instead, it's "well, that's a diverse mess of results, hmm...".

Perhaps that will change. One project objective is apparently optimization with fewer performance cliffs. But another objective is a compiler which better serves low-end mobile devices - thus raising the bar for what optimizations are worth spending time on. There's a tension between those two.

Also, CPU branch prediction makes interpreting the resulting assembly more interesting. Turbofan output has a lot of deopt guards: tests that the assumptions made when the optimized code was generated are still true. But say one bit of javascript optimizes to code with more guards than another. Do they matter? Maybe not, if the guards end up off mainline and hit cache.

So turbolizer seems helpful if you have some red-hot performance chokepoint. But at present, perhaps less so for providing more general guidance.


> is like IRHydra

It does the 'IR' part of IRHydra, but it does not give you another important feature - the ability to see how your function evolved over time and figure out the reasons for it. Loading files one by one in search of the right version is pretty cumbersome.

I was always hoping that V8 will eventually get proper Dev Tools integration for compiler / runtime internals - and I think things have finally started happening in that area.


> cumbersome

Yes, turbolizer currently lacks some of IRHydra/2's mature featurefulness. No pretty "your deopt is here". I've no idea whether Google would welcome contributions or not.

> I was always hoping that V8 will eventually get proper Dev Tools integration for compiler / runtime internals

Wistful sigh.


You can ask V8 to dump the assembly or intermediate representation or you can just run it with --trace-deopt and it'll tell you most of the time (generally) what's slowing your code down.

> I'm asking too much from a language without value types

JavaScript has value types; what it doesn't have is user-defined value types. BigInts are coming (stage 3 already!) though.


To an extent I feel like there is a dividing line in the type of optimisations that I should aim for.

I'd aim for Typed arrays and avoiding intermediary object creation, because those reduce the intrinsic work load (especially for things like float32 where you are giving a nod to the system to allow slightly worse results)

Anything where the optimiser should be able to figure out what to do, I'm much more hesitant about. Inlining variables is something that I don't feel should be necessary.

Anything that is just massaging the optimiser into making the right decision runs the risk of making it worse on a different engine. That different engine may even be a later version of your current engine.


Vyacheslav Egorov wrote an article[0] on monomorphic function calls and how they impact JavaScript performance. While it's very in-depth, it is over 2 years old at this point, so I'm not sure how well it applies to the current version of V8.

[0] http://mrale.ph/blog/2015/01/11/whats-up-with-monomorphism.h...


Hello @metrognome, three of your last four posts were [dead] even though none of them appear to break the HN guidelines... I vouched for the last two; the other one is too old for me as a user to do anything about.

You may want to contact the mods about this.


I don't think there are any really solid approaches here except to use IRHydra [0], which lets you decompile v8's optimized version of a function into an assembly-like representation. But I'm pretty skeptical of whether anyone but v8 engineers can really make use of it.

I think the only workable approach is what you're doing - have an approximate idea of how the compiler works and make intelligent guesses about what will help. But when you're tuning at the level where changing the order of if statements makes a difference, you're at the level where next month's update may reverse your fix, so in a lot of cases it'll be questionable whether it's worth the effort.

[0] IRHydra - http://mrale.ph/irhydra/2/


Ignition/Turbofan has a new IR - IRHydra 1 and 2 won't help. For Turbofan there's https://github.com/v8/v8/tree/master/tools/turbolizer .


You can still do

    d8 --trace-turbo                        \
       --trace-deopt                        \
       --code-comments                      \
       --redirect-code-traces               \
       --redirect-code-traces-to=code.asm   \
       --print-opt-code                     \
       your-file.js
and load the resulting turbo.cfg and code.asm into IRHydra.

However most of the important features (like source and deopt-to-ir mapping) are broken.

I used to repair things in my free time - but I don't think I want to do it anymore.


> I used to repair things in my free time

Oh! Thank you for all your work.

I noticed my current node wasn't dumping .cfg's, and assumed that data path had died with TF.

> but I don't think I want to do it anymore.

Nod. Programming languages - civilization critical infrastructure funded like kindergarten bake sales.


IRHydra is not working with the latest V8 pipeline (Ignition + Turbofan), AFAIK: https://github.com/mraleph/irhydra/issues/52#issuecomment-25...


I find the methods in Steve Souders' books [1], "High Performance Web Sites" and "Even Faster Web Sites", still fairly practical; not all of it is applicable to JS directly, but still.

In the end, you'll need to test your changes in the most common browsers that affect you, because each browser's JS engine can vary a lot. The lookup cost you mention can be deceptively expensive, especially if you have anything resembling a deeper inheritance tree, or worse, proxies, getters/setters etc.

In the end, a lot of it doesn't matter, until it does.

[1] https://www.amazon.com/s/?keywords=steve+souders


The problem, however, is that JS optimization is very sensitive: depending on the browser vendor, version, and execution context, optimization advice can change dramatically.

Back when I worked on a perf-sensitive JS codebase, we had a perf test suite, and depending on the details, often times locals were faster, but sometimes objects, and on a different browser it was arrays. And if you accidentally hit an asm.js optimization it could be typed arrays, but mostly typed arrays were significantly slower. And when a new version came out this could all invert.


> There's a pair of `if`s that if I swap their order, that part of the code allocates. How do people deal with these issues? A large fraction of this code is still allocating, and I haven't a clue where or why.

Are you sure that allocation is a problem? Nursery allocation is extremely fast: it's just one overflow check and a pointer bump. Usually it's around 3-6 instructions.

Allocation in modern generational GCs is in a completely different performance category than C-style malloc is.


Pointer-bump allocations are fast in isolation, but if your code is doing one it means the compiler has gotten itself confused, and every time it gets confused it makes it harder for later code to be optimized. Compilers only really reason well about value types (and even then you should never assume they're smart[1]).

That said, I don't actually know allocation is the problem; it's a suspect, but unlike a native script I can't just run `perf` to get performance counters and assembly listings. All I really know is that the non-allocating order is a lot faster.

[1]: If you can read assembly, this one's hilarious: https://godbolt.org/g/81qWoq


>[1]: If you can read assembly, this one's hilarious: https://godbolt.org/g/81qWoq

I only barely read assembly, so I was just trying to figure this one out.

So, the compiler has figured out that it just needs to test v against 0 (since the array is immutable), but it does this 2 times, and with a couple of apparently redundant 'je' instructions. Is that what's weird about the output?

In case it changed, this is the output I saw:

  example::exists_in_table:
    push rbp
    mov rbp, rsp
    mov al, 1
    test edi, edi
    je .LBB0_5
    je .LBB0_5
    test edi, edi
    je .LBB0_5
    je .LBB0_5
    xor eax, eax
  .LBB0_5:
    pop rbp
    ret


Basically, yes. The first test and je show the compiler has managed the hard part, but the next je obviously can't be hit, the test after that literally doesn't do anything, and the two jumps after that also can't be hit. It's funny because this is all so trivial.

If I write it as some C

    bool should_jump = edi == 0;
    if (should_jump) goto LBB0_5;
    if (should_jump) goto LBB0_5;
    should_jump = edi == 0;
    if (should_jump) goto LBB0_5;
    if (should_jump) goto LBB0_5;
I can guarantee the compiler is going to produce good code from it.


This looks like yet-another-phase-ordering-issue in LLVM. I hit so many of these it's not even funny. Arguably one could find C code that shows the exact problem under clang. Example of optimized LLVM IR for that sample (press LLVM IR on https://play.rust-lang.org/?gist=b60ec85886eff967b21c7abc899...):

    br i1 false, label %bb9, label %bb6
SimplifyCFG would instantly kill that. But it doesn't get to run again after whatever simplified the branch condition. I've heard rumors that LLVM is working on a new pass manager, maybe they will eventually fix this pervasive issue.


Apologies for prodding a sore spot! The phase ordering problem is a well known demon for compilers; makes me wonder why I've not heard anything about unphased compilation.


No worries, I'm just surprised you managed to find a simple testcase that exhibits it, so, really, thank you!

I've opened a (a bit vague for now) issue so at least I don't forget about it: https://github.com/rust-lang/rust/issues/44041

As for phase ordering - someone did show me recently VSDG, which claims to solve at least part of the problem (specifically, https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-705.pdf and http://sro.sussex.ac.uk/7576/1/Stanier%2C_James.pdf).


Awesome references, seems I have some fun reading :). (Trivia: Alan Mycroft actually supervised me for my undergraduate dissertation.)


You can at least use the browser profiler tools.


Browser profiling tools are great but their granularity stops at the function level - they can't tell you anything about what happens within a given function, just how that function compares to others.


I haven't used them much, specially since I tend to choose native projects if possible.

Thanks for the heads up.


I do! It's missing a lot of detail, though; it's optimized much more for overall performance analysis.


> Allocation in modern generational GCs is in a completely different performance category than C-style malloc is.

Yes and no. If enough allocation happens that a GC pass occurs while you're still running your JS hot loop you're pretty much screwed and all your performance vanishes. If you want to see this in the extreme go open an Android systrace or Chrome chrome://tracing trace result and do a JS profile while you do so. The single largest consumer of JS CPU time is the GC at nearly 13%.

The problem with modern generational GC languages is they tend to be designed assuming that allocation is always cheap/free and it just isn't. That's a fairytale. It's just not true.

Also just because it only takes 6-13ns for an allocation to be given to you doesn't mean anything when the first access of that allocation is a cache line miss compared to if you did the same thing in C and it was on the stack, which is likely sitting in hot L1 primed and ready to go. Reusing the same memory address range is critical to getting maximum performance, and modern generational GCs are miserably bad at that.


> Yes and no. If enough allocation happens that a GC pass occurs while you're still running your JS hot loop you're pretty much screwed and all your performance vanishes.

GCs are expensive relative to not doing anything, but minor GCs are very fast.

> The problem with modern generation GC langauges is they tend to be designed assuming that allocation is always cheap/free and it just isn't. That's a fairytale. It's just not true.

I'm not making an argument about language design here. Obviously value types are a good thing.

> Reusing the same memory address range is critical to getting maximum performance, and modern generational GCs are miserably bad at that.

Not at all. The nursery is usually in cache, just as the stack is. Nurseries are usually implemented as two space copying collectors with small spaces, which are excellent for cache locality (arguably even better than stacks, because stacks can grow deep).


> Not at all. The nursery is usually in cache, just as the stack is. Nurseries are usually implemented as two space copying collectors with small spaces, which are excellent for cache locality (arguably even better than stacks, because stacks can grow deep).

I think you're still working under the assumption that the GC is keeping up with the allocation rate, which if you have a small allocation in a hot loop will largely not be true. Typically GC'd languages rely on escape analysis to handle this and not the generational GC at all, but if that fails then you're SOL because the generational GC is unable to keep up and unable to keep things in the fast path.


> Typically GC'd languages rely on escape analysis to handle this and not the generational GC at all

No, they don't. Most GC'd languages rely on generational GC, because escape analysis on its own doesn't directly provide a lot of performance benefits once you have a generational GC.

> if that fails then you're SOL because the generational GC is unable to keep up and unable to keep things in the fast path.

No, you aren't. Cleaning up dead objects (for example, temporaries created in a hot loop) in a nursery is extremely fast. The entire nursery semispace is typically in L1.


    The entire nursery semispace is typically in L1.
L1 is only going to be about 64k per core (eg, Kaby Lake). That'll have your stack, your working set, and a tiny bit of your nursery in it... as long as you don't get associativity problems.

If you read a bunch of data before making an object, I could see you easily evicting your nursery from L1 into L2, but then you only have about 4x as much space.


Not GP, but as someone who has worked with image manipulation and typed arrays in JS (which seems similar to what the GP was doing), I can say that this often boils down to linearly looping through a typed array and doing very simple operations. You know, one of those "best case" scenarios for CPU utilisation where memory access tends to be the bottleneck. If the allocation happens inside a tight loop like that, I can imagine it ruining performance.


Do these same optimisations work well on non-V8 browsers as well, or have we completely given up on avoiding the browser monoculture again?


I have been wondering for a while now if there's a terse / "embed some essentially-C code" style asm.js tool. Being able to drop into an optimizer for math / large data purposes, without going full emscripten, seems like it could be extremely useful.

That said - have you tried using emscripten on a C version of your code? Any better / worse?


Yes, it would be nice to just have some config flags to enable it on the debugger.


If you're into V8 internals, I'd recommend watching these talks by Franziska Hinkelmann, a V8 engineer at Google: https://www.youtube.com/watch?v=1kAkGWJZ6Zo , https://www.youtube.com/watch?v=B9igDWV5ZUg and https://www.youtube.com/watch?v=p-iiEDtpy6I&t=606s

She's also recently started blogging at https://medium.com/@fhinkel


I was wondering, given that 90% of the javascript in browsers must be standard libraries (jquery, bootstrap & co) wouldn't it make sense for google to hash the source code for every published version of these libraries, compile those statically using full optimisation and ship the binaries as part of their updates to the browser, so that you only have to compile the idiosyncratic part of the code?


The fundamental problem is that interprocedural optimizations break that model. Optimizations that inline e.g. jQuery code into callers are very important. You end up needing to do optimization on jQuery either way.

Additionally, the code that JavaScript JITs generate is not position independent, and it contains no linkage information (symbol tables, etc.). It contains raw addresses of various browser objects within it. You would need to introduce some sort of linker for that to work, which is a lot of complexity.


The js landscape changes really fast, they could probably never keep up.


doesn't mean it can't be done


It could be done, yes, but you'd have to weigh the benefits of caching against the performance downsides of linking and IR serialization. I wouldn't be surprised if it ends up not being worth it in practice.


What are you suggesting "compile those statically using full optimization" would mean in this context?

The point of the article is that v8's optimizations work by watching the code execute. It can inline functions because it saw where they got called from; it can turn property accesses into fixed-offset lookups because it saw what object the accesses happened to, and so on.

(Unless you're talking about asm.js-style optimizations, but then the answer is presumably to use asm.js...)


Most of what jQuery does could be moved into standard browser APIs and then presumably implemented in native code. I feel that implementing it that way would certainly be significantly faster, because any performance loss from the impossibility of trace-directed optimizations is outweighed by the significant reduction in the number of crossings between native and JS code, which have completely different memory models. The fact that the VM/native transition is expensive is exactly the reason why things like WebAssembly cannot directly manipulate the DOM.

But this could not be implemented by writing a C++ class that gets exported to JS code as $ when something that looks like jQuery is loaded. In JS you can load jQuery and then monkey-patch it, which will break when you use this kind of "transparent" optimization.


> Most of what jQuery does could be moved into standard browser APIs and then presumably implemented in native code.

It pretty much has been. CSS syntax for selection led to querySelectorAll, simpler AJAX calls to fetch, and DOM manipulation functions were ported into the DOM spec wholesale.


I don't understand dynamic language compilation well enough to answer that, but in principle, having a billion devices consuming CPU cycles to JIT-recompile the same JavaScript paths over and over seems horribly wasteful.


> recompile the same javascript paths over and over

The point of the person you're replying to is that they aren't the same paths when you take into account the other code and data in the rest of the application, which affects how even the code in the shared libraries is compiled.


They are mostly the same path, like if I select something based on an ID to add a function as an event handler, it is the same path, just with a different argument.

I don't know, I am outside of my area of confidence here. But in a world where the technological advances in computing have less to do with increasing computing capacity than reducing power consumption, whether on mobile, laptops or servers, the jquery library that needs recompiling on every execution feels like the non-recyclable plastic bag of computing.


> like if I select something based on an ID to add a function as an event handler

Nah, it's more complicated than that in reality (I work on dynamic languages and have a PhD in them). You just want to select based on an ID, but in order to make that operation fast, the JavaScript engine would normally need information about what you want to select, what possible values it could have, whether it's allocated on the heap or stored somewhere else, and a whole host of other things.

You could say 'well don't do those optimisations then' but then you just have slow unoptimised code and the point was to make this fast.


I feel like this is one of those things that I have been thinking to myself for a little while now: "Why hasn't someone already solved this problem?" In my mind there is no reason for my browser to download 26 copies of the same jQuery file.


Here is a proposal to use the hashes from <script> elements' Subresource Integrity attributes to cache JavaScript. There are some concerns about leaking users' browsing history based on resource load times.

https://hillbrad.github.io/sri-addressable-caching/sri-addre...


If [websites] use a common CDN then this can be avoided.

[edited, with thanks to delinka]


If the sites you're visiting use a common CDN then this can be avoided.


This was going to be my comment as well... I tend to prefer cdnjs for as much as possible, since it seems to be the most diverse CDN in terms of supported libraries.

Though I've been trending away from using most libraries separately and bundling... may need to think about breaking things out again. For the most part, however, most of my bundle sizes are pretty small. React in dev, Preact for prod is the biggest mix... checking bundle size before/after adding a module (some will add a huge dependency tree).

There's no substitute for being judicious... and imho, even if you have to look it up frequently, using the DOM is usually better than loading in jQuery... even if you center around Array.from(document.querySelectorAll('...')) as your main usage.


The order of scripts matters, so it's not really possible except for the first script (similar to C/C++ #include optimizations). Even then, you'd have to deal with the possibility of someone modifying something the scripts create.


Decentraleyes [1] does something like that, although it still fetches them from a common CDN. That said, I'm not sure how well it works: there are many, many different versions of every different library around, and when they're bundled, you don't get to see them separate anymore at all.

[1] https://decentraleyes.org/


Adding an entire new layer of complexity seems like a bad tradeoff. You'd break the debugger. You'd break the security model.


Simply hashing these won't work since the scripts may be minified by different tools. Also, there are lots of different versions of these libraries, so we're talking about a fairly large number of precompiled libraries.


You wouldn't be able to optimise the standard libraries very well without any information about the program calling them and the data they are using, though. I guess it would turn out to be a lot slower in practice.


Maybe ship popular libraries as web assembly?


No, web assembly does not support dynamic typing that is used by many functions in these libraries. Dynamic typing may not be fast, but it is useful.


Right, because wasm is there to solve all our performance problems, isn't it?

And how exactly is your jquery.wasm going to access the DOM? Or expose methods to your application, for that matter?

wasm is not a drop-in replacement for JS.


There are plans for WebAssembly modules to directly access the DOM: https://github.com/WebAssembly/gc/blob/master/proposals/gc/O...


I really don't know much about wasm.

I just wondered though, from a birds eye view, does using wasm effectively give you control of everything within the "frame" of your browser?

Can I get rid of the DOM (boilerplate aside), get rid of CSS?

For example HTMLCanvasElement is a flat surface. I can do what I want within it as a scene graph renderer.

Could I do that with wasm? Could I invent my own 2D entities (x, y, width, height)? My own 2D constraints and resizing, like Apple devices, etc.?


It's still in the same sandbox as JavaScript. You don't get any new APIs. There's only going to be speedup on CPU-intensive things.


> wasm is not a drop-in replacement for JS

It is not a "drop-in" replacement in any way, but I'm pretty sure it will eventually replace JS completely.


No, but if we're being honest, it's exactly what everyone wants it to be. A bytecode for the web.


Eventually the DOM might be replaced with some WebGL renderer.


Could be done, although it would have to be done carefully.


If ES6 used hashes to version libraries this might be possible. I still think you'd have too many versions to keep track of.

In other spaces the pendulum is swinging away from shared libraries since storage space and bandwidth are relatively cheap.

Java is often distributed as "fat jars" these days. iOS and Android apps bundle their dependencies the same way as far as I know. The whole serverless movement is a move away from shared code, for better or worse.


I'm thinking it's because they already have mounds of tech debt that would better address long term issues if they had the time to work on it. Take a look at the chromium bug tracker. It's overwhelming.


Interesting article--I'd love to see one just on GC.

I just downloaded the latest node.js sources and v8 still has a call to CollectAllAvailableGarbage in a loop of 2-7 passes. It does this if a much cheaper mark-and-sweep fails. Under production loads, that would occasionally happen. This led to pause-the-world-gc of 3600+ms with v8, which was terrible for our p99 latency.

The fix still feels weird -- we just commented out the fallback strategy and saw much tighter response time variance with no increased memory footprint (RSS).

I never submitted a patch though because although it was successful for our workload, I wasn't sure it was generally appropriate (exposed as a runtime flag) and I left the job before I could do a better job of running it all down.


I've had a similar, but different, issue on a server that ran a lot of one-off node scripts for things ranging from ETL to queue processing. I found that I wanted to force GC after every item, or every N items, because the memory could bloat a lot before GC would happen and pause for several seconds... or potentially starve out peer processes.

Fortunately, that was already in the box, though behind a runtime flag.
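Presumably the flag in question is --expose-gc, which exposes a global gc() you can call every N items; a rough sketch (the file name and item handling are made up):

    // node --expose-gc etl.js
    function handle(item) { /* ...process one queue item... */ }

    const items = Array.from({ length: 10000 }, (_, i) => ({ id: i }));
    items.forEach((item, i) => {
      handle(item);
      if ((i + 1) % 1000 === 0 && typeof global.gc === 'function') {
        global.gc();   // force a collection every 1000 items
      }
    });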


Is there a way to see what hidden class an object has? For instance, if an array of objects is parsed from json, were all the objects assigned the same hidden class? Alternately, can one obtain statistics about hidden class usage? Seems like this would be very helpful for real world apps, especially given the prevalence of data intensive Electron apps.

EDIT: https://www.npmjs.com/package/v8-natives haveSameMap seems to do exactly this!
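That package appears to be a thin wrapper over V8's natives syntax, so you can also poke at the underlying %HaveSameMap directly (results may vary by V8 version):

    // check-maps.js -- run with: node --allow-natives-syntax check-maps.js
    const a = JSON.parse('{"x": 1, "y": 2}');
    const b = JSON.parse('{"x": 3, "y": 4}');
    const c = JSON.parse('{"y": 4, "x": 3}');   // same keys, different order

    console.log(%HaveSameMap(a, b));   // expected: true (same hidden class)
    console.log(%HaveSameMap(a, c));   // likely false (different transition path)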


> Also, try to avoid pre-allocating large arrays. It’s better to grow as you go.

Is this really true? I've only heard the opposite (preallocate arrays whenever possible) and I know that preallocation was a significant performance improvement on older devices with older javascript engines.


Hm, it might have something to do with this: https://groups.google.com/d/msg/v8-users/Vov_e5GUq3o/eqRT8mA...

> For preallocated arrays, 100,000 just so happens to be the threshold where V8 decides to give you a slow (a.k.a. "dictionary mode") array.


tl;dr: No, this is outdated, and please don't try to hand-optimize.

This is outdated. It was true in 2013, but changed in 2014 (https://codereview.chromium.org/416403002/ ; specific change in elements.cc).

Two tangential cents/Unpopular Puffin: there's a lot of misinformation or half truths that are propagated about V8 in an effort to boil all their optimizations down to One True Set of Advice, in this post and others.

The thing is, in just about every case, the real answer is "it depends". That's why V8 is ~1,000,000 lines of code, not 1,000.

Please don't try to follow some rule blindly, much less derive. More often than not, when you try to hand-tune for the last bit of performance, you'll actually trigger something that was introduced to streamline the other [95% of] cases/JS devs' code, and then you've spent hours on making your code less readable with no benefit.

There is no "Try This One Crazy Trick To Make Your JavaScript Fast!!!" (or even ten!)

(And in this particular case of pre-allocation vs. growing as you go? Usually the difference is so tiny it doesn't matter, and in the other cases --- it depends.)

Disclaimer: I'm not on the V8 team but I'm on a personal basis with some of them, and these are my own thoughts cobbled from casual conversations.


No, this is not true. Static allocation is always faster than dynamic allocation _if_ you know you're going to use it.

I think what they're suggesting here is not to dynamically allocate large arrays - it's to allocate "as you go" (increase when you need to).

For example, bluebird is very allocation averse - but we do actually grow a few things as we go (like the queue) if we didn't allocate enough ahead of time.


Why don't you benchmark it yourself and let us know what you learn. Data > Rumor.


> Now, you would assume that for both p1 and p2 the same hidden classes and transitions would be used. Well, not really. For “p1”, first the property “a” will be added and then the property “b”. For “p2”, however, first “b” is being assigned, followed by “a”. Thus, “p1” and “p2” end up with different hidden classes as a result of the different transition paths. In such cases, it’s much better to initialize dynamic properties in the same order so that the hidden classes can be reused.

Does that mean an object with n properties takes up O(n^2) memory for the class definitions or O(n!) if the classes do not guarantee a property initialization order?


An object with N properties will typically require N hidden classes, forming a chain back to the empty class. However most hidden classes only require a constant amount of memory, so this is O(N) allocation size.

If you initialize the same properties but in a different order, you'll get a new list of N hidden classes. In the worst case you could indeed have O(N!) hidden classes, if you construct with every possible initialization order.
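To make that concrete, a hedged sketch using the p1/p2 naming from the quoted passage:

    const p1 = {};
    p1.a = 1;   // transition: {} -> {a}
    p1.b = 2;   // transition: {a} -> {a, b}

    const p2 = {};
    p2.b = 2;   // transition: {} -> {b}
    p2.a = 1;   // transition: {b} -> {b, a} -- a different hidden class than p1's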


Are you sure of that? If I recall correctly from the source an object with N properties typically requires 1 hidden class (for that object).

In order for it to require N, it needs to add properties very dynamically. Am I remembering this wrong?


You can see the description of how hidden classes work on the v8 wiki [1]. The short explanation is that the way v8 finds a hidden class is by following a chain of transitions starting at the empty class. Intermediate classes have to be kept around to retain the transitions.

That is, in code like:

    function Point(x, y) {
      this.x = x;
      this.y = y;
    }
v8 retains the hidden class for {x}, so it can find the hidden class for {x, y}.

[1] https://github.com/v8/v8/wiki/Design%20Elements#fast-propert...


For a more comprehensive reference on v8 internals, there is http://wingolog.org/tags/v8, which has been around for a long time. It is even directly referenced from https://developers.google.com/v8/under_the_hood.

I don't think there's anything in this post that wasn't already explained in this reference, except the fact that now there's Ignition and TurboFan, but that doesn't fundamentally change anything.


Does this also mean that in the function-that-returns-an-object vs. prototype debate the former wins, because object literals presumably require fewer class transitions?

let Ctor = function(a, b){ this.a = a; this.b = b; }

vs

let obj = (a,b) => ({ a:a, b:b });


Thanks for the post, great writing...



