Array.shift Optimizations in Firefox's JavaScript Engine (2020) (lannonbr.com)
54 points by melvinroest 9 months ago | 42 comments



While it's clever, it is also counterintuitive. I would expect an array to be O(n) when inserting at the beginning. If I wanted something fast at both ends, I'd use a double-ended queue [1]. This, as the author shows, leads to unexpected behavior.

[1] https://en.wikipedia.org/wiki/Double-ended_queue


The problem is, JavaScript doesn't have a rich collections library built in, so the collections that do exist are pushed in all directions. The article you linked actually highlights JavaScript's array as the double-ended queue implementation:

> Javascript's Array prototype & Perl's arrays have native support for both removing (shift and pop) and adding (unshift and push) elements on both ends.

Unfortunately, trying to predict O-notation bounds in JavaScript is a fool's errand. The spec usually (always?) doesn't mention expected bounds, so it can depend on the runtime and the heuristics the runtime uses to decide what underlying data structure to use. I was surprised to learn that even implementing isEmpty(ob) in Javascript is an O(n) operation on most runtimes.


As far as the spec is concerned, JS arrays are actually JS objects. The keys of JS arrays are technically stringified numbers. Because an array is essentially a hash table, there is no guarantee that it is stored contiguously.

It's not commonly done, but you could add something like `"myNonNumberKey"` to your array and things would actually work mostly as expected, because the methods in question only operate on keys that are canonical stringified numbers (keys like "01" aren't counted, and the length property doesn't include any of these non-index keys). I'm not completely sure, but I believe these extra properties prevent some optimizations, so use at your own risk.
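
To illustrate (a quick sketch; the property names are made up):

    const arr = [1, 2, 3];
    arr.myNonNumberKey = "hello"; // a plain property, not an array index
    arr["01"] = "padded";         // "01" is not a canonical index string, so it's a plain property too

    console.log(arr.length);          // 3 (non-index keys don't count toward length)
    console.log(Object.keys(arr));    // ["0", "1", "2", "myNonNumberKey", "01"]
    console.log(JSON.stringify(arr)); // "[1,2,3]" (serialization only sees the indexed elements)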

When your array has holes, decent performance simply isn't possible, so specifying hard big-O guarantees would be mostly meaningless because the required bounds would have to be so loose. This is where the JIT magic comes into play. Engines don't actually stringify your keys in the normal case, but if they detect that something requires those string keys (e.g., calling `Object.keys()` on your array), they can work around it at the expense of some performance.

If you use the third callback parameter of something like `.map()`, engines can handle it, but there are edge cases (e.g., what happens when you are halfway through the array and then unshift or splice out the first item?) that will slow things down.
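
A small sketch of that edge case (illustrative values only): `.map()` fixes its iteration range up front from the original length, but it reads each element at visit time, so an unshift halfway through shifts what the later callbacks see.

    const letters = ["a", "b", "c"];
    const result = letters.map((value, index, array) => {
      // the third parameter is the array being traversed
      if (index === 1) array.unshift("x"); // mutate halfway through
      return value;
    });
    console.log(result);  // ["a", "b", "b"] (index 2 now holds the old "b")
    console.log(letters); // ["x", "a", "b", "c"]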

They could make stronger guarantees about arrays in the spec, but that creates a couple of headaches. They would basically have to enshrine a second array specification. That specification would have all kinds of weird edge cases when interacting with the current array specification (like that `.unshift()` during a `.map()` issue). Those things are currently offloaded to implementations with the handwaving "you may do things better as long as the user can't tell the difference."

The other issue is that small implementations (e.g., QuickJS or Duktape), which target hardware with memory measured in kilobytes, don't have the space to guarantee these advanced features. They are constrained to pick just one, slow implementation (maybe with a handful of the most important optimizations that are easy to implement) so they fit on the MCUs they were designed for.

Finally, there is a record/tuple proposal that could allow much stricter standards and better performance guarantees for iterating a long tuple, which would behave like an immutable array.


> The spec usually (always?) doesn't mention expected bounds

Actually, I was just reading the other day that there are a few places where the spec does explicitly state expectations, such as the performance characteristics for Map/Set/WeakMap/WeakSet:

> Maps must be implemented using either hash tables or other mechanisms that, on average, provide access times that are sublinear on the number of elements in the collection. The data structure used in this specification is only intended to describe the required observable semantics of Maps. It is not intended to be a viable implementation model.

https://tc39.es/ecma262/multipage/keyed-collections.html#sec...


Bloom filter and gpu


> I was surprised to learn that even implementing isEmpty(ob) in Javascript is an O(n) operation on most runtimes.

That is surprising. Where `n` is the number of members, yes?

Surely that kind of operation should either return false immediately if there are no members, or return true in constant time if there is at least one member?

...although, do you have to iterate through the inherited members and check for `hasOwnProperty()`? Is it maybe O(n) where `n` is the number of enumerable inherited properties?


You would think, right?

The JS runtime itself must have a constant time way to know this, but it’s not exposed anywhere. The best you can do is iterate over the keys and bail the first time you get a key. But (at least in V8) iterating over the keys does some work up front that is O(n), even if you bail at the first key!
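
For context, the usual ways to write such a check (isEmpty here is a hypothetical helper, not a built-in):

    // Allocates an array of every own enumerable key just to test for one: O(n).
    const isEmpty = (obj) => Object.keys(obj).length === 0;

    // Bails at the first own key, but the engine may still do O(n) setup work
    // before the first iteration of for-in.
    function isEmptyLoop(obj) {
      for (const key in obj) {
        if (Object.prototype.hasOwnProperty.call(obj, key)) return false;
      }
      return true;
    }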


I think the issue is that the arrays are allowed to have holes:

    test = new Array(2)
    -> Array [ <2 empty slots> ]

    test[1] = true
    -> true

    test.length
    -> 2

    test[0]
    -> undefined

    test[3]
    -> undefined

    test.some(x => x)
    -> true

    delete test[1]
    -> true

    test.some(x => x)
    -> false


Isn't this kind of negated by the fact that `delete arr[n]`, working the way it does, has meant JavaScript arrays have never been "real" arrays at any point in the language's history?


I don’t think so. There’s a little bit more nuance to it (because it involves array “holes”—almost a third kind of null), but delete on an array index is semantically closer to assigning undefined to the index. It doesn’t actually delete the presence of the index.


> It doesn’t actually delete the presence of the index.

It actually removes the index.

    > Object.keys([9,,9])
    [ "0", "2" ]
The value of the `length` property does nothing to imply the presence or lack of any keys. This is covered in the "exotic objects" behavior section in chapter 10 of the ECMAScript spec.


It removes the property key. It does not remove the index. You can see the distinction in the iterator produced by the array prototype’s keys method which, per ECMAScript spec chapter 23, yields the array’s indexes. And they are, in fact, determined by the array’s length.

You can delete every property corresponding to an array’s indexes, and you’ll get the same result.
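
A short sketch of that distinction:

    const a = [9, 9, 9];
    delete a[1];

    console.log(Object.keys(a)); // ["0", "2"]        (the property key is gone)
    console.log([...a.keys()]);  // [0, 1, 2]         (the index iterator is driven by length)
    console.log([...a]);         // [9, undefined, 9] (the hole reads as undefined)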


> It doesn’t actually delete the presence of the index.

Doesn't it? Try:

    a = [ 6, 7, 8, 9, 10 ]
    delete a[2]
    for (x in a) { console.log(x) } // logs "0", "1", "3", "4" (the own property "2" is gone)


It doesn’t. It deletes the property “2”, but the index 2 remains present as a “hole” (like I mentioned before, almost a third kind of null). That removes it from iteration over the array object’s properties (i.e. your for-in loop), but doesn’t remove the index from the array or its iterable value representation: https://www.typescriptlang.org/play/?#code/MYewdgziA2CmB0w4E...

If you click run there and look at the logged output, you’ll see:

1. The length remains the same after the delete statement

2. The items after index 2 keep their index

3. The for-of loop continues to iterate over index 2 (which now produces an undefined value on read)


> the index 2 remains present as a “hole” (like I mentioned before, almost a third kind of null).

It's an interesting model, and I think it definitely has its uses when reasoning about JS arrays. I'm not yet convinced it's the best model, but it's something I'm going to have to let percolate through my brain for a while to weigh it up properly. Thanks.


I think you're misinterpreting how for-of works. For-of will iterate over deleted keys in the exact same way that it iterates over keys that never existed.

  let a = [];
  a[7] = 7;
  for (let v of a) console.log(v);
  > undefined, undefined, undefined, undefined, undefined, undefined, undefined, 7


> it should be able to run on any JS runtime with comparable performance

How long until every app ships its own Wasm JS runtime to ensure consistent performance characteristics?


If you are at a stage where this matters, you will not be using Array.shift, or even Array for that matter. JavaScript has lower-level facilities for this.


Do you mean Uint8Array, ArrayBuffer, and so on? Or something else?


I'm surprised to read that for fewer than 50K elements it's likely to be O(1). It's a clever trick to just move the start address one element further :)


It's actually not exactly 50K, and likely varies across Chrome/V8 versions, operating system and/or machine architecture. On my browser the threshold is around 22500 elements. For a quick test, see [1].

The difference in V8 appears to be based on whether or not the array is stored in "large object space" [2], which seems to be used when the object is somewhere between 64 KB and 128 KB [3] (once again this can depend on lots of factors). I wonder how many other optimizations depend on that?

[1]: http://www.lonniebest.com/BadShiftPerformance/
[2]: https://issues.chromium.org/issues/42202676
[3]: https://github.com/danbev/learning-v8/blob/master/notes/heap...
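
A rough way to probe the threshold yourself (just a sketch; numbers vary by engine, version, and machine):

    // Time a fixed number of shifts at different array sizes.
    function timeShifts(size, shifts = 10000) {
      const arr = Array.from({ length: size }, (_, i) => i);
      const start = performance.now();
      for (let i = 0; i < shifts; i++) arr.shift();
      return performance.now() - start;
    }

    for (const size of [10000, 50000, 100000, 500000]) {
      console.log(size, timeShifts(size).toFixed(1) + " ms");
    }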


It has been a long time since I have read V8 source code, but I remember it depended on whether the array is holey or not, in which case it would switch back to an object-based array instead of a contiguous array. (There were other conditions, but that's the notable one I remember.)


Reading the Firefox source code, I saw something similar with a hole variable. I haven't spent enough time on it to know what happens, though.

A good way to search the Firefox source code is to look for uniquely named JS functions and take it from there. I searched for reduceRight and found shift that way.


Some more technical info on how browsers nowadays achieve O(1) behavior for Array.shift() [1].

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1348772


Disclaimer: when I worked on Internet Explorer in its twilight years, I built a fork of BingBot that surveyed JS API usage and delivered a report to a Partner-level SE whom I was in awe of and shared a team room with.

In my experience, Array.prototype.shift() had limited applications because it mutates the source array. I gathered it exists because, at the time it was spec'd in the late 1990s, the other cool kids, bash and perl, both had shift for their args and JS didn't want to feel left out.

How different would the world be today if Array.prototype.shift had instead frozen the array and returned a reference to a +1 slice of it? There was a fantastic opportunity to introduce FP and immutability to a whole generation of software people, which we (as an industry) squandered to our detriment. Ultra-new languages like Go and Zig still default to in-place mutable data and it's maddening!
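
For what it's worth, a non-mutating equivalent is easy to express today (a sketch; `shifted` is a hypothetical helper name):

    // Returns the first element plus a fresh array of the rest;
    // the original array is left untouched.
    const shifted = ([first, ...rest]) => [first, rest];

    const original = [1, 2, 3];
    const [head, tail] = shifted(original);
    console.log(head, tail, original); // 1 [2, 3] [1, 2, 3]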

What hath shell scripts wrought?


Do you think Array.prototype.push() also has limited applications? It too mutates the source array, but must be the most commonly used Array method there is. And the two combined make for a lovely FIFO.

Anyone is certainly free to use Array.prototype.slice() if they want a non-mutating copy.


What are you doing with arrays that mutating them is maddening and has limited applications? Is this some kind of a cult?


> Is this some kind of a cult?

Yes, it’s called Functional Programming.


Hey now. In that cult (of which I’m a member), mutation is definitely maddening but its application is by no means limited (which is a non-small factor in it being maddening).


I'd take mutability for an array as a given. If I wanted functional-style data structures, I'd use them. Personally I see zero reason for immutability of an array; imagine if every add/push had to create a new instance of said array.


I would expect arrays to be immutable and a list type not to be.


Sounds like it would go about as well as making me appreciate Shakespeare by forcing me to read it. It didn't work at all.

Do you enjoy reasoning about stale references in React? Now you can experience this joy everywhere with the power of immutability.


I wouldn't say Go is exactly ultra-new.


It is important to note that such an optimization, when naively implemented, will keep the array's full capacity. For example, calling `Array.shift` 999,000 times on an array with 1,000,000 elements will reduce it to 1,000 elements, but its capacity will be at least 1,000 times that. A reasonable implementation would shrink the capacity when some threshold is reached [1].

[1] This threshold should be much lower than the growth threshold in order to prevent unnecessary copying when the number of elements hovers around the threshold.


I don't think it's so clear-cut that the capacity should be decreased in that case. Not only does it destroy the theoretical complexity of algorithms using the array, but in practice there can be cases where an array grows to 1,000,000 elements, gets consumed down to 0 elements, and is then grown to 1,000,000 again for the next round of operations. Converting the array to be backed by a smaller allocation is not free.

Indeed, which case is even more common: large arrays getting smaller and then kept that way, or large arrays getting smaller to grow again later?

I suppose some more complicated metric (e.g. keeping track of reallocations per array) could help with the most common cases.


You have a good point. Unless JS exposes some API to directly control the capacity, we have to guess which use cases are more common and which use cases are fine to ignore. On the other hand, though, JS's lack of capacity control means that developers are not conscious of memory usage after all, and JS engines have to do something about that anyway. Indeed, if you look at the actual bug [1], one commenter mentions exactly what I have said:

> This is a great idea! Both shift() and pop() can treat the elements array as a dequeue. A "shrink capacity to 1/2 when usage is <1/3" would give us O(1) amortized performance for shift() and pop(), while avoid edge cases where we keep shrinking/growing for alternating add/remove ops around the shrink/grow boundary.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1348772
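
As a rough illustration of that policy outside the engine, here's a sketch of a queue wrapper (hypothetical `ShiftQueue`) that advances a head index and only compacts once most of the backing array is dead; it compacts fully rather than to half, but the amortized argument is the same:

    class ShiftQueue {
      constructor() {
        this.items = [];
        this.head = 0; // index of the logical front
      }
      push(x) {
        this.items.push(x);
      }
      shift() {
        if (this.head >= this.items.length) return undefined;
        const x = this.items[this.head++];
        // Compact when fewer than a third of the backing slots are still live.
        if (this.head * 3 > this.items.length * 2) {
          this.items = this.items.slice(this.head);
          this.head = 0;
        }
        return x;
      }
      get length() {
        return this.items.length - this.head;
      }
    }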


I don't quite understand how it avoids the case of code that pushes lots of stuff into a vector, then shifts it all out, and then does this again, which, to me, doesn't seem like an edge case at all.

Do any languages automatically reduce vector size when elements are removed from it? If this is the first one, then arguably it would be a bit surprising behaviour.


Decreasing capacity wouldn't affect the theoretical complexity of algorithms using it; it's the same concept as increasing capacity during push still being amortized O(1).


I don't think it works that way when a container can grow and shrink, because an algorithm can trigger those operations any number of times, not just O(log(n)) times.


I'm not sure where you're getting O(log n) from here. If it couldn't work that way when a container can grow and shrink, then arrays wouldn't be able to have both amortized O(1) push and pop, as they do in most relevant implementations.

As long as the expensive O(n) resize operation happens at some frequency relative to the size of the container rather than after every constant-size difference (e.g. capacity doubles when length == capacity and halves when length drops below capacity/3), then it will amortize out to O(1).

There's a proof available in section 2.1.2 of this textbook: https://opendatastructures.org/ods-java/2_1_ArrayStack_Fast_...
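
A quick way to see the amortized argument concretely is to simulate a doubling backing array and count element copies (a toy simulation, not engine code):

    // Count how many element copies a doubling growth policy performs across n pushes.
    function copiesForPushes(n) {
      let capacity = 1, size = 0, copies = 0;
      for (let i = 0; i < n; i++) {
        if (size === capacity) {
          copies += size; // reallocate: copy every existing element over
          capacity *= 2;
        }
        size++;
      }
      return copies;
    }

    console.log(copiesForPushes(1000000)); // 1048575, i.e. fewer than 2n copies in total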


I disagree entirely on the 'reasonable' implementation part. If something was that big, the memory was already allocated, and it's not unlikely to grow again. An instance (likely used as a queue) shrinking dramatically partway through its lifespan and then staying that way, so that the memory could actually be put to other use, is a rare occurrence in practice.


Some other languages also perform this optimization, for example Ruby and Crystal.



