dinfuehr's comments

dinfuehr · 2024-11-18T16:02:10 1731945730

> The GC has to now move every object remaining in the vector down a step to make the vector dense again

Is there something which forces you to compact everything here? Or could you do what most GCs do and track that free entry in a free list?

aapoalas · 2024-11-18T16:42:27 1731948147

I would have to keep a free list per vector, and there's a lot of those, and it would defeat the point of keeping data packed and temporally colocated: I want to ensure that data was created together stays together and only ever gets more compact as it grows older.

Filling in empty slots would mean that likely unrelated data comes and pollutes the cache for the old data :(

dinfuehr · 2024-11-19T10:09:46 1732010986

I see. Thanks for the answers btw!

I get your point but a few gaps here and there likely don't matter at all for performance. At least it's a lot better than making everything super compact all the time. Assuming you are splitting vectors at some points into chunks: In such a world you could choose to get rid of chunks with a lot of gaps and move the remaining entries into other chunks. At that point you really have a regular GC.

And the free list could be stored in the vector itself. E.g. if an entry is empty it would store the pointer to the next free entry. So all you need is a single head/tail index for each vector.

I also wonder how you handle pointers which could point into one of many vectors. E.g. a field could easily point either to an object or an array. Do you plan to pack this vector id into the 32-bit value? If so wouldn't there be a lot of dispatch like this as well:

if (index & VECTOR_ID_MASK == OBJECTS_VECTOR_ID) { return objects[index&VECTOR_INDEX_MASK]; } else { .. }

I hope it's clear what I mean with this.

aapoalas · 2024-11-19T21:34:37 1732052077

Thank you for the discussion!

A few gaps won't matter, and that to me speaks of a split between major and minor GC making sense. However, I'm not really sold on that meaning a free list making sense. For one, if I split the heap vectors into single value parts, then holding free slot data in any of them will become somewhat complicated. Hence at least for the foreseeable future I'm 100% in on the compacted heap vectors idea :) Time will tell if the aggressive compacting makes sense or not.

Our JavaScript Values are the full 8 bytes (yes, this is large and it pains me, but it does give us all integers on stack, most doubles on stack, and up to 7 bytes of string data on stack), so a field that can point to any kind of object stores a byte tag and a u32 index. I might pack this down to 1+3 bytes or so, at the cost of supporting smaller maximum number of objects in the engine. JS Value itself would still probably remain 8 bytes because of the stack data benefits.

There is indeed dynamic dispatching through match statements, though it generally happens at the specification method level. Eg. A specification method to get a property from an object will match on the tag and then dispatch to a concrete method with the index as parameter. The indexes are typed as well, so from this point on we statically know we're dealing with eg. an Array.

So there is dynamic dispatch yes, but we try to eliminate it at the earliest opportunity. We probably still have more of that than a traditional engine would have though: A traditional engine will keep the tag on the heap and there is some dynamic dispatch done based on that, but at least your data lookup isn't based on dispatch.

dinfuehr · 2024-11-18T15:56:42 1731945402

Would this mean that each shape/structure/map get its own vector for each field in order for this to work?

aapoalas · 2024-11-18T16:34:28 1731947668

That would be one way; it would offer the best theoretical memory layout for well-behaved programs. But, I expect it to be very painful to work with and to come with some performance penalties in mixed use cases due to the extra indirection required.

No, my thinking is that properties would be stored into tables based on their size class: All objects that have 4-7 properties are in the same table, and all of their first property would be in the same slice, second property in another etc.

munificent · 2024-11-18T22:41:25 1731969685

That sounds to me like you'll end up getting little benefit from ECS then. Let's say the JavaScript program is iterating over a hundred thousand instances of some Foo class which happens to have 6 properties. You'd ideally want good use of cache, but if your object vector that has all of the Foo objects also happens to have all sorts of instances of other types that have the same field count mixed in there, then you're going to spend a lot of time skipping over those unrelated objects and refreshing the cache.

I know that ECS is treated as a silver bullet by a lot of people, but my experience is that it really only works well when the data you're working with is statically typed so that you can actually partition into arrays where each array does represent a single meaningful type.

aapoalas · 2024-11-18T23:15:51 1731971751

That is definitely possible: If it turns out to not make sense then I can at least always go back go the current system where all properties of an object as stored in a single array.

It's not as ECS'ssy as one would hope but it's at least proven technology :D