Hacker News new | past | comments | ask | show | jobs | submit login

JS requires somewhat different intuition. A JS value may be a BigInt, String, Function, etc. so you need to have something that can reference any of these.

The usual technique is NaN-boxing, where doubles are stored directly and non-doubles are represented with a NaN. A NaN gives you 51 bits to store to store your data. Usually you have a union type, with some bits for the type tag and the rest for data.

A BigInt may require more than 51 bits. So the obvious representation is to reserve a new tag value, and then in the data bits store a pointer to the heap-allocated structure that you describe. This could be a single array or a chunked linked list.

A further change would be to store small BigInts inline in those 51 bits, e.g. by reserving another tag. This would save a heap allocation at the cost of more branching.




This particular engine, V8, doesn't use NaN-boxing, because of the memory costs on 32-bit systems of using 64-bit values for boxed pointers. Instead, we use pointer tagging, relying that with word alignment, we have 2 bits free at the bottom of pointers.

If the lsb is unset, then the remaining 31 bits represent a "small integer", a.k.a. a Smi (on 64-bit, the top 32 bits make the Smi, so that they can be read out of memory with a normal 32-bit access).

If the lsb is set, then the remaining bits are the pointer, where the bottom bits have to be masked off to actually dereference it (practically, most accesses are offset-based field accesses anyway, so we just subtract 1 from the field offset). The other bit, the 2nd lsb, has recently started being used as a weak pointer marker.

So, the type tag isn't stored in bits of the pointer, but rather in the object being pointed to, not dissimilar to a vtable. This has the side-advantage that we can swap the type of an object in-place, without having to update pointers to it. BigInts are immutable, so they are stored as a heap object which holds the object tag (actually another pointer to a "map", that's a story for another time), a bitfield that stores sign and length, and then data inline (very similar to how we store strings).

(pointer|0x1) -> [BigIntMap, bitfield (length+sign), digit0, digit1, ...]


Thanks for the explanation! If it functions similarly to strings, which are also immutable, when you do any type of operation it causes a reallocation?


I believe so, in this baseline implementation. In fact, that's also true of operations on doubles (non-Smi) in unoptimized code, which are stored as "heap numbers". Avoiding these allocations is one of the things an optimizing compiler can do (and does do, for doubles).

That said, for short-lived BigInts, the allocations may not be as expensive as you might think, since the garbage collector is generational and short-lived objects won't leave the nursery.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: