
Adding BigInts to V8 - ingve
https://v8project.blogspot.com/2018/05/bigint.html
======
saagarjha
> That’s exactly how V8 multiplies BigInts: one digit at a time, adding up the
> intermediate results. The algorithm works just as well for 0 to 9 as it does
> for a BigInt’s much bigger digits.

Why not use a faster algorithm, like the Karatsuba algorithm[1] or the Toom-
Cook[2] algorithm?

[1]
[https://en.wikipedia.org/wiki/Karatsuba_algorithm](https://en.wikipedia.org/wiki/Karatsuba_algorithm)

[2]
[https://en.wikipedia.org/wiki/Toom–Cook_multiplication](https://en.wikipedia.org/wiki/Toom–Cook_multiplication)

~~~
tzs
Speaking of Karatsuba, in 1993 I wrote a C++ big integer package and used
Karatsuba. I had some questions about how to use it when the two numbers being
multiplied are not the same length. I did some experiments, and noticed hints
at some patterns in the data I was seeing. I asked about this on sci.math, but
as far as I know no one ever gave an explanation.

If anyone is curious and wants to figure this out, here's my old sci.math post
[1].

[1]
[https://groups.google.com/d/msg/sci.math/MX3MLCQ0zzA/XTqUoZk...](https://groups.google.com/d/msg/sci.math/MX3MLCQ0zzA/XTqUoZk2im0J)

~~~
delinka
"...the two numbers being multiplied are not the same length."

Do you mean as counted in bits? If so, prepend zeroes until they are the same
length. Does this negatively affect the algorithm's results?

EDIT: When I say 'prepend' I mean on the MSB end. In typical arithmetic on
paper, that'd be adding leading zeroes (prepending) onto the numbers.

~~~
taejo
You can do that - it's the "textbook" solution - but you can probably do
better by not making your inputs larger: in the worst case you're going from
multiplying N digits by 1 digit to N digits by N digits.
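
To make the trade-off concrete, here is a sketch (not V8's code) of Karatsuba on JS BigInts that splits at a bit position instead of zero-padding the inputs to equal length; the 2^64 base-case threshold is an arbitrary choice for illustration:

```javascript
// Number of bits in a nonnegative BigInt.
function bitLength(n) {
  return n.toString(2).length;
}

function karatsuba(x, y) {
  // Small enough: fall back to the engine's built-in multiply.
  if (x < 2n ** 64n || y < 2n ** 64n) return x * y;
  // Split both numbers at (roughly) half the larger bit length.
  const m = BigInt(Math.max(bitLength(x), bitLength(y)) >> 1);
  const mask = (1n << m) - 1n;
  const xh = x >> m, xl = x & mask;
  const yh = y >> m, yl = y & mask;
  const a = karatsuba(xh, yh);                   // high parts
  const b = karatsuba(xl, yl);                   // low parts
  const c = karatsuba(xh + xl, yh + yl) - a - b; // cross terms, one multiply saved
  return (a << (2n * m)) + (c << m) + b;
}
```

Note that when one input is a single digit, the base case fires immediately, which is exactly the "don't make your inputs larger" point above.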

------
ScottBurson
> we allocate an object in memory. We make it large enough to hold all the
> BigInt’s bits, in a series of chunks, which we call “digits”

In the Lisp world, the halfwords or words or whatever that a bignum's bits are
divided into are called "bigits". At least, this is the term I heard around
MIT in the late 1970s. I always thought it was cute.

Anyway, I'm glad to see this. I think all garbage-collected languages should
provide arbitrary-precision integers as the default integer type, as Lisp has
for decades. This certainly goes for interpreted languages like JavaScript,
Python (which did adopt this policy at some point), Ruby, and even, I would
argue, Java and C#. This, for example, is to me not a bug in the algorithm,
but in the _language_ , as it wouldn't happen in Common Lisp:
[https://research.googleblog.com/2006/06/extra-extra-read-
all...](https://research.googleblog.com/2006/06/extra-extra-read-all-about-it-
nearly.html)

(Bounded integer types are available in Common Lisp, but it would be strange
to use them for the bounds in a binary search algorithm, whose runtime is
almost certain to be dominated by the comparison function. In any case their
use would require a conscious choice on the part of the programmer.)

The sad thing about the way that Java and now JavaScript implement BigInts is
that they're a separate type from ordinary integers, rather than having a
single type that automatically changes between an immediate representation
("fixnums"), for integers that fit into a machine word less a few tag bits,
and an allocated representation ("bignums") for larger integers. The latter
provides the semantic benefits of arbitrary-precision integers at a much
smaller runtime cost than if they're all allocated.

~~~
sddfd
I agree that the approach "everything is a bigint", plus optimizations where a
fixed-width int suffices, would be nice from a semantic perspective.

However, JavaScript basically has the "everything is a floating-point number"
approach, and bigints would be hard to fit into that (do you want bigfloat as
well? do you want to reproduce float arithmetic imprecision, or be precise?
etc.)

That said, bigint could internally use just one fixed-width number when
possible as an optimization, although the current implementation doesn't seem
to do this.

------
sjrd
About 6 weeks ago I implemented support in the Scala.js compiler for emitting
Longs (64-bit signed integers) as BigInts. [1] Unfortunately, without specific
optimizations that are not part of the baseline currently implemented in V8,
this implementation is _very_ slow: up to 60x slower than our production
implementation. [2] I'm looking forward to seeing small BigInts optimized as
native 64-bit values!

[1] [https://github.com/scala-js/scala-
js/pull/3286/commits/deec9...](https://github.com/scala-js/scala-
js/pull/3286/commits/deec9719033f4ce21331eeca864182f17898831b)

[2] [https://github.com/tc39/proposal-
bigint/issues/117](https://github.com/tc39/proposal-bigint/issues/117)

~~~
Rapzid
I hope that after they collect feedback, Int64 will prove a significant enough
use case to prioritize optimization.

------
tixzdk
This is also available in Node 10 behind the `--harmony-bigint` flag. I've
already made two modules with [https://github.com/emilbayes/secure-random-
uniform#bigint-su...](https://github.com/emilbayes/secure-random-
uniform#bigint-support-experimental) and
[https://github.com/emilbayes/biguintle](https://github.com/emilbayes/biguintle).
Lots of fun!

~~~
apaprocki
(Keep in mind that Node 10 will update to V8 6.8 before it becomes LTS, so it
will get BigInt without any flags.)

------
spullara
Not as useful as it could be, since it has special syntax that isn't supported
in JSON, where the lack of 64-bit ints is one of the bigger issues.

[https://webapplog.com/decreasing-64-bit-tweet-id-in-
javascri...](https://webapplog.com/decreasing-64-bit-tweet-id-in-javascript/)

The JSON use case was even one of the motivations behind BigInt in the first
place:

[https://twitter.com/sampullara/status/981634718964707328](https://twitter.com/sampullara/status/981634718964707328)

~~~
daxterspeed
While there are a few gotchas, rolling your own JSON parser isn't necessarily
a ton of work.

Or perhaps a proposal could be written to give numbers emitted by JSON.parse a
"raw" property, so they could be decoded using BigInt?

~~~
elmigranto
No need for a custom parser. Right now, you can use a custom format with
reviver and replacer functions (the 2nd argument to JSON.parse / JSON.stringify,
respectively). Say, have them serialize BigInts as strings matching
`^bigint[\d]+$` and revive them appropriately.
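
A sketch of this suggestion (the `bigint` string prefix is just an agreed-upon convention, nothing standard):

```javascript
// Replacer: turn BigInts into tagged strings before serialization.
const replacer = (key, value) =>
  typeof value === 'bigint' ? 'bigint' + value.toString() : value;

// Reviver: turn tagged strings back into BigInts on parse.
const reviver = (key, value) =>
  typeof value === 'string' && /^bigint-?\d+$/.test(value)
    ? BigInt(value.slice(6))
    : value;

const json = JSON.stringify({ id: 1234567890123456789n }, replacer);
// '{"id":"bigint1234567890123456789"}'
const obj = JSON.parse(json, reviver);
// obj.id is the BigInt 1234567890123456789n again
```

This round-trips through perfectly ordinary JSON, at the cost of both sides having to agree on the tag format.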

~~~
daurnimator
But the point is to support large numbers with the traditional syntax.

e.g. so that {"foo": 1234567890123456789012345678901234567890} doesn't lose
precision.

~~~
elmigranto
Given that `number + bigint` throws in JS, this exact thing will never happen,
imo; otherwise you couldn't differentiate between the types.

Maybe something like `{"x": 10n}`, but at that point you might as well use a
reviver, especially if you've already agreed on a format and use one to parse
out class instances vs. plain objects.

------
epistemex
This is a welcome feature.

I think this is mostly useful for WASM, but also for storing/reading/writing
binary buffers: handling file formats that store (u)int64s (such as, f.ex.,
CAF on the Mac), binary protocol responses from environments such as Java/.NET
that already support big integers (and of course, eventually Node), finance
calculations... nice.

Just curious: how will you deal with things like triple-equals comparison
(`0x1n === 0x1`?) internally, from the perspective of performance?

> BigInts [..] are always signed (in ref. to the >>> operator)

Disclaimer: I haven't taken a deep dive into the BigInt spec yet, but things
like this come to mind (say you want to shift and mask using unsigned 64-bit
values):

    0xffffffffffffffffn >>  24 => 0xffffffffffffffffn (hmm)
    0xffffffffffffffffn >>> 24 => 0x000000ffffffffffn (but won't be supported)

Will this be solvable with methods or will I need to instead manually mask off
the signed remainder?

~~~
venning
There's a related article linked in OP that answers your questions:
[https://developers.google.com/web/updates/2018/05/bigint](https://developers.google.com/web/updates/2018/05/bigint)

When mixing Number and BigInt, abstract equality (==) works fine, but strict
equality (===) is always false, since they are _two different primitive
types_.

Most bitwise operators are allowed, so long as all operands are the same type
(Number or BigInt). The only exception is zero-fill right shift (>>>) since
that doesn't make any sense with respect to BigInts.

From my console (node 10 with --harmony-bigint):

      > 1n == 1
      true
      > 1n === 1
      false
      > 12345n >> 1
      TypeError: Cannot mix BigInt and other types, use explicit conversions
      > 12345n >> 1n
      6172n
      > 12345n >>> 1n
      TypeError: BigInts have no unsigned right shift, use >> instead

~~~
epistemex
> since they are two different primitive types.

Yes, I'm aware of this, but I was asking more in the realm of internal
performance/optimization (as I sort of point out, and not necessarily because
of the comparer). F.ex. they are in all _practicality_ the same integer, and
in other cases, IIRC (please correct me if I'm wrong), V8 optimizes a native
Number that could easily be stored internally as IEEE-754 into an actual
internal _integer_ when you use bit-ops on it (indicating to the optimizer
that you're handling integers), and so forth.

In this case I was curious whether this would be optimized internally so that
these comparisons could be moved directly into CPU registers as integers
(assuming a 64-bit arch) and compared there, for example. But the future will
tell; I'm equally aware that optimization is not the first priority at this
stage (it's just me being excited to see this manifest). It's in any case a
welcome and useful feature, optimized or not.

apaprocki linked (thanks!) to a spec method for unsigned right-shift (1.1.11
BigInt::unsignedRightShift).
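
On the unsigned-shift question above: the proposal also includes `BigInt.asUintN`, which reinterprets the low N bits of a BigInt as an unsigned value, so an unsigned 64-bit right shift can be sketched as a small helper (`ushr64` is just an illustrative name):

```javascript
// BigInt.asUintN(64, x) maps x onto [0, 2^64), i.e. reinterprets the low
// 64 bits as unsigned; shifting the result emulates a 64-bit >>>.
const ushr64 = (x, n) => BigInt.asUintN(64, x) >> n;

BigInt.asUintN(64, -1n);  // 18446744073709551615n (0xffffffffffffffffn)
ushr64(-1n, 24n);         // 1099511627775n (0x000000ffffffffffn)
```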

------
elmigranto
I wish we also had regular word-sized ints instead of everything being double
precision with all kinds of weird edge cases.

------
westoque
I've been a JavaScript developer for years and it's a surprise to me that
there are a `Number.MAX_SAFE_INTEGER` and a `Number.MIN_SAFE_INTEGER`.

It seems I've never had a use case where I exceeded the maximum or minimum
safe integer. I figure a good number of JavaScript developers have had the
same experience.

But good article nonetheless.

~~~
jsheard
Anyone who's used the Twitter API from Javascript has probably encountered the
integer limit, because Tweet IDs are 64-bit values that often exceed
MAX_SAFE_INTEGER.

Their JSON API works around that limitation by returning IDs as both "id" (a
number that may not be representable in Javascript) and "id_str" (the same
number in JS-friendly string form).
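
An illustration of the workaround jsheard describes (the ID below is made up, not a real Tweet ID):

```javascript
// A JSON payload in the shape Twitter's API returns.
const raw = '{"id": 1219271768292495361, "id_str": "1219271768292495361"}';
const tweet = JSON.parse(raw);

Number.isSafeInteger(tweet.id);    // false: above 2^53, the double got rounded
String(tweet.id) === tweet.id_str; // false: precision was lost in "id"
BigInt(tweet.id_str);              // 1219271768292495361n, exact
```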

~~~
BinaryIdiot
Wow, why would they ever send "id" as a number? Seems like a terrible idea
when it doesn't fit in the medium's spec (JSON).

~~~
dragonwriter
The JSON spec doesn't restrict the range of numbers [0], though it does note
that, because different platforms using JSON have different limitations and
IEEE 754 doubles are widely supported, values that aren't representable as
IEEE 754 doubles may have interoperability issues.

[0]
[https://tools.ietf.org/html/rfc8259#section-6](https://tools.ietf.org/html/rfc8259#section-6)

~~~
xtian
Sure, but why would you _ever_ return an ID as a number? IDs are explicitly a
data-type where arithmetic operations or other numeric transformations will
not yield meaningful results.

String is the best choice in JSON but ultimately only somewhat less wrong
(since string transformations won’t yield meaningful results for an ID
either). That’s why GraphQL defines a discrete ID scalar.

~~~
bonesss
I blame RDBMS-centric thinking. Things that look like numbers will perform
well in an index, and we get free ID generation, so... why think about hard
things, just throw it in there.

In terms of _identity_ an ID is never a number, always a unique value. It
should be its own type.

------
johnhenry
As someone who likes to play around with math, it's "fun" to be able to
implement functions that output large numbers, such as tetration [0], which
can be written as:

    const tetrate = (a, n) => n === 0n ? 1n : a ** tetrate(a, n - 1n);

While I expected this to take a long time to run for sufficiently large inputs
(on my machine tetrate(7n, 3n) takes about 40 seconds to produce ~3.8 * 10 ^
695974), some inputs immediately throw a "Maximum BigInt size exceeded" error.
This isn't surprising, as there has to be _some_ practical limit, but I wonder
if anyone knows how V8 determines it? Is it based on available RAM? Do (will)
other browsers implement something similar?

[0]
[https://en.wikipedia.org/wiki/Tetration](https://en.wikipedia.org/wiki/Tetration)

~~~
ScottBurson
I tried this in Common Lisp:

      (defun tetrate (a n)
        (if (= n 0) 1
          (expt a (tetrate a (1- n)))))
      (compile 'tetrate)
      (time (integer-length (tetrate 7 3)))

I used 'integer-length' to check that a number of the correct magnitude was
produced without actually having 700kB of digits dumped into my REPL buffer.
On a new MacBook Pro running SBCL, this took, ah, 624 _milli_ seconds. So
there's some room for optimization of the JavaScript implementation :-)

------
mratsim
I would be very interested in the test suite you used.

I've finished my own bigint implementation in the Nim language:

- stack-allocated and power-of-2 sizes only (uint256, uint512, uint1024, etc.),

- using recursive types (uint128 = 2x uint64, uint256 = 2x uint128).

It is already quite optimized (using Burnikel and Ziegler recursive division,
for example), but I want to make sure it works correctly.

[https://github.com/status-im/nim-stint](https://github.com/status-im/nim-
stint)

~~~
judofyr
Interesting approach. If I understand your library correctly, you need to
choose a bit size at compile time? I thought you often want unlimited
precision (i.e. possibly heap-allocated) when using bigints?

~~~
mratsim
Indeed, in my case I didn't need arbitrary precision, just extensions of
uint64 to uint128, uint256, uint512, ... Fixed sizes are perfectly fine for a
variety of use cases; they can be stack-allocated and are much faster due to
the specialization.

------
hossbeast
It would be cool to post some performance results from loops of common
operations, comparing the various types and various bigint sizes.

------
bufferoverflow

    big_array[0] = 456;
    TypeError: Cannot convert 456 to a BigInt

That's very un-javascript-like.
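
(The error above presumably comes from a BigInt64Array or BigUint64Array, whose elements can only be set from BigInts, never from Numbers:)

```javascript
const big_array = new BigInt64Array(4);

big_array[0] = 456n;   // ok: BigInt value
// big_array[1] = 456; // throws TypeError: Cannot convert 456 to a BigInt
big_array[0] + 1n;     // 457n
```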

------
epberry
Part of me feels like a hidden motivation for doing this was so Google could
continue to ask their “implement a BigInt class” interview question :). Cool
write up though - I enjoyed it.

------
leshow
A friend and I were wondering how a BigInt is represented in memory. It looks
like it may be a bit for the sign, a byte for the type, a word for the count,
and an array of words for the number? My question was whether the whole
struct, along with the array, is stored in stack memory (meaning it would all
have to be moved when resized) or whether the array of words is behind a
pointer. Am I way off base? Does anyone know the answer?

~~~
ridiculous_fish
JS requires somewhat different intuition. A JS value may be a BigInt, String,
Function, etc., so you need something that can reference any of these.

The usual technique is NaN-boxing, where doubles are stored directly and non-
doubles are represented as NaNs. A NaN gives you 51 bits to store your data.
Usually you have a union type, with some bits for the type tag and the rest
for the data.

A BigInt may require more than 51 bits. So the obvious representation is to
reserve a new tag value, and then in the data bits store a pointer to the
heap-allocated structure that you describe. This could be a single array or a
chunked linked list.

A further change would be to store small BigInts inline in those 51 bits, e.g.
by reserving another tag. This would save a heap allocation at the cost of
more branching.

~~~
Leszek
This particular engine, V8, doesn't use NaN-boxing, because of the memory cost
on 32-bit systems of using 64-bit values for boxed pointers. Instead, we use
pointer tagging, relying on the fact that, with word alignment, we have 2 bits
free at the bottom of pointers.

If the lsb is unset, then the remaining 31 bits represent a "small integer",
a.k.a. a Smi (on 64-bit, the top 32 bits make the Smi, so that they can be
read out of memory with a normal 32-bit access).

If the lsb is set, then the remaining bits are the pointer, where the bottom
bits have to be masked off to actually dereference it (practically, most
accesses are offset-based field accesses anyway, so we just subtract 1 from
the field offset). The other bit, the 2nd lsb, has recently started being used
as a weak pointer marker.

So, the type tag isn't stored in bits of the pointer, but rather in the object
being pointed to, not dissimilar to a vtable. This has the side-advantage that
we can swap the type of an object in-place, without having to update pointers
to it. BigInts are immutable, so they are stored as a heap object which holds
the object tag (actually another pointer to a "map", that's a story for
another time), a bitfield that stores sign and length, and then data inline
(very similar to how we store strings).

(pointer|0x1) -> [BigIntMap, bitfield (length+sign), digit0, digit1, ...]
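
A toy model in plain JS (for illustration only, this is not V8 code) of the 32-bit Smi scheme described above, where lsb 0 means "the other 31 bits are a small integer" and lsb 1 would mark a tagged pointer:

```javascript
const tagSmi   = n => n << 1;         // value moves up one bit, lsb stays 0
const isSmi    = word => (word & 1) === 0;
const untagSmi = word => word >> 1;   // arithmetic shift restores the sign

isSmi(tagSmi(42));     // true
untagSmi(tagSmi(-7));  // -7
```

The trade-off, as the comment explains, is that Smis only cover a 31-bit range; anything bigger spills into a heap object whose type tag lives in the object itself.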

~~~
leshow
Thanks for the explanation! If it functions similarly to strings, which are
also immutable, does any type of operation cause a reallocation?

~~~
Leszek
I believe so, in this baseline implementation. In fact, that's also true of
operations on doubles (non-Smi) in unoptimized code, which are stored as "heap
numbers". Avoiding these allocations is one of the things an optimizing
compiler can do (and does do, for doubles).

That said, for short-lived BigInts, the allocations may not be as expensive as
you might think, since the garbage collector is generational and short-lived
objects won't leave the nursery.

------
DeepYogurt
Is this expected to arrive in other browsers JS engines?

~~~
apaprocki
Yes, in all engines. You can read the current status update that will be
presented at the May TC39 meeting in a few weeks:
[https://docs.google.com/presentation/d/1Bx13eCgM9Rujs4zNdfO9...](https://docs.google.com/presentation/d/1Bx13eCgM9Rujs4zNdfO95de_4wehhx8RxNx5aZl-3D4/edit#slide=id.p)

~~~
bdash
It looks like the link within the presentation to WebKit's issue tracking
their implementation is incorrect. It links to
[https://bugs.webkit.org/show_bug.cgi?id=180731](https://bugs.webkit.org/show_bug.cgi?id=180731),
which is about support for BigInt in the Web Inspector, when it should link to
the issue tracking the entire feature at
[https://bugs.webkit.org/show_bug.cgi?id=179001](https://bugs.webkit.org/show_bug.cgi?id=179001).

------
guidovranken
Couldn't they just have taken the BoringSSL bignum code and written some
wrapper code for it?

------
jwilk
How do they do single digit multiplication?

Most(?) 64-bit CPUs have a 64-bit × 64-bit → 128-bit multiplication
instruction; but AFAIK the C++ language doesn't have anything like this.

~~~
stephencanon
Both clang and gcc expose it via the __[u]int128_t extension.

~~~
jwilk
That's how they did it, indeed. They also have fallback code for when the
double-width integer type is not available.

[https://github.com/v8/v8/blob/6.8.137/src/objects/bigint.cc#...](https://github.com/v8/v8/blob/6.8.137/src/objects/bigint.cc#L2289)

------
ramensea
Does anyone know if this will affect how V8 parses JSON?

~~~
apaprocki
This is the specification as it relates to JSON:
[https://tc39.github.io/proposal-bigint/#sec-
serializejsonpro...](https://tc39.github.io/proposal-bigint/#sec-
serializejsonproperty)

Basically, serialization will throw if there is a BigInt present. But, the
.toJSON() callout is supported if a particular environment wants to opt-in to
supporting it in some manner.
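
What the linked spec text means in practice can be sketched like this (the exact TypeError message is engine-specific, and the decimal-string encoding is just one possible choice):

```javascript
let threw = false;
try {
  JSON.stringify({ n: 123n });      // raw BigInt present
} catch (e) {
  threw = e instanceof TypeError;   // serialization throws a TypeError
}

// Opting in via the toJSON callout:
BigInt.prototype.toJSON = function () { return this.toString(); };
JSON.stringify({ n: 123n });        // '{"n":"123"}'
```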

------
calebh
What happens when the num_digits field overflows?

~~~
munificent
On a 64-bit machine, the max value of num_digits is
18,446,744,073,709,551,616. Since each digit is its own 64-bit value, if
num_digits reached its maximum value, you'd need to allocate
147,573,952,589,676,412,928 bytes of memory for the digits themselves, or a
little over 147,573 petabytes.
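
Fittingly, those numbers can be re-derived with BigInts themselves (using 2^64 as the upper bound of a 64-bit num_digits field):

```javascript
const maxDigits = 2n ** 64n;  // upper bound of a 64-bit num_digits field
const bytes = maxDigits * 8n; // each digit is a 64-bit (8-byte) value

bytes;              // 147573952589676412928n
bytes / 10n ** 15n; // 147573n, i.e. a little over 147,573 petabytes
```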

