

Ask HN: How would you handle items' IDs beyond 2^64 (18.4 quintillion)? - sgy

Every item (submission/comment) on Hacker News is assigned a unique ID number.

Assuming that 2^64 is now the maximum number a computer can handle at once, what's the best way to handle the ID numbering system when there are more than 2^64 items?
======
felixgallo
there won't be more than 2^64 items.

[https://www.wolframalpha.com/input/?i=2%5E64+divided+by+%28p...](https://www.wolframalpha.com/input/?i=2%5E64+divided+by+%28population+of+the+world+in+2050%29)

~~~
sgy
The number 2^64 is not the limit itself here. What I meant by it is the
number of unique digit combinations we can have in 64 slots. Your calculation
would rather be
[https://www.wolframalpha.com/input/?i=99999999999999999999%2...](https://www.wolframalpha.com/input/?i=99999999999999999999%2F%28population+of+the+world+in+2050%29).
But this is not what I meant.

What can you do best when you run out of unique combinations and the CPU can't
handle more than 64 slots at a time? Upgrading to a 128-bit architecture is not
the answer I'm seeking.

------
zck
The software that HN runs on can handle arbitrarily large numbers. Check it
out on [http://tryarc.org/](http://tryarc.org/):

        arc> (expt 2 64)
        18446744073709551616
        arc> (expt 2 65)
        36893488147419103232
        arc> (expt 2 165)
        46768052394588893382517914646921056628989841375232

Note that this isn't converting the number to floating point; the arithmetic
is exact:

        arc> (+ (expt 2 165) 1)
        46768052394588893382517914646921056628989841375233

~~~
sgy
Perhaps my point wasn't very clear. I didn't mean that a computer won't be
able to compute the value of 2^64 or 2^128 or 2^x.

In a 64-bit architecture, the data bus or datapath is 64 bits wide. That is,
we only have 2^64 unique numbers to pass through the CPU at a time. What's
the best thing to do when we need more unique figures? Use unsigned integers,
for instance?

2^64 is the number of combinations, not the maximum number itself.

~~~
dragonwriter
> In a 64-bit architecture, the data bus or datapath is 64 bits wide. That
> is, we only have 2^64 unique numbers to pass through the CPU at a time.
> What's the best thing to do when we need more unique figures?

Use data structures that are too big to fit in the CPU all at once, the same
way we did on 8-bit CPUs when we needed more than 256 unique ID values. The
problem is vastly rarer with 64-bit machines, but the solution remains the
same.

Obviously, comparing such values takes more clock cycles than comparing values
that fit in the CPU all at once, but that's the price you have to pay.

> Use unsigned integers, for instance?

Signed vs. unsigned, with the same width, provides the same number of unique
values. Just different values. So that wouldn't help.

~~~
dllthomas
_" Signed vs. unsigned, with the same width, provides the same number of
unique values. Just different values. So that wouldn't help."_

Unless you had been reserving negative values to mark errors when returning
ids, in which case making use of that top bit as id space lets you double the
number of ids. Of course, you could still treat it as signed or unsigned...
it's just a possibly related issue.

------
bjourne
Your assumption is just incorrect. Computers can already handle numbers
millions of bits long. The size of the numbers is bounded only by RAM and
processing power, not by an arbitrary limit.

~~~
sgy
You're right, they do handle numbers millions of bits long, and even
billions. But a 64-bit architecture allows the CPU's MDR datapath to read
only 64 bits at once.

My question is: what about the case when the number is longer by one bit,
let's say? What's the best/most efficient way to handle it?

~~~
tptacek
I don't understand this question. Isn't the answer you're looking for just
"use a bignum library"?

~~~
sgy
Kind of. I'm looking for a manual, efficient implementation to handle that:
the speed of arithmetic IS a limiting factor here.

~~~
tptacek
So you're just looking for a comparative review of different bignum
libraries? That shouldn't be too hard to find. Bignum performance is
important for several applications, most notably public-key cryptography.

Here's a good sort of survey piece to read:

[https://www.imperialviolet.org/2010/12/04/ecc.html](https://www.imperialviolet.org/2010/12/04/ecc.html)

You'll be particularly interested in the limb scheduling optimization, which
is tailored to the processor.

~~~
sgy
I'm more interested in figuring out a fast, self-developed mechanism than in
using libraries. But you absolutely got it: a solution for when the number
doesn't fit in a single CPU register.

Great read by the way. Thank you.

------
frou_dh
Put a separate number in front of the id and use that to shard. The historic
ids would become 0,<id> and the next lolillion would be 1,<id>.

~~~
sgy
You just added digits beyond the maximum number a processor can handle at
once. I didn't fully get your suggested solution: how would you handle the
extra identifier?

~~~
frou_dh
They only become digits if and when they're parsed out of the textual query
string as such. The structure of HTTP query strings is not something the
processor cares or even knows about. To it, "=1,333" isn't an equals sign
followed by two comma-separated numbers, it's simply 6 bytes of data. The
meaning of that data is an application level concept.

~~~
sgy
What I meant by digits is zeros and ones, of course, not strings/characters.
There's no disagreement about the data context at the lower level.

To be clearer, let's suppose that the maximum a CPU can read at once is one
character (8 bits). If we went by your suggestion to add one or two extra
characters [on the high level], or 16 bits [on the lower level], the processor
won't be able to handle the 24 bits (8 + 16) at one time, simply because the
max is 8 at a time.

~~~
Edman274
Are you asking "how can a computer compare strings that are longer than their
native word size"?

~~~
sgy
Well, not exactly; rather, the most efficient way to deal with elements (big
numbers, specifically) that aren't going to fit in a single CPU register. The
libraries I've looked at have some limitations on the speed of calculations.

------
glimcat
The solution to this problem is as old as Lucretius.

Go to the edge, then add a bit.

Or, whatever, get two computers. Or twenty, they're cheap.

~~~
sgy
Any extra bit added at the edge won't be handled by the processor. Do you
think a cluster would solve this kind of problem efficiently? Especially
since HN used to run on a single core in the backend.

------
oppositelock
Hash them to 64 bits, deal with occasional collisions, which will be unlikely
in any real world scenario.

------
ahazred8ta
Bear in mind, it would take 1 billion users posting 18 billion times each to
reach 2^64. I'm not holding my breath. (And computers can actually handle
128-bit numbers just fine.)

~~~
sgy
The whole concept is how to solve the problem at its edges, whether 64 or 128
bits, say in 10 years' time. Plus, would you use an IBM System/37 or a DEC
VAX to run HN? 18 billion posts by 1 billion users will cumulatively make
sense in the context of time.

------
SamReidHughes
How refreshing, a classic style of troll! Upvoted and flagged.

