

Ask HN: How would you store a number with a 10^6 digits and all numbers  - sycren

Very large numbers have a few applications, the largest prime numbers in the world have millions of digits [1]. 
On stackoverflow, one solution for storing a million phone numbers is to use a trie. [2][3]  
Another way would be to use alphabets as a method of encoding the numbers. For example if all the unicode characters were used in a string we could have 110,000^(String length) [4]  
Here Pi is displayed to a million digits [5].  As a unicode .txt this is 2MB. If I was to store every combination of number from 0 to this number it could prove to be impossible. How would you suggest storing such large numbers?<p><pre><code>  [1] http://primes.utm.edu/largest.html
  [2] http://stackoverflow.com/questions/5262465/storing-1-million-phone-numbers
  [3] http://en.wikipedia.org/wiki/Trie
  [4] http://en.wikipedia.org/wiki/Unicode
  [5] http://www.piday.org/million.php</code></pre>
======
strangestchild
Presumably it rather depends on what you're storing such large numbers for. If
you want to store a single number n (for the sake of argument, let's say a
positive integer) for later reference, and have no knowledge in advance of any
special properties of the number, you can't do better than ceil(log_2(n)) bits
of information, whatever scheme you use for storing it.

On the other hand, if you know your number has particular properties (for
example, say you knew it was an even number between 10,000 and 1,000,000
digits long) you would be able to use that information to store it more
efficiently. See <http://en.wikipedia.org/wiki/Information_theory> for the
mathematical background for treating this sort of problem.

However, many space-efficient ways of storing data are not time-efficient when
it comes to retrieving it or performing operations on it. I can refer to huge
numbers using very dense mathematical notation - for example, I could mention
A(100,100) (see Petrushka's comment on the Ackermann function) - which would
allow me to refer to an astronomically huge number in ten characters - but
this is useless for most practical purposes as it would take an inordinate
amount of time to calculate. Similarly, I could 'store' the quadrillionth
prime in memory by simply writing "the quadrillionth prime" - but this is
probably not what you wanted.

The data structure you use will depend enormously on what you want to use it
for. For phone numbers, it is helpful to be able to auto-complete numbers a
use is typing, which would be difficult to achieve if you simply concatenated
them with separators and compressed the data. For storing words in a
dictionary, you might use a trie, but that becomes a poor choice of structure
if what you actually want is to be able to easily identify words that rhyme
with each other.

In summary, it is important to identify what sort of numbers you are storing,
what operations you intend to be able to support on your dataset, and how you
intend to access the elements. In principle, for arbitrary positive integers,
you can't do better than the bound I provided - and if you want more than to
just store the data without using it, you may end up using rather more than
that.

~~~
sycren
Say I was trying to find the next mersenne prime what would you suggest?

------
Petrushka
Knuth's up-arrow notation and the Ackermann function are both designed to make
very large numbers easy to notate, but unfortunately I don't know how
difficult it would be to write a program to find an expression for any number
in one or the other (I'm sure there are plenty of others on HN who can). Might
still be worth looking into however.

<http://en.wikipedia.org/wiki/Knuth%27s_up-arrow_notation>

<http://en.wikipedia.org/wiki/Ackermann_function>

~~~
strangestchild
In principle, not every number can be directly expressed as the result of a
nontrivial Ackermann function or Knuth operator (just as not every number is a
square or a cube) - although there is a section in the Knuth page you linked
on how to represent any nonnegative integer as a series of Knuth-type
operations.

I would imagine, though, that to represent an arbitrary integer between 0 and
n still needs ceil(log_2(n+1)) bits - since you have to distinguish between
n+1 choices, whichever notation you use.

------
brudgers
I can't remember where I read the story, but it goes:

There was alien who recorded all the knowledge of Earth. Each fact was encoded
as a numerical string and all the strings were concatenated. The concatenated
string was converted back to a number. The number was treated as length. The
alien took a stick, cut it that length, placed it in his pocket and left.

Not remembering the source drives me nuts every time I think about one of
these storage problems. Not that it's a practical solution of course.

~~~
jtran21
It's on page 48-49 of Martin Gardner's "aha! Gotcha" book.

------
barakstout
How about a very long link-list?

