Hacker News new | comments | show | ask | jobs | submit login
Ask HN: How would you store a number with a 10^6 digits and all numbers
7 points by sycren 1648 days ago | hide | past | web | 7 comments | favorite
Very large numbers have a few applications, the largest prime numbers in the world have millions of digits [1]. On stackoverflow, one solution for storing a million phone numbers is to use a trie. [2][3] Another way would be to use alphabets as a method of encoding the numbers. For example if all the unicode characters were used in a string we could have 110,000^(String length) [4] Here Pi is displayed to a million digits [5]. As a unicode .txt this is 2MB. If I was to store every combination of number from 0 to this number it could prove to be impossible. How would you suggest storing such large numbers?

  [1] http://primes.utm.edu/largest.html
  [2] http://stackoverflow.com/questions/5262465/storing-1-million-phone-numbers
  [3] http://en.wikipedia.org/wiki/Trie
  [4] http://en.wikipedia.org/wiki/Unicode
  [5] http://www.piday.org/million.php

Presumably it rather depends on what you're storing such large numbers for. If you want to store a single number n (for the sake of argument, let's say a positive integer) for later reference, and have no knowledge in advance of any special properties of the number, you can't do better than ceil(log_2(n)) bits of information, whatever scheme you use for storing it.

On the other hand, if you know your number has particular properties (for example, say you knew it was an even number between 10,000 and 1,000,000 digits long) you would be able to use that information to store it more efficiently. See http://en.wikipedia.org/wiki/Information_theory for the mathematical background for treating this sort of problem.

However, many space-efficient ways of storing data are not time-efficient when it comes to retrieving it or performing operations on it. I can refer to huge numbers using very dense mathematical notation - for example, I could mention A(100,100) (see Petrushka's comment on the Ackermann function) - which would allow me to refer to an astronomically huge number in ten characters - but this is useless for most practical purposes as it would take an inordinate amount of time to calculate. Similarly, I could 'store' the quadrillionth prime in memory by simply writing "the quadrillionth prime" - but this is probably not what you wanted.

The data structure you use will depend enormously on what you want to use it for. For phone numbers, it is helpful to be able to auto-complete numbers a use is typing, which would be difficult to achieve if you simply concatenated them with separators and compressed the data. For storing words in a dictionary, you might use a trie, but that becomes a poor choice of structure if what you actually want is to be able to easily identify words that rhyme with each other.

In summary, it is important to identify what sort of numbers you are storing, what operations you intend to be able to support on your dataset, and how you intend to access the elements. In principle, for arbitrary positive integers, you can't do better than the bound I provided - and if you want more than to just store the data without using it, you may end up using rather more than that.

Say I was trying to find the next mersenne prime what would you suggest?

Knuth's up-arrow notation and the Ackermann function are both designed to make very large numbers easy to notate, but unfortunately I don't know how difficult it would be to write a program to find an expression for any number in one or the other (I'm sure there are plenty of others on HN who can). Might still be worth looking into however.



In principle, not every number can be directly expressed as the result of a nontrivial Ackermann function or Knuth operator (just as not every number is a square or a cube) - although there is a section in the Knuth page you linked on how to represent any nonnegative integer as a series of Knuth-type operations.

I would imagine, though, that to represent an arbitrary integer between 0 and n still needs ceil(log_2(n+1)) bits - since you have to distinguish between n+1 choices, whichever notation you use.

I can't remember where I read the story, but it goes:

There was alien who recorded all the knowledge of Earth. Each fact was encoded as a numerical string and all the strings were concatenated. The concatenated string was converted back to a number. The number was treated as length. The alien took a stick, cut it that length, placed it in his pocket and left.

Not remembering the source drives me nuts every time I think about one of these storage problems. Not that it's a practical solution of course.

It's on page 48-49 of Martin Gardner's "aha! Gotcha" book.

How about a very long link-list?

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact