
Sorting 1 million 8-digit decimal numbers in 1MB of RAM - joslin01
http://stackoverflow.com/questions/12748246/sorting-numbers-in-ram
======
ChuckMcM
This was a Google interview question for a while (I know, I got it as one); I
think it has since made it to the banned question list. The 'trick' was, as
answer #2 alludes to, to use bits to store the fact that you had observed the
number, and then you could dump them back out in sorted order.

So imagine you have a bit field where 'one bits' indicate you have seen the
number and 'zero bits' indicate you haven't. And you compress it with run-
length encoding.

Your initial data structure is '99999999:0' (all zeros, haven't seen any
numbers). Then let's say you see the number 3,866,344, so your data structure
becomes '3866343:0,1:1,96133655:0'. As you can see, the numbers will always
alternate between counts of zero bits and counts of '1' bits, so you can just
assume the odd-position numbers represent 0-bits and the even-position numbers
1-bits. This becomes (3866343,1,96133655).

Anyway, as you get numbers you will split and coalesce the bit space until
you've either seen all numbers (0, 99999999) or you run out of memory because
a hacker has sent you only the even numbers from the space.
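
A minimal sketch of the idea in Python (illustrative only: it keeps the field
as a list of alternating run lengths, with even indices counting 0-bits and
odd indices counting 1-bits, ignores duplicates, and uses a linear scan where
a real version would want a tree):

    runs = [10**8]                       # 10^8 zeros: no numbers seen yet

    def insert(n):
        i, off = 0, n
        while off >= runs[i]:            # find the run containing bit n
            off -= runs[i]
            i += 1
        if i % 2 == 1:
            return                       # bit already set: a duplicate
        # split: 'off' zeros, a single one, then the remaining zeros
        runs[i:i + 1] = [off, 1, runs[i] - off - 1]
        # coalesce: merge the 1-runs around any interior zero-length 0-run
        j = 1
        while j < len(runs) - 1:
            if runs[j] == 0:
                runs[j - 1:j + 2] = [runs[j - 1] + runs[j + 1]]
            else:
                j += 1

    insert(3866344)
    print(runs)                          # [3866344, 1, 96133655]

(The sketch indexes bits 0 through 99,999,999, so its leading count is one
more than in the tuple above.)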

It's a clever 'math trick' which explores how you think about numbers (are
they a thing or a representation?). But I never thought it really gave a good
indication of whether or not you would be a good candidate for the company.

~~~
jules
It is extremely unlikely that you'll be able to sort a random sequence of 1
million 10-digit numbers like that, or by ANY method for that matter, unless
the particular sequence is highly compressible. You won't even be able to
store the answer in the average case. You'll need 10^6*lg(10^10) bits to store
the raw input, from which you can save about lg(10^6!) bits because it's
sorted. That comes out to more than 1.7 megabytes.

Edit: the exact amount of bits necessary is ceil(log_2((1e10+1e6-1) choose
1e6)), which is ~1.756 megabytes.

Edit edit: see uvdiv's comment below: that 1e10 should be 1e8 of course.
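
For the corrected numbers, a quick way to evaluate that bound (a Python
sketch; lgamma sidesteps the astronomically large factorials):

    from math import lgamma, log

    def log2_choose(n, k):
        # log2 of the binomial coefficient, via log-gamma
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2)

    bits = log2_choose(10**8 + 10**6 - 1, 10**6)
    print(bits / 8 / 2**20)   # ~0.96 MiB: just under the 1 MiB limit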

~~~
uvdiv
They're 8-digit decimals, not 10, so it works out to 0.96 MiB.

I think the parent's solution works. The typical separation between numbers is
about 2^7, which works out to 8 bits per number: 7 for the length of the run
of '00...0', plus one length-one '1'. 8 bits * 10^6 is 10^6 bytes, just under
1 MiB. You need a variable-length encoding, so that a gap around 2^7 fits in 7
bits while still allowing larger gaps to be encoded.

~~~
ChuckMcM
There is a fairly intuitive way to reason about the storage requirement. Let's
start with: you've seen no numbers in the range, so you have the tuple
(99,999,999, 0). That is two 32-bit numbers, or 8 bytes of storage. Now let's
say you've seen all the numbers in the range: you've got (0, 99,999,999),
again 8 bytes of storage, two 32-bit numbers. Since we're describing a linear
bit field, the worst case is (1,1,1,1,1,1,1,1,1, ..., 1,0): every odd number
and none of the even numbers. So the memory to encode the virtual bit field
will be (assuming 32-bit numbers throughout) 4 bytes * n, where 'n' is the
number of distinct regions in the bit field. With a perfectly uniform random
number generator, each number generated has these possible outcomes:

1) It splits a region (it causes a string of zeros to have a one in them)
(adds 8 bytes)

2) It coalesces two regions (it removes the last remaining zero between them)
(subtracts 8 bytes)

3) It extends a region (it lands at the edge of an existing run of ones)
(costs nothing)

I'm sure there are other solutions, and as was pointed out it doesn't really
deal with duplicates. I suspect a multi-value bit field might cover that.

~~~
jules
With 10^6 random numbers between 0 and 10^8, almost all operations will be
splits, and you'll end up with 8 bytes * 10^6 = 7.6 megabytes. So indeed the
chances of this working for a random sequence are _extremely slim_. You would
need a huge number of coalesces, which is just extremely unlikely when you
choose 10^6 random numbers in a range of 10^8. So for almost all inputs this
method doesn't work.

I simulated it with a Python program and you get about 10^4 coalesces. That
doesn't even begin to make a dent in 10^6, and storing this will still take ~7
megabytes.
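
A sketch along those lines (counting an insert as a 'coalesce' whenever it
touches an existing region; the simulation itself doesn't run in 1MB, of
course, it just counts the events):

    import random

    seen, splits, coalesces = set(), 0, 0
    for _ in range(10**6):
        n = random.randrange(10**8)
        if n in seen:
            continue                 # duplicate: the regions don't change
        if (n - 1) in seen or (n + 1) in seen:
            coalesces += 1           # extends or merges an existing region
        else:
            splits += 1              # a new isolated region: +8 bytes
        seen.add(n)
    print(splits, coalesces)         # roughly 985,000 vs 10,000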

~~~
uvdiv
_With 10^6 random numbers between 0 and 10^8 almost all operations will be
splits,_

Right.

 _and you'll end up with 8 bytes x 10^6 = 7.6 megabytes._

Not if you encode cleverly. The successive differences will be on the scale of
10^8/10^6 = 100, which is a very small number. It takes 7 bits to store, or at
least 8 bits in a variable-width encoding.

~~~
jules
Sure, if you encode cleverly. That is the essential difficulty of the problem,
which he glosses over (note that he explicitly says he will store the
differences as 4-byte numbers). Getting it under 1.5MB (or even 1.1MB) is
probably easy, but I have yet to see an encoding that is sufficiently compact
to fit in 1MB.

------
codex
All of these bizarro explanations leave me wondering if I've missed something.
Can't this be elegantly solved via the pigeonhole principle?

There are one million numbers to be sorted, out of a space of one hundred
million. That means that no number may be more than 100 away from any other
number once fully sorted (7 bits). Therefore, you can simply use deltas to
encode your numbers, as 7 bit deltas * one million numbers < 1 MB RAM.

EDIT: should've been clearer: no number may be more than 100 away from the
next _on average_ once fully sorted. Therefore, it's an average of 7 bits per
number, maximum. Duplicates are even easier, since it's only one bit to encode
a duplicate (a delta of zero).

EDIT 2: As for the encoding to be used, I think a universal code or Golomb
code would probably be sufficient. They can get quite close to natural
entropy.

~~~
tolmasky
Duplicates are allowed, so you could for example have 500,000 zeros and
500,000 99,999,999's, which are more than 100 apart.

~~~
codex
That is one of the more ideal distributions, as each number gets one (zero)
bit for delta, and the number at the jump point gets a large but globally
inconsequential number of bits. Bits per number is still less than 7.

------
damienkatz
Since no computational bound has been placed, this problem could be solved in
n^2 with an insertion sort, keeping the list of numbers sorted in memory as
they are received. The problem then boils down to encoding a list of sorted
8-digit decimal #s in a way that allows new #s to be inserted.

Since the #s are stored sorted and bounded in size, they can be encoded as
deltas, which will be more space efficient than storing absolute values. Now
we just need to figure out the worst-case encoding, and whether 1 million
values will fit.

~~~
andrewcooke
i think that's the clearest answer i've seen. really, it's pretty much
equivalent to the other answers, but at least for me it's by far the easiest
to grasp. although i guess that may be because the idea is simply becoming
familiar after reading the others.

the best "high level" explanation, i think, is that you are compressing the
_sorted_ numbers, which are therefore not random, and so concerns about the
incompressibility of random streams are completely irrelevant.

------
uvdiv
Information theory to the rescue! Well, not really.

8 decimal digits takes 26.6 bits. An ordered list of 10^6 of these takes 3.17
MiB. The information contained in the ordering is lg(10^6!) ~= 2.20 MiB [0].
So as an _unordered_ multiset, the information content is 0.96 MiB. It's at
least _theoretically_ possible to store the input in 1 MiB of working memory.
But only just; in fact it's significant that the problem specifies 2^20 bytes,
because for an SI megabyte (10^6 bytes), it wouldn't work.

I don't think it's _actually_ possible though. The answers here don't do it.
LZMA accomplishes nothing.

[0] Stirling's approximation in base 2: lg(n!) ~= n lg(n) - n/ln(2)
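
The arithmetic, for anyone who wants to check it (lgamma(n + 1) is ln(n!)):

    from math import lgamma, log

    MiB = 8 * 2**20                         # bits per MiB
    ordered = 10**6 * log(10**8, 2)         # 10^6 numbers at ~26.6 bits each
    ordering = lgamma(10**6 + 1) / log(2)   # lg(10^6!)
    print(ordered / MiB)                    # ~3.17
    print(ordering / MiB)                   # ~2.20
    print((ordered - ordering) / MiB)       # ~0.96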

~~~
dpark
> _The information contained in the ordering is lg(10^6!)_

Can you point me to a resource that discusses the information theory behind
this claim? I'm interested in learning more, but don't know what to search
for.

~~~
uvdiv
You have N elements, so they have N! possible permutations. The ordering of
the list is one such permutation; to describe one message out of a language of
N! possible messages, it takes lg(N!) bits.

~~~
dpark
Interesting. Thanks for explaining. Are there ways to store (multi)sets that
take advantage of this fact to reduce size? I can't envision a way to utilize
this information meaningfully. It seems like any storage scheme would imply
_some_ ordering.

~~~
uvdiv
Yes, the other commenter ChuckMcM showed one

<https://news.ycombinator.com/item?id=4680259>

An ordering of a storage scheme doesn't always store information. E.g. if the
list is sorted, the ordering is completely determined by the data -- it's
redundant.

For storing integers, an idea is to store the pairwise differences between
sorted elements. E.g. [30,10,20] -> [10-0,20-10,30-20] = [10,10,10]. If you
have N integers of typical size X, the differences will be much smaller,
typically on the scale of X/N. With variable-width integers, X/N takes lg(N)
fewer bits to encode than X. So you save N lg(N) bits (asymptotically the same
as lg(N!)), compared to storing the integers literally.
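
A tiny sketch of that:

    def to_deltas(nums):
        s = sorted(nums)                 # the arrival order is forgotten here
        return [s[0]] + [b - a for a, b in zip(s, s[1:])]

    def from_deltas(ds):
        out, acc = [], 0
        for d in ds:
            acc += d
            out.append(acc)
        return out                       # recovers the sorted multiset

    print(to_deltas([30, 10, 20]))       # [10, 10, 10]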

~~~
dpark
> _Yes, the other commenter ChuckMcM showed one_

I saw that comment. I'm not clear how that's not still storing ordered data,
or rather how the order there is not storing information.

Obviously my information theory knowledge is weak.

> _if the list is sorted, the ordering is completely determined by the data --
> it's redundant._

I don't understand this. In what way is the information redundant? If there's
3.17 MB of data in the ordered list, and 2.2 MB of data in the ordering
itself, the information stored in the ordering cannot be redundant, because
that would mean >4.4MB of information is stored in the ordered list.

------
mikeash
Since code size is not restricted, the obvious solution is to store the entire
list of numbers received in the program counter. In short, put every possible
input combination into the code, and just follow whatever branch pattern you
receive.

This requires no memory other than that for the networking stack. It is, of
course, also completely impractical.

~~~
curiousdannii
This was my thought as well. But if you can afford several TB of ROM you can
probably afford to increase your RAM to 4MB.

~~~
emn13
it's not TB of ROM. It's much, much more: 2^(2^20) addresses... Even if you're
faking it and the addresses aren't real (inevitable with those numbers),
you're still just cheating and storing data in a multi-megabyte _dynamic_ (not
ROM!) program counter. After all, you need to distinguish all those
addresses...

~~~
mikeash
It's not cheating! If they wanted to limit the size of the program counter
then they should have specified a limit on the amount of code!

------
tolmasky
EDIT: I put this same description into the stackoverflow page with an image
which might help visualize it.

I think one way to think about this is from a combinatorics viewpoint: how
many possible combinations of sorted number orderings are there? If we give
the combination 0,0,0,....,0 the code 0, and 0,0,0,...,1 the code 1, and
99999999, 99999999, ... 99999999 the code N, what is N? In other words, how
big is the result space?

Well, one way to think about this is noticing that this is in bijection with
the problem of counting the monotonic paths in an N x M grid, where N =
1,000,000 and M = 100,000,000. In other words, if you have a grid that is
1,000,000 wide and 100,000,000 tall, how many shortest paths from the bottom
left to the top right are there? Shortest paths of course require you only
ever either move right or up (if you were to move down or left you would be
undoing previously accomplished progress). To see how this maps onto our
number-sorting problem, observe the following:

You can imagine any horizontal leg in our path as a number in our ordering,
where the Y location of the leg represents the value (image:
<http://i.stack.imgur.com/aJp4b.png> ). So if the path simply moves to the
right all the way to the end, then jumps all the way to the top, that is
equivalent to the ordering 0,0,0,...,0. If it instead begins by jumping all
the way to the top and then moves to the right 1,000,000 times, that is
equivalent to 99999999,99999999,...,99999999. A path where it moves right
once, then up once, then right once, then up once, etc. to the very end (then
necessarily jumps the rest of the way to the top) is equivalent to
0,1,2,3,...,999999.

Luckily for us this problem has already been solved: such a grid has (N + M)
Choose (M) paths:

(1,000,000 + 100,000,000) Choose (100,000,000) ~= 2.27 * 10^2436455

N thus equals 2.27 * 10^2436455, and so the code 0 represents 0,0,0,...,0 and
the code 2.27 * 10^2436455 and some change represents 99999999,99999999,...,
99999999.

In order to store any one number between 0 and 2.27 * 10^2436455 you need
lg2(2.27 * 10^2436455) ~= 8.0937 * 10^6 bits.

1 megabyte = 8388608 bits > 8093700 bits

So it appears that we at least actually have enough room to store the result!
Now of course the interesting bit is doing the sorting as the numbers stream
in. Not sure what the best approach to this is, given we have 294908 bits
remaining. I imagine an interesting technique would be, at each point, to
assume that what you have so far is the entire ordering, find the code for
that ordering, and then, as you receive a new number, go back and update the
previous code. Hand wave hand wave.
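
The sizes check out (a quick verification via log-gamma, since the binomial
coefficient itself is unmanageably large):

    from math import lgamma, log

    n, m = 10**6, 10**8
    # lg((n + m) choose m): bits needed to index any one sorted sequence
    bits = (lgamma(n + m + 1) - lgamma(n + 1) - lgamma(m + 1)) / log(2)
    print(bits)              # ~8.09 * 10^6 bits
    print(2**23 - bits)      # a bit under 3 * 10^5 bits to spare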

~~~
kamaal
I think the answers to most of these seemingly impossible questions depend on
discovering isomorphic problem spaces in which they can be solved a lot more
easily.

In your case you translated the problem to a different problem space and
solved it there. The other contributors on Stack Overflow tend to do the same,
e.g. using network latency, compression, etc. to solve these problems.

These sorts of solutions become very interesting when they become isomorphic
to some other real-world problem.

------
spitfire
Radix sort. Done.

Depending on his dataset characteristics, a radix sort can have a space
requirement as low as a few hundred bytes to sort several million values.

E.g. for 8-bit values, simply make a 256-entry count array and increment the
appropriate count each time you see a value. When you've gone through the
list, loop through the array, outputting each index count times. It's also
quite cache friendly, though the last time I compared it was against
quicksort on a Pentium Pro.

For larger datasets you actually want to sort on the digits (LSD or MSD
first), and that'll take more memory.

EDIT: Originally posted that it'd take 256 bytes of memory. That's not true
for his dataset.
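
For concreteness, the 8-bit toy version as a Python sketch (a counting sort,
strictly speaking):

    def counting_sort_bytes(values):
        counts = [0] * 256
        for v in values:                 # one pass: tally each value
            counts[v] += 1
        out = []
        for v, c in enumerate(counts):   # emit each index, count times
            out.extend([v] * c)
        return out

    print(counting_sort_bytes([3, 1, 2, 1]))   # [1, 1, 2, 3]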

~~~
danbruc
There is not enough memory to keep all the numbers, and therefore radix sort
is no viable solution. I think you actually meant counting sort, but this will
not work either: it requires 100,000,000 slots, each 20 bits wide to be able
to count up to 10^6, for a total of almost 238.5 MiB.

------
dangoldin
Reminds me of the sleep sort that was on here a few months ago. If you don't
care about the run time and your processor is fast enough, it may be hacked to
work.

<http://rosettacode.org/wiki/Sorting_algorithms/Sleep_sort>

------
optimiz3
As others have mentioned, you need 7 bits per number (on average) if you store
the numbers as deltas in sorted form. So that's 7M bits out of the 8,388,608
bits available, leaving roughly 1.4M bits of working set.

One could implement a simple block allocator, where each block contains a
sequential list of deltas.

The trick to fast insertion is to place new blocks at addresses interpolated
between 0 and 10^8. If there is a collision, merge blocks. If the input
distribution is off, physically reallocate colliding blocks left or right into
free space.

So inserting the numbers 10, 20, 1000, 2000, 1M, 2M would give you a heap
looking like:

[head->[10,10]->[980]->[1000]->[998000]->[1000000]->tail]

As more numbers are inserted, blocks combine until you end up with one
contiguous block.
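
A sketch of the placement rule (the name 'home_offset' is made up for
illustration):

    RAM_BYTES = 2**20

    def home_offset(value, heap_bytes=RAM_BYTES):
        # a block holding values near 'value' starts at the interpolated
        # address; on collision, merge with or shift the resident block
        return value * heap_bytes // 10**8

    print(home_offset(2_000_000))   # 20971: 2% of the way into the heap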

------
jedberg
Am I the only one who thinks this was someone's homework assignment?

~~~
kevinburke
It does sound similar to the opening problem in the book "Programming Pearls."
The solution proposed there was to write temporary files to disk and do a
k-way merge sort.

~~~
gburt
Except in this case, you have no disk.

------
mmaunder
Compressed buckets of number ranges. I love it because it's incredibly easy to
understand which makes the code easy to maintain, and it works.

------
carl8
Here's the solution that I came up with prior to following the link or reading
any comments.

We're given 1 million integers from 0 to 99,999,999, and we only have 1MB of
RAM, or an average of ~8 bits for each of the million numbers. So we can't
store the numbers directly, since they take ~27 bits each.

First thought was to use a bitset, but that would require 100 million bits and
we only have ~8 million bits of RAM, so that's not going to work. We also need
to deal with duplicates.

How about this: something similar to an insertion sort that stores deltas
between the sorted numbers. As each number is streamed in, we scan from the
beginning of the list until its correct position is found, insert it there,
and push down the remaining numbers. This will be O(n^2).

Since the average delta between numbers is about 100, we'll use 8 bits to
store the delta value. Value 0 means the number is a duplicate of the current
number. Values 1-254 mean add this value to the current number to get the new
number. Value 255 means add 255, then use the next byte as the delta value
(repeat until the value != 255).
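
Here's a sketch of that encoder (note a delta of exactly 255 comes out as the
byte pair 255, 0):

    def encode(sorted_nums):
        out, prev = bytearray(), 0
        for n in sorted_nums:
            d = n - prev
            while d >= 255:        # 255 = "add 255 and read another byte"
                out.append(255)
                d -= 255
            out.append(d)          # 0..254, where 0 marks a duplicate
            prev = n
        return out

    # Case 1 below: one byte per number
    print(len(encode([100 * i for i in range(10**6)])))   # 1000000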

(Case 1) 1 million ints exactly 100 apart: 0, 100, 200, 300, 400, ...,
99999800, 99999900. Stored as a list of 8-bit delta values: 0, 100, 100, 100,
100, ..., 100, 100 (1 million bytes total).

(Case 2) 1/2 million zeros, then 1/2 million values of 99999999. Stored as:
first 0, 0, 0, 0, ... (1/2 million zero deltas); then 255, 255, 255, 255, ...
(99999999 / 255 = 392,156 repetitions, which gets us to 99,999,780); then 219,
0, 0, 0, 0, ... (another 1/2 million zero deltas).

So the total amount of storage for Case 2, which I presume is the worst case
(but correct me if I'm wrong!), is: 500,000 + 392,156 + 1 + 500,000 =
1,392,157 bytes to store the delta values.

1MB = 1,048,576 bytes, so I'm over by 343,581 bytes... (so close!)

We'll have to modify this scheme to reduce the number of 255 values, which
should not be hard to do and will get us under the 1MB size limit. Or we could
try something fancier like Huffman coding to reduce the size of the delta
values.

------
teeja
Compressed bucket-ranges work fine for random data, but if the data are skewed
in some way (e.g. shaped like a normal curve), the bucket sizes must be
adjustable.

An alternative method is to use one bucket into which all values below a limit
(e.g. 20,000,000) are sorted as they arrive, and to compress all the rest.
When 10^6 values have arrived, transmit the bucket, then reuse the emptied
bucket repeatedly to sort the compressed values.

------
Zenst
Just a thought, but given "have a computer with 1M of RAM and no other local
storage. I must use it to accept 1 million 8-digit decimal numbers over a TCP
connection,"

I was wondering why you can't use TCP itself as a form of storage. There are
possibly many ways, but latency and buffers would actually work for you as
crude storage. Not that it is needed in this case, but it is one form of queue
that could be abused to store data.

~~~
greenyoda
That was actually suggested by one of the replies:

[http://stackoverflow.com/questions/12748246/sorting-1-millio...](http://stackoverflow.com/questions/12748246/sorting-1-million-8-digit-numbers-in-1mb-of-ram/13000176#13000176)

~~~
Zenst
I must have missed that one, though it's good to know I'm just as sane as
others out there. Many forms of storage that most forget are out there, as
they're not directly obvious. Even the display RAM is usable; blame my ZX81
days for that perversion.

------
recon517
Store each number as a delta from the previous one, and the first number as a
delta from 0. Each number starts with a code describing its length: 0
(duplicate), 7, 9 or 27 bits. For the codes, use the values '1', '01', '001'
and '0001'. Calculate statistics for the different code types, and when space
is tight, reassign the shortest codes to the most common types.

~~~
alexkus
Bit late to this as I've been ill.

If you try that encoding scheme with some random data you'll find it won't
always fit.

I took 1,000,000 random numbers between 0 and 99,999,999, sorted them, and
calculated the deltas. Here's the distribution of the sizes of the deltas:-

    
    
       diffbits: 1 nos 5009
       diffbits: 2 nos 19937
       diffbits: 3 nos 38275
       diffbits: 4 nos 72319
       diffbits: 5 nos 128146
       diffbits: 6 nos 202000
       diffbits: 7 nos 252698
       diffbits: 8 nos 202681
       diffbits: 9 nos 72750
       diffbits: 10 nos 6148
       diffbits: 11 nos 37
       totbits = (1*5009)+(2*19937)+(3*38275)... = 6,408,685
    

That's the total number of bits required to store just the deltas, but doesn't
include the length encoding.
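
A sketch that roughly reproduces the experiment (counts will vary a little
from run to run; a bit length of 0 is a zero delta, i.e. a duplicate):

    import random
    from collections import Counter

    nums = sorted(random.randrange(10**8) for _ in range(10**6))
    diffs = [b - a for a, b in zip(nums, nums[1:])]
    hist = Counter(d.bit_length() for d in diffs)
    for bits in sorted(hist):
        print("diffbits:", bits, "nos", hist[bits])
    print("totbits =", sum(b * c for b, c in hist.items()))   # ~6.4 million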

8 * 1024 * 1024 - 6408685 = 1979923 spare bits.

Each of the 999,999 deltas will take at least one bit to determine the size of
the delta, leaving you with 979,924 bits for extra length encoding and wastage
(if you pack multiple lengths under one encoding).

There's no way I can see (after trying lots of permutations) to be able to
encode all of those bunches of diffs using those few remaining bits.

What's even more difficult is that you don't have the space to 'calculate
statistics for different code types', because you can't fit anywhere near all
of the deltas into that 1MB of memory without having them encoded perfectly in
the first place. Calculating them on partial data is all you can do, and
that's not going to be accurate enough, because you've no idea what the unseen
data holds (it could all be duplicates, or all 11-bit differences...).

I'm going to keep looking at this (while trying not to look at any of the
posted solutions).

~~~
recon517
You are right, my solution doesn't really work. But I still believe there
might be a working solution based on encoding deltas.

Thank you for your reply!

------
thedufer
Does anyone know what the original question was? It's clear from the answers
that the input used to be some kind of stream (or it was unspecified and
everyone jumped on this interpretation), whereas now it is in ROM.

~~~
tikhonj
No, the input is 1 million 8-digit decimal numbers over TCP.

The ROM is used to store the program itself, so the code doesn't take up any
of the limited memory itself.

~~~
thedufer
I misread it. Thanks.

------
Cieplak
Now, can you please solve this with 50% less code using Haskell?

~~~
StavrosK
Go ahead.

~~~
Cieplak
I'm guessing it would take substantially more LOC to do this in Haskell than
in Java. The problem seems to lend itself much better to OO design patterns
than to functional ones.

edit: unless perhaps you make use of Haskell's FFI; then maybe Haskell could
beat it on LOC

~~~
ben0x539
I really don't see how OO design patterns come into play here. I haven't seen
anyone suggest OO-based solutions.

~~~
Cieplak
The POC code offered by Renat Gilmanov is OO.

~~~
mfukar
That's not a solution.

------
steve8918
What did the question mean by "8 digit decimal number"? Did that mean anything
from 0.0000001 to 99999999?

------
tosseraccount
Use a machine with 32 bit bytes.

