Hacker News new | past | comments | ask | show | jobs | submit login
Sorting 1M 8-digit numbers in 1 MB of RAM (2012) (stackoverflow.com)
266 points by saadalem 61 days ago | hide | past | web | favorite | 60 comments

I'm a fan of the actual (in-RAM, no tricks) problem and its variants. The entropy of the sorted list is 0.96MiB so it is theoretically possible (see https://news.ycombinator.com/item?id=4679756), but I've not seen a good writeup of an actual algorithm.

My personal favorite variant of this is sorting 1M 32-bit integers using 2MiB of RAM. This is possible and I coded it up for fun a few years ago. Here's the specific encoding I used for the sorted list: A '0' bit followed by 12 bits of payload generates a number which is those 12 bits added to a global state counter. A '1' bit increments the global state counter by 2^12. It's easy to work out the memory requirements (it's constant for all sorted lists of the same length): 2^20 bits for the '1' bits (to reach a maximum value of 2^32) plus 10^6 * 13 bits for the integers. This is exactly 1,756,072 bytes, which leaves plenty of extra space.

ETA: unfortunately a simple adaptation of this technique doesn't quite work for this problem; the optimum payload size for 1e6 integers up to 1e8 is 6 bits, but even with that the representation takes 1046kiB, and the budget is 1024kiB.

I think the second solution from SO is almost the same as yours, including having begun with Golomb coding: http://preshing.com/20121026/1mb-sorting-explained

However, they explain that Golomb coding isn't quite enough for the 1M case and switch to arithmetic coding instead.

Ah seems I was a bit too optimistic with my estimation of the number of bits. Shame, that means a simple solution is pretty much impossible.

>The entropy of the sorted list is 0.96MiB so it is theoretically possible (see https://news.ycombinator.com/item?id=4679756), but I've not seen a good writeup of an actual algorithm.

I reckon the easiest way to actually achieve that bound is to use the combinatoric number system. I've seen people refer to arithmetic encoding but I'm not too sure on the exact details, they're probably encoding the gaps between numbers but then there'll be some loss as those aren't IID, using the exponential distribution does seem to get you below the bound though, even if it is extremely annoying to implement.

For the combinatorial number system you basically need to sum (10^100 + 10^6 choose x + k) for all numbers x where k is the position of x in the (sorted) list. This would have been many times easier if you didn't need to take possible repeated values into account, but it is what it is.

If I'm honest both are complicated enough that I worry they'll be nigh impossible to implement without accidentally using 2MB of RAM.

Edit: Actually, just use Golomb-Rice codes for the gaps, if you pick the parameters right you'll need [d/128]+7 bits (rounded to the nearest integer) for a gap of size d, which is good enough. Since every gap uses a whole number of bits you'll just need to update and insert a few bits to update, rather than basically rewriting the whole thing (although you'll still need to shift the entire tail by a few bits).

>"I'm a fan of the actual (in-RAM, no tricks) problem and its variants."

This particular problem as stated on S.O. aside do you consider disk-based sorting one of the tricks? I would be curious to hear some of the other tricks as well as variants of this problem space.

I consider it not in-RAM.

There's another variant, iirc sorting telephone numbers, in "Programming Pearls," but it requires a different approach.

The classic programming interview version of this question is to infer from "telephone numbers" that they're all unique and so you just allocate the provided memory as a big bitfield, mark all the numbers you have in your input list, then scan through and read them back in order.

Interesting is this a counting sort then?

Might you or someone else have a link that states the whole interview question?

> Interesting is this a counting sort then?

Not really as all the input values are unique. So counting wouldn't do any good.

It's basically using a bitfield long enough to encompass all possible values and then switching the bit of the input value.

As an example, if you are dealing with 3 digit numbers, then the max possible value is 999 and the min possible is 0. So you need a bitfield that is 1000 bits long. And if your input values are 999, 2, 123 and 7. Then you set the 999th bit, 2nd bit, 123rd bit and the 7th bit. Now when you read the bitfield from the 0th bit to the 999th bit, you will find 4 set bits - the 2nd bit, the 7th bit, the 123rd bit and the 999th bit. So instead of viewing them as ordinal numbers but cardinal numbers, we get 2, 7, 123 and 999 which is sorted.

Another way to view it is as setting the positions in an array. Instead of bits, lets say you have an array of 1000 ints all initialized to 0. Lets call this array MyBitArray. And if your input values are 999, 2, 123 and 7 then

MyBitArray[999] = 1;

MyBitArray[2] = 1;

MyBitArray[123] = 1;

MyBitArray[7] = 1;

Now if you pass through the array in order and check for the indices with the value set to 1, you'll see that indexes 2, 7, 123 and 999 are set to 1 which incidentally gives you your sorted list.

The difference between the array example and the bit example is that the bit example is much more space efficient. But the overall idea is the same.

Seconded, I googled bitfields and Stack Overflow and still don't understand the OP even a little.

For sorting 3,5,8,2,7: you take 0000000000, representing all 10 single digit numbers (0-9), and encode the given numbers into it as 0011010110. Now you just read out each one in order: 2,3,5,7,8. No need to store the actual numbers themselves.

It's possible that I'm getting this wrong, but this is my understanding of it.

The original Pearls page seems to have been taken down, but the web archive remembers it:


> My personal favorite variant of this is sorting 1M 32-bit integers using 2MiB of RAM. This is possible and I coded it up for fun a few years ago.

Is it possible for you to share the code? Still don't quite get how that encoding related/helps in sorting the numbers.

Interesting, sort of bucket sort using a linked list if I understand correctly. How do you determine the optimal number of bits - 12 for 32bit and 6 for 1e8 = 30bit?

Why is the entropy of the sorted list smaller?

The entropy of a signal is the logarithm base 2 of the number of signals that you could possibly be trying to send. Because with perfect compression, that is how many bits you need to encode that many different signals.

Since not all lists are sorted, there are more lists of a given size than sorted lists.

1M (ordered) list of numbers < 10^8 requires 3.16MB of RAM(10^6*log_2(10^8) bits). The theoretical lower bound to store unordered numbers is 0.96MB(log_2(comb(10^8+10^6-1, 10^6)) bits. Basically calculating number of different possible types of lists, taking its log is the entropy. Both the lists [1, 2] and [2, 1] are same if we don't want the ordering information. In the second case we just need counts of number with property that count[0] + count[1]+...+count[10^8] = 10^6 as counts give perfect information for the sorted list.

Because it's easier to compress (compared to when its contents is in a random order).

I have only ever answered one question on SO, and this was it. I used to be pretty active on some of the other sites (cstheory and the old theoretical physics stack exchange) but this answer got enough attention that I immediately vowed not to post to SO again, since it could only damage my track record there.

Thanks. Since you're here, I'd like to take the liberty of asking:

1. What was the inspiration behind that answer?

2. Have you stumbled upon other such quirky techniques elsewhere that rival yours?

3. Any interesting anectode about someone who put your solution in production / research paper / homework and later reached out to you for help?

There have been a few edits to my answer since I originally posted it. The original version had a bit at the top about it being against the spirit of the question but intended to amuse. It's not a practical solution at all.

Re 1: I'm not sure there was any particular inspiration. I've worked on quantum computing since 2004, so I guess I spend a lot of time thinking how to make computers in weird ways. Plus I spent college working at a number of ISPs doing tech support, so I used to be reasonably up on networks.

Re 2: Yep. In my field there is this idea of quantum sneaker-net. This is far more useful than my answer and potentially solves a major problem in the field I work in, but is also totally off the wall.

The basic idea is that you can make an extremely high speed low latency network for quantum communication by shipping giant refridgerators around the world on ships (using a quantum effect and the existing [non-quantum] internet). The paper is here: https://www.nature.com/articles/srep36163

Re 3: Somebody here pointed out pingfs which seems to take the idea to a whole new extreme, but reading the website it seems like it was already being worked on in 2011. I doubt anyone implemented my answer in any way. It was somewhat intentionally impractical.

This is basically an ICMP-based version of sleep sort: https://rosettacode.org/wiki/Sorting_algorithms/Sleep_sort

They're both similar and very clever, but they do operate differently. Sleep sort works by setting up a delay and callback based on the values that you are sorting, so timing is crucial. The solution in this topic is using the network as a queue to store values and is constantly throwing them back, only pulling them into the sorted stack when they reach the max threshold. Once an item is pulled down, its value is recorded and isn't echoed out again. The process is repeated until the list is complete. This works without regard to timing.

"Once COUNTER reaches 1000000, you have all of the values stored in the incessant stream of ICMP requests"

This sounds to me like...


It's complicated by the fact that ICMP packets do not necessarily arrive in order.

I wonder if the author was familiar with https://en.wikipedia.org/wiki/Delay_line_memory

Reminds me of pingfs: https://github.com/yarrick/pingfs

Oh, cool! In my defence, I can only say that I believe my answer predates this.

The accepted answer seems prone to losing data on dropped packets, no?

A similar (but I think safer) approach might be to read ahead all the packets to determine what values you can begin to send. Then you could request retransmission. This requires the sender to play nice and buffer all the data fore you but that seems more realistic than assuming the router will do so.

"prone to" is putting it mildly. ICMP echo requests tend to be processed by the often relatively anemic control plane CPU rather than ASICS and are at the bottom of the heap in terms priority. If you send a router 1MB of ICMP echo requests, it's virtually guaranteed to drop some or even most of them.

This solution is how RAM was implemented on some early computers, with acoustical memory in mercury delay lines. Crazy! I saw one at the computer history museum and realized that it weighed tons!

I think you can assume 8-digit numbers can be stored in 27 bits

The relationship between cardinality of the codomain and image of the numbers is approximately 100

I would try to convert incoming numbers into pairs/tuples of N + differences (which can be done using less bits)

This question is not the same as -- but has a similar feel to -- the first programming problem in the classic book "Programming Pearls"[1]. You can read it on O'Reilly Safari these days[2].

Even if you don't read the book, that first chapter, "Cracking the Oyster", is worth a read. It's a fun problem.

[1]: https://amzn.to/2vWkZ0D

[2]: https://learning.oreilly.com/library/view/programming-pearls...

Hang on, he decrements VALUE only by 1 during the sorting phase, which assumes that the 1M 8 digit numbers are consecutive, which we know they might not he because he said there could be duplicates....

After decreasing VALUE by 1, you should run T (T >> 10000000) times. If VALUE is not found in your stream, it is not transmitted. Similarly, you should always run T times, to account for all duplicates (you can't just 'break' when finding VALUE)

I'm vaguely thinking set a TCP sequence number according to each number you want to sort, then use the out of order capability of TCP/IP to do it for you...

But you’d need to either send your million packets somewhere with enough memory (if you can do that, why not just send the numbers to the box with sufficient memory in the first place), and you’d need to be able to look at each input number and decide what sequence number to assign to the outgoing packet (you can’t use the number itself as numbers may be duplicated but seqnums may not, and you can’t have enough memory to know if it’s a duplicate without solving the problem. Also you won’t be able to have missing seqnums in the set of all packets you send, and wherever you send the packets won’t put up with you sending a million packets (let alone the 2^27 or so you might need if you sent every 8-digit number) out of order.

The accepted solution is some wizardry that works because 1M (just) fits into 1MB... I wonder if in the intervening years 1M hasn't become 2M or more?

I'm not really smart enough to come up with such a solution, I'd probably just try and find a way to plug something in between the input or output Ethernet port that has a bit more memory.

1M (ordered) list of numbers requires 3.5MB of RAM(10^6*log_2(10^8) bits). The theoretical lower bound to store unordered numbers is 0.96MB(log_2(comb(10^8+10^6-1, 10^6)) bits)

I think your calculation for ordered lists is just wrong. I think the right answer should be about 3.17MB, not 3.5.

Someone wrote 3.5MB, didn't bother to double check but the formula is correct.

I'd sort the input as it arrives, using heap sort.

The thing is, you could just not worry about the stream, as it is TCP/IP based. The core idea is to read a number from the stream, append to the array that makes up the heap, and run heap sort.

The only important thing is to keep the tcp/ip connection alive. If sorting takes too much, tcp will take care of sending the last sent number again.

This might not be super efficient though.

This could also be done in batches (4kb? 8kb?) -- we don't know about network speed or latencies, so we might work on that.

You can't because there's not enough RAM to hold all the numbers. So you'd be sorting the input as it arrives, yes, and then at some point you'd run out of RAM and have to stop processing new numbers.

There's enough RAM to hold all the numbers. You just have to invent the representation to store all the numbers in the available RAM. Naive representation using 27 bits per number won't hold them all indeed. But if you would store them ordered and store difference rather than number using variable bit encoding, it would take less space.

Just simple math: imagine that you have to store numbers 0, 100, 200, ..., 99 999 900 as a worst case scenario. As you're going to store the differences, you'll need to store "0" and then "100" 999999 times. If you would be able to represent "100" with 8 bits, you'll be able to keep your data inside 1 MiB limit. One simple encoding: 1xxxxxx for 7 bits; 01xxxxxxxx for 8 bits; 001xxxxxxxxx for 9 bits, etc. It should be enough for this task, although you could invent better encodings.

That my idea too. It is very unlikely that the numbers are spread apart enough that you can't store the deltas in compact forms.

Example starting your base as zero if the first number 100 you can store that in 1 byte, use the high bit to flag a larger number so if the next number is say 10000 you can store it in two bytes. But let's say the next number is 3000, you need to update the first delta, insert the new number and continue on. You probably can store most numbers that way.

No it isn't, because the problem specifies that you have 1 MB of RAM and no other local storage. You don't have a disk to write back out to.

Has anyone implemented this?

That assumes external storage, which is not given in the problem statement.

This problem 20 yrs ago might be worth attempting to solve. Not sure it is worth the complexity and effort. Having said that the thinking & creativity could be reused. Not sure it is a complexity that we solve every day.

Compact / succinct data structures and clever algorithms are still very important when data gets large enough -- consider e.g. a database running complex queries, where a good encoding makes the difference between staying in memory and going to disk. Basically the machines became larger but so have the problems to solve.

Yes but this is far beyond compact, it's squeezing in so close to the edge of what fits that the performance inevitably drops off a cliff.

But there are for instance modern embedded systems which might be very resource constrained and knowledge like this would be useful there.

A more practical version of that same solution would be to use some serverless endpoint to do the sorting (or even just the storing).

OK, since we're now in evil question territory: How long should the wire to the router be to store everything? Assuming no buffer in the router and gygabit ethernet.

Its easy. 1kk numbers fit almost whole 1MB. So most of the numbers from the range will be there if we assume normal distribution.

So we need to compress the data first in a way we can still count it. So we take a sinus function and mark multipliers where a number does not fit sinus. So we spare 0,5MB of RAM at least. Then we iterate the range, multiply sinus and output sorted numbers.

Rest of the RAM can be used for example for bitcoin mining

Bitcoin mining hinges on computing hashes, though.

It was a figure of speech. Still my way is valid

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact