My personal favorite variant of this is sorting 1M 32-bit integers using 2MiB of RAM. This is possible and I coded it up for fun a few years ago. Here's the specific encoding I used for the sorted list: A '0' bit followed by 12 bits of payload generates a number which is those 12 bits added to a global state counter. A '1' bit increments the global state counter by 2^12. It's easy to work out the memory requirements (it's constant for all sorted lists of the same length): 2^20 bits for the '1' bits (to reach a maximum value of 2^32) plus 10^6 * 13 bits for the integers. This is exactly 1,756,072 bytes, which leaves plenty of extra space.
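That encoding can be sketched in a few lines. This is my reading of the scheme described above (the function names are mine, and a Python list of bits stands in for the packed bitstream): a '0' bit plus a 12-bit payload emits base + payload, and a '1' bit bumps the base by 2^12, so values must arrive in sorted order.

```python
PAYLOAD_BITS = 12
STEP = 1 << PAYLOAD_BITS  # 4096

def encode(sorted_values):
    bits, base = [], 0
    for v in sorted_values:
        while v - base >= STEP:       # emit '1' bits to raise the base
            bits.append(1)
            base += STEP
        delta = v - base
        bits.append(0)                # '0' announces a 12-bit payload
        bits.extend((delta >> i) & 1 for i in reversed(range(PAYLOAD_BITS)))
    return bits

def decode(bits):
    out, base, i = [], 0, 0
    while i < len(bits):
        if bits[i] == 1:
            base += STEP
            i += 1
        else:
            delta = 0
            for b in bits[i + 1:i + 1 + PAYLOAD_BITS]:
                delta = (delta << 1) | b
            out.append(base + delta)
            i += 1 + PAYLOAD_BITS
    return out

vals = [5, 4100, 4100, 9000]
assert decode(encode(vals)) == vals
```

A real implementation would pack the bits into a bytearray, of course; the size arithmetic (2^20 '1' bits plus 13 bits per value) is unchanged.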
ETA: unfortunately a simple adaptation of this technique doesn't quite work for this problem; the optimum payload size for 1e6 integers up to 1e8 is 6 bits, but even then the representation takes about 1046 KiB, and the budget is 1024 KiB.
However, they explain that Golomb coding isn't quite enough for the 1M case and switch to arithmetic coding instead.
I reckon the easiest way to actually achieve that bound is to use the combinatorial number system. I've seen people refer to arithmetic coding, but I'm not sure of the exact details. They're probably encoding the gaps between numbers, and then there will be some loss since the gaps aren't IID; still, modelling them with an exponential distribution does seem to get you below the bound, even if it is extremely annoying to implement.
For the combinatorial number system you basically need to sum (x + k choose k + 1) over all numbers x, where k is the (0-indexed) position of x in the sorted list; the + k shift turns the sorted list into a strictly increasing one. This would have been many times easier if you didn't need to take possible repeated values into account, but it is what it is.
If I'm honest both are complicated enough that I worry they'll be nigh impossible to implement without accidentally using 2MB of RAM.
Edit: actually, just use Golomb-Rice codes for the gaps; if you pick the parameters right you'll need roughly d/128 + 7 bits for a gap of size d, which is good enough. Since every gap uses a whole number of bits, an insertion just means updating one gap and inserting a few bits, rather than basically rewriting the whole thing (although you'll still need to shift the entire tail by a few bits).
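A minimal Rice coder for the gaps, assuming the M = 128 parameter suggested above: write floor(d/128) in unary (that many 1s, then a 0), followed by 7 bits of d mod 128. Note this textbook variant costs floor(d/128) + 8 bits per gap, one more than the figure quoted above, so treat the accounting there as the author's own.

```python
K = 7           # remainder bits
M = 1 << K      # Rice parameter, 128

def rice_encode(gaps):
    bits = []
    for d in gaps:
        q, r = divmod(d, M)
        bits.extend([1] * q)        # quotient in unary
        bits.append(0)              # unary terminator
        bits.extend((r >> i) & 1 for i in reversed(range(K)))
    return bits

def rice_decode(bits):
    out, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == 1:         # read the unary quotient
            q += 1
            i += 1
        i += 1                      # skip the terminator
        r = 0
        for b in bits[i:i + K]:
            r = (r << 1) | b
        i += K
        out.append(q * M + r)
    return out

gaps = [0, 100, 300, 42]
assert rice_decode(rice_encode(gaps)) == gaps
```

Since each gap occupies a whole number of bits, inserting a new value only splits one gap's codeword into two and shifts the tail, as described above.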
This particular problem as stated on S.O. aside, do you consider disk-based sorting one of the tricks? I would be curious to hear some of the other tricks, as well as variants of this problem space.
There's another variant, iirc sorting telephone numbers, in "Programming Pearls," but it requires a different approach.
Might you or someone else have a link that states the whole interview question?
Not really, as all the input values are unique; counting wouldn't do any good.
It's basically using a bitfield long enough to encompass all possible values and then setting the bit at the input value's position.
As an example, if you are dealing with 3-digit numbers, the max possible value is 999 and the min is 0, so you need a bitfield that is 1,000 bits long. If your input values are 999, 2, 123 and 7, then you set the 999th, 2nd, 123rd and 7th bits. Now when you read the bitfield from the 0th bit to the 999th bit, you will find 4 set bits: the 2nd, the 7th, the 123rd and the 999th. Treating those bit positions as the values themselves gives you 2, 7, 123 and 999, which is sorted.
Another way to view it is as setting positions in an array. Instead of bits, let's say you have an array of 1,000 ints, all initialized to 0. Let's call this array MyBitArray. If your input values are 999, 2, 123 and 7, then
MyBitArray[999] = 1;
MyBitArray[2] = 1;
MyBitArray[123] = 1;
MyBitArray[7] = 1;
Now if you pass through the array in order and check for the indices with the value set to 1, you'll see that indexes 2, 7, 123 and 999 are set to 1, which gives you your sorted list.
The difference between the array example and the bit example is that the bit example is much more space efficient. But the overall idea is the same.
It's possible that I'm getting this wrong, but this is my understanding of it.
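For what it's worth, the bitfield version described above fits in a few lines of Python (assuming, as stated in the thread, that all input values are unique):

```python
def bitfield_sort(values, max_value):
    field = bytearray(max_value // 8 + 1)     # one bit per possible value
    for v in values:
        field[v >> 3] |= 1 << (v & 7)         # set bit number v
    # Scanning the field in order yields the set positions, i.e. the
    # original values, already sorted.
    return [v for v in range(max_value + 1)
            if field[v >> 3] & (1 << (v & 7))]

assert bitfield_sort([999, 2, 123, 7], 999) == [2, 7, 123, 999]
```

For the original problem (values up to 1e8) the field would need 1e8 bits, about 12 MB, which is why the thread's compressed encodings are needed instead.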
Is it possible for you to share the code? I still don't quite get how that encoding relates to / helps with sorting the numbers.
Since not all lists are sorted, there are more lists of a given size than sorted lists.
1. What was the inspiration behind that answer?
2. Have you stumbled upon other such quirky techniques elsewhere that rival yours?
3. Any interesting anecdote about someone who put your solution into production / a research paper / homework and later reached out to you for help?
Re 1: I'm not sure there was any particular inspiration. I've worked on quantum computing since 2004, so I guess I spend a lot of time thinking how to make computers in weird ways. Plus I spent college working at a number of ISPs doing tech support, so I used to be reasonably up on networks.
Re 2: Yep. In my field there is this idea of quantum sneaker-net. This is far more useful than my answer and potentially solves a major problem in the field I work in, but is also totally off the wall.
The basic idea is that you can make an extremely high speed, low latency network for quantum communication by shipping giant refrigerators around the world on ships (using a quantum effect and the existing [non-quantum] internet). The paper is here: https://www.nature.com/articles/srep36163
Re 3: Somebody here pointed out pingfs which seems to take the idea to a whole new extreme, but reading the website it seems like it was already being worked on in 2011. I doubt anyone implemented my answer in any way. It was somewhat intentionally impractical.
This sounds to me like...
A similar (but I think safer) approach might be to read ahead through all the packets to determine which values you can begin to send, then request retransmission. This requires the sender to play nice and buffer all the data for you, but that seems more realistic than assuming the router will do so.
The ratio between the cardinality of the codomain and that of the image is approximately 100 (10^8 possible values, 10^6 actual ones),
so I would try to convert the incoming numbers into pairs/tuples of a base N plus differences, which can be stored using fewer bits.
Even if you don't read the book, that first chapter, "Cracking the Oyster", is worth a read. It's a fun problem.
I'm not really smart enough to come up with such a solution, I'd probably just try and find a way to plug something in between the input or output Ethernet port that has a bit more memory.
The thing is, you could just not worry about the stream, as it is TCP/IP based. The core idea is to read a number from the stream, append it to the array that makes up the heap, and run heapsort. The only important thing is to keep the TCP/IP connection alive; if sorting takes too long, TCP will take care of retransmitting the last number sent.
This might not be super efficient though.
This could also be done in batches (4 KB? 8 KB?) -- we don't know the network speed or latencies, so we might tune that.
Just simple math: imagine that you have to store the numbers 0, 100, 200, ..., 99,999,900 as a worst-case scenario. Since you're storing the differences, you'll store "0" and then "100" 999,999 times. If you can represent "100" with 8 bits, you'll be able to keep your data inside the 1 MiB limit. One simple encoding: 1xxxxxxx for 7 payload bits; 01xxxxxxxx for 8 bits; 001xxxxxxxxx for 9 bits, etc. It should be enough for this task, although you could invent better encodings.
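A sketch of that prefix code, under my reading of it: n leading 0s and a 1 announce a (7 + n)-bit payload, so a gap of 100 (the common case here) costs 8 bits total.

```python
def encode_gap(d):
    n = 0
    while d >= (1 << (7 + n)):      # grow the payload until the gap fits
        n += 1
    bits = [0] * n + [1]            # n zeros then a 1 announce 7 + n bits
    bits.extend((d >> i) & 1 for i in reversed(range(7 + n)))
    return bits

def decode_gaps(bits):
    out, i = [], 0
    while i < len(bits):
        n = 0
        while bits[i] == 0:         # count the leading zeros
            n += 1
            i += 1
        i += 1                      # skip the 1
        d = 0
        for b in bits[i:i + 7 + n]:
            d = (d << 1) | b
        i += 7 + n
        out.append(d)
    return out

gaps = [100] * 5 + [3000]
bits = sum((encode_gap(d) for d in gaps), [])
assert decode_gaps(bits) == gaps
```

With an average gap of 100 across 999,999 gaps, most codewords are the 8-bit form, which is what keeps the total under 1 MiB.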
For example, starting with a base of zero: if the first number is 100, you can store it in 1 byte; use the high bit to flag a larger number, so if the next number is, say, 10000, you can store it in two bytes. But say the next number is 3000; then you need to update the first delta, insert the new number, and continue on. You can probably store most numbers that way.
So we need to compress the data first in a way we can still count it. So we take a sine function and mark multipliers where a number does not fit the sine; that way we spare at least 0.5 MB of RAM. Then we iterate over the range, multiply by the sine, and output the sorted numbers.
The rest of the RAM can be used for bitcoin mining, for example.