Hacker News new | past | comments | ask | show | jobs | submit login

Can you explain how you merge a compressed sorted stream with the sorted chunk, to obtain a bigger compressed stream (without extra memory, or quadratic complexity) ?

I can see a solution with memory usage : size of( compressed sorted stream ) + size of( compressed sorted chunk ) + size of(compressed stream merge output)

This would work but it means you need to fit your compressed sorted stream in less than a single MB instead of 2MB (and I'm ignoring memory fragmentation problems)




There are a bunch of ways to do this. One of the more straightforward is to use a temporary buffer big enough to hold both the new values and the existing stream, and start by shifting the existing stream to be right aligned (using memmove or the equivalent). Then read bits from the old stream while writing bits into the new stream. Using Elias-Fano coding (which I believe is an instance of Golomb coding), the write position will never overtake the read position, so the merge can be done in-place.


Thanks. I think I understood it now : https://news.ycombinator.com/item?id=27553076

In my solution though when the extra temporary buffer is very small,in the worst case, we need to make sizeof(compressedStream)/sizeof(temporaryBuffer) memmove which can potentially make the algorithm quadratic.


One method is to move the compressed output from the top to the bottom of the memory with each cycle:

A = Existing compressed sorted chunk; B = Compressed merge output; C = New sorted chunk

Start with:

AAAAAAAAAAAAAAAAAAAAAAAAAAA-----CCCCCC

Merge into B, start at top of A and working your way down:

AAAAAAAAAAAAAAAAAAAAAA-----BBBBBCCCC--

Finish this iteration like:

--------BBBBBBBBBBBBBBBBBBBBBBBB------

Then refill C and sort it:

--------BBBBBBBBBBBBBBBBBBBBBBBBCCCCCC

Merge B back into A, but forwards this time:

AAAAAAAAAA------BBBBBBBBBBBBBBBB---CCC

AAAAAAAAAAAAAAAAAAAAAAAAAAA-----------


Thanks, I got it. The main trick is the standard "interview" trick of writing from right to left.

I'm not sure your solution work though without memory moving, because in your second step if the values of A are larger than the values of C, the growing B will hit the non decreasing set A.

To convince my self, I wrote for myself the following sequence.

00000224446666666666888888_11133333355555

BytesOfTheCompressedStream_NewSortedChunk________________

BytesOfTheCompressedStream_NewSortedChunk________weNsetyB

OfTheCompressedStream_SortedChunk________________weNsetyB

OfTheCompressedStream_SortedChunk________detroSfOweNsetyB

TheCompressedStream_Chunk________________detroSfOweNsetyB

TheCompressedStream_Chunk_______detroSehTdetroSfOweNsetyB

CompressedStream_Chunk__________detroSehTdetroSfOweNsetyB

Here notice that we hit the existing streams so we need to memmove

CompressedStream_Chunk_esserpmoCdetroSehTdetroSfOweNsetyB

dStream_Chunk__________esserpmoCdetroSehTdetroSfOweNsetyB

dStream_Chunk____knuhCdesserpmoCdetroSehTdetroSfOweNsetyB

Stream_____maertSknuhCdesserpmoCdetroSehTdetroSfOweNsetyB

___________maertSknuhCdesserpmoCdetroSehTdetroSfOweNsetyB

Then reverse it in place and memmove




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: