

10,000 ints in 1 page of memory (4KB) - phamilton
http://undiscoveredfeatures.blogspot.com/2010/12/10000-ints-in-1-page-of-memory-4kb.html

======
amccollum
Yeah, the title is misleading because the problem described in the blog post
is really to store some unknown number of ints between 1 and 10,000. A better
solution than a straight bitmap of 10,000 bits is to use more of the 4KB and
devote three bits to each slot. Then you can remember up to 7 occurrences of
each integer.
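
A rough sketch of the 3-bit-per-slot idea, in Python. The names (`page`,
`record`, `get_count`, `set_count`) and the choice to saturate at 7 rather than
wrap around are assumptions made for illustration, not something the blog post
specifies.

```python
PAGE_BYTES = 4096          # one 4KB page
MAX_VALUE = 10_000         # integers are assumed to lie in 1..10,000
BITS = 3                   # bits per counter, so each counter holds 0..7
CAP = (1 << BITS) - 1      # 7, the largest count a slot can hold

# 10,000 slots * 3 bits = 30,000 bits = 3,750 bytes, so the counters fit.
page = bytearray(PAGE_BYTES)


def get_count(page, n):
    """Return the stored count for integer n (1-based)."""
    byte, shift = divmod((n - 1) * BITS, 8)
    # Read two bytes so a counter that straddles a byte boundary is covered.
    window = page[byte] | (page[byte + 1] << 8)
    return (window >> shift) & CAP


def set_count(page, n, count):
    """Overwrite the 3-bit counter for integer n with count (0..7)."""
    byte, shift = divmod((n - 1) * BITS, 8)
    window = page[byte] | (page[byte + 1] << 8)
    window = (window & ~(CAP << shift)) | (count << shift)
    page[byte] = window & 0xFF
    page[byte + 1] = (window >> 8) & 0xFF


def record(page, n):
    """Count one occurrence of n, saturating at 7 (an assumed policy)."""
    current = get_count(page, n)
    if current < CAP:
        set_count(page, n, current + 1)
```

Calling `record(page, 5)` three times and then `get_count(page, 5)` gives 3;
anything past the seventh occurrence is silently dropped in this sketch.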

There's no need for bloom filters here because you can already cover all the
possibilities using a bitmap.

~~~
yish
Additionally you could decide to only store up to 6 occurrences of each
integer in the bit array, use the value of 7 as an indicator that there are 7
or more occurrences, and store the actual counts for those cases in a sorted
array occupying the 346 bytes left over in the 4KB page.
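
Something like the following, as a sketch only. The 4-byte entry layout
(2-byte value plus 2-byte count) is an assumption, as are all the names; plain
Python lists stand in for the packed 3-bit array and the overflow region, and
the 2-byte limit on each overflow count is not modelled.

```python
import bisect

OVERFLOW_BYTES = 4096 - (10_000 * 3) // 8    # 346 bytes left after the counters
ENTRY_BYTES = 4                              # assumed: 2-byte value + 2-byte count
MAX_ENTRIES = OVERFLOW_BYTES // ENTRY_BYTES  # 86 entries under that assumption
SENTINEL = 7                                 # 3-bit value meaning "look in overflow"

counters = [0] * 10_001   # stand-in for the packed 3-bit array; index 0 unused
overflow_keys = []        # sorted integers whose count is 7 or more
overflow_counts = []      # actual counts, parallel to overflow_keys


def record(n):
    """Count one occurrence of n, spilling into the overflow table at 7."""
    if counters[n] < SENTINEL - 1:
        counters[n] += 1                      # result still fits in 0..6
        return
    if counters[n] == SENTINEL - 1:           # going from 6 to 7: spill
        if len(overflow_keys) >= MAX_ENTRIES:
            raise MemoryError("overflow table full")
        i = bisect.bisect_left(overflow_keys, n)
        overflow_keys.insert(i, n)
        overflow_counts.insert(i, SENTINEL)
        counters[n] = SENTINEL
        return
    i = bisect.bisect_left(overflow_keys, n)  # already in the overflow table
    overflow_counts[i] += 1


def count(n):
    """Return how many times n has been recorded."""
    if counters[n] < SENTINEL:
        return counters[n]
    return overflow_counts[bisect.bisect_left(overflow_keys, n)]
```

Under these assumptions only 86 distinct integers can exceed 6 occurrences
before the page is full, so this extends the 3-bit scheme rather than making
it general.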

------
copper
This looks like a very simplified bloom filter made with a very bad hash
function. Mind you, though, I am pretty sure I'd never rediscover that
particular data structure by myself :)

------
podperson
The problem is inadequately defined.

Basically it depends on whether you expect to have to record no more than
32768 / 14 = 2340 or so values (14 bits per integer), in which case simply keep
a sorted list of the actual integers using a sorting/insertion algorithm of
your choice, or you expect to receive fewer than 8 occurrences of a given
integer, in which case use a 10,000 x 3-bit deep bitmap, or you don't care how
many occurrences of each integer there were, in which case use a 10,000 x
1-bit deep bitmap.

In any event the unintelligible and buggy proposed solution doesn't seem like
a good choice.

------
Jabbles
I'm confused. Is 10000 the number of integers or the maximum integer that has
to be stored?

I would have thought it (in general) impossible to store more than
2^15/log2(10000)~2500 integers between 1 and 10000 in 4kB, but I look forward
to hearing more.
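
The arithmetic behind that estimate, as a quick Python check. It assumes each
value is encoded independently and costs log2(10000) bits; the variable names
are just for illustration.

```python
import math

PAGE_BITS = 4096 * 8                    # 32,768 bits, i.e. 2**15
bits_per_value = math.log2(10_000)      # about 13.29 bits to name one value
capacity = PAGE_BITS / bits_per_value   # values that fit if each is stored separately

print(round(bits_per_value, 2), int(capacity))
```

That works out to roughly 2,466 values, which is where the ~2500 comes from.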

~~~
Fargren
If you know the maximum integer you can receive, you can do a sort of bucket
sort, where the Nth bit in the page represents whether you have received
number N. So you can store as many numbers as you have bits available.
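
A minimal sketch of that in Python, assuming the integers lie in 1..10,000 and
duplicates don't need to be counted. The function names are made up for the
example.

```python
bitmap = bytearray(4096)   # 32,768 bits; only the first 10,000 are used


def mark(n):
    """Set bit n-1 to note that integer n has been received."""
    byte, bit = divmod(n - 1, 8)
    bitmap[byte] |= 1 << bit


def seen(n):
    """Return True if integer n has been received at least once."""
    byte, bit = divmod(n - 1, 8)
    return bool(bitmap[byte] & (1 << bit))


def sorted_values(limit=10_000):
    """Walk the bits in order, which yields the received integers sorted."""
    return [n for n in range(1, limit + 1) if seen(n)]
```

Marking 42, 7, 42 and 10000 and then calling `sorted_values()` returns
`[7, 42, 10000]`: the output is sorted for free, and the second 42 is lost.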

~~~
Jabbles
Well obviously. That idea is not (IMO) worthy of a news post. But it doesn't
say that the integers are distinct. I'd be interested to hear if it was
possible to store numbers _in general_ better than above.

~~~
bdonlan
If the integers cannot be guaranteed to be distinct, and all of them must be
presented at the end, then no algorithm which restricts itself to O(1) memory
usage can be correct. Since the problem restricts you to using 4KB of RAM
(O(1)), and there is presumably a solution, you can conclude that you need not
count non-distinct inputs - but yeah, it's kind of a bad problem description.

~~~
Jabbles
I'm not sure you can change the question just because there is no answer :P

4KB of RAM is not the same as an O(1) memory requirement, since we have
already restricted the number and range of integers to 10000 (i.e.
constants), so there is nothing left to vary. The page size just presents a
target ratio for compression, not an asymptotic restriction on memory usage.

