
Realtime Metrics for 128 Million Users with Redis: 50ms + 16MB RAM - pooriaazimi
http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps
======
firefoxman1
This was posted a while ago, and I have since implemented bitmaps myself. One
thing I learned from the documentation[1] is that setting an initial bit at a
very high offset (like 2^30 - 1) forces Redis to allocate the entire string up
front, which takes a while (compared to Redis's normal speed) and blocks other
operations while it happens.

In my case, and it appears to be true for Spool too, I don't know which bit
will be set first. It could be 12 or it could be 2938251, so to prevent a
slowdown when the initial bit lands at a high offset, I use buckets of
bitmaps, each holding around 8 million bits.

[1] See the _Warning_ section: <http://redis.io/commands/setbit>
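
For anyone who wants to see the effect, here's a minimal sketch, assuming a
local Redis and the redis-py client (the key name is made up and the timings
are illustrative):

    import time
    import redis
    
    r = redis.Redis()
    
    # The first SETBIT at a high offset forces Redis to allocate the
    # entire ~128 MB string up front, so there is a noticeable pause.
    start = time.time()
    r.setbit("stall-demo", 2**30 - 1, 1)
    print("first high bit: %.3fs" % (time.time() - start))
    
    # Once allocated, further SETBITs on the same key are fast.
    start = time.time()
    r.setbit("stall-demo", 12, 1)
    print("second bit:     %.3fs" % (time.time() - start))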

~~~
antirez
There is an easy fix for this, which is also the recommended way to use
bitmaps in Redis: split your bitmap among multiple keys.

For instance, say you want to set bit i and you want k bits per key; then you
do:

        keyname = "bitmap:" + (i / k)   # i/k is integer division
        keybit  = i % k

k can be fairly large, say 128 KB worth of bits per key (k = 2^20). Each value
is still small, but big enough that the per-key overhead is negligible.
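
A sketch of that recipe in Python with redis-py (the "bitmap:" prefix comes
from the pseudocode above; the function names and the choice K = 2^20 bits,
i.e. 128 KB per key, are mine):

    import redis
    
    r = redis.Redis()
    K = 2 ** 20  # bits per key, i.e. 128 KB of bitmap per key
    
    def set_user_bit(i, value=1):
        # spread the logical bitmap over keys "bitmap:0", "bitmap:1", ...
        r.setbit("bitmap:%d" % (i // K), i % K, value)
    
    def get_user_bit(i):
        return r.getbit("bitmap:%d" % (i // K), i % K)
    
    def count_user_bits(max_i):
        # counting set bits means summing BITCOUNT over every bucket
        return sum(r.bitcount("bitmap:%d" % b) for b in range(max_i // K + 1))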

~~~
firefoxman1
Wow, that's almost exactly how I implemented it:

    var bucketSize = 8190;
    
    ...
    
    var bucketNumber = Math.floor(userId / bucketSize),
        bitInBucket = userId % bucketSize;

...a correction to my last comment: it looks like I use ~8 thousand bits per
bucket, not 8 million.

------
latch
Repost from <https://news.ycombinator.com/item?id=3292542> if anyone's
interested in reading the comments from then.

------
mattlong
I know disk space is cheap these days, but at 16 MB/metric/<level-of-
granularity>, it seems like your metrics dataset would grow pretty quickly.
With just 10 metrics tracked daily, that's another gigabyte per week. Of
course, it does come with the benefit of keeping all the raw data, since you
never roll up or aggregate anything... so the pros probably outweigh that
con. :)
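
For reference, the arithmetic behind those numbers:

    128,000,000 users / 8 bits per byte = 16 MB per bitmap
    16 MB x 10 daily metrics x 7 days  ≈ 1.1 GB per week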

~~~
kijin
Redis stores everything in RAM, and RAM is not as cheap as disk. Adding GBs
of RAM every week will quickly get rather expensive. But I guess you could
dump old data to disk and load it back into Redis only when you need it. It
might even compress well, depending on what the metrics track.
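
Since a Redis bitmap is just a string value, a minimal sketch of that
archiving idea (assuming redis-py; the key/file handling is made up, and
sparse bitmaps should gzip well):

    import gzip
    import redis
    
    r = redis.Redis()
    
    def archive(key, path):
        # GET returns the raw bitmap bytes; gzip them to disk, then evict
        raw = r.get(key) or b""
        with gzip.open(path, "wb") as f:
            f.write(raw)
        r.delete(key)
    
    def restore(key, path):
        # load the bytes back; SETBIT/BITCOUNT work on the key as before
        with gzip.open(path, "rb") as f:
            r.set(key, f.read())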

~~~
ewb
"But I guess you could dump old data to disk and load it back to Redis only
when you need it."

Redis has a mode that does this automatically, I believe (and it's the
default, if I remember correctly).

~~~
Erwin
Isn't Redis still single-threaded for queries, but saving in the background?
That seems a little risky: you've got your 100 million users setting bits in
your bitsets and suddenly everything blocks for 10 seconds while old data is
being loaded from disk.

------
reitzensteinm
It's not often you see 16MB these days and it turns out not to be a typo.

------
mattparlane
The only problem with this method is that it requires IDs that are integers,
start at 1, and increment by 1.

I'm using MongoDB, where IDs are 12-byte ObjectIds whose first four bytes are
a timestamp. Does anyone know of a way to make this method work, ideally
without adding another field to the collection?

~~~
simonw
The comments on the article address this - the OP is using UUIDs as the
primary key for their users, but each user is also assigned an "analytics key"
which is an integer that started at one. You can even use the redis INCR
command to generate these on demand.
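
A sketch of what that mapping might look like (assuming redis-py; the key
names are made up, and HSETNX guards against two clients allocating different
IDs for the same user):

    import redis
    
    r = redis.Redis()
    
    def analytics_id(primary_key):
        # map an arbitrary primary key (UUID, ObjectId, ...) to a dense int
        existing = r.hget("analytics:ids", primary_key)
        if existing is not None:
            return int(existing)
        candidate = r.incr("analytics:next_id")  # first call returns 1
        if r.hsetnx("analytics:ids", primary_key, candidate):
            return candidate
        # another client won the race; use the id it stored instead
        return int(r.hget("analytics:ids", primary_key))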

------
ericd
Kind of a disingenuous title, since that time sort of implies that Redis is
handling that many users and that that's the average response time...

~~~
pooriaazimi
Sorry! Although I'd have thought the word 'Metrics' would dispel that
implication, I can understand what you mean.

I hope it's clearer now.

was:

    Realtime Metrics with Redis: 128 Million Users + 16MB RAM = 50ms

is:

    Realtime Metrics for 128 Million Users with Redis: 50ms + 16MB RAM

~~~
ericd
I still think that those user numbers in the title evokes a mental image of a
certain type of load with a certain type of response time. I think if you got
rid of the response time, it would be less linkbaity, because then it's clear
that your focus is on the amount of storage it would take. It's not very
important either way.

