

Save Redis memory with large list compression - glebm
http://blog.glebm.com/2013/08/31/redis-compressed-lists.html

======
ihsw
Tumblr made a blog post[1] about exploiting similar compression techniques.

Using that technique, I've personally had great success storing data in hashes
by JSON-encoding it beforehand. Normally it would be like so:

    
    
        HSET user:159784623 username ihsw
        HSET user:159784623 email blahblah@gmail.com
        HSET user:159784623 created_at 1377986411
    

But instead it's like so:

    
    
        HSET user:156039 687 {"username":"ihsw","email":"blahblah@gmail.com","created_at":1377986411}
    

Here we divide the data into "buckets" of size 1024: given an ID of
159784623, the resulting bucket ID is 156039 and the remainder is 687.

    
    
        id = 159784623
        bucket_size = 1024
        remainder = id % bucket_size
        bucket_id = (id - remainder) / bucket_size
    

Using this I've been able to reduce memory usage anywhere from 40% to 80%
(yes, 80%), depending on the compressibility of the data (the length and
randomness of each hash item).
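
In code, the whole round trip looks something like this (a minimal ruby
sketch using the redis and json gems against a local redis; the key scheme
and helper names are just made up for illustration):

    require "redis"
    require "json"

    BUCKET_SIZE = 1024

    # split an id into a bucket id and a remainder, exactly as above
    def bucket(id, size = BUCKET_SIZE)
      remainder = id % size
      [(id - remainder) / size, remainder]
    end

    def save_user(redis, id, attrs)
      bucket_id, remainder = bucket(id)
      redis.hset("user:#{bucket_id}", remainder, JSON.generate(attrs))
    end

    def load_user(redis, id)
      bucket_id, remainder = bucket(id)
      json = redis.hget("user:#{bucket_id}", remainder)
      json && JSON.parse(json)
    end

    redis = Redis.new
    save_user(redis, 159784623,
              "username" => "ihsw",
              "email" => "blahblah@gmail.com",
              "created_at" => 1377986411)
    load_user(redis, 159784623) # => {"username"=>"ihsw", ...}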

I've also been replacing dictionary keys with integers and it further reduces
the size of the data being stored by an additional ~30%.

    
    
        HSET user:156039 687 {"0":"ihsw","1":"blahblah@gmail.com","2":1377986411}
    

It shouldn't be underestimated how much impact these simple techniques can
have, especially when the gains are this considerable. JSON-encoded data can
be quite verbose, so CSV may add further gains, but JSON can accommodate
missing keys.
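
The integer-key trick is easy to hide behind a pair of helpers (again a
sketch; the field list and function names are hypothetical), and it also
shows how missing keys simply drop out of the encoded JSON:

    require "json"

    # fixed, ordered field list; the array index doubles as the integer key
    FIELDS = %w[username email created_at].freeze

    def shrink(attrs)
      pairs = FIELDS.each_with_index.map { |f, i| [i.to_s, attrs[f]] }
      JSON.generate(pairs.reject { |_, v| v.nil? }.to_h)
    end

    def expand(json)
      JSON.parse(json).map { |i, v| [FIELDS[i.to_i], v] }.to_h
    end

    shrink("username" => "ihsw", "created_at" => 1377986411)
    # => {"0":"ihsw","2":1377986411}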

Lists and sets can also accommodate "bucketing" of data, however that comes
with the added complexity of accommodating the variety of redis commands for
those data structures (BLPOP, SADD, SDIFF, etc.).
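
For instance, a bucketed set of integer ids could look like this (a sketch
with made-up helper names; note that anything like SDIFF across buckets then
has to loop over every bucket at the application layer):

    require "redis"

    # membership tests stay a single round trip per id
    def sadd_bucketed(redis, key, id, bucket_size = 1024)
      bucket_id, remainder = id.divmod(bucket_size)
      redis.sadd("#{key}:#{bucket_id}", remainder)
    end

    def sismember_bucketed(redis, key, id, bucket_size = 1024)
      bucket_id, remainder = id.divmod(bucket_size)
      redis.sismember("#{key}:#{bucket_id}", remainder)
    end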

[1] [http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value-pairs](http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value-pairs)

~~~
nemothekid
I think the blog post you referenced was by Instagram, not Tumblr. I remember
reading it and doing something similar. They also found that doubling the
`hash-max-ziplist-entries` option from 512 to 1024 didn't cause much CPU
load.
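
For reference, those thresholds can be set in redis.conf or at runtime; the
first value below mirrors the number discussed above, the second is redis's
default cap on each field/value's size (newer redis versions have renamed
"ziplist" to "listpack" in these option names):

    CONFIG SET hash-max-ziplist-entries 1024
    CONFIG SET hash-max-ziplist-value 64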

Something else you can try is encoding with msgpack rather than JSON. You can
save 30% relative to the original JSON string without losing the key names.
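
Something like this, with the msgpack gem (a quick sketch; the exact savings
depend on your data):

    require "json"
    require "msgpack"

    user = { "username" => "ihsw",
             "email" => "blahblah@gmail.com",
             "created_at" => 1377986411 }

    json   = JSON.generate(user) # text, human-readable
    packed = user.to_msgpack     # binary, but the key names survive

    json.bytesize                # bytes as JSON
    packed.bytesize              # typically noticeably smaller

    MessagePack.unpack(packed)   # => the original hash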

~~~
ihsw
Indeed it was Instagram, I stand corrected.

I'm aware of msgpack, and I was under the impression that it doesn't handle
UTF data very well; however, that was some time ago and I may be
misremembering. Msgpack libraries are also supposed to be quite a bit faster
at encoding/decoding, right?

------
neuroscr
It's not clear if the wallclock savings were consistent over multiple runs.

And moving one lookup out of redis into ruby doesn't seem like the right
thing to do. It increases complexity by requiring an application layer to
process the data source.

I'd like to see how this compares with simply increasing
list-max-ziplist-entries.

~~~
glebm
The times were consistent; clone the repo and run `rake bm` to try it.
max-ziplist cannot be set to the order of millions, since redis has to unpack
the list on every access and you wouldn't get any range-lookup boost. You can
implement this in redis lua if app-layer separation is a concern.
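
Roughly like this, as an untested sketch (the key scheme matches the
bucketing example upthread; note that building key names inside the script
isn't cluster-safe):

    LOOKUP = <<~LUA
      local id        = tonumber(ARGV[1])
      local size      = tonumber(ARGV[2])
      local remainder = id % size
      local bucket_id = (id - remainder) / size
      return redis.call('HGET', 'user:' .. bucket_id, remainder)
    LUA

    redis.eval(LOOKUP, argv: [159784623, 1024])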

~~~
neuroscr
Ah in that context, this makes more sense. Thanks for the reply.

------
DrJosiah
Redis doesn't compress the contents of lists (or hashes or zsets) in a
"ziplist"; it packs them in a concise manner that omits structure overhead.

Source: the source code of Redis itself:
[https://github.com/antirez/redis/blob/unstable/src/ziplist.c](https://github.com/antirez/redis/blob/unstable/src/ziplist.c),
the Redis documentation:
[http://redis.io/topics/memory-optimization](http://redis.io/topics/memory-optimization),
and/or my book: [http://bitly.com/redis-in-action](http://bitly.com/redis-in-action).

~~~
glebm
Thanks, so it's packing, not compressing; I've updated the article
accordingly.

