Hacker News new | past | comments | ask | show | jobs | submit login
Improving Key Expiration in Redis (blog.twitter.com)
118 points by ngaut 9 days ago | hide | past | web | favorite | 14 comments

Note: Redis 6 expiration will no longer be based on random sampling but will take keys sorted by expire time in a radix tree, so all this problems are going away because of an architectural change. I talked about this in the latest 1/2 years in several conferences but never wrote anything. I'll follow up with an implementation inside the unstable branch soon since this need to stabilize and be tested a lot before Redis 6 goes stable.

Great to hear.

Do you have any insight in to why there's the drop in performance on the problematic line in the article?

The problematic code is:

    if (dbs_per_call > server.dbnum || timelimit_exit)
        dbs_per_call = server.dbnum;
The post focused a lot of microarchitectural issues, but I saw no mention of much bigger fish: is it really the case that the if statement always evaluates to false in Twitter’s use case? Knowing nothing about the code in question, I bet that the if condition, in fact, sometimes or always evaluates to true, and that the code doesn’t work as well when this happens.

Yes I wondered if `timelimit_exit was the cause as well.

I'm not sure that's possible in this case:

1. The || operator should short-circuit

2. I'm not sure how a jg would sanely be used to check what is effectively a boolean.

> The if statement compiled to 3 important instructions, mov, cmp and jg. Mov will load some memory into a register, cmp will compare two registers and set another based off the result and jg will do a conditional jump based off the value of another register.

Interesting puzzle. Is it possible that asking the compiler to make this jump likely or unlikely (GCC's __builtin_expect) to be taken might help?

> We took advantage of gcc’s __builtin_expect to change how the code was compiled. It didn’t make any difference in performance.

Ah yeah - missed that part on the first readthrough.

This is super interesting stuff, simply because of the thought process. Every once and a while you run into those tech problems that really push your limits and it's cool to see how they worked through it.

I'm always jealous when I read a blog post like this, because I can't imagine working in an environment where you can actually spend time investigating something or even start something without telling somebody how long it's going to take (give or take fifteen minutes) and the answer better not be more than four hours.

I'm jealous that they didn't adopt a solution that they didn't understand. So many times I've fixed unknown problems in terrible ways and not been given time to actually find the core problem.

Like 'filesystem access is slowing us down' so we move the filesystem in to RAM. Well, it's fast now, but why are we hitting disk so much?

That code line looked uncomfortably wrong:

    if (dbs_per_call > server.dbnum || timelimit_exit)
        dbs_per_call = server.dbnum;
So, there's a single database... what if its `server.dbnum` is zero? This is then setting `dbs_per_call` to zero? Is `dbs_per_call` a global variable? Etc.

Of course we don't have the wider code context here, but that if-statement looks like its the kind of check to exit a loop and it ought be followed by a `break` or `return` statement etc.

    if (dbs_per_call > server.dbnum || timelimit_exit)
        dbs_per_call = server.dbnum;
Based on http://download.redis.io/redis-stable/src/expire.c :

- `server.dbnum` appears to be a global

- `timelimit_exit` is a function static, and appears to always be one of {0, 1}. It's not obvious to me that it would 'prefer' either value.

- `dbs_per_call` (along with `timelimit_exit`) is used as part of the termination condition for the induction variable.

I guess that that I don't find it too surprising (in retrospect) that this conditional is expensive. Although, I must admit that I would have suspected timelimit_exit as being the source of the latency.

I do find it interesting that branching on the value of server.dbnum is much slower than using the same value in an unconditional assignment, given that the result is presumably used shortly later in another conditional as part of the for loop.

That was a fun read - I'd hazard a guess that the load from memory is not cached and will get started, then the compare is speculatively executed, but then the conditional jump and whatever follows won't fit into the cpu pipeline and will cause it to wait for the fetch to complete.

One thing you might try is to add a superfluous fetch of the same memory location a few hundred instructions before to get it going before you hit the jump, so it's cached already by that time.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact