
Improving Key Expiration in Redis - ngaut
https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/improving-key-expiration-in-redis.html
======
antirez
Note: Redis 6 expiration will no longer be based on random sampling but will
take keys sorted by expire time in a radix tree, so all this problems are
going away because of an architectural change. I talked about this in the
latest 1/2 years in several conferences but never wrote anything. I'll follow
up with an implementation inside the unstable branch soon since this need to
stabilize and be tested a lot before Redis 6 goes stable.

~~~
anitil
Great to hear.

Do you have any insight in to why there's the drop in performance on the
problematic line in the article?

------
amluto
The problematic code is:

    
    
        if (dbs_per_call > server.dbnum || timelimit_exit)
            dbs_per_call = server.dbnum;
    

The post focused a lot of microarchitectural issues, but I saw no mention of
much bigger fish: is it _really_ the case that the if statement always
evaluates to false in Twitter’s use case? Knowing nothing about the code in
question, I bet that the if condition, in fact, sometimes or always evaluates
to true, and that the code doesn’t work as well when this happens.

~~~
plasma
Yes I wondered if `timelimit_exit was the cause as well.

~~~
ShroudedNight
I'm not sure that's possible in this case:

1\. The || operator should short-circuit

2\. I'm not sure how a jg would sanely be used to check what is effectively a
boolean.

------
mmastrac
> The if statement compiled to 3 important instructions, mov, cmp and jg. Mov
> will load some memory into a register, cmp will compare two registers and
> set another based off the result and jg will do a conditional jump based off
> the value of another register.

Interesting puzzle. Is it possible that asking the compiler to make this jump
likely or unlikely (GCC's __builtin_expect) to be taken might help?

~~~
lttlrck
> We took advantage of gcc’s __builtin_expect to change how the code was
> compiled. It didn’t make any difference in performance.

~~~
mmastrac
Ah yeah - missed that part on the first readthrough.

------
Raidion
This is super interesting stuff, simply because of the thought process. Every
once and a while you run into those tech problems that really push your limits
and it's cool to see how they worked through it.

------
commandlinefan
I'm always jealous when I read a blog post like this, because I can't imagine
working in an environment where you can actually spend time investigating
something or even start something without telling somebody how long it's going
to take (give or take fifteen minutes) and the answer better not be more than
four hours.

~~~
anitil
I'm jealous that they didn't adopt a solution that they didn't understand. So
many times I've fixed unknown problems in terrible ways and not been given
time to actually find the core problem.

Like 'filesystem access is slowing us down' so we move the filesystem in to
RAM. Well, it's fast now, but why are we hitting disk so much?

------
willvarfar
That code line looked uncomfortably wrong:

    
    
        if (dbs_per_call > server.dbnum || timelimit_exit)
            dbs_per_call = server.dbnum;
    

So, there's a single database... what if its `server.dbnum` is zero? This is
then setting `dbs_per_call` to zero? Is `dbs_per_call` a global variable? Etc.

Of course we don't have the wider code context here, but that if-statement
looks like its the kind of check to exit a loop and it ought be followed by a
`break` or `return` statement etc.

------
ShroudedNight

        if (dbs_per_call > server.dbnum || timelimit_exit)
            dbs_per_call = server.dbnum;
    

Based on [http://download.redis.io/redis-
stable/src/expire.c](http://download.redis.io/redis-stable/src/expire.c) :

\- `server.dbnum` appears to be a global

\- `timelimit_exit` is a function static, and appears to always be one of {0,
1}. It's not obvious to me that it would 'prefer' either value.

\- `dbs_per_call` (along with `timelimit_exit`) is used as part of the
termination condition for the induction variable.

I guess that that I don't find it too surprising (in retrospect) that this
conditional is expensive. Although, I must admit that I would have suspected
timelimit_exit as being the source of the latency.

I do find it interesting that branching on the value of server.dbnum is much
slower than using the same value in an unconditional assignment, given that
the result is presumably used shortly later in another conditional as part of
the for loop.

------
terryf
That was a fun read - I'd hazard a guess that the load from memory is not
cached and will get started, then the compare is speculatively executed, but
then the conditional jump and whatever follows won't fit into the cpu pipeline
and will cause it to wait for the fetch to complete.

One thing you might try is to add a superfluous fetch of the same memory
location a few hundred instructions before to get it going before you hit the
jump, so it's cached already by that time.

