
Redis latency spikes and the 99th percentile - r4um
http://antirez.com/news/83
======
aaronharnly
A coworker pointed me to this interesting post noting that in a browser-facing
web server (not the same as hits to a datastore), most users will experience
the 99th percentile for at least one resource on the page.

[http://latencytipoftheday.blogspot.it/2014/06/latencytipofth...](http://latencytipoftheday.blogspot.it/2014/06/latencytipoftheday-
most-page-loads.html)

~~~
baruch
This is only true if these latency outliers are evenly distributed over time.
If they indeed happen once every 30 minutes, then only the users requesting a
page at that moment will be affected.

~~~
aaronharnly
Excellent point, yes.

------
willvarfar
A nice follow-up, digging into the details of yesterday's "This is why I
can't have conversations using Twitter" post:

[https://news.ycombinator.com/item?id=8526208](https://news.ycombinator.com/item?id=8526208)

~~~
hardwaresofton
Agree.

I really wanted to comment "just always make a blog post in reaction, twitter
is not a good medium for explaining complex ideas" on that thread, but thought
it would be unproductive.

Please just do this every time, people who care will read, people who don't
care will probably not comment for fear of looking stupid.

------
perlgeek
Why is forking on xen slow? Most google hits for "xen forking slow" seem to
point to some discussions about redis, but I guess other software would suffer
from that too.

~~~
nknighthb
It's important to remember two things: First, in a more general way, fork() is
kind of slow just by virtue of being a system call and involving several
steps, so applications try to avoid making lots of calls to fork (e.g.
webservers long ago stopped doing the naïve fork-for-every-request model).

Second, Redis uses fork in a way nobody else seems to. Applications using
tens of gigabytes of RAM -- databases, media editing, etc. -- just don't
usually fork except during a startup process.

Redis uses fork for persistence. It's clever, and it works great in general,
but it's weird! Unique, even. It seems much more likely to hit a fork()
weakness in a noticeable way than almost any other application out there.

~~~
JoeAltmaier
Fork has always been a mutant feature. Split the entire process into twins?
So that the caller can immediately exec() and discard the tediously cloned
twin and become a different program? It's the most egregiously inefficient
feature ever to grace an operating system. Page tables, stacks, heap
allocations, file handles - all cloned and then discarded.

~~~
antirez
In this regard, Redis is the first system using it the proper way, I guess
;-) I mean, all this copying is not discarded but used to create a
point-in-time snapshot.

~~~
mcguire
No, every Unix command shell uses the copied program and file descriptors.

------
eamsen
Side note: you can set your Y axis format to "ms" in Grafana to make the
values more descriptive, and add the "Points" chart option under display
styles to make the mean values visible, which are obscured by the 90th
percentile bar in your chart. Also, I assume the label is wrong: it says 90th
percentile in the chart, but you speak of the 99th percentile.

~~~
antirez
The graphs are not mine, they're from Stripe engineers, so I have no control
over how they are generated. About 90th vs 99th: I was talking about the
99th, but in the case of Redis latency spikes due to fork you would get
exactly the same graph, as _all_ the requests are delayed at that moment.

~~~
eamsen
Missed the "Stripe blog post" part, sorry for the misdirection.

~~~
antirez
np at all, thanks.

------
arielweisberg
I'm kind of glad Redis did the fork approach first. It's the reason I went
with a userspace COW implementation in my work instead of forking and that
paid huge dividends. It's the difference between starting COW in 10-20
milliseconds versus seconds and most of that time is distributed coordination
not flipping the COW boolean.

When you crank up the density per node to 256 or 512 gigabytes, even bare
metal is problematic, and in some domains, like telecommunications, they
don't care that the spikes are concentrated, because the spikes cause
cascading failures.

I think a userspace COW implementation in Redis would be a big project,
because you would need a different strategy for every data structure. Being
single-threaded also makes it challenging to get other/more cores to do the
serialization and IO. It's very doable, just not within the current
philosophy of thou shalt have one thread per process.

~~~
eldavido
I think "userspace COW" is the wrong approach here.

The entire idea of getting a point-in-time snapshot of something subject to
rapid change is problematic. Your options boil down to (1) make a "snapshot"
and save that (Redis's current approach) or (2) accept that point-in-time
consistency might be impossible, and work around it.

I wonder whether it'd be possible to have two persistence strategies in
Redis: "consistent" and "low-latency". "Consistent" would use the current
fork(2) COW behavior; "low-latency" would do some kind of one-chunk-at-a-time
block copy and amortize the latency spike over the entire operation, leaving
less of a "cliff" in latency overall.

------
r4um
fork test by redis labs (2012) [https://redislabs.com/blog/testing-fork-time-
on-awsxen-infra...](https://redislabs.com/blog/testing-fork-time-on-awsxen-
infrastructure#.VFJNvtaiPEI)

~~~
antirez
cc1.4xlarge is on par with bare metal apparently...

~~~
pja
That would make sense if the other EC2 Xen hosts were running PV guests (as
was required for the smaller/older EC2 hosts, IIRC?) & the cc1.4xlarge was
PVHVM, according to: [http://lists.xen.org/archives/html/xen-
devel/2012-06/msg0102...](http://lists.xen.org/archives/html/xen-
devel/2012-06/msg01021.html)

~~~
bbgm
cc1.4xlarge was the first Linux HVM EC2 instance. All new families support
HVM (and some are HVM-only).

 _Should probably add a disclosure here. I ran the EC2 instance product
management team for several years and continue to be involved._

~~~
pja
Looks like a smoking gun then.

------
toddh
Curious, why fork in the main thread? Forking traditionally is a pretty
heavyweight operation. Perhaps versioning might be more performant?

~~~
eldavido
Redis uses a reactive architecture built on non-blocking I/O. It forks to
get a point-in-time consistent snapshot that can be written to disk. The
problem is that fork() blocks, and while it's blocked it stalls the event
loop, delaying all incoming requests.

~~~
fulafel
In Unix, "blocking" means going into I/O wait. In this case fork() is just
slow when emulated ("paravirtualized") by Xen.

------
Beltiras
I know that redis really wants to be a persistent kvstore. I had a problem
with a large website when I increased caching by 2 orders of magnitude (enough
RAM to play with). When it came time to write a snapshot to disk, everything
died for 5 minutes. Turned it off and haven't thought about it since. I'm not
sure I'll ever shed my RDBMS predilections.

------
eldavido
This is a huge problem on wall street, where trades must have predictable
latencies. Stop-the-world garbage collectors are another source of latency
"catastrophes".

------
personZ
Does fork not use copy-on-write? I would expect it to add overhead to all
overlapping memory operations going forward, but I'd be really surprised if
it literally duplicated the entire memory contents.

~~~
giuliano108
Memory fragmentation seems to play a role too. Even if you have (barely)
enough free RAM, machines can start swapping and a Redis server in that state
is bound to wake you up at night... :(

~~~
xxs
If you run a server with anything but "swapoff -a" you're doing it wrong (or
even with any swap partition in /etc/fstab). If the server has to swap, get
more RAM or scale your stuff out, but never swap.

