
Defeat your 99th percentile with speculative task - jrpelkonen
https://bjankie1.github.io/blog/
======
colonelxc
"The percentile values we observed had quite unusual distribution" It is
pretty normal to have a long tail in the last few percentile/tenths.
Fortunately, most latency monitoring tools do account for this now. Another
gotcha is that 99.9 isn't the end of your tail either. Sometimes looking at
the "100th percentile" request isn't useful/such an outlier, but you should
know it exists.

(regarding "Speculative task") Also called "Hedged Request" here in an article
called "The Tail at Scale"[1]

[1] http://www-inst.eecs.berkeley.edu/~cs252/sp17/papers/TheTailAtScale.pdf

~~~
zzzcpan
Note that overhead that matters for Google doesn't matter as much for most
other companies. Don't overcomplicate things just because Google does it like
this.

------
spullara
I have found that in many distributed systems the latency of top-level
requests follows a log-normal distribution, which sounds like what you are
describing. By tuning when you launch backup requests, you can trade off
overhead against reduced tail latencies. After experimenting with it at
Twitter, I then found a Jeff Dean paper from Google describing it as well. I
haven't been able to find that paper again, though. Here is his presentation
where he discusses backup requests:

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44875.pdf
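
To make the trade-off concrete, here is a rough sketch (my own illustration,
not from the paper or slides) of deriving the backup-request delay from a
sample of observed latencies. Hedging at the 95th percentile means only about
5% of requests ever launch a second copy:

    import Data.List (sort)
    
    -- Illustrative only: pick the delay before firing a backup request as the
    -- p-th percentile of observed latencies (e.g. p = 0.95), so roughly (1 - p)
    -- of requests ever trigger a backup.  Assumes a non-empty sample.
    hedgeDelay :: Double -> [Double] -> Double
    hedgeDelay p latencies = sorted !! idx
      where
        sorted = sort latencies
        idx    = min (length sorted - 1) (floor (p * fromIntegral (length sorted)))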

~~~
hnaccy
> Non-intuitive: remove capacity under load to improve latency (?!)

Does he elaborate on this?

~~~
spullara
The most common case I can think of is when you have local caches and randomly
distribute traffic amongst your servers. The hit rate on the cache improves as
you reduce the number of machines. If the cache expires before the same
machine is hit again, you get no benefit from the cache at all. Even when you
are hitting the cache, you are doing more original requests (one cache fill
per server) as you add more servers.
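
A rough back-of-the-envelope model (my own illustration, with made-up
parameter names):

    -- Illustrative only: `rate` requests/sec for one key, spread evenly over
    -- `servers` machines, each caching the value for `ttl` seconds.  Returns
    -- the approximate fraction of requests served from a warm local cache.
    cacheHitRate :: Double -> Double -> Double -> Double
    cacheHitRate rate servers ttl
      | rate / servers * ttl <= 1 = 0                          -- entry expires before the same box is hit again
      | otherwise                 = 1 - servers / (rate * ttl) -- one cache fill per server per TTL window

For example, at 100 req/s with a 1-second TTL, 4 servers give roughly a 0.96
hit rate, 50 servers give 0.5, and 200 servers give 0.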

------
KirinDave
I've seen this approach appear, without a name, very often in highly
concurrent Haskell, Erlang, and Elixir code. Good libraries often provide a
"safe" race combinator that handles graceful termination. So much so that you
can often find local libraries that look like this Haskell code:

    
    
    import Control.Concurrent (threadDelay)
    import Control.Concurrent.Async (race)
    
    -- Launch a second copy of the action after the given delay (in microseconds,
    -- as threadDelay expects); `race` cancels whichever copy loses.
    speculatively :: Int -> IO a -> IO (Either a a)
    speculatively ms action = race action (threadDelay ms >> action)

------
philsnow
If your backend is Redis, how is starting more speculative hits to Redis going
to help, since it's single threaded?

I mean, if this is indeed helping, it seems that it's not Redis that's the
long pole, but maybe the request routing mesh or something.

~~~
xjia
Variations are probably caused by TCP IMO

~~~
limeyx
I wonder if, for certain requests that are guaranteed to return smallish data
sizes, having Redis support UDP would make sense and remove the TCP issues...

------
maho
But wouldn't this solution mean that once there is a small glitch in the
system, say due to higher load than normal, there are 2x the requests and the
system goes down completely?

I don't mean to be negative of the solution, I am just curious.

~~~
cortesoft
Yes, I am always suspicious of techniques that add additional load under
failure states. Good way to cascade your failure.

------
tshanmu
Isn't it just masking the root cause of whatever it is that's causing the
delay in the first place?

~~~
KirinDave
Why does the consumer of a remote resource care about an ephemeral root
cause? It's a massive scope creep for your API, which has SLAs to keep.

For all you know you're in the middle of a horizontal scaling event and hit an
overloaded box, and there is no problem.

~~~
madamelic
I agree.

There is no point in having your service try to speculate what happened when
that isn't its job.

As long as your service either doesn't need to ensure things happen only once,
or you build in that recovery mechanism (identifying unique events and
throwing them out if your system has already seen them), being able to toss
work to another instance is great in my mind.
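
A minimal sketch of that recovery mechanism, assuming a simple in-memory set
of already-seen event ids (names here are illustrative):

    import Data.IORef (IORef, atomicModifyIORef')
    import qualified Data.Set as Set
    
    -- Run the handler only if this event id hasn't been seen before;
    -- duplicates produced by speculative or retried work are thrown away.
    onceOnly :: Ord k => IORef (Set.Set k) -> k -> IO () -> IO ()
    onceOnly seenRef eventId handler = do
      fresh <- atomicModifyIORef' seenRef $ \seen ->
        if Set.member eventId seen
          then (seen, False)
          else (Set.insert eventId seen, True)
      if fresh then handler else pure ()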

~~~
bajsejohannes
I think the OP is saying the people involved should investigate the root cause
instead of working around the problem. Not that the _service_ should try to do
it when needed somehow.

Like philsnow says in a different comment
([https://news.ycombinator.com/item?id=16832566](https://news.ycombinator.com/item?id=16832566))

> If your backend is Redis, how is starting more speculative hits to Redis
> going to help, since it's single threaded?

I agree; really understanding this problem is a lot better than just blindly
retrying because it seems to work. Is there something wrong with the network
that'll affect every service?

Retrying could be the solution, but it should be adopted with the
acknowledgement that it's incurring technical debt.

~~~
viraptor
I think it's a false dichotomy if you're thinking of finding the root cause
_instead_ of speculative requests.

You can do none, one of them, or both. You may have the team skills to tackle
none, one, or both. You may or may not control, or have access to, one of the
sides. You may have a timeline for solving this problem that dictates which
task is faster to finish. Finally, you may have a recurring problem that loses
you a specific amount of money every time it happens; a speculative request
stops that now, while putting people on a performance-debugging mission only
potentially has a positive return in the future.

Sure, it's technical debt. It's not illegal - you just need a really good
reason to commit to it.

------
karmakaze
A week ago I might have thought this would be good for our microservices, as
they were already using short timeouts with a retry while abandoning the
original request. This is better. However, it has limitations and isn't the
solution. It suffers from request amplification if the called service also
depends on other services and applies the same retry policy, and it is still
request-latency bound. The main takeaway I've learned for bounding
microservice latency is that the inflow of dependent data should be
asynchronous, with request latency dependent only on the service's own data.

------
pspeter3
This seems like a useful practice. Do you log how often this happens and which
task is the winner? It seems like without that, you may not notice actual
issues.
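
For what it's worth, a hypothetical instrumented variant of the
`speculatively` combinator posted above could record which copy won
(illustrative names, plain IORef counters):

    import Control.Concurrent (threadDelay)
    import Control.Concurrent.Async (race)
    import Data.IORef (IORef, modifyIORef')
    
    -- Like `speculatively`, but bump one of two counters depending on whether
    -- the original request or the delayed backup copy finished first.
    speculativelyCounted :: IORef Int -> IORef Int -> Int -> IO a -> IO a
    speculativelyCounted originalWon backupWon delayUs action = do
      result <- race action (threadDelay delayUs >> action)
      case result of
        Left  a -> modifyIORef' originalWon (+1) >> pure a
        Right a -> modifyIORef' backupWon   (+1) >> pure a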

~~~
KirinDave
Since the tasks are identical, the only real value such a metric would have is
in considering memory allocation for green threads and other issues around
memory pressure and resource pooling.

Usually, it doesn't matter why the consumer finds it slow; it could be
anything from an unusual I/O issue (AWS is notorious for brief but painful
networking issues) to a VM peer suddenly becoming ill-behaved before being
throttled.

~~~
dswalter
Is your list of things AWS is notorious for available on your blog somewhere?
That's valuable information.

~~~
KirinDave
AWS is notorious for having absolutely awful networking. Every other CSP rep
rightly digs on them.

They're also notorious because ECS is SO BAD.

~~~
Scramblejams
In your experience, who’s got the best networking?

