
Exponential Backoff and Jitter - alexbilbie
http://www.awsarchitectureblog.com/2015/03/backoff.html
======
arghbleargh
> It’s worth noting that none of these approaches fundamentally change the N^2
> nature of the work to be done, but do substantially reduce work at
> reasonable levels of contention.

I don't think that statement from the article is true if I understood
correctly... the performance gains are precisely because you're reducing from
N^2 to N log N.

An interesting theoretical question would be to identify the optimal backoff
policy. I think FullJitter should be close to optimal, but maybe you can
squeeze out a little more with something more sophisticated.

Edit: I just realized the DecorrelatedJitter (not sure why it's called that?)
makes a lot of sense as a minor optimization because if you already waited a
long time and still failed, that suggests there is a lot of contention, and
you should wait even longer.

------
tptacek
Great post. There's a classic LBL paper about self-synchronization in (I
think) time protocols that demonstrates the same conclusion in a more dramatic
way.

~~~
mkempe
Maybe this one? "The Synchronization of Periodic Routing Messages" (1994) [1]

[1]
[http://ee.lbl.gov/papers/sync_94.pdf](http://ee.lbl.gov/papers/sync_94.pdf)

~~~
tptacek
Yep! Sally Floyd f.t.w!

Thanks for finding that.

------
diggs
What a coincidence, I found myself writing a little Go lib for exponential
back off yesterday, and was wondering what the purpose of introducing jitter
was and what effects it would have. I need it for interacting with the Twitter
stream api and they're pretty clear that when they say exponential they mean
it. But jitter looks very useful in some situations after reading this, so I
added that to my lib as well...

It's here if anyone's interested it [https://github.com/Diggs/go-
backoff](https://github.com/Diggs/go-backoff)

~~~
ekimekim
If we're talking about implementations in various languages, I might as well
link to my capped backoff with jitter implementation for python.

[https://github.com/ekimekim/pylibs/blob/master/libs/backoff....](https://github.com/ekimekim/pylibs/blob/master/libs/backoff.py)

(apologies for the long link, I keep all my one-file python libs in a single
git repo for sanity. I _think_ i've uploaded this one to pypi, where it's
available under the name "backoff")

~~~
RickHull
Hi Mike!

Thought you might be interested in my Elixir impl, though not as featureful:
[https://github.com/rickhull/backoff/blob/master/lib/backoff....](https://github.com/rickhull/backoff/blob/master/lib/backoff.ex)

Example usage:
[https://github.com/rickhull/backoff/blob/master/examples/dem...](https://github.com/rickhull/backoff/blob/master/examples/demo.exs)

------
powertower
Wouldn't this be completely alleviated by placing each new connection into a
queue, instead of rejecting it and telling the client to try again? Or am I
reading this wrong?

~~~
djb_hackernews
The simplest way to understand OCC is to basically take any of your models and
add a version attribute and whenever you update an object you also do an
UPDATE ... WHERE version={obj.version}.

This is useful when you don't want writes to overwrite previous writes. If the
version check doesn't hold it fails letting your client know that your
assumptions about the state of the data are wrong. You then have a few
options, like either redoing your computation using the new state or not doing
anything at all because the current state is good.

------
TheLoneWolfling
The version with jitter but without exponential backoff is still doing O(n^2)
work, I'm pretty sure, although with a (much) smaller constant.

Just something to keep in mind.

Also, I find the inherent time taken / work done tradeoff interesting.
Something involving a limited amount of server state might work better on the
work front while keeping the work done limited. (Something like "please don't
try again for x ms" sent as a reply)

The ideal case is, what, 2n-1 calls (everyone sends at once, server replies to
the first guy and schedules all other clients such that there is no
contention) and O(n) (what is the scale factor here? 20ms?) delay? Are there
any algorithms that come close to the limit?

------
oautholaf
I wish the experimenter had explored random distributions other than uniform.
I suspect uniform distribution may be non-ideal here and other distributions
may produce better results.

------
encoderer
I've always thought of "jitter" as just adding a dash of entropy to the system
I'm building.

I suppose a textbook example is... If I'm managing a pool of long-running
threads I want to periodically bounce, I could write logic to throttle
respawning threads. Or I could just add a rand() to the conditional --
introduce "jitter" \-- and let probability theory "throttle" for me.

I'd rather build on top of probability than statistics.

------
msellout
Randomness is an easy way to ensure you're not doing something systematically
wrong.

------
kabdib
Nice post.

Use of pure color in a graph makes me sad, though. This colorblind guy had to
crack open a paint program and match up RGB values. A few dotted/dashed lines
go a long way (and I suspect, for more than just colorblind folks, too).

~~~
TheLoneWolfling
Next time, just do a hue shift. Alternatively, there are filters that lighten
red / darken green (or vice versa).

(Assuming red/green. There are analogous methods for other types of color
blindness, however.)

