

YouTube Strategy: Adding Jitter Isn't A Bug - yarapavan
http://highscalability.com/blog/2012/4/17/youtube-strategy-adding-jitter-isnt-a-bug.html

======
adpowers
I worked on a project that implemented a task queue using a database table.
Workers would SELECT FOR UPDATE a couple of rows, mark themselves as owners,
commit the change, and then do the work outside of the transaction.

The workers were configured to fetch new tasks every 30 seconds. With 10
workers you'd expect tasks to get fetched from the queue every 3 seconds, but
that is not what we were seeing. The tasks were only getting picked up on 30
second boundaries. What was going on?

It turns out that the workers were piling up. As soon as one worker tried to
update at the same time as another, it would block on the database lock.
Its own transaction would then run really quickly, right after the first
transaction finished. However, since these two workers had now run in immediate
succession, they were synced for life. They both slept for exactly 30 seconds,
the first one woke up a few tens of milliseconds earlier and grabbed the lock,
the second one woke up and blocked on the lock, and this repeated in perpetuity.
Eventually, through small random variations in timing, all the workers fell
into lockstep and became a small thundering herd against the database.

A developer noticed this and fixed it by introducing a small jitter in the
sleep time. After the push, tasks were picked up about every three seconds and
our end-to-end workflow time got substantially shorter.
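
For anyone who wants to see the shape of the fix, here is a minimal sketch in Python, not our actual code. The names `jittered_interval`, `worker_loop` and `fetch_tasks` are made up for illustration, and the 10% jitter fraction is an assumption; the only real idea is that each worker sleeps a slightly different amount each round so two workers that collide once do not stay aligned.

```python
import random
import time


def jittered_interval(base_seconds, jitter_fraction=0.1, rng=random):
    """Return base_seconds plus a uniform random offset.

    With jitter_fraction=0.1 and base_seconds=30 the result lies
    between 27 and 33 seconds.
    """
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)


def worker_loop(fetch_tasks, base_seconds=30.0, iterations=3,
                sleep=time.sleep, rng=random):
    """Poll for work, sleeping a slightly different time each round.

    fetch_tasks is a hypothetical callable standing in for the
    SELECT FOR UPDATE / mark-owner / commit step.
    """
    for _ in range(iterations):
        fetch_tasks()
        # A fixed sleep here is what let workers stay in lockstep.
        sleep(jittered_interval(base_seconds, rng=rng))
```

Passing `sleep` and `rng` in as parameters is only there so the loop can be exercised without actually waiting 30 seconds.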

------
sophacles
I'm not sure I like overloading the term jitter in this context. The title
made me assume they meant packet jitter, aka the standard deviation around
packet delivery times, which video is sensitive to when not taken into
account. This is a similar concept applied to cache and resource access times,
but as a desirable property rather than something to deal with or eliminate.

~~~
simmons
Heh. I've been immersed in video coding for the past few weeks, so not only
was I shocked and intrigued at the title, but it took me a few passes to
realize what was actually being said. :)

------
gdubs
So the idea that Jurassic Park would have hired a Chaotician to do systems
analysis was completely realistic. Except, in this real-world example,
turbulence is introduced to _maintain_ stability. Fascinating.

~~~
MockMyBeret
Read about Spread Spectrum <http://en.wikipedia.org/wiki/Spread_spectrum>

------
nwmcsween
Why rely on jitter to fix bottlenecks? Why not sample the bottleneck and take
an action based on feedback? e.g. update:

    if (resource_busy)
      retry when resource free

where the retry delay is a random time based upon statistics of past usage. I
don't like arbitrary random solutions when measurement will work much better.
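
Roughly what I have in mind, as a Python sketch under my own assumptions: the class name `AdaptiveRetry`, the exponential moving average, and the choice of drawing from `[0, 2 * estimate]` are all illustrative, not taken from the article. The point is only that the random range is driven by measured busy times instead of a hard-coded constant.

```python
import random


class AdaptiveRetry:
    """Pick random retry delays scaled by observed resource busy times."""

    def __init__(self, initial_estimate=1.0, smoothing=0.2, rng=None):
        self.estimate = initial_estimate  # running average busy time, seconds
        self.smoothing = smoothing        # weight given to each new sample
        self.rng = rng or random.Random()

    def record_busy_duration(self, seconds):
        """Fold a newly measured busy duration into the running average."""
        self.estimate = ((1 - self.smoothing) * self.estimate
                         + self.smoothing * seconds)

    def next_delay(self):
        """Return a random delay whose mean equals the current estimate."""
        return self.rng.uniform(0.0, 2.0 * self.estimate)
```

A caller would call `record_busy_duration` whenever it measures how long the resource stayed busy, and sleep for `next_delay()` before retrying.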

~~~
Drbble
Uh, your solution is to retry after random delay? That's jitter.

~~~
MockMyBeret
Retry after random delay <> jitter. Jitter is the deviation of a signal from
its reference source. Retrying after a random delay is standard collision
avoidance.
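
As a concrete instance of collision avoidance by random delay, here is a small Python sketch of truncated binary exponential backoff, the scheme classic Ethernet uses after a collision. The function names and the `slot_seconds` value are assumptions for illustration; only the rule itself (pick a random number of slots in `[0, 2^min(n, 10) - 1]` after the n-th collision) is the standard one.

```python
import random


def backoff_slots(collision_count, rng=random, max_exponent=10):
    """Return how many slot times to wait after the n-th collision.

    The range doubles with each collision and is capped at
    2 ** max_exponent - 1 slots.
    """
    exponent = min(collision_count, max_exponent)
    return rng.randint(0, 2 ** exponent - 1)


def backoff_delay(collision_count, slot_seconds=0.05, rng=random):
    """Convert the slot count into seconds (slot_seconds is hypothetical)."""
    return backoff_slots(collision_count, rng) * slot_seconds
```

The more often a sender collides, the wider the window it draws from, so contention spreads itself out without any coordination.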

------
MockMyBeret
<http://en.wikipedia.org/wiki/Carrier_sense_multiple_access_with_collision_detection>

------
K2h
Here they talk about random jitter (when it is a result, not something
injected) as approaching a Gaussian distribution. I bet the random jitter being
injected as a time delay in the article is not Gaussian, but flat.

<http://www.tititudorancea.org/z/jitter.htm>

