Two weeks ago, I wrote an article titled Exponential Backoff and Jitter for the AWS Architecture blog. It looks at optimistic concurrency control (OCC) in particular, but the lessons are applicable to all distributed systems. The bottom line is that exponential backoff is good, but not sufficient to prevent both wasted time and wasted effort.
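To make that concrete, here’s a minimal sketch (in Python, with illustrative parameter values of my own, not anything specific to the AWS post) of backoff in the style often called full jitter: rather than sleeping exactly the exponentially growing delay, you sleep a uniformly random amount up to that delay.

import random

def backoff_with_jitter(attempt, base=0.1, cap=10.0):
    # Plain exponential backoff would sleep exactly min(cap, base * 2**attempt),
    # which keeps colliding clients marching in lock-step. Full jitter sleeps a
    # uniformly random amount up to that bound, decorrelating the retries.
    return random.uniform(0, min(cap, base * (2 ** attempt)))

print(backoff_with_jitter(3))  # somewhere between 0 and 0.8 seconds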
Communication in distributed systems isn’t the only place that adding randomness comes in handy. It’s a remarkably widespread idea that has found use across many areas of engineering. The basic pattern across all these fields is the same: randomness is a way to prevent systematically doing the wrong thing when you don’t have enough information to do the right thing.
One classic distributed systems example is in the paper The Synchronization of Periodic Routing Messages (thanks tptacek). Sally Floyd [1] and Van Jacobson [2] simulate synchronization emerging in previously unsynchronized systems communicating over a network. This leads to short-lived spikes in contention, and other correlated effects on the network. Their solution is to add randomness, which breaks the feedback loop that creates the synchronization. While the exact set of protocols and technologies they look at is very 1990s, the lessons are timeless.
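The same fix works for any periodic task, not just routing messages. As a rough sketch (my own, not Floyd and Jacobson’s exact scheme, with made-up names and periods), draw each interval from a window around the nominal period instead of sleeping a fixed amount:

import random
import time

def send_routing_update():
    # Stand-in for the periodic work, e.g. broadcasting a routing message.
    print("update sent at", time.monotonic())

def run_periodic(base_period=30.0, jitter_fraction=0.5):
    # Sleeping exactly base_period lets independent senders drift into
    # lock-step. Drawing each interval uniformly from a window around the
    # nominal period breaks the feedback loop that creates synchronization.
    while True:
        send_routing_update()
        low = base_period * (1 - jitter_fraction)
        high = base_period * (1 + jitter_fraction)
        time.sleep(random.uniform(low, high))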
Closely related to these uses of jitter is dither, or adding noise to prevent artifacts when quantizing. Dither is most visible in images, where it can make a huge difference in quality [3].
Technically, dither is a way to remove correlation between quantization error and the signal being quantized. That sounds complex, but the underlying concept is extremely simple. Imagine a simple system where we’re rounding a vector of reals to the nearest integer. If those reals are nicely distributed, it works well, but sometimes it works very poorly. If we start with
[ 1.4, 1.4, 1.3, 1.4, 1.2, 1.4, 1.1, 1.0, 1.4 ]
it rounds to
[ 1, 1, 1, 1, 1, 1, 1, 1, 1 ]
leaving the error
[ 0.4, 0.4, 0.3, 0.4, 0.2, 0.4, 0.1, 0.0, 0.4 ]
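That’s easy to reproduce. A few lines of Python (mine, not part of the original example beyond the numbers above) show the rounded values and the one-sided error:

signal = [1.4, 1.4, 1.3, 1.4, 1.2, 1.4, 1.1, 1.0, 1.4]
rounded = [round(x) for x in signal]
error = [round(x - q, 2) for x, q in zip(signal, rounded)]
print(rounded)               # [1, 1, 1, 1, 1, 1, 1, 1, 1]
print(error)                 # [0.4, 0.4, 0.3, 0.4, 0.2, 0.4, 0.1, 0.0, 0.4]
print(round(sum(error), 2))  # 2.6: every error is positive, so they pile up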
There are two problems here. We’ve introduced a bias, because all the errors are positive. The error also looks a whole lot like the signal, and there’s clearly information from the signal left behind in the error. The solution is to add some noise before rounding, the simplest case being uniform noise of up to half a quantization level in either direction (in our case, between -0.5 and 0.5). Adding that noise to our input gives
[ 1.52, 1.09, 1.34, 1.04, 1.31, 1.83, 0.93, 1.49, 1.67 ]
After rounding, we’re left with the error (measured against the original signal, not the noisy one)
[ -0.6, 0.4, 0.3, 0.4, 0.2, -0.6, 0.1, 0.0, -0.6 ]
This has much less bias (a total error of -0.4 versus +2.6), and the remaining error doesn’t look like the underlying signal. That’s a good thing if you care about spectral artifacts.
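Here’s the same sketch with dither added (again my own code; because the noise is random, each run produces slightly different values than the ones above, but the total error is usually much smaller in magnitude):

import random

signal = [1.4, 1.4, 1.3, 1.4, 1.2, 1.4, 1.1, 1.0, 1.4]
# Add uniform noise of up to half a quantization level in each direction
# before rounding, then measure the error against the original signal.
noisy = [x + random.uniform(-0.5, 0.5) for x in signal]
quantized = [round(n) for n in noisy]
error = [round(x - q, 2) for x, q in zip(signal, quantized)]
print(quantized)
print(error)
print(round(sum(error), 2))  # usually much closer to zero than the +2.6 above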
Part of the point of talking about jitter and dither together is to highlight the similarities. In both cases, we’re looking to spread out our error. In the case of jitter it’s error we already have, because we don’t have complete knowledge of our distributed system. In the case of dither it’s error we’re deliberately introducing, so that we get the opportunity to throw out some information without leaving correlated artifacts behind. The other point is to invite thought about the more advanced techniques of dither (such as error diffusion and noise shaping) and whether they have useful analogs in distributed systems.
Footnotes: