Marc's Blog

About Me

My name is Marc Brooker. I've been writing code, reading code, and living vicariously through computers for as long as I can remember. I like to build things that work. I also dabble in machining, welding, cooking and skiing.

I'm currently an engineer at Amazon Web Services (AWS) in Seattle, where I work on databases, serverless, and serverless databases. Before that, I worked on EC2 and EBS.
All opinions are my own.

Links

My Publications and Videos
@marcbrooker on Mastodon @MarcJBrooker on Twitter

Optimism vs Pessimism in Distributed Systems

What—Me Worry?

Avoiding coordination is the one fundamental thing that allows us to build distributed systems that out-scale the performance of a single machine1. When we build systems that avoid coordinating, we end up building components that make assumptions about what other components are doing. This, too, is fundamental. If two components can’t check in with each other after every single step, they need to make assumptions about the ongoing behavior of the other component.

One way to classify these assumptions is into optimistic and pessimistic assumptions. I find it very useful, when thinking through the design of a distributed system, to be explicit about each assumption each component is making, whether that assumption is optimistic or pessimistic, and what exactly happens if the assumption is wrong. The choice between pessimistic and optimistic assumptions can make a huge difference to the scalability and performance of systems.

I generally think of optimistic assumptions as ones that avoid or delay coordination, and pessimistic assumptions as ones that require or seek coordination. The optimistic assumption assumes it’ll get away with its plans. The pessimistic assumption takes the bull by the horns and makes sure it will.

To make this concrete, let’s consider some examples.

Example 1: Caches

Distributed caches almost always make assumptions about whether the data they are holding is changed or not. Unlike with CPUs2, distributed caches typically aren’t coherent, but we still want them to be eventually consistent. By eventually consistent we mean that if the write stream stops, the caches eventually all converge on containing the same data. In other words, inconsistencies are relatively short-lived.

Possibly the most common way of ensuring this property—that inconsistencies are short-lived—is with a time to live (TTL). This simply means that the cache only keeps items around for a certain fixed period of time. The TTL provides a strong3 upper bound on how stale an item can be. This is a simple, strong, and highly popular mechanism. It’s also a pessimistic one: the cache is doing extra work assuming that the item has changed. In systems with a low per-item write rate, that pessimistic assumption can be wrong much more often than it’s right.

One downside of the pessimistic approach TTL takes is that it means the cache empties when it can’t talk to the authority. This is unavoidable: caches simply can’t provide strongly bounded staleness (or any other strong recency guarantee) if they can’t reach the authority4. Thus the pessimistic TTL approach has a strong availability disadvantage: if a network partition or authority downtime lasts longer than the TTL, the cache hit rate will drop to zero.

Two more optimistic patterns are quite commonly used to address this situation (especially in DNS and networking systems). One approach is to synchronously try fetch the new item, but then optimistically continue to use the old one if that’s possible (optimistic because it’s making the optimistic assumption that the item hasn’t change). A subtly different approach is to asynchronously try fetch the new item, and use the old one until that can complete. These protocol seem very similar to TTL, but are deeply fundamentally different. They don’t offer strong recency or staleness guarantees, but can tolerate indefinite network partitions5.

Example 2: OCC

Optimistic concurrency control and its tradeoffs with pessimistic locking-based approaches is a classic topic (maybe the most classic topic) in distributed databases. I won’t try advance that debate here. Instead, to summarize: optimistic concurrency control is a way of implementing isolated (as in ACID I) transactions that assumes that other concurrent transactions don’t conflict, and detecting at the last moment if that assumption is wrong. Pessimistic approaches like the classic two-phase locking, on the other hand, do a whole lot of coordination based on the assumption that other transactions do conflict, and it’s worth detecting that early while there’s still time to avoid duplicate work and make smart scheduling decisions.

OCC systems, in general, coordinate less than pessimistic systems when their optimistic assumption is right, and more than pessimistic systems when the optimistic assumption is wrong.

Comparing these two is approaches is a hard enough first-order problem, but to complicate things further the choice between optimism and pessimism leads to a number of second-order problems too. For example, the number of contending transactions depends on the number of concurrent transactions, and the number of concurrent transactions depends on lock wait times in pessimistic systems and retry rates in optimistic systems. In both kinds of systems, this leads to a direct feedback loop between past contention and future contention.

Example 3: Leases

Leases are a kind of time-based lock widely used in distributed systems. In most systems, a lease is replacing a number of coordination steps. One component takes a lease, and then uses that lease as a license to multiple things without worrying that other components are doing conflicting things, or may disagree, or whatever. Freed from the worry about conflicts, the lease-holding component can avoid coordinating and go ahead at full speed.

Leases are an interesting blend of pessimism (I’m assuming other things are going to conflict with my work, so I’m going to stop them in their tracks) and optimism (I’m assuming I can go ahead without coordination for the next bounded period of time). If the pessimism is wrong, all the heartbeating and updating and storing of leases is wasted work. As is the time other components could have spent doing work which they wasted while waiting for the lease.

Conclusion

One way I like to reason about the behavior of systems is by writing sentences of the form “this component is assuming that…”

For our TTL example, we could write statements like:

These statements are a tool to help structure our thinking about the behavior of the system. The third one—the availability-staleness tradeoff—is especially powerful because its often a hidden assumption people make when choosing a strict TTL.

By coloring each assumption as pessimistic (coordination-requiring) or optimistic (coordination-avoiding), we can also structure our thinking about the best time to coordinate, and make sure we’re being consistent in our choices about when and why coordination is needed.

Footnotes

  1. And, in a lot of ways, the fundamental thing that allows us to build machines that out-scale the performance of a single in-order core.
  2. Or some CPUs, at least. Most of the CPUs we’re familiar with keep their caches coherent using protocols like MESI. These protocols are interesting, because they allow coordination avoidance for unmodified items, at the cost of tracking state and ownership and assuming that the coherency protocol is correctly executed by all participants.
  3. Only strong if the TTL clock starts ticking at the time the item fetch started. Most implementations don’t do this, and instead start the clock at the time the item fetch ended, leading to potentially unbounded staleness.
  4. Following a similar argument to the one Bailis et al make in Section 5.2 of Highly Available Transactions, for which they cite Gilbert and Lynch somewhat hand-wavingly. I will continue the hand-waving here.
  5. If you’re a CAP theorem kinda person, you might call TTL a CP system and these variant AP systems. But that would mostly serve to highlight the limitations of CAP thinking, because none of these variants are C. If you’re a PACELC kinda person, you might call the strict TTL variant PCEL, and the less-strict variants PAEL.
  6. But if you’re interested in learning more about it, check out this Andy Pavlo lecture, or Harding et al’s excellent 2017 paper An Evaluation of Distributed Concurrency Control, or Kung and Papadimitriou’s classic 1979 paper An Optimality Theory of Concurrency Control for Databases, or Agrawal et al’s 1987 classic Concurrency Control Performance Modeling: Alternatives and Implication (thanks Peter Alvaro for reminding me about this one), or the OCC OG On Optimistic Methods for Concurrency Control.