I'm currently an engineer at Amazon Web Services (AWS) in Seattle, where I work on databases, serverless, and serverless databases. Before that, I worked on EC2 and EBS.

All opinions are my own.

@marcbrooker on Mastodon @MarcJBrooker on Twitter

While reading distributed systems papers, one of the names that comes up most often is Nancy Lynch’s. From a standard textbook for university distributed systems courses (Distributed Algorithms), to some of the earliest successful results on consensus, to the proof of the CAP theorem, Lynch’s name is everywhere. In the same spirit as The Essential Leslie Lamport, I thought I’d write about some of my favorite Nancy Lynch papers. The criteria are the same as last time: I like these papers for some reason. I’d probably make a different list if I wrote this post again next week.

What good are impossibility results, anyway? They don’t seem very useful at first, since they don’t allow computers to do anything they couldn’t previously.

A Hundred Impossibility Proofs for Distributed Computing is a great read. It covers a huge amount of ground across most of the distributed systems field as it stood in 1989, and presents an overwhelming number of results. The focus is on impossibility results and bounds (as the title suggests), but the paper frequently wanders off this path.

This paper is worth reading on it’s own, but it’s also a really great way to discover distributed systems papers you haven’t seen before. With 103 references, there’s plenty to keep you busy if you’re looking for papers and books to read. Despite covering some deep results quite formally, the paper remains readable even without deep expertise in some of the areas it covers. A Hundred Impossibility Proofs is also a great piece of history, a snapshot of the distributed systems world 25 years ago.

*Why this is worth reading:* It presents a huge number of results in a very compact and readable package. You won’t get bored reading this paper.

However the

readrequest does not begin until after the write request … has completed. This therefore contradicts the atomicity property, proving that no such algorithm exists.

The algorithm that doesn’t exist is one that implements a writeable data object guaranteeing both consistency and availability in all executions in an asynchronous system. In other words, one that solves one definition of Brewer’s CAP theorem. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. is important to the practice of distributed systems today, because it’s held up as a proof of the CAP theorem (which it is), and provides some definitions under which conditions the CAP theorem is true. The paper spends about half its length looking at ways to circumvent the CAP theorem in partially synchronous networks.

*Why this is worth reading:* It both proves the CAP theorem, and debunks many of the common statements of it. The proof itself is simple and succinct, and provides real insight into why CAP is true.

It is easy to see that all correct processors make decisions

Consensus in the Presence of Partial Synchrony is one of three solutions to the consensus problem from the late 1980s. The others, Oki and Liskov’s Viewstamped Replication and Lamport’s Paxos, are arguably more general and perhaps more interesting solutions, but this one is still very influential. The algorithms in Consensus in the Presence of Partial Synchrony are interesting because they break the problem up differently from both Oki and Liskov and Lamport, and provide real insight into the structure of the consensus problem. The algorithms appear to be more complex and more cumbersome than either VR or Paxos, mostly because of the way they execute ballots in rounds. Still, this is very interesting stuff.

*Why this is worth reading:* This is more a piece of history than the others papers in this list, but it’s still worth reading because it provides a view of a common problem from a different angle.

we show the surprising result that no completely asynchronous consensus protocol can tolerate even a single unannounced process death. … the stopping of a single process at an inopportune time can cause any distributed commit protocol to fail to reach agreement. Thus, this important problem has no robust solution without further assumptions about the computing environment or still greater restrictions on the kind of failures to be tolerated!

There are fairly few results in computer science that are seen as so influential that they have a widely recognized initialism. The FLP result, named after Fischer, Lynch and Paterson, is one such results. It may be even more notable because it’s an impossibility result: instead of describing how to do something, FLP simply states that it can’t be done.

In Impossibility of distributed consensus with one faulty process, FLP describe how no asynchronous protocol exists that can always reach consensus, even in the case of a single process failure, no matter how many participants there are. It’s easy to see why this result is so influential. Still, as famous as FLP is, it seems to be less widely known (and much less widely misinterpreted) than CAP.

The thing that stands out for me in this paper is the beauty and simplicity of the proof. I found Lemma 2 and Lemma 3 in the paper both surprising and enlightening. The proof in the paper is worth reading, but may be easiest to approach once you already understand it. Henry Robinson’s A Brief Tour of FLP Impossibility is a great place to start.

*Why this is worth reading:* It’s both an important result and a beautiful proof.

at each round, until termination is reached, each process sends its latest value to all processes (including itself). On receipt of a collection V of values, the process computes a certain function f(V) as its next value.

Reaching approximate agreement in the presence of faults is one of my favorite distributed systems papers. I like it because it contains some incredibly simple, beautiful and yet non-obvious algorithms. I also like it because it take a different look at a common problem. It looks at the FLP result and says *“OK, if I can’t get exact agreement, how close can I get?”*. It turns out that you can get arbitrarily close to agreement with guaranteed termination. In some ways, this is the flip side of Ben-Or’s consensus with probability 1 results.

*This is worth reading because* it’ll change the way you think about agreement in distributed systems.