In some sense, the CAP theorem has been too successful. With its snappy name and apparently easy-to-understand behavior, CAP has become the go-to way of talking about tradeoffs in distributed systems. Despite its apparent simplicity, confusion, misunderstandings, misrepresentations and debates about CAP are widespread. Criticism of CAP is also widespread, both in its use as a teaching tool and as a way of reasoning about tradeoffs. Dan Weinreb, Henry Robinson, and Michael Stonebraker have written good examples of the genre. It would be great to find a tool for teaching, and learning about, these tradeoffs that doesn't have the same shortcomings as CAP.
In this post, my focus is on CAP as a tool for learning and teaching about distributed systems tradeoffs, rather than its use by experienced practitioners. The problems with CAP as a teaching tool, as I see them, are:
This doesn't mean that the CAP theorem, and Brewer's more general CAP conjecture, isn't useful as a teaching and learning tool. I think its usefulness has been overstated by popularity, and in its popularity a lot of the subtlety has been lost. I find that unfortunate, because it means more confused students, and more confused practitioners. It would be great to replace CAP with something equally snappy, with fewer subtle edges.
Daniel Abadi's Consistency Tradeoffs in Modern Distributed Database System Design proposes an alternative: PACELC. Is that what we need?
A more complete portrayal of the space of potential consistency tradeoffs for DDBSs can be achieved by rewriting CAP as PACELC (pronounced “pass-elk”): if there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally in the absence of partitions, how does the system trade off latency (L) and consistency (C)?
Dan Weinreb suggests making it PACELCA instead, a formulation that I prefer:
The reason the acronym doesn’t need to be “PACELCA” is that if there are no partitions, then the system must be available. Adding an “A” to the second part is redundant. But for me (maybe not for you), putting in the redundant “A” in the “E” case helps me. A PA/EL system is always “available”, and calling it PA/ELA makes it easier for me to see that availability is always there.
PACELC mostly takes aim at the first and fourth misconceptions I listed: that CA systems exist, and that eventual consistency is all about CAP. First, it doesn't allow you to ignore partition tolerance. Second, it makes it clear that even in the absence of partitions there's a tradeoff between consistency and latency. In the general case, though not all cases, consistency requires a level of coordination which prevents systems from being always available (in the Gilbert and Lynch sense), and increases latency when no partition is present. The matter of latency, which is of great practical importance in real-world systems, isn't captured at all in CAP.
So far so good for PACELC.
The first two categories of PACELC are very clear. PC/EC is the most consistent class of systems, which never give up consistency. PA/EL systems don't try hard to be consistent, and rather take the opportunity to reduce latency and gain availability by reducing coordination. The trickier ground starts with PA/EC. These types of systems give up consistency when there is a partition, and are consistent when there isn't. That's more subtle than it looks. When is there a partition? How long does the network need to be down before there is a partition? Is a single dropped connection or lost packet a partition? That may seem like nit picking, but there's an important line to be drawn between partition and not partition. PACELC doesn't help there.
If PA/EC is tricky, PC/EL is madness. What does it mean to be more consistent during a partition? Daniel Abadi says that's the wrong question:
PNUTS is a PC/EL system. In normal operation, it gives up consistency for latency; however, if a partition occurs, it trades availability for consistency. This is admittedly somewhat confusing: according to PACELC, PNUTS appears to get more consistent upon a network partition. However, PC/EL should not be interpreted in this way. PC does not indicate that the system is fully consistent; rather it indicates that the system does not reduce consistency beyond the baseline consistency level when a network partition occurs—instead, it reduces availability.
That's interesting, but not really helpful, because it introduces a subtle edge to PACELC of the same kind that exists in CAP. All is not lost. I feel that PACELC is a more useful model for beginners in thinking about distributed systems tradeoffs, but it's still not the ideal solution. It would be great if a simple easily-understood replacement existed. Any ideas?