Marc's Blog

About Me

My name is Marc Brooker. I've been writing code, reading code, and living vicariously through computers for as long as I can remember. I like to build things that work. I also dabble in machining, welding, cooking and skiing.

I'm currently an engineer at Amazon Web Services (AWS) in Seattle, where I work on databases, serverless, and serverless databases. Before that, I worked on EC2 and EBS.
All opinions are my own.

Links

My Publications and Videos
@marcbrooker on Mastodon @MarcJBrooker on Twitter

Not Just Scale

Bookmarking this so I can stop writing it over and over.

It seems like everywhere I look on the internet these days, somebody’s making some form of the following argument:

You don’t need distributed systems! Computers are so fast these days you can serve all your customers off a single machine!

This argument is silly and reductive.

But first, let’s look for the kernel of truth.

One Machine Is All You Need?

This argument is based on a kernel of truth: modern machines are extremely powerful, can do vast amounts of work every second, and can fit all the data belonging to even some rather large businesses in memory. Thousands, or even millions, of requests per second are achievable. Hundreds of gigabits per second. Terabytes of memory, and even more storage. Gigabytes per second of storage bandwidth. Millions of IOPS. Modern machines are super fast, and software which can take advantage of that speed can achieve incredible things.

It’s also true that many systems are distributed thoughtlessly, or wastefully, or in ways that increase complexity and reduce efficiency.

At the time I’m writing this, EC2 offers single instances with 32TiB of memory, 896 vCPUs, and 200Gbps of network bandwidth.

Many very important workloads can fit on one such machine.

Or could, if scale were all we cared about.

It’s Not Just Scale

Scale, and scalability, is only a small part of the overall reason distributed systems are interesting. Other practical reasons include:

These properties allow systems to achieve something important: simplicity.

Simplicity is a System Property

It is trivial to make any component in a system simpler, by moving its responsibilities to other parts of the system. Or by deciding that some of its responsibilities are redundant. It is common to see reductive views of simplicity that consider only part of a system’s responsibilities, dismissing important requirements or ignoring the way they’re actually achieved.

Let’s consider deployments as an example. In many distributed designs, deployments work by replacing or re-imaging machines when changes need to be made. Often, this uses the same mechanisms that ensure high-availability: traffic is moved away from a machine, changes are made and validated, and traffic returns. Single-machine deployments are typically harder to change: changes must be made online, to a running system, or under the pressure of downtime. Validating changes is difficult, because it’s all or nothing. The problems of single-machine deployments are solvable, but typically at the cost of higher system complexity: complex operational procedures, skilled operators, high judgement, coordination with customers, etc. It’s easy to ignore this complexity when admiring the simplicity of a single machine deployment. In the moment we look at it, none of this system complexity is visible.
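The drain, change, validate, and return loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular system deploys: `Machine`, `rolling_deploy`, and the `validate` callback are hypothetical names, and a production deployment system would typically also drain connections gracefully, wait on health checks, and let each batch bake before moving on. The point it shows is the contrast with a single machine: a failed validation touches only one batch, and the rest of the fleet keeps serving on the old version.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Machine:
    # Hypothetical model of one machine in a fleet.
    name: str
    version: str
    in_service: bool = True


def rolling_deploy(
    fleet: List[Machine],
    new_version: str,
    validate: Callable[[Machine], bool],
    batch_size: int = 1,
) -> Tuple[List[str], bool]:
    """Deploy new_version one batch at a time; roll back and stop on failure.

    Returns (names of machines now on new_version, whether the whole fleet was updated).
    """
    updated: List[str] = []
    for start in range(0, len(fleet), batch_size):
        batch = fleet[start:start + batch_size]
        previous = [m.version for m in batch]
        # 1. Move traffic away; the rest of the fleet keeps serving customers.
        for m in batch:
            m.in_service = False
        # 2. Make the change (here, just swap the version) and validate it
        #    while the batch is out of service.
        for m in batch:
            m.version = new_version
        ok = all(validate(m) for m in batch)
        if not ok:
            # Roll this batch back; machines after it never saw the change.
            for m, old in zip(batch, previous):
                m.version = old
        # 3. Return traffic to the batch (new version, or the restored old one).
        for m in batch:
            m.in_service = True
        if not ok:
            return updated, False
        updated.extend(m.name for m in batch)
    return updated, True
```

For example, with a hypothetical check that fails on `host-2`, `rolling_deploy([Machine(f"host-{i}", "v1") for i in range(4)], "v2", lambda m: m.name != "host-2")` updates `host-0` and `host-1`, rolls `host-2` back to `v1`, never touches `host-3`, and reports that the deployment stopped early.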

Simplicity is a property of systems, not components. Systems include people and processes.

Another trap in the simplicity debate is confusing simple with familiar. Years of using Linux may make system administration tasks feel simple. Years of using IaC frameworks may make cloud deployments feel simple. In reality, both are rather complex, but it’s easy to conclude that the one we’re more familiar with is the simpler one.

Of course, scale also matters in real systems, in a number of ways. One of those ways is organizational scale.

Scaling Organizations

Just like computer systems, organizations scale by avoiding coordination. The more the organization needs different pieces to coordinate with one another to work, the less it is going to be able to grow. Organizations that wish to grow without grinding to a halt need to be carefully designed, and continuously optimized, to reduce unnecessary coordination. Approaches like microservices and SOA are tools that allow technical organizations to avoid coordinating over things like data models, implementation choices, fleet management, tool choices, and other things that aren’t core to their businesses. APIs are, fundamentally, contracts that move coordination from human-to-human to system-to-system, and constrain that coordination in ways that allow systems to handle it efficiently.
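A rough way to see why unconstrained coordination limits growth: if every team had to coordinate with every other team, the number of distinct pairs grows as n(n-1)/2, while if each team only consumes a few other teams' APIs, the number of coordinating pairs is bounded by a quantity that grows linearly. The model below is a simplifying assumption for illustration only; the function names and the choice of three APIs per team are hypothetical, not a measurement of any real organization.

```python
def full_mesh_pairs(teams: int) -> int:
    """Distinct team pairs if every team must coordinate with every other: n(n-1)/2."""
    return teams * (teams - 1) // 2


def api_pairs_upper_bound(teams: int, apis_used: int) -> int:
    """Upper bound on coordinating pairs if each team talks only through a few APIs.

    Hypothetical model: each team consumes at most `apis_used` other teams' APIs,
    so every coordinating pair comes from at least one (consumer, provider)
    relationship. That gives at most teams * apis_used pairs, and never more
    than the full mesh.
    """
    return min(teams * apis_used, full_mesh_pairs(teams))


for n in (5, 50, 500):
    print(n, full_mesh_pairs(n), api_pairs_upper_bound(n, apis_used=3))
```

At five teams the two are indistinguishable; at five hundred, the full mesh has 124,750 pairs against a bound of 1,500.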

You might be able to run all your business logic on a single box, but as your organization grows you’ll likely find the coordination necessary to do that slows you down more and more.

Finally, scale does matter.

The Scale Ceiling

As a business owner, there’s nothing quite like the joy and misery of a full store. Joy, because it’s an indication of a successful business. Misery, because a larger store would have been able to serve more customers. The queue out the door is turning people away, and with those people go their business. Opening a second location could take months, as could adding space. The opportunity is slipping away.

A smart business needs to be correctly scaled. A hundred thousand square feet is too much for a taco truck. All that space is expensive, and distracting. Fifty square feet is too little for a supermarket. Folks can barely get in the door. A pedestrian bridge and a train bridge are built differently. Scale matters, both up and down.

This isn’t a hard idea. It’s right at the soul of what engineering aims to achieve as a field. The smartest thing that new engineers can do is focus on the needs of their businesses. Both now and in the future. Learn what drives the costs and scalability needs of your business. Know how it makes money. Understand the future projections, and the risks that come with them. Ignore the memes and strong opinions.