"Testing Distributed Systems w/ Deterministic Simulation" by Will Wilson
Debugging highly concurrent distributed systems in a noisy network environment is an exceptionally challenging endeavor. On the one hand, evaluating every possible order in which program events can occur is a task ill-suited to human cognition, which puts a purely analytic understanding of such a system's control flow beyond the reach of any individual programmer. On the other hand, a more “empirical” approach is also fraught with difficulty: because severe bugs depend on precise timings or transient network conditions, every part of the debugging cycle, from reproducing a bug to verifying a fix, becomes a Sisyphean labor bordering on the impossible.

One approach developed to ameliorate this situation is deterministic simulation, in which the hardware components of the system (hard disks, network links, and the machines themselves) are replaced during testing with software that fulfills the contracts of those components but whose state is completely transparent to the developer. This makes it possible to simulate a wide variety of failure modes, including network failures, disk failures or space exhaustion, unexpected machine shutdowns or reboots, IP address changes, and even entire datacenter failures. Moreover, once a particular pattern of failures that uncovers a bug has been identified, the determinism of the simulation means the exact same series of events can be replayed any number of times, which greatly eases debugging and provides confidence that the bug has actually been fixed.

Attendees of this talk will gain an understanding of the benefits, drawbacks, and tradeoffs involved in implementing a deterministic simulation framework, with frequent reference both to theory and to real-world engineering experience gleaned from applying this method to a complex distributed system. Attendees will also learn about language features that aid in the development of such a framework.
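The core idea of replacing the network with deterministic software can be illustrated with a very small sketch. The following is a minimal, hypothetical Python example, not FoundationDB's implementation. It assumes a single-threaded discrete-event loop, a simulated clock in place of wall-clock time, and one seeded PRNG as the only source of nondeterminism. The names `Simulator`, `Node`, and `run_simulation`, the 10% drop rate, and the 1 to 50 ms latency range are all illustrative assumptions.

```python
import heapq
import random


class Simulator:
    """Hypothetical single-threaded discrete-event simulator.

    Every source of nondeterminism (message delay, message loss) is drawn
    from one seeded PRNG, and time is a simulated clock rather than the
    wall clock, so a given seed always produces the same history.
    """

    def __init__(self, seed, drop_rate=0.1):
        self.rng = random.Random(seed)  # the only randomness in the system
        self.now = 0.0                  # simulated time in seconds
        self.queue = []                 # heap of (time, seq, callback, args)
        self.seq = 0                    # tie-breaker: equal times run in FIFO order
        self.drop_rate = drop_rate      # assumed probability a message is lost
        self.trace = []                 # (time, kind, src, dst, msg) events

    def schedule(self, delay, callback, *args):
        heapq.heappush(self.queue, (self.now + delay, self.seq, callback, args))
        self.seq += 1

    def send(self, src, dst, msg):
        # Simulated network link: honors the contract of a real network
        # (a message may arrive late or not at all), with faults chosen
        # by the seeded PRNG instead of by real network conditions.
        if self.rng.random() < self.drop_rate:
            self.trace.append((round(self.now, 6), "drop", src.name, dst.name, msg))
            return
        delay = self.rng.uniform(0.001, 0.050)  # assumed 1-50 ms latency
        self.schedule(delay, dst.receive, src, msg)

    def run(self, until=10.0):
        # Pop events strictly in (time, seq) order; nothing runs concurrently.
        while self.queue and self.queue[0][0] <= until:
            self.now, _, callback, args = heapq.heappop(self.queue)
            callback(*args)
        return self.trace


class Node:
    """Hypothetical process that plays ping-pong with a counter."""

    def __init__(self, name, sim):
        self.name = name
        self.sim = sim

    def receive(self, src, msg):
        self.sim.trace.append((round(self.sim.now, 6), "recv", src.name, self.name, msg))
        if msg < 20:  # reply until the counter reaches 20
            self.sim.send(self, src, msg + 1)


def run_simulation(seed):
    sim = Simulator(seed)
    a, b = Node("a", sim), Node("b", sim)
    sim.schedule(0.0, b.receive, a, 0)  # a's opening message lands at b at t=0
    return sim.run()


if __name__ == "__main__":
    first = run_simulation(seed=42)
    replay = run_simulation(seed=42)
    assert first == replay  # same seed, identical event history
    print(len(first), "events; replay identical:", first == replay)
```

Because every delay and drop comes from `random.Random(seed)` and events are processed one at a time in simulated-time order, rerunning with the seed of a failing run reproduces the exact same sequence of events, which is the replay property described above. A fuller simulator would route disk I/O, machine reboots, and other faults through the same seeded scheduler.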
Will Wilson, FoundationDB

Will Wilson works on the engineering team at FoundationDB (https://foundationdb.com). Will started his career in biotechnology, leading a successful R&D effort in spinal cord injury diagnostics that is currently being commercialized by a company he co-founded. Since then, Will has worked in a variety of technical and business roles at data science and data virtualization startups. Will has a degree in math and philosophy from Yale.