Sustaining Resilience in Software and Systems
A book by Kelly Shortridge with case studies by Aaron Rinehart


About the Security Chaos Engineering book
Cybersecurity is broken. Year after year, attackers remain unchallenged and undeterred, while engineering teams feel pressure to design, build, and operate “secure” systems. Failure can’t be prevented, mental models of systems are incomplete, and our digital world constantly evolves. How can we verify that our systems behave the way we expect? What can we do to improve our systems’ resilience?
In this comprehensive guide, Kelly Shortridge helps you navigate the challenges of sustaining resilience in complex software systems by using the principles and practices of security chaos engineering. By preparing for adverse events, you can ensure they don’t disrupt your ability to innovate, move quickly, and achieve your engineering and business goals.
- Learn how to design a modern security program
- Make informed decisions at each phase of software delivery to nurture resilience and adaptive capacity
- Understand the complex systems dynamics upon which resilience outcomes depend
- Navigate technical and organizational trade-offs that distort decision making in systems
- Explore chaos experimentation to verify critical assumptions about software quality and security
- Learn how major enterprises leverage security chaos engineering

350+ Pages of Pragmatic Opportunities
6 In-depth Enterprise Case Studies
1 Adorable Chaos Kitty to guide you on your quest

Contents
Resilience in Software and Systems
We begin our journey in Chapter 1 by discussing resilience in complex systems, how failure manifests, how resilience is maintained, and how we can avoid common myths that lead our security strategy astray.
Systems-Oriented Security
In Chapter 2, we explore the shift toward systems thinking in security, describing how to refine mental models of systems behavior and perform resilience assessments before comparing SCE to traditional cybersecurity.
Architecting and Designing
Chapter 3 starts in the “first” phase of software delivery: architecting and designing systems. We think through how to invest effort based on your organization’s specific context before describing opportunities to invest in looser coupling and linearity.
Building and Delivering
In Chapter 4, we map the five features that define resilience to activities we can pursue when developing, building, testing, and delivering systems. The ground we cover is expansive; this chapter is perhaps the most packed full of practical wisdom.
Operating and Observing
Chapter 5 describes how we can sustain resilience as our systems run in production. We reveal the overlap between SRE and security goals, then discover different strategies for security observability before closing with a discussion of scalability’s relevance to security.
Responding and Recovering
In Chapter 6, we dig into the biases that can distort our decision making and learning during incident response and recovery. Along the way, we propose tactics for countering those biases and supporting more constructive efforts that eradicate blame games around "human error".
Platform Resilience Engineering
Chapter 7 introduces the concept of platform resilience engineering and describes how to implement it in practice within any organization. We cover the process for creating security solutions for internal customers (like engineering teams); the Ice Cream Cone Hierarchy of Security Solutions is especially tasty wisdom.
Security Chaos Experiments
In Chapter 8, we learn how to conduct experiments and paint a richer picture of our systems, which in turn helps us better navigate strategies to make them more resilient to failure. We outline the end-to-end experimentation process, including setup, hypotheses, experiment design, running experiments, and analyzing evidence.
Plus Chapter 9 with Case Studies by

Praise

Security Chaos Engineering is a must read for technology leaders and engineers today, as we operate increasingly complex systems. SCE presents clear evidence that systems resilience is a shared goal of both ops and security teams, and showcases tools and frameworks to measure, design, and instrument systems to improve the resilience and security of our systems. 10/10 strong recommend (kidding but also not).
Dr. Nicole Forsgren
Lead Author of Accelerate and Partner at Microsoft Research
Shortridge weaves multiple under-served concepts into the book’s guidance, like recognizing human biases, the power of rehearsals, org design, complex systems, systems thinking, habits, design thinking, thinking like a product manager and a financial planner, and much more. This book brings the reader in on a well-kept secret: security is more about people and processes than about technology. It is our mental models of those elements that drive our efforts and outcomes.
Bob Lord
Former Chief Security Officer of the DNC and former Chief Information Security Officer of Yahoo
As our societies become more digitized then our software ecosystems are becoming ever more complex. While complexity can be considered the enemy of security, striving for simplicity as the sole tactic is not realistic. Rather, we need to manage complexity and a big part of that is chaos engineering. That is testing, probing, modeling and nudging complex systems to a better state. This is tough but Kelly and Aaron bring immense cross-domain, practical real world experience to this area in a way that all security professionals should find accessible and fascinating.
Phil Venables
Chief Information Security Officer, Google Cloud
Security Chaos Engineering provides a much-needed reframing of cybersecurity that moves it away from arcane rules and rituals, replacing them with modern concepts from software and resiliency engineering. If you are looking for ways to uplift your security approaches and engage your whole engineering team in the process, this book is for you.
Camille Fournier
Engineering leader and author, The Manager’s Path
We as defenders owe it to ourselves to make life as hard for attackers as possible. This essential work expertly frames this journey succinctly and clearly and is a must read for all technology leaders and security practitioners, especially in our cloud native world.
Rob Duhart, Jr.
VP, Deputy CISO and CISO eCommerce at Walmart
Security Chaos Engineering is an unflinching look at how systems are secured in the real world. Shortridge understands both the human and the technical elements in security engineering.
George Neville-Neil
Author of the Kode Vicious column in ACM Queue Magazine
Security masquerades as a technical problem, but it really cuts across all layers: organizational, cultural, managerial, temporal, historical, and technical. You can't even define security without thinking about human expectations, and the dividing line between "flaw" and "vulnerability" is non-technical. This thought-provoking book emphasizes the inherent complexity of security and the need for flexible and adaptive approaches that avoid both box-ticking and 0day-worship.