Chaos engineering

Proactive resilience to cyber threats

Charles A. Jacco

Principal, Cyber Security, KPMG US

+1 212-954-1949

Alexander Smith

Associate Advisory, Cyber Security Services, KPMG US

+1 480-284-3292

Dimitri Gavriilopoulos

Manager Advisory, Cyber Security Services, KPMG US

+1 312-665-8544

Caleb Queern

Director, Cyber Security, KPMG US

+1 571-228-8011

To navigate the obstacles of modern development, large organizations have begun to adopt a proactive measure called chaos engineering. Although the name implies a random attack on one’s own software, chaos engineering is an empirical approach that systematically uncovers weaknesses in software through thoughtful, planned experiments. The practice has grown over the last few years and has been widely adopted as a standard capability by modern DevSecOps teams to proactively mitigate critical failures in complex systems before they disrupt a production environment.1

Speed to market – a double-edged sword

The DevSecOps movement has caused modern systems to evolve at a much faster pace. Organizations now find it nearly impossible to account for everything that can go wrong before deployment. Production applications in agile environments change constantly, making it difficult to predict where and how failures will occur.

Avoiding the cons of moving quickly

Chaos engineering can act like a vaccine against the problems of modern application development. By carefully injecting failures into a live production system, engineers can assess and measure their ability to solve problems during real failures, when the system does not behave as intended. This leads to a large reduction in unlikely but high-impact risks. The resistance to certain types of failure that builds up over time is a strong accelerator of the maturity of both the application and the overall DevSecOps program.
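To make the idea of careful failure injection concrete, the following is a minimal sketch in Python and not a production tool. It simulates a service that calls a dependency, injects faults at a chosen rate, and checks a steady-state hypothesis (here, that at least 99% of requests are still served). Every name in it, including `flaky_dependency`, `run_experiment`, and the 99% threshold, is a hypothetical chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    requests: int
    successes: int

    @property
    def success_rate(self) -> float:
        return self.successes / self.requests


def flaky_dependency(rng: random.Random, failure_rate: float) -> str:
    """Hypothetical downstream call; raises when a fault is injected."""
    if rng.random() < failure_rate:
        raise ConnectionError("injected fault")
    return "fresh response"


def handle_request(rng: random.Random, failure_rate: float, use_fallback: bool) -> bool:
    """Return True if the request was served, directly or via the fallback."""
    try:
        flaky_dependency(rng, failure_rate)
        return True
    except ConnectionError:
        # Assumption: with a fallback (e.g. a cached reply) the user is still served.
        return use_fallback


def run_experiment(failure_rate: float, use_fallback: bool,
                   requests: int = 1000, seed: int = 42) -> ExperimentResult:
    """Send simulated requests while faults are injected at failure_rate."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    successes = sum(
        handle_request(rng, failure_rate, use_fallback) for _ in range(requests)
    )
    return ExperimentResult(requests, successes)


def hypothesis_holds(result: ExperimentResult, target: float = 0.99) -> bool:
    """Steady-state hypothesis: the success rate stays at or above the target."""
    return result.success_rate >= target


if __name__ == "__main__":
    for fallback in (False, True):
        result = run_experiment(failure_rate=0.3, use_fallback=fallback)
        print(f"fallback={fallback}: success rate {result.success_rate:.3f}, "
              f"hypothesis holds: {hypothesis_holds(result)}")
```

In this sketch the experiment without a fallback exposes a weakness, while the run with a fallback shows the system absorbing the same injected failures, which is the kind of evidence a planned experiment is meant to produce.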

Get vigilant!

Organizations should consider exploring the practice of scalable chaos engineering to increase their understanding of how complex systems work in production. The ability to proactively strengthen a system’s resilience and react to uncertainty in production environments can lead to increased uptime and adherence to service level agreements, and can further enable continued business growth.

Several businesses currently offer Software as a Service (SaaS) products that guide companies through continuous verification and reliability improvement across the chaos engineering lifecycle. Regardless of an organization’s size, modern chaos engineering tools can be deployed to test in production in a scalable fashion. A critical facet of these deployment models is to start applying the chaos engineering methodology in small, controlled spaces to avoid unwanted outages, with the goal of building up over time to larger, more disruptive experiments as use case complexity increases.
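The approach of starting small and building up to larger experiments can be sketched as a staged ramp with an abort condition. The Python below is an illustrative sketch only: `ramp_blast_radius`, the stage fractions, and the 5% abort threshold are assumptions, and the observed error rates are canned values standing in for real monitoring data.

```python
from typing import Callable, List, Sequence


def ramp_blast_radius(
    stages: Sequence[float],
    observe_error_rate: Callable[[float], float],
    abort_threshold: float = 0.05,
) -> List[float]:
    """Run an experiment at increasing traffic fractions.

    Returns the stages that completed with an error rate at or below
    abort_threshold. The ramp stops at the first stage that exceeds it.
    """
    completed: List[float] = []
    for fraction in stages:
        error_rate = observe_error_rate(fraction)
        if error_rate > abort_threshold:
            # Halt here; in practice this is where the fault would be rolled back.
            break
        completed.append(fraction)
    return completed


if __name__ == "__main__":
    # Hypothetical measurements: error rate seen at each fraction of traffic.
    observed = {0.01: 0.0, 0.05: 0.01, 0.25: 0.02, 0.5: 0.12}
    safe_stages = ramp_blast_radius(list(observed), observed.__getitem__)
    print("Stages completed safely:", safe_stages)
```

Stopping at the first stage that breaches the threshold keeps any unwanted impact confined to the smallest scope at which the weakness appears.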

Transitioning to an environment with constant failure-based testing may sound daunting, but the confidence gained through increased uptime and reliability, along with the resources saved in the future, makes chaos engineering a worthy investment when planning the evolution of application and DevOps teams.

For more information about how to get started, check out our podcast on chaos engineering.2

Some or all of the services described herein may not be permissible for KPMG audit clients and their affiliates or related entities.


The KPMG name and logo are registered trademarks or trademarks of KPMG International.

  1. KPMG LLP (US), "Rebooting DevOps security by design" (December 1, 2021).
  2. KPMG LLP (US), "Chaos engineering: Fixing your business by breaking your IT infrastructure" (March 27, 2018).