**Sound of a T-Rex breaks the silence** Don’t worry. This isn’t a paleontology blog, just me replaying my favorite Jurassic Park movie and learning a lesson about chaos testing. You’re surely wondering: “What do dinos have to do with a branch of software testing?” The answer is: a lot!
If you remember the first (and best!) installment in the Jurassic Park series, there’s a scene where an engineer takes a vital security system offline, leading to a cascading meltdown. One tiny switch brought the whole complex facility down and threw the entire system into chaos.
I’m bringing that example here because chaos testing is precisely that: pulling tiny levers and seeing how much havoc they cause across the whole system. The only difference is that, unlike Dennis Nedry, you won’t be causing chaos to distract people from your wrongdoing. You’ll be doing it to test the strength of your systems.
The fun part I love about chaos testing is its educational side, which sheds light on what parts of a system need improving, modifying, or removing to become more resilient.
Sounds cool, doesn’t it?
There’s more to the educational purposes of chaos testing than meets the eye. That’s because this technique can help devs understand and improve the behavior of their systems under real-world conditions. So let’s dig right in.
What is chaos testing?
Speaking BMO’s language: Chaos testing is a software testing method that identifies systems’ weaknesses and vulnerabilities by intentionally introducing failures and disruptions.
Why would you purposefully press the explode button on something that works flawlessly?
Because, by doing that, you’ll learn a lot about the reliability and resilience of your software, especially in the face of unexpected or extreme circumstances. To see how resilient something is, you must test it and see if it will crumble into a million pieces and warnings.
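To make “intentionally introducing failures” a bit more concrete, here’s a minimal sketch in Python. Everything in it is made up for illustration: `with_chaos`, `fetch_price`, and `price_or_fallback` are hypothetical names, not part of any real chaos-testing tool. The idea is simply to wrap a dependency so it fails some of the time, then check whether the caller copes.

```python
import functools
import random


class InjectedFailure(Exception):
    """Raised on purpose by the chaos wrapper, never by real code."""


def with_chaos(failure_rate, rng):
    """Return a decorator that makes a function fail some of the time."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Roll the dice before every call; fail if we land under the rate.
            if rng.random() < failure_rate:
                raise InjectedFailure(f"chaos hit {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


def fetch_price(item):
    # Hypothetical stand-in for a real dependency such as a pricing service.
    return {"dino-plush": 19.99}[item]


def price_or_fallback(fetch, item):
    # The behavior under test: does the caller survive a failing dependency?
    try:
        return fetch(item)
    except InjectedFailure:
        return None  # degrade gracefully instead of crashing


rng = random.Random(42)  # fixed seed so the experiment is repeatable
flaky_fetch = with_chaos(failure_rate=0.3, rng=rng)(fetch_price)
results = [price_or_fallback(flaky_fetch, "dino-plush") for _ in range(1000)]
failures = results.count(None)
print(f"{failures} of {len(results)} calls hit an injected failure")
```

The fixed seed is a deliberate choice: if a run turns up something surprising, you can replay exactly the same sequence of failures.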
Why does chaos testing matter?
Okay, so how does chaos testing fit into our daily lives? Without resilience practices like chaos testing, services such as Netflix, Instagram, Meta, and Gmail would suffer far more downtime and glitches, to the point of becoming frustrating to use.
That’s even more true with modern, distributed software systems, which rely on multiple interconnected components. Chaos testing helps to ensure that the system handles unexpected failures or disruptions and continues to function as expected.
The educational side of chaos testing
Aside from the practical use I mentioned above, there’s also added value in using chaos testing for education. The educational side of chaos testing shines in these aspects:
- Understanding the behavior of the system
- Learning the limitations of your software
- Better grasping the interdependencies and vulnerabilities
- Creating a hands-on experience for new devs
Understanding the behavior of the system
One of the primary educational purposes of chaos testing is to help developers understand the behavior of their systems in the face of even the wildest and most unexpected failures and disruptions. When warnings flash and the system fails, it’s extremely difficult to understand the root cause of the failure without extensive testing and analysis. You have 0 TIME to react and test the system at that moment!
To avoid panic and an ‘everything’s fine’ moment, you need to intentionally cause failures and observe their effects on the system in a safe environment. That’s how you’ll build a how-to manual describing how the system responds to different types of disruptions.
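Here’s a rough sketch of what such a manual could look like when it’s generated by experiments instead of written from memory. It assumes a toy system, `ToyShop`, with a cache in front of a database. All names here are hypothetical, and a real system would have far more moving parts.

```python
class ToyShop:
    """A tiny stand-in system: a cache in front of a database."""

    def __init__(self):
        self.cache_up = True
        self.db_up = True
        self.cache = {"cart:1": ["dino-plush"]}
        self.db = {"cart:1": ["dino-plush"], "cart:2": ["amber-lamp"]}

    def get_cart(self, key):
        if self.cache_up and key in self.cache:
            return self.cache[key]
        if self.db_up:
            return self.db[key]
        raise RuntimeError("no data source available")


def kill_cache(shop):
    shop.cache_up = False


def kill_database(shop):
    shop.db_up = False


def kill_both(shop):
    kill_cache(shop)
    kill_database(shop)


def run_experiment(break_it, keys):
    """Apply one fault to a fresh system and record what users would see."""
    shop = ToyShop()
    break_it(shop)
    observations = {}
    for key in keys:
        try:
            shop.get_cart(key)
            observations[key] = "ok"
        except RuntimeError as err:
            observations[key] = f"error: {err}"
    return observations


faults = {
    "cache down": kill_cache,
    "database down": kill_database,
    "both down": kill_both,
}

# One entry per fault: the start of a how-to manual for on-call engineers.
runbook = {name: run_experiment(fault, ["cart:1", "cart:2"])
           for name, fault in faults.items()}
for name, seen in runbook.items():
    print(name, "->", seen)
```

Each experiment starts from a fresh `ToyShop`, so one fault never leaks into the next. That’s the “safe environment” part in miniature.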
Learning the limitations of your software
How do you know how much your software would “stretch” unless you push it to extremes?
Imagine yourself in this position: You’re running an e-commerce server that receives thousands of queries every moment. But you don’t know the ultimate number your system can handle. The next day you’re met with a 500 Internal Server Error, and you turn ghost-white in microseconds. What do you do?
That’s why chaos testing is helpful when learning the limitations and capabilities of the software system. By introducing failures and disruptions in a controlled manner, you’re pushing the limits of the system and seeing how it behaves under extreme conditions.
This is super valuable when you’re developing systems that need to be highly reliable and resilient, whether they run under heavy stress and uncertainty or serve sensitive fields like healthcare, aviation, and finance.
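To show what “pushing the limits in a controlled manner” can look like, here’s a small simulation sketch. It assumes a made-up server model in which a fixed number of requests is served per second, a bounded queue holds the overflow, and anything beyond that is rejected, much like that 500 error. The function names and the numbers are assumptions chosen for illustration, not measurements from a real system.

```python
def simulate_second(incoming, capacity, queue_limit, backlog):
    """Simulate one second of traffic; return (errors, new_backlog)."""
    waiting = backlog + incoming
    served = min(waiting, capacity)
    waiting -= served
    # Anything that does not fit in the queue is rejected, like a 500 error.
    errors = max(0, waiting - queue_limit)
    return errors, min(waiting, queue_limit)


def find_breaking_point(capacity, queue_limit, start, step, max_load,
                        seconds=10, max_error_rate=0.01):
    """Ramp the load up and return the first load that breaks the error budget.

    `start` must be greater than zero. Returns (None, 0.0) if nothing breaks.
    """
    load = start
    while load <= max_load:
        backlog, errors = 0, 0
        for _ in range(seconds):
            new_errors, backlog = simulate_second(load, capacity,
                                                  queue_limit, backlog)
            errors += new_errors
        error_rate = errors / (load * seconds)
        if error_rate > max_error_rate:
            return load, error_rate
        load += step
    return None, 0.0


breaking_load, rate = find_breaking_point(
    capacity=1000, queue_limit=500, start=500, step=100, max_load=3000
)
print(f"error budget first broken at {breaking_load} requests per second "
      f"(error rate {rate:.1%})")
```

Ramping in small steps instead of jumping straight to the maximum is the point: you learn where the limit is, not just that there is one.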
Better grasping the interdependencies and vulnerabilities
Sitting on the sidelines and having your fingers crossed, hoping that nothing fails, is a highway to disaster many companies take. By intentionally causing failures and disruptions, businesses can identify points of weakness or bottlenecks in their systems and take steps to improve their resilience and reliability.
This helps prevent costly downtime or data loss and ultimately leads to a more efficient and effective system operation.
So how do you intentionally cause damage? What tool do you use?
More than a decade ago, when Netflix migrated to the cloud, they built Chaos Monkey. It’s like Donkey Kong, but instead of throwing barrels, it randomly terminates running instances to check that the rest of the system keeps working without them. Chaos Monkey is part of the Simian Army, a set of open-source chaos-testing tools from Netflix that you can use to your advantage in these cases.
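The core idea behind a tool like Chaos Monkey fits in a few lines. The sketch below is not Netflix’s code or API. It’s a toy model with hypothetical names (`Fleet`, `unleash_monkey`) that picks random instances, terminates them, and then asks whether the service can still do its job.

```python
import random


class Fleet:
    """A toy pool of identical instances behind a load balancer."""

    def __init__(self, names, min_healthy):
        self.running = set(names)
        self.min_healthy = min_healthy  # instances needed to keep serving

    def terminate(self, name):
        self.running.discard(name)

    def can_serve(self):
        return len(self.running) >= self.min_healthy


def unleash_monkey(fleet, rng, kills):
    """Terminate randomly chosen running instances, one at a time."""
    victims = []
    for _ in range(kills):
        if not fleet.running:
            break
        # Sort first so the same seed always picks the same victims.
        victim = rng.choice(sorted(fleet.running))
        fleet.terminate(victim)
        victims.append(victim)
    return victims


rng = random.Random(7)  # fixed seed keeps the experiment repeatable
fleet = Fleet(["web-1", "web-2", "web-3", "web-4"], min_healthy=2)
victims = unleash_monkey(fleet, rng, kills=2)
print("terminated:", victims)
print("still serving:", fleet.can_serve())
```

If `can_serve()` comes back false after losing just one instance, you’ve found a weak point before your users did.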
Creating a hands-on experience for new devs
Chaos testing isn’t just for senior devs. It’s also a learning tool for new developers, as it allows them to gain hands-on experience working with and understanding the behavior of complex systems.
By participating in chaos testing exercises, new developers learn about the various components of a system and how they interact with each other, as well as how to identify and troubleshoot issues that may arise.
It’s like learning how to drive, except the vehicle is a complex system you have to crash bit by bit rather than all at once.
Chaos testing becomes a valuable addition to traditional classroom learning, as it gives developers practical experience with real-world systems instead of only the theory found in decade-old CS textbooks.
A lesson worth remembering
Chaos testing is a fun skill to develop, especially if you’re a let’s-break-things type of person. With a big community behind it, chaos testing is an educational tool that helps companies better understand the impact of external factors on their software systems.
Warning: While it’s a bit risky and requires careful planning, the benefits of chaos testing are significant, making it a valuable addition to any software development process. Having appropriate safeguards in place to prevent accidents and minimize risk is a must!
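What might “appropriate safeguards” look like in code? Here’s one possible sketch, assuming two simple rules: every injected fault is rolled back no matter what happens, and the experiment stops as soon as errors exceed an agreed budget. The names (`injected_fault`, `check_error_budget`, `run_guarded`) and the `payments_enabled` flag are hypothetical, and the `samples` list stands in for real health-check results.

```python
from contextlib import contextmanager


class AbortExperiment(Exception):
    """Raised when the experiment is hurting more than planned."""


@contextmanager
def injected_fault(system, key, broken_value):
    """Break one setting for the duration of the block, then always restore it."""
    original = system[key]
    system[key] = broken_value
    try:
        yield system
    finally:
        system[key] = original  # rollback runs even if the experiment aborts


def check_error_budget(errors, requests, max_error_rate):
    """Stop the experiment as soon as the error rate passes the agreed limit."""
    if requests and errors / requests > max_error_rate:
        raise AbortExperiment(f"error rate {errors / requests:.0%} is over budget")


def run_guarded(system, samples, max_error_rate=0.2):
    """Run a fault experiment and report whether it finished or was aborted."""
    errors = 0
    try:
        with injected_fault(system, "payments_enabled", False):
            # Each sample is a pretend health check: True means the request was fine.
            for seen, ok in enumerate(samples, start=1):
                errors += 0 if ok else 1
                check_error_budget(errors, seen, max_error_rate)
        return "completed"
    except AbortExperiment:
        return "aborted"


system = {"payments_enabled": True}
outcome = run_guarded(system, samples=[True, True, False, False, False])
print(outcome, system)
```

The rollback lives in a `finally` block on purpose: the one thing worse than a failed experiment is a fault that stays injected after everyone has gone home.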