Chaos Engineering intends to understand the emergent operational behavior of Connected Living applications and infrastructure through automated discovery and chaos-based testing in AWS and on-prem data centers.
Sylvie’s goal is to provide a platform to define, execute, monitor, and analyze chaos scenarios, improving the resiliency of a software system by helping to identify weaknesses and hidden assumptions. Sylvie provides a catalog of actions that can be performed to inject failures into an application or service. The platform is designed for extensibility, supporting actions on cloud and on-prem components so that engineers can describe real-world scenarios. Today, Sylvie is integrated with AWS Fault Injection Service (FIS). Sylvie publishes events to Datadog to indicate changes in scenario and action state.
Who is it for?
Sylvie is designed for:
- Software engineers and architects who want to improve the resiliency of their applications and infrastructure.
- Teams responsible for ensuring the reliability and availability of critical systems.
- Organizations looking to proactively identify and address potential weaknesses in their software.
Sylvie is particularly useful for teams that:
- Operate complex, distributed systems.
- Rely on cloud-based infrastructure (AWS).
- Need to test the resilience of their applications under various failure scenarios.
How to use it?
The Principles of Chaos Engineering recommends:
- Start by defining ‘steady state’ as some measurable output of a system that indicates normal behavior.
- Hypothesize that this steady state will continue in both the control group and the experimental group.
- Introduce variables that reflect real world events like servers that crash, storage devices that malfunction, network connections that are severed, etc.
- Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.
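As a rough illustration of the steps above, the sketch below compares a steady-state metric between a control group and an experimental group. The function name `steady_state_holds`, the 5% relative tolerance, and the sample success rates are all assumptions made for this example; they are not part of Sylvie or of the Principles of Chaos Engineering.

```python
from statistics import mean


def steady_state_holds(control, experiment, tolerance=0.05):
    """Return True if the experimental group's mean metric stays within
    `tolerance` (relative difference) of the control group's mean.

    Hypothetical helper for illustration only.
    """
    if not control or not experiment:
        raise ValueError("both groups need at least one sample")
    baseline = mean(control)
    observed = mean(experiment)
    if baseline == 0:
        # Avoid division by zero: only an identical zero mean counts as steady.
        return observed == 0
    return abs(observed - baseline) / abs(baseline) <= tolerance


# Made-up samples: request success rate measured once per minute.
control_success_rate = [0.999, 0.998, 0.999, 0.997]
experiment_success_rate = [0.998, 0.996, 0.999, 0.997]
degraded_success_rate = [0.91, 0.88, 0.93, 0.90]

print(steady_state_holds(control_success_rate, experiment_success_rate))  # True
print(steady_state_holds(control_success_rate, degraded_success_rate))    # False
```

A result of `False` means the hypothesis was disproved: the injected failure moved the system away from its steady state, which points at a weakness worth investigating.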
For Sylvie this would translate to:
- Create a Scenario.
  - Add Stop Conditions that monitor alerts for abnormal system behavior.
  - Add Actions to define what you’d like to happen and…
    - Add Targets to identify resources you’d like to affect.
- Start a Scenario Run from the saved Scenario.
- Verify the hypothesis. If the system becomes unstable, make corrections and rerun the Scenario as a new Scenario Run.
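The relationship between these pieces can be sketched as a small data model. This is only an illustration of how a Scenario groups Stop Conditions, Actions, and Targets; the class names, fields, and the `validate` check are hypothetical and do not reflect Sylvie's actual API or storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Target:
    """Resources an action should affect (hypothetical shape)."""
    resource_type: str
    selector: Dict[str, str]


@dataclass
class Action:
    """A failure to inject, applied to one or more targets."""
    name: str
    targets: List[Target] = field(default_factory=list)


@dataclass
class StopCondition:
    """An alert that should halt the run when it fires."""
    alert_name: str


@dataclass
class Scenario:
    name: str
    stop_conditions: List[StopCondition] = field(default_factory=list)
    actions: List[Action] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems that would make a run unsafe or empty."""
        problems = []
        if not self.stop_conditions:
            problems.append("scenario has no stop conditions")
        if not self.actions:
            problems.append("scenario has no actions")
        for action in self.actions:
            if not action.targets:
                problems.append(f"action '{action.name}' has no targets")
        return problems


# Made-up example values.
scenario = Scenario(
    name="checkout-instance-loss",
    stop_conditions=[StopCondition("checkout-5xx-rate-high")],
    actions=[
        Action("stop-instances", [Target("ec2-instance", {"service": "checkout"})])
    ],
)
print(scenario.validate())  # []
```

Checking for missing stop conditions before starting a run is a design choice in this sketch: a run with no stop condition has nothing to halt it if the system becomes unstable.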
Onboarding
To use Sylvie, teams must onboard their AWS accounts to grant Sylvie and FIS the necessary permissions to perform actions in the target account. Users request account-specific permissions from their Account Manager.
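For context, AWS FIS runs experiments by assuming an IAM role in the target account, and that role must trust the `fis.amazonaws.com` service principal. The sketch below builds a minimal trust policy of that kind. It is an assumption about what onboarding might involve, not Sylvie's actual onboarding output; the permissions Sylvie itself needs are not described here, and the function name `fis_trust_policy` is hypothetical.

```python
import json


def fis_trust_policy() -> dict:
    """Build a minimal IAM trust policy that lets AWS FIS assume a role.

    Illustrative only: a real onboarding flow would also attach a permissions
    policy scoped to the actions and resources the team wants to allow.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "fis.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


print(json.dumps(fis_trust_policy(), indent=2))
```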
Start Onboarding