Central Services Portal is in Beta. We need your feedback to ensure the best possible user experience. Any feedback you provide will help us help you! Please use the feedback tool at the bottom of any page.

Sylvie (Chaos Engineering)

What is it?

Chaos Engineering aims to understand the emergent operational behavior of Connected Living applications and infrastructure through automated discovery and chaos-based testing in AWS and on-prem data centers.

Sylvie’s goal is to provide a platform to define, execute, monitor, and analyze chaos scenarios, improving the resiliency of a software system by helping to identify weaknesses and hidden assumptions. Sylvie provides a catalog of actions that inject failures into an application or service. The platform is designed for extensibility, supporting actions on cloud and on-prem components so that engineers can describe real-world scenarios. Today Sylvie is integrated with AWS Fault Injection Service (FIS). Sylvie publishes events to Datadog to indicate changes in scenario and action state.

Who is it for?

Sylvie is designed for:

  • Software engineers and architects who want to improve the resiliency of their applications and infrastructure.
  • Teams responsible for ensuring the reliability and availability of critical systems.
  • Organizations looking to proactively identify and address potential weaknesses in their software.

Sylvie is particularly useful for teams that:

  • Operate complex, distributed systems.
  • Rely on cloud-based infrastructure (AWS).
  • Need to test the resilience of their applications under various failure scenarios.

How to use it?

The Principles of Chaos Engineering recommends:

  1. Start by defining ‘steady state’ as some measurable output of a system that indicates normal behavior.
  2. Hypothesize that this steady state will continue in both the control group and the experimental group.
  3. Introduce variables that reflect real world events like servers that crash, storage devices that malfunction, network connections that are severed, etc.
  4. Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.
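As a rough illustration of steps 1, 2, and 4, the sketch below compares one steady-state metric between a control group and an experimental group. It is not part of Sylvie; the function name `steady_state_holds`, the 5% tolerance, and the sample values are all assumptions chosen for the example.

```python
from statistics import mean


def steady_state_holds(control, experiment, tolerance=0.05):
    """Return True when the experimental group's mean metric stays within
    a relative tolerance of the control group's mean.

    `tolerance` is a hypothetical threshold; pick one that fits the metric.
    """
    baseline = mean(control)
    observed = mean(experiment)
    if baseline == 0:
        # Avoid dividing by zero: only an identical zero mean counts as steady.
        return observed == 0
    return abs(observed - baseline) / abs(baseline) <= tolerance


# Hypothetical request success rates sampled during an experiment.
control_success_rate = [0.999, 0.998, 0.999, 0.997]
experiment_success_rate = [0.998, 0.996, 0.999, 0.997]

if steady_state_holds(control_success_rate, experiment_success_rate):
    print("Hypothesis holds: no meaningful difference in steady state.")
else:
    print("Hypothesis disproved: the system deviated from steady state.")
```

A relative comparison of means is only one way to express the hypothesis; a latency percentile or an error budget may suit other systems better.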

For Sylvie this would translate to:

  1. Create a Scenario.
    1. Add Stop Conditions that monitor alerts for abnormal system behavior.
    2. Add Actions to define what you’d like to happen and…
    3. Add Targets to identify resources you’d like to affect.
  2. Start a Scenario Run from the saved Scenario.
  3. Verify the hypothesis. If the system becomes unstable, make corrections and rerun the Scenario as a new Scenario Run.
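The workflow above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not Sylvie's actual API: the class names, field names, state strings, and the `start_run` function are hypothetical, and whether Targets attach to the Scenario or to individual Actions is assumed here for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Scenario:
    """Hypothetical saved Scenario: Stop Conditions, Actions, and Targets."""
    name: str
    stop_conditions: List[Callable[[], bool]] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)
    targets: List[str] = field(default_factory=list)


@dataclass
class ScenarioRun:
    """Hypothetical record of one execution of a saved Scenario."""
    scenario: Scenario
    state: str = "pending"
    completed_actions: List[str] = field(default_factory=list)


def start_run(scenario: Scenario) -> ScenarioRun:
    """Walk through the Actions, halting as soon as any Stop Condition fires."""
    run = ScenarioRun(scenario=scenario, state="running")
    for action in scenario.actions:
        if any(condition() for condition in scenario.stop_conditions):
            run.state = "stopped"
            return run
        # A real platform would inject the failure into the targets here;
        # this sketch only records that the action was reached.
        run.completed_actions.append(action)
    run.state = "completed"
    return run


# Hypothetical usage: one action, one target, and an alert that never fires.
scenario = Scenario(
    name="checkout-instance-loss",
    stop_conditions=[lambda: False],
    actions=["stop-instances"],
    targets=["checkout-service"],
)
first_run = start_run(scenario)
print(first_run.state, first_run.completed_actions)
```

Each call to `start_run` returns a fresh `ScenarioRun`, mirroring the idea that a corrected system is retested by rerunning the same Scenario as a new Scenario Run.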

Follow the link to learn more about Sylvie!
