ADDO session: Building observability to increase resiliency

October 10, 2024 By Sonatype

As part of the DevOps and DevSecOps track during Sonatype's 9th All Day DevOps (ADDO) event, AWS Senior Developer Advocate Guillermo Ruiz presented his session titled "Building Observability to Increase Resiliency." Well-applied observability helps you find early signs of problems before they impact customers and makes it possible to react quickly to disruptions.

Observability and resiliency topics typically focus on logging and tracing system performance. However, Ruiz focused on things that might go wrong within a system as a way to discuss how to uncover and diagnose issues, as well as prevent future challenges.

Common failure modes and dimensionality

There are four common types of failure: a bad dependency, a bad component, a bad deployment, or a traffic spike. But how would you know there was a problem to begin with? Ruiz used a hypothetical e-commerce website as an example, where each page and element, including navigation and search, operates independently with its own code. When one of these elements experiences an issue, it degrades the user experience. Catching these issues before they frustrate users is why observability is essential.

By observing data for real-time insight into the system and setting alerts for anything out of the ordinary, it's possible to address potential problems before they become user complaints. Dimensionality allows developers to break down errors into multiple factors or dimensions, such as time, location (in code or system), user input, or environmental conditions. By analyzing these dimensions, developers can understand the context in which an error occurs and pinpoint its source more effectively.
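To make the idea of dimensions concrete, here is a minimal sketch in plain Python. It is not taken from the session: the event records, the field names (`page`, `region`, `status`), and the `errors_by` helper are all hypothetical and chosen only for illustration. It counts failed requests grouped by whichever dimensions you pass in.

```python
from collections import Counter

# Hypothetical request events; the field names are assumptions for illustration.
events = [
    {"page": "search", "region": "eu-west-1", "status": 500},
    {"page": "search", "region": "eu-west-1", "status": 500},
    {"page": "search", "region": "us-east-1", "status": 200},
    {"page": "navigation", "region": "eu-west-1", "status": 200},
    {"page": "checkout", "region": "us-east-1", "status": 500},
    {"page": "checkout", "region": "us-east-1", "status": 200},
]


def errors_by(events, *dimensions):
    """Count failed requests (status >= 500) grouped by the given dimensions."""
    counts = Counter()
    for event in events:
        if event["status"] >= 500:
            # The grouping key is a tuple with one value per requested dimension.
            key = tuple(event[d] for d in dimensions)
            counts[key] += 1
    return counts


# One dimension: which page is failing?
by_page = errors_by(events, "page")

# Two dimensions: which page is failing, and in which region?
by_page_region = errors_by(events, "page", "region")

print(by_page.most_common())
print(by_page_region.most_common())
```

Grouping by page alone shows that search has the most errors. Adding the region dimension narrows this further to search requests in one region, which is the kind of context that helps pinpoint a source.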

Having more dimensions means seeing the problem from more perspectives, but having too many can be overwhelming, so it's important to find the right balance for dimensionality without getting lost in the data.

You can also use composite alarms, which combine several alarms into a single notification. In our website example, we can set a threshold so that when an issue is detected across multiple pages, just a single alarm is triggered. This is a way to minimize the number of alarms and focus on key issues.
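The composite alarm idea can be sketched in a few lines of plain Python. This is an illustration only and not any particular monitoring service's API: the function names, the 5% error-rate threshold, and the two-page minimum are all assumptions made for the example.

```python
def page_alarm(error_rate, threshold=0.05):
    """A single per-page alarm: fires when the error rate exceeds the threshold."""
    return error_rate > threshold


def composite_alarm(error_rates, threshold=0.05, min_pages=2):
    """Combine the per-page alarms into one notification.

    Returns (triggered, firing_pages). The composite fires only when at
    least `min_pages` individual page alarms are firing at the same time.
    """
    firing = sorted(
        page for page, rate in error_rates.items() if page_alarm(rate, threshold)
    )
    return len(firing) >= min_pages, firing


# Hypothetical error rates per page of the example website.
rates = {"search": 0.12, "navigation": 0.08, "checkout": 0.01}

triggered, pages = composite_alarm(rates)
if triggered:
    # One notification covering every affected page, instead of one per page.
    print(f"Composite alarm: elevated errors on {', '.join(pages)}")
```

With these sample values, two page alarms are firing, so a single composite notification is sent. If only one page crossed the threshold, the composite would stay quiet and the per-page alarm could be handled on its own.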

During his session, Ruiz explained in detail how to identify and respond to each of these common failure types and how to identify issues that may originate externally.

Explore more at All Day DevOps

All Day DevOps is the largest DevOps conference in the world, with more than 180,000 attendees each year.

You can catch Ruiz's session on demand here, as well as hundreds of sessions across a wide range of topics.
