Service Level Objective (SLO) monitoring

An SLO (Service Level Objective) is a specific goal that defines how well a service should perform, such as its uptime or response time. It sets the standard for reliability or performance that a service needs to meet to keep users happy.

The most common SLOs focus on availability and latency:

  • Availability SLO: This tracks how often a service is available to users. For example, an SLO might require that 99.9% of all requests are successful.
  • Latency SLO: This measures how fast a service responds. An example SLO might state that 99% of requests should be completed in under 500 milliseconds.
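
As a rough illustration of the two objectives above, the sketch below computes availability and latency compliance from a list of request records. The `Request` type and both function names are hypothetical and are not part of Coroot; they only show the arithmetic behind the example targets.

```python
from dataclasses import dataclass


@dataclass
class Request:
    ok: bool            # True if the request succeeded
    duration_ms: float  # time taken to serve the request


def availability(requests):
    """Fraction of requests that succeeded."""
    if not requests:
        raise ValueError("no requests to evaluate")
    return sum(r.ok for r in requests) / len(requests)


def latency_compliance(requests, threshold_ms=500.0):
    """Fraction of requests completed in under threshold_ms."""
    if not requests:
        raise ValueError("no requests to evaluate")
    return sum(r.duration_ms < threshold_ms for r in requests) / len(requests)


# 997 fast successful requests and 3 slow failed ones.
requests = [Request(True, 120.0)] * 997 + [Request(False, 900.0)] * 3

availability_met = availability(requests) >= 0.999      # 99.7% is below the 99.9% target
latency_met = latency_compliance(requests) >= 0.99      # 99.7% is above the 99% target
print(availability_met, latency_met)
```

In this sample the availability objective is missed while the latency objective is met, which shows that the two SLOs are evaluated independently.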

Coroot uses eBPF to automatically gather performance data for each service. It also comes with preset SLOs that you can easily adjust, so it can start monitoring your services right after installation.

To avoid violating SLOs, Coroot alerts your team when the error budget is being consumed too quickly. It uses multi-window burn rate thresholds to trigger alerts:

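| Severity | Long Window | Short Window | Burn Rate Threshold | Monthly Error Budget Consumed | Time to Exhaustion |
|----------|-------------|--------------|---------------------|-------------------------------|--------------------|
| Critical | 1 hour      | 5 minutes    | 14.4                | 2%                            | ≤ 50 hours         |
| Critical | 6 hours     | 30 minutes   | 6                   | 5%                            | ≤ 5 days           |
| Warning  | 24 hours    | 2 hours      | 3                   | 10%                           | ≤ 10 days          |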
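
The last two columns of the table can be derived from the burn rate and the long window. The sketch below reproduces that arithmetic, assuming a 30-day (720-hour) budget period; this period is an assumption that happens to match the figures shown, and the function names are illustrative.

```python
BUDGET_PERIOD_HOURS = 30 * 24  # assumption: a 30-day month, i.e. 720 hours


def budget_consumed(burn_rate, window_hours):
    """Share of the monthly error budget spent if burn_rate is sustained for window_hours."""
    return burn_rate * window_hours / BUDGET_PERIOD_HOURS


def hours_to_exhaustion(burn_rate):
    """Hours until the whole monthly budget is gone at a constant burn_rate."""
    return BUDGET_PERIOD_HOURS / burn_rate


# (burn rate threshold, long window in hours) for each row of the table
for burn_rate, window_hours in [(14.4, 1), (6, 6), (3, 24)]:
    consumed = budget_consumed(burn_rate, window_hours)
    remaining = hours_to_exhaustion(burn_rate)
    print(f"burn rate {burn_rate}: {consumed:.0%} consumed, exhausted in {remaining:.0f} h")
```

Under that assumption, the three rows yield 2%, 5%, and 10% of the budget, and 50, 120, and 240 hours to exhaustion (240 hours being 10 days).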

An incident is triggered only if the burn rate exceeds the threshold in both the long and the short window. The short window confirms that the error budget is still being consumed at that rate, so an alert does not keep firing after the problem has been resolved.
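
The two-window condition can be sketched as follows. This is a minimal illustration rather than Coroot's implementation: it assumes the usual definition of burn rate as the observed error ratio divided by the error ratio the SLO allows, and the function names are hypothetical.

```python
def burn_rate(error_ratio, slo_target):
    """Observed error ratio relative to the allowed error ratio (1 - slo_target)."""
    return error_ratio / (1.0 - slo_target)


def should_alert(long_error_ratio, short_error_ratio, slo_target, threshold):
    """Alert only when both the long and the short window exceed the threshold."""
    return (
        burn_rate(long_error_ratio, slo_target) > threshold
        and burn_rate(short_error_ratio, slo_target) > threshold
    )


SLO_TARGET = 0.999   # 99.9% availability, so 0.1% of requests may fail
THRESHOLD = 14.4     # the 1 hour / 5 minutes row of the table

# The problem is ongoing: 2% of requests fail in both windows.
ongoing = should_alert(0.02, 0.02, SLO_TARGET, THRESHOLD)

# The problem is over: the last hour still looks bad, the last 5 minutes do not.
recovered = should_alert(0.02, 0.0005, SLO_TARGET, THRESHOLD)

print(ongoing, recovered)
```

In the second case the long window alone would still exceed the threshold, but the short window shows that the budget is no longer being burned, so no alert is raised.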

To prevent false positive alerts, Coroot only calculates the burn rate if at least half of the window contains valid data. This is especially useful for services with low traffic.
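
One way to picture this rule is shown below. The sketch assumes the window is split into equal intervals, each represented either by a `(total_requests, failed_requests)` pair or by `None` when nothing was recorded, and it treats an interval as valid when it contains at least one request. Both the data layout and that definition of "valid" are assumptions made for illustration, not a description of Coroot's internals.

```python
from typing import Optional, Sequence, Tuple

# (total_requests, failed_requests) for one interval, or None if there is no data
Sample = Optional[Tuple[int, int]]


def window_burn_rate(samples: Sequence[Sample], slo_target: float) -> Optional[float]:
    """Return the burn rate for the window, or None if under half of it has valid data."""
    valid = [s for s in samples if s is not None and s[0] > 0]
    if not samples or 2 * len(valid) < len(samples):
        return None  # too little data to judge; skip the calculation
    total = sum(t for t, _ in valid)
    failed = sum(f for _, f in valid)
    return (failed / total) / (1.0 - slo_target)


# A low-traffic service: one failed request in an otherwise empty window.
sparse = [None, None, None, None, (1, 1)]
# A busy service: 1% of requests fail in every interval.
dense = [(100, 1)] * 5

print(window_burn_rate(sparse, 0.999))  # None, so no alert can fire
print(window_burn_rate(dense, 0.999))   # roughly 10
```

Without the coverage check, the single failed request in the sparse window would produce a 100% error ratio and an extreme burn rate, which is the kind of false positive the rule is meant to avoid.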

Info: A detailed explanation of SLO-based alerting can be found in The SRE Workbook.

When an application significantly violates its SLOs, Coroot triggers an incident and notifies the team through the configured integrations: