CPU
This inspection detects situations in which an application is short of CPU time. This can have several causes:
- A container has reached its CPU limit and has been throttled by the system.
- A container competes for CPU time against the other containers running on the same node.
- A container consumes all available CPU time on its own.
Possible failure scenarios
A container has reached its CPU limit
To control the CPU usage of containers, you can set CPU limits for them. If a container uses up its allowed CPU bandwidth, it is denied further CPU cycles until its quota is replenished. This mechanism is called CPU throttling.
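As a rough illustration of the mechanism, the sketch below models quota-based throttling. It assumes the usual Linux CFS bandwidth model, in which a container receives a fixed quota of CPU time per accounting period (for example, 50ms of CPU per 100ms period for a limit of half a core). The function name and the sample numbers are hypothetical, and the model deliberately ignores multiple threads and other scheduler details.

```python
def simulate_throttling(demand_ms, quota_ms):
    """Simplified model of per-period CPU quota enforcement.

    demand_ms: CPU time (ms) the container wants in each accounting period.
    quota_ms:  CPU time (ms) the container may use per period (its limit).

    Returns (used_ms_per_period, throttled_periods, leftover_backlog_ms).
    """
    backlog = 0            # work that could not run yet because of the limit
    used = []              # CPU time actually consumed in each period
    throttled_periods = 0  # periods in which some work had to wait

    for demand in demand_ms:
        wanted = backlog + demand
        ran = min(wanted, quota_ms)   # the container cannot exceed its quota
        backlog = wanted - ran        # the rest waits for the next period
        if backlog > 0:
            throttled_periods += 1
        used.append(ran)

    return used, throttled_periods, backlog


# Hypothetical workload: a burst of 80ms in the second period, quota of 50ms.
used, throttled, backlog = simulate_throttling([30, 80, 20, 10], quota_ms=50)
print(used, throttled, backlog)
```

In this model the burst does not fail; the work that exceeds the quota is simply postponed, which is what the application perceives as added latency.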
High CPU utilization on the related nodes
Throttling is not the only reason an application can run short of CPU time. The application itself, or other applications on the same node, can consume the node's entire CPU capacity, which also degrades performance.
Dashboard
CPU usage
This chart can help you answer the following questions:
- Are all the application instances consuming the same amount of CPU time, or are there any outliers?
- How does CPU consumption change over time?
- How does the CPU usage of each container compare to its limit?
CPU usage is calculated using the container_resources_cpu_usage_seconds_total metric.
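The metric name suggests a cumulative counter of CPU seconds, so usage in cores can be derived as the increase of the counter divided by the elapsed time. The sketch below shows that arithmetic, assuming samples arrive as (timestamp, value) pairs; the function name and sample values are hypothetical, and this is not necessarily how Coroot computes the chart.

```python
def cpu_usage_cores(samples):
    """Convert cumulative CPU-seconds samples into per-interval usage in cores.

    samples: list of (timestamp_seconds, counter_value) pairs, oldest first.
    Intervals where the counter decreases (for example, after a container
    restart) or where time does not advance are skipped.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        delta = v1 - v0
        if elapsed <= 0 or delta < 0:
            continue  # counter reset or bad timestamps: no meaningful rate
        rates.append(delta / elapsed)
    return rates


# Hypothetical samples taken every 15 seconds:
# 7.5 CPU-seconds consumed per 15s interval is half a core.
print(cpu_usage_cores([(0, 10.0), (15, 17.5), (30, 25.0)]))
```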
If the application Pods contain more than one container, this chart provides you with both per-container and total views.
The profile button opens the CPU profiling data, allowing you to identify and analyze unexpected spikes in CPU usage down to the precise line of code.
Learn more about Continuous profiling in Coroot.
CPU delay
A lack of CPU time can be estimated using the container_resources_cpu_delay_seconds_total metric.
The Linux kernel reports CPU delay, which indicates how long a specific process or container has been waiting for CPU time.
For instance, a delay of 500ms per second means that the application accumulates 500ms of additional latency, spread across all requests processed during that wall-clock second.
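To get a feel for what such a delay means per request, the sketch below divides the delay by the request rate. It assumes the delay is spread evenly across requests, which real workloads rarely guarantee, so treat the result as an average rather than a bound. The function name and the request rate of 100 per second are hypothetical.

```python
def avg_delay_per_request_ms(delay_seconds_per_second, requests_per_second):
    """Average extra latency per request (ms), assuming an even spread.

    delay_seconds_per_second: per-second rate of the CPU delay counter,
                              e.g. 0.5 for 500ms of delay per second.
    requests_per_second:      requests handled during the same second.
    """
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return delay_seconds_per_second * 1000.0 / requests_per_second


# 500ms of delay per second shared by a hypothetical 100 requests per second.
print(avg_delay_per_request_ms(0.5, 100))
```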
If a container is limited in CPU time (throttled), the container_resources_cpu_delay_seconds_total metric is bound to increase.
In other words, this metric indicates a CPU shortage regardless of the underlying cause.
To identify the specific cause, refer to the Throttled time and Node CPU usage charts.
Related blog post: Delay accounting: an underrated feature of the Linux kernel