Skip to main content

Node-agent

This page describes metrics gathered by coroot-node-agent.

Each container metric has the container_id label. This is a compound identifier and its format varies between container types, e.g., /docker/upbeat_borg, k8s/namespace-1/pod-2/container-3 or /system.slice/nginx.service.

CPU

container_resources_cpu_limit_cores

  • Description: CPU limit of the container
  • Type: Gauge
  • Source: CPU cgroup, the cpu.cfs_quota_us and cpu.cfs_period_us files

container_resources_cpu_usage_seconds_total

  • Description: Total CPU time consumed by the container
  • Type: Counter
  • Source: CPU accounting cgroup, the cpuacct.usage file

container_resources_cpu_throttled_seconds_total

  • Description: Total time duration the container has been throttled for
  • Type: Counter
  • Source: CPU cgroup, the cpu.stat file

container_resources_cpu_delay_seconds_total

  • Description: Total time duration the container has been waiting for a CPU (while being runnable)
  • Type: Counter
  • Source: Delay accounting

Memory

container_resources_memory_limit_bytes

  • Description: Memory limit of the container
  • Type: Gauge
  • Source: Memory cgroup, file memory.limit_in_bytes

container_resources_memory_rss_bytes

  • Description: Amount of physical memory used by the container (doesn't include page cache)
  • Type: Gauge
  • Source: Memory cgroup, file memory.stats

container_resources_memory_cache_bytes

  • Description: Amount of page cache memory allocated by the container
  • Type: Gauge
  • Source: Memory cgroup, file memory.stats

container_oom_kills_total

  • Description: Total number of times the container has been terminated by the OOM killer
  • Type: Counter
  • Source: eBPF: tracepoint/oom/mark_victim

Disk

container_resources_disk_delay_seconds_total

  • Description: Total time duration the container has been waiting for I/Os to complete
  • Type: Counter
  • Source: Delay accounting

container_resources_disk_size_bytes

  • Description: Total capacity of the volume
  • Type: Gauge
  • Source: statfs()
  • Labels:
    • mount_point - path in the mount namespace of the container
    • device - device name, e.g., vda, nvme1n1
    • volume - Persistent Volume (Kubernetes only)

container_resources_disk_used_bytes

  • Description: Used capacity of the volume.
  • Type: Gauge
  • Source: statfs()
  • Labels: mount_point, device, volume

container_resources_disk_(reads|writes)_total

  • Description: Total number of reads or writes completed successfully by the container
  • Type: Counter
  • Source: Blkio cgroup, the blkio.throttle.io_serviced file
  • Labels: mount_point, device, volume

container_resources_disk_(read|written)_bytes_total

  • Description: Total number of bytes read from the disk or written to the disk by the container
  • Type: Counter
  • Source: Blkio cgroup, the blkio.throttle.io_service_bytes file
  • Labels: mount_point, device, volume

Network

container_net_tcp_listen_info

  • Description: A TCP listen address of the container
  • Type: Gauge
  • Source: eBPF: tracepoint/sock/inet_sock_set_state, /proc/<pid>/net/tcp, /proc/<pid>/net/tcp6
  • Labels: listen_addr (ip:port), proxy

container_net_tcp_successful_connects_total

  • Description: Total number of successful TCP connection attempts
  • Type: Counter
  • Source: eBPF: tracepoint/sock/inet_sock_set_state
  • Labels: destination, actual_destination. The IP and port of the connection’s destination. For example, a container might be establishing a connection to port 80 of a Kubernetes Service IP (e.g., 10.96.1.1). This destination address may be translated by iptables to the actual Pod IP (e.g., 10.40.1.5). In this case, the actual_destination would be 10.40.1.5:80.

container_net_tcp_retransmits_total

  • Description: Total number of retransmitted TCP segments. This metric is collected only for outbound TCP connections.
  • Type: Counter
  • Source: eBPF: tracepoint/tcp/tcp_retransmit_skb
  • Labels: destination, actual_destination

container_net_tcp_failed_connects_total

  • Description: Total number of failed TCP connects to a particular endpoint. The agent takes into account only TCP failures, so this metric doesn't reflect DNS errors
  • Type: Counter
  • Source: eBPF: tracepoint/sock/inet_sock_set_state
  • Labels: destination

container_net_tcp_active_connections

  • Description: Number of active outbound connections between the container and a particular endpoint
  • Type: Gauge
  • Source: eBPF: tracepoint/sock/inet_sock_set_state
  • Labels: destination, actual_destination

container_net_latency_seconds

  • Description: Round-trip time between the container and a remote IP
  • Type: Gauge
  • Source: The agent measures the round-trip time of an ICMP request sent to IP addresses the container is currently working with
  • Labels: destination_ip

container_net_latency_seconds

  • Description: Round-trip time between the container and a remote IP
  • Type: Gauge
  • Source: The agent measures the round-trip time of an ICMP request sent to IP addresses the container is currently working with
  • Labels: destination_ip

Application layer protocol metrics

container_http_requests_total

  • Description: Total number of outbound HTTP requests made by the container
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, status

container_http_requests_duration_seconds_total

  • Description: Histogram of the response time for each outbound HTTP request
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, le

container_postgres_queries_total

  • Description: Total number of outbound Postgres queries made by the container
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, status

container_postgres_queries_duration_seconds_total

  • Description: Histogram of the response time for each outbound Postgres query
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, le

container_redis_queries_total

  • Description: Total number of outbound Redis queries made by the container
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, status

container_redis_queries_duration_seconds_total

  • Description: Histogram of the response time for each outbound Redis query
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, le

container_memcached_queries_total

  • Description: Total number of outbound Memcached queries made by the container
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, status

container_memcached_queries_duration_seconds_total

  • Description: Histogram of the response time for each outbound Memcached query
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, le

container_mysql_queries_total

  • Description: Total number of outbound Mysql queries made by the container
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, status

container_mysql_queries_duration_seconds_total

  • Description: Histogram of the response time for each outbound Mysql query
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, le

container_mongo_queries_total

  • Description: Total number of outbound Mongo queries made by the container
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, status

container_mongo_queries_duration_seconds_total

  • Description: Histogram of the response time for each outbound Mongo query
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, le

container_kafka_requests_total

  • Description: Total number of outbound Kafka requests made by the container
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, status

container_kafka_requests_duration_seconds_total

  • Description: Histogram of the response time for each outbound Kafka request
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, le

container_cassandra_queries_total

  • Description: Total number of outbound Cassandra queries made by the container
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, status

container_cassandra_queries_duration_seconds_total

  • Description: Histogram of the response time for each outbound Cassandra query
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, le

container_rabbitmq_messages_total

  • Description: Total number of Rabbitmq messages produced or consumed by the container
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, status, method

container_nats_messages_total

  • Description: Total number of NATS messages produced or consumed by the container
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, method

container_dubbo_requests_total

  • Description: Total number of outbound DUBBO requests
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, status

container_dubbo_requests_duration_seconds_total

  • Description: Histogram of the response time for each outbound DUBBO request
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, le

container_dns_requests_total

  • Description: Total number of outbound DNS requests
  • Type: Counter
  • Source: eBPF
  • Labels: domain, request_type, status

container_dns_requests_duration_seconds_total

  • Description: Histogram of the response time for each outbound DNS request
  • Type: Counter
  • Source: eBPF
  • Labels: le

container_clickhouse_requests_total

  • Description: Total number of outbound ClickHouse queries
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, status

container_clickhouse_requests_duration_seconds_total

  • Description: Histogram of the response time for each outbound ClickHouse query
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, le

container_zookeeper_requests_total

  • Description: Total number of outbound ZooKeeper requests
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, status

container_zookeeper_requests_duration_seconds_total

  • Description: Histogram of the response time for each outbound ZooKeeper request
  • Type: Counter
  • Source: eBPF
  • Labels: destination, actual_destination, le

JVM

Each JVM metric has the jvm label which refers to the main class or path to the .jar file.

container_jvm_info

  • Description: Meta information about the JVM
  • Type: Gauge
  • Source: hsperfdata
  • Labels: jvm, java_version

container_jvm_heap_size_bytes

  • Description: Total heap size in bytes
  • Type: Gauge
  • Source: hsperfdata
  • Labels: jvm

container_jvm_heap_used_bytes

  • Description: Used heap size in bytes
  • Type: Gauge
  • Source: hsperfdata
  • Labels: jvm

container_jvm_gc_time_seconds

  • Description: Time spent in the given JVM garbage collector in seconds
  • Type: Counter
  • Source: hsperfdata
  • Labels: jvm, gc

container_jvm_safepoint_time_seconds

  • Description: Time the application has been stopped for safepoint operations in seconds
  • Type: Counter
  • Source: hsperfdata
  • Labels: jvm

container_jvm_safepoint_sync_time_seconds

  • Description: Time spent getting to safepoints in seconds
  • Type: Counter
  • Source: hsperfdata
  • Labels: jvm

.NET runtime

Each .NET runtime metric has the application label, which allows distinguishing multiple applications within the same container.

container_dotnet_info

  • Description: Meta information about the Common Language Runtime (CLR)
  • Type: Gauge
  • Source: .NET diagnostic port
  • Labels: application, runtime_version

container_dotnet_memory_allocated_bytes_total

  • Description: The number of bytes allocated
  • Type: Counter
  • Source: .NET diagnostic port
  • Labels: application

container_dotnet_exceptions_total

  • Description: The number of exceptions that have occurred
  • Type: Counter
  • Source: .NET diagnostic port
  • Labels: application

container_dotnet_memory_heap_size_bytes

  • Description: Total size of the heap generation in bytes
  • Type: Gauge
  • Source: .NET diagnostic port
  • Labels: application, generation

container_dotnet_gc_count_total

  • Description: The number of times GC has occurred for the generation
  • Type: Counter
  • Source: .NET diagnostic port
  • Labels: application, generation

container_dotnet_heap_fragmentation_percent

  • Description: The heap fragmentation
  • Type: Gauge
  • Source: .NET diagnostic port
  • Labels: application

container_dotnet_monitor_lock_contentions_total

  • Description: The number of times there was contention when trying to take the monitor's lock
  • Type: Gauge
  • Source: .NET diagnostic port
  • Labels: application

container_dotnet_thread_pool_completed_items_total

  • Description: The number of work items that have been processed in the ThreadPool
  • Type: Counter
  • Source: .NET diagnostic port
  • Labels: application

container_dotnet_thread_pool_queue_length

  • Description: The number of work items that are currently queued to be processed in the ThreadPool
  • Type: Gauge
  • Source: .NET diagnostic port
  • Labels: application

container_dotnet_thread_pool_size

  • Description: The number of thread pool threads that currently exist in the ThreadPool
  • Type: Gauge
  • Source: .NET diagnostic port
  • Labels: application

Other

container_info

  • Description: Meta information about the container
  • Type: Gauge
  • Source: dockerd, containerd
  • Labels: image

container_restarts_total

  • Description: Number of times the container has been restarted
  • Type: Counter
  • Source: eBPF: tracepoint/task/task_newtask, tracepoint/sched/sched_process_exit

container_application_type

  • Description: Type of application running in the container (e.g., memcached, postgres, mysql)
  • Type: Gauge
  • Source: /proc/<pid>/cmdline of the processes running within the container
  • Labels: application_type

Logs

container_log_messages_total

  • Description: The number of messages grouped by the automatically extracted repeated patterns
  • Type: Counter
  • Source: The container's log. The following logging methods are supported:
    • stdout/stderr: streams are captured by Dockerd (json file driver) or Containerd (CRI)
    • Journald
    • /var/log/*
  • Labels:
    • source: journald, stdout/stderr, or path to the file in the /var/log directory.
    • level: < unknown | debug | info | warning | error | critical >
    • pattern_hash: the ID of the automatically extracted repeated pattern
    • sample: a sample message of the group

Node metrics

node_resources_cpu_usage_seconds_total

  • Description: Amount of CPU time spent in each mode
  • Type: Counter
  • Source: /proc/stat
  • Labels: mode: < user | nice | system | idle | iowait | irq | softirq | steal >

node_resources_cpu_logical_cores

  • Description: Number of logical CPU cores
  • Type: Counter
  • Source: /proc/stat

node_resources_memory_total_bytes

  • Description: Total amount of physical memory
  • Type: Gauge
  • Source: /proc/meminfo

node_resources_memory_free_bytes

  • Description: Amount of unassigned memory
  • Type: Gauge
  • Source: /proc/meminfo

node_resources_memory_available_bytes

  • Description: An estimate of how much memory is available for allocations, without swapping. Roughly speaking, this is the sum of the free memory and a part of the page cache that can be reclaimed. You can learn more about how this estimate is calculated here
  • Type: Gauge
  • Source: /proc/meminfo

node_resources_memory_cached_bytes

  • Description: Amount of memory used as page cache. The memory used for page cache might be reclaimed on memory pressure. This can increase the number of disk reads
  • Type: Gauge
  • Source: /proc/meminfo

node_resources_disk_(reads|writes)_total

  • Description: Total number of reads or writes completed successfully. Any disk has the maximum IOPS it can serve. Below are the reference values for the different storage types:
TypeMax IOPS
Amazon EBS sc1250
Amazon EBS st1500
Amazon EBS gp2/gp316,000
Amazon EBS io1/io264,000
Amazon EBS io2 Block Express256,000
HDD200
SATA SSD100,000
NVMe SSD10,000,000
  • Type: Counter
  • Source: /proc/diskstats
  • Labels: device

node_resources_disk_(read|written)_bytes_total

  • Description: Total number of bytes read from the disk or written to the disk respectively In additional to the maximum number of IOPS a disk can serve, there is a throughput limit. For example,
TypeMax throughput
Amazon EBS sc1250 MB/s
Amazon EBS st1500 MB/s
Amazon EBS gp2250 MB/s
Amazon EBS gp31,000 MB/s
Amazon EBS io1/io21,000 MB/s
Amazon EBS io2 Block Express4,000 MB/s
SATA600 MB/s
SAS1,200 MB/s
NVMe4,000 MB/s
  • Type: Counter
  • Source: /proc/diskstats
  • Labels: device

node_resources_disk_(read|write)_time_seconds_total

  • Description: Total number of seconds spent reading and writing respectively, including queue wait. To get the average I/O latency, the sum of these two should be normalized by the number of the executed I/O requests. Below is the reference average I/O latency for the different storage types:
TypeAvg latency
Amazon EBS gp2/gp3/io1/io2"single-digit millisecond"
Amazon EBS io2 Block Express"sub-millisecond"
HDD2–4ms
NVMe SSD0.1–0.3ms
  • Type: Counter
  • Source: /proc/diskstats
  • Labels: device

node_resources_disk_io_time_seconds_total

  • Description: Total number of seconds the disk spent doing I/O. It doesn't include queue wait, only service time. E.g., if the derivative of this metric for a minute interval is 60s, this means that the disk was busy 100% of that interval.
  • Type: Counter
  • Source: /proc/diskstats
  • Labels: device

node_net_received_(bytes|packets)_total

  • Description: Total number of bytes and packets received
  • Type: Counter
  • Labels: interface

node_net_transmitted_(bytes|packets)_total

  • Description: Total number of bytes and packets transmitted
  • Type: Counter
  • Labels: interface

node_net_interface_up

  • Description: Status of the interface (0: down, 1:up)
  • Type: Gauge
  • Labels: interface

node_net_interface_ip

  • Description: IP address assigned to the interface
  • Type: Gauge
  • Labels: interface, ip

node_net_interface_ip

  • Description: IP address assigned to the interface
  • Type: Gauge
  • Labels: interface, ip

node_uptime_seconds

  • Description: Uptime of the node in seconds
  • Type: Gauge

node_info

  • Description: Meta information about the node
  • Type: Gauge
  • Labels: hostname, kernel_version, agent_version

node_cloud_info

  • Description: Meta information about the cloud instance
  • Type: Gauge
  • Source: The agent detects the cloud provider using sysfs. Then it uses cloud-specific metadata services to retrieve additional information about the instance AWS, GCP, Azure, Hetzner, Scaleway, DigitalOcean, Alibaba. For unsupported providers, you can use the --provider, --region, and --availability-zone command line arguments of the agent to define the labels manually.
  • Labels:
    • provider: < aws | gcp, azure | hetzner | scaleway | digitalocean | alibaba >
    • account_id: account_id for AWS, project_id for GCP, subscription_id for azure
    • instance_id
    • instance_type
    • instance_life_cycle: < on-demand | spot | preemtible > (always empty for Azure instances)
    • region
    • availability_zone
    • availability_zone_id: ID of the availability zone (AWS only)
    • local_ipv4
    • public_ipv4
Looking for 24/7 support from the Coroot team? Subscribe to Coroot Enterprise:Start free trial