
Using S3 Storage with ClickHouse

This guide walks through configuring ClickHouse to use S3-compatible object storage for Coroot's telemetry data (logs, traces, profiles, and metrics).

Overview

By default, ClickHouse stores all data on local disks. With S3 storage enabled, you can:

  • Reduce storage costs by moving older data to cheaper object storage
  • Scale storage independently from compute — no need to resize PVCs
  • Store more data without being limited by local disk capacity

Coroot's operator supports two S3 storage modes:

| Mode | Description | Best for |
|---|---|---|
| Tiered | Recent data on local SSD, older data automatically moved to S3 | Best query performance on recent data with cost-effective long-term storage |
| S3-only | All data on S3, local disk used only for caching | Minimal local storage requirements, cost optimization |

Prerequisites

  • Coroot deployed via the Kubernetes Operator
  • An S3-compatible bucket (AWS S3, MinIO, Ceph, etc.)
  • S3 credentials (access key + secret key) or IAM/IRSA configured

Step 1: Create an S3 bucket

Create a dedicated bucket for ClickHouse data. Do not share this bucket with other applications — ClickHouse manages its own file lifecycle, and external lifecycle policies can cause data loss.

aws s3 mb s3://my-coroot-clickhouse --region us-east-1

Step 2: Create the credentials secret

kubectl create secret generic clickhouse-s3-creds \
  --from-literal=access_key_id=YOUR_ACCESS_KEY \
  --from-literal=secret_access_key=YOUR_SECRET_KEY \
  -n coroot
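kubectl stores each --from-literal value base64-encoded under the Secret's data field. If you prefer to generate the secret from a script, the following is a minimal Python sketch of an equivalent manifest. build_s3_secret is a hypothetical helper, not part of Coroot or kubectl, and the key names must match the key fields referenced in the Coroot CR.

```python
import base64
import json


def build_s3_secret(name: str, namespace: str, access_key: str, secret_key: str) -> dict:
    """Hypothetical helper: build a Secret equivalent to the kubectl command above.

    Kubernetes stores Secret `data` values base64-encoded, which is what
    `kubectl create secret generic --from-literal` does for you.
    """

    def b64(value: str) -> str:
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            # Key names must match the `key:` fields referenced in the Coroot CR.
            "access_key_id": b64(access_key),
            "secret_access_key": b64(secret_key),
        },
    }


if __name__ == "__main__":
    manifest = build_s3_secret("clickhouse-s3-creds", "coroot",
                               "YOUR_ACCESS_KEY", "YOUR_SECRET_KEY")
    # kubectl accepts JSON manifests, so this output can be piped to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```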
Tip: If your Kubernetes cluster uses IAM Roles for Service Accounts (IRSA) or workload identity, you can skip creating the secret and omit the credentials section from the CR entirely. ClickHouse is configured to resolve credentials automatically from environment variables and instance metadata.

Step 3: Configure the Coroot CR

Tiered mode

Tiered mode keeps recent data on fast local disks for the best query performance. When free space on the local disk drops below the moveFactor threshold, ClickHouse automatically moves older data to S3.

apiVersion: coroot.com/v1
kind: Coroot
metadata:
  name: coroot
  namespace: coroot
spec:
  communityEdition:
  nodeAgent:
  clusterAgent:
  clickhouse:
    shards: 1
    replicas: 2
    storage:
      size: 100Gi # local disk per replica
    s3:
      endpoint: https://s3.us-east-1.amazonaws.com/my-coroot-clickhouse/
      region: us-east-1
      credentials:
        accessKeyId:
          name: clickhouse-s3-creds
          key: access_key_id
        secretAccessKey:
          name: clickhouse-s3-creds
          key: secret_access_key
      cacheSize: 10Gi # local cache for S3 reads
      mode: tiered
      moveFactor: "0.1" # move data to S3 when <10% free space

With these settings and 100Gi local disk, ClickHouse will start moving the oldest data to S3 when less than 10Gi of free space remains on local disk.
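The threshold is simply the local disk size multiplied by moveFactor. A minimal sketch of that arithmetic, assuming the threshold is compared against the free space on the local volume; move_threshold_bytes and should_move are hypothetical names, and Decimal is used because the CR passes moveFactor as a string.

```python
from decimal import Decimal

GIB = 1024**3


def move_threshold_bytes(disk_size_bytes: int, move_factor: str) -> int:
    """Hypothetical helper: free-space level (bytes) below which data starts moving to S3."""
    factor = Decimal(move_factor)  # e.g. "0.1" as written in the Coroot CR
    if not Decimal(0) <= factor <= Decimal(1):
        raise ValueError("moveFactor must be between 0 and 1")
    return int(Decimal(disk_size_bytes) * factor)


def should_move(free_bytes: int, disk_size_bytes: int, move_factor: str) -> bool:
    """True when free space has dropped below the moveFactor threshold."""
    return free_bytes < move_threshold_bytes(disk_size_bytes, move_factor)


if __name__ == "__main__":
    threshold = move_threshold_bytes(100 * GIB, "0.1")
    print(threshold / GIB)  # 10.0
    print(should_move(9 * GIB, 100 * GIB, "0.1"))  # True
```

Raising moveFactor (for example to "0.2") makes ClickHouse start moving data earlier, keeping more headroom on the local disk at the cost of less recent data served locally.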

S3-only mode

All data is stored on S3. Local disk is used only for caching reads and temporary merge operations. This minimizes local storage requirements.

    s3:
      endpoint: https://s3.us-east-1.amazonaws.com/my-coroot-clickhouse/
      region: us-east-1
      credentials:
        accessKeyId:
          name: clickhouse-s3-creds
          key: access_key_id
        secretAccessKey:
          name: clickhouse-s3-creds
          key: secret_access_key
      cacheSize: 20Gi # larger cache recommended for s3only mode
      mode: s3only

In S3-only mode, you can reduce storage.size, since local disk is needed only for the cache and temporary operations. Keep it large enough to hold cacheSize plus room for merges.

Step 4: Apply the configuration

kubectl apply -f coroot.yaml

The operator will update the ClickHouse StatefulSets with the S3 storage configuration. Pods will be restarted to pick up the new config.

How it works

Data isolation

Each ClickHouse shard and replica writes to a unique S3 path prefix. Even replicas within the same shard get separate paths. ClickHouse does have a "zero-copy replication" feature that allows replicas to share S3 objects, but it is experimental and disabled by default since version 22.8 due to known data corruption issues. The operator explicitly disables it and uses separate S3 paths per replica to ensure data safety.

Paths follow this pattern:

s3://my-coroot-clickhouse/{shard}/{replica}/

For example, with 2 shards and 2 replicas:

  • s3://my-coroot-clickhouse/shard-0/coroot-clickhouse-shard-0-0/
  • s3://my-coroot-clickhouse/shard-0/coroot-clickhouse-shard-0-1/
  • s3://my-coroot-clickhouse/shard-1/coroot-clickhouse-shard-1-0/
  • s3://my-coroot-clickhouse/shard-1/coroot-clickhouse-shard-1-1/
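The naming in this list can be reproduced with a short sketch, assuming the shard directory is shard-N and the replica directory matches the pod name coroot-clickhouse-shard-N-M, as in the examples above. s3_prefixes is hypothetical and is not the operator's code.

```python
def s3_prefixes(bucket: str, shards: int, replicas: int,
                name: str = "coroot-clickhouse") -> list[str]:
    """Hypothetical helper: one S3 prefix per (shard, replica) pair, following the examples above."""
    prefixes = []
    for shard in range(shards):
        shard_dir = f"shard-{shard}"
        for replica in range(replicas):
            # Replica directory mirrors the pod name, e.g. coroot-clickhouse-shard-0-1.
            replica_dir = f"{name}-{shard_dir}-{replica}"
            prefixes.append(f"s3://{bucket}/{shard_dir}/{replica_dir}/")
    return prefixes


if __name__ == "__main__":
    paths = s3_prefixes("my-coroot-clickhouse", shards=2, replicas=2)
    for path in paths:
        print(path)
    # Every replica gets its own prefix, so no two replicas share S3 objects.
    assert len(set(paths)) == len(paths)
```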

Caching

A local cache layer sits between ClickHouse and S3. Frequently accessed data is cached on local disk, which reduces S3 API calls and read latency. The cache is also populated on writes (the cache_on_write_operations setting), so recently written data can be served from the cache immediately.
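To illustrate the idea, here is a simplified model (not ClickHouse's filesystem cache implementation): a size-bounded, read-through LRU cache that is also populated on writes. ObjectStore and CachedStore are hypothetical names; ObjectStore stands in for S3 and counts GET requests so the effect of caching on write is visible.

```python
from collections import OrderedDict


class ObjectStore:
    """Hypothetical in-memory stand-in for an S3 bucket that counts GET requests."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self.get_calls = 0

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        self.get_calls += 1
        return self.objects[key]


class CachedStore:
    """Hypothetical size-bounded LRU cache in front of an ObjectStore.

    Reads go to the cache first and fall back to the store on a miss;
    with cache_on_write=True, writes also populate the cache.
    """

    def __init__(self, backend: ObjectStore, capacity_bytes: int,
                 cache_on_write: bool = True) -> None:
        self.backend = backend
        self.capacity = capacity_bytes
        self.cache_on_write = cache_on_write
        self.entries: OrderedDict[str, bytes] = OrderedDict()
        self.used = 0

    def _admit(self, key: str, data: bytes) -> None:
        if len(data) > self.capacity:
            return  # larger than the whole cache: skip caching
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        self.entries[key] = data
        self.used += len(data)
        while self.used > self.capacity:
            _, evicted = self.entries.popitem(last=False)  # evict least recently used
            self.used -= len(evicted)

    def write(self, key: str, data: bytes) -> None:
        self.backend.put(key, data)
        if self.cache_on_write:
            self._admit(key, data)
        elif key in self.entries:
            self.used -= len(self.entries.pop(key))  # drop a stale cached copy

    def read(self, key: str) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        data = self.backend.get(key)  # cache miss: one request to the store
        self._admit(key, data)
        return data
```

With cache_on_write=False, the first read after every write costs one GET against the store; populating the cache on write avoids exactly that round trip in this model.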

Space Manager

When S3 storage is configured, Coroot's Space Manager is automatically disabled. Instead of deleting old partitions to free disk space, ClickHouse moves data to S3, preserving it for the full TTL period.

Using MinIO or other S3-compatible storage

Any S3-compatible storage can be used. Set endpoint to your storage URL, with the bucket name as the first path segment:

    s3:
      endpoint: https://minio.example.com/coroot-clickhouse/
      credentials:
        accessKeyId:
          name: minio-creds
          key: access_key_id
        secretAccessKey:
          name: minio-creds
          key: secret_access_key
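The endpoints on this page are path-style URLs: the host identifies the storage service and the first path segment is the bucket, with any further segments forming a key prefix. A small sketch that splits an endpoint this way, assuming path-style addressing as in these examples (virtual-hosted URLs such as bucket.s3.amazonaws.com are not handled); split_endpoint is hypothetical and not part of the operator.

```python
from urllib.parse import urlparse


def split_endpoint(endpoint: str) -> tuple[str, str, str]:
    """Hypothetical helper: split a path-style endpoint into (service URL, bucket, key prefix).

    Assumes the bucket is the first path segment, as in the examples above.
    """
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {endpoint!r}")
    segments = [s for s in parsed.path.split("/") if s]
    if not segments:
        raise ValueError("endpoint must include the bucket name in its path")
    return f"{parsed.scheme}://{parsed.netloc}", segments[0], "/".join(segments[1:])


if __name__ == "__main__":
    print(split_endpoint("https://minio.example.com/coroot-clickhouse/"))
```

A check like this can catch an endpoint that was set to the bare service URL without the bucket name.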

Troubleshooting

Verify S3 disks are configured

Connect to a ClickHouse pod and run:

SELECT name, path, type, free_space, total_space FROM system.disks

You should see ObjectStorage type disks for s3_disk and s3_cache alongside the Local default disk.
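If you script this check, for example by running SELECT name, type FROM system.disks through clickhouse-client with --format TSV, a sketch like the following compares the result with the expected disk types. check_disks and parse_tsv are hypothetical helpers; the expected names and types are the ones listed above.

```python
# Expected (name -> type) pairs, taken from the description above.
EXPECTED_DISK_TYPES = {
    "default": "Local",
    "s3_disk": "ObjectStorage",
    "s3_cache": "ObjectStorage",
}


def parse_tsv(text: str) -> list[tuple[str, str]]:
    """Hypothetical helper: parse `name<TAB>type` lines into (name, type) tuples."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            name, disk_type = line.split("\t")[:2]
            rows.append((name, disk_type))
    return rows


def check_disks(rows: list[tuple[str, str]]) -> list[str]:
    """Hypothetical helper: return problems found in (name, type) rows from system.disks."""
    found = dict(rows)
    problems = []
    for name, expected in EXPECTED_DISK_TYPES.items():
        actual = found.get(name)
        if actual is None:
            problems.append(f"missing disk {name!r}")
        elif actual != expected:
            problems.append(f"disk {name!r} has type {actual!r}, expected {expected!r}")
    return problems
```

An empty list means all three disks are present with the expected types.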

Verify the storage policy

SELECT * FROM system.storage_policies

You should see an s3_tiered or s3_s3only policy depending on your mode.

Check data distribution across disks

SELECT disk_name, sum(bytes_on_disk) as bytes
FROM system.parts
WHERE active = 1
GROUP BY disk_name
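To turn that result into a single number, the sketch below computes the share of active-part bytes stored on object storage. Which disk names count as S3 depends on the storage policy (parts may be reported under s3_cache when the cache wraps s3_disk), so the set is a parameter; s3_share is hypothetical. In tiered mode, expect a small share until local free space falls below the moveFactor threshold; in S3-only mode, expect nearly all bytes on object storage.

```python
def s3_share(rows: list[tuple[str, int]],
             object_storage_disks: frozenset[str] = frozenset({"s3_disk", "s3_cache"})) -> float:
    """Hypothetical helper: fraction of bytes on object-storage disks.

    `rows` are (disk_name, bytes) pairs as returned by the query above.
    """
    total = sum(size for _, size in rows)
    if total == 0:
        return 0.0  # no active parts yet
    on_s3 = sum(size for disk, size in rows if disk in object_storage_disks)
    return on_s3 / total
```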