Prometheus at Scale: Hub-and-Spoke with Thanos

Every Kubernetes cluster you add multiplies your monitoring problem. One Prometheus per cluster is easy; querying across twelve of them, in three clouds, with HA and a year of retention is where teams start hurting.

Here's the architecture that has worked for us in production.

The shape: hub and spoke

Each workload cluster runs Prometheus in agent mode — it scrapes locally and remote-writes everything to a central hub. No local storage beyond the WAL, no local querying, nothing to page you about in the spokes.

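# spoke cluster: prometheus agent
# (keys assume the kube-prometheus-stack chart, where agent mode is a chart-level toggle)
prometheus:
  agentMode: true
  prometheusSpec:
    externalLabels:
      cluster: prod-aws-mumbai   # label name is our convention; attached to every sample sent
    remoteWrite:
      # /api/v1/write is the path served by Prometheus's own remote-write receiver
      - url: https://metrics-hub.internal/api/v1/write
        headers:
          # Prometheus does not turn headers into labels; this is only visible to the load balancer
          X-Scope-Cluster: prod-aws-mumbai
        queueConfig:
          maxSamplesPerSend: 5000
          capacity: 20000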

The hub runs Prometheus with its remote-write receiver enabled, fronted by a load balancer. Each spoke's external label travels with its samples, so every series on the hub carries the cluster it came from.
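
For reference, here is a minimal sketch of what the hub side might look like, assuming the same kube-prometheus-stack chart as the spokes. The enableRemoteWriteReceiver field is what turns on the receiver endpoint; the retention value is a placeholder rather than our production setting.

```yaml
# hub cluster: prometheus as remote-write receiver (sketch)
prometheus:
  prometheusSpec:
    enableRemoteWriteReceiver: true   # serves POST /api/v1/write for the spokes
    retention: 7d                     # placeholder; pick whatever "days" means for you
```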

Why not federation?

We tried it. Federation pulls aggregated series on a scrape interval, which means you lose granularity exactly when you need it — during an incident. Remote write streams raw samples continuously, and agent mode keeps the spoke footprint tiny (we run spokes with 512Mi limits).
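
For contrast, a hub-side federation job looks roughly like the sketch below. The job name, target address, and match expression are illustrative, not our old config. The hub sees at most one sample per matched series per scrape interval, and in practice the match is narrowed to pre-aggregated recording rules to keep the scrape cheap.

```yaml
# hub-side federation job (the approach we moved away from); names are illustrative
scrape_configs:
  - job_name: federate-prod-aws-mumbai
    scrape_interval: 60s            # one sample per series per interval, nothing finer
    honor_labels: true              # keep the labels assigned by the spoke
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"job:.*"}'    # typically only aggregated recording rules
    static_configs:
      - targets:
          - prometheus.prod-aws-mumbai.internal:9090   # hypothetical spoke address
```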

Thanos for the hard parts

The hub alone gives you a single pane of glass, but it's also a single point of failure. Thanos fixes the three remaining problems:

Problem	Thanos component
HA / deduplication	Sidecar + Querier with replica labels
Long-term retention	Store Gateway over object storage
Downsampling old data	Compactor (5m/1h resolutions)

Run two hub replicas with replica external labels, let the Querier deduplicate, and ship blocks to S3/GCS/Azure Blob every two hours. Retention on the hub drops to days; object storage handles the year.

thanos:
  objstoreConfig:
    type: S3
    config:
      bucket: metrics-longterm
      endpoint: s3.ap-south-1.amazonaws.com
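
The HA and retention pieces can be sketched as three fragments: hub values for the replica pair, Querier arguments, and Compactor arguments. Service addresses, the config file path, and the raw and 5m retention values are assumptions for illustration; only the one-year horizon comes from our setup. Deduplication gives real redundancy only if both replicas receive every sample, so this sketch assumes each spoke remote-writes to both replicas rather than being balanced across them.

```yaml
# hub values: two replicas, each stamped with a replica label
prometheus:
  prometheusSpec:
    replicas: 2
    replicaExternalLabelName: prometheus_replica   # the operator default, shown explicitly
---
# thanos querier container args; endpoint addresses are hypothetical
args:
  - query
  - --query.replica-label=prometheus_replica   # label ignored when merging duplicate series
  - --endpoint=dnssrv+_grpc._tcp.prometheus-operated.monitoring.svc    # hub sidecars
  - --endpoint=dnssrv+_grpc._tcp.thanos-storegateway.monitoring.svc    # store gateway
---
# thanos compactor container args; retention values other than 1y are placeholders
args:
  - compact
  - --wait                                        # keep running instead of exiting after one pass
  - --objstore.config-file=/etc/thanos/objstore.yml
  - --retention.resolution-raw=30d
  - --retention.resolution-5m=180d
  - --retention.resolution-1h=1y
```

Run only one Compactor per bucket; it is not designed to run concurrently against the same blocks.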

Alerting: keep it close to the data

One thing we deliberately did not fully centralize: alerting. Workload rules like KubeletDown or disk pressure evaluate on the hub, where the data lives, but each spoke keeps a tiny set of "is my pipeline alive?" rules with a dead man's switch. Agent mode does not evaluate rules, so these need a small rule evaluator running beside the agent. If a spoke stops remote-writing, the hub notices; if the hub dies, the spokes still scream through a secondary Alertmanager path.
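
A minimal sketch of the two sides of that check, written as a plain Prometheus rules file. Group names, alert names, thresholds, and the cluster label are illustrative. The spoke group has to be loaded by something that can evaluate rules, since an agent-mode Prometheus cannot.

```yaml
groups:
  - name: spoke-pipeline-liveness          # evaluated inside each spoke
    rules:
      - alert: Watchdog                    # dead man's switch, fires permanently by design
        expr: vector(1)
        labels:
          severity: none
        annotations:
          summary: Alerting path from this spoke is alive; silence means it is broken.
  - name: hub-spoke-presence               # evaluated on the hub
    rules:
      - alert: SpokeSilent
        # no `up` sample from this cluster has reached the hub in the last 10 minutes
        expr: absent_over_time(up{cluster="prod-aws-mumbai"}[10m])
        for: 5m
        labels:
          severity: critical
```

The Watchdog alert is routed to an external heartbeat receiver that pages when the alert stops arriving. The hub rule needs one expression per expected cluster, because absent_over_time can only report on series it is told to expect.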

Results

MTTD down ~90% — one query surface, no "which Grafana do I check?"
MTTR down ~50% — cross-cluster correlation in a single dashboard
Spoke overhead small enough that nobody argues about running it everywhere

The pattern scales sideways: onboarding a new cluster is one Helm values file and one external label. That's the real win — monitoring stopped being a per-cluster project and became a platform capability.