$ ls -lt /var/log/blog
Notes from production: architecture decisions, incident lessons and deep dives on the cloud-native stack.
3 posts found
How we monitor a multi-cloud Kubernetes fleet with Prometheus agents, a central remote-write hub, and Thanos for HA and long-term storage — and cut MTTD by 90%.
Rack-aware placement, tiered retention, consumer-lag SLOs and the failure modes nobody warns you about when you run Kafka for time-series AI workloads.
How we replaced 'works on my machine' with ephemeral per-branch environments — automated creation, dependency wiring, database lifecycle and teardown.