OpenTelemetry Drift in Kubernetes — Fix It with Weaver, dtctl, Claude & Argo

Explore how observability drift can silently break dashboards, SLOs, and alerts in cloud‑native environments, even when metrics, logs, and spans are still flowing. In this Observe & Resolve episode, Henrik Rexed walks through a real‑world failure scenario where a single OpenTelemetry attribute rename disconnects an entire Dynatrace observability stack.

You’ll learn how to treat observability as application code by versioning, validating, and releasing it alongside your services. The session demonstrates a practical GitOps workflow using OpenTelemetry Weaver, dtctl, Argo Rollouts, and Site Reliability Guardians to detect breaking changes early and block unsafe releases before impact.

What you’ll learn:
Why OpenTelemetry attribute drift breaks dashboards and SLOs
How to use OpenTelemetry Weaver as schema‑as‑code
Managing Dynatrace dashboards and SLOs as YAML with dtctl
Detecting instrumentation drift with GitHub Actions
Using Site Reliability Guardians as release gates
Safely handling attribute renames with overlap and DQL coalesce
Connecting Argo Rollouts canary deployments with Dynatrace verdicts
Using AI agents to resolve observability drift automatically

Whether you run fully manual OpenTelemetry instrumentation or GitOps‑driven Dynatrace configuration, this episode shows how to prevent silent failures and keep observability reliable through every release.

🔗 Useful links
dtctl (Dynatrace CLI): https://github.com/dynatrace-oss/dtctl
dtctl agent skill: https://github.com/dynatrace-oss/dtct...
Dynatrace for AI: https://github.com/Dynatrace/dynatrac...
observability-agent-skills: https://github.com/henrikrexed/observ...
OpenTelemetry Weaver: https://github.com/open-telemetry/weaver
Argo CD: https://argo-cd.readthedocs.io
Argo Rollouts: https://argoproj.github.io/argo-rollouts
Dynatrace Site Reliability Guardian: https://docs.dynatrace.com/docs/deliv...

Visit the Dynatrace Playground: https://dt-url.net/devrel-signup-play...
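
To make the attribute-rename problem described above concrete, here is a minimal Python sketch of a drift check. It is not the Drift Watcher shown in the episode and does not use Weaver or dtctl. The rename (http.method to http.request.method), the asset names, and the query strings are illustrative assumptions, and the function names are hypothetical. An asset is flagged when it still references the old attribute name without also referencing the new one; a query that covers both names, for example through coalesce during an overlap period, is treated as safe.

```python
import re


def references(attribute: str, query: str) -> bool:
    """True if the query mentions the attribute as a whole dotted name."""
    # Lookarounds keep "http.method" from matching inside a longer dotted name.
    pattern = rf"(?<![\w.]){re.escape(attribute)}(?![\w.])"
    return re.search(pattern, query) is not None


def find_broken_assets(renames: dict[str, str], assets: dict[str, str]) -> dict[str, list[str]]:
    """Map each asset to the old attribute names it uses without the new name."""
    broken: dict[str, list[str]] = {}
    for asset, query in assets.items():
        stale = [
            old
            for old, new in renames.items()
            if references(old, query) and not references(new, query)
        ]
        if stale:
            broken[asset] = stale
    return broken


# Hypothetical rename: old attribute name -> new attribute name.
RENAMES = {"http.method": "http.request.method"}

# Hypothetical assets: name -> query text (illustrative, not taken from the episode).
ASSETS = {
    "dashboard/latency": 'fetch spans | filter http.method == "GET"',
    "slo/availability": 'fetch spans | filter coalesce(http.request.method, http.method) == "GET"',
    "dashboard/errors": 'fetch spans | filter http.request.method == "POST"',
}

result = find_broken_assets(RENAMES, ASSETS)
print(result)
```

In this sketch only dashboard/latency is reported, because it depends solely on the old name. A check of this kind could run in CI and fail the build when the result is non-empty, which is the general idea behind blocking a release before dashboards and SLOs go blank.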

📖 Chapters 📖
00:00 – Intro
00:22 – The problem: a small change can impact our observability assets
00:50 – The context
01:39 – The solution: observability needs to be part of your application
02:07 – Solution - Weaver
02:31 – Solution - dtctl
02:47 – Solution - Drift Watcher
02:57 – Solution - Argo
03:14 – Solution - Site Reliability Guardian
03:25 – Solution - Agent Skills
03:49 – dtctl cheat sheet
04:02 – Architecture
05:31 – Scenario 1
06:04 – Scenario 1 - Code changes
06:29 – Scenario 1 - the CI
06:56 – Scenario 1 - Release Flow
08:30 – Scenario 1 - Dynatrace assets updated
08:41 – Scenario 2
09:02 – Scenario 2 - the issue opened by the Drift Watcher
09:28 – Scenario 2 - How to resolve?
09:39 – Scenario 2 - The skills
10:00 – Scenario 2 - the fix made by Claude
11:09 – Recap
11:48 – Conclusion

🔬 Have a question? Visit our community and connect with our experts and users: https://dynatr.ac/3M0AzM6

👉 Stay connected
Facebook → /dynatrace
Instagram → /dynatrace
LinkedIn → /dynatrace
Bluesky → https://bsky.app/profile/dynatrace.com
X → /dynatrace
Twitch → /dynatrace