AI-Powered Root Cause Analysis at Scale: From Theory To Production... Letícia Mota & Yevgeny Gladun

Don't miss out! Join us at our next KubeCon + CloudNativeCon events in Mumbai, India (18-19 June, 2026), Yokohama, Japan (29-30 July, 2026), and Shanghai, China (8-9 September, 2026). Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at https://kubecon.io

AI-Powered Root Cause Analysis at Scale: From Theory To Production Lessons From Nubank's 120M+ Cus - Letícia Mota & Yevgeny Gladun, Nubank

This session presents an AI-powered SRE Agent designed to autonomously orchestrate complex, multi-source investigations by querying internal observability providers and knowledge bases.

A primary focus is the "Data Volume Problem." Modern observability systems generate terabytes of metrics and logs daily; at Nubank's scale, the Prometheus MCP server alone exposes more than 23,000 metrics, while log queries can span billions of rows. The team overcame LLM context limits through on-premises data filtering, intelligent summarization, and selective context assembly. The architecture uses "Expert Guides" to reduce those 23,000 raw metrics to approximately 14 relevant data points before any LLM processing.

The talk also covers multi-source orchestration using the Model Context Protocol (MCP) for pluggable tool discovery, which lets the AI progressively load and correlate only the observability sources relevant to a given investigation. The platform delivers expert instructions for specific scenarios through targeted, versioned prompts. This design lets the platform scale across the enterprise and take on almost any investigative task, well beyond its original root cause analysis mission.
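The abstract does not say how an Expert Guide works internally. As a minimal sketch, assuming a guide is simply a versioned list of metric-name patterns plus a context budget, the filtering step before the LLM call could look like the code below. The names `ExpertGuide`, `select_metrics`, `build_context`, and all metric names are hypothetical; the cap of 14 only mirrors the figure quoted above and is not a known Nubank parameter.

```python
"""Hypothetical sketch: an 'expert guide' narrows a large metric catalog to a
small, scenario-specific set before anything is sent to an LLM.
All names here are illustrative assumptions, not Nubank's implementation."""
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class ExpertGuide:
    # Versioned so investigation instructions can evolve without silently
    # changing past behaviour (an assumed design choice).
    scenario: str
    version: str
    metric_patterns: tuple[str, ...]  # regexes for metrics worth inspecting
    max_metrics: int = 14             # assumed budget for the LLM context


def select_metrics(catalog: list[str], guide: ExpertGuide) -> list[str]:
    """Return at most guide.max_metrics catalog names that fully match a
    pattern, ordered by pattern priority (earlier pattern = higher priority)."""
    compiled = [re.compile(p) for p in guide.metric_patterns]
    selected: list[str] = []
    seen: set[str] = set()
    for pattern in compiled:
        for name in catalog:
            if name not in seen and pattern.fullmatch(name):
                selected.append(name)
                seen.add(name)
                if len(selected) == guide.max_metrics:
                    return selected
    return selected


def build_context(guide: ExpertGuide, metrics: list[str]) -> str:
    """Assemble a compact prompt section from the guide and selected metrics."""
    lines = [f"Scenario: {guide.scenario} (guide v{guide.version})",
             "Relevant metrics:"]
    lines += [f"- {m}" for m in metrics]
    return "\n".join(lines)


if __name__ == "__main__":
    # Synthetic catalog standing in for ~23,000 Prometheus metric names.
    catalog = [f"service_{i}_custom_metric_{j}" for i in range(230) for j in range(100)]
    catalog += ["http_server_request_duration_seconds_bucket",
                "http_server_requests_total",
                "process_cpu_seconds_total",
                "jvm_gc_pause_seconds_sum"]
    guide = ExpertGuide(
        scenario="latency spike on an HTTP service",
        version="1.2.0",
        metric_patterns=(r"http_server_.*", r"process_cpu_.*", r"jvm_gc_.*"),
    )
    chosen = select_metrics(catalog, guide)
    print(len(catalog), "->", len(chosen))
    print(build_context(guide, chosen))
```

Only the short string from `build_context` would reach the model; the full catalog never leaves the filtering step. Keeping the guide immutable and versioned is one way to make "targeted, versioned prompts" reproducible across investigations.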

OpenTelemetry GenAI in Practice: What the Spec Says Vs. What You Actually See - Zach Groves, Datadog

The Full Picture: Visualizing Service "Fullness" To Rethink Saturation Prevention - Tal Nordan

The Invisible Tax: How Data Format Conversions Drive up Telemetry... Cijo Thomas & Joshua MacDonald

Panel: Telemetry That Matters - Diana Todea, Antonio Jimenez Martinez & Laura Luttmer

Keynote: 10 Million Spans Per Second: Lessons From Scaling OpenTelemetry at Reddit - Trevor Riles

This is not the AI we were promised | The Royal Society

Turing Award Winner: Disagreeing with Google, Postgres, Future Problems | Mike Stonebraker

Yann LeCun's $1B Bet Against LLMs [Part 1]

First findings from Project Glasswing

How AI agents & Claude skills work (Clearly Explained)

Leading in the Age of AI: A Conversation with NVIDIA CEO Jensen Huang | Global Conference 2026

The Uncomfortable Truth About AI “Reasoning” | World Science Festival

Taming Observability at Scale in a Multi-Cluster Kubernetes Platform at Bloom... Joe Nathan Abellard

Stop Rambling: The 3-2-1 Speaking Trick That Makes You Sound Like A CEO

What AI Agent Skills Are and How They Work

The French Do Not Care About Work

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

Inside the Telemetry Data Plane: Constraints, Tradeoffs, and Scale - José Lecaros

Ex-Google Recruiter Explains Why "Lying" Gets You Hired

Secure by Design: Rethinking Test Credentials for Synthetic Monitoring - Katie Kodes, Katie Kodes