End-to-End GenAI Observability: Infrastructure, Agents, and Applications
Join this L300 session to discover how to build comprehensive observability across your entire GenAI stack, from GPU utilization and infrastructure health to agent decision flows, tool invocations, and application performance.

CloudWatch OTLP for Metrics
- Leverage OpenTelemetry Protocol (OTLP) for standardized metrics collection
- Configure CloudWatch to receive OTLP metrics from distributed GenAI workloads
- Enable vendor-agnostic observability across your infrastructure

Infrastructure Observability
- Monitor inference workloads on Amazon EKS using CloudWatch Container Insights
- Collect GPU metrics (utilization, memory, power draw, temperature) with the NVIDIA DCGM exporter
- Visualize infrastructure and inference performance using Amazon Managed Service for Prometheus and Grafana
- Leverage community-driven Grafana dashboards for zero-configuration monitoring

Inference Performance Metrics
- Track vLLM (open-source LLM serving tool) performance metrics
- Monitor time-to-first-token and end-to-end request latency
- Analyze token consumption and generation patterns

Agent Observability Beyond AgentCore
- Instrument agents deployed on EKS (outside the Bedrock AgentCore runtime) using OpenTelemetry
- Enable auto-instrumentation without code changes using AWS Distro for OpenTelemetry
- Configure telemetry collection through environment variables in Kubernetes deployments
- Use CloudWatch GenAI AgentCore observability capabilities for agents on any platform

Session and Trace Management
- View complete traces with timeline and trajectory visualizations
- Track token counts (input/output) for each model invocation
- Analyze agent reasoning, tool calls, and decision flows
- Correlate traces across multiple agent interactions using session IDs
- Access integrated logs, metrics, and traces in CloudWatch

Key Takeaways
- Zero infrastructure overhead for observability setup
- OpenTelemetry-based approach works with agents on EKS, ECS, EC2, or any platform
- CloudWatch GenAI observability features extend beyond the Bedrock AgentCore runtime
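
To make the point about configuring telemetry through environment variables in Kubernetes deployments more concrete, here is a minimal Python sketch that builds the `env` list for a container spec. It uses only the standard OpenTelemetry SDK environment variable names; the service name, collector endpoint, and extra resource attribute are hypothetical placeholders, and any additional variables that AWS Distro for OpenTelemetry or CloudWatch may require are not shown because the session description does not list them.

```python
import json
from typing import Dict, List, Optional


def otel_env(
    service_name: str,
    otlp_endpoint: str,
    extra_attrs: Optional[Dict[str, str]] = None,
) -> List[Dict[str, str]]:
    """Build a Kubernetes container `env` list from standard OTel variables."""
    # Resource attributes are serialized as comma-separated key=value pairs.
    attrs = {"service.name": service_name, **(extra_attrs or {})}
    resource_attrs = ",".join(f"{k}={v}" for k, v in sorted(attrs.items()))

    env = {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_EXPORTER_OTLP_ENDPOINT": otlp_endpoint,
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_METRICS_EXPORTER": "otlp",
        "OTEL_RESOURCE_ATTRIBUTES": resource_attrs,
    }
    # Kubernetes expects a list of {"name": ..., "value": ...} entries.
    return [{"name": name, "value": value} for name, value in env.items()]


if __name__ == "__main__":
    # Hypothetical values: replace with your own service name and collector URL.
    env_list = otel_env(
        service_name="demo-agent",
        otlp_endpoint="http://otel-collector.observability:4318",
        extra_attrs={"deployment.environment": "dev"},
    )
    print(json.dumps(env_list, indent=2))
```

The printed JSON can be pasted under `env:` in a Deployment's container spec (or converted to YAML). Keeping the configuration in environment variables is what lets auto-instrumentation work without changes to the agent's own code.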
For more events like this see our Cloud Operations Enablement series: https://aws-experience.com/amer/smb/e...


