LLM Observability, Evaluation, Experimentation Platform — Dat Ngo, Arize

Your agent called tool B before tool A, even though B depends on A. You did not catch it because nothing in your code audits the agent's behavior. The telemetry does. Dat Ngo from Arize AI walks through what observability means when the system you are debugging is nondeterministic and the execution path changes with every run. The talk covers five flavors of eval signal (LLM as judge, human feedback, golden datasets, deterministic checks, business metrics), the scope at which to run them (single span, multi-span, trajectory, session), and where this is heading. Arize Phoenix is open source and runs as a single container, with no Kubernetes required. The enterprise product adds an AI layer called Alex that scans traces, surfaces high latency and errors, and creates evals automatically. The stated goal: automate you out of the observability loop entirely.

Speaker info: / datdarylngo, https://x.com/dat_attacked
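The tool-ordering failure described above is the kind of thing a deterministic check at trajectory scope can catch. What follows is a minimal sketch, not Phoenix's API or the speaker's implementation: the `ToolSpan` shape, the `find_order_violations` helper, and the `depends_on` mapping are all hypothetical names chosen for illustration, and it assumes each tool-call span exposes a tool name and a start timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpan:
    """One tool-call span from a trace (hypothetical shape, not a Phoenix type)."""
    name: str
    start_ns: int


def find_order_violations(spans, depends_on):
    """Return (tool, missing_dependency) pairs for tools that ran
    before a tool they depend on had run in the same trajectory."""
    violations = []
    seen = set()
    # Walk the trajectory in execution order, tracking which tools have run.
    for span in sorted(spans, key=lambda s: s.start_ns):
        for dep in depends_on.get(span.name, ()):
            if dep not in seen:
                violations.append((span.name, dep))
        seen.add(span.name)
    return violations


# Assumed example: tool_b depends on tool_a but was called first.
trajectory = [ToolSpan("tool_b", 100), ToolSpan("tool_a", 200)]
dependencies = {"tool_b": ["tool_a"]}
violations = find_order_violations(trajectory, dependencies)
print(violations)
```

A check like this needs the whole trajectory rather than a single span, because the violation only becomes visible when the order of calls is known.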
