LLM Observability, Evaluation, Experimentation Platform — Dat Ngo, Arize
Your agent called tool B before tool A, and B depends on A. You did not catch it because nothing in your code audits agent behavior. The telemetry does. Dat Ngo from Arize AI walks through what observability actually means when the system you are debugging is nondeterministic and the execution path changes with every run. The talk covers the five flavors of eval signal (LLM as judge, human feedback, golden datasets, deterministic checks, business metrics), the scopes at which to run them (single span, multispan, trajectory, session), and where this is heading.

Arize Phoenix is open source and runs as a single container, with no Kubernetes required. The enterprise product adds an AI layer called Alex that scans traces, surfaces high latency and errors, and creates evals automatically. The stated goal: automate you out of the observability loop entirely.

Speaker info: / datdarylngo https://x.com/dat_attacked
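The tool-ordering failure described above is the kind of thing a deterministic check at trajectory scope can catch. The sketch below is a minimal illustration, not Phoenix's API: the `ToolSpan` shape, the `dependency_violations` helper, the `DEPENDS_ON` map, and the tool names are all hypothetical, and a real trace would first need to be exported into a structure like this.

```python
from dataclasses import dataclass


@dataclass
class ToolSpan:
    """One tool-call span from a trace (hypothetical shape, not a Phoenix schema)."""
    name: str
    start_ns: int


def dependency_violations(spans, depends_on):
    """Return (tool, missing_prerequisite) pairs for calls made before their prerequisite."""
    seen = set()
    violations = []
    # Walk the trajectory in execution order, tracking which tools have already run.
    for span in sorted(spans, key=lambda s: s.start_ns):
        for prereq in depends_on.get(span.name, ()):
            if prereq not in seen:
                violations.append((span.name, prereq))
        seen.add(span.name)
    return violations


# Hypothetical trajectory: the agent called tool_b before tool_a.
trajectory = [
    ToolSpan("tool_b", start_ns=100),
    ToolSpan("tool_a", start_ns=200),
]
# Assumed dependency map: tool_b requires tool_a to have run first.
DEPENDS_ON = {"tool_b": ["tool_a"]}

print(dependency_violations(trajectory, DEPENDS_ON))
# → [('tool_b', 'tool_a')]
```

Because the check reads only span names and start times, it could run on every trace without calling a model, which is what separates a deterministic check from an LLM-as-judge eval.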