The AI PM Skill That Gets You Instant Job Offers | Aparna, Arize AI
Aparna Dhinakaran, CPO of Arize AI ($131M raised), shows exactly how to build a PM agent in Claude Code, instrument it with observability, run evals against it, and close the self-improvement loop, all in one live session. If you want to understand what serious AI eval practice looks like in 2025, this is the episode.

Full Writeup: https://www.news.aakashg.com/p/aparna...
Transcript: https://www.aakashg.com/how-to-build-...

---

Timestamps:
00:00 - Intro
01:34 - What PMs get wrong
04:35 - Why product taste is the alpha
06:30 - The build-trace-eval-loop framework
08:04 - Ads
09:28 - Building the agent in Claude Code
19:00 - Instrumentation in one command
22:00 - Traces streaming into Arize live
28:00 - Asking Claude to suggest evals
31:13 - Ads
33:58 - Vibe evals vs axial coding
48:50 - Looping the improvement automatically
01:06:00 - The context graph unlock
01:18:55 - Outro

---

Thanks to our sponsors:
1. Superhuman - Sign up and get 1-month free of Superhuman Mail with my link: superhuman.com/akash
2. Land PM Job - Land your next PM role faster - https://landpmjob.com
3. Vanta - Automate your compliance - http://vanta.com/aakash
4. Product Faculty - Get $550 off their AI PM Certification with code AAKASH550C7 - https://maven.com/product-faculty/ai-...
5. Bolt - Ship AI-powered products 10x faster - https://bolt.new

---

Key Takeaways:
1. Trace before you eval - A trace is the full step-by-step playback of what your agent did. Without it, you have no evidence base for evals. Every LLM call, every tool call, every intermediate output needs to be visible before you write a single eval.
2. A span is your unit of evaluation - A span is one discrete step inside a trace. Evals run at the span level, not the trace level. "Did this specific scoring step get the priority right?" is a more useful question than "was the whole run good?"
3. Instrumentation is now a one-command job - Claude Code's instrumentation skills can set up observability for your agent automatically. Arize Phoenix's skill looks at your codebase, identifies the LLM calls and tool calls, and wires them to the tracing layer. No engineering support required.
4. The vibe eval is a draft, not a verdict - An LLM can suggest what your evals should test by looking at your traces. That suggestion will not know your bug-first policy, your comp logic, or your definition of "critical." Treat it as v0 and refine against your actual judgment.
5. When evals fire, two things could be wrong - The agent produced a bad output. Or the eval is miscalibrated. Reading the flagged span yourself is the only way to know which one needs fixing. Both are normal. Both are good news.
6. Evals drift and need regular realignment - Your priorities change. Your bug policy changes. Your product changes. An eval calibrated to last quarter will start misfiring this quarter. Regular alignment to human feedback is maintenance, not a failure.
7. The self-improvement loop is already running at the best teams - Fetch all spans where evals fired. Group by failure category. Propose a specific prompt fix. Review and approve. Ship the new version. This loop runs on a schedule and requires a human at the approval step.
8. Enterprise PMs: start with one internal agent - Not a customer-facing product. An internal tool that takes four hours off your week. Once you have it, you will naturally want to trace it. That is when observability starts to matter to you personally.
9. The context graph is the enterprise unlock - Agents are only as useful as the context they have. Enterprise data lives in silos. The teams breaking through are building unified context layers that give one agent access to CRM, Gong, analytics, GitHub, and Slack.
10. Product taste is still the alpha - Code is cheap now. Shipping speed is table stakes. The PMs who pull ahead are the ones with the sharpest judgment about what to build, and the loops that make their agents better every day.
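
For readers who want takeaways 1, 2, and 7 in concrete form, here is a minimal Python sketch of the idea: traces hold spans, evals run on individual spans, and flagged spans are grouped by failure category. It does not use the Arize or Phoenix APIs. Every name in it (Span, Trace, bug_first_eval, the "score_priority" step, the "P0" label) is a hypothetical stand-in chosen for illustration, and the bug-first rule is only an assumed example of a team policy.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Span:
    """One discrete step inside a trace (an LLM call, a tool call, ...)."""
    name: str
    input: str
    output: str


@dataclass
class Trace:
    """The full step-by-step record of one agent run."""
    trace_id: str
    spans: List[Span] = field(default_factory=list)


# An eval inspects one span and returns a failure category, or None if it passes.
Eval = Callable[[Span], Optional[str]]


def bug_first_eval(span: Span) -> Optional[str]:
    """Hypothetical policy check: any ticket mentioning a bug must be scored P0."""
    if span.name != "score_priority":
        return None  # this eval only applies to the scoring step
    if "bug" in span.input.lower() and span.output != "P0":
        return "bug_not_prioritized"
    return None


def flagged_spans_by_category(
    traces: List[Trace], evals: List[Eval]
) -> Dict[str, List[Tuple[str, Span]]]:
    """Run every eval on every span and group the failures by category."""
    groups: Dict[str, List[Tuple[str, Span]]] = defaultdict(list)
    for trace in traces:
        for span in trace.spans:
            for run_eval in evals:
                category = run_eval(span)
                if category is not None:
                    groups[category].append((trace.trace_id, span))
    return dict(groups)


# Made-up traces standing in for what a tracing layer would collect.
traces = [
    Trace("t1", [Span("fetch_tickets", "backlog", "3 tickets"),
                 Span("score_priority", "Bug: checkout crashes", "P2")]),
    Trace("t2", [Span("score_priority", "Feature: dark mode", "P3")]),
    Trace("t3", [Span("score_priority", "Bug: login loop", "P0")]),
]

report = flagged_spans_by_category(traces, [bug_first_eval])
for category, hits in report.items():
    print(category, [trace_id for trace_id, _ in hits])
```

The grouped report is the input to the steps the sketch leaves out: reading the flagged spans yourself to decide whether the agent or the eval is wrong, proposing a prompt fix, and approving it before anything ships.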

---

Where to find Aparna Dhinakaran:
LinkedIn: / aparnadhinakaran
Arize AI: https://arize.com

Where to find Aakash:
X: https://x.com/aakashgupta
LinkedIn: / aagupta
Newsletter: https://www.news.aakashg.com

#AIagents #ProductManagement

---

About Product Growth: The world's largest podcast focused solely on product + growth, with over 200K listeners. Subscribe and turn on notifications.


