The Dark Arts of ML Benchmarking - Yonatan Alexander
Summary

Most ML benchmarks are quietly broken. Leaderboard gaming, data leaks, logging errors, and poorly designed execution functions mean teams spend more time debugging than learning. The best case is rare: implement once, run once, analyze without pain.

This talk shares hard-won wisdom practitioners rarely document: designing experiments backwards from the questions you want to answer, building robust single-execution functions that are cacheable, stateless, and failure-aware, saving raw responses, and versioning everything at the right level of rigor.

We will also demo xetrack, an open-source Python experiment tracking library built for practitioners who want lightweight logging without vendor lock-in. xetrack ships with a Claude skill that acts as a built-in methodology guide, helping AI agents design experiments correctly, avoid common pitfalls, and work methodically from the start.

If you have ever stared at a 3 AM benchmark failure, wondering what went wrong, this talk is for you.

About Yonatan Alexander

Yonatan Alexander builds AI systems that ship fast, work at scale, and actually solve problems. As Head of AI at Lasso Security, he leads teams building production ML systems for enterprise security. He invented a patent-pending LLM inference architecture achieving 570X cost reduction and pioneered serverless machine learning before it became an industry standard.

Yonatan is the creator of xetrack, an open-source experiment tracking library, and a technical advisor to Vaex (8.5K+ GitHub stars), helping shape how Python handles billion-row datasets on standard hardware. His "Beyond Pandas" article has been read 18.8K times by practitioners navigating real-world data challenges. He has delivered technical talks at PyData and AIGrunn on the messy realities of production AI, from LLM hallucinations to the gap between demos and deployed systems. His "Branches Are All You Need" framework influences how teams approach ML versioning.
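The summary's idea of a single-execution function that is cacheable, stateless, failure-aware, and saves raw responses can be illustrated with a minimal sketch. It uses only the Python standard library and is not xetrack's API. The names `run_once`, `cache_key`, and `fake_model`, the JSON-file cache layout, and the choice not to cache failures are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(params: dict) -> str:
    """Derive a stable key from the parameters alone (hypothetical helper)."""
    blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]


def run_once(params: dict, call_model, cache_dir: Path) -> dict:
    """Run one benchmark item; everything it needs arrives as arguments."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(params)}.json"

    # Cacheable: a previous successful run is reused instead of recomputed.
    if path.exists():
        return json.loads(path.read_text())

    # Failure-aware: an exception becomes a record instead of killing the run.
    try:
        raw = call_model(params)
    except Exception as exc:
        # Assumption: failures are returned but not cached, so a rerun retries them.
        return {"params": params, "raw_response": None, "error": repr(exc)}

    # Save the raw response; parsing and scoring can happen later, offline.
    record = {"params": params, "raw_response": raw, "error": None}
    path.write_text(json.dumps(record))
    return record


# Usage with a stand-in model that records how often it is really called.
calls = []


def fake_model(params: dict) -> dict:
    calls.append(params)
    if params["prompt"] == "boom":
        raise RuntimeError("simulated failure")
    return {"text": params["prompt"].upper()}


demo_dir = Path(tempfile.mkdtemp())
first = run_once({"prompt": "hi", "model": "demo"}, fake_model, demo_dir)
second = run_once({"model": "demo", "prompt": "hi"}, fake_model, demo_dir)
failed = run_once({"prompt": "boom", "model": "demo"}, fake_model, demo_dir)
```

In this sketch the second call is served from the cache even though its keys arrive in a different order, because the key is computed from the sorted parameters. Keeping the raw response rather than a parsed score means a scoring bug can be fixed without rerunning the model.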


