Same Model, Same Benchmark, 42% vs 95% — What Went Wrong? | Dr. Cozmin Ududec, AI Security Institute
Do you have any questions or points to add to the discussion? Any lightbulb moments? Share in the comments!

Through the Open Seminar Series, we're opening select lectures from the AI Evaluation Programme to anyone in the wider community who wants to learn. These are the same sessions our students attend.

We built evaluation for models that answer questions. Now we have systems that take actions. That changes everything. In this session, Dr. Cozmin Ududec explored how evaluating AI agents requires a different lens: one that looks at behavior over time, not just final outputs, and asks not just "did it succeed?" but "how did it get there?"
