QuantUniversity AI Spring School: Poor Measurement in AI

AI performance is often measured using benchmarks that fail to capture real-world effectiveness. Common pitfalls, such as confirmation bias, task contamination, and Goodhart’s law, can lead to misleading assessments, raising concerns about the reliability of today’s AI evaluation practices. In this session, Patrick Hall will break down the risks of poor AI measurement and explore emerging solutions, including NIST ARIA, H2O.ai HCAT, and stakeholder-informed evaluations intended to make AI systems fair, transparent, and robust.

🔎 What you'll learn:
✅ The hidden risks in AI benchmarking and evaluation
✅ How AI measurement failures impact decision-making
✅ Promising new frameworks and methodologies shaping the future of AI assessment
