Using Large Language Models for Evaluation: Opportunities and Limitations

Talk Title: Using Large Language Models for Evaluation: Opportunities and Limitations
Speaker: Prof. Emine Yilmaz
Date: May 27, 2026
Abstract: Large Language Models (LLMs) have shown significant promise as tools for automated evaluation across diverse domains. While the use of LLMs for evaluation offers substantial advantages, potentially reducing reliance on costly and subjective human assessments, the adoption of LLM-based evaluation is not without challenges. In this talk, we discuss both the transformative potential and the inherent limitations of using LLMs for evaluation tasks. In particular, we highlight challenges such as bias and variability in judgments. We also explore how LLMs can augment traditional evaluation practices while emphasizing the need for a cautious and informed approach to their use.
