BiomniBench: Evaluating AI Agents in Biology | Yunhao Qu
Join the reading group: https://hannes-stark.com/starkly-spea...

Paper: BiomniBench: Evaluating AI Agents in Biology https://phylo.bio/blog/evaluating-ai-...

Abstract: As AI agents become central to biological research, evaluation must keep pace. We examine why existing benchmarks fall short for biology, share lessons from our experience with BixBench including a verified subset, and introduce BiomniBench, a trace-based evaluation framework that scores agents on their analytical process, not just the final answer. Biomni Lab achieves state-of-the-art performance across both general-purpose and domain-specific agents on both benchmarks.
