BiomniBench: Evaluating AI Agents in Biology | Yunhao Qu

Join the reading group: https://hannes-stark.com/starkly-spea...

Paper: BiomniBench: Evaluating AI Agents in Biology https://phylo.bio/blog/evaluating-ai-...

Abstract: As AI agents become central to biological research, evaluation must keep pace. We examine why existing benchmarks fall short for biology, share lessons from our experience with BixBench including a verified subset, and introduce BiomniBench, a trace-based evaluation framework that scores agents on their analytical process, not just the final answer. Biomni Lab achieves state-of-the-art performance across both general-purpose and domain-specific agents on both benchmarks.
