How difficult is AI alignment? | Anthropic Research Salon
At an Anthropic Research Salon event in San Francisco, four of our researchers (Alex Tamkin, Jan Leike, Amanda Askell, and Josh Batson) discussed alignment science, interpretability, and the future of AI research.

Further reading:
Anthropic’s research: https://anthropic.com/research
Claude’s character: https://www.anthropic.com/news/claude...
Evaluating feature steering: https://www.anthropic.com/research/ev...

0:00 Introduction
0:30 An overview of alignment
4:48 Challenges of scaling
8:08 Role of interpretability
12:02 How models can help
14:31 Signs of whether alignment is easy or hard
18:28 Q&A: Multi-agent deliberation
20:38 Q&A: Model alignment epiphenomenon
23:43 Q&A: What solving alignment could look like

