Paul Christiano: Formalizing Explanations of Neural Network Behaviors
Paul Christiano (Alignment Research Center), October 26

Abstract: Existing research on mechanistic interpretability usually tries to develop an informal human understanding of “how a model works,” making it hard to evaluate research results and raising concerns about scalability. Meanwhile formal proofs of model properties seem far out of reach both in theory and practice. In this talk I’ll discuss an alternative strategy for “explaining” a particular behavior of a given neural network. This notion is much weaker than proving that the network exhibits the behavior, but may still provide similar safety benefits. This talk will primarily motivate a research direction and a set of theoretical questions rather than presenting results.

Course homepage: https://sites.google.com/view/m-ml-sy...
