Stanford CS25: V5 | Transformers in Diffusion Models for Image Generation and Beyond

May 27, 2025. Sayak Paul of Hugging Face.

Diffusion models have recently become the leading approach for generating realistic synthetic continuous media. This talk covers how Transformers are used in diffusion models for image generation and beyond. We set the context by briefly discussing the preliminaries of diffusion models and how they are trained. We then cover the UNet-based network architecture that used to be the de facto choice for diffusion models, which motivates the introduction and rise of transformer-based architectures for diffusion. We cover the fundamental blocks and the degrees of freedom one can ablate in the base architecture under different conditioning settings. We then shift our focus to the different flavors of attention and other connected components that the community has been using in several state-of-the-art open models for various use cases. We conclude by shedding light on promising future directions around efficiency.

Speaker: Sayak works on diffusion models at Hugging Face. His day-to-day work includes contributing to the diffusers library, training and babysitting diffusion models, and working on applied ideas. He is interested in subject-driven generation, preference alignment, and evaluation of diffusion models. When he is not working, he can be found playing the guitar and binge-watching ICML tutorials and Suits.

More about the course can be found here: https://web.stanford.edu/class/cs25/

View the entire CS25 Transformers United playlist: Stanford CS25 - Transformers United