MedAI #41: Efficiently Modeling Long Sequences with Structured State Spaces | Albert Gu

Title: Efficiently Modeling Long Sequences with Structured State Spaces

Speaker: Albert Gu

Abstract: A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies. Although conventional models including RNNs, CNNs, and Transformers have specialized variants for capturing long dependencies, they still struggle to scale to very long sequences of 10,000 or more steps. This talk introduces the Structured State Space sequence model (S4), a simple new model based on the fundamental state space representation $x'(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t)$. S4 combines elegant properties of state space models with the recent HiPPO theory of continuous-time memorization, resulting in a class of structured models that handles long-range dependencies mathematically and can be computed very efficiently. S4 achieves strong empirical results across a diverse range of established benchmarks, particularly for continuous signal data such as images, audio, and time series.

Speaker Bio: Albert Gu is a final year Ph.D. candidate in the Department of Computer Science at Stanford University, advised by Christopher Ré. His research broadly studies structured representations for advancing the capabilities of machine learning and deep learning models, with focuses on structured linear algebra, non-Euclidean representations, and theory of sequence models. Previously, he completed a B.S. in Mathematics and Computer Science at Carnegie Mellon University, and an internship at DeepMind in 2019.

------

The MedAI Group Exchange Sessions are a platform where we can critically examine key topics in AI and medicine, generate fresh ideas and discussion around their intersection and, most importantly, learn from each other. We will be having weekly sessions where invited speakers will give a talk presenting their work, followed by an interactive discussion and Q&A.

Our sessions are held every Thursday from 1pm-2pm PST.

To get notifications about upcoming sessions, please join our mailing list: https://mailman.stanford.edu/mailman/...

For more details about MedAI, check out our website: https://medai.stanford.edu. You can follow us on Twitter @MedaiStanford.

Organized by members of the Rubin Lab (http://rubinlab.stanford.edu):
Nandita Bhaskhar (https://www.stanford.edu/~nanbhas)
Siyi Tang (https://siyitang.me)
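
For readers who want a concrete feel for the state space representation in the abstract, here is a minimal Python sketch. It is not the S4 implementation. It assumes a diagonal A (a simplification chosen here so that no matrix inverse is needed), a bilinear discretization with step size dt, and hypothetical function names. It runs the discretized system as a recurrence and also builds the equivalent convolution kernel.

```python
def discretize_bilinear(a_diag, b, dt):
    """Bilinear discretization of x'(t) = A x(t) + B u(t).

    Assumption: A is diagonal and given as a list of its diagonal entries,
    so (I - dt/2 A)^{-1} reduces to an elementwise division.
    """
    a_bar = [(1 + dt * a / 2) / (1 - dt * a / 2) for a in a_diag]
    b_bar = [dt * bi / (1 - dt * a / 2) for a, bi in zip(a_diag, b)]
    return a_bar, b_bar


def run_ssm(a_diag, b, c, d, u, dt):
    """Recurrent view: x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k + D u_k."""
    a_bar, b_bar = discretize_bilinear(a_diag, b, dt)
    x = [0.0] * len(a_diag)  # state starts at zero
    ys = []
    for u_k in u:
        x = [ab * xi + bb * u_k for ab, bb, xi in zip(a_bar, b_bar, x)]
        ys.append(sum(ci * xi for ci, xi in zip(c, x)) + d * u_k)
    return ys


def ssm_kernel(a_diag, b, c, dt, length):
    """Convolutional view: K_j = C A_bar^j B_bar for j = 0 .. length-1."""
    a_bar, b_bar = discretize_bilinear(a_diag, b, dt)
    return [
        sum(ci * (ab ** j) * bb for ci, ab, bb in zip(c, a_bar, b_bar))
        for j in range(length)
    ]


def run_ssm_conv(a_diag, b, c, d, u, dt):
    """Same output as run_ssm, computed as a causal convolution with K."""
    k = ssm_kernel(a_diag, b, c, dt, len(u))
    return [
        sum(k[j] * u[i - j] for j in range(i + 1)) + d * u[i]
        for i in range(len(u))
    ]


# Illustrative values only; they are not taken from the talk.
A_DIAG = [-1.0, -2.0]
B = [1.0, 1.0]
C = [1.0, 0.5]
D = 0.0
U = [1.0, 0.5, -0.25, 0.0, 2.0]

y_recurrent = run_ssm(A_DIAG, B, C, D, U, dt=0.1)
y_convolution = run_ssm_conv(A_DIAG, B, C, D, U, dt=0.1)
```

The two functions compute the same outputs up to floating-point rounding. The recurrence processes one step at a time, while the convolution form exposes the whole sequence at once. The direct sum used here is quadratic in sequence length and is only meant to show the equivalence.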
