How RWKV-7 "Goose" and Its Linear Inference Work with Author Eugene Cheah

Paper 📜 https://arxiv.org/abs/2503.14456
Links + Notes 📝 https://www.oxen.ai/blog/how-rwkv-7-g...
Join Arxiv Dives 🤿 https://oxen.ai/community
Discord 🗿 / discord
Use Oxen AI 🐂 https://oxen.ai/

Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI.

--
Chapters
0:00 Why is RWKV-7 Goose interesting
2:53 How to quickly run RWKV-7 Goose
4:04 What is RWKV-7
10:20 RNNs forget things
12:33 First paper: Reinventing RNNs for the Transformer Era
24:22 Paper author Eugene Cheah joins the dive
36:43 The intuition behind each model layer
47:57 Parallelization during training
53:01 How well did RWKV-7 do on benchmarks?
56:50 Live evals on RWKV-7 and fine-tuning tips
1:00:38 Why they made the World Tokenizer