Deep dive into Mixture of Experts (MOE) with the Mixtral 8x7B paper

Arxiv Dives is an Oxen.ai group of engineers, researchers, and practitioners that gets together every Friday to dig into state-of-the-art research in Machine Learning and Artificial Intelligence. If you would like to join the live discussion, we would love to have you! Join here: https://lu.ma/oxenbookclub

Each week we dive deep into a topic in ML/AI. Whether it is a research paper, a blog post, a book, or a YouTube video, we break the content down into a digestible format and have an open discussion with the Oxen.ai team and anyone else who wants to join. We try to cover the content at a high level so that anyone can follow it, then dive into the deeper technical details to get a clearer understanding.

This week we cover the Mixtral paper from the team at Mistral.ai. The paper describes how Mistral used Mixture of Experts (MOE) in their Mixtral 8x7B model, including its instruction-tuned variant, to outperform larger models and stay competitive with GPT-3.5.

All the notes and previous dives can be found on the Oxen.ai blog: https://blog.oxen.ai/tag/arxiv-dives/
