SELF-DIRECTED P̶h̶D̶ EXD in AI Ep. 5: Speculative Decoding

Welcome back to the EXD! Last week we took a deeper look at inference benchmarking with Llama-benchy. For example, we learned how overall token generation throughput can increase under concurrent load. This week we look at speculative decoding, one form of which is Multi-Token Prediction (MTP). Speculative decoding is a rather clever way of making better use of your compute resources during the decode pass. Today we will just show that it actually does work; in a future episode, when we introduce LLM architectures, we can understand why it works.

My name is Ram, I work at the Ethereum Foundation on AI ops, and this is an open learning log that I call the EXD.

Episode 1: SELF-DIRECTED P̶h̶D̶ EXD in AI Ep. 1: What...
GitHub: https://github.com/Ramshreyas/EXD
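For readers who want a feel for the idea before the architecture episode, here is a minimal toy sketch of greedy speculative decoding. It is not the setup used in the episode: `target_model` and `draft_model` are hypothetical stand-ins (plain functions over integer tokens), and the sketch assumes greedy decoding rather than the sampling-based acceptance rule that real inference engines typically use. It only illustrates the draft-then-verify loop and that the output matches what the large model would have produced on its own.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # greedy "next token" function


def target_model(ctx: List[Token]) -> Token:
    """Hypothetical stand-in for the large, slow model."""
    return (ctx[-1] * 3 + len(ctx)) % 7


def draft_model(ctx: List[Token]) -> Token:
    """Hypothetical cheap model: agrees with the target most of the time,
    but is deliberately wrong whenever the context length is a multiple of 4."""
    guess = target_model(ctx)
    return (guess + 1) % 7 if len(ctx) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Draft k tokens cheaply, then let the target verify them.

    Returns the generated sequence and the number of verification rounds.
    In a real engine each round is a single batched forward pass of the
    target over all drafted positions; here we call the toy target once
    per position but count the round as one pass.
    """
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(out) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks the proposals, keeping the longest prefix
        #    it agrees with.
        target_passes += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target(out + proposal[:i])
            if proposal[i] == expected:
                accepted.append(proposal[i])
            else:
                accepted.append(expected)  # replace the first wrong guess
                break
        else:
            # Every draft token was accepted: the target's own next
            # token comes for free from the same pass.
            accepted.append(target(out + proposal))
        out.extend(accepted)
    return out[:goal], target_passes


if __name__ == "__main__":
    baseline = greedy_decode(target_model, [1], 20)
    spec, passes = speculative_decode(target_model, draft_model, [1], 20)
    print("same output:", spec == baseline)
    print("target passes:", passes, "vs", 20, "for plain decoding")
```

The point of the design is that every round yields at least one token the target would have chosen anyway, so the output is unchanged; any speed-up depends on how often the draft guesses right and on how cheaply the target can score several positions at once.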
