SELF-DIRECTED P̶h̶D̶ EXD in AI Ep. 4: Inference Benchmarking continued

Welcome back to the EXD. Last week we took our first look at LLM inference using vLLM. More specifically, we learned about the prefill and decode phases of an inference pass and how their performance characteristics differ. This week we’ll dig a little deeper and make our first attempt at tuning vLLM for better performance.

My name is Ram. I work at the Ethereum Foundation on internal AI ops, and this is an open learning log for what I call the EXD.

Episode 01: SELF-DIRECTED P̶h̶D̶ EXD in AI Ep. 1: What...
EXD: github.com/Ramshreyas/EXD
Llama-benchy: https://github.com/eugr/llama-benchy
