Open Source Model Performance Optimization With SGLang - Yineng Zhang, Together AI

SGLang is an open-source fast inference framework in the PyTorch ecosystem built for performant, flexible, extensible model serving. SGLang's growing popularity is in large part thanks to its community ethos and the participation of developers from around the world. Join this BoF session hosted by SGLang core maintainer Yineng Zhang to discuss the future of SGLang and learn how to get involved in the project.