Johannes Kolbe – Escaping the Cloud: High-Performance AI in your Browser #bbuzz

More: https://2026.berlinbuzzwords.de/sessi...

Speaker: Johannes Kolbe

Server-side inference is the bottleneck of modern AI, creating costs and privacy hurdles. But what if the solution is scaling down to the browser? This session investigates Client-Side AI using WebGPU, ONNX Runtime, and Transformers.js. We’ll explore the reality of hardware access, model size, and the 2026 trade-offs of browser-based execution.

Server-side inference is the bottleneck of modern AI. It introduces network latency, creates massive operational costs, and forces complex privacy compliance. But what if we could push the compute entirely to the edge, specifically to the browser tab?

This session explores the architecture of **Client-Side AI**, where the strategy is to distribute the workload to the user's own hardware. We will investigate the modern browser-based ML stack:

- **The Runtime:** How *ONNX Runtime* provides a near-native execution environment for models trained in PyTorch or TensorFlow and exported to the ONNX format.
- **The Hardware Access:** Leveraging *WebGPU* to unlock access to the client’s GPU, including general-purpose compute, bypassing the limitations of legacy WebGL.
- **The Pipeline:** A technical look at optimizing transformer models (quantization, caching) for delivery over the wire using libraries like **Transformers.js**.

Most of all, we will look at actual demos of LLMs, speech, and computer vision models, all running in the browser. We’ll be honest about the trade-offs: memory limits, model size constraints, and the reality of browser compatibility in 2026.

Join us to see if the future of AI scaling is actually... no servers at all.

### Follow us on Social Media and join the Community!

- Mastodon: https://floss.social/@berlinbuzzwords
- LinkedIn: / berlin-buzzwords
- Website: https://berlinbuzzwords.de

Berlin Buzzwords is an event by Plain Schwarz – https://plainschwarz.com
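The abstract pairs WebGPU hardware access with the reality of browser compatibility. A common pattern, sketched here as an illustration rather than code from the talk, is to probe for WebGPU and fall back to a WebAssembly (CPU) backend. `NavigatorLike`, `GpuLike`, and `pickBackend` are hypothetical names; the only real API assumed is `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter is available.

```typescript
// Hypothetical minimal types standing in for the browser's WebGPU typings,
// so the sketch type-checks without the @webgpu/types package.
type GpuAdapterLike = object;

interface GpuLike {
  requestAdapter(): Promise<GpuAdapterLike | null>;
}

interface NavigatorLike {
  gpu?: GpuLike; // undefined when the browser does not expose WebGPU
}

type Backend = "webgpu" | "wasm";

// Hypothetical helper: prefer WebGPU only if the API exists AND an adapter
// is actually granted; otherwise fall back to a WebAssembly (CPU) backend.
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) {
    return "wasm"; // API missing: older browser, or WebGPU disabled
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null ? "webgpu" : "wasm"; // null: no suitable GPU
  } catch {
    return "wasm"; // defensive: treat any failure as "no WebGPU"
  }
}
```

In a browser this would be called with the global `navigator` object, and the resulting string could be handed to a library; Transformers.js v3, for example, accepts a `device` option such as `"webgpu"` or `"wasm"` when creating a pipeline. Checking for an adapter, not just for the presence of `navigator.gpu`, matters because the API can be exposed while no usable GPU is available.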
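For the quantization point in the abstract, the simplest variant is per-tensor symmetric int8 quantization: each float weight w is stored as an 8-bit integer q with w ≈ scale · q, cutting storage from 4 bytes to 1 byte per weight. This is a generic illustration, not the scheme used by any particular library; production toolchains typically use finer-grained (per-channel or block-wise) scales and lower bit widths such as 4-bit. `quantizeInt8` and `dequantizeInt8` are hypothetical helpers.

```typescript
// Hypothetical container: int8 codes plus one shared float scale.
interface QuantizedTensor {
  values: Int8Array;
  scale: number;
}

// Per-tensor symmetric quantization: map [-maxAbs, maxAbs] onto [-127, 127].
function quantizeInt8(weights: Float32Array): QuantizedTensor {
  let maxAbs = 0;
  for (let i = 0; i < weights.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(weights[i]));
  }
  // An all-zero tensor gets scale 1 to avoid dividing by zero.
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    // |weights[i]| <= maxAbs, so the rounded code stays within [-127, 127].
    values[i] = Math.round(weights[i] / scale);
  }
  return { values, scale };
}

// Reconstruct approximate float weights; the error per weight is at most scale / 2.
function dequantizeInt8(q: QuantizedTensor): Float32Array {
  const out = new Float32Array(q.values.length);
  for (let i = 0; i < q.values.length; i++) {
    out[i] = q.values[i] * q.scale;
  }
  return out;
}
```

The same arithmetic drives the model-size constraint: weight storage is roughly parameter count × bits per weight / 8, so a 1-billion-parameter model needs about 4 GB at 32-bit, 2 GB at 16-bit, and 0.5 GB at 4-bit, before counting activations or runtime overhead.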
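Caching, the other pipeline optimization the abstract names, matters because browser-ready model files are often tens of megabytes or more, and re-downloading them on every visit would undo much of the benefit. One way to avoid repeat downloads is the browser's Cache Storage API (`caches.open`, `cache.match`, `cache.put`). This sketch takes the cache and the fetch implementation as parameters, typed with hypothetical minimal interfaces (`ResponseLike`, `CacheLike`, `CacheStorageLike`), so it can be exercised without a browser; the function name `fetchModelCached` and the cache name `model-cache-v1` are likewise made up.

```typescript
// Hypothetical minimal interfaces covering only the parts of the Fetch and
// Cache Storage APIs used here; the real Response/Cache types satisfy them.
interface ResponseLike {
  ok: boolean;
  status: number;
  clone(): ResponseLike;
  arrayBuffer(): Promise<ArrayBuffer>;
}

interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}

interface CacheStorageLike {
  open(name: string): Promise<CacheLike>;
}

type FetchLike = (url: string) => Promise<ResponseLike>;

// Return the model bytes from the cache if present; otherwise download them
// once, store a copy, and return the downloaded bytes.
async function fetchModelCached(
  url: string,
  storage: CacheStorageLike,
  fetchFn: FetchLike,
  cacheName = "model-cache-v1", // hypothetical name; change it to invalidate
): Promise<ArrayBuffer> {
  const cache = await storage.open(cacheName);
  const hit = await cache.match(url);
  if (hit) {
    return hit.arrayBuffer(); // served locally, no network round trip
  }
  const response = await fetchFn(url);
  if (!response.ok) {
    // Do not cache failed downloads.
    throw new Error(`Model download failed (${response.status}): ${url}`);
  }
  // A response body can be read only once, so cache a clone and read the original.
  await cache.put(url, response.clone());
  return response.arrayBuffer();
}
```

In a browser one would pass the global `caches` and `fetch`. Transformers.js has built-in browser caching for the model files it downloads, so a helper like this is mainly relevant when loading ONNX files yourself.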