End to End LLMOps with Kubeflow - J. George, G. Prabhu, A. Nagar & A. Raimule, K. Durai
End to End LLMOps with Kubeflow - Johnu George, Gavrish Prabhu, Ajay Nagar & Aishwarya Raimule, Nutanix; Krishna Durai, Meta

In the new world of generative AI, enterprises are betting on integrating large language models into their business use cases. Because large language models have complex infrastructure requirements, building scalable, optimized end-to-end GenAI pipelines that connect data and compute is harder than it is for traditional machine learning models. Cluster admins need better visibility into infrastructure to ensure the best utilization of cluster resources, including expensive accelerators. Data scientists, in contrast, need a clean Pythonic interface that does not expose any underlying stack details.

In this talk, we will cover how the Kubeflow Platform helps in the LLMOps journey, from training an LLM on a custom dataset, to fine-tuning the pipeline for the best results, and finally to deploying the trained models at scale. We will discuss an optimized Kubernetes-native ML reference stack for your LLM needs that provides maximum infrastructure utilization.
