Cloud Performance Root Cause Analysis at Netflix • Brendan Gregg • YOW! 2018
This presentation was recorded at YOW! 2018. https://yowcon.com

Brendan Gregg - Industry Expert in Computing Performance & Cloud Computing @BrendanGregg

RESOURCES
https://x.com/brendangregg
https://aus.social/@brendangregg
https://github.com/brendangregg
https://www.brendangregg.com
https://www.brendangregg.com/blog/ind...

ABSTRACT
At Netflix, improving the performance of our cloud means happier customers and lower costs. It involves root cause analysis of applications, runtimes, operating systems, and hypervisors, in an environment of 150k cloud instances that undergo numerous production changes each week. Apart from the developers who regularly optimize their own code, we also have a dedicated performance team to help with any issue across the cloud, and to build tooling to aid in this analysis.

In this session we will summarize the Netflix environment, procedures, and tools we use and build to do root cause analysis on cloud performance issues. The analysis performed may be cloud-wide, using self-service GUIs such as our open source Atlas tool, or focused on individual instances, using our open source Vector tool, flame graphs, Java debuggers, and tooling that uses Linux perf, ftrace, and bcc/eBPF. You can use these open source tools in the same way to find performance wins in your own environment.

Brendan Gregg is an industry expert in computing performance and cloud computing. He is a senior performance architect at Netflix, where he does performance design, evaluation, analysis, and tuning. He is the author of multiple technical books including Systems Performance published by Prentice Hall, and received the USENIX LISA Award for Outstanding Achievement in System Administration. He has also worked as a kernel engineer, and as a performance lead on storage and cloud products. Brendan has created performance analysis tools included in multiple operating systems, and visualizations and methodologies for performance analysis, including flame graphs. [...]

RECOMMENDED BOOKS
Brendan Gregg • Systems Performance • https://amzn.to/3SGCbM3
Brendan Gregg • BPF Performance Tools • https://amzn.to/3Dl8H0K
Brendan Gregg • Systems Performance • https://amzn.to/3TAl9At
Brendan Gregg & Jim Mauro • DTrace • https://amzn.to/3gPvJFm



