Datadog on Site Reliability Engineering
There are many different ways to implement Site Reliability Engineering (SRE). From team structures to roles and responsibilities to planning and prioritization flows, there’s no golden path for how to organize things. As Datadog has grown from a startup into a rapidly growing public company, we’ve seen our own SRE practice evolve. With over 22,000 customers sending trillions of data points each day, keeping Datadog reliable is critical to our business. In this episode of “Datadog on”, join Staff Engineers Laura de Vesine and Rick Mangi to hear how Datadog’s approach to SRE has changed with scale and experience. Their different backgrounds and roles (Rick is embedded on a team building an internal platform, while Laura works across multiple teams on a variety of projects) highlight some of the different methodologies and how we use them. You’ll learn how Datadog approaches technical debt and legacy systems, some key differences between SRE at startups and SRE at larger companies, how to get buy-in for SRE practices at an organizational level, and more.