Many Databases 1 LSM Engine - OpenData
The episode explores why modern databases keep reinventing the same distributed-systems machinery and argues that a major part of database cost is the operational tax of running replication-heavy systems. Our guest, Almog Gavra, co-founder of Responsive, explains how his team pivoted from operating Kafka Streams as a service to building SlateDB and the "Open Data" manifesto: an object-storage-native LSM foundation that can power multiple database types (vector, time series, logs, key-value) with shared tuning knobs and failure modes.

We discuss why distributed-systems complexity is often harder than query engines, how LSM trees provide a tunable tradeoff between read, write, and space amplification, caching layers and cost transparency, separating readers from writers, stateless ingest, single-writer availability and fencing via S3 compare-and-set, offloading compaction, and how the architecture enables near-free snapshots. We also cover when this approach doesn't fit: OLTP that can stay on Postgres, and ultra-low-latency workloads where cold object-store misses are unacceptable.

Chapters:
00:00 Introduction
08:36 Open Data Manifesto
18:34 Specialized vs General
25:10 SlateDB Architecture
32:51 LSM Trees as Tuning Dial
38:58 Tuning Without Overload
39:46 Cost Aware Config Knobs
41:51 Latency Cost Durability Tradeoffs
46:46 Caching Strategies And Layers
50:23 Split Readers And Writers
52:43 Single Writer Versus Multi Writer
55:16 Scaling And Partitioning Writes
58:58 Failure Modes And Fencing
01:05:23 Compaction As Separate Worker
01:09:28 Snapshots And Garbage Collection
01:10:25 When Open Data Is Not Fit

Important links and references:
OpenData: http://github.com/opendata-oss/opendata
OpenData manifesto: https://www.opendata.dev/blog/manifesto
Reach out to Almog: / agavra or https://x.com/almoggavra
Dostoevsky paper on LSM: https://nivdayan.github.io/dostoevsky...
Latency/Cost/Durability Triad: https://materializedview.io/p/cloud-s...
SlateDB: https://github.com/slatedb/slatedb
"How SSTs work": https://www.bitsxpages.com/p/sorted-s...

For memberships, join this channel as a member here: / @thegeeknarrator
Don't forget to like, share, and subscribe for more insights!

=============================================================================
Like building stuff? Try out CodeCrafters and build amazing real-world systems like Redis, Kafka, and SQLite. Use the link below to sign up and get 40% off a paid subscription.
https://app.codecrafters.io/join?via=...
=============================================================================

Database internals series:
• Write-ahead-logging

Popular playlists:
Realtime streaming systems: • Realtime Streaming Systems
Software Engineering: • Software Engineering
Distributed systems and databases: • Distributed Systems and Databases
Modern databases: • Modern Databases

Stay Curious! Keep Learning!


