Many Databases 1 LSM Engine - OpenData

The episode explores why modern databases keep reinventing the same distributed-systems machinery and argues that a major part of database cost is the operational tax of running replication-heavy systems. Our guest, Almog Gavra, co-founder of Responsive, explains how his team pivoted from operating Kafka Streams as a service to building SlateDB and the “Open Data” manifesto: an object-storage-native LSM foundation that can power multiple database types (vector, time series, logs, key-value) with shared tuning knobs and failure modes.

They discuss why distributed-systems complexity is often harder than query engines and how LSM trees provide a tunable tradeoff among read, write, and space amplification. They also cover caching layers and cost transparency, separating readers from writers, stateless ingest, single-writer availability and fencing via S3 compare-and-set, offloading compaction, and how the architecture enables near-free snapshots. Finally, they cover when this approach doesn’t fit: OLTP that can stay on Postgres and ultra-low-latency workloads where cold object-store misses are unacceptable.

Chapters:
00:00 Introduction
08:36 Open Data Manifesto
18:34 Specialized vs General
25:10 SlateDB Architecture
32:51 LSM Trees as Tuning Dial
38:58 Tuning Without Overload
39:46 Cost Aware Config Knobs
41:51 Latency Cost Durability Tradeoffs
46:46 Caching Strategies And Layers
50:23 Split Readers And Writers
52:43 Single Writer Versus Multi Writer
55:16 Scaling And Partitioning Writes
58:58 Failure Modes And Fencing
01:05:23 Compaction As Separate Worker
01:09:28 Snapshots And Garbage Collection
01:10:25 When Open Data Is Not Fit

Important links and references:
OpenData: http://github.com/opendata-oss/opendata
OpenData manifesto: https://www.opendata.dev/blog/manifesto
Reach out to Almog: / agavra or https://x.com/almoggavra
Dostoevsky paper on LSM: https://nivdayan.github.io/dostoevsky...
Latency/Cost/Durability Triad: https://materializedview.io/p/cloud-s...
SlateDB: https://github.com/slatedb/slatedb
"how SSTs work": https://www.bitsxpages.com/p/sorted-s...
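
For readers who want a concrete picture of the "single-writer availability and fencing via S3 compare-and-set" topic before watching, here is a minimal sketch of the general idea. It is not SlateDB's actual protocol or API. `FakeObjectStore`, `Writer`, the `"manifest"` key, and the integer version counter are all hypothetical stand-ins; a real object store would express the same check through its own conditional-write mechanism. The point is only that every write is conditional on the last version the writer saw, so a writer that has been replaced fails its next write instead of corrupting state.

```python
class PreconditionFailed(Exception):
    """Raised when a conditional put loses the compare-and-set race."""


class FakeObjectStore:
    """Hypothetical in-memory stand-in for an object store with conditional writes."""

    def __init__(self):
        self._objects = {}  # key -> (version, value)

    def get(self, key):
        # Missing keys read as version 0 with no value.
        return self._objects.get(key, (0, None))

    def put_if_version(self, key, expected_version, value):
        # Compare-and-set: only write if nobody changed the object since we read it.
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise PreconditionFailed(key)
        self._objects[key] = (current_version + 1, value)
        return current_version + 1


class Writer:
    """Hypothetical single writer that owns a manifest object while it is not fenced."""

    def __init__(self, store, name):
        self.store = store
        self.name = name
        self.version = None
        self.manifest = None

    def take_over(self):
        # Claim ownership by bumping the epoch with a conditional write.
        version, current = self.store.get("manifest")
        epoch = current["epoch"] + 1 if current else 1
        entries = list(current["entries"]) if current else []
        claimed = {"epoch": epoch, "owner": self.name, "entries": entries}
        self.version = self.store.put_if_version("manifest", version, claimed)
        self.manifest = claimed

    def append(self, entry):
        # Conditional on the version this writer last wrote; a stale writer fails here.
        updated = dict(self.manifest, entries=self.manifest["entries"] + [entry])
        self.version = self.store.put_if_version("manifest", self.version, updated)
        self.manifest = updated


store = FakeObjectStore()
old_writer = Writer(store, "writer-a")
old_writer.take_over()
old_writer.append("x")

new_writer = Writer(store, "writer-b")
new_writer.take_over()  # bumps the epoch, which invalidates writer-a's version

try:
    old_writer.append("y")
    old_writer_fenced = False
except PreconditionFailed:
    old_writer_fenced = True  # the old writer learns it was replaced and must stop

new_writer.append("z")
```

In this sketch the old writer is fenced on its first write after the takeover, and only the new writer's entries land after that point. No lock service or consensus group is involved; the object store's conditional write is the only coordination primitive assumed.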
