Many Databases 1 LSM Engine - OpenData

This episode explores why modern databases keep reinventing the same distributed-systems machinery, and argues that a major part of database cost is the operational tax of running replication-heavy systems.

Our guest, Almog Gavra, co-founder of Responsive, explains how his team pivoted from operating Kafka Streams as a service to building SlateDB and the “Open Data” manifesto: an object-storage-native LSM foundation that can power multiple database types (vector, time series, logs, key-value) with shared tuning knobs and failure modes.

They discuss why distributed-systems complexity is often harder than building query engines, and how LSM trees provide a tunable tradeoff among read, write, and space amplification. Other topics include caching layers and cost transparency, separating readers from writers, stateless ingest, single-writer availability and fencing via S3 compare-and-set, offloading compaction, and how the architecture enables near-free snapshots.

They also cover when this approach doesn’t fit: OLTP workloads that can stay on Postgres, and ultra-low-latency workloads where cold object-store misses are unacceptable.

Chapters:
00:00 Introduction
08:36 Open Data Manifesto
18:34 Specialized vs General
25:10 SlateDB Architecture
32:51 LSM Trees as Tuning Dial
38:58 Tuning Without Overload
39:46 Cost Aware Config Knobs
41:51 Latency Cost Durability Tradeoffs
46:46 Caching Strategies And Layers
50:23 Split Readers And Writers
52:43 Single Writer Versus Multi Writer
55:16 Scaling And Partitioning Writes
58:58 Failure Modes And Fencing
01:05:23 Compaction As Separate Worker
01:09:28 Snapshots And Garbage Collection
01:10:25 When Open Data Is Not Fit

Important links and references:
OpenData: http://github.com/opendata-oss/opendata
OpenData manifesto: https://www.opendata.dev/blog/manifesto
Reach out to Almog: / agavra or https://x.com/almoggavra
Dostoevsky paper on LSM: https://nivdayan.github.io/dostoevsky...
Latency/Cost/Durability Triad: https://materializedview.io/p/cloud-s...
SlateDB: https://github.com/slatedb/slatedb
"how SSTs work": https://www.bitsxpages.com/p/sorted-s...