Nathan Bronson: Improving RocksDB Write Scalability

Read the full blog post here - https://www.heavybit.com/library/blog...

Nathan Bronson has been an engineer at Facebook for 5 years, most notably on the TAO cache. He received a PhD from Stanford for work on better programming models for single-machine concurrency. He's not a database expert, but the code he helped write ran a billion database queries while you were reading this bio. He is currently working in Facebook's Boston office on a new interface to the social graph with stronger consistency primitives.

RocksDB's architecture is highly concurrent for reads, but not for writes. When there are concurrent writers, their work is grouped together and applied by a single thread. This makes it easy to batch log writes, keeps the write path simple and reliable, and is sufficient for many workloads. Unfortunately, it also severely limits write scalability.

In this talk, Nathan digs into a series of changes he made to RocksDB to tackle the scalability problem with minimal impact on the core write logic. These changes allow writing threads to join a write group without waiting for the main DB mutex, reduce the cost of waiting for the write group leader, and allow many threads to simultaneously update the memtable's lock-free skip list. The end result is useful (not perfect) write scalability: a 3X improvement in peak insert rate with sync on commit disabled, and a 2X improvement with it enabled.

For more developer-focused content, visit https://www.heavybit.com/library
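
To make the write-group idea from the abstract concrete, here is a minimal Python sketch of the baseline behavior it describes: concurrent writers queue their work, and a single leader thread applies each group. It is not RocksDB's code or API. The names `WriteGroupDB`, `put`, and `run_demo` are hypothetical, the dict and list are stand-ins for the memtable and the log, and having the leader keep draining the queue until it is empty is a simplifying assumption of this sketch.

```python
import threading


class WriteGroupDB:
    """Toy model of grouped writes: one leader applies each batch (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []           # (key, value, done_event) not yet applied
        self._leader_active = False  # True while some thread is acting as leader
        self.table = {}              # stand-in for the memtable
        self.log = []                # stand-in for the log: one entry per group
        self.groups = 0              # number of groups applied so far

    def put(self, key, value):
        done = threading.Event()
        with self._lock:
            self._pending.append((key, value, done))
            is_leader = not self._leader_active
            if is_leader:
                self._leader_active = True

        if not is_leader:
            # Follower: the current leader will apply this write and signal us.
            done.wait()
            return

        # Leader: repeatedly take everything queued so far and apply it.
        while True:
            with self._lock:
                if not self._pending:
                    # Nothing left; step down while still holding the lock so
                    # a newly arriving writer cannot be left without a leader.
                    self._leader_active = False
                    return
                group, self._pending = self._pending, []

            # Only one leader exists at a time, so these updates are serialized.
            self.log.append([(k, v) for k, v, _ in group])  # one batched log write
            for k, v, _ in group:
                self.table[k] = v
            self.groups += 1
            for _, _, finished in group:
                finished.set()


def run_demo(num_threads=8, writes_per_thread=100):
    """Run several writer threads against one WriteGroupDB and return it."""
    db = WriteGroupDB()

    def worker(thread_id):
        for i in range(writes_per_thread):
            db.put((thread_id, i), i)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return db
```

Calling `run_demo()` returns a database holding every write, with `db.groups` showing how many batches the leaders applied. Because every update in this sketch passes through a single leader, it illustrates the serialization that the abstract identifies as the limit on write scalability; the changes described in the talk are not modeled here.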