Serving 100µs reads with 100% availability

This is the story of how we built ctlstore, a distributed, multi-tenant data store that features effectively infinite read scalability, serves queries in 100µs, and can withstand the failure of any component.

Highly reliable systems need highly reliable data sources, and Segment’s stream processing pipeline is no different. Pipeline components need not only the data they process but also additional control data that specifies how that data is to be processed. End users configure settings in a UI or via our API, and those settings in turn manipulate the behavior of the pipeline.

In the initial design of Segment, the stream processing pipeline was tightly coupled to the control plane: stream processors directly queried a set of control plane services to pull in the data that directs their work. While redundancy generally kept these systems online, the result wasn’t the five-nines (99.999% availability) system we are aiming for. A common failure mode was a stampede of traffic against those services, caused by cold caches or by code that didn’t cache at all. It was easy for developers to do the wrong thing, and we wanted to make it easy to do the right thing.

Source: segment
