Reddit’s engineering team and product complexity have grown significantly over the last three years. Facilitating that growth has taken a lot of behind-the-scenes evolution of Reddit’s backend infrastructure. One major component has been adopting a service-oriented architecture, and a significant facet of that has been evolving service-to-service discovery and communication.
As the number of services has grown, so has the complexity of how they interact with each other and with legacy systems. Instead of debugging function and module calls within a monolithic application, engineers now need insight into RPCs among multiple services. In addition to common problems like exception handling and bad input, engineers also have to consider client request behaviors and defend appropriately with retry handling, circuit breaking and granular route control.
Recently, we rolled out Envoy as our service-to-service L4/L7 proxy as part of our efforts to address these new and ever-growing needs for developing and maintaining stable production services. In this post, we’ll provide insight into Reddit’s service communication beginnings, why and how we chose Envoy, and how we approached and managed deployment of the tool given our infrastructure constraints.