In our Open Sourcing Zuul 2 post, we briefly touched on some of the load balancing improvements we’ve recently been making. In this post, we’ll go into more detail on the whys, hows, and results of that work. On the Netflix Cloud Gateway team, we are always working to help systems reduce errors, gain higher availability, and improve Netflix’s resilience to failures.
We do this because, at our scale of over a million requests per second, even a low rate of errors can degrade the experience for our members, so every little bit helps. We set out to take our learnings from Zuul, plus those from other teams, and improve our load balancing implementation to further reduce errors caused by overloaded servers.