Many new gRPC users are surprised to find that Kubernetes’s default load balancing often doesn’t work out of the box with gRPC. For example, here’s what happens when you take a simple gRPC Node.js microservices app and deploy it on Kubernetes: While the voting service displayed here has several pods, it’s clear from Kubernetes’s CPU graphs that only one of the pods is actually doing any work—because only one of the pods is receiving any traffic.
Why? In this blog post, we describe why this happens, and how you can easily fix it by adding gRPC load balancing to any Kubernetes app with Linkerd, a CNCF service mesh and service sidecar. Why does gRPC need special load balancing?
First, let’s understand why we need to do something special for gRPC. However, gRPC also breaks the standard connection-level load balancing, including what’s provided by Kubernetes. This is because gRPC is built on HTTP/2, and HTTP/2 is designed to have a single long-lived TCP connection, across which all requests are multiplexed—meaning multiple requests can be active on the same connection at any point in time.
Normally, this is great, as it reduces the overhead of connection management. However, it also means that (as you might imagine) connection-level balancing isn’t very useful. Once the connection is established, there’s no more balancing to be done.
All requests will get pinned to a single destination pod, as shown below:
Source: kubernetes.io