I’ve given quite a few talks about observability in the age of the service mesh (see my most recent slides; unfortunately, this talk series has not been recorded yet). Visibility into the inherently unstable network is one of the most important things that Envoy provides, and I’m repeatedly asked for the source of the dashboards that we use at Lyft. In the interest of “shipping” and getting something out there that can help folks, we are releasing a snapshot of our internal Envoy dashboards.
What we are releasing is, unfortunately, not going to be readily consumable. It is also not an OSS project and will not be maintained in any way. The goal is to provide a snapshot of what Lyft does internally (what is on each dashboard, which stats we look at, etc.).
Our hope is that having this as a reference will be useful when developing new dashboards for your organization. To provide some context for what is being shared, I will very briefly describe Lyft’s observability and dashboard stack. All Envoys write stats in statsd format.
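For readers unfamiliar with statsd, each stat travels as a plain-text line of the form `<name>:<value>|<type>`. The minimal Python sketch below formats and parses such lines. The stat names (the `envoy.` prefix and the `locations` cluster) are hypothetical illustrations; what Envoy actually emits depends on how its statsd sink is configured.

```python
# Minimal sketch of the plain statsd line protocol: "<name>:<value>|<type>".
# Stat names used here are illustrative only (hypothetical prefix and cluster).

STATSD_TYPES = {"counter": "c", "gauge": "g", "timer": "ms"}


def format_statsd(name: str, value: float, kind: str) -> str:
    """Render one stat as a single statsd line."""
    suffix = STATSD_TYPES[kind]
    # Print integral values without a trailing ".0" to keep lines compact.
    rendered = str(int(value)) if float(value).is_integer() else str(value)
    return f"{name}:{rendered}|{suffix}"


def parse_statsd(line: str) -> tuple[str, float, str]:
    """Split a statsd line back into (name, value, type code)."""
    name, rest = line.split(":", 1)
    value, type_code = rest.split("|", 1)
    # Drop an optional trailing sample-rate field such as "|@0.1".
    type_code = type_code.split("|", 1)[0]
    return name, float(value), type_code


if __name__ == "__main__":
    lines = [
        format_statsd("envoy.cluster.locations.upstream_rq_total", 12, "counter"),
        format_statsd("envoy.cluster.locations.upstream_cx_active", 4, "gauge"),
        format_statsd("envoy.cluster.locations.upstream_rq_time", 37, "timer"),
    ]
    for line in lines:
        print(line, "->", parse_statsd(line))
```

Because the format is this simple, any hop in the pipeline (a local relay, an aggregator) can read and rewrite stats without a schema.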
We run statsrelay on each host. All of our stats are funneled to a pre-aggregation pipeline, which ultimately writes them out to Wavefront.
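Pre-aggregation means collapsing per-host series into fewer, fleet-wide series before they reach the time series database. The sketch below is a hypothetical illustration of that idea, not Lyft’s pipeline. It assumes one flush interval’s worth of (host, statsd line) samples, sums counters, sums each host’s latest gauge reading, and reduces timers to a count, a max, and a nearest-rank p99.

```python
import math
from collections import defaultdict


def parse_line(line):
    """Split "<name>:<value>|<type>[|@rate]" into (name, value, type code)."""
    name, rest = line.split(":", 1)
    value, type_code = rest.split("|", 1)
    return name, float(value), type_code.split("|", 1)[0]


def pre_aggregate(samples):
    """Collapse one flush interval of (host, statsd line) samples across hosts.

    Sample rates are ignored and unknown stat types are dropped, for simplicity.
    """
    counters = defaultdict(float)
    gauges = defaultdict(dict)  # name -> {host: latest value}
    timers = defaultdict(list)  # name -> every observed duration (ms)
    for host, line in samples:
        name, value, kind = parse_line(line)
        if kind == "c":
            counters[name] += value  # counters are per-interval deltas, so they add
        elif kind == "g":
            gauges[name][host] = value  # a later reading overwrites an earlier one
        elif kind == "ms":
            timers[name].append(value)

    out = dict(counters)
    for name, per_host in gauges.items():
        out[name] = sum(per_host.values())  # fleet-wide total of the gauge
    for name, values in timers.items():
        ordered = sorted(values)
        out[name + ".count"] = float(len(ordered))
        out[name + ".max"] = ordered[-1]
        # Nearest-rank percentile: the ceil(p * n)-th smallest value.
        out[name + ".p99"] = ordered[math.ceil(0.99 * len(ordered)) - 1]
    return out


if __name__ == "__main__":
    samples = [
        ("host-a", "envoy.cluster.locations.upstream_rq_total:10|c"),
        ("host-b", "envoy.cluster.locations.upstream_rq_total:5|c"),
        ("host-a", "envoy.cluster.locations.upstream_cx_active:4|g"),
        ("host-b", "envoy.cluster.locations.upstream_cx_active:2|g"),
        ("host-a", "envoy.cluster.locations.upstream_rq_time:20|ms"),
        ("host-b", "envoy.cluster.locations.upstream_rq_time:40|ms"),
    ]
    for name, value in sorted(pre_aggregate(samples).items()):
        print(name, value)
```

Dropping the host dimension this way trades per-host detail for far fewer series in the backend, which is usually the point of pre-aggregating.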
Developers at Lyft look at dashboards in Grafana (we have a Wavefront plugin that pulls time series data). All dashboards at Lyft are created from SaltStack code (the Grafana SaltStack module is an approximation of what we use internally). We pre-generate dashboards for every service and also allow developers to add custom rows for business logic, etc.
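The pattern of pre-generated rows plus developer-supplied custom rows can be sketched in a few lines. Lyft’s real generation lives in SaltStack code; the plain-Python sketch below only illustrates the idea. The JSON layout is loosely modeled on Grafana’s legacy rows/panels model and does not reproduce the Wavefront datasource plugin’s schema, and the `ts()`/`rate()` query strings, the `service` tag, and all stat names are hypothetical.

```python
import json

# Hypothetical rows pre-generated for every service. Query strings use an
# approximate Wavefront-style ts()/rate() syntax; stat names are illustrative.
STANDARD_ROWS = [
    ("Ingress requests", [
        'rate(ts("envoy.http.ingress_http.downstream_rq_total", service="{service}"))',
        'rate(ts("envoy.http.ingress_http.downstream_rq_5xx", service="{service}"))',
    ]),
    ("Egress connections", [
        'ts("envoy.cluster.*.upstream_cx_active", service="{service}")',
    ]),
]


def build_dashboard(service, custom_rows=()):
    """Return a simplified Grafana-like dashboard dict for one service."""
    rows = []
    panel_id = 1
    # Standard rows come first; developer-supplied custom rows are appended.
    for title, queries in list(STANDARD_ROWS) + list(custom_rows):
        panels = []
        for query in queries:
            panels.append({
                "id": panel_id,
                "targets": [{"target": query.format(service=service)}],
            })
            panel_id += 1
        rows.append({"title": title, "panels": panels})
    return {"title": f"{service} (generated)", "rows": rows}


if __name__ == "__main__":
    # A developer-owned custom row for business logic (hypothetical stat name).
    custom = [("Rides requested", ['rate(ts("app.rides.requested", service="{service}"))'])]
    print(json.dumps(build_dashboard("locations", custom), indent=2))
```

Keeping the standard rows in one shared definition means a fix to a query propagates to every service’s dashboard on the next generation run, while custom rows stay owned by each team.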