Operating Apache Kafka Clusters 24/7 Without A Global Ops Team

Earlier this year, the Streaming PubSub team at Lyft got multiple Apache Kafka clusters ready to take on load that required 24/7 support. The team’s operational burden for Kafka quickly started heading toward burnout territory. On-call rotations became miserable because failing hosts kept waking us up at night. Business requirements kept coming, each one requiring us to scale the clusters further. The more we scaled, the more often we were woken up.

Making long-term forecasts at Lyft

At Lyft, as at many other companies, we need to make accurate short- and long-term forecasts. Some of the metrics we need to predict accurately are the number of hours provided by drivers in different regions (i.e., the supply side of our business) and the number of rides taken by riders in different regions (i.e., our demand). We have several internal tools that we use to make forecasts.