How we 30x’d our Node parallelism

What’s the best way to safely increase parallelism in a production Node service? That’s a question my team needed to answer a couple of months ago. We were running 4,000 Node containers (or ‘workers’) for our bank integration service. The service was originally designed such that each worker would process only a single request at […]

Making the LinkedIn experimentation engine 20x faster

At LinkedIn, we like to say that experimentation is in our blood because no production release at the company happens without experimentation; by “experimentation,” we typically mean “A/B testing.” The company relies on employees to make decisions by analyzing data. Experimentation is a data-driven foundation of the decision-making process, which helps with measuring the precise […]

Lyft’s Journey through Mobile Networking

In 5 years, the number of endpoints consumed by Lyft’s mobile apps grew to over 500, and the size of our mobile engineering team increased by more than 15x. To scale with this growth, our infrastructure had to evolve dramatically to utilize new advances in modern networking in order to continue to provide benefits for […]

Database Migration To Amazon Aurora

In this blog post we’ll show you how we migrated a critical Postgres database with 18 TB of data from Amazon RDS (Relational Database Service) to Amazon Aurora, with minimal downtime. To do so, we’ll discuss our experience at Codacy. We chose Amazon’s Aurora database as a solution for a few key reasons including: 1) automatic storage growth […]

Automating Datacenter Operations at Dropbox

Switch provisioning at Dropbox is handled by a Pirlo component called the TOR Starter. The TOR Starter is responsible for validating and configuring switches in our datacenter server racks, PoP server racks, and at the different layers of our datacenter fabric that connect racks in the same facility together. Writing the TOR Starter on top […]

Kubernetes Failure Stories

I started to compile a list of public failure/horror stories related to Kubernetes. It should make it easier for people tasked with operations to find outage reports to learn from. Since we started with Kubernetes at Zalando in 2016, we collected many internal postmortems. Docker bugs (daemon unresponsive, process stuck in pipe wait, ...) were […]

The Many Faces of Envoy Proxy: Edge Gateway, Service Mesh, and Hybrid Networking Bridge

At the inaugural EnvoyCon in Seattle, USA, engineers from Pinterest, Yelp and Groupon presented their current use cases for the Envoy Proxy. The overarching message was that the Envoy Proxy appears to be moving closer to fulfilling its vision of providing the “universal [proxy] data plane API” for modern networking, including edge gateways, service meshes […]