What’s the best way to safely increase parallelism in a production Node service? That’s a question my team needed to answer a couple of months ago.

We were running 4,000 Node containers (or ‘workers’) for our bank integration service. The service was originally designed so that each worker would process only a single request at a time. This design lessened the impact of integrations that accidentally blocked the event loop, and allowed us to ignore the variability in resource usage across different integrations. But since our total capacity was capped at 4,000 concurrent requests, the system did not scale gracefully. Most requests were network-bound, so we could improve both our capacity and our costs if we could just figure out how to increase parallelism safely.

In our research, we couldn’t find a good playbook for going from ‘no parallelism’ to ‘lots of parallelism’ in a Node service. So we put together our own plan, which relied on careful planning, good tooling and observability, and a healthy dose of debugging. In the end, we were able to increase our parallelism 30x, which equated to savings of about $300k annually. This post outlines how we increased the performance and efficiency of our Node workers and describes the lessons we learned in the process.