An in-depth look at 100% Zero Downtime deployments with Terraform


At Checkly, we run our browser checks on AWS EC2 instances managed by Terraform. When shipping a new version, we don’t want to interrupt our service, so we need zero downtime deployments. HashiCorp has its own write-up on zero downtime upgrades, but it only introduces the Terraform configuration, without much of the context, workflow, or other details needed to actually make this work in real life™.

This is the full lowdown on how we do it in production, where we have run ~1.5 million Chrome-based browser checks since launch. For those less familiar with “infrastructure as code” and “immutable infrastructure”, let’s look at the problem a bit closer. You will see that you have to build your app in a specific way and have some specific middleware (i.e. queues) in place to benefit from this approach.

Skip this if you are a grizzled veteran. You can chop this problem into a bunch of parts. Some are Terraform related, some are not, but they all need to be in place before you can pull this off without annoying your users.

Source: checklyhq.com