The premise behind autoscaling in AWS is simple: you can maximize your ability to handle load spikes and minimize costs if you automatically scale your application out based on metrics like CPU or memory utilization. If you need 100 Docker containers to support your load during the day but only 10 when load is lower at night, running 100 containers at all times means that you’re using 900% more capacity than you need every night. With a constant container count, you’re either spending more money than you need to most of the time or your service will likely fall over during a load spike.
At Segment, wereliably deliver hundreds of thousands of events per second tocloud-based destinations but we also routinely handle traffic spikes of up to 300% with no warning and while keeping our infrastructure costs reasonable. There are many possible causes for traffic spikes. A new Segment customer may instrument their high-volume website or app with Segment and turn it on at 3 AM.
A partner API may have a partial outage causing the time to process each event to skyrocket. Alternatively, a customer may experience an extreme traffic spike themselves, thereby passing on that traffic to Segment. Regardless of the cause, the results are similar: a fast increase in message volume higher than what the current running process count can handle.
To handle this variation in load, we use target-trackingAWS Application Autoscaling to automatically scale out (and in) the number of Docker containers and EC2 servers running in anElastic Container Service (ECS) cluster. Application Autoscaling is not a magic wand, however. In our experience, people new to target tracking autoscaling on AWS encounter three common surprises leading to slow scaling and giant AWS billing statements.
Source: segment.com