AWS and other interesting stuff

Auto Scaling

  • Auto Scaling Group: a logical unit of EC2 instances.
  • Scaling Plan: how and when to scale.
    • Dynamic: scale based on conditions
    • Scheduled: scale based on time

Things to consider:

  • How long it takes to launch and configure instances
  • Which metrics are most relevant to performance: is your application CPU or memory bound?
  • How many AZs you want to use
  • The role you want the ASG to fulfil: Expand/contract capacity? Ensure a certain number are running? Both?
  • How do you test new launch configurations?
  • How do you deploy new launch configurations while phasing out the old ones?

  • Auto Scaling Groups cannot span regions

  • Any EIP / EBS volumes must be manually associated / attached e.g. via API

    • Auto Scaling is intended to be for “cattle”, so using a Load Balancer is the best option.


Benefits

  • Fault tolerance (self healing): handles instance and AZ failures
    • Create a low cost, self healing, immutable infrastructure
    • No additional software to install or configure
    • Keep servers running and highly available without user interaction
    • Good for: Important servers you need to stay online, but you only require one of e.g. Bastion/NAT/OpenVPN
  • Availability: helps you have the right number of servers at the right time
    • few servers for low load times and lots of servers for high load times
  • Cost management: only have instances when required


Limits

  • Launch configurations: 100
  • Auto Scaling Groups: 20
  • Lifecycle hooks per Auto Scaling Group: 50
  • Load balancers per Auto Scaling Group: 50, 10 attached
  • Step adjustments per scaling policy: 20

Instance States

The 4 main states for instances are:

  • Pending
  • InService
  • Terminating
  • Terminated

Detached Versus Standby

You can detach an instance or put it into standby.

Detach / Attach

The instance is removed from the Auto Scaling Group and its load balancer.


Use cases:

  • Move an instance out of one Auto Scaling Group and into another.
  • Attach instances running your application to an Auto Scaling Group for testing, then remove them once done.

When you detach instances, you have the option of decrementing the desired capacity for the Auto Scaling group by the number of instances being detached. If you choose not to decrement the capacity, Auto Scaling launches new instances to replace the ones that you detached.

Attaching / Detaching can only be done if:

  • Instance is in the running state.
  • AMI used to launch the instance must still exist.
  • Instance is not a member of another Auto Scaling group.
  • Instance is in the same Availability Zone as the Auto Scaling group.
  • If the Auto Scaling group is associated with a load balancer, the instance and the load balancer must both be in the same VPC.


Instances that are on standby are still part of the Auto Scaling Group, but they do not actively handle application traffic.


Use cases:

  • If you update the launch configuration, existing machines won’t be updated. You can put each instance into standby, update it manually, then exit standby.
  • If there is a problem with one of the instances, you can put it into standby and troubleshoot it.


When entering or exiting standby:

  • On entry, the instance is removed from the associated load balancer.
  • By default, Auto Scaling decrements the desired capacity when you put a machine in standby so that a replacement instance isn’t launched.
  • When a machine exits standby, Auto Scaling increments the desired capacity. If you chose not to decrement on entry, the desired capacity ends up 1 more than it was originally.
  • On exit, the instance is added back to the associated load balancer.


  • Health checks are not performed on standby instances; an instance continues to report the health state it had before entering standby.
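The desired-capacity bookkeeping above can be sketched as a toy model (this is not the AWS API; the class and method names are invented for illustration):

```python
class AsgStandbyModel:
    """Toy model of how standby affects an ASG's desired capacity."""

    def __init__(self, desired):
        self.desired = desired
        self.standby = 0

    def enter_standby(self, decrement=True):
        # By default the desired capacity is decremented so that
        # Auto Scaling doesn't launch a replacement instance.
        self.standby += 1
        if decrement:
            self.desired -= 1

    def exit_standby(self):
        # Exiting standby always increments the desired capacity,
        # so skipping the decrement on entry leaves it 1 higher.
        self.standby -= 1
        self.desired += 1


asg = AsgStandbyModel(desired=4)
asg.enter_standby()                  # default: decrement, desired becomes 3
asg.exit_standby()                   # increment, back to 4
asg.enter_standby(decrement=False)   # no decrement, desired stays 4
asg.exit_standby()                   # increment, desired is now 5
print(asg.desired)                   # → 5, one more than originally
```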

Lifecycle Hooks

You can attach lifecycle hooks to execute code when pending or terminating states are entered.

$ aws autoscaling put-lifecycle-hook --auto-scaling-group-name test-auto-scaling-group \
  --lifecycle-hook-name test-asg-scale-out-hook \
  --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
  --notification-target-arn arn:aws:sns:ap-southeast-2:<REDACTED>:auto-scaling-sns-topic \
  --role-arn arn:aws:iam::<REDACTED>:role/ASG-SNS-Role

The launching transition is called EC2_INSTANCE_LAUNCHING; the terminating transition is called EC2_INSTANCE_TERMINATING.

You can then perform an action e.g.

  • configure server on launch
  • archive logs on terminate

If there is a hook, Auto Scaling will block in the Pending:Wait or Terminating:Wait state until you respond with either ABANDON or CONTINUE, or the wait times out. You need to supply the lifecycle-action-token that the hook’s notification target (SNS above) received:

$ aws autoscaling complete-lifecycle-action \
  --auto-scaling-group-name test-auto-scaling-group \
  --lifecycle-hook-name test-asg-scale-out-hook --lifecycle-action-result CONTINUE \
  --lifecycle-action-token 20f34ccc-c3fc-4ec3-8ed3-914058dc7aae

By default, Auto Scaling will wait 1 hour before changing the state to Pending:Proceed. You can change this with the --heartbeat-timeout option, which takes a timeout in seconds.

You can extend this for up to 48 hours by recording a heartbeat with:

$ aws autoscaling record-lifecycle-action-heartbeat \
  --auto-scaling-group-name test-auto-scaling-group \
  --lifecycle-hook-name test-asg-scale-out-hook \
  --lifecycle-action-token 20f34ccc-c3fc-4ec3-8ed3-914058dc7aae

The most important functions are:

  • PutLifecycleHook – Create or update a lifecycle hook for an Auto Scaling Group. Call this function to create a hook that acts when instances launch or terminate.
  • CompleteLifecycleAction – Signify completion of a lifecycle action for a lifecycle hook. Call this function when your hook has successfully set up or decommissioned an instance.
  • RecordLifecycleActionHeartbeat – Record a heartbeat for a lifecycle action. Call this function to extend the timeout for a lifecycle action.

The notification-target-arn is optional. Instead, you can use CloudWatch Events to trigger a Lambda function.

Events don’t get sent unless a hook is configured:

$ aws autoscaling put-lifecycle-hook --auto-scaling-group-name MyASG \
  --lifecycle-hook-name MyASGHook \
  --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING

The event JSON looks like this:

{
  "version": "0",
  "id": "5f6c4853-5125-4ff6-8017-a8ed6c1ab90b",
  "detail-type": "EC2 Instance-launch Lifecycle Action",
  "source": "aws.autoscaling",
  "account": "<REDACTED>",
  "time": "2017-02-07T18:46:23Z",
  "region": "ap-southeast-2",
  "resources": [
    "arn:aws:autoscaling:ap-southeast-2:<REDACTED>:autoScalingGroup:ed3f9f5c-b319-4d88-bdba-7d911b834d74:autoScalingGroupName/MyASG"
  ],
  "detail": {
    "LifecycleActionToken": "69f5d741-9c24-455a-b04d-478de8f0ff35",
    "AutoScalingGroupName": "MyASG",
    "LifecycleHookName": "MyASGHook",
    "EC2InstanceId": "i-0ef1399616fa1eaf2",
    "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING"
  }
}

The instance lifecycle will be stuck in Pending:Wait until the action is completed:

$ aws autoscaling complete-lifecycle-action \
  --auto-scaling-group-name MyASG \
  --lifecycle-hook-name MyASGHook --lifecycle-action-result CONTINUE \
  --lifecycle-action-token 69f5d741-9c24-455a-b04d-478de8f0ff35
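A Lambda function subscribed to that event would pull the token and names out of the detail field before completing the action. A minimal sketch (the function names are invented; the boto3 call is shown as a comment so the sketch runs without AWS credentials):

```python
import json

def parse_lifecycle_event(event):
    """Extract the fields needed to complete the lifecycle action."""
    detail = event["detail"]
    return {
        "AutoScalingGroupName": detail["AutoScalingGroupName"],
        "LifecycleHookName": detail["LifecycleHookName"],
        "LifecycleActionToken": detail["LifecycleActionToken"],
    }

def handler(event, context=None):
    params = parse_lifecycle_event(event)
    # With boto3 available and IAM permissions in place you would call:
    # boto3.client("autoscaling").complete_lifecycle_action(
    #     LifecycleActionResult="CONTINUE", **params)
    return params

# Trimmed version of the sample event above.
sample = json.loads("""{"detail": {
    "LifecycleActionToken": "69f5d741-9c24-455a-b04d-478de8f0ff35",
    "AutoScalingGroupName": "MyASG",
    "LifecycleHookName": "MyASGHook"}}""")
print(handler(sample)["LifecycleActionToken"])
```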

Scaling Policies


Cooldowns

  • Cooldowns help ensure the Auto Scaling Group doesn’t launch/terminate more instances than needed
  • The default cooldown period is 300 seconds
  • The cooldown period starts when the instance reaches InService state. Before that time, Auto Scaling actions are blocked anyway as a scaling action is in progress.
  • Cooldowns are not supported for step scaling policies
  • You set a default cooldown, and you can then set per-scaling-policy cooldowns.

You’d typically have a shorter cooldown with a scale-in policy as Auto Scaling needs less time to determine whether to terminate instances or not.

If an instance becomes unhealthy, Auto Scaling does not wait for the cooldown period to complete before replacing the unhealthy instance.
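The cooldown behaviour can be sketched as a simple gate (a toy model; real Auto Scaling tracks this internally, and the function name is invented):

```python
def scaling_allowed(now, last_scaling_activity, cooldown=300, unhealthy=False):
    """Return True if a new scaling activity may start.

    Replacing an unhealthy instance ignores the cooldown entirely.
    """
    if unhealthy:
        return True
    # The cooldown clock starts when the previous activity completes
    # (i.e. the instance reaches InService).
    return now >= last_scaling_activity + cooldown

# A scaling activity completed at t=1000 seconds.
print(scaling_allowed(1200, 1000))                  # → False, still cooling down
print(scaling_allowed(1300, 1000))                  # → True, 300 s elapsed
print(scaling_allowed(1200, 1000, unhealthy=True))  # → True, health replacement
```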

Step Adjustments

Suppose that you have an alarm with a breach threshold of 50 and a scaling adjustment type of PercentChangeInCapacity. You also have scale-out and scale-in policies with the following step adjustments:

Scale out:

  • Lower bound 0, upper bound 10: adjustment 0%
  • Lower bound 10, upper bound 20: adjustment +10%
  • Lower bound 20, no upper bound: adjustment +30%

Scale in:

  • Upper bound 0, lower bound -10: adjustment 0%
  • Upper bound -10, lower bound -20: adjustment -10%
  • Upper bound -20, no lower bound: adjustment -30%


  • The lower and upper bounds are offsets from the alarm breach threshold, in this case 50
    • i.e. lower bound 0, upper bound 10 means between 50 (inclusive) and 60 (exclusive)
    • i.e. lower bound -10, upper bound 0 means between 40 (exclusive) and 50 (inclusive)
    • The net effect of the above is no adjustment between 40 and 60
  • Bounds are relative so that you can change the alarm breach value and the step adjustment thresholds will move with it, e.g. change the breach threshold to 60% and the first scale-out step becomes >=60 and <70.
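Resolving which step fires for a given metric value can be sketched like this (a minimal sketch; match_step is an invented helper, not an AWS API, and the step layout assumes the scale-out steps from this example: offsets 0–10 → 0%, 10–20 → +10%, ≥20 → +30%):

```python
def match_step(metric, threshold, steps):
    """Return the adjustment of the step whose bounds contain the metric.

    Bounds are offsets from the alarm threshold; None means unbounded.
    Lower bound is inclusive, upper bound exclusive (scale-out convention).
    """
    diff = metric - threshold
    for lower, upper, adjustment in steps:
        low_ok = lower is None or diff >= lower
        high_ok = upper is None or diff < upper
        if low_ok and high_ok:
            return adjustment
    return None  # metric is outside every step (e.g. below the threshold)

scale_out = [(0, 10, 0), (10, 20, 10), (20, None, 30)]  # % adjustments
print(match_step(55, 50, scale_out))  # → 0, between 50 and 60
print(match_step(60, 50, scale_out))  # → 10, between 60 and 70
print(match_step(75, 50, scale_out))  # → 30, at or above 70
```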

Your group has both a current capacity and a desired capacity of 10 instances. The group maintains its current and desired capacity while the aggregated metric value is greater than 40 and less than 60.

If the metric value gets to 60, Auto Scaling increases the desired capacity of the group by 1 instance, to 11 instances, based on the second step adjustment of the scale-out policy (add 10 percent of 10 instances). After the new instance is running and its specified warm-up time has expired, Auto Scaling increases the current capacity of the group to 11 instances. If the metric value rises to 70 even after this increase in capacity, Auto Scaling increases the desired capacity of the group by another 3 instances, to 14 instances, based on the third step adjustment of the scale-out policy (add 30 percent of 11 instances, 3.3 instances, rounded down to 3 instances).

If the metric value gets to 40, Auto Scaling decreases the desired capacity of the group by 1 instance, to 13 instances, based on the second step adjustment of the scale-in policy (remove 10 percent of 14 instances, 1.4 instances, rounded down to 1 instance). If the metric value falls to 30 even after this decrease in capacity, Auto Scaling decreases the desired capacity of the group by another 3 instances, to 10 instances, based on the third step adjustment of the scale-in policy (remove 30 percent of 13 instances, 3.9 instances, rounded down to 3 instances).
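The PercentChangeInCapacity arithmetic in the walk-through follows Auto Scaling's rounding rule: values between 0 and 1 round to 1 (and between 0 and -1 to -1), larger values round toward zero. A minimal sketch (the function name is invented):

```python
import math

def percent_adjustment(capacity, percent):
    """Number of instances to add (positive) or remove (negative)."""
    raw = capacity * percent / 100.0
    if 0 < raw < 1:
        return 1    # fractional scale-out rounds up to 1
    if -1 < raw < 0:
        return -1   # fractional scale-in rounds to -1
    return math.trunc(raw)  # otherwise round toward zero

print(percent_adjustment(10, 10))   # → 1, add 10% of 10
print(percent_adjustment(11, 30))   # → 3, 3.3 rounds down
print(percent_adjustment(14, -10))  # → -1, -1.4 rounds toward zero
print(percent_adjustment(13, -30))  # → -3, -3.9 rounds toward zero
```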

Instance Warmup

With step scaling policies, you can specify the number of seconds that it takes for a newly launched instance to warm up. Until its specified warm-up time has expired, an instance is not counted toward the aggregated metrics of the Auto Scaling group.

While scaling out, Auto Scaling does not consider instances that are warming up as part of the current capacity of the group. i.e. the percentage is calculated against the current capacity, so it excludes any instance warming up. Therefore, multiple alarm breaches that fall in the range of the same step adjustment result in a single scaling activity. This ensures that we don’t add more instances than you need. Using the example in the previous section, suppose that the metric gets to 60, and then it gets to 62 while the new instance is still warming up. The current capacity is still 10 instances, so Auto Scaling should add 1 instance (10 percent of 10 instances), but the desired capacity of the group is already 11 instances, so Auto Scaling does not increase the desired capacity further. However, if the metric gets to 70 while the new instance is still warming up, Auto Scaling should add 3 instances (30 percent of 10 instances), but the desired capacity of the group is already 11, so Auto Scaling adds only 2 instances, for a new desired capacity of 13 instances.
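The warm-up arithmetic can be sketched as: compute the step adjustment against the current capacity (excluding warming instances), then launch only the difference between that target and the already-raised desired capacity. A toy model (the function name is invented):

```python
import math

def instances_to_add(current, desired, percent):
    """How many instances a scale-out step actually launches.

    `current` excludes instances still warming up; `desired` may
    already have been raised by an earlier alarm breach.
    """
    raw = current * percent / 100.0
    step = 1 if 0 < raw < 1 else math.trunc(raw)
    target = current + step
    return max(0, target - desired)

# Metric hits 62 while one instance warms up: target 11, desired already 11.
print(instances_to_add(current=10, desired=11, percent=10))  # → 0
# Metric hits 70: 30% of 10 is 3, target 13, desired 11, so add only 2.
print(instances_to_add(current=10, desired=11, percent=30))  # → 2
```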

While scaling in, Auto Scaling considers instances that are terminating as part of the current capacity of the group. Therefore, we won’t remove more instances from the Auto Scaling group than necessary.

Note that a scale in activity can’t start while a scale out activity is in progress.

Spot Instances

  • You can use lifecycle hooks with Spot Instances, but it does not prevent an instance from terminating due to a change in the Spot Price
  • When a Spot Instance terminates, you must still complete the lifecycle action

Launch Configuration

A template for an ASG’s EC2 instances that specifies the AMI, key pair, instance type, security groups and block device mappings.

  • An ASG must have one Launch Configuration.
  • A LC can be used by multiple ASGs.
  • You can’t modify a LC after you’ve created it.
  • When you apply a new LC, new instances will use it while existing instances keep the old one.
  • You can create LC (or ASG) from the settings of a running EC2 instance.
    • Both create-auto-scaling-group and create-launch-configuration have a --instance-id property
    • Some properties may not be supported
    • It uses the block device mapping from the AMI you used, ignoring any other devices
    • You can override values
      • AMI, Block devices, Key Pair, Instance Profile, Instance type, Kernel, Monitoring, Placement Tenancy, Ramdisk, Security Groups, Spot Price, User Data, Assign Public IP, EBS optimized
  • You can use Spot Instances
    • Cheaper to run
    • May be killed at any time
    • You can’t use the same Launch Configuration to launch on-demand and spot instances
    • You set your bid price in your Launch Configuration; if you want to change your bid you need a new LC.
    • If your instance is terminated, Auto Scaling will attempt to launch a replacement to maintain desired capacity
    • If your bid price is higher than the market price, it will launch a new instance; otherwise it will keep checking the price.

Auto Scaling Processes

Auto Scaling has the following processes:

  • Launch
    • Suspending this process disrupts other processes (e.g. returning an instance from standby), as the group is not able to scale while it is suspended
  • Terminate
  • HealthCheck
  • ReplaceUnhealthy
    • Uses Terminate and Launch processes
  • AZRebalance
  • AlarmNotification
  • ScheduledActions
  • AddToLoadBalancer
    • Can be useful for testing new instances before sending traffic to them.
    • You need to manually add the instances to the load balancer once you resume the process.

You have the option to suspend one or more of these processes, e.g. disable ReplaceUnhealthy while you’re troubleshooting an instance.

Health Checks

By default Auto Scaling uses EC2 status checks. You can optionally enable ELB health checks too, in which case an instance is also marked unhealthy if the ELB reports it as OutOfService.


Manual Scaling

  • Change desired capacity
    • or
  • Attach instances

Scheduled Scaling

  • Time periods cannot overlap, otherwise you’ll get an error
  • Auto Scaling guarantees the order of execution within the same group, but not across different groups
  • The scheduled action defines a start time, and new min, max and desired capacity settings

Dynamic Scaling

  • A combination of alarms and polices define when to scale

Multiple Policies

  • An Auto Scaling group can have more than one scaling policy attached to it at any given time.
  • Each Auto Scaling group would have at least two policies: one to scale the architecture out and another to scale the architecture in.
  • If an Auto Scaling group has multiple policies, there is always a chance that more than one will instruct it to scale out or scale in at the same time.
  • When this situation occurs, Auto Scaling chooses the policy that has the greatest impact on the group, e.g. if two policies are triggered at the same time and Policy 1 instructs a scale-out of 1 instance while Policy 2 instructs a scale-out of 2 instances, Auto Scaling will use Policy 2 and scale out by 2 instances, as it has the greater impact

Termination Policy

The termination policy defines how instances should be terminated.


The default termination process proceeds through the steps below until it finds an instance to terminate.

  • Select AZ
    • Selects AZ with most instances, with at least one instance not protected by scale in
    • Selects AZ with instances that use the oldest launch configuration.
  • Select instance
    • Terminates the unprotected instance with the oldest launch configuration
    • Terminates the unprotected instance closest to the next billing hour
    • Terminates a random unprotected instance


A custom termination policy still terminates in the AZ with the most instances first. If the AZs are balanced, Auto Scaling applies the termination policy you specified.

  • OldestInstance – terminates the oldest instance in the group and can be useful to upgrade to new instance types
  • NewestInstance – terminates the newest instance in the group and can be useful when testing a new launch configuration
  • OldestLaunchConfiguration – terminates instances that have the oldest launch configuration
  • ClosestToNextInstanceHour – terminates instances that are closest to the next billing hour and helps to maximize the use of your instances and manage costs.
  • Default – terminates as per the default termination policy
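The named policies boil down to sorting the unprotected candidates differently. A sketch of OldestInstance / NewestInstance selection (a toy model; the Instance tuple and function name are invented for illustration):

```python
from collections import namedtuple

Instance = namedtuple("Instance", "id launch_time protected")

def pick_for_termination(instances, policy="OldestInstance"):
    """Choose an unprotected instance according to the named policy."""
    candidates = [i for i in instances if not i.protected]
    if not candidates:
        return None  # every instance is protected from scale-in
    if policy == "OldestInstance":
        return min(candidates, key=lambda i: i.launch_time)
    if policy == "NewestInstance":
        return max(candidates, key=lambda i: i.launch_time)
    raise ValueError("unknown policy: " + policy)

fleet = [
    Instance("i-aaa", 100, protected=False),
    Instance("i-bbb", 200, protected=False),
    Instance("i-ccc", 50, protected=True),   # protected: never chosen
]
print(pick_for_termination(fleet, "OldestInstance").id)  # → i-aaa
print(pick_for_termination(fleet, "NewestInstance").id)  # → i-bbb
```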

Instance Protection

  • Instance protection can be set on an Auto Scaling Group or an instance at any time.
    • Instance protection is inherited by instances from their Auto Scaling Group.
  • If all instances are protected during a scale-in event, the Auto Scaling Group will decrement the desired capacity, but no instances are terminated.
  • Instance protection does not protect against:
    • Manual termination via the console, cli or API
    • Termination due to health
    • Spot instance termination due to the bid being too low

Monitoring with CloudWatch

Detailed monitoring can be configured in the launch configuration. The following Auto Scaling group metrics are available:

  • GroupMinSize - The minimum size of the Auto Scaling group.
  • GroupMaxSize - The maximum size of the Auto Scaling group.
  • GroupDesiredCapacity - The number of instances that the Auto Scaling group attempts to maintain.
  • GroupInServiceInstances - The number of instances that are running as part of the Auto Scaling group. This metric does not include instances that are pending or terminating.
  • GroupPendingInstances - The number of instances that are pending. A pending instance is not yet in service. This metric does not include instances that are in service or terminating.
  • GroupStandbyInstances - The number of instances that are in a Standby state. Instances in this state are still running but are not actively in service.
  • GroupTerminatingInstances - The number of instances that are in the process of terminating. This metric does not include instances that are in service or pending.
  • GroupTotalInstances - The total number of instances in the Auto Scaling group. This metric identifies the number of instances that are in service, pending, and terminating.


Combining On-Demand and Spot Instances

There are a few options for combining on-demand and spot instances with auto scaling:

Simple Setup With Two Auto Scaling Groups

We’ve architected our event worker fleet to be made up of both spot and on-demand instances, so we can take advantage of the low cost of spot instances without the drawbacks.

We’ve biased our Auto Scaling rules towards spot instances, when they’re available, for their lower cost. They form our first Auto Scaling group, which we scale up aggressively and scale down slowly.

Our second Auto Scaling group is made up of on demand instances which we scale up conservatively and scale down aggressively because of their higher cost. When possible we prefer that spot instances pick up the slack after we scale down on demand instances.

The trick is knowing how to balance between the two groups to ensure that we have the necessary reliability and performance, at the lowest price. I came up with the following rules to accomplish this goal:

  • I set the maximum bid for the Spot Instances to be equal to the On-Demand Instance price – so that we will never pay more for a Spot Instance than for a more reliable On-Demand instance.
  • I set the threshold for scaling up the Spot Instance group below the threshold for scaling up the On-Demand Instance group – so that the system will always first add capacity using the lower-cost Spot Instances. For example, I set the Spot Instance group to automatically scale up when CPU utilization exceeds 65%, whereas the On-Demand Instance group will only scale up when CPU utilization exceeds 75%.
  • I set the threshold for terminating instances in the opposite manner, such that On-Demand Instances will be terminated before the cheaper Spot Instances.
  • I create an auto-scaling policy for the On-Demand Instance group based on the overall response latency reported by the Elastic Load Balancer. This ensures that our first priority is achieved, namely that the quality of service always remains above a pre-determined threshold.

Autospotting - Scripted replacement of on demand instances with spot instances.

This has the added advantage of using multiple instance types, as long as they’re as powerful as or more powerful than the on-demand instance. This makes getting the best price more likely, and the chance of losing multiple spot instances in quick succession less likely.