AWS and other interesting stuff

CloudFormation

This article records what I’ve learned about CloudFormation’s more advanced features while preparing for the DevOps exam.

CloudFormation

CloudFormation gives us infrastructure as code, which has these advantages:

  • We can version control our infrastructure
  • It encourages collaboration
  • It helps with automating our infrastructure
    • Repeatable, reliable and consistent environments

Custom Resources

A Custom Resource is a resource type within CloudFormation that is backed by an SNS topic or a Lambda function.

They exist to get around the limitations of the standard CloudFormation resources:

  • Not all features of all AWS services are supported e.g. API Gateway
  • CloudFormation can’t operate on non-AWS resources
  • Although you have some logic operators, they are limited
  • Its ability to interact with external services as part of stack operations is limited

When a stack is created, updated or deleted, an event is passed to the SNS topic or Lambda function. The event includes the other properties you specify in the resource.

A PhysicalResourceId is set by the Custom Resource when it is created (the Create request from CloudFormation does not include one). If the Custom Resource needs to be replaced, the handler needs to send back a new PhysicalResourceId. If the PhysicalResourceId changes, CloudFormation treats this as a replacement and sends a Delete request for the old resource.
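To make the PhysicalResourceId rules concrete, here is a minimal Python sketch of how a handler might choose the ID it returns. The replacement rule (a changed UserName forces a new user) and the ID format are assumptions for illustration, not something CloudFormation prescribes.

```python
import uuid


def new_physical_id(props):
    # Hypothetical ID format: "<UserName>-<random suffix>".
    return f"{props.get('UserName', 'resource')}-{uuid.uuid4().hex[:8]}"


def physical_resource_id(event):
    """Choose the PhysicalResourceId to return to CloudFormation.

    Assumption for illustration: a changed UserName means the managed
    user has to be replaced; any other change is applied in place.
    """
    request_type = event["RequestType"]
    props = event.get("ResourceProperties", {})

    if request_type == "Create":
        # Create requests don't include a PhysicalResourceId, so mint one.
        return new_physical_id(props)

    current_id = event["PhysicalResourceId"]
    if request_type == "Update":
        old_props = event.get("OldResourceProperties", {})
        if props.get("UserName") != old_props.get("UserName"):
            # A different ID signals a replacement: CloudFormation will later
            # send a Delete request for current_id.
            return new_physical_id(props)

    # In-place updates and Deletes echo the existing ID back.
    return current_id
```

Returning the same ID on an in-place update is what stops CloudFormation from issuing a Delete for the old resource.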

An Example

Using my short-lived credentials and MFA example, here is some functionality that a Custom Resource could provide:

  • when a user’s group or access key settings change, generate their ~/.aws/credentials file and store it encrypted on S3
  • generate a signed URL for the credentials and email the link to the user
  • delete the credentials file after 15 minutes

The users’ credentials files would look something like this:

[h4-stan]
aws_access_key_id = AAAAAAAAAAAAAAAAAAAA
aws_secret_access_key = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

[h4-sales]
role_arn = arn:aws:iam::12345678910:role/SalesRole
source_profile = h4-stan
mfa_serial = arn:aws:iam::12345678910:mfa/stan

[h4-marketing]
role_arn = arn:aws:iam::12345678910:role/MarketingRole
source_profile = h4-stan
mfa_serial = arn:aws:iam::12345678910:mfa/stan

This is probably over-engineering things, but it is an interesting idea so I’ll investigate if it is possible rather than implementing it fully.

To manage user changes, we need to track a few things:

  • the groups they belong to
  • access key changes

As you can’t reference another resource’s properties in the same template, the best way to do this is to replace AWS::IAM::User resources with Custom::<CustomName> and have a Lambda function take care of things that CloudFormation would normally do e.g. changes to group membership.

Rather than tracking the specifics of a user’s access keys, the Custom Resource could have an AccessKeyVersion property that you only ever increment e.g. 1.0, 1.1 etc. When the Lambda function is triggered it could check whether the version number has increased and, if it has, delete the old access keys and generate a new set. Otherwise, if only the groups have changed, the generated file could have a placeholder for the access keys saying <replace-this-with-your-current-access-key>.
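A minimal Python sketch of generating such a file with the standard library’s configparser. The function name, the profile prefix and the role mapping are hypothetical inputs; the output mirrors the credentials file shown above, with the placeholder used when no new key was generated.

```python
import configparser
import io

PLACEHOLDER = "<replace-this-with-your-current-access-key>"


def build_credentials(prefix, user, account_id, roles, access_key=None):
    """Render a ~/.aws/credentials-style file for one user.

    roles maps a profile suffix (e.g. "sales") to an IAM role name.
    access_key is a (key_id, secret) tuple, or None when only the
    groups changed and the user keeps their current key.
    """
    config = configparser.ConfigParser(interpolation=None)
    base_profile = f"{prefix}-{user}"
    key_id, secret = access_key if access_key else (PLACEHOLDER, PLACEHOLDER)
    config[base_profile] = {
        "aws_access_key_id": key_id,
        "aws_secret_access_key": secret,
    }

    mfa_serial = f"arn:aws:iam::{account_id}:mfa/{user}"
    for suffix, role_name in roles.items():
        # Each role profile assumes the role via the base profile and the MFA device.
        config[f"{prefix}-{suffix}"] = {
            "role_arn": f"arn:aws:iam::{account_id}:role/{role_name}",
            "source_profile": base_profile,
            "mfa_serial": mfa_serial,
        }

    out = io.StringIO()
    config.write(out)
    return out.getvalue()
```

For example, build_credentials("h4", "stan", "12345678910", {"sales": "SalesRole", "marketing": "MarketingRole"}) produces the three profiles shown earlier, with placeholders instead of real keys.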

Ideally, this could all be done without tracking state ourselves.

Testing

Initial user configuration:

{
   "AWSTemplateFormatVersion": "2010-09-09",
   "Resources": {
      "SamUser" : {
         "Type": "Custom::ManagedUser",
         "Properties": {
            "ServiceToken": "arn:aws:lambda:ap-southeast-2:<REDACTED>MUFunction",
            "Groups": [ { "Fn::ImportValue": "MFAGroupsStack-SalesGroup" } ],
            "UserName": "sam",
            "Email": "sam@h4.nz",
            "AccessKeyVersion": "1.0"
         }
      },
      "StanUser" : {
         "Type": "Custom::ManagedUser",
         "Properties": {
            "ServiceToken": "arn:aws:lambda:ap-southeast-2:<REDACTED>MUFunction",
            "Groups": [
               { "Fn::ImportValue": "MFAGroupsStack-BossGroup" },
               { "Fn::ImportValue": "MFAGroupsStack-MarketingGroup" },
               { "Fn::ImportValue": "MFAGroupsStack-SalesGroup" }
            ],
            "UserName": "stan",
            "Email": "stan@h4.nz",
            "AccessKeyVersion": "1.0"
         }
      }
   }
}

Using a minimal Lambda function as the Custom Resource backend, I can log what CloudFormation sends the function.
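The original function isn’t reproduced here; a comparable minimal handler might look like this Python sketch. It logs the event and sends SUCCESS back to the pre-signed ResponseURL. Falling back to the logical ID as the PhysicalResourceId on Create is a simplification (AWS’s cfn-response helper uses the log stream name instead), and the function names are hypothetical.

```python
import json
import urllib.request


def build_response(event, status="SUCCESS", reason="", physical_id=None, data=None):
    """Build the JSON body CloudFormation expects at the event's ResponseURL."""
    return {
        "Status": status,
        "Reason": reason or "See CloudWatch Logs for details",
        # Reuse the existing ID on Update/Delete; fall back to the logical ID on Create.
        "PhysicalResourceId": physical_id
        or event.get("PhysicalResourceId")
        or event["LogicalResourceId"],
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(event, body):
    """PUT the response to the pre-signed S3 URL CloudFormation supplied."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=payload,
        method="PUT",
        # Empty Content-Type, mirroring AWS's cfn-response helper module.
        headers={"Content-Type": "", "Content-Length": str(len(payload))},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def handler(event, context):
    # Log exactly what CloudFormation sent; Lambda forwards this to CloudWatch Logs.
    print(json.dumps(event))
    send_response(event, build_response(event))
```

If the handler never responds, the stack sits in an IN_PROGRESS state until the custom resource times out, so real handlers usually wrap their work in try/except and send FAILED on errors.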

This is what is sent for the sam user:

{
    "StackId": "arn:aws:cloudformation:ap-southeast-2:<REDACTED>...",
    "ResponseURL": "https://cloudformation-custom-resource-response-apsoutheast2.s3-ap-southeast-2.amazonaws.com/<REDACTED>...",
    "ResourceProperties": {
        "UserName": "sam",
        "AccessKeyVersion": "1.0",
        "ServiceToken": "arn:aws:lambda:ap-southeast-2:<REDACTED>MUFunction",
        "Email": "sam@h4.nz",
        "Groups": [
            "SalesGroup"
        ]
    },
    "RequestType": "Create",
    "ServiceToken": "arn:aws:lambda:ap-southeast-2:<REDACTED>MUFunction",
    "ResourceType": "Custom::ManagedUser",
    "RequestId": "6a6b1a45-77b3-42d9-b5c2-2104d1086688",
    "LogicalResourceId": "SamUser"
}

This is what is sent for the stan user:

{
   ...snip...
      "AccessKeyVersion": "1.0",
      "Groups": [
          "BossGroup",
          "MarketingGroup",
          "SalesGroup"
      ],
   ...snip...
   "RequestType": "Create"
}

If I remove stan from the BossGroup, the function gets this:

{
   ...snip...
   "ResourceProperties": {
      ...snip...
      "AccessKeyVersion": "1.0",
      "Groups": [
        "MarketingGroup",
        "SalesGroup"
      ]
   },
   "RequestType": "Update",
   "OldResourceProperties": {
      ...snip...
      "AccessKeyVersion": "1.0",
      "Groups": [
        "BossGroup",
        "MarketingGroup",
        "SalesGroup"
      ]
   }
   ...snip...
}

i.e. we get before and after snapshots of the properties, meaning CloudFormation stores the state for us :-)

We now have all the information we need:

  • the user that changed
    • we can look up their MFA device ARN
  • what changed
    • groups can be revoked or applied
    • a change in AccessKeyVersion would indicate we should delete the existing access key and generate a new one
  • the user’s email address
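A sketch of turning the two property snapshots into a change plan, assuming the ManagedUser properties shown above and treating AccessKeyVersion as dot-separated integers. plan_user_changes is a hypothetical helper; it only works out what to do and makes no IAM calls.

```python
def plan_user_changes(event):
    """Work out what a ManagedUser request needs to do from the two snapshots."""
    new = event["ResourceProperties"]
    # Create requests have no OldResourceProperties, so everything counts as new.
    old = event.get("OldResourceProperties", {})

    new_groups = set(new.get("Groups", []))
    old_groups = set(old.get("Groups", []))

    def version(props):
        # "1.0" -> (1, 0), so "1.10" compares as newer than "1.9".
        return tuple(int(part) for part in props.get("AccessKeyVersion", "0").split("."))

    return {
        "user": new["UserName"],
        "email": new.get("Email"),
        "add_to_groups": sorted(new_groups - old_groups),
        "remove_from_groups": sorted(old_groups - new_groups),
        "rotate_access_key": version(new) > version(old),
    }
```

Applied to the update above, this yields remove_from_groups == ["BossGroup"] and no key rotation, since AccessKeyVersion stayed at 1.0.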

We can then generate a credentials file, store it encrypted on S3, generate a signed URL and email it to the user.

Conclusion

It is definitely possible to generate and automatically distribute credentials in a secure way when user configuration is changed.

In a simplified version of this, you could just generate and send the role configuration (the role ARNs and the user’s MFA device), as it is less sensitive than the access keys; the user could be emailed directly rather than using encrypted S3 and signed URLs. It would then be up to the user to generate their own access keys.

Creation Policies and Wait Conditions / Handles

The DependsOn attribute allows you to specify that a resource should be created after (or deleted before) another one, but that is not useful when the readiness of a resource is more complex than its state e.g. a NAT firewall that has a number of configuration steps may not be ready for production even though its EC2 status check is OK. For these resources, we need some way of signaling when they’re actually ready. That’s where Creation Policies and Wait Conditions come in.

Creation Policies and Wait Conditions/Handles do the same task: they influence when a resource is marked as complete, delaying the provisioning of dependent resources until it’s actually ready.

Creation Policies are like a simple version of Wait Conditions/Handles. They are the AWS-recommended way of working with EC2 instances and Auto Scaling Groups, and currently they can only be used with those resource types and with AWS::CloudFormation::WaitCondition (see below).

Wait Conditions / Handles can be used to handle much more complex scenarios. A WaitConditionHandle is a resource with no properties, but it generates a pre-signed URL that can be used to communicate SUCCESS or FAILURE (and optionally some data). The URL can be called by cfn-signal, curl etc.

Creation Policies

Creation Policies prevent a resource from going to CREATE_COMPLETE state until it has had the required number of signals.

Testing

Setup: (https://gist.github.com/SteveHoggNZ/4cc9e5d60fe546ffbd73870379fb5f64)

Using a minimal EC2 instance template where:

  • The instance has a CreationPolicy that waits for 1 signal and has a 10 minute timeout
{
   ...snip...
   "MyEC2": {
     "Type" : "AWS::EC2::Instance",
     "Properties" : {
        ...snip...
     },
     "CreationPolicy": {
       "ResourceSignal": {
         "Count": 1,
         "Timeout": "PT10M"
       }
     }
   },
   ...snip...
}
  • There is a MyTestSecurityGroup that DependsOn the EC2 instance

Expected Result:

  • The EC2 instance will stay in the CREATE_IN_PROGRESS state even though its status checks are OK
  • MyTestSecurityGroup will not exist as it DependsOn the EC2 instance

Result:

ssh -i ~/the-key.pem ec2-user@13.54.116.167 /opt/aws/bin/cfn-signal --success true --stack EC2CreationPolicy --resource MyEC2 --region ap-southeast-2

Note: -e $? is another way of communicating success/failure i.e. send the exit code of the last command

2016-11-30 21:53:16,754 [DEBUG] CloudFormation client initialized with endpoint https://cloudformation.ap-southeast-2.amazonaws.com
2016-11-30 21:53:16,755 [DEBUG] Signaling resource MyEC2 in stack EC2CreationPolicy with unique ID i-0bdeb5de74f720fc8 and status SUCCESS

As expected, CloudFormation stayed in the CREATE_IN_PROGRESS state, and MyTestSecurityGroup was not created until I sent the signal.

A CloudFormation event is recorded as follows:

Received SUCCESS signal with UniqueId i-0bdeb5de74f720fc8

Testing 2

Using the auto-scaling group example from the documentation.

Setup: (https://gist.github.com/SteveHoggNZ/9a58f2dba8e2205f84be33a706b052a9)

{
...snip...
   "CreationPolicy": {
      "ResourceSignal": {
         "Count": "3",
         "Timeout": "PT15M"
      }
   },
   "UpdatePolicy" : {
      "AutoScalingScheduledAction" : {
         "IgnoreUnmodifiedGroupSizeProperties" : "true"
      },
      "AutoScalingRollingUpdate" : {
         "MinSuccessfulInstancesPercent": "100",
         "MinInstancesInService" : "1",
         "MaxBatchSize" : "2",
         "PauseTime" : "PT1M",
         "WaitOnResourceSignals" : "true"
      }
   }
...snip...
}

The CreationPolicy can also have an AutoScalingCreationPolicy property containing a MinSuccessfulInstancesPercent property.

Things of note in the configuration above:

  • Unlike an EC2 instance, an AutoScalingGroup can have a Count > 1
  • AutoScalingScheduledAction.IgnoreUnmodifiedGroupSizeProperties - CloudFormation will only change the MinSize, MaxSize, or DesiredCapacity values if they have been updated in the template, otherwise they’ll remain at the current running values.
  • AutoScalingRollingUpdate.MinSuccessfulInstancesPercent - The rolling update will only succeed if this percentage of servers signal success. The default is 100%.
  • AutoScalingRollingUpdate.WaitOnResourceSignals - During updates, CloudFormation waits for the new instances in each batch to signal success before moving on.
  • The Timeout value is in ISO8601 format e.g. PT1H30M10S
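To sanity-check a timeout value, here is a small Python sketch that converts the time-only durations used above (PT10M, PT15M, PT1H30M10S) to seconds. It deliberately handles only the H, M and S components, not the full ISO 8601 duration grammar, and duration_seconds is a hypothetical helper.

```python
import re

# Time-only durations as used for CloudFormation timeouts, e.g. PT1H30M10S.
_DURATION = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?")


def duration_seconds(value):
    """Convert a PT<h>H<m>M<s>S duration string to seconds."""
    match = _DURATION.fullmatch(value)
    # Reject strings that don't match, and a bare "PT" with no components.
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds
```

For example, duration_seconds("PT1H30M10S") returns 5410 and duration_seconds("PT15M") returns 900.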

Test 1

Create a new stack with DesiredCapacity = 1 and CreationPolicy.ResourceSignal.Count = 2

Expected Result: Stack creation will stay in creating state as the DesiredCapacity < ResourceSignal.Count

Result: As expected. Manually increasing the DesiredCapacity to 2 launches a second instance, which sends the second signal and completes the stack creation.

Test 2

Change LaunchConfiguration Logical ID from LaunchConfig to LaunchConfigNew and update the stack

Expected Result:

  • A new LaunchConfiguration is created and applied to the AutoScalingGroup
  • A new instance is created
  • An old instance is deleted
  • The last 2 steps are repeated until all old instances are replaced

Result: As expected

Misc Findings

You can force a replacement of all instances in an Auto Scaling Group by changing the Logical ID of the Launch Configuration in the template and updating the stack.

You can replace the entire AutoScalingGroup on update using the WillReplace setting.

{
   ...snip...
   "UpdatePolicy" : {
      "AutoScalingReplacingUpdate" : {
         "WillReplace" : "true"
      }
   },
   "CreationPolicy" : {
      ...snip...
   }
   ...snip...
}

This will roll back to the old AutoScalingGroup if the update fails e.g. the instances don’t signal as required by the CreationPolicy. An AutoScalingReplacingUpdate takes priority over an AutoScalingRollingUpdate if both are set. Changes to DesiredCapacity, MinSize and MaxSize do not cause a new AutoScalingGroup to be created, but changing the LaunchConfiguration does.

Forcing this to fail with DesiredCapacity = 1, CreationPolicy.ResourceSignal.Count = 2 and CreationPolicy.ResourceSignal.Timeout = PT1M causes CloudFormation to roll back to the old AutoScalingGroup with these event entries:

Failed to receive 1 resource signal(s) for the current batch. Each resource signal timeout is counted as a FAILURE.

Received 1 SUCCESS signal(s) out of 2. Unable to satisfy 100% MinSuccessfulInstancesPercent requirement

Wait Condition / Handle

Use cases:

  • Synchronizing resource creation
  • Waiting for external resources

If you assign a CreationPolicy to a WaitCondition then you don’t have to use a WaitConditionHandle i.e. the CreationPolicy on a WaitCondition serves the same purpose as a WaitConditionHandle in that it receives signals. This is useful to coordinate the creation of different resources e.g.

"WaitCondition": {
  "Type": "AWS::CloudFormation::WaitCondition",
  "CreationPolicy": {
    "ResourceSignal": {
      "Timeout": "PT15M",
      "Count": "5"
    }
  }
}

A WaitCondition has an optional Count property for the number of success signals it needs to receive via its WaitConditionHandle. This defaults to 1. When you use a CreationPolicy instead, the number of signals comes from ResourceSignal.Count.

Signaling a resource (CreationPolicy):

/opt/aws/bin/cfn-signal -e 0 --stack { "Ref": "AWS::StackName" } --resource AutoScalingGroup --region { "Ref" : "AWS::Region" }

Signaling a WaitCondition Handle:

/opt/aws/bin/cfn-signal -e 0 --data <data> waitconditionhandle.url

i.e. a WaitConditionHandle can accept data

A WaitCondition stays in the CREATE_IN_PROGRESS state until it receives the required number of signals or times out.

If it receives a failure signal or times out, its state becomes CREATE_FAILED and the stack rolls back.

A WaitConditionHandle takes no properties. Its Ref returns a pre-signed S3 URL that accepts PUT requests. The body of the PUT request needs to be in the following format:

{
  "Status": "SUCCESS|FAILURE",
  "UniqueId": "ID342",
  "Data": "Data to pass back to CloudFormation",
  "Reason": "A string to return"
}
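The same signal can be sent without cfn-signal. This Python sketch builds the body shown above and PUTs it to the handle’s URL with an empty Content-Type header, mirroring the curl example in the AWS documentation. The function names and the default Reason text are hypothetical.

```python
import json
import urllib.request


def wait_condition_body(unique_id, data, status="SUCCESS", reason="Configuration complete"):
    """Build the JSON document a WaitConditionHandle URL expects."""
    if status not in ("SUCCESS", "FAILURE"):
        raise ValueError("status must be SUCCESS or FAILURE")
    return {"Status": status, "UniqueId": unique_id, "Data": data, "Reason": reason}


def signal_wait_handle(handle_url, body):
    """PUT the signal to the pre-signed URL returned by Ref on the handle."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        handle_url, data=payload, method="PUT", headers={"Content-Type": ""}
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

The UniqueId matters when the WaitCondition’s Count is greater than 1: each signal needs a distinct ID, which is why cfn-signal uses the instance ID.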

Test

Setup: (https://gist.github.com/SteveHoggNZ/2319b3bf3c30a2f1876c0dbf7aaba5c6)

  • The template creates a WaitConditionHandle and associates that with a WaitCondition.
  • The WaitConditionHandle is passed to an EC2 instance - Ec2Instance - as UserData.
  • A second EC2 instance, Ec2Instance2, DependsOn the WaitCondition and has the WaitCondition’s Data attribute as its UserData.

Upon creating a stack, the stack blocks in CREATE_IN_PROGRESS.

SSHing into Ec2Instance and running this:

/opt/aws/bin/cfn-signal -e 0 --data 'hello world' $(curl http://169.254.169.254/latest/user-data)

results in the following Output in the CloudFormation stack:

Key: ApplicationData
Value: {"i-0711e10d9f2b10a49":"hello world"}
Description: The data passed back as part of signalling the WaitCondition.

After the first signal, the Ec2Instance2 instance is created and its UserData is as expected:

curl http://169.254.169.254/latest/user-data; echo
{"i-0711e10d9f2b10a49":"hello world"}

Stack Policy

When you create a stack, all update actions are allowed on all resources. By default, anyone with stack update permissions can update all of the resources in the stack.

A stack policy is a JSON document that defines the update actions that can be performed on designated resources.

After you set a stack policy, all of the resources in the stack are protected by default. To allow updates on specific resources, you specify an explicit Allow statement for those resources in your stack policy.

Once a policy has been applied it cannot be deleted, although it can be updated.

Use a stack policy only as a fail-safe mechanism to prevent accidental updates to specific stack resources. To control access to AWS resources or actions, use IAM.

{
  "Statement" : [
    {
      "Effect" : "Allow",
      "Action" : "Update:*",
      "Principal": "*",
      "Resource" : "*"
    },
    {
      "Effect" : "Deny",
      "Action" : "Update:*",
      "Principal": "*",
      "Resource" : "LogicalResourceId/ProductionDatabase"
    }
  ]
}
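To illustrate the evaluation rules (everything is denied once a policy exists, Allow statements open things up, and an explicit Deny always wins), here is a simplified Python sketch. It is not how CloudFormation implements policy evaluation: it only looks at Action and Resource with wildcard matching, and it ignores Principal, NotAction, NotResource and Condition. update_allowed is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def update_allowed(policy, logical_id, action="Update:Modify"):
    """Simplified stack policy check: default deny, Allow opens, Deny wins."""
    resource = f"LogicalResourceId/{logical_id}"
    allowed = False
    for statement in policy.get("Statement", []):
        # Statements using NotAction/NotResource never match in this sketch.
        action_matches = any(fnmatchcase(action, p) for p in _as_list(statement.get("Action", [])))
        resource_matches = any(fnmatchcase(resource, p) for p in _as_list(statement.get("Resource", [])))
        if not (action_matches and resource_matches):
            continue
        if statement["Effect"] == "Deny":
            return False  # an explicit Deny overrides any Allow
        allowed = True
    return allowed  # nothing allowed it: denied by default
```

With the policy above, update_allowed(policy, "ProductionDatabase") is False while any other logical ID is allowed.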

CloudFormation automatically updates resources that depend on an updated resource. So if one resource references another, and the referenced resource is updated, CloudFormation also updates the dependent resource. The stack policy must therefore allow updates to the dependent resources as well.

You can override the stack policy on a per-update basis by supplying another stack policy at update time. This can be done via the CLI, API and console.

The impact of an update can be:

  • No interruption
    • DynamoDB ProvisionedThroughput
  • Some interruption (e.g. reboot)
    • EbsOptimized (EBS backed only)
    • InstanceType (EBS backed only)
  • Replacement
    • EC2 AZ change i.e. snapshot then reprovision
    • ImageId
    • DynamoDB TableName
    • Auto Scaling Launch Configuration
  • Delete

Deletion Policies

Setup: (https://gist.github.com/SteveHoggNZ/bdc55a1a6c633346f44620f1a9a9d848)

  • An S3 bucket with "DeletionPolicy" : "Retain" set
  • An EBS volume with "DeletionPolicy" : "Snapshot" set

Result: the S3 bucket was skipped (DELETE_SKIPPED) and a snapshot was created for the volume.

Update Policy

The AWS::AutoScaling::AutoScalingGroup resource can have an UpdatePolicy attribute set. Changing a launch configuration will not update existing instances, so an update policy is required to define how the new configuration is applied.

UpdatePolicy options are:

  • AutoScalingReplacingUpdate
    • WillReplace (boolean)
      • If true, the Auto Scaling group and its instances will be replaced during an update
        • During the update, CloudFormation retains the old Auto Scaling group allowing for easy rollback. The old group is deleted if the update succeeds.
      • If true, you should set a CreationPolicy to specify how many instances need to signal success for the update to succeed.
        • e.g. "CreationPolicy": { "AutoScalingCreationPolicy": { "MinSuccessfulInstancesPercent": "50" } }
  • AutoScalingRollingUpdate
    • Allows us to control how many instances are changed at a time
    • Options:
      • MaxBatchSize
      • MinInstancesInService
      • MinSuccessfulInstancesPercent
      • PauseTime
      • SuspendProcesses
        • this is required if we have rolling updates and scheduled actions associated with the same group i.e. "SuspendProcesses": ["ScheduledActions"]
      • WaitOnResourceSignals
  • AutoScalingScheduledAction
    • IgnoreUnmodifiedGroupSizeProperties - stops CloudFormation from changing Min, Max and Desired values unless they’re changed in the template.

AutoScalingReplacingUpdate and AutoScalingRollingUpdate apply when:

  • The Auto Scaling Launch configuration changes
  • The Auto Scaling group’s VPCZoneIdentifier property (subnets) changes
  • An Auto Scaling group is updated while it has instances that don’t match the current Launch Configuration

AutoScalingReplacingUpdate takes priority over AutoScalingRollingUpdate.

cfn-hup could cause all instances to update at the same time. To prevent this, we could force a rolling update by changing the logical ID for the launch configuration, and references to it. This would trigger a rolling update on all instances. Note: creating a new launch configuration will mean replacement of all instances.

AutoScalingScheduledAction applies when we update a stack that includes an Auto Scaling group with an associated scheduled action:

UpdatePolicy:
  AutoScalingScheduledAction:
    IgnoreUnmodifiedGroupSizeProperties: Boolean

Cross-stack References

Cross-stack references allow you to reference values exported from another stack in the same region. Export names must be unique within a region.

In the MFA example, the groups stack exports each group e.g.

{
   "SalesGroup": {
      "Value": { "Ref": "SalesGroup" },
      "Description": "The Sales Group",
      "Export": {
         "Name": {"Fn::Sub": "${AWS::StackName}-SalesGroup" }
      }
   }
}

The users stack then imports the groups e.g.

{
   "SamUser" : {
      "Type": "AWS::IAM::User",
      "Properties": {
         "Groups": [ { "Fn::ImportValue": "MFAGroupsStack-SalesGroup" } ],
         "UserName": "sam"
      }
   }
}

Nested Stacks

You can declare an AWS::CloudFormation::Stack resource whose TemplateURL points to another template. That template will be used to create the nested stack e.g.

"VPCStack": {
  "Type": "AWS::CloudFormation::Stack",
  "Properties": {
      "TemplateURL": "https://s3-ap-southeast-2.amazonaws.com/<REDACTED>/<REDACTED>.json",
      "TimeoutInMinutes": "5"
  }
}

You can reference the nested stack’s outputs using Fn::GetAtt e.g.

"VpcId": { "Fn::GetAtt": ["VPCStack", "Outputs.VpcId"] },

Nested stacks can be passed parameters e.g.

"BastionStack": {
  "Type": "AWS::CloudFormation::Stack",
  "Properties": {
      "TemplateURL": "https://s3-ap-southeast-2.amazonaws.com/<REDACTED>/<REDACTED>.json",
      "Parameters": {
          "VpcId": { "Fn::GetAtt": ["VPCStack", "Outputs.VpcId"] },
          "DeployBastion": { "Ref": "DeployBastion" },
          "SubnetId": { "Fn::GetAtt": ["VPCStack", "Outputs.PublicSubnetA"] }
      }
  }
}

Reference: https://gist.github.com/SteveHoggNZ/cd3855a329632a3c3934adb80a5a646d

Conditional Resources

There is an optional Conditions section you can use. Following on from the nested stack example above (https://gist.github.com/SteveHoggNZ/3347bf2ab30f16a29b44c936ebcdd39a):

"Conditions" : {
  "Deploy" : {"Fn::Equals" : [{"Ref" : "DeployBastion"}, "Yes"]}
},

You can then specify that a resource is created only if the condition is met:

"BastionServer": {
   "Type" : "AWS::EC2::Instance",
   "Condition" : "Deploy",
   ...snip...
}

AWS::CloudFormation::Init

This can be set in a resource’s Metadata and has config sections like:

  • groups - gid
  • users - uid, groups and homeDir
  • packages - the packages to install
  • commands - the commands to run. test (conditional execution), env and cwd can be set
  • services - OS services to enable / disable
  • files - the files to create, content, source (URL), ownership and permissions
  • sources - download files to specific places on the file system

Following on from the nested stack example above (https://gist.github.com/SteveHoggNZ/3347bf2ab30f16a29b44c936ebcdd39a):

"BastionServer": {
  "Type" : "AWS::EC2::Instance",
  "Condition" : "Deploy",
  "Metadata" : {
    "AWS::CloudFormation::Init" : {
      "config" : {
        "packages" : {
          "yum": {
            "mysql": [],
            "python27-boto3": []
          }
        },
        "files": {
          "/root/.aws/config": {
            "content" : { "Fn::Join" : ["", [
              "[default]", "\n",
              "region=", { "Ref" : "AWS::Region" }, "\n"
            ]]},
            "mode"  : "000644",
            "owner" : "root",
            "group" : "root"
          },
          "/home/ec2-user/.aws/config": {
            "content" : { "Fn::Join" : ["", [
              "[default]", "\n",
              "region=", { "Ref" : "AWS::Region" }, "\n"
            ]]},
            "mode"  : "000644",
            "owner" : "ec2-user",
            "group" : "ec2-user"
          }
        }
      }
    }
  }
  ...snip...
}

Then the init metadata can be used by the cfn-init script on first boot:

"BastionServer": {
  "Type" : "AWS::EC2::Instance",
  "Condition" : "Deploy",
  ...snip...
   "Properties": {
     "ImageId": "ami-55d4e436",
     "InstanceType": "t2.micro",
     "KeyName": "DomainsDirectKeyPair",
     "NetworkInterfaces": [ {
       "AssociatePublicIpAddress": "true",
       "DeviceIndex": "0",
       "GroupSet": [{ "Ref": "BastionSecurityGroup" }],
       "SubnetId": { "Ref": "SubnetId" }
     } ],
     "IamInstanceProfile": { "Ref": "BastionInstanceProfile" },
     "UserData"       : { "Fn::Base64" : { "Fn::Join" : ["", [
       "#!/bin/bash -xe\n",
       "yum update -y aws-cfn-bootstrap\n",

       "# Install the files and packages from the metadata\n",
       "/opt/aws/bin/cfn-init -v ",
       "         --stack ", { "Ref" : "AWS::StackName" },
       "         --resource BastionServer ",
       "         --region ", { "Ref" : "AWS::Region" }, "\n"
     ]]}},
     "Tags": [
       {
         "Key": "Name",
         "Value": "BastionServer"
       }
     ]
   }
}

Config can be split up into multiple configs and grouped into configSets …

"AWS::CloudFormation::Init" : {
 "configSets" : {
  "ascending" : [ "config1" , "config2" ],
  "descending" : [ "config2" , "config1" ]
 },
 "config1" : {
  "commands" : {
   "test" : {
    "command" : "echo \"$CFNTEST\" > test.txt",
    "env" : { "CFNTEST" : "I come from config1." },
    "cwd" : "~"
   }
  }
 },
 "config2" : {
  "commands" : {
   "test" : {
    "command" : "echo \"$CFNTEST\" > test.txt",
    "env" : { "CFNTEST" : "I come from config2" },
    "cwd" : "~"
   }
  }
 }
}    

… that can then be selected via the command line

cfn-init -c ascending
cfn-init -c descending

-c is shorthand for --configsets

A ConfigSet is a list that can include a config name, or another ConfigSet using:

{ "ConfigSet" : "test1" }
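The nesting rules can be illustrated with a small Python sketch that flattens a configSet into the ordered list of configs it would run. expand_config_set is a hypothetical helper, not part of cfn-init, and it assumes every configSet is written as a list.

```python
def expand_config_set(init, name, _path=frozenset()):
    """Flatten a configSet, including nested {"ConfigSet": ...} entries."""
    if name in _path:
        raise ValueError(f"configSet cycle involving {name!r}")
    path = _path | {name}

    ordered = []
    for entry in init["configSets"][name]:
        if isinstance(entry, dict):
            # {"ConfigSet": "other"} splices in another configSet's configs.
            ordered.extend(expand_config_set(init, entry["ConfigSet"], path))
        else:
            ordered.append(entry)  # a plain config name
    return ordered
```

Using the configSets above, "ascending" expands to ["config1", "config2"]; a set defined as [{"ConfigSet": "ascending"}, "config3"] would expand to ["config1", "config2", "config3"].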

Metadata

This allows you to set arbitrary data for resources, which is then queryable e.g.

aws cloudformation describe-stack-resource

Metadata is used by cfn-init, the Designer layout (AWS::CloudFormation::Designer) and the console interface layout (AWS::CloudFormation::Interface).

You can group parameters in the console using the AWS::CloudFormation::Interface Metadata key. e.g. create 2 ParameterGroups

{
   "Metadata": {
    "AWS::CloudFormation::Interface" : {
     "ParameterGroups" : [
       {
         "Label" : { "default" : "Environment" },
         "Parameters" : [ "Environment", "DeployBastion" ]
       },
       {
         "Label" : { "default" : "Database Configuration" },
         "Parameters" : [ "DBName", "DBUser", "DBPassword" ]
       }
     ]
    }
   }
}

The cfn-hup daemon detects changes to the metadata and runs the hooks defined in hooks.d on the instance; by default it checks every 15 minutes e.g.

...
  LaunchConfig:
    Type: "AWS::AutoScaling::LaunchConfiguration"
    Metadata:
      QBVersion: !Ref paramQBVersion
      AWS::CloudFormation::Init:
...
            /etc/cfn/hooks.d/cfn-auto-reloader.conf:
              content: !Sub |
                [cfn-auto-reloader-hook]
                triggers=post.update
                path=Resources.LaunchConfig.Metadata.AWS::CloudFormation::Init
                action=/opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource LaunchConfig --configsets wordpress_install --region ${AWS::Region}
              mode: "000400"
              owner: "root"
              group: "root"
...

Notifications

You can send stack-related events to an SNS topic using the CLI’s --notification-arns switch.

IAM Role

You can choose an IAM role that CloudFormation uses to create, modify, or delete resources in the stack. If you don’t choose a role, CloudFormation creates a temporary session based on your own credentials, so it is limited to your permissions.

Debugging

  • Creation failures
    • Rollback on failure is enabled by default, but you can turn it off so the failed resources are kept and you can check their logs.
    • Alternatively, send the logs to CloudWatch Logs so they are still available after a rollback
  • Deletion failures
    • e.g. a non-empty S3 bucket can’t be deleted
    • If a stack is stuck in DELETE_FAILED state you can use the RetainResources option to delete it
    • You may not have IAM permissions to delete the resource
  • Rollback failures
    • e.g. nested stacks have dependencies between resources that are blocking rollback
    • e.g. a resource was modified outside of CloudFormation
    • You may be able to manually resolve the problem or you can contact AWS support

Validating Templates

$ aws cloudformation validate-template --template-body file://the-file.json
$ aws cloudformation validate-template --template-url https://the-file.json

Common Errors

  • Dependency Errors
  • Insufficient IAM permissions
  • Invalid value or unsupported resource property
  • Security group does not exist in VPC
    • Not in the VPC
    • Or if you incorrectly use the name instead of the ID
  • Wait Condition didn’t receive the required number of signals

Immutable Attributes

If you change an attribute that is immutable, then the resource will be replaced.

Immutable:

  • AMI
  • Instance type - when changing between the HVM and paravirtual (PV) virtualization types
    • Instances with different virtualization types must use different AMIs
    • e.g. t2.micro to m1.large causes a replacement

Mutable:

  • Instance type - when keeping the same virtualization type (PV or HVM)
    • e.g. t2.micro to m4.large causes a stop/start, public IP change etc. The instance ID stays the same.

Deployment Methods

Types

Bootstrapping with CloudFormation

Helper scripts:

Available in Amazon Linux, RPM and Windows 2008+ (via Python)

  • cfn-init
    • reads template metadata from the AWS::CloudFormation::Init key
    • the order in which configs run can be controlled using configsets
    • required options: stack and resource
  • cfn-signal
    • used in combination with cfn-init
    • signals success or failure back to CloudFormation e.g. -e $?
    • required options (when signalling a resource): stack and resource
    • can be used with waitconditionhandle.url and send it data
  • cfn-get-metadata
    • get a metadata block and output it to stdout
    • you can use the --key flag to limit it to a branch of metadata
  • cfn-hup
    • allows you to make configuration changes on running instances
    • a daemon that detects changes in resource metadata and then runs actions when a change is detected
    • uses /etc/cfn/cfn-hup.conf by default:
      [main]
      stack=<stack>
      credentials-file=<file>
      region=<region>
      interval=<number>
      verbose=<boolean>
    • hooks (e.g. in /etc/cfn/hooks.d/<name>.conf):
      [hookname]
      triggers=post.add or post.update or post.remove
      path=Resources.<LogicalResourceId> (.Metadata or .PhysicalResourceId)(.<optionalMetadataPath>)
      action=<arbitrary shell command>
      runas=<runas user>
      • Triggers - list of conditions to detect. Can be separated by a comma
      • Path - The path in the metadata object
        • the .Metadata suffix means we monitor for metadata changes only, optionally further filtered by path.
        • e.g. triggers=post.update path=Resources.WebServerInstance.Metadata.AWS::CloudFormation::Init
        • e.g. you could add a version number to the meta data and monitor for changes to that

Consider:

  • Bootstrapping actions can take a long time to complete, increasing deployment times. You can speed things up by pre-baking AMIs.
  • Storing secrets - keeping them out of your code
    • Use parameters with a NoEcho property
  • You have to use CloudFormation for updating your application
    • CloudFormation can perform rolling updates on Auto Scaling groups using an UpdatePolicy

Deploying with CloudFormation and (Puppet or Chef)

Use CloudFormation templates to install and configure Puppet (or Chef), which then does the configuration management.

Deploying with CloudFormation and Elastic Beanstalk

You can version control your templates, use the templates to create dev, test, prod environments

It helps to decouple environments: create Amazon RDS, S3 or DynamoDB resources separately so they don’t have to be thrown away with the environment. For example, Elastic Beanstalk can create RDS instances in the data tier, and you can use .ebextensions to create DynamoDB tables, but those resources are then tied to the Elastic Beanstalk application.

Elastic Beanstalk takes care of deployments, using one of these deployment options:

  • “All at once” deployment
  • Rolling deployment
    • Detach a batch from the load balancer and update them first
  • Rolling with additional batch
    • Create new instances and treat them as the first batch
  • Immutable deployment
    • Deploy code to a newly created instance, then if that succeeds to more new instances
    • Cut over to the parallel fleet 25% at a time

Compared to OpsWorks, Elastic Beanstalk:

  • Doesn’t allow as much flexibility
  • Is more suitable for shorter application lifecycles where an environment can be thrown away with each deploy

Deploying with CodeDeploy, CodeCommit and CodePipeline

CloudFormation is capable of managing these resources too

In-place vs Disposable methods

In-place upgrades:

  • Usually faster

Disposable upgrades:

  • Usually safer
  • Elastic Beanstalk and CloudFormation are better suited for this method than OpsWorks