AWS and other interesting stuff

S3

Storage Classes

  • Standard
    • 11 9s durability
    • 99.99% availability over a given year
    • Higher retrieval costs than RRS
  • Standard - Infrequent Access (Standard_IA)
    • 11 9s durability
    • 99.9% availability over a given year
    • Minimum billable object size 128 KB (smaller objects are charged as 128 KB)
    • Minimum age 30 days for objects when transitioning them from Standard to Standard_IA
    • Higher request costs
  • Reduced Redundancy Storage (RRS)
    • 99.99% durability - data cannot sustain 2 concurrent facility failures
    • 99.99% availability over a given year
    • Low retrieval costs

The storage class must be set on each upload operation; it can’t be set on a per-bucket basis.
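Because of this, the usual way to change an existing object’s storage class is to copy the object over itself with a new --storage-class. A sketch reusing the bucket from the CLI examples below (needs real credentials, so it can’t be run offline):

```shell
# Change an object's storage class in place: copy it onto itself.
aws s3 cp s3://h4-tmp2/network-for-north-west.jpg \
    s3://h4-tmp2/network-for-north-west.jpg \
    --storage-class STANDARD_IA
```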

Amazon Glacier

  • Low storage cost per GB
  • High durability
  • Very slow to retrieve data
  • Requests are expensive
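Glacier objects also can’t be read directly: you first request a restore and wait (typically hours on the Standard tier) for a temporary copy to become readable. A sketch with a hypothetical key (needs credentials):

```shell
# Request a temporary 7-day restore of an archived object, then poll
# head-object until its "Restore" field shows ongoing-request="false".
aws s3api restore-object --bucket h4-tmp2 --key tax/2010.pdf \
    --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}}'
aws s3api head-object --bucket h4-tmp2 --key tax/2010.pdf
```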

CLI

Storage Class

--storage-class (STANDARD | REDUCED_REDUNDANCY | STANDARD_IA)

$ aws s3 cp network-for-north-west.jpg s3://h4-tmp2/ --storage-class REDUCED_REDUNDANCY

$ aws s3api list-objects --bucket h4-tmp2 --prefix network-for-north-west.jpg
{
    "Contents": [
        {
            "LastModified": "2017-02-09T22:29:01.000Z",
            "ETag": "\"3d46fd91b8aeb865f2516c35250b1511\"",
            "StorageClass": "REDUCED_REDUNDANCY",
            "Key": "network-for-north-west.jpg",
            "Owner": {
                "DisplayName": "aws",
                "ID": "971e6b5cd3085aeb6858af18e0db30e64a2693c074570d0a033c7f093840e1a2"
            },
            "Size": 178292
        }
    ]
}

Encryption

--sse AES256

$ cat > hello.txt <<EOF
Hello World
EOF
$ aws s3 cp hello.txt s3://h4-tmp2/ --sse AES256
upload: ./hello.txt to s3://h4-tmp2/hello.txt
$ aws s3 cp s3://h4-tmp2/hello.txt -
Hello World

--sse-c AES256

The supplied key needs to be 32 characters (generation is left as an exercise for the reader)
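One way to do that exercise, assuming openssl is available: 16 random bytes hex-encoded gives a 32-character string, which the CLI treats as the 32-byte key.

```shell
# Generate a 32-character key suitable for --sse-c-key.
KEY=$(openssl rand -hex 16)
echo "$KEY"
echo "${#KEY}"   # 32
```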

$ aws s3 cp hello.txt s3://h4-tmp2/ --sse-c AES256 --sse-c-key 0CC175B9C0F1B6A831C399E269772661
upload: ./hello.txt to s3://h4-tmp2/hello.txt

With the correct key:

$ aws s3 cp s3://h4-tmp2/hello.txt - --sse-c AES256 --sse-c-key 0CC175B9C0F1B6A831C399E269772661
Hello World

With the wrong key:

$ aws s3 cp s3://h4-tmp2/hello.txt - --sse-c AES256 --sse-c-key WRONG_KEY_WRONG_KEY_WRONG_KEY_XX
download failed: s3://h4-tmp2/hello.txt to - An error occurred (403) when calling the HeadObject operation: Forbidden

Misc

Files uploaded to Amazon S3 in a single PUT (which caps out at 5 GB) have an ETag that is simply the MD5 hash of the file, which makes it easy to check whether your local files match what you put on S3. Files uploaded via multipart upload get a different ETag: the MD5 of the concatenation of each part’s MD5 digest, with the part count appended after a dash.
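The multipart scheme can be reproduced locally to verify an ETag of the form hex-partcount. A sketch assuming GNU coreutils plus xxd, and that the uploader used 8 MiB parts (the part size must match whatever the uploader used):

```shell
# 20 MiB of zeros splits into three parts of at most 8 MiB.
dd if=/dev/zero of=big.bin bs=1M count=20 2>/dev/null
split -b 8M big.bin part_
PARTS=$(ls part_* | wc -l | tr -d ' ')
# MD5 each part, concatenate the *binary* digests, MD5 that, append "-<parts>".
ETAG="$(for p in part_*; do md5sum "$p" | cut -d' ' -f1; done \
        | tr -d '\n' | xxd -r -p | md5sum | cut -d' ' -f1)-$PARTS"
echo "$ETAG"
rm -f big.bin part_*
```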

Permissions

  • ACLs can be set on buckets or objects put-bucket-acl put-object-acl
  • Bucket Policies can only be set on buckets
  • Folders can have encryption and storage class set on them, but all this really does is apply those settings to the objects currently under that prefix. i.e. folders don’t really exist; they’re just key prefixes.

ACL Format:

{
  "Grants": [
    {
      "Grantee": {
        "DisplayName": "string",
        "EmailAddress": "string",
        "ID": "string",
        "Type": "CanonicalUser"|"AmazonCustomerByEmail"|"Group",
        "URI": "string"
      },
      "Permission": "FULL_CONTROL"|"WRITE"|"WRITE_ACP"|"READ"|"READ_ACP"
    }
    ...
  ],
  "Owner": {
    "DisplayName": "string",
    "ID": "string"
  }
}

ACL Grantee Presets:

  • Everyone
  • Any Authenticated AWS User
  • Log Delivery
  • Me
  • aws

ACL Permissions:

  • List
  • Upload/Delete
  • View Permissions

  • Note: the ACP permissions (READ_ACP / WRITE_ACP) allow the user to read or write the ACL itself for the bucket or object
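From the CLI, the quickest way to hit these presets is a canned ACL; the grant JSON above is what it expands into. A sketch (needs credentials):

```shell
# Apply a canned ACL, then read back the grants it produced.
aws s3api put-object-acl --bucket h4-tmp2 --key hello.txt --acl public-read
aws s3api get-object-acl --bucket h4-tmp2 --key hello.txt
```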

Object Lifecycle Management

Lifecycle configurations are XML documents, applied to buckets, made up of a set of rules with actions. Actions include:

  • Transition actions - where we define when objects transition to another S3 storage class
  • Expiration actions - where we define when objects get deleted

Limitations

  • Objects must be stored at least 30 days in the current storage class before they can transition to Standard_IA
  • You cannot transition from Standard_IA to Standard or RRS
  • You cannot transition from Glacier to any other storage class
  • You cannot transition from any storage class to RRS
    • You need to set this storage class manually when you upload it or copy it

Samples

<LifecycleConfiguration>
  <Rule>
    <ID>Transition and Expiration Rule</ID>
    <Filter>
       <Prefix>tax/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>365</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>3650</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
...
<Filter>
   <And>
      <Tag>
         <Key>key1</Key>
         <Value>value1</Value>
      </Tag>
      <Tag>
         <Key>key2</Key>
         <Value>value2</Value>
      </Tag>
    </And>
</Filter>
...
<LifecycleConfiguration>
  <Rule>
    <ID>example-id</ID>
    <Filter>
       <Prefix>logs/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>30</Days>
      <StorageClass>STANDARD_IA</StorageClass>
    </Transition>
    <Transition>
      <Days>90</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>365</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
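The XML above is the REST-API representation; the aws CLI takes the same rules as JSON. A sketch of the logs/ rule in CLI form (the put call needs credentials, so it is shown commented out):

```shell
# JSON equivalent of the logs/ lifecycle rule above.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "example-id",
      "Filter": {"Prefix": "logs/"},
      "Status": "Enabled",
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"}
      ],
      "Expiration": {"Days": 365}
    }
  ]
}
EOF
python3 -m json.tool lifecycle.json > /dev/null && echo "lifecycle.json parses"
# aws s3api put-bucket-lifecycle-configuration --bucket h4-tmp2 \
#     --lifecycle-configuration file://lifecycle.json
```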

Conflicting Rules

In this case, because you want objects to expire (be removed), there is no point in changing the storage class, so Amazon S3 simply chooses the expiration action for these objects:

<LifecycleConfiguration>
  <Rule>
    <ID>Rule 1</ID>
    <Filter>
      <Prefix>logs/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Expiration>
      <Days>365</Days>
    </Expiration>        
  </Rule>
  <Rule>
    <ID>Rule 2</ID>
    <Filter>
      <Prefix>logs/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>365</Days>
      <StorageClass>STANDARD_IA</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>

Otherwise, if rules overlap, the action that fires earliest wins. e.g. if one rule applies at 30 days and another at 31 days, the 30-day rule wins.

Versioning-enabled Bucket

<LifecycleConfiguration>
    <Rule>
        <ID>sample-rule</ID>
        <Filter>
           <Prefix></Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Transition>
           <Days>90</Days>
           <StorageClass>STANDARD_IA</StorageClass>
        </Transition>
        <NoncurrentVersionTransition>      
            <NoncurrentDays>30</NoncurrentDays>      
            <StorageClass>GLACIER</StorageClass>   
        </NoncurrentVersionTransition>    
       <NoncurrentVersionExpiration>     
            <NoncurrentDays>365</NoncurrentDays>    
       </NoncurrentVersionExpiration>
    </Rule>
</LifecycleConfiguration>
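The NoncurrentVersion actions above only take effect once versioning is enabled on the bucket (needs credentials):

```shell
# Turn on versioning for the bucket.
aws s3api put-bucket-versioning --bucket h4-tmp2 \
    --versioning-configuration Status=Enabled
```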

Multipart Upload

Parts of an incomplete multipart upload are stored (and billed) until the upload is aborted; a lifecycle rule can abort stale uploads automatically:

<LifecycleConfiguration>
    <Rule>
        <ID>sample-rule</ID>
        <Filter>
           <Prefix>SomeKeyPrefix/</Prefix>
        </Filter>
        <Status>Enabled</Status>
        <AbortIncompleteMultipartUpload>
          <DaysAfterInitiation>7</DaysAfterInitiation>
        </AbortIncompleteMultipartUpload>
    </Rule>
</LifecycleConfiguration>