AWS/S3

From Christoph's Personal Wiki
Revision as of 00:03, 28 January 2017 by Christoph

Amazon Simple Storage Service (S3) provides developers and IT teams with secure, durable, highly scalable cloud storage. Amazon S3 is easy-to-use object storage with a simple web service interface to store and retrieve any amount of data from anywhere on the web. With Amazon S3, you pay only for the storage you actually use. There is no minimum fee and no setup cost.

Features

  • Simple key-value store:
    • key = name of the object;
    • value = the actual data (made up of a sequence of bytes);
    • version ID (important for versioning); and
    • metadata (data about the data you are storing)
    • Sub-resources:
      • Access Control Lists (ACLs)
      • Torrent (support for the BitTorrent protocol)
  • S3 bucket URL: https://s3-<region>.amazonaws.com/<bucket_name> (e.g., https://s3-us-west-1.amazonaws.com/foobar)
  • Design targets for S3 Standard (the S3 SLA itself commits to 99.9% monthly availability):
    • availability: 99.99% (4 nines)
    • durability: 99.999999999% (11 nines)
  • Objects can be from 0 bytes to 5 TB in size
    • The largest object you can upload with a single PUT operation is 5 GB; objects larger than 5 GB must be uploaded in parts using multipart upload
  • Unlimited storage
  • Files/objects are stored in "buckets"
  • S3 is a universal namespace (i.e., bucket names must be unique globally; think domain names)
  • Read-after-Write consistency for PUTs of new Objects
  • Eventual Consistency for overwrite PUTs and DELETEs (can take some time to propagate)
  • Lifecycle management
  • Versioning
  • Encryption (server-side encryption uses the 256-bit Advanced Encryption Standard, AES-256)
    • In transit (SSL/TLS)
    • At Rest:
      • Server Side Encryption (SSE):
        • S3 Managed Keys (SSE-S3; AES-256)
        • AWS Key Management Service, Managed Keys (SSE-KMS)
        • Server Side Encryption with Customer Provided Keys (SSE-C)
    • Client Side Encryption (the user encrypts data on their local machine and then uploads it to S3)
  • Secure your data with Bucket Policies and ACLs
  • Storage tiers/classes:
    • S3 Standard (99.99% availability; durable, immediately available, frequently accessed): stored across multiple devices in multiple facilities and designed to sustain the concurrent loss of 2 facilities
    • S3 IA (Infrequent Access) (durable, immediately available, infrequently accessed): for data that is accessed less frequently but requires rapid access when needed. Lower storage fee than S3 Standard, but you are charged a retrieval fee
    • S3 Reduced Redundancy Storage (RRS): designed to provide 99.99% durability and 99.99% availability of objects over a given year, for objects whose loss is not critical (e.g., image thumbnails, which can easily be regenerated). Concurrent facility fault tolerance = 1
    • Glacier: Very cheap, but used for archival only. It takes 3-5 hours to restore from Glacier.
  • Storage Gateways:
    • Gateway Stored Volumes (entire dataset is stored on site and is asynchronously backed up to S3)
    • Gateway Cached Volumes (entire dataset is stored on S3 and the most frequently accessed data is cached on site)
    • Gateway Virtual Tape Library (VTL) (used for backup with popular backup applications such as NetBackup, Backup Exec, Veeam, etc.)
  • Import/Export Disk:
    • Import to EBS
    • Import to S3
    • Import to Glacier
    • Export from S3
  • Import/Export Snowball (only available in North America)
    • Import to S3
    • Export from S3
  • S3 - Securing your buckets:
    • By default, all newly created buckets are private
    • One can set up access control on one's buckets using:
      • Bucket Policies
      • Access Control Lists (can apply to individual objects)
    • S3 buckets can be configured to create access logs, which log all requests made to the bucket
  • S3 - Security & Encryption
    • Encryption types:
      • In transit
        • SSL/TLS
      • Data at rest
        • Server-side encryption
          • S3 Managed Keys (SSE-S3; AES 256bit)
          • AWS Key Management Service, Managed Keys (SSE-KMS). Provides an audit trail
          • Server-side encryption with customer provided keys (SSE-C)
      • Client-side encryption
  • S3 Static Website:
    • http://foobar.s3-website-us-west-2.amazonaws.com/index.html (a static website link)
    • https://s3-us-west-2.amazonaws.com/foobar/demo.jpeg (not a static website link)
    • Static websites are always HTTP (not HTTPS, for now)
  • S3 Cross Origin Resource Sharing (CORS)
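    • To let a web page served from one origin (e.g., a static website hosted in one S3 bucket) load objects from another S3 bucket in the browser, enable CORS on the bucket that holds those objects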
  • S3 - Versioning
    • Stores all versions of an object (including all writes, and even when the object is deleted)
    • Great backup tool
    • Once enabled, versioning cannot be disabled, only suspended
    • Integrates with Lifecycle rules
    • MFA (Multi-Factor Authentication) Delete can be enabled for an additional layer of security: permanently deleting an object version or changing the bucket's versioning state then requires an MFA code
    • Cross Region Replication requires versioning to be enabled on both the source and destination buckets
  • S3 - Cross Region Replication
    • Versioning must be enabled on both the source and destination buckets.
    • The source and destination buckets must be in different regions
    • Objects already in the bucket when replication is enabled are not replicated automatically; all subsequently created or updated objects are replicated automatically (including all versions of the object).
    • As of January 2017, you cannot replicate to multiple buckets or use daisy chaining.
    • Delete markers are replicated.
    • Deletions of individual versions or of delete markers are not replicated.
  • S3 - Lifecycle Management
    • Transition objects to the Infrequent Access storage class (an object must be at least 30 days old, counted from initial upload, and at least 128 KB in size) or to the Glacier storage class after a chosen number of days.
    • Infrequent Access retrieval: milliseconds
    • Glacier retrieval: 3 - 5 hours
    • With Versioning: Transition object versions as well (including deleting old/current versions)
    • Cannot use the Reduced Redundancy Storage class with Lifecycle Management
    • Ability to permanently delete
    • As of January 2017, Glacier is not available for the Singapore and São Paulo regions
  • CloudFront (a content delivery network, CDN)
    • AWS Global CDN infrastructure
    • Edge Location: this is the location where content will be cached. This is different from an AWS Region/AZ (over 50 Edge Locations, as of April 2016)
    • Origin: This is the origin of all the files/objects that the CDN will distribute. This can be an S3 Bucket, an EC2 Instance, an Elastic Load Balancer, or Route53.
      • Possible to have multiple origin paths in the same distribution
    • Distribution: This is the name given to the CDN, which consists of a collection of Edge Locations.
    • Edge locations are not read-only; one can write to them as well (i.e., PUT an object to them)
    • Objects are cached for the life of the TTL (Time To Live)
    • One can clear the cache of an object stored on an Edge Location before the TTL expires, but one will be charged for that service. Create an "invalidation" request to clear the cache of a given object.
    • Restrict viewer access by using either Signed URLs or Signed Cookies (e.g., only allow paying users to view your content)
    • Allows for geo-restricting access (i.e., whitelisting or blacklisting countries)
  • S3 Transfer Acceleration:
    • Enables fast, easy, and secure transfers of files over long distances between your end users and an S3 bucket. Transfer Acceleration takes advantage of CloudFront's globally distributed edge locations. As the data arrives at an edge location, it is routed to S3 over an optimized network path.
    • Instead of uploading an object directly to an S3 bucket in a given region, one can upload it to the nearest Edge Location, and AWS then transfers the object to the S3 bucket.
    • Example URL: <bucket_name>.s3-accelerate.amazonaws.com
    • See: Amazon S3 Transfer Acceleration - Speed Comparison
  • S3 Pricing (i.e., one is charged for the following):
    • Storage
    • Requests
    • Storage Management Pricing
    • Data Transfer Pricing (e.g., replication)
    • Transfer Acceleration
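The Features list above mentions three URL forms for a bucket: the path-style REST endpoint, the static website endpoint, and the Transfer Acceleration endpoint. The following is a minimal Python sketch that builds each form from a bucket name, region, and object key. The helper names (`path_style_url`, `website_url`, `accelerate_url`) are hypothetical, and the host-name patterns simply mirror the examples on this page; some regions use a dot rather than a dash (e.g., `s3-website.<region>`), so check the endpoint list for your region before relying on them.

```python
from urllib.parse import quote

def path_style_url(bucket: str, region: str, key: str = "") -> str:
    """REST endpoint, path style: https://s3-<region>.amazonaws.com/<bucket_name>/<key>."""
    return f"https://s3-{region}.amazonaws.com/{bucket}/{quote(key)}"

def website_url(bucket: str, region: str, key: str = "index.html") -> str:
    """Static website endpoint; website endpoints only serve HTTP."""
    return f"http://{bucket}.s3-website-{region}.amazonaws.com/{quote(key)}"

def accelerate_url(bucket: str, key: str = "") -> str:
    """Transfer Acceleration endpoint; the host name carries no region."""
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"

# quote() percent-encodes spaces and other unsafe characters but keeps "/" in keys.
print(website_url("foobar", "us-west-2"))
print(path_style_url("foobar", "us-west-2", "demo.jpeg"))
print(accelerate_url("foobar", "photos/my cat.jpg"))
```

The first two calls reproduce the example links from the S3 Static Website bullet; the third yields `https://foobar.s3-accelerate.amazonaws.com/photos/my%20cat.jpg`.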
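To make the size limits above concrete: a single PUT tops out at 5 GB, so anything larger goes through multipart upload. The sketch below picks an upload method and a part size for a given object size. It assumes the multipart limits documented by AWS (parts between 5 MiB and 5 GiB, except that the last part may be smaller, and at most 10,000 parts per upload) and binary units for GB/TB. `plan_upload` is a hypothetical helper; the AWS SDKs and CLI choose part sizes on their own.

```python
MiB, GiB, TiB = 1024 ** 2, 1024 ** 3, 1024 ** 4

MAX_SINGLE_PUT = 5 * GiB    # largest object a single PUT can upload
MAX_OBJECT_SIZE = 5 * TiB   # largest object S3 can store
MIN_PART_SIZE = 5 * MiB     # every part except the last must be at least this large
MAX_PART_SIZE = 5 * GiB
MAX_PARTS = 10_000

def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids float rounding on multi-terabyte sizes)."""
    return -(-a // b)

def plan_upload(size: int) -> tuple:
    """Return (method, part_size, part_count) for an object of `size` bytes."""
    if size < 0 or size > MAX_OBJECT_SIZE:
        raise ValueError("object size must be non-negative and at most 5 TiB")
    if size <= MAX_SINGLE_PUT:
        return ("single PUT", size, 1)
    # Smallest part size that respects the minimum and fits within MAX_PARTS parts.
    part_size = max(MIN_PART_SIZE, ceil_div(size, MAX_PARTS))
    part_count = ceil_div(size, part_size)
    return ("multipart", part_size, part_count)

print(plan_upload(6 * GiB))
```

For a 6 GiB file this prints `('multipart', 5242880, 1229)`: 5 MiB parts, with the last part smaller than the rest. At the 5 TiB maximum the part size grows to about 524 MiB so that the upload still fits in 10,000 parts.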
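Because new buckets are private by default, serving a static website from a bucket means explicitly granting public read access with a bucket policy. A minimal sketch of such a policy, built as JSON in Python, follows; the bucket name `foobar` and the statement ID are placeholders. The resulting document could be saved as `policy.json` and applied with `aws s3api put-bucket-policy --bucket foobar --policy file://policy.json`, after confirming that public access is really intended.

```python
import json

def public_read_policy(bucket: str) -> str:
    """Bucket policy granting anonymous read access to every object in `bucket`."""
    policy = {
        "Version": "2012-10-17",           # policy language version
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",          # anyone, including unauthenticated users
                "Action": "s3:GetObject",  # read objects only; no listing or writing
                # Object-level ARN: the trailing /* matches every key in the bucket
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)

print(public_read_policy("foobar"))
```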
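A CORS configuration is a list of rules attached to the bucket that holds the shared objects. The sketch below builds one rule allowing `GET` requests from the example static website endpoint used on this page; the origin, the methods, and the `MaxAgeSeconds` value are illustrative assumptions. Saved as `cors.json`, the output could be applied with `aws s3api put-bucket-cors --bucket <bucket_name> --cors-configuration file://cors.json`.

```python
import json

def cors_config(allowed_origin: str, methods=("GET",), max_age: int = 3000) -> dict:
    """One CORS rule letting pages from `allowed_origin` fetch objects from this bucket."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": [allowed_origin],
                "AllowedMethods": list(methods),
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": max_age,  # how long browsers may cache the preflight response
            }
        ]
    }

# Attach this to the bucket that *holds* the shared objects, naming the site that loads them.
config = cors_config("http://foobar.s3-website-us-west-2.amazonaws.com")
print(json.dumps(config, indent=2))
```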
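The cross-region replication rules listed above (as of January 2017) can be expressed as pre-flight checks plus a replication configuration. In this sketch, `Bucket`, `replication_problems`, and `replication_config` are hypothetical names, and the IAM role ARN is a placeholder for a role that S3 can assume to replicate objects on your behalf. The configuration dictionary uses the rule shape with a top-level `Prefix`, where an empty prefix matches every object.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Bucket:
    name: str
    region: str
    versioning: Optional[str] = None  # None (never enabled), "Enabled", or "Suspended"

def replication_problems(src: Bucket, dst: Bucket) -> List[str]:
    """Check the cross-region replication prerequisites listed above."""
    problems = []
    for bucket in (src, dst):
        if bucket.versioning != "Enabled":
            problems.append(f"versioning is not enabled on {bucket.name}")
    if src.region == dst.region:
        problems.append("source and destination must be in different regions")
    return problems

def replication_config(role_arn: str, dst: Bucket) -> dict:
    """Replicate every new or updated object in the source bucket to `dst`."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-all",
                "Prefix": "",  # empty prefix: applies to all objects
                "Status": "Enabled",
                "Destination": {"Bucket": f"arn:aws:s3:::{dst.name}"},
            }
        ],
    }

src = Bucket("foobar", "us-west-1", "Enabled")
dst = Bucket("foobar-replica", "us-east-1", "Suspended")
print(replication_problems(src, dst))  # flags the destination, whose versioning is only suspended
```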
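A lifecycle rule implementing the transitions described above (Infrequent Access after 30 days, Glacier later, and clean-up of old versions on a versioned bucket) might look like the following sketch. The `logs/` prefix, the day counts, and the rule ID are assumptions for illustration, and `lifecycle_rule` is a hypothetical helper. The dictionary mirrors the JSON accepted by `aws s3api put-bucket-lifecycle-configuration --bucket <bucket_name> --lifecycle-configuration file://lifecycle.json`.

```python
import json

def lifecycle_rule(prefix: str, ia_days: int = 30, glacier_days: int = 90,
                   noncurrent_expire_days: int = 365) -> dict:
    """Standard -> Standard-IA -> Glacier, plus expiry of old object versions."""
    if ia_days < 30:
        # Limit noted above: objects must be at least 30 days old to move to Standard-IA
        raise ValueError("STANDARD_IA transitions need at least 30 days")
    if glacier_days <= ia_days:
        raise ValueError("the Glacier transition should come after the Standard-IA one")
    return {
        "ID": "archive-" + (prefix.strip("/") or "all"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        # Only matters on versioned buckets: permanently delete superseded versions
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_expire_days},
    }

print(json.dumps({"Rules": [lifecycle_rule("logs/")]}, indent=2))
```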
S3 - Storage tiers/classes
Property                            | Standard      | Standard - Infrequent Access | Reduced Redundancy Storage | Glacier
------------------------------------|---------------|------------------------------|----------------------------|------------------------
Durability                          | 99.999999999% | 99.999999999%                | 99.99%                     | 99.999999999%
Availability                        | 99.99%        | 99.9%                        | 99.99%                     | N/A
Concurrent facility fault tolerance | 2             | 2                            | 1                          | ?
SSL support                         | Yes           | Yes                          | Yes                        | ?
Minimum object size                 | N/A           | 128 KB                       | ?                          | N/A
Minimum storage duration            | N/A           | 30 days                      | ?                          | 90 days
Retrieval fee                       | N/A           | per GB retrieved             | ?                          | per GB retrieved
First byte latency                  | milliseconds  | milliseconds                 | milliseconds               | select minutes or hours
Lifecycle management policies       | Yes           | Yes                          | Yes                        | Yes
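The percentages in the table above are easier to compare when expressed as expected outcomes. The sketch below converts them. It uses `Decimal` to avoid floating-point rounding and assumes a 365-day year; the function names are hypothetical. At 99.99% availability the expected downtime is about 52.6 minutes a year (99.9% allows about 8.8 hours). Eleven nines of durability across 10,000,000 objects means an expected loss of 0.0001 objects a year, or roughly one object every 10,000 years.

```python
from decimal import Decimal

MINUTES_PER_YEAR = Decimal(365 * 24 * 60)  # 525,600; ignores leap years

def failure_fraction(pct: str) -> Decimal:
    """Complement of a percentage, e.g. "99.99" -> 0.0001."""
    return 1 - Decimal(pct) / 100

def nines(pct: str) -> int:
    """Number of nines, e.g. "99.99" -> 4 (exact for percentages made only of nines)."""
    return int(-failure_fraction(pct).log10())

def annual_downtime_minutes(availability_pct: str) -> Decimal:
    return failure_fraction(availability_pct) * MINUTES_PER_YEAR

def expected_annual_losses(durability_pct: str, objects: int) -> Decimal:
    return failure_fraction(durability_pct) * objects

print(nines("99.999999999"), annual_downtime_minutes("99.99"),
      expected_annual_losses("99.999999999", 10_000_000))
```

Passing the percentages as strings keeps `Decimal` exact; `Decimal(99.99)` built from a float would carry binary rounding error into the results.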


AWS Storage Gateway

The AWS Storage Gateway is a service connecting an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization's on-premises IT environment and AWS's storage infrastructure. The service enables you to securely store data to the AWS cloud for scalable and cost-effective storage.

The AWS Storage Gateway's software appliance is available for download as a virtual machine (VM) image that you install on a host in your datacentre. The Storage Gateway can run on VMware ESXi, on Microsoft Hyper-V, or as an Amazon EC2 instance. Once you have installed your gateway and associated it with your AWS account through the activation process, you can use the AWS Management Console to create the storage gateway option that is right for you.

Types of Storage Gateways:

  1. File Gateway (NFS). Store flat files in S3.
    Files are stored as objects in your S3 buckets, accessed through a Network File System (NFS) mount point. Ownership, permissions, and timestamps are durably stored in S3 in the user metadata of the object associated with the file. Once objects are transferred to S3, they can be managed as native S3 objects, and bucket features such as versioning, lifecycle management, and cross-region replication apply directly to objects stored in your bucket.
  2. Volume Gateway (iSCSI). A virtual hard disk.
    The volume interface presents your applications with disk volumes using the iSCSI block protocol.
    Data written to these volumes can be asynchronously backed up as point-in-time snapshots of your volumes, and stored in the cloud as AWS EBS snapshots.
    Snapshots are incremental backups that capture only changed blocks. All snapshot storage is also compressed to minimize your storage charges.
    • Stored Volumes
    • Cached Volumes
  3. Tape Gateway (VTL). Backup/archiving solution.
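The idea behind the Volume Gateway's incremental snapshots (capture only changed blocks and compress them) can be illustrated with a toy model. The model hashes fixed-size blocks, compares them against the previous snapshot's hashes, and keeps compressed copies of only the blocks that differ. This is a conceptual sketch with a hypothetical 4 KiB block size and hypothetical function names. It is not how Storage Gateway or EBS actually implement snapshots.

```python
import hashlib
import zlib
from typing import Dict, List

BLOCK_SIZE = 4096  # hypothetical block size for this toy model

def block_hashes(volume: bytes) -> List[str]:
    """SHA-256 digest of each fixed-size block of the volume."""
    return [hashlib.sha256(volume[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(volume), BLOCK_SIZE)]

def incremental_snapshot(volume: bytes, previous: List[str]) -> Dict[int, bytes]:
    """Compressed copies of only the blocks that differ from the previous snapshot."""
    changed = {}
    for index, digest in enumerate(block_hashes(volume)):
        if index >= len(previous) or previous[index] != digest:
            start = index * BLOCK_SIZE
            changed[index] = zlib.compress(volume[start:start + BLOCK_SIZE])
    return changed

disk = bytearray(16 * BLOCK_SIZE)              # a 64 KiB volume of zeros
first = incremental_snapshot(bytes(disk), [])  # no previous snapshot: all 16 blocks
baseline = block_hashes(bytes(disk))
disk[5 * BLOCK_SIZE] = 0xFF                    # change one byte in block 5
second = incremental_snapshot(bytes(disk), baseline)
print(len(first), sorted(second))
```

The first snapshot stores all 16 blocks; the second stores only block 5, and compression shrinks each mostly-zero block to a few bytes.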

External links