AWS/S3

From Christoph's Personal Wiki
Revision as of 00:11, 26 January 2017 by Christoph

Amazon Simple Storage Service (S3) provides developers and IT teams with secure, durable, highly scalable cloud storage. Amazon S3 is easy-to-use object storage with a simple web service interface for storing and retrieving any amount of data from anywhere on the web. With Amazon S3, you pay only for the storage you actually use. There is no minimum fee and no setup cost.

Features

  • Simple key-value store:
    • key = name of the object;
    • value = the actual data (made up of a sequence of bytes);
    • version ID (important for versioning); and
    • metadata (data about the data you are storing)
    • Sub-resources:
      • Access Control Lists (ACLs)
      • Torrent (support for the bittorrent protocol)
  • S3 bucket URL: https://s3-<region>.amazonaws.com/<bucket_name> (e.g., https://s3-us-west-1.amazonaws.com/foobar)
  • SLA:
    • availability: 99.99% (4 nines)
    • durability: 99.999999999% (11 nines)
  • Objects can be from 0 bytes to 5 TB in size (objects larger than 5 GB must be split into parts and uploaded with multipart upload)
    • Note that the largest object you can transfer to S3 in a single PUT operation is 5 GB
  • Unlimited storage
  • Files/objects are stored in "buckets"
  • S3 is a universal namespace (i.e., bucket names must be unique globally; think domain names)
  • Read-after-Write consistency for PUTs of new Objects
  • Eventual Consistency for overwrite PUTs and DELETEs (can take some time to propagate)
  • Lifecycle management
  • Versioning
  • Encryption (default Advanced Encryption Standard (AES) 256-bit)
    • In transit (SSL/TLS)
    • At Rest:
      • Server Side Encryption (SSE):
        • S3 Managed Keys (SSE-S3; 256-bit);
        • AWS Key Management Service, Managed Keys (SSE-KMS)
        • Server Side Encryption with Customer Provided Keys (SSE-C)
    • Client Side Encryption (the user encrypts data on their local machine and then uploads it to AWS S3)
  • Secure your data with Bucket Policies and ACLs
  • Storage tiers/classes:
    • S3 Standard (99.99% availability; durable, immediately available, frequently accessed): data is stored across multiple devices in multiple facilities, and the class is designed to sustain the loss of 2 facilities concurrently
    • S3 IA (Infrequent Access) (durable, immediately available, infrequently accessed): for data that is accessed less frequently but requires rapid access when needed. Lower storage fee than S3 Standard, but you are charged a retrieval fee
    • S3 Reduced Redundancy Storage (RRS): designed to provide 99.99% durability and 99.99% availability of objects over a given year (for objects where it is not critical if they are lost; e.g., image thumbnails, which can easily be regenerated). Concurrent facility fault tolerance = 1
    • Glacier: Very cheap, but used for archival only. It takes 3-5 hours to restore from Glacier.
  • Storage Gateways:
    • Gateway Stored Volumes (entire dataset is stored on site and is asynchronously backed up to S3)
    • Gateway Cached Volumes (entire dataset is stored on S3 and the most frequently accessed data is cached on site)
    • Gateway Virtual Tape Library (VTL) (used for backup; works with popular backup applications such as NetBackup, Backup Exec, Veeam, etc.)
  • Import/Export Disk:
    • Import to EBS
    • Import to S3
    • Import to Glacier
    • Export from S3
  • Import/Export Snowball (only available in North America)
    • Import to S3
    • Export from S3
  • S3 Static Website:
    • http://foobar.s3-website-us-west-2.amazonaws.com/index.html (a static website link)
    • https://s3-us-west-2.amazonaws.com/foobar/demo.jpeg (not a static website link)
    • Static websites are always HTTP (not HTTPS, for now)
  • S3 Cross Origin Resource Sharing (CORS)
    • One needs to enable CORS for one S3 bucket to reference objects in another S3 bucket
  • S3 Versioning
    • Stores all versions of an object (every write, as well as deletion of the object)
    • Great backup tool
    • Once enabled, versioning cannot be disabled, only suspended
    • Integrates with Lifecycle rules
    • Versioning's MFA (Multi-Factor Authentication) Delete capability can be used to provide an additional layer of security.
    • Cross Region Replication requires versioning to be enabled on both the source and destination buckets.
  • S3 Lifecycle Management
    • Transition objects to the Infrequent Access storage class (an object must be at least 30 days old, counted from initial upload, before it can transition; minimum object size 128 KB) or to the Glacier storage class after a configurable number of days.
    • Infrequent Access retrieval: ~milliseconds
    • Glacier retrieval: 3 - 5 hours
    • With Versioning: lifecycle rules can also transition or delete previous and current object versions
    • Lifecycle rules cannot transition objects to the Reduced Redundancy Storage class
  • CloudFront (a content delivery network, or CDN)
    • AWS Global CDN infrastructure
    • Edge Location: this is the location where content will be cached. This is different from an AWS Region/AZ (over 50 Edge Locations, as of April 2016)
    • Origin: This is the origin of all the files/objects that the CDN will distribute. This can be an S3 Bucket, an EC2 Instance, an Elastic Load Balancer, or Route53.
      • Possible to have multiple origin paths in the same distribution
    • Distribution: This is the name given to the CDN, which consists of a collection of Edge Locations.
      • Web Distribution: Typically used for websites.
      • RTMP: Used for media streaming (e.g., Adobe Flash)
    • Edge locations are not read-only; one can also write to them (i.e., PUT an object to them)
    • Objects are cached for the life of the TTL (Time To Live)
    • One can clear the cache of an object stored on an Edge Location before the TTL expires, but one will be charged for that service.
  • S3 Transfer Acceleration:
    • Enables fast, easy, and secure transfers of files over long distances between your end users and an S3 bucket. Transfer Acceleration takes advantage of CloudFront's globally distributed edge locations. As the data arrives at an edge location, it is routed to S3 over an optimized network path.
    • Instead of uploading an object directly to an S3 bucket in a given region, one can upload the object to the nearest Edge Location, and AWS will then transfer it to the S3 bucket.
    • Example URL: <bucket_name>.s3-accelerate.amazonaws.com
    • See: Amazon S3 Transfer Acceleration - Speed Comparison
  • S3 Pricing (i.e., one is charged for the following):
    • Storage
    • Requests
    • Storage Management Pricing
    • Data Transfer Pricing (e.g., replication)
    • Transfer Acceleration
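
The size limits listed under Features (5 GB per single PUT, 5 TB per object) mean that large objects have to be cut into parts before upload. The following is a minimal sketch of that splitting step only; it makes no AWS calls. The function name `plan_parts`, the default part size of 1 GB, and the reading of "GB"/"TB" as binary units are assumptions made for illustration, and the sketch does not model any other service limits.

```python
from typing import List, Tuple

GB = 1024 ** 3  # assumption: the page's "GB" is treated as a binary gigabyte
TB = 1024 ** 4

MAX_SINGLE_PUT = 5 * GB   # largest single PUT, per the Features list
MAX_OBJECT_SIZE = 5 * TB  # largest object, per the Features list


def plan_parts(size: int, part_size: int = 1 * GB) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples that cover `size` bytes.

    Hypothetical helper: it only computes byte ranges and uploads nothing.
    """
    if size < 0 or size > MAX_OBJECT_SIZE:
        raise ValueError("object size must be non-negative and at most 5 TB")
    if not 0 < part_size <= MAX_SINGLE_PUT:
        raise ValueError("part size must be positive and at most 5 GB")

    # Small enough for one PUT: no splitting needed.
    if size <= MAX_SINGLE_PUT:
        return [(1, 0, size)]

    parts = []
    offset = 0
    number = 1
    while offset < size:
        # The last part may be shorter than part_size.
        length = min(part_size, size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

For example, `plan_parts(12 * GB, part_size=5 * GB)` yields three parts of 5 GB, 5 GB, and 2 GB, while any object of 5 GB or less comes back as a single part.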

S3 - Storage tiers/classes

Storage class                       | Standard      | Standard - Infrequent Access | Reduced Redundancy Storage | Glacier
Durability                          | 99.999999999% | 99.999999999%                | 99.99%                     | 99.999999999%
Availability                        | 99.99%        | 99.9%                        | 99.99%                     | N/A
Concurrent facility fault tolerance | 2             | 2                            | 1                          | ?
SSL support                         | Yes           | Yes                          | Yes                        | ?
Minimum object size                 | N/A           | 128 KB                       | ?                          | N/A
Minimum storage duration            | N/A           | 30 days                      | ?                          | 90 days
Retrieval fee                       | N/A           | per GB retrieved             | ?                          | per GB retrieved
First byte latency                  | milliseconds  | milliseconds                 | milliseconds               | select minutes or hours
Lifecycle management policies       | Yes           | Yes                          | Yes                        | Yes
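
To make the percentages in the table above more concrete, the sketch below converts an availability figure into the downtime it permits per year and a durability figure into an expected number of objects lost per year. It is a rough back-of-the-envelope calculation, assuming the percentages can be read as simple annual average rates; the function names are hypothetical and not part of any AWS tooling.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def max_downtime_minutes(availability_percent: str) -> float:
    """Minutes per year a service may be unavailable at the given availability."""
    # Fraction parses the decimal string exactly, avoiding float rounding.
    unavailable = 1 - Fraction(availability_percent) / 100
    return float(unavailable * MINUTES_PER_YEAR)


def expected_annual_losses(durability_percent: str, object_count: int) -> float:
    """Expected number of objects lost per year at the given durability."""
    loss_rate = 1 - Fraction(durability_percent) / 100
    return float(loss_rate * object_count)


if __name__ == "__main__":
    print(max_downtime_minutes("99.99"))                        # Standard, RRS
    print(max_downtime_minutes("99.9"))                         # Standard - IA
    print(expected_annual_losses("99.999999999", 10_000_000))   # Standard
    print(expected_annual_losses("99.99", 10_000))              # RRS
```

Under these assumptions, 99.99% availability allows roughly 52.56 minutes of downtime per year and 99.9% allows roughly 525.6 minutes. At 99.999999999% durability, storing 10,000,000 objects gives an expected loss of about 0.0001 objects per year, whereas at the 99.99% durability of Reduced Redundancy Storage, storing 10,000 objects gives an expected loss of about one object per year.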

