AWS/S3

From Christoph's Personal Wiki
Revision as of 22:30, 30 January 2017 by Christoph (AWS Snowball)

Amazon Simple Storage Service (S3) provides developers and IT teams with secure, durable, highly scalable cloud storage. Amazon S3 is easy-to-use object storage, with a simple web service interface to store and retrieve any amount of data from anywhere on the web. With Amazon S3, you pay only for the storage you actually use. There is no minimum fee and no setup cost.

Features

  • Simple key-value store:
    • key = name of the object;
    • value = the actual data (made up of a sequence of bytes);
    • version ID (important for versioning); and
    • metadata (data about the data you are storing)
    • Sub-resources:
      • Access Control Lists (ACLs)
      • Torrent (support for the bittorrent protocol)
  • S3 bucket URL: https://s3-<region>.amazonaws.com/<bucket_name> (e.g., https://s3-us-west-1.amazonaws.com/foobar)
  • SLA:
    • availability: 99.99% (four nines)
    • durability: 99.999999999% (11 nines)
  • Files can be from 0 bytes to 5 TB in size (files larger than 5 GB must be split into parts and uploaded with multipart upload)
    • Note that the largest file you can transfer to S3 in a single PUT operation is 5 GB
  • Unlimited storage
  • Files/objects are stored in "buckets"
  • S3 is a universal namespace (i.e., bucket names must be unique globally; think domain names)
  • Read-after-Write consistency for PUTs of new Objects
  • Eventual Consistency for overwrite PUTs and DELETEs (can take some time to propagate)
  • Lifecycle management
  • Versioning
  • Encryption (default Advanced Encryption Standard (AES) 256-bit)
    • In transit (SSL/TLS)
    • At Rest:
      • Server Side Encryption (SSE):
        • S3 Managed Keys (SSE-S3; 256-bit);
        • AWS Key Management Service, Managed Keys (SSE-KMS)
        • Server Side Encryption with Customer Provided Keys (SSE-C)
    • Client Side Encryption (the user encrypts data on their local machine and then uploads it to AWS S3)
  • Secure your data with Bucket Policies and ACLs
  • Storage tiers/classes:
    • S3 Standard (99.99% availability; durable, immediately available, frequently accessed): data is stored across multiple devices in multiple facilities and is designed to sustain the concurrent loss of 2 facilities
    • S3 IA (Infrequently Accessed) (durable, immediately available, infrequently accessed): for data that is accessed less frequently, but requires rapid access when needed. Lower storage fee than S3 Standard, but you are charged a retrieval fee
    • S3 Reduced Redundancy Storage (RRS): designed to provide 99.99% durability and 99.99% availability of objects over a given year (for objects where loss is not critical; e.g., image thumbnails, which can be easily regenerated). Concurrent facility fault tolerance = 1
    • Glacier: Very cheap, but used for archival only. It takes 3-5 hours to restore from Glacier.
  • Storage Gateways:
    • Gateway Stored Volumes (entire dataset is stored on site and is asynchronously backed up to S3)
    • Gateway Cached Volumes (entire dataset is stored on S3 and the most frequently accessed data is cached on site)
    • Gateway Virtual Tape Library (VTL) (used for backup; works with popular backup applications like NetBackup, Backup Exec, Veeam, etc.)
  • Import/Export Disk:
    • Import to EBS
    • Import to S3
    • Import to Glacier
    • Export from S3
  • Import/Export Snowball (only available in North America)
    • Import to S3
    • Export from S3
  • S3 - Securing your buckets:
    • By default, all newly created buckets are private
    • One can set up access control for one's buckets using:
      • Bucket Policies
      • Access Control Lists (can apply to individual objects)
    • S3 buckets can be configured to create access logs, which log all requests made to the bucket
  • S3 - Security & Encryption
    • Encryption types:
      • In transit
        • SSL/TLS
      • Data at rest
        • Server-side encryption
          • S3 Managed Keys (SSE-S3; AES 256-bit)
          • AWS Key Management Service, Managed Keys (SSE-KMS). Provides an audit trail
          • Server-side encryption with customer provided keys (SSE-C)
      • Client-side encryption
  • S3 Static Website:
    • http://foobar.s3-website-us-west-2.amazonaws.com/index.html (a static website link)
    • https://s3-us-west-2.amazonaws.com/foobar/demo.jpeg (not a static website link)
    • Static websites are always HTTP (not HTTPS, for now)
  • S3 Cross Origin Resource Sharing (CORS)
    • CORS must be enabled on an S3 bucket before web content served from another S3 bucket (a different origin) can reference its objects
  • S3 - Versioning
    • Stores all versions of an object (every write is kept as a new version, and a delete is recorded as well rather than erasing the object)
    • Great backup tool
    • Once enabled, versioning cannot be disabled, only suspended
    • Integrates with Lifecycle rules
    • Versioning's MFA (Multi-Factor Authentication) Delete capability can be used to provide an additional layer of security.
    • Cross Region Replication requires versioning to be enabled on the source bucket.
  • S3 - Cross Region Replication
    • Versioning must be enabled on both the source and destination buckets.
    • The source and destination buckets must be in different regions
    • Files in an existing bucket are not replicated automatically. All subsequent updated (or new) files will be replicated automatically (including all versions of the object).
    • As of January 2017, you cannot replicate to multiple buckets or use daisy chaining.
    • Delete markers are replicated.
    • Deleting individual versions or delete markers will not be replicated.
  • S3 - Lifecycle Management
    • Transition objects to the Infrequent Access Storage Class (an object must be at least 30 days old, counted from initial upload, and at least 128 KB in size) or to the Glacier Storage Class after a set number of days.
    • Infrequent Access retrieval: ~milliseconds
    • Glacier retrieval: 3 - 5 hours
    • With Versioning: Transition object versions as well (including deleting old/current versions)
    • Cannot use the Reduced Redundancy Storage Class with Lifecycle Management
    • Ability to permanently delete
    • As of January 2017, Glacier is not available for the Singapore and São Paulo regions
  • CloudFront (a content delivery network (CDN))
    • AWS Global CDN infrastructure
    • Edge Location: this is the location where content will be cached. This is different from an AWS Region/AZ (over 50 Edge Locations, as of April 2016)
    • Origin: This is the origin of all the files/objects that the CDN will distribute. This can be an S3 Bucket, an EC2 Instance, an Elastic Load Balancer, or Route53.
      • Possible to have multiple origin paths in the same distribution
    • Distribution: This is the name given to the CDN, which consists of a collection of Edge Locations.
    • Edge Locations are not just read-only; one can write to them as well (i.e., PUT an object to them)
    • Objects are cached for the life of the TTL (Time To Live)
    • One can clear the cache of an object stored on an Edge Location before the TTL expires, but one will be charged for that service. Create an "invalidation" request to clear the cache of a given object.
    • Restrict viewer access by using either Signed URLs or Signed Cookies (e.g., only allow paying users to view your content)
    • Allows for geo-restricting access (i.e., whitelist or blacklist countries)
  • S3 Transfer Acceleration:
    • Enables fast, easy, and secure transfers of files over long distances between your end users and an S3 bucket. Transfer Acceleration takes advantage of CloudFront's globally distributed edge locations. As the data arrives at an edge location, it is routed to S3 over an optimized network path.
    • Instead of uploading an object directly to an S3 bucket in a given region, one can upload the object to the nearest Edge Location and AWS will then transfer it to the S3 bucket.
    • Example URL: <bucket_name>.s3-accelerate.amazonaws.com
    • See: Amazon S3 Transfer Acceleration - Speed Comparison
  • S3 Pricing (i.e., one is charged for the following):
    • Storage
    • Requests
    • Storage Management Pricing
    • Data Transfer Pricing (e.g., replication)
    • Transfer Acceleration
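
The lifecycle rules noted above (a minimum of 30 days and 128 KB before an object can move to Infrequent Access, then Glacier after a further period) can be sketched as a small planning function. This is only an illustration of the rules as written in these notes, not AWS's implementation: StoredObject and plan_storage_class are hypothetical names, the 90-day Glacier default is an arbitrary example value, and only the storage class strings (STANDARD, STANDARD_IA, GLACIER) are taken from the S3 API.

```python
from dataclasses import dataclass

# Thresholds taken from the lifecycle notes above.
MIN_IA_AGE_DAYS = 30            # minimum days since upload before moving to IA
MIN_IA_SIZE_BYTES = 128 * 1024  # minimum object size for IA (128 KB)


@dataclass(frozen=True)
class StoredObject:
    """Hypothetical stand-in for an S3 object; not an AWS SDK type."""
    key: str
    size_bytes: int
    age_days: int


def plan_storage_class(obj, ia_after_days=30, glacier_after_days=90):
    """Return the storage class a simple two-step rule would assign to obj.

    glacier_after_days=90 is an example value, not an AWS default.
    """
    if ia_after_days < MIN_IA_AGE_DAYS:
        raise ValueError("IA transitions need at least 30 days since upload")
    if glacier_after_days <= ia_after_days:
        raise ValueError("the Glacier step must come after the IA step")
    if obj.age_days >= glacier_after_days:
        return "GLACIER"
    if obj.age_days >= ia_after_days and obj.size_bytes >= MIN_IA_SIZE_BYTES:
        return "STANDARD_IA"
    # Too young, or too small for IA: the object stays where it is.
    return "STANDARD"


if __name__ == "__main__":
    examples = [
        StoredObject("logs/app.log", 512 * 1024, 45),
        StoredObject("thumbs/cat.png", 4 * 1024, 45),
        StoredObject("backups/db.dump", 10 * 1024**3, 120),
    ]
    for obj in examples:
        print(obj.key, "->", plan_storage_class(obj))
```

In this sketch the 4 KB thumbnail stays in STANDARD at 45 days because it is under the 128 KB minimum, while the 120-day-old backup goes to GLACIER regardless of size.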
S3 - Storage tiers/classes
Attribute                           | Standard      | Standard - Infrequent Access | Reduced Redundancy Storage | Glacier
Durability                          | 99.999999999% | 99.999999999%                | 99.99%                     | 99.999999999%
Availability                        | 99.99%        | 99.9%                        | 99.99%                     | N/A
Concurrent facility fault tolerance | 2             | 2                            | 1                          | ?
SSL support                         | Yes           | Yes                          | Yes                        | ?
Minimum object size                 | N/A           | 128 KB                       | ?                          | N/A
Minimum storage duration            | N/A           | 30 days                      | ?                          | 90 days
Retrieval fee                       | N/A           | per GB retrieved             | ?                          | per GB retrieved
First byte latency                  | milliseconds  | milliseconds                 | milliseconds               | select minutes or hours
Lifecycle management policies       | Yes           | Yes                          | Yes                        | Yes
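
To make the availability and durability percentages above more concrete, the following sketch converts them into downtime per year and expected object loss per year. It is plain arithmetic on the published percentages, assuming a 365-day year and treating durability as an average annual loss rate; the function names are made up for illustration. Exact fractions are used so that values such as 99.999999999% are not distorted by floating-point rounding.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def complement(percent: str) -> Fraction:
    """Turn a percentage such as "99.99" into the exact remaining fraction."""
    return 1 - Fraction(percent) / 100


def downtime_minutes_per_year(availability_percent: str) -> float:
    """Minutes per year a service may be unavailable at this availability."""
    return float(complement(availability_percent) * MINUTES_PER_YEAR)


def expected_objects_lost_per_year(durability_percent: str, object_count: int) -> float:
    """Average number of objects lost per year at this durability."""
    return float(complement(durability_percent) * object_count)


if __name__ == "__main__":
    for tier, availability in [("Standard", "99.99"), ("Standard - IA", "99.9")]:
        minutes = downtime_minutes_per_year(availability)
        print(f"{tier}: about {minutes:.1f} minutes of downtime per year")
    for tier, durability in [("Standard", "99.999999999"), ("RRS", "99.99")]:
        lost = expected_objects_lost_per_year(durability, 10_000_000)
        print(f"{tier}: about {lost:g} of 10,000,000 objects lost per year")
```

Under these assumptions, 99.99% availability allows roughly 53 minutes of downtime per year and 99.9% allows roughly 8.8 hours, while 11 nines of durability works out to an average of one lost object per 10,000 years for every 10,000,000 objects stored.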


AWS Storage Gateway

The AWS Storage Gateway is a service connecting an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization's on-premises IT environment and AWS's storage infrastructure. The service enables you to securely store data to the AWS cloud for scalable and cost-effective storage.

The AWS Storage Gateway's software appliance is available for download as a virtual machine (VM) image that you install on a host in your datacentre. The Storage Gateway supports the VMware ESXi Hypervisor, the Microsoft Hyper-V Hypervisor, or an EC2 instance. Once you have installed your gateway and associated it with your AWS account through the activation process, you can use the AWS Management Console to create the storage gateway option that is right for you.

Types of Storage Gateways:

  1. File Gateway (NFS). Store flat files in S3.
    Files are stored as objects in your S3 buckets, accessed through a Network File System (NFS) mount point. Ownership, permissions, and timestamps are durably stored in S3 in the user-metadata of the object associated with the file. Once objects are transferred to S3, they can be managed as native S3 objects, and bucket policies such as versioning, lifecycle management, and cross-region replication apply directly to objects stored in your bucket.
  2. Volume Gateway (iSCSI). A virtual HDD.
    The volume interface presents your applications with disk volumes using the iSCSI block protocol.
    Data written to these volumes can be asynchronously backed up as point-in-time snapshots of your volumes, and stored in the cloud as AWS EBS snapshots.
    Snapshots are incremental backups that capture only changed blocks. All snapshot storage is also compressed to minimize your storage charges.
    • Stored Volumes: These let you store your primary data locally, while asynchronously backing up that data to AWS. Stored Volumes provide your on-premises applications with low-latency access to their entire datasets, while providing durable, off-site backups. You can create storage volumes and mount them as iSCSI devices from your on-premises application servers. Data written to your stored volumes is stored on your on-premises storage hardware. This data is asynchronously backed up to S3 in the form of Amazon Elastic Block Store (EBS) snapshots. 1 GB - 16 TB in size for Stored Volumes.
    • Cached Volumes: These let you use S3 as your primary data storage while retaining frequently accessed data locally in your storage gateway. Cached volumes minimize the need to scale your on-premises storage infrastructure, while still providing your applications with low-latency access to their frequently accessed data. You can create storage volumes up to 32 TiB in size and attach to them as iSCSI devices from your on-premises application servers. Your gateway stores data that you write to these volumes in S3 and retains recently read data in your on-premises storage gateway's cache and upload buffer storage. 1 GB - 32 TB in size for Cached Volumes.
  3. Tape Gateway (aka Gateway Virtual Tape Library). Backup/archiving solution.
    Offers a durable, cost-effective solution to archive your data in the AWS Cloud. The Virtual Tape Library (VTL) interface it provides lets you leverage your existing tape-based backup application infrastructure to store data on virtual tape cartridges that you create on your tape gateway. Each tape gateway is pre-configured with a media changer and tape drives, which are available to your existing client backup applications as iSCSI devices. You add tape cartridges as you need to archive your data. Supported by NetBackup, Backup Exec, Veeam, etc. Virtual tapes are backed by Amazon Glacier.
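
The per-volume size ranges quoted above for Volume Gateways (1 GB to 16 TB for Stored Volumes, 1 GB to 32 TB for Cached Volumes) can be captured in a small check when planning volumes. This is a minimal sketch that only restates the limits from these notes: the names are hypothetical, nothing here calls AWS, and it assumes binary units (GiB/TiB), since the notes mix TB and TiB.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

# (minimum, maximum) volume size in bytes, as quoted in the notes above.
# Interpreting GB/TB as GiB/TiB is an assumption.
VOLUME_LIMITS = {
    "stored": (1 * GIB, 16 * TIB),
    "cached": (1 * GIB, 32 * TIB),
}


def volume_size_allowed(volume_type: str, size_bytes: int) -> bool:
    """Return True if size_bytes fits the quoted range for volume_type."""
    try:
        smallest, largest = VOLUME_LIMITS[volume_type]
    except KeyError:
        raise ValueError(f"unknown volume type: {volume_type!r}") from None
    return smallest <= size_bytes <= largest


if __name__ == "__main__":
    for volume_type, size in [("stored", 20 * TIB), ("cached", 20 * TIB)]:
        verdict = "ok" if volume_size_allowed(volume_type, size) else "too large or too small"
        print(f"{volume_type} volume of {size // TIB} TiB: {verdict}")
```

A 20 TiB volume, for example, fits the Cached Volume range but exceeds the Stored Volume maximum.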

AWS Snowball

Prior to AWS Snowball, there was AWS Import/Export Disk, which accelerated the moving of large amounts of data into and out of the AWS Cloud using portable storage devices for transport. AWS Import/Export Disk transfers your data directly onto and off of storage devices using Amazon's high-speed internal network and bypassing the Internet.

AWS Snowball was introduced at re:Invent 2015.

Types of Snowballs:

  • Snowball (re:Invent 2015)
    • petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS S3. Using Snowball addresses common challenges with large-scale data transfers, including high network costs, long transfer times, and security concerns. Transferring data with Snowball is simple, fast, secure, and can be as little as one-fifth the cost of high-speed Internet.
    • 80 TB Snowball in all regions. Snowball uses multiple layers of security designed to protect your data, including tamper-resistant enclosures, 256-bit encryption, and an industry-standard Trusted Platform Module (TPM) designed to ensure both security and full chain-of-custody of your data. Once the data transfer job has been processed and verified, AWS performs a software erasure of the Snowball appliance.
  • Snowball Edge (re:Invent 2016)
    • 100 TB data transfer device with on-board storage and compute capabilities (e.g., Lambda functions). You can use Snowball Edge to move large amounts of data into and out of AWS, as a temporary storage tier for large local datasets, or to support local workloads in remote or offline locations.
    • Snowball Edge connects to your existing appliances and infrastructure using standard storage interfaces, streamlining the data transfer process and minimizing setup and integration. Snowball Edge can cluster together to form a local storage tier and process your data on-premise, helping to ensure your applications continue to run even when they are not able to access the Cloud.
  • Snowmobile (re:Invent 2016)
    • Exabyte-scale data transfer service used to move extremely large amounts of data to AWS. You can transfer up to 100 PB per Snowmobile, a 45-foot-long ruggedized shipping container pulled by a semi-trailer truck. Snowmobile makes it easy to move massive amounts of data to the Cloud, including video libraries, image repositories, or even a complete data centre migration. Transferring data with Snowmobiles is secure, fast, and cost effective.
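
To see why shipping an appliance can beat the network for the capacities listed above, a rough estimate of the time needed to push the same data over an Internet link helps. The sketch below is back-of-the-envelope arithmetic only: the function name is made up, decimal units are assumed (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second), and the default 80% link utilization is an assumption, not a measured figure.

```python
SECONDS_PER_DAY = 86_400


def network_transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate days needed to send data_tb terabytes over a link_mbps link.

    utilization is the assumed fraction of the link actually available
    for the transfer (0.8 is an illustrative default).
    """
    if data_tb < 0 or link_mbps <= 0:
        raise ValueError("data size must be >= 0 and link speed must be > 0")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    total_bits = data_tb * 1e12 * 8
    effective_bits_per_second = link_mbps * 1e6 * utilization
    return total_bits / effective_bits_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # One 80 TB Snowball's worth of data over common link speeds.
    for mbps in (100, 1_000, 10_000):
        days = network_transfer_days(80, mbps)
        print(f"80 TB over {mbps} Mbps: about {days:.1f} days")
    # One 100 PB Snowmobile's worth of data over a 10 Gbps link.
    years = network_transfer_days(100_000, 10_000) / 365
    print(f"100 PB over 10 Gbps: about {years:.1f} years")
```

Under these assumptions, 80 TB takes roughly three months on a 100 Mbps link and a little over nine days on a 1 Gbps link, and 100 PB would take several years even at 10 Gbps.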
