Difference between revisions of "Google Cloud Platform"

From Christoph's Personal Wiki
Revision as of 21:38, 11 January 2019

Google Cloud Platform (GCP) is a cloud computing service by Google that offers hosting on the same supporting infrastructure that Google uses internally for end-user products like Gmail, Google Search, Maps, and YouTube.

Overview

  • Google Cloud Platform (GCP) high-level view:
    • Compute
      • App Engine
      • Kubernetes Engine (see Kubernetes)
      • Compute Engine
    • Storage
      • BigTable
      • Cloud Storage
      • Cloud SQL
      • Cloud Datastore
    • Networking
      • VPCs
      • Load Balancers
      • Cloud DNS
      • Cloud CDN
    • Stackdriver
      • Monitoring
      • Logging
    • Big Data
      • BigQuery
      • Pub/Sub
      • Dataflow
      • Dataproc
      • Datalab
    • Artificial Intelligence
      • Vision API
      • Speech API
      • Translate API
      • Machine Learning
Examples
  • Google Compute Engine – IaaS service providing virtual machines, similar to Amazon EC2.
  • Google App Engine – PaaS service for directly hosting applications, similar to AWS Elastic Beanstalk.
  • Bigtable – managed NoSQL wide-column database service, similar to Apache HBase.
  • BigQuery – fully managed columnar data warehouse, similar to Amazon Redshift.
  • Google Cloud Functions – FaaS service that lets functions be triggered by events without the developer managing resources, similar to AWS Lambda or IBM OpenWhisk.
Cloud Regions and Zones
  • Region
    • A Region is a specific geographical location where you can run your resources
    • It is a collection of zones
    • Regional resources are available to resources in any zone in the region
    • New regions are added frequently
  • Zone
    • Zones are isolated physical locations within a region
    • Zonal resources are only available in that zone
    • Machines in different zones have no single point of failure

An effective disaster recovery plan would have assets deployed across multiple zones, or even different regions.
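
As a minimal sketch of the region/zone relationship, the snippet below derives a region from a zone name and checks whether a set of zones spans more than one region. It assumes the usual naming pattern in which a zone name is the region name plus a hyphen and a short suffix (e.g., us-west1-b belongs to us-west1); the function names and the deployment list are hypothetical.

```python
def region_of(zone: str) -> str:
    """Return the region part of a zone name such as 'us-west1-b'."""
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def spans_multiple_zones(zones) -> bool:
    """True if the deployment uses more than one distinct zone."""
    return len(set(zones)) > 1


def spans_multiple_regions(zones) -> bool:
    """True if the zones belong to more than one distinct region."""
    return len({region_of(z) for z in zones}) > 1


# Hypothetical deployment used only for illustration.
deployment = ["us-west1-a", "us-west1-b", "europe-west1-b"]
print(region_of("us-west1-b"))
print(spans_multiple_zones(deployment), spans_multiple_regions(deployment))
```

A check like this could be run against an inventory of instances to confirm that a disaster recovery layout really is spread across zones or regions.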

Standards, regulations, and certifications
GCP Certifications

Main

Cloud Resource Hierarchy
  • Provides a hierarchy of ownership
    • Identity and Access Management (IAM)
  • Provides "attach" points and inheritance for access control and organization policies
  • Hierarchy overview
    • Organization (not applicable to individual accounts)
    • Projects
    • Resources
Projects
  • Core organizational component of GCP
  • Controls access to resources (who has access to what)
  • Projects are where you create, enable, and use all GCP services
    • Per project basis
    • Permissions
    • Billing
    • APIs
    • Etc.
  • Projects have three identifying attributes:
    1. Project Name (user-friendly name)
    2. Project ID (aka Application ID; must be unique across GCP)
    3. Project Number (used in various places for identifying resources that belong to specific projects. For example, service account access names)

Identity and Access Management (IAM)

  • Who can do what on which resource
    • Members (who) are granted permissions and roles (what) to GCP services (resource) using the principle of least privilege
  • IAM -> Policy -> Roles + Identities
  • See predefined roles
  • Members (the "who")
    • Can be either a person or a service account
    • People via:
      • Google account
      • Google group (e.g., dev.team@thecompany.com)
      • G Suite Domain
      • Cloud Identity (organization domain that is not a Google domain/account)
    • Service account
      • Special type of Google account that belongs to your application, not an end user
      • Identity for carrying out server-to-server interactions in a project (e.g., a back-end application on a local server writing data to Cloud Storage)
      • Identified with an email address:
        • <project_number>@developer.gserviceaccount.com
        • <project_id>@developer.gserviceaccount.com
      • Application access
  • Roles (the "what")
    • A collection of permissions to give access to a given resource
    • Permissions are represented as: <service>.<resource>.<verb> (e.g., compute.instances.delete)
    • Permissions vs. roles
      • Users are not directly assigned permissions, but are assigned roles, which contain a collection of permissions:
      Role                 List of permissions

                           compute.instances.delete
                           compute.instances.get
compute.instanceAdmin ---> compute.instances.list
                           compute.instances.setMachineType
                           compute.instances.start
                           compute.instances.stop
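
The role-to-permissions relationship in the diagram can be sketched in a few lines of Python. This is an illustration only: the ROLE_PERMISSIONS table contains just the permissions listed above (not the full contents of the real role), and parse_permission and is_allowed are hypothetical helper names.

```python
# Illustrative table copied from the diagram above; the real role
# contains more permissions than are listed here.
ROLE_PERMISSIONS = {
    "compute.instanceAdmin": {
        "compute.instances.delete",
        "compute.instances.get",
        "compute.instances.list",
        "compute.instances.setMachineType",
        "compute.instances.start",
        "compute.instances.stop",
    },
}


def parse_permission(permission):
    """Split '<service>.<resource>.<verb>' into its three parts."""
    service, resource, verb = permission.split(".")
    return service, resource, verb


def is_allowed(roles, permission):
    """A member holds a permission only through one of its roles."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


print(parse_permission("compute.instances.delete"))
print(is_allowed(["compute.instanceAdmin"], "compute.instances.stop"))
```

The point of the sketch is that the check always goes member -> role -> permission; there is no direct member -> permission assignment.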
Primitive vs. Predefined (aka curated) Roles
  • Primitive Roles
    • Historically available GCP roles before Cloud IAM was implemented
    • Applied at Project-level
    • Broad roles:
    1. Viewer: read only actions that preserve state (i.e., cannot make changes)
    2. Editor: same as above + can modify state
    3. Owner: same as above + can manage access to project and all project resources + can set up project billing
  • When to choose Primitive Roles
    • When the GCP service does not provide a predefined role
    • When you only need broad permissions for a project
    • When you want to allow a member to modify permissions for a project
    • When you work in a small team where the team members do not need granular permissions
  • Predefined ("Curated") Roles
    • Provides much more granular access (e.g., prevent unwanted access to other resources)
    • Granted at the resource-level
    • Example: App Engine Admin (full access to only App Engine resources)
    • Multiple predefined roles can be given to individual users
IAM Policy
  • A collection of statements that define who has what type of access
  • A full list of roles granted to a member for a resource
  • IAM Policy hierarchy
    • Resource access is organized hierarchically, from the Organization down to the Resource(s)
    • Organization -> Project -> Resource(s) — parent/child format
    • Each child has exactly one parent
    • Children inherit parent roles
    • Policies are additive: access granted by a parent policy cannot be taken away by a more restrictive child policy
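
The inheritance rule can be illustrated with a small model in which the effective roles at a node are the union of the roles granted at that node and at every ancestor. This is a simplified sketch, not how Cloud IAM is implemented; the Node class, the member address, and the resource names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One level of the hierarchy: organization, project, or resource."""
    name: str
    parent: Optional["Node"] = None
    bindings: dict = field(default_factory=dict)  # member -> set of roles

    def effective_roles(self, member):
        """Union of roles granted here and on every ancestor."""
        roles = set()
        node = self
        while node is not None:
            roles |= node.bindings.get(member, set())
            node = node.parent
        return roles


# Hypothetical hierarchy: the child grants less than the parent does.
org = Node("example-org", bindings={"alice@example.com": {"roles/editor"}})
project = Node("dev-project-01", parent=org,
               bindings={"alice@example.com": {"roles/viewer"}})
bucket = Node("my-bucket", parent=project)

print(sorted(bucket.effective_roles("alice@example.com")))
```

Even though the project only grants the viewer role, the editor role granted at the organization level still applies at the project and the bucket.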

Interacting with GCP

  • There are three methods of interacting with GCP:
    1. Cloud console (web interface)
    2. Google Cloud SDK
    3. RESTful API
Google Cloud SDK
  • Command line interface (CLI) tools for managing resources and applications on GCP
  • Includes:
    • gcloud — many common GCP tasks
    • gsutil — interact with Cloud Storage
    • bq — interact with data in BigQuery
  • Can also be installed locally as a Docker image or run from within Cloud Shell (via the UI)
  • install
Cloud Shell
  • Interactive web-based shell environment for GCP, accessed from a web console
  • Easy to manage resources without having to install the Google Cloud SDK locally.
  • Includes:
    • A temporary Compute Engine virtual machine/instance
    • CLI access to the instance from a web browser
    • 5 GB of persistent disk storage
    • Pre-installed Google Cloud SDK and other tools
    • Language support for: Python, Go, Node.js, PHP, Ruby, and Java
    • Web preview functionality (especially useful for App Engine)
    • Built-in authorization for access to GCP projects and resources
  • Limitations
    • 1-hour timeout for inactivity
      • Machine will terminate/self-delete
      • $HOME directory contents will be preserved for a new session
    • Direct interactive use only
      • Not for running high computational/network workloads
      • If in violation of GCP terms of use, session can be terminated without notice
    • For long periods of inactivity, home disk may be recycled (with advance notice via email)
      • If you need longer inactive period, consider either locally installed SDK or use Cloud Storage for long-term storage
RESTful APIs
  • "Intended for software developers" ~ Google
  • Programmatic access to GCP resources
    • Typically uses JSON as an interchange format
    • Uses OAuth 2.0 for authentication and authorization
  • Enabled via the GCP Console
  • Most APIs have daily quotas, which can be increased upon request (to Google Support)
  • You can experiment via APIs Explorer

Google Compute

There are multiple Compute options in GCP for hosting your applications, where "option" is the method of hosting:

  • Google Compute Engine (GCE)
  • Google Container Engine (deprecated name; renamed to Google Kubernetes Engine)
  • Google Kubernetes Engine (GKE)
  • Google App Engine (GAE)
  • Google Cloud Functions

In the above list, each option is ordered from "highly customizable" (GCE) to "highly managed" (Google Cloud Functions).

Each option can take advantage of the rest of the GCP services. E.g.,

  • Storage
  • Networking
  • Big Data
  • Security

Google Compute Engine (GCE)

Google Compute Engine (GCE)
  • Infrastructure as a Service (IaaS)
  • Virtual Machines (VMs), aka "instances"
Connecting to a VM
$ gcloud compute --project "my-project-123456" ssh --zone "us-west1-b" "my-vm"

The above command will create SSH keys (stored in ~/.ssh by default). After that, you can use the private key to SSH into your VM:

$ ssh -i ~/.ssh/google_compute_engine username@x.x.x.x
Compute Engine Metadata
  • The project metadata URL is: http://metadata.google.internal/computeMetadata/v1/project/
  • The instance metadata URL is: http://metadata.google.internal/computeMetadata/v1/instance/

Each URL returns a list of entries, each of which can be appended to the URL to query that entry. Project metadata contains project-wide information (e.g., the project ID). Instance metadata contains information on disks, hostname, machine type, etc.

You can also set your own (custom) metadata values and read them from code running on the VM.

$ curl -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"
{"access_token":"aa00...","expires_in":3599,"token_type":"Bearer"}

$ curl -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/project/"
attributes/
numeric-project-id
project-id
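
The same requests can be made from code running on the VM. The sketch below uses only the Python standard library and mirrors the curl calls above (same base URL and the same Metadata-Flavor header); metadata_request and read_metadata are hypothetical helper names, and read_metadata only works from inside a Compute Engine instance.

```python
import urllib.request

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1/"


def metadata_request(path):
    """Build a request for a metadata entry, e.g. 'project/project-id'."""
    return urllib.request.Request(
        METADATA_ROOT + path.lstrip("/"),
        headers={"Metadata-Flavor": "Google"},  # required by the metadata server
    )


def read_metadata(path, timeout=2.0):
    """Fetch one metadata entry as text (only works on a GCE VM)."""
    with urllib.request.urlopen(metadata_request(path), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network.
req = metadata_request("project/project-id")
print(req.full_url)
```

On a VM, read_metadata("instance/hostname") would return the value that the corresponding curl command prints.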
Preemptible VMs

Preemptible VMs are highly affordable, short-lived compute instances suitable for batch jobs and fault-tolerant workloads. Preemptible VMs offer the same machine types and options as regular compute instances and last for up to 24 hours. If your applications are fault-tolerant and can withstand possible instance preemptions, then preemptible instances can reduce your Google Compute Engine costs significantly.

Google App Engine (GAE)

  • Platform as a Service (PaaS)
  • Developers can focus on writing code, while GCP handles the rest
  • Build scalable web applications and mobile backends
  • It is a managed service
    • You never touch the underlying infrastructure
    • Deployment, maintenance, and scalability are handled for you
    • Reduces operational overhead
  • There are two environments available in GAE:
    1. Standard: supports Python, PHP, Go, and Java
    2. Flexible: (natively) supports Java 8, Servlet 3.1, Jetty 8, Python 2.7/3.5, Node.js, Ruby, PHP, .NET core, and Go (plus other runtimes, if using a custom Docker image)

Google Cloud Functions

  • Serverless environment for building and connecting cloud services
  • Event-driven
    • Function executes as a "trigger" in response to a cloud-based event
    • Simple, single-purpose functions
  • Write function code in either Node.js or Python 3
  • Example: A file is uploaded to Cloud Storage (event), function executes in response to event (trigger)
  • Easier and less expensive than provisioning a server to watch for events
  • Not available in all Regions. As of January 2019, available in:
    • us-central1 (Iowa, USA)
    • us-east1 (South Carolina, USA)
    • europe-west1 (Belgium)
    • asia-northeast1 (Tokyo, Japan)
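
As a rough sketch of the Cloud Storage example above, a Python 3 background function receives the event payload and a context object. The function name on_file_uploaded is hypothetical, and the sketch assumes the storage event payload carries the bucket and name fields; it can be exercised locally by passing a plain dictionary.

```python
def on_file_uploaded(event, context):
    """Run in response to a Cloud Storage upload event.

    event   -- dict describing the uploaded object
    context -- event metadata (unused in this sketch)
    """
    bucket = event["bucket"]
    name = event["name"]
    message = f"New object gs://{bucket}/{name}"
    print(message)  # appears in the function's logs when deployed
    return message


# Local dry run with a hand-written event; no GCP resources are involved.
on_file_uploaded({"bucket": "xtof-test", "name": "foobar.jpg"}, None)
```

The function stays simple and single-purpose: it reacts to one kind of event and does one thing with it.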

Command Line Interface (CLI)

The Google Cloud SDK is a set of tools that you can use to manage resources and applications hosted on the Google Cloud Platform (GCP). These include the gcloud, gsutil, and bq command line tools. The gcloud command-line tool is downloaded along with the Cloud SDK.

Configuration and services

$ gcloud components list
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                  Components                                                 │
├───────────────┬──────────────────────────────────────────────────────┬──────────────────────────┬───────────┤
│     Status    │                         Name                         │            ID            │    Size   │
├───────────────┼──────────────────────────────────────────────────────┼──────────────────────────┼───────────┤
│ Not Installed │ App Engine Go Extensions                             │ app-engine-go            │  56.6 MiB │
│ Not Installed │ Cloud Bigtable Command Line Tool                     │ cbt                      │   6.4 MiB │
│ Not Installed │ Cloud Bigtable Emulator                              │ bigtable                 │   5.6 MiB │
│ Not Installed │ Cloud Datalab Command Line Tool                      │ datalab                  │   < 1 MiB │
│ Not Installed │ Cloud Datastore Emulator                             │ cloud-datastore-emulator │  17.7 MiB │
│ Not Installed │ Cloud Datastore Emulator (Legacy)                    │ gcd-emulator             │  38.1 MiB │
│ Not Installed │ Cloud Firestore Emulator                             │ cloud-firestore-emulator │  27.5 MiB │
│ Not Installed │ Cloud Pub/Sub Emulator                               │ pubsub-emulator          │  33.4 MiB │
│ Not Installed │ Cloud SQL Proxy                                      │ cloud_sql_proxy          │   3.8 MiB │
│ Not Installed │ Emulator Reverse Proxy                               │ emulator-reverse-proxy   │  14.5 MiB │
│ Not Installed │ Google Cloud Build Local Builder                     │ cloud-build-local        │   6.0 MiB │
│ Not Installed │ Google Container Registry's Docker credential helper │ docker-credential-gcr    │   1.8 MiB │
│ Not Installed │ gcloud Alpha Commands                                │ alpha                    │   < 1 MiB │
│ Not Installed │ gcloud Beta Commands                                 │ beta                     │   < 1 MiB │
│ Not Installed │ gcloud app Java Extensions                           │ app-engine-java          │ 107.5 MiB │
│ Not Installed │ gcloud app PHP Extensions                            │ app-engine-php           │           │
│ Not Installed │ gcloud app Python Extensions                         │ app-engine-python        │   6.2 MiB │
│ Not Installed │ gcloud app Python Extensions (Extra Libraries)       │ app-engine-python-extras │  28.5 MiB │
│ Not Installed │ kubectl                                              │ kubectl                  │   < 1 MiB │
│ Installed     │ BigQuery Command Line Tool                           │ bq                       │   < 1 MiB │
│ Installed     │ Cloud SDK Core Libraries                             │ core                     │   9.1 MiB │
│ Installed     │ Cloud Storage Command Line Tool                      │ gsutil                   │   3.5 MiB │
└───────────────┴──────────────────────────────────────────────────────┴──────────────────────────┴───────────┘
  • To install or remove components at your current SDK version [228.0.0], run:
$ gcloud components install COMPONENT_ID
$ gcloud components remove COMPONENT_ID
  • To update your SDK installation to the latest version [228.0.0], run:
$ gcloud components update
  • Initialize gcloud:
$ gcloud init
  • Get current gcloud configuration:
$ gcloud config list
[compute]
region = us-west1
zone = us-west1-a
[core]
account = someone@somewhere.com
disable_usage_reporting = True
project = my-project-223521

Your active configuration is: [default]
  • Get a list of all configurations:
$ gcloud config configurations list
NAME     IS_ACTIVE  ACCOUNT                PROJECT            DEFAULT_ZONE  DEFAULT_REGION
default  True       someone@somewhere.com  my-project-223521  us-west1-a    us-west1
  • Get a list of all (enabled) services:
$ gcloud services list
NAME                                    TITLE
bigquery-json.googleapis.com            BigQuery API
cloudapis.googleapis.com                Google Cloud APIs
clouddebugger.googleapis.com            Stackdriver Debugger API
cloudtrace.googleapis.com               Stackdriver Trace API
compute.googleapis.com                  Compute Engine API
container.googleapis.com                Kubernetes Engine API
containerregistry.googleapis.com        Container Registry API
datastore.googleapis.com                Cloud Datastore API
dns.googleapis.com                      Google Cloud DNS API
logging.googleapis.com                  Stackdriver Logging API
monitoring.googleapis.com               Stackdriver Monitoring API
oslogin.googleapis.com                  Cloud OS Login API
pubsub.googleapis.com                   Cloud Pub/Sub API
servicemanagement.googleapis.com        Service Management API
serviceusage.googleapis.com             Service Usage API
sql-component.googleapis.com            Cloud SQL
stackdriver.googleapis.com              Stackdriver API
stackdriverprovisioning.googleapis.com  Stackdriver Provisioning Service
storage-api.googleapis.com              Google Cloud Storage JSON API
storage-component.googleapis.com        Google Cloud Storage
$ gcloud services list --enabled --sort-by="NAME"
$ gcloud services list --available --sort-by="NAME"
Creating a project
  • Get a list of billing accounts:
$ gcloud beta billing accounts list
ACCOUNT_ID            NAME                OPEN  MASTER_ACCOUNT_ID
000000-000000-000000  My Billing Account  True
  • Create a project:
$ gcloud projects create dev-project-01 --name="dev-project-01" \
    --labels=team=area51
  • Link the above project to a billing account:
$ gcloud beta billing projects link dev-project-01 \
    --billing-account=000000-000000-000000
  • Switch between projects:
$ gcloud config set project ${PROJECT_NAME}
Managing multiple SDK configurations

Note: When you install the SDK, it will setup a default configuration and ask you to assign a project to it (and a default region).

  • Create a new configuration, activate, and switch between configurations:
$ gcloud config configurations create dev
$ gcloud config configurations list
$ gcloud config list
$ gcloud config configurations activate default
$ gcloud config set project dev-project-01
$ gcloud config set account someone@somewhere.com

Compute Engine

Connecting to a VM (via SSH)
  • Google-managed:
$ gcloud compute instances list
$ gcloud compute ssh xtof@dev-server
$ gcloud compute ssh xtof@dev-server --dry-run  # see the actual command
  • Using your own SSH key:
$ ssh-keygen -t rsa -f my-ssh-key -C xtof
$ echo "xtof:$(cat my-ssh-key.pub)" > gcp_keys.txt
$ gcloud compute instances add-metadata dev-server --metadata-from-file ssh-keys=gcp_keys.txt
Snapshots
$ gcloud compute snapshots list
$ gcloud compute disks list
$ gcloud compute disks snapshot dev-server
$ gcloud compute snapshots delete <snapshot_name>
Images
  • Show public and private images (from which instances can be created):
$ gcloud compute images list
NAME                 PROJECT        FAMILY     DEPRECATED   STATUS
centos-6-v20181210   centos-cloud   centos-6                READY
centos-7-v20181210   centos-cloud   centos-7                READY
...

Kubernetes

Managing a GKE cluster
  • Create a Kubernetes cluster:
$ gcloud beta container --project "gcp-k8s-123456" clusters create "xtof-gcp-k8s" \
   --zone "us-west1-a" \
   --username "admin" \
   --cluster-version "1.11.5-gke.5" \
   --machine-type "n1-standard-1" \
   --image-type "COS" \
   --disk-type "pd-standard" \
   --disk-size "100" \
   --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
   --num-nodes "3" \
   --enable-stackdriver-kubernetes \
   --no-enable-ip-alias \
   --network "projects/gcp-k8s-123456/global/networks/default" \
   --subnetwork "projects/gcp-k8s-123456/regions/us-west1/subnetworks/default" \
   --addons HorizontalPodAutoscaling,HttpLoadBalancing,KubernetesDashboard,Istio \
   --istio-config auth=NONE \
   --enable-autoupgrade \
   --enable-autorepair
  • Get the Kubernetes credentials:
$ gcloud container clusters get-credentials xtof-gcp-k8s --zone us-west1-a --project gcp-k8s-123456
  • Resize a Kubernetes cluster:
$ gcloud container clusters resize --size=1 --zone=us-west1-a xtof-gcp-k8s
  • Delete the cluster:
$ gcloud container clusters delete --project "gcp-k8s-123456" "xtof-gcp-k8s" --zone "us-west1-a"

Deployment Manager

"Deployment Manager is an infrastructure deployment service that automates the creation and management of Google Cloud Platform resources for you. Write flexible template and configuration files and use them to create deployments that have a variety of Cloud Platform services, such as Google Cloud Storage, Google Compute Engine, and Google Cloud SQL, configured to work together". source

  • Example:
$ gcloud deployment-manager deployments create my-deployment --config my-deployment.yml
$ gcloud deployment-manager deployments update my-deployment --config my-deployment.yml
$ gcloud deployment-manager deployments describe my-deployment

Miscellaneous

$ gcloud config set project <project-name>
$ gcloud config set compute/zone us-west1-a
$ gcloud config unset compute/zone
$ gcloud iam service-accounts list \
    --filter='displayName:"Compute Engine default service account"' \
    --format='value(email)'
$ gcloud compute networks subnets list
NAME     REGION                   NETWORK  RANGE
default  us-west2                 default  10.168.0.0/20
default  asia-northeast1          default  10.146.0.0/20
default  us-west1                 default  10.138.0.0/20
default  southamerica-east1       default  10.158.0.0/20
default  europe-west4             default  10.164.0.0/20
default  asia-east1               default  10.140.0.0/20
default  europe-north1            default  10.166.0.0/20
default  asia-southeast1          default  10.148.0.0/20
default  us-east4                 default  10.150.0.0/20
default  europe-west1             default  10.132.0.0/20
default  europe-west2             default  10.154.0.0/20
default  europe-west3             default  10.156.0.0/20
default  australia-southeast1     default  10.152.0.0/20
default  asia-south1              default  10.160.0.0/20
default  us-east1                 default  10.142.0.0/20
default  us-central1              default  10.128.0.0/20
default  asia-east2               default  10.170.0.0/20
default  northamerica-northeast1  default  10.162.0.0/20
$ gcloud projects create example-foo-bar-1 --name="Happy project" \
    --labels=type=happy
$ gcloud compute forwarding-rules list \
    --filter='name:"my-app-forwarding-rules"' \
    --format='value(IPAddress)'
x.x.x.x
$ gcloud pubsub topics publish myTopic --message '{"name":"bob"}'
$ gcloud functions logs read

Cloud Storage

Storage Classes
Storage Class           Name for APIs and gsutil
Multi-Regional Storage  multi_regional
Regional Storage        regional
Nearline Storage        nearline
Coldline Storage        coldline

See the Cloud Storage documentation on storage classes for details.


  • Create a bucket:
$ PROJECT_NAME=my-project
$ REGION=us-west1
$ STORAGE_CLASS=regional
$ BUCKET_NAME=xtof-test

# Basic (using defaults):
$ gsutil mb gs://${BUCKET_NAME}

# Advanced (override defaults):
$ gsutil mb -p ${PROJECT_NAME} -c ${STORAGE_CLASS} -l ${REGION} gs://${BUCKET_NAME}

Note: All buckets (and their objects) are private by default.

  • Upload an object to the above bucket:
$ gsutil cp Pictures/foobar.jpg gs://${BUCKET_NAME}
  • Move an object (file) from one bucket to another:
$ gsutil mv gs://${SOURCE_BUCKET}/${OBJECT_NAME} gs://${DESTINATION_BUCKET}/
  • List the contents of a bucket:
$ gsutil ls gs://${BUCKET_NAME}     # basic info
$ gsutil ls -l gs://${BUCKET_NAME}  # extended info
Identity and Access Management
  • Get the IAM roles and rules for a given bucket (note: these are the default ones):
$ gsutil iam get gs://${BUCKET_NAME}
{
  "bindings": [
    {
      "members": [
        "projectEditor:my-project-123456", 
        "projectOwner:my-project-123456"
      ], 
      "role": "roles/storage.legacyBucketOwner"
    }, 
    {
      "members": [
        "projectViewer:my-project-123456"
      ], 
      "role": "roles/storage.legacyBucketReader"
    }
  ], 
  "etag": "CAE="
}
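
To see which roles each member ends up with, the policy JSON can be inverted from role -> members to member -> roles. The following is a small sketch using the output shown above as input; roles_by_member is a hypothetical helper name.

```python
import json
from collections import defaultdict

# The policy printed by "gsutil iam get" above, pasted in as a string.
POLICY_JSON = """
{
  "bindings": [
    {
      "members": ["projectEditor:my-project-123456",
                  "projectOwner:my-project-123456"],
      "role": "roles/storage.legacyBucketOwner"
    },
    {
      "members": ["projectViewer:my-project-123456"],
      "role": "roles/storage.legacyBucketReader"
    }
  ],
  "etag": "CAE="
}
"""


def roles_by_member(policy):
    """Invert the bindings list into a member -> set-of-roles mapping."""
    result = defaultdict(set)
    for binding in policy.get("bindings", []):
        for member in binding["members"]:
            result[member].add(binding["role"])
    return dict(result)


policy = json.loads(POLICY_JSON)
for member, roles in sorted(roles_by_member(policy).items()):
    print(member, sorted(roles))
```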
Lifecycle Management
  • Set a lifecycle rule that converts every object in a given bucket that is older than 2 days (counted from when it was uploaded to the bucket or last overwritten) from "regional" to "nearline" storage class:
$ cat << EOF > lifecycle.json
{
  "lifecycle": {
    "rule": [
      {
        "action": {
          "type": "SetStorageClass",
          "storageClass": "NEARLINE"
        },
        "condition": {
          "age": 2,
          "matchesStorageClass": [
            "REGIONAL"
          ]
        }
      }
    ]
  }
}
EOF

$ gsutil lifecycle set lifecycle.json gs://${BUCKET_NAME}/
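
To make the rule's effect concrete, here is a rough Python model of how such a configuration could be evaluated for a single object. It is only a sketch: Cloud Storage applies lifecycle rules on its own schedule on the server side, and the boundary handling here (the rule matches once the object is at least "age" whole days old) is an assumption. The function name new_storage_class is hypothetical.

```python
from datetime import datetime, timezone

# Same configuration as lifecycle.json above.
LIFECYCLE = {
    "lifecycle": {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": 2, "matchesStorageClass": ["REGIONAL"]},
            }
        ]
    }
}


def new_storage_class(created, storage_class, now, config=LIFECYCLE):
    """Return the storage class an object would have after applying the rules."""
    age_days = (now - created).days
    for rule in config["lifecycle"]["rule"]:
        condition = rule["condition"]
        action = rule["action"]
        if age_days < condition.get("age", 0):
            continue  # object is still too young
        classes = condition.get("matchesStorageClass")
        if classes and storage_class not in classes:
            continue  # rule does not apply to this storage class
        if action["type"] == "SetStorageClass":
            return action["storageClass"]
    return storage_class


now = datetime(2019, 1, 11, tzinfo=timezone.utc)
print(new_storage_class(datetime(2019, 1, 8, tzinfo=timezone.utc), "REGIONAL", now))
print(new_storage_class(datetime(2019, 1, 10, tzinfo=timezone.utc), "REGIONAL", now))
```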
Signed-URLs

First, create a Service Account with just enough privileges to modify Cloud Storage, then create and download a key for it (saved here as key.json).

$ gsutil cp test.txt gs://xtof-sandbox/
$ gsutil signurl -d 3m key.json gs://xtof-sandbox/test.txt

The above will return a signed-URL (it will look something like https://storage.googleapis.com/xtof-sandbox/test.txt?x-goog-signature=23asd...), which you can send to users and will only be valid for 3 minutes. After 3 minutes, they will get an "ExpiredToken" error.
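
The idea behind an expiring signed URL can be sketched in Python. Note that this is not the Cloud Storage signing scheme: gsutil signs with the service account's private key, whereas this illustration swaps in an HMAC-SHA256 signature over the URL and an expiry timestamp with a shared secret. SECRET, sign_url, verify_url, and the query parameter names are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical shared secret, for illustration only


def _signature(url, expires):
    """HMAC-SHA256 over the URL and its expiry timestamp."""
    payload = f"{url}\n{expires}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def sign_url(url, ttl_seconds, now=None):
    """Append an expiry time and a signature covering URL + expiry."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    query = urlencode({"expires": expires, "signature": _signature(url, expires)})
    return f"{url}?{query}"


def verify_url(signed_url, now=None):
    """Return 'ok', 'expired', or 'invalid signature'."""
    parsed = urlparse(signed_url)
    query = parse_qs(parsed.query)
    expires = int(query["expires"][0])
    base = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    if not hmac.compare_digest(_signature(base, expires), query["signature"][0]):
        return "invalid signature"
    current = now if now is not None else time.time()
    return "ok" if current <= expires else "expired"


signed = sign_url("https://storage.googleapis.com/xtof-sandbox/test.txt", 180)
print(verify_url(signed))
```

Because the expiry time is part of the signed payload, a user cannot extend the validity window by editing the URL; any change invalidates the signature.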

GCP vs. AWS

Note: All of the following are as of February 2017.

  • Compute
    • Compute Engine vs. EC2
    • App Engine vs. Elastic Beanstalk
    • Container Engine vs. EC2 Container Service (ECS)
    • Container Registry vs. ECR
    • Cloud Functions vs. Lambda
  • Identity & Security
    • Cloud IAM vs. IAM
    • Cloud Resource Manager vs. n/a
    • Cloud Security Scanner vs. Inspector
    • Cloud Platform Security vs. n/a
  • Networking
    • Cloud Virtual Network vs. VPC
    • Cloud Load Balancing vs. ELB
    • Cloud CDN vs. CloudFront
    • Cloud Interconnect vs. Direct Connect
    • Cloud DNS vs. Route53
  • Storage and Databases
    • Cloud Storage vs. S3
    • Cloud Bigtable vs. DynamoDB
    • Cloud Datastore vs. SimpleDB
    • Cloud SQL vs. RDS
    • Persistent Disk vs. EBS
  • Big Data
    • BigQuery vs. Redshift
    • Cloud Dataflow vs. EMR
    • Cloud Dataproc vs. EMR
    • Cloud Datalab vs. n/a
    • Cloud Pub/Sub vs. Kinesis
    • Genomics vs. n/a
  • Machine Learning
    • Cloud Machine Learning vs. Machine Learning
    • Vision API vs. Rekognition
    • Speech API vs. Polly
    • Natural Language API vs. Lex
    • Translation API vs. n/a
    • Jobs API vs. n/a
  • Compute Services (GCP vs. AWS):
    • Infrastructure as a Service (IaaS): Compute Engine vs. EC2
    • Platform as a Service (PaaS): App Engine vs. Elastic Beanstalk
    • Containers as a Service: Container Engine vs. EC2 Container Service (ECS)
Compute IaaS comparison
Feature                     Amazon EC2                              Compute Engine
Virtual machines            Instances                               Instances
Machine images              Amazon Machine Image (AMI)              Image
Temporary virtual machines  Spot instances                          Preemptible VMs
Firewall                    Security groups                         Compute Engine firewall rules
Automatic instance scaling  Auto Scaling                            Compute Engine autoscaler
Local attached disk         Ephemeral disk                          Local SSD
VM import                   Supported formats: RAW, OVA, VMDK, VHD  Supported formats: AMI, RAW, VirtualBox
Deployment locality         Zonal                                   Zonal


Networking services comparison
Provider  Networking                 Load Balancing            CDN         On-premises connection  DNS
AWS       VPC                        ELB                       CloudFront  Direct Connect          Route53
GCP       Cloud Virtual Network [1]  Cloud Load Balancing [2]  Cloud CDN   Cloud Interconnect      Cloud DNS

[1] GCP allows for 802.1q tagging (aka VLAN tagging). AWS does not.
[2] GCP allows for cross-region load balancing. AWS does not.


Storage services comparison
Provider  Object         Block                                Cold                    File
AWS       S3             EBS [1]                              Glacier                 EFS
GCP       Cloud Storage  Compute Engine Persistent Disks [2]  Cloud Storage Nearline  ZFS/Avere

[1] An EBS volume can be attached to only one EC2 instance at a time. Can attach up to 40 disk volumes to a Linux instance. Available in only one region by default.
[2] GCP Persistent Disks in read-only mode can be attached to multiple instances simultaneously. Can attach up to 128 disk volumes. Snapshots are global and can be used in any region without additional operations or charges.


Database services comparison
Provider  RDBMS          NoSQL (key-value)   NoSQL (indexed)
AWS       RDS            DynamoDB            DynamoDB
GCP       Cloud SQL [1]  Cloud Bigtable [2]  Cloud Datastore

[1] MySQL only.
[2] 100 MB maximum item size. Does not support secondary indexes.


Big Data services comparison
Provider  Streaming data ingestion  Streaming data processing  Batch data processing            Analytics
AWS       Kinesis                   Kinesis                    EMR                              Redshift
GCP       Cloud Pub/Sub             Cloud Dataflow             Cloud Dataflow / Cloud Dataproc  BigQuery


Cloud Pub/Sub
GCP's offering for data streaming and message queuing. It allows for secure communication between applications and can also serve as a decoupling method (a good way to scale).
Dataflow
GCP's managed service offering for batch and streaming data processing. It uses Apache Beam under the hood.
Dataproc
GCP's offering for data processing using Apache Hadoop and Apache Spark. It is a massively parallel data processing and transformation engine.
Supported services: MapReduce, Apache Hive, Apache Pig, Apache Spark, Spark SQL, PySpark, and support for parallel jobs with YARN.
BigQuery
GCP's offering for a fully managed, massive data warehousing and analytics engine, allowing for data analytics using SQL.
Application services comparison
Provider  Messaging
AWS       SNS
GCP       Cloud Pub/Sub


Cloud Pub/Sub (publisher/subscriber)
Management services comparison
Provider  Monitoring   Deployment (IaC)
AWS       CloudWatch   CloudFormation
GCP       Stackdriver  Deployment Manager


See also

External links