Google Cloud Platform
The above will return a signed URL (it will look something like https://storage.googleapis.com/xtof-sandbox/test.txt?x-goog-signature=23asd...), which you can send to users and which will only be valid for 3 minutes. After 3 minutes, they will get an "ExpiredToken" error.
BigQuery
$ bq query "select string_field_10 as request, count(*) as requestcount from logdata.accesslog group by request order by requestcount desc"
+----------------------------------------+--------------+
| request                                | requestcount |
+----------------------------------------+--------------+
| GET /store HTTP/1.0                    |       337293 |
| GET /index.html HTTP/1.0               |       336193 |
| GET /products HTTP/1.0                 |       280937 |
| GET /services HTTP/1.0                 |       169090 |
| GET /products/desserttoppings HTTP/1.0 |        56580 |
| GET /products/floorwaxes HTTP/1.0      |        56451 |
| GET /careers HTTP/1.0                  |        56412 |
| GET /services/turnipwinding HTTP/1.0   |        56401 |
| GET /services/spacetravel HTTP/1.0     |        56176 |
| GET /favicon.ico HTTP/1.0              |        55845 |
+----------------------------------------+--------------+
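The query above groups rows by request string, counts each group, and sorts by that count in descending order. The Python sketch below computes the same GROUP BY / COUNT(*) / ORDER BY ... DESC locally over a few hypothetical rows, to make the query's semantics concrete. It shows only what the query computes, not how BigQuery executes it. The sample rows are invented stand-ins for `logdata.accesslog.string_field_10`.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_requests(requests: Iterable[str]) -> List[Tuple[str, int]]:
    """Local equivalent of:
    SELECT request, COUNT(*) AS requestcount ... GROUP BY request ORDER BY requestcount DESC
    """
    # most_common() yields (value, count) pairs sorted by count, highest first
    return Counter(requests).most_common()


# Hypothetical sample rows (not real log data)
sample = [
    "GET /store HTTP/1.0",
    "GET /index.html HTTP/1.0",
    "GET /store HTTP/1.0",
    "GET /products HTTP/1.0",
    "GET /store HTTP/1.0",
    "GET /index.html HTTP/1.0",
]

for request, requestcount in count_requests(sample):
    print(f"{request:<38} {requestcount:>12}")
```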
Revision as of 23:58, 7 March 2019
Google Cloud Platform (GCP) is a cloud computing service by Google that offers hosting on the same supporting infrastructure that Google uses internally for end-user products like Gmail, Google Search, Maps, and YouTube.
Overview
- Google Cloud Platform (GCP) high-level view:
- Compute
- App Engine (PaaS)
- Kubernetes Engine (Hybrid; see Kubernetes)
- Compute Engine (IaaS)
- Cloud Functions
- Storage
- BigTable
- Cloud Storage
- Cloud SQL
- Cloud Spanner
- Cloud Datastore
- Networking
- VPCs
- Load Balancers
- Cloud DNS
- Cloud CDN
- Stackdriver
- Monitoring
- Logging
- Big Data
- BigQuery
- Pub/Sub
- Dataflow
- Dataproc
- Datalab
- Artificial Intelligence
- Natural Language API
- Vision API
- Speech API
- Translate API
- Machine Learning
- Examples
- Google Compute Engine – IaaS service providing virtual machines similar to Amazon EC2.
- Google App Engine – PaaS service for directly hosting applications similar to AWS Elastic Beanstalk.
- BigTable – managed NoSQL wide-column database service, accessed through the HBase API. Similar to Apache HBase.
- BigQuery – fully managed, columnar data warehouse service. Similar to Amazon Redshift.
- Google Cloud Functions – FaaS service allowing functions to be triggered by events without developer resource management, similar to AWS Lambda or IBM OpenWhisk.
- Cloud Regions and Zones
- Region
- A region is a specific geographical location where you can run your resources
- It is a collection of zones
- Regional resources are available to resources in any zone in the region
- New regions are added frequently
- Zone
- Zones are isolated physical locations within a region
- Zonal resources are only available in that zone
- Machines in different zones share no single point of failure
An effective disaster recovery plan would have assets deployed across multiple zones, or even different regions.
- Standards, regulations, and certifications
- SSAE16
- ISO 27001
- ISO 27017
- ISO 27018
- PCI
- HIPAA
- Complete list
- GCP Certifications
- SEE: GCP Training
Main
- Cloud Resource Hierarchy
- Provides a hierarchy of ownership
- Identity and Access Management (IAM)
- Provides "attach" points and inheritance for access control and organization policies
- Hierarchy overview
- Organization (not applicable to individual accounts)
- Projects
- Resources
- Projects
- Core organizational component of GCP
- Controls access to resources (who has access to what)
- Projects are where you create, enable, and use all GCP services
- Per project basis
- Permissions
- Billing
- APIs
- Etc.
- Projects have three identifying attributes:
- Project Name (user-friendly name)
- Project ID (aka Application ID; must be unique across GCP)
- Project Number (used in various places to identify resources that belong to a specific project, e.g., in service account names)
Identity and Access Management (IAM)
- Who can do what on which resource
- Members (who) are granted permissions and roles (what) to GCP services (resource) using the principle of least privilege
- IAM -> Policy -> Roles + Identities
- See predefined roles
- Members (the "who")
- Can be either a person or a service account
- People via:
- Google account
- Google group (e.g., dev.team@thecompany.com)
- G Suite Domain
- Cloud Identity (organization domain that is not a Google domain/account)
- Service account
- Special type of Google account that belongs to your application, not an end user
- Does not use a username/password; uses encryption keys
- Identity for carrying out server-to-server interactions in a project (e.g., a back-end application writing data to Cloud Storage)
- Identified with an email address:
- <project_number>@developer.gserviceaccount.com
- <project_id>@developer.gserviceaccount.com
- Application access
- Roles (the "what")
- A collection of permissions to give access to a given resource
- Permissions are represented as <service>.<resource>.<verb> (e.g., compute.instances.delete)
- Permissions vs. roles
- Users are not directly assigned permissions, but are assigned roles, which contain a collection of permissions:
Role | List of permissions |
---|---|
compute.instanceAdmin | compute.instances.delete, compute.instances.get, compute.instances.list, compute.instances.setMachineType, compute.instances.start, compute.instances.stop |
- Cloud IAM objects
- Organization (created by Google Sales)
- Organization Owners are established at creation (note: always have more than one organization owner, for security purposes).
- Organization Owner assigns the Organization Administrator role from the G Suite Admin Console (Admin is a separate product).
- Organization Administrators manage GCP from the Cloud Console.
- Folders
- Additional grouping mechanism and isolation boundaries between projects (e.g., different departments or teams).
- Folders allow delegation of administration rights.
- Projects
- Members
- Roles
- Resources
- Products
- G Suite Super Admins (are the only Organization Owners)
- Administers a Google-hosted domain
- Creates users, groups
- Controls user membership in groups
- Resource manager roles
- Organization
- Admin: full control over all resources
- Viewer: view access to all resources
- Folder
- Admin: full control over folders
- Creator: browse hierarchy and create folders
- Viewer: view folders and projects below a resource
- Project
- Creator: create new projects (automatic owner) and migrate new projects into organization
- Deleter: delete projects
- Google Cloud Directory Sync (GCDS)
- Synchronizes G Suite accounts to match the user data in existing LDAP or MS Active Directory
- Syncs groups and memberships, not content or settings
- Supports sophisticated rules for custom mapping of users, groups, non-employee contacts, user profiles, aliases, and exceptions
- One-way synchronization from LDAP to the G Suite directory
- Administer in LDAP, then periodically update to G Suite
- Runs as a utility in your server environment
- Cloud IAM best practices
- Principle of least privilege
- Always apply the minimal access level required.
- Use groups
- If group membership is secure, assign roles to groups and let the G Suite Admins handle membership.
- Always maintain an alternate.
- For high-risk areas, assign roles to individuals directly and forego the convenience of group assignment.
- Control who can change policies and group memberships
- Audit policy changes
- Audit logs record project-level permission changes.
- Additional levels are being added all the time.
- Primitive vs. Predefined (aka curated) vs. Custom Roles
- Primitive Roles
- Historically available GCP roles before Cloud IAM was implemented
- Applied at Project-level
- Broad roles:
- Viewer: read-only actions that preserve state (i.e., cannot make changes)
- Editor: same as above + can modify state (e.g., deploy applications, modify code, configure services)
- Owner: same as above + can manage access to the project and all project resources (e.g., invite/remove members and delete projects) + can set up project billing
- Billing administrator: manage billing + add/remove administrators
- A project can have multiple owners, editors, viewers, and billing administrators
- When to choose Primitive Roles
- When the GCP service does not provide a predefined role
- When you only need broad permissions for a project
- When you want to allow a member to modify permissions for a project
- When you work in a small team where the team members do not need granular permissions
- Predefined ("Curated") Roles
- Provides much more granular access (e.g., prevent unwanted access to other resources)
- Granted at the resource-level
- Example: App Engine Admin (full access to only App Engine resources)
- Multiple predefined roles can be given to individual users
- Custom Roles
- Can only be used at the project or organization levels (they cannot be used at the folder level)
- If you want to give custom permissions to a Compute Engine VM, use a service account
- IAM Policy
- A collection of statements that define who has what type of access
- A full list of roles granted to a member for a resource
- IAM Policy hierarchy
- Resource access is organized hierarchically, from the Organization down to the Resource(s)
- Organization -> Project -> Resource(s) — parent/child format
- Each child has exactly one parent
- Children inherit parent roles
- A less restrictive parent policy overrides a more restrictive child policy (access granted at a parent cannot be taken away at a child)
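The sketch below models two ideas from this section: roles bundle permissions, and policies are inherited down the hierarchy. The effective access of a member on a resource is the union of the roles granted on that resource and on all its ancestors. It is a simplified model under stated assumptions, not the Cloud IAM API. The role catalog is cut down (real predefined roles are named like `roles/compute.instanceAdmin`), and the `Node` class and the sample hierarchy are hypothetical. Only allow bindings are modeled, which is why a child policy cannot remove a grant made at a parent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical, heavily simplified role catalog: role -> permissions
ROLES: Dict[str, Set[str]] = {
    "compute.instanceAdmin": {
        "compute.instances.delete",
        "compute.instances.get",
        "compute.instances.list",
        "compute.instances.setMachineType",
        "compute.instances.start",
        "compute.instances.stop",
    },
    "compute.viewer": {"compute.instances.get", "compute.instances.list"},
}


@dataclass
class Node:
    """An organization, folder, project, or resource in the hierarchy."""
    name: str
    parent: Optional["Node"] = None
    # This node's IAM policy: role name -> members granted that role here
    bindings: Dict[str, Set[str]] = field(default_factory=dict)


def effective_permissions(member: str, node: Node) -> Set[str]:
    """Union of permissions from every role granted to `member` on `node`
    or on any ancestor (children inherit their parents' policies)."""
    perms: Set[str] = set()
    current: Optional[Node] = node
    while current is not None:
        for role, members in current.bindings.items():
            if member in members:
                perms |= ROLES[role]
        current = current.parent
    return perms


def is_allowed(member: str, node: Node, permission: str) -> bool:
    return permission in effective_permissions(member, node)


# Hypothetical hierarchy: organization -> project -> VM
org = Node("example-org", bindings={"compute.viewer": {"user:alice@example.com"}})
project = Node("my-project-123456", parent=org,
               bindings={"compute.instanceAdmin": {"group:dev.team@thecompany.com"}})
vm = Node("my-vm", parent=project)

print(is_allowed("user:alice@example.com", vm, "compute.instances.list"))    # True: inherited from org
print(is_allowed("user:alice@example.com", vm, "compute.instances.delete"))  # False: never granted
```

Assigning the role to a group rather than to individuals follows the "use groups" best practice above. Membership is then managed outside the policy.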
Interacting with GCP
- There are four methods of interacting with GCP:
- Cloud console (web user interface)
- Cloud Shell and Cloud SDK (command-line interface)
- Cloud Console Mobile App (for Android or iOS)
- RESTful API (for custom applications)
- Google Cloud SDK
- Command line interface (CLI) tools for managing resources and applications on GCP
- Includes:
- gcloud: many common GCP tasks
- gsutil: interact with Cloud Storage
- bq: interact with data in BigQuery
- Can also be installed locally as a Docker image or run from within Cloud Shell (via the UI)
- install
- Cloud Shell
- Interactive web-based shell environment for GCP, accessed from a web console
- Easy to manage resources without having to install the Google Cloud SDK locally.
- Includes:
- A temporary Compute Engine virtual machine/instance
- CLI access to the instance from a web browser
- 5 GB of persistent disk storage
- Pre-installed Google Cloud SDK and other tools
- Language support for: Python, Go, Node.js, PHP, Ruby, and Java
- Web preview functionality (especially useful for App Engine)
- Built-in authorization for access to GCP projects and resources
- Limitations
- 1 hour time out for inactivity
- Machine will terminate/self-delete
- $HOME directory contents will be preserved for a new session
- Direct interactive use only
- Not for running high computational/network workloads
- If in violation of GCP terms of use, session can be terminated without notice
- For long periods of inactivity, home disk may be recycled (with advance notice via email)
- If you need a longer inactivity period, consider either a locally installed SDK or Cloud Storage for long-term storage
- RESTful APIs
- "Intended for software developers" ~ Google
- Programmatic access to GCP resources
- Typically uses JSON as an interchange format
- Uses OAuth 2.0 for authentication and authorization
- Enabled via the GCP Console
- Most APIs have daily quotas, which can be increased upon request (to Google Support)
- You can experiment via APIs Explorer
- APIs Explorer
- An interactive tool that lets you easily try Google APIs using a browser
- With the APIs Explorer, you can:
- Browse quickly through available APIs and versions
- See methods available for each API and what parameters they support, along with inline documentation
- Execute requests for any method and see responses in real time
- Easily make authenticated and authorized API calls
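The points above (JSON over HTTPS, with OAuth 2.0 bearer tokens) can be made concrete with a Python sketch that builds, but does not send, a request to list the VMs in one zone. Several details are assumptions. The sketch takes the Compute Engine v1 base URL to be `https://compute.googleapis.com/compute/v1` (older documentation uses `https://www.googleapis.com/compute/v1`). The project, zone, and token values are placeholders. The helper name `build_list_instances_request` is hypothetical.

```python
import urllib.request

# Assumption: Compute Engine API v1 base URL; all argument values below are placeholders
COMPUTE_API = "https://compute.googleapis.com/compute/v1"


def build_list_instances_request(project: str, zone: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request listing the VMs in one zone."""
    url = f"{COMPUTE_API}/projects/{project}/zones/{zone}/instances"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",  # OAuth 2.0 access token
            "Accept": "application/json",                # responses are JSON
        },
        method="GET",
    )


req = build_list_instances_request("my-project-123456", "us-west1-b", "EXAMPLE_TOKEN")
print(req.get_method(), req.full_url)

# Sending it requires network access and a real token, e.g. from
# `gcloud auth print-access-token`:
#   import json
#   with urllib.request.urlopen(req) as resp:
#       instances = json.load(resp).get("items", [])
```

The APIs Explorer builds requests of this same shape interactively, so it is a convenient way to check a URL before coding it.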
Cloud Marketplace
- Formerly called "Cloud Launcher"
- A solution marketplace containing pre-packaged, ready-to-deploy solutions
- Some offered by Google
- Others by third-party vendors
Google Compute
There are multiple Compute options in GCP for hosting your applications, where "option" is the method of hosting:
- Google Compute Engine (GCE)
- Google Container Engine (former name of Google Kubernetes Engine)
- Google Kubernetes Engine (GKE)
- Google App Engine (GAE)
- Google Cloud Functions
In the above list, each option is ordered from "highly customizable" (GCE) to "highly managed" (Google Cloud Functions).
See: Choosing the right compute option in GCP: a decision tree
Each option can take advantage of the rest of the GCP services. E.g.,
- Storage
- Networking
- Big Data
- Security
Google Compute Engine (GCE)
- Google Compute Engine (GCE)
- Infrastructure as a Service (IaaS)
- Virtual Machines (VMs), aka "instances"
- Per-second billing; sustained use discounts
- High throughput to storage at no extra cost
- Custom machine types: Only pay for the hardware you need/use
- Storage for VMs
- Persistent disks (either standard or SSD)
- Any data saved to scratch space (local SSD) will be lost when the VM is terminated
- Preemptible VMs
- Connecting to a VM
$ gcloud compute --project "my-project-123456" ssh --zone "us-west1-b" "my-vm"
The above command will create SSH keys (stored in ~/.ssh by default). After that, you can use the private key to SSH into your VM:
$ ssh -i ~/.ssh/google_compute_engine username@x.x.x.x
- Compute Engine Metadata
- The project metadata URL is:
http://metadata.google.internal/computeMetadata/v1/project/
- The instance metadata URL is:
http://metadata.google.internal/computeMetadata/v1/instance/
Each URL returns a set of entries that can be appended to the URL. Project settings contain info (e.g., project ID). Instance settings contain info on disks, hostname, machine type, etc.
You can also set your own custom metadata values and then read them from code running on the VM.
$ curl -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"
{"access_token":"aa00...","expires_in":3599,"token_type":"Bearer"}
$ curl -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/project/"
attributes/
numeric-project-id
project-id
- Preemptible VMs
Preemptible VMs are highly affordable, short-lived compute instances suitable for batch jobs and fault-tolerant workloads. Preemptible VMs offer the same machine types and options as regular compute instances and last for up to 24 hours. If your applications are fault-tolerant and can withstand possible instance preemptions, then preemptible instances can reduce your Google Compute Engine costs significantly.
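The curl calls above translate directly into code running on the VM. The Python sketch below mirrors them with the standard library: it builds the metadata URL and sends the required `Metadata-Flavor: Google` header. The helper names are hypothetical, and `my-key` stands for a custom value you would set yourself. The `instance/preempted` entry is, as far as I know from the Compute Engine documentation, what a fault-tolerant job on a preemptible VM can poll to learn it is being preempted. Treat that as an assumption to verify.

```python
import urllib.request

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1"


def metadata_request(path: str) -> urllib.request.Request:
    """Build a request for one metadata entry, e.g. "project/project-id".
    The Metadata-Flavor header is mandatory; requests without it are rejected."""
    return urllib.request.Request(
        f"{METADATA_ROOT}/{path.lstrip('/')}",
        headers={"Metadata-Flavor": "Google"},
    )


def read_metadata(path: str, timeout: float = 2.0) -> str:
    """Fetch a metadata value as text. The host only resolves from inside a GCE VM."""
    with urllib.request.urlopen(metadata_request(path), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example calls (run these on a VM, not on a workstation):
#   read_metadata("project/project-id")
#   read_metadata("instance/attributes/my-key")  # custom value; "my-key" is hypothetical
#   read_metadata("instance/preempted")          # assumed to return "TRUE" or "FALSE"
```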
Google App Engine (GAE)
- Platform as a Service (PaaS)
- Developers can focus on writing code, while GCP handles the rest
- Build scalable web applications and mobile backends
- Your code runs in a language-specific environment called a "runtime"
- It is a managed service
- You never touch the underlying infrastructure
- Deployment, maintenance, and scalability are handled for you
- Reduces operational overhead
- There are two environments available in GAE:
- Standard:
- Simpler to use
- Finer-grained autoscaling
- Free daily quota
- Supports specific versions of Python, PHP, Go, and Java
- Flexible:
- (natively) supports Java 8, Servlet 3.1, Jetty 8, Python 2.7/3.5, Node.js, Ruby, PHP, .NET core, and Go (plus other runtimes, if using a custom Docker image)
- App Engine Standard Environment
- Your Standard App Engine Environment applications run in a "sandbox" and have the following constraints:
- No writing to local files (must write to a DB instead for persistent data)
- All requests your application receives have a 60-second timeout
- Limits on third-party software
- Example App Engine Standard environment workflow (e.g., for your web app)
- Develop and test the web app locally
- Use the SDK to deploy to App Engine (Project -> App Engine -> App Servers -> Application Instances)
- App Engine automatically scales and reliably serves your web application
- App Engine can access a variety of services using dedicated APIs (e.g., NoSQL, Memcache, task queues, scheduled tasks, search, logs, etc.)
- App Engine Flexible Environment
- Build and deploy containerized apps with a click
- No sandbox constraints
- Can access App Engine resources
- Your apps run inside Docker containers on (managed) Google Compute Engine VMs.
- Their health is monitored and automatically healed
- You choose the geographical region
- Critical, backward-compatible updates to the VM's OS are applied automatically
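The "develop locally, then deploy" workflow above needs very little application code in the standard environment. Below is a minimal sketch of a stdlib-only WSGI app, under these assumptions: the Python 3 standard runtime serves a WSGI callable, by default an object named `app` in `main.py` (as I understand the runtime docs), with an `app.yaml` containing `runtime: python37` and deployment via `gcloud app deploy`. The greeting text is purely illustrative.

```python
# main.py: a minimal WSGI app using only the standard library.
# Assumption: App Engine standard Python 3 runtime, app.yaml with "runtime: python37".


def app(environ, start_response):
    """Reply to every request with plain text naming the requested path.
    Nothing is written to local disk, in line with the sandbox constraints."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from App Engine! You requested {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# To develop and test locally before deploying:
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8080, app).serve_forever()
```

Persistent data would go to a database such as Cloud Datastore or Cloud SQL rather than to local files.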
Comparison of App Engine environments | ||
---|---|---|
 | Standard Environment | Flexible Environment |
Instance startup: | Milliseconds | Minutes |
SSH access | No | Yes (not by default) |
Write to local disk | No | Yes (scratch space; writes are ephemeral) |
Support for 3rd-party binaries | No | Yes |
Network access | Via App Engine Services | Yes |
Pricing model | After free daily quota, pay per instance class, with automatic shutdown | Pay for resource allocation per hour; no automatic shutdown |
Google Cloud Functions
- Serverless environment for building and connecting cloud services
- Event-driven
- A function executes (is "triggered") in response to a cloud-based event (e.g., from Cloud Storage or Cloud Pub/Sub) or an HTTP call
- Simple, single-purpose functions
- Write function code in either Node.js or Python 3
- Example: A file is uploaded to Cloud Storage (event), function executes in response to event (trigger)
- Easier and less expensive than provisioning a server to watch for events
- Not available in all regions. As of January 2019, available in:
- us-central1 (Iowa, USA)
- us-east1 (South Carolina, USA)
- europe-west1 (Belgium)
- asia-northeast1 (Tokyo, Japan)
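The upload example above (a file lands in Cloud Storage and a function runs) can be sketched as a background function in Python. The sketch assumes the Python 3 runtime's `(data, context)` signature for background functions. There, `data` holds the uploaded object's metadata (including `bucket`, `name`, and `size`) and `context` carries fields such as `event_type`. The function name and the deploy command's bucket are hypothetical.

```python
def on_file_uploaded(data, context):
    """Background Cloud Function for a Cloud Storage event.

    `data` is the uploaded object's metadata as a dict; `context` carries
    event metadata such as event_id, event_type, and timestamp."""
    bucket = data.get("bucket")
    name = data.get("name")
    size = int(data.get("size", 0))  # Cloud Storage reports size as a string
    event_type = getattr(context, "event_type", "unknown")
    message = f"{event_type}: gs://{bucket}/{name} ({size} bytes)"
    print(message)  # stdout from a function shows up in Stackdriver Logging
    return message  # ignored by the platform; returned here for local testing


# Hypothetical deployment (bucket name is a placeholder):
#   gcloud functions deploy on_file_uploaded --runtime python37 \
#       --trigger-resource my-bucket --trigger-event google.storage.object.finalize
```

Because the function is a plain Python callable, you can exercise it locally with a hand-built dict before deploying.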
Comparing Compute Options
Comparison of Google Compute options | |||||
---|---|---|---|---|---|
 | Compute Engine | Kubernetes Engine | App Engine Flexible | App Engine Standard | Cloud Functions |
Service model | IaaS | Hybrid | PaaS | PaaS | Serverless |
Use cases | General computing workloads | Container-based workloads | Web and mobile apps; container-based workloads | Web and mobile apps | Ephemeral functions responding to events |
Toward managed infrastructure <--------------------> Toward dynamic infrastructure |
Storage
Cloud Storage
Cloud Storage is binary, large-object storage
- High performance, Internet-scale
- Data encryption at rest
- Data encryption in transit by default from Google to endpoint
- Bucket attributes
- Globally unique name
- Storage class
- Location (region or multi-region)
- IAM policies or Access Control Lists (ACLs)
- Objects are immutable (turn on versioning for updating a "file" and keeping a history of changes)
- Object lifecycle management rules (e.g., delete objects older than x number of days; keep only the 3 most recent versions of an object if versioning has been enabled on the bucket)
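The two lifecycle examples above can be written as a configuration and checked locally before being applied to a bucket. The Python sketch below does both. It assumes that `{"rule": [...]}` with `age` and `numNewerVersions` conditions is the JSON shape `gsutil lifecycle set` accepts; verify this against the Cloud Storage docs. It then simulates which object versions each Delete rule would match. The simulation is a simplification: in a versioned bucket, Delete on a live version actually makes it noncurrent, and that is ignored here. The `ObjectVersion` class and the sample bucket contents are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ObjectVersion:
    name: str
    generation: int  # larger generation number = newer version
    age_days: int
    is_live: bool


# Assumed `gsutil lifecycle set` JSON shape
LIFECYCLE = {
    "rule": [
        # Delete objects that are at least 365 days old
        {"action": {"type": "Delete"}, "condition": {"age": 365}},
        # Delete versions with 3 or more newer versions (keep the 3 most recent)
        {"action": {"type": "Delete"}, "condition": {"numNewerVersions": 3}},
    ]
}


def condition_matches(v: ObjectVersion, newer: int, cond: dict) -> bool:
    """All conditions in one rule must hold (logical AND)."""
    checks = []
    if "age" in cond:
        checks.append(v.age_days >= cond["age"])
    if "numNewerVersions" in cond:
        checks.append(newer >= cond["numNewerVersions"])
    if "isLive" in cond:
        checks.append(v.is_live == cond["isLive"])
    return bool(checks) and all(checks)


def versions_to_delete(versions: List[ObjectVersion], config: dict = LIFECYCLE) -> List[Tuple[str, int]]:
    """Local simulation: which (name, generation) pairs would a Delete rule match?"""
    doomed = set()
    for v in versions:
        newer = sum(1 for o in versions if o.name == v.name and o.generation > v.generation)
        for rule in config["rule"]:
            if rule["action"]["type"] == "Delete" and condition_matches(v, newer, rule["condition"]):
                doomed.add((v.name, v.generation))
    return sorted(doomed)


# Hypothetical bucket: five versions of one file (generation 5 is live), plus an old log
history = [ObjectVersion("report.csv", g, 10, g == 5) for g in range(1, 6)]
history.append(ObjectVersion("old.log", 1, 400, True))

print(json.dumps(LIFECYCLE, indent=2))
print(versions_to_delete(history))  # [('old.log', 1), ('report.csv', 1), ('report.csv', 2)]
```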
Cloud Storage Classes | ||||
---|---|---|---|---|
 | Multi-regional | Regional | Nearline | Coldline |
Intended for data that is: | Most frequently accessed | Accessed frequently within a region | Accessed less than once a month | Accessed less than once a year |
Availability SLA | 99.95% | 99.90% | 99.00% | 99.00% |
Access APIs | Consistent APIs | |||
Access time | Millisecond access | |||
Use cases | Content storage and delivery | In-region analytics; transcoding | Long-tail content; backups | Archiving; disaster recovery |
Storage price | $$$$ | $$$ | $$ | $ |
Retrieval price | $ | $ | $$$ | $$$$ |
There are several ways to bring data into Cloud Storage:
- Online transfer: self-managed copies using CLI or drag-and-drop
- Storage Transfer Service: scheduled, managed batch transfers
- Transfer Appliance: rackable appliances to securely ship your data
Cloud Storage works with other GCP services:
- Compute Engine: VM startup scripts, images, and general object storage
- App Engine: object storage, logs, and Datastore backups
- Cloud SQL: import/export tables
- BigQuery: import/export tables
Cloud BigTable
Cloud BigTable is a fully managed NoSQL, wide-column database service for terabyte applications.
- Accessed using the HBase API
- Native compatibility with big data, Hadoop ecosystems
- Managed, scalable storage
- Data encryption in-flight and at rest
- Control access with IAM
- BigTable drives major applications, such as Google Search, Google Analytics, and Gmail
- BigTable access patterns
- Application API
- Data can be read from and written to Cloud BigTable through a data service layer, like Managed VMs, the HBase REST Server, or a Java Server using the HBase client. Typically, this will be to serve data to applications, dashboards, and data services.
- Streaming
- Data can be streamed in (written event-by-event) through a variety of popular stream processing frameworks, like Cloud Dataflow Streaming, Spark Streaming, and Storm.
- Batch Processing
- Data can be read from and written to Cloud BigTable through batch processes, like Hadoop MapReduce, Dataflow, or Spark. Often, summarized or newly calculated data is written back to Cloud BigTable or to a downstream database.
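A wide-column store is addressed differently from a relational table. The toy in-memory Python model below illustrates the access patterns above. Each cell is located by a row key and a `family:qualifier` column and keeps timestamped versions. Rows are kept sorted by key, so rows that share a key prefix can be read as one contiguous range. This is an illustration only, not the Cloud Bigtable or HBase client API. The class name, row-key layout, and sensor data are hypothetical.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy, in-memory model of a wide-column table (not a real client API).
    A cell is addressed by (row key, "family:qualifier") and keeps timestamped versions."""

    def __init__(self):
        # row key -> column -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp):
        cells = self._rows[row_key][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row_key, column):
        """Return the newest value stored in a cell, or None if it is empty."""
        row = self._rows.get(row_key)
        if row is None or column not in row:
            return None
        return row[column][0][1]

    def scan_prefix(self, prefix):
        """Rows are ordered by key, so rows sharing a prefix are adjacent."""
        return [key for key in sorted(self._rows) if key.startswith(prefix)]


# Hypothetical IoT layout: row key = sensor ID + date; one column family "reading"
t = WideColumnTable()
t.put("sensor#42#20190307", "reading:temp", "21.5", timestamp=1)
t.put("sensor#42#20190307", "reading:temp", "22.0", timestamp=2)
t.put("sensor#42#20190308", "reading:temp", "19.0", timestamp=1)
t.put("sensor#7#20190307", "reading:temp", "30.1", timestamp=1)

print(t.get("sensor#42#20190307", "reading:temp"))  # 22.0 (newest version)
print(t.scan_prefix("sensor#42#"))
```

Choosing the row key so that related rows share a prefix is the main design decision in this model.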
Cloud SQL
Cloud SQL is a managed RDBMS.
- Offers MySQL and PostgreSQL (beta) databases as a service (DBaaS)
- Automatic replication
- Managed backups (automatic or scheduled)
- Vertical scaling (read and write)
- Horizontal scaling (read)
- Google security (network firewalls and encryption)
- Use cases
- App Engine
- Cloud SQL can be used with App Engine, using standard drivers.
- You can configure a Cloud SQL instance to follow an App Engine application.
- Compute Engine
- Compute Engine instances can be authorized to access Cloud SQL instances using an external IP address.
- Cloud SQL instance can be configured with a preferred zone.
- External services
- Cloud SQL can be used with external applications and clients.
- Standard tools can be used to administer databases.
- External read replicas can be configured.
Cloud Spanner
- A horizontally scalable RDBMS (can scale to larger database sizes than Cloud SQL)
- Transactional consistency at global scale
- Managed instances with high availability
- SQL queries (ANSI 2011 with extensions)
- Automatic replication
- Sharding
- Use cases include financial applications and inventory applications
Cloud Datastore
Cloud Datastore is a horizontally scalable NoSQL DB.
- Designed for application backends (databases can span Compute Engine and App Engine)
- Scales automatically
- Handles sharding and replication
- Supports transactions that affect multiple database rows (unlike Cloud BigTable)
- Allows for SQL-like queries
- Includes a free daily quota (for storage, reads, writes, deletes, and small operations)
Comparing storage options
 | Cloud Datastore | BigTable | Cloud Storage | Cloud SQL | Cloud Spanner | BigQuery |
---|---|---|---|---|---|---|
Type | NoSQL document | NoSQL wide column | Blobstore | Relational SQL for OLTP¹ | Relational SQL for OLTP¹ | Relational SQL for OLAP² |
Transactions | Yes | Single-row | No | Yes | Yes | No |
Complex queries | No | No | No | Yes | Yes | Yes |
Capacity | Terabytes+ | Petabytes+ | Petabytes+ | Terabytes | Petabytes | Petabytes+ |
Unit size | 1 MB/entry | ~10 MB/cell; ~100 MB/row | 5 TB/object | Determined by DB engine | 10,240 MiB/row | 10 MB/row |
Best for | Getting started, App Engine apps | "Flat" data, heavy read/write, events, analytical data | Structured and unstructured binary or object data | Web frameworks, existing apps | Large-scale database apps (> ~2 TB) | Interactive querying, offline analytics |
Use cases | Getting started, App Engine apps | AdTech, financial, and IoT data | Images, large media files, backups | User credentials, customer orders | Whenever high I/O, global consistency is needed | Data warehousing |
¹ Online Transaction Processing (OLTP)
² Online Analytical Processing (OLAP)
Networking
Virtual Private Cloud (VPC)
- Each VPC network is contained in a GCP project.
- You can provision GCP resources, connect them to each other, and isolate them from one another.
- Google Cloud VPC networks are global; subnets are regional (and subnets can span the zones that make up the region).
- You can have resources in different zones on the same subnet.
- You can dynamically increase the size of a subnet in a custom network by expanding the range of IP addresses allocated to it (without any workload shutdown or downtime).
- Forward traffic from one instance to another instance within the same network, even across subnets, without requiring external IP addresses.
- Use your VPC route table to forward traffic within the network, even across subnets (and zones) without requiring an external IP address.
- VPCs give you a global distributed firewall.
- You can define firewall rules in terms of network tags on VMs (e.g., tag all of your web server VMs with "web" and write a firewall rule stating that traffic on ports 80 and/or 443 is allowed into all VMs with the "web" tag, no matter what their IP address happens to be).
- VPCs belong to GCP projects; however, if you wish to establish connections between VPCs, you can use VPC peering.
- If you want to use the full power of IAM to control who and what in one project can interact with a VPC in another project, use shared VPCs.
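The tag-based firewall example above decides access by tag, not by address. The Python sketch below models that decision. A rule applies to any VM carrying one of its target tags, and the VM's IP address never enters the check. Traffic that no rule allows falls through to the VPC's implied deny-all-ingress rule. Priorities, explicit deny rules, and egress are deliberately left out. The class names, VM names, and source address are hypothetical, and the `gcloud` command in the comment is only a rough equivalent.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class FirewallRule:
    name: str
    target_tags: Set[str]       # applies to VMs carrying any of these network tags
    allowed_tcp_ports: Set[int]
    source_ranges: List[str] = field(default_factory=lambda: ["0.0.0.0/0"])


@dataclass
class VM:
    name: str
    tags: Set[str]              # no IP address is needed to make the decision


def ingress_allowed(vm: VM, port: int, source_ip: str, rules: List[FirewallRule]) -> bool:
    """Simplified ingress check: allowed if some rule targets one of the VM's tags,
    allows the port, and covers the source; otherwise implied deny-all-ingress applies."""
    src = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (vm.tags & rule.target_tags
                and port in rule.allowed_tcp_ports
                and any(src in ipaddress.ip_network(r) for r in rule.source_ranges)):
            return True
    return False


# Roughly what this (hypothetical) command would create:
#   gcloud compute firewall-rules create allow-web --allow tcp:80,tcp:443 --target-tags web
rules = [FirewallRule("allow-web", {"web"}, {80, 443})]
web_vm = VM("web-1", {"web"})
db_vm = VM("db-1", {"db"})

print(ingress_allowed(web_vm, 443, "203.0.113.5", rules))  # True
print(ingress_allowed(db_vm, 443, "203.0.113.5", rules))   # False
```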
Cloud Load Balancers
With global Cloud Load Balancing, your application presents a single front-end to the world.
- Users get a single, global anycast IP address.
- Traffic goes over the Google backbone from the closest point-of-presence to the user.
- Backends are selected based on load.
- Only healthy backends receive traffic.
- No pre-warming is required.
Cloud Load Balancing Options | ||||
---|---|---|---|---|
Global HTTP(S) | Global SSL Proxy | Global TCP Proxy | Regional | Regional (internal) |
Layer 7 load balancing based on load | Layer 4 load balancing of non-HTTPS SSL traffic based on load | Layer 4 load balancing of non-SSL TCP traffic | Load balancing of any traffic (TCP, UDP) | Load balancing of traffic inside a VPC |
Can route different URLs to different backends | Supported on specific port numbers | Supported on specific port numbers | Supported on any port number | Used for the internal tiers of multi-tier applications |
Cloud DNS
Cloud DNS is highly available and scalable
- Create managed zones, then add, edit, delete DNS records.
- Programmatically manage zones and records using RESTful API or CLI.
Cloud Content Delivery Network (CDN)
- Uses Google's globally distributed edge caches to cache content close to your users.
- Or, you can use CDN Interconnect if you would prefer to use a different (non-GCP) CDN.
Stackdriver
Stackdriver provides services for:
- Monitoring
- Platform, system, and application metrics
- Uptime/health checks
- Dashboards and alerts
- Logging
- Platform, system, and application logs
- Log search, view, filter, and export
- Log-based metrics
- Export logs to BigQuery, Cloud Storage, and Cloud Pub/Sub
- Debugger
- Debug applications
- Error reporting
- Analyzes and aggregates the errors in your Cloud apps and notifies you when new errors are detected
- Trace
- Latency reporting and sampling
- Per-URL latency and statistics
Big Data
In the very near future, every company will be a data company, as making the fastest and best use of data is a critical source of competitive advantage.
Google Cloud's big data services are fully managed and scalable.
- BigQuery
- Analytics database; stream data at 100,000 rows per second
- Cloud Pub/Sub
- Scalable and flexible enterprise messaging
- Cloud Dataproc
- Managed Hadoop MapReduce, Spark, Pig, and Hive service
- Cloud Dataflow
- Stream and batch processing; unified and simplified pipelines
- Cloud Datalab
- Interactive data exploration
BigQuery
BigQuery is a fast, highly scalable, cost-effective, and fully managed Cloud data warehouse for analytics, with built-in machine learning.
- Provides near real-time interactive analysis of massive datasets (hundreds of TBs) using SQL syntax (SQL 2011)
- Use BigQuery, rather than a dynamic pipeline (like Cloud Dataflow), when you need to explore a vast dataset with ad hoc SQL queries
- No cluster maintenance required
- Load data from Cloud Storage or Cloud Datastore or stream it into BigQuery at up to 100,000 rows per second
- In addition to SQL queries, you can read/write data in BigQuery via Cloud Dataflow, Hadoop, and Spark
- Compute and storage are separated with a terabit network in between
- You only pay for storage and processing use
- You pay for your data storage separately from queries
- Automatic discount for long-term data storage
- When a table (or partition) has not been modified for 90 consecutive days, Google automatically drops the price of its storage.
- Free monthly quotas
- 99.9% SLA
Google's infrastructure is global and so is BigQuery. BigQuery lets you specify the region where your data will be kept. For example, if you want to keep data in Europe, you do not have to set up a cluster in Europe; simply specify "EU" as the location when you create your dataset. US and Asia locations are also available.
Cloud Pub/Sub
Cloud Pub/Sub allows you to ingest event streams from anywhere, at any scale, for simple, reliable, real-time stream analytics.
- Scalable, reliable messaging
- "Pub" => Publishers; "Sub" => Subscribers
- Supports many-to-many asynchronous messaging
- Application components make push/pull subscriptions to topics
- Includes support for offline consumers
- Designed to provide at-least-once delivery at low latency (i.e., some messages may be delivered more than once, so write your code to handle such duplicates)
- Building block for data ingestion in Dataflow, IoT, Marketing Analytics, etc.
- It is the foundation for Dataflow streaming
- Useful for push notifications for cloud-based apps
- Connect apps across GCP (e.g., push/pull between Compute Engine and App Engine)
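At-least-once delivery means a subscriber can receive the same message more than once, so processing should be idempotent. The Python sketch below deduplicates by message ID. It is an illustration under stated assumptions, not the Pub/Sub client library. The in-memory set stands in for durable storage, since a real subscriber would persist processed IDs (e.g., in a database) so that deduplication survives restarts. The class name and message payloads are hypothetical. In the Python client library, received messages expose a `message_id` and an `ack()` method. In this sketch the caller is expected to acknowledge duplicates too, so they are not redelivered.

```python
from typing import Callable, Set


class IdempotentHandler:
    """Process each message ID at most once, even if it is redelivered.
    Illustration only: the in-memory set stands in for durable storage."""

    def __init__(self, process: Callable[[bytes], None]):
        self._process = process
        self._seen: Set[str] = set()

    def __call__(self, message_id: str, data: bytes) -> bool:
        """Return True if the message was processed, False if it was a duplicate.
        Either way, the caller should acknowledge the message."""
        if message_id in self._seen:
            return False
        self._process(data)
        self._seen.add(message_id)  # mark as done only after processing succeeds
        return True


# Simulated at-least-once delivery: message "1" arrives twice
processed = []
handle = IdempotentHandler(processed.append)
deliveries = [("1", b"order-created"), ("2", b"order-paid"), ("1", b"order-created")]
results = [handle(message_id, data) for message_id, data in deliveries]
print(results)    # [True, True, False]
print(processed)  # [b'order-created', b'order-paid']
```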
Cloud Dataproc
Cloud Dataproc is a managed, Cloud-native Apache Hadoop & Apache Spark service.
- A fast, easy, managed way to run Hadoop and Spark/Hive/Pig on GCP
- Create clusters in 90 seconds or less (on average)
- Scale clusters up and down, even when jobs are running
- Easily migrate on-premises Hadoop jobs to the Cloud
- Use Spark SQL and Spark Machine Learning libraries (MLlib) to run classification algorithms
- Save money with preemptible instances
- The rate for pricing is based on the hour, but Dataproc is billed by the second (one minute minimum)
The MapReduce model means that one function (traditionally called the "Map" function) runs in parallel over a massive dataset to produce intermediate results. Another function (the "Reduce" function) builds a final result set, based on all those intermediate results.
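The classic word count shows the map/reduce split on a single machine. The Python sketch below splits the input into chunks, and each chunk goes through a "map" function in parallel to produce intermediate word counts. A "reduce" function then merges those partial results into the final counts. A thread pool (`multiprocessing.dummy.Pool`) stands in for cluster workers. On Dataproc the same pattern would run as a Hadoop or Spark job across many machines. The function names and the corpus are illustrative.

```python
from collections import Counter
from functools import reduce
from multiprocessing.dummy import Pool  # thread pool standing in for cluster workers
from typing import List


def map_phase(lines: List[str]) -> Counter:
    """Map: turn one chunk of input into intermediate (word -> count) results."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce: merge two intermediate results by summing the counts per word."""
    return left + right


def word_count(lines: List[str], workers: int = 4) -> Counter:
    # Split the input into one chunk per worker
    chunks = [lines[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        intermediate = pool.map(map_phase, chunks)  # map phase runs in parallel
    return reduce(reduce_phase, intermediate, Counter())


corpus = ["the quick brown fox", "jumps over the lazy dog", "The End"]
counts = word_count(corpus)
print(counts["the"], counts["fox"])  # 3 1
```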
Cloud Dataflow
Cloud Dataflow is a simplified stream and batch data processing service, with equal reliability and expressiveness.
Use Cloud Dataproc when you have a dataset of known size or when you want to manage your cluster size yourself. If your data is ingested in real-time or is of an unpredictable size or rate, use Cloud Dataflow.
- Offers managed data pipelines
- Useful for fraud detection, financial services, IoT analytics, healthcare, logistics, clickstream, Point-of-Sale (PoS) and segmentation analysis in retail
- Extract/Transform/Load (ETL) pipelines to move, filter, enrich, and shape data
- Data analysis: batch computation and continuous computation using streaming
- Processes data using Compute Engine instances
- Clusters are sized for you
- Automated scaling; no instance provisioning required
- Write code once and get batch and streaming (transform-based programming model)
- Orchestration: create pipelines that coordinate services, including external services
- Integrates with GCP services: Cloud Storage, Cloud Pub/Sub, BigQuery, and Bigtable
- Open source Python and Java SDKs
Cloud Datalab
An interactive tool for data exploration, analysis, visualization, and machine learning.
- Interactive tool for large-scale data exploration, transformation, analysis, and visualization
- Integrated and open source (built on Jupyter)
- Only pay for the resources you use (no charge for using Datalab itself)
- Analyze data in BigQuery, Compute Engine, and Cloud Storage using Python, SQL, and JavaScript
- Easily deploy models to BigQuery
- Visualize your data with Google Charts and matplotlib
Cloud Machine Learning
The Google Cloud Machine Learning Platform provides modern machine learning services with pre-trained models and a platform to generate your own tailored models.
- Open source tool to build and run neural network models
- Wide platform support: CPU, GPU, or TPU; mobile, server, or Cloud
- Fully managed machine learning service
- Familiar notebook-based developer experience
- Optimized for Google's infrastructure; integrates with BigQuery and Cloud Storage
- Pre-trained machine learning models built by Google
- Speech: stream results in real-time; detects 80 languages
- Vision: Identify objects, landmarks, text, and content
- Translate: Language translation, including detection
- Natural language: structure and meaning of text
- Why use the Cloud Machine Learning Platform?
- For structured data
- Classification and regression
- Recommendation
- Anomaly detection
- For unstructured data
- Image and video analytics
- Text analytics
Cloud Vision API
Analyze images with a simple REST API.
- Logo detection, label detection, etc.
- With the Cloud Vision API, you can:
- Gain insight from images
- Detect inappropriate content
- Analyze sentiment
- Extract text
Cloud Speech API
- Can return text in real time
- Highly accurate, even in noisy environments
- Access from any device
- As of March 2019, it recognizes over 120 languages and variants
Cloud Natural Language API
- Uses ML models to reveal structure and meaning of text
- It can do syntax analysis: breaking sentences down into tokens, identifying nouns, verbs, adjectives, and other parts of speech, and figuring out the relationships among the words.
- Extract information about items mentioned in text documents, news articles, and blog posts
Cloud Translation API
Dynamically translate between languages.
- Translate arbitrary strings between thousands of language pairs
- Programmatically detect a document's language
- Support for dozens of languages
Cloud Video Intelligence API
Search and discover your media content with Cloud Video Intelligence.
- Annotate the contents of videos
- Detect scene changes
- Flag inappropriate content
- Support for a variety of video formats
Tools
Cloud Endpoints
Develop, deploy, and manage APIs on any Google Cloud backend.
- Distributed API management
- Expose your API using a RESTful interface
- Control access and validate calls with JSON Web Tokens and Google API keys
- Identify web / mobile users with Auth0 and Firebase Authentication
- Generate client libraries
- Supported platforms
- Runtime environment
- App Engine Flexible Environment
- Kubernetes Engine
- Compute Engine
- Clients
- Android
- iOS
- JavaScript
- Apigee Edge
- A platform for making APIs available to your customers and partners
- Helps you secure and monetize APIs
- Contains analytics, monetization, and a developer portal
Cloud Source Repositories
Fully featured Git repositories hosted on GCP.
Deployment Manager
Create and manage Cloud resources with simple templates (written in YAML).
- It is an infrastructure management tool
- Provides repeatable deployments
- It is declarative, not imperative:
- a declarative approach lets the user specify what the configuration should be and lets the system figure out the steps to take;
- an imperative approach requires the user to define the steps to take to create and configure resources.
- Besides YAML, you can also use Python or Jinja2 templates
- Deployment Manager is available at no additional charge to Cloud Platform customers.
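The declarative idea can be illustrated with a toy "plan" step: you list the resources you want, and the tool works out which create/delete actions get you from the current state to that desired state. This is only a sketch using plain text files of resource names (the `plan` function and the file names are hypothetical); it is not how Deployment Manager is implemented internally.

```shell
# plan DESIRED_FILE CURRENT_FILE
# Each file lists resource names, one per line (hypothetical format).
# Prints the actions needed to turn the current state into the desired state.
plan() {
  awk 'FILENAME == ARGV[1] { want[$0] = 1; next }
       { have[$0] = 1 }
       END {
         for (r in want) if (!(r in have)) print "create " r
         for (r in have) if (!(r in want)) print "delete " r
       }' "$1" "$2" | sort
}

# Example: we want vm-a and vm-b; vm-b and vm-c currently exist.
printf 'vm-a\nvm-b\n' > desired.txt
printf 'vm-b\nvm-c\n' > current.txt
plan desired.txt current.txt
# create vm-a
# delete vm-c
```

Once the actions are applied, the two states match and the plan comes back empty, which is what makes a declarative config safe to re-apply.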
Command Line Interface (CLI)
The Google Cloud SDK is a set of tools that you can use to manage resources and applications hosted on the Google Cloud Platform (GCP). These include the gcloud, gsutil, and bq command line tools. The gcloud command-line tool is downloaded along with the Cloud SDK.
Configuration and services
$ gcloud components list
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                 Components                                                  │
├───────────────┬──────────────────────────────────────────────────────┬──────────────────────────┬───────────┤
│    Status     │                         Name                         │            ID            │    Size   │
├───────────────┼──────────────────────────────────────────────────────┼──────────────────────────┼───────────┤
│ Not Installed │ App Engine Go Extensions                             │ app-engine-go            │  56.6 MiB │
│ Not Installed │ Cloud Bigtable Command Line Tool                     │ cbt                      │   6.4 MiB │
│ Not Installed │ Cloud Bigtable Emulator                              │ bigtable                 │   5.6 MiB │
│ Not Installed │ Cloud Datalab Command Line Tool                      │ datalab                  │   < 1 MiB │
│ Not Installed │ Cloud Datastore Emulator                             │ cloud-datastore-emulator │  17.7 MiB │
│ Not Installed │ Cloud Datastore Emulator (Legacy)                    │ gcd-emulator             │  38.1 MiB │
│ Not Installed │ Cloud Firestore Emulator                             │ cloud-firestore-emulator │  27.5 MiB │
│ Not Installed │ Cloud Pub/Sub Emulator                               │ pubsub-emulator          │  33.4 MiB │
│ Not Installed │ Cloud SQL Proxy                                      │ cloud_sql_proxy          │   3.8 MiB │
│ Not Installed │ Emulator Reverse Proxy                               │ emulator-reverse-proxy   │  14.5 MiB │
│ Not Installed │ Google Cloud Build Local Builder                     │ cloud-build-local        │   6.0 MiB │
│ Not Installed │ Google Container Registry's Docker credential helper │ docker-credential-gcr    │   1.8 MiB │
│ Not Installed │ gcloud Alpha Commands                                │ alpha                    │   < 1 MiB │
│ Not Installed │ gcloud Beta Commands                                 │ beta                     │   < 1 MiB │
│ Not Installed │ gcloud app Java Extensions                           │ app-engine-java          │ 107.5 MiB │
│ Not Installed │ gcloud app PHP Extensions                            │ app-engine-php           │           │
│ Not Installed │ gcloud app Python Extensions                         │ app-engine-python        │   6.2 MiB │
│ Not Installed │ gcloud app Python Extensions (Extra Libraries)       │ app-engine-python-extras │  28.5 MiB │
│ Not Installed │ kubectl                                              │ kubectl                  │   < 1 MiB │
│ Installed     │ BigQuery Command Line Tool                           │ bq                       │   < 1 MiB │
│ Installed     │ Cloud SDK Core Libraries                             │ core                     │   9.1 MiB │
│ Installed     │ Cloud Storage Command Line Tool                      │ gsutil                   │   3.5 MiB │
└───────────────┴──────────────────────────────────────────────────────┴──────────────────────────┴───────────┘
- To install or remove components at your current SDK version [228.0.0], run:
$ gcloud components install COMPONENT_ID
$ gcloud components remove COMPONENT_ID
- To update your SDK installation to the latest version [228.0.0], run:
$ gcloud components update
- Initialize gcloud:
$ gcloud init
- Get current gcloud configuration:
$ gcloud config list
[compute]
region = us-west1
zone = us-west1-a
[core]
account = someone@somewhere.com
disable_usage_reporting = True
project = my-project-223521

Your active configuration is: [default]
- Get a list of all configurations:
$ gcloud config configurations list
NAME     IS_ACTIVE  ACCOUNT                PROJECT            DEFAULT_ZONE  DEFAULT_REGION
default  True       someone@somewhere.com  my-project-223521  us-west1-a    us-west1
- Get a list of all (enabled) services:
$ gcloud services list
NAME                                    TITLE
bigquery-json.googleapis.com            BigQuery API
cloudapis.googleapis.com                Google Cloud APIs
clouddebugger.googleapis.com            Stackdriver Debugger API
cloudtrace.googleapis.com               Stackdriver Trace API
compute.googleapis.com                  Compute Engine API
container.googleapis.com                Kubernetes Engine API
containerregistry.googleapis.com        Container Registry API
datastore.googleapis.com                Cloud Datastore API
dns.googleapis.com                      Google Cloud DNS API
logging.googleapis.com                  Stackdriver Logging API
monitoring.googleapis.com               Stackdriver Monitoring API
oslogin.googleapis.com                  Cloud OS Login API
pubsub.googleapis.com                   Cloud Pub/Sub API
servicemanagement.googleapis.com        Service Management API
serviceusage.googleapis.com             Service Usage API
sql-component.googleapis.com            Cloud SQL
stackdriver.googleapis.com              Stackdriver API
stackdriverprovisioning.googleapis.com  Stackdriver Provisioning Service
storage-api.googleapis.com              Google Cloud Storage JSON API
storage-component.googleapis.com        Google Cloud Storage
$ gcloud services list --enabled --sort-by="NAME"
$ gcloud services list --available --sort-by="NAME"
- Creating a project
- Get a list of billing accounts:
$ gcloud beta billing accounts list
ACCOUNT_ID            NAME                OPEN  MASTER_ACCOUNT_ID
000000-000000-000000  My Billing Account  True
- Create a project:
$ gcloud projects create dev-project-01 --name="dev-project-01" \
    --labels=team=area51
- Link the above project to a billing account:
$ gcloud beta billing projects link dev-project-01 \
    --billing-account=000000-000000-000000
- Switch between projects:
$ gcloud config set project ${PROJECT_NAME}
- Get project-wide metadata (including project quotas):
$ gcloud compute project-info describe  # current project
#~OR~ specific project:
$ gcloud compute project-info describe --project ${PROJECT_NAME}
- Managing multiple SDK configurations
Note: When you install the SDK, it will set up a default configuration and ask you to assign a project (and a default region) to it.
- Create a new configuration, activate, and switch between configurations:
$ gcloud config configurations create dev
$ gcloud config configurations list
$ gcloud config list
$ gcloud config configurations activate default
$ gcloud config set project dev-project-01
$ gcloud config set account someone@somewhere.com
Compute Engine
- Creating a VM/instance
- Use default values:
$ gcloud compute instances create "dev-server" --zone us-west1-a
- Use customised values:
$ gcloud compute instances create "dev-server" \
    --project=my-project-123456 \
    --zone=us-west1-a \
    --machine-type=f1-micro \
    --subnet=default \
    --network-tier=PREMIUM \
    --maintenance-policy=MIGRATE \
    --service-account=00000000000-compute@developer.gserviceaccount.com \
    --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append \
    --image=centos-7-v20181210 \
    --image-project=centos-cloud \
    --boot-disk-size=10GB \
    --boot-disk-type=pd-standard \
    --boot-disk-device-name=dev-server
- Another example of creating a VM:
$ gcloud config list
$ gcloud compute zones list | grep us-west
$ gcloud config set compute/zone us-west1-a
$ gcloud compute images list --filter="debian"
$ gcloud compute instances create "my-vm-2" \
    --machine-type "n1-standard-1" \
    --image-project "debian-cloud" \
    --image "debian-9-stretch-v20190213" \
    --subnet "default"
- Connecting to a VM (via SSH)
- Google-managed:
$ gcloud compute instances list
$ gcloud compute ssh xtof@dev-server
$ gcloud compute ssh xtof@dev-server --dry-run  # see the actual command
- Using your own SSH key:
$ ssh-keygen -t rsa -f my-ssh-key -C xtof
$ echo "xtof:$(cat my-ssh-key.pub)" > gcp_keys.txt
$ gcloud compute instances add-metadata dev-server --metadata-from-file ssh-keys=gcp_keys.txt
- Snapshots
$ gcloud compute snapshots list
$ gcloud compute disks list
$ gcloud compute disks snapshot dev-server
$ gcloud compute snapshots delete <snapshot_name>
- Images
- Show public and private images (from which we can create instances from):
$ gcloud compute images list
NAME                PROJECT       FAMILY    DEPRECATED  STATUS
centos-6-v20181210  centos-cloud  centos-6              READY
centos-7-v20181210  centos-cloud  centos-7              READY
...
- Setting firewall rules
- Get a list of current firewall rules (project-wide):
$ gcloud compute firewall-rules list
NAME                    NETWORK  DIRECTION  PRIORITY  ALLOW                         DENY  DISABLED
default-allow-icmp      default  INGRESS    65534     icmp                                False
default-allow-internal  default  INGRESS    65534     tcp:0-65535,udp:0-65535,icmp        False
default-allow-rdp       default  INGRESS    65534     tcp:3389                            False
default-allow-ssh       default  INGRESS    65534     tcp:22                              False
- Set firewall rules (e.g., allow HTTP/HTTPS traffic):
$ gcloud compute firewall-rules create default-allow-http \
    --project=my-project-123456 \
    --direction=INGRESS --priority=1000 --network=default \
    --action=ALLOW --rules=tcp:80 --source-ranges=0.0.0.0/0 \
    --target-tags=http-server
$ gcloud compute firewall-rules create default-allow-https \
    --project=my-project-123456 \
    --direction=INGRESS --priority=1000 --network=default \
    --action=ALLOW --rules=tcp:443 --source-ranges=0.0.0.0/0 \
    --target-tags=https-server
- List updated firewall rules:
$ gcloud compute firewall-rules list
NAME                    NETWORK  DIRECTION  PRIORITY  ALLOW                         DENY  DISABLED
default-allow-http      default  INGRESS    1000      tcp:80                              False
default-allow-https     default  INGRESS    1000      tcp:443                             False
...
- Create an instance using the above firewall rules (HTTP/HTTPS):
$ gcloud compute instances create "dev-server" --zone us-west1-a \
    --tags=http-server,https-server
- Deleting a VM/instance
- Get a list of instances:
$ gcloud compute instances list
NAME        ZONE        MACHINE_TYPE  PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
dev-server  us-west1-a  f1-micro                   10.138.0.2   35.230.26.217  RUNNING
- Delete the above instance:
$ gcloud compute instances delete "dev-server" --zone "us-west1-a"
Kubernetes
SEE: The Kubernetes main article.
- Managing a GKE cluster
- Create a basic Kubernetes cluster:
$ gcloud container clusters create my-k8s-cluster --zone us-west1-a --num-nodes 2
- Create a Kubernetes cluster (with more options defined):
$ gcloud beta container --project "gcp-k8s-123456" clusters create "xtof-gcp-k8s" \
    --zone "us-west1-a" \
    --username "admin" \
    --cluster-version "1.11.5-gke.5" \
    --machine-type "n1-standard-1" \
    --image-type "COS" \
    --disk-type "pd-standard" \
    --disk-size "100" \
    --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
    --num-nodes "3" \
    --enable-stackdriver-kubernetes \
    --no-enable-ip-alias \
    --network "projects/gcp-k8s-123456/global/networks/default" \
    --subnetwork "projects/gcp-k8s-123456/regions/us-west1/subnetworks/default" \
    --addons HorizontalPodAutoscaling,HttpLoadBalancing,KubernetesDashboard,Istio \
    --istio-config auth=NONE \
    --enable-autoupgrade \
    --enable-autorepair
- Get the Kubernetes credentials:
$ gcloud container clusters get-credentials xtof-gcp-k8s --zone us-west1-a --project gcp-k8s-123456
- Resize a Kubernetes cluster:
$ gcloud container clusters resize xtof-gcp-k8s --size=1 --zone=us-west1-a
- Delete the cluster:
$ gcloud container clusters delete --project "gcp-k8s-123456" "xtof-gcp-k8s" --zone "us-west1-a"
Google Container Registry (GCR)
SEE: The Container Registry Quick Start guide for details. SEE: Docker for more details.
- Configure Docker to use the gcloud command-line tool as a credential helper:
$ gcloud auth configure-docker
Note: The above command will add the following settings to your (local) Docker config file (located at ${HOME}/.docker/config.json):
{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud"
  }
}
- Tag the image with a registry name:
$ docker tag ${IMAGE_NAME} gcr.io/${PROJECT_ID}/${IMAGE_NAME}:${IMAGE_TAG}
- Push the image to Container Registry:
$ docker push gcr.io/${PROJECT_ID}/${IMAGE_NAME}:${IMAGE_TAG}
- List images in the Container Registry:
$ gcloud container images list
#~OR~
$ gcloud container images list --repository=gcr.io/${PROJECT_ID}
#~OR~
$ gcloud container images list --repository=gcr.io/${PROJECT_ID} --filter "name:${IMAGE_NAME}"
- Pull the image from Container Registry:
$ docker pull gcr.io/${PROJECT_ID}/${IMAGE_NAME}:${IMAGE_TAG}
- Cleanup (delete):
$ gcloud container images delete gcr.io/${PROJECT_ID}/${IMAGE_NAME}:${IMAGE_TAG} --force-delete-tags
Deployment Manager
"Deployment Manager is an infrastructure deployment service that automates the creation and management of Google Cloud Platform resources for you. Write flexible template and configuration files and use them to create deployments that have a variety of Cloud Platform services, such as Google Cloud Storage, Google Compute Engine, and Google Cloud SQL, configured to work together". source
- Example:
$ gcloud deployment-manager deployments create my-deployment --config my-deployment.yml
$ gcloud deployment-manager deployments update my-deployment --config my-deployment.yml
$ gcloud deployment-manager deployments describe my-deployment
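The commands above reference my-deployment.yml without showing it. A minimal config might look like the following, written with a `cat << EOF` heredoc in the same style used elsewhere on this page. This is a sketch under assumed values: a single f1-micro VM named dev-server in us-west1-a, booting from the public debian-9 image family on the default network. All of these names are illustrative, so adjust them for your project.

```shell
# Write a minimal Deployment Manager config declaring one Compute Engine VM.
# All names and values below are illustrative.
cat << 'EOF' > my-deployment.yml
resources:
- name: dev-server
  type: compute.v1.instance
  properties:
    zone: us-west1-a
    machineType: zones/us-west1-a/machineTypes/f1-micro
    disks:
    - deviceName: boot
      type: PERSISTENT
      boot: true
      autoDelete: true
      initializeParams:
        sourceImage: projects/debian-cloud/global/images/family/debian-9
    networkInterfaces:
    - network: global/networks/default
      accessConfigs:
      - name: External NAT
        type: ONE_TO_ONE_NAT
EOF
```

After creating the deployment, edit the file and run the `update` command. Deployment Manager then works out what has to change, in keeping with its declarative model.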
Miscellaneous
$ gcloud config set project <project-name>
$ gcloud config set compute/zone us-west1-a
$ gcloud config unset compute/zone
$ gcloud iam service-accounts list \
    --filter='displayName:"Compute Engine default service account"' \
    --format='value(email)'
$ gcloud iam service-accounts list --format=json | \
    jq -r '.[] | select(.email | startswith("my-project@")) | .email'
my-project@my-project-123456.iam.gserviceaccount.com
$ gcloud compute networks subnets list
NAME     REGION                   NETWORK  RANGE
default  us-west2                 default  10.168.0.0/20
default  asia-northeast1          default  10.146.0.0/20
default  us-west1                 default  10.138.0.0/20
default  southamerica-east1       default  10.158.0.0/20
default  europe-west4             default  10.164.0.0/20
default  asia-east1               default  10.140.0.0/20
default  europe-north1            default  10.166.0.0/20
default  asia-southeast1          default  10.148.0.0/20
default  us-east4                 default  10.150.0.0/20
default  europe-west1             default  10.132.0.0/20
default  europe-west2             default  10.154.0.0/20
default  europe-west3             default  10.156.0.0/20
default  australia-southeast1     default  10.152.0.0/20
default  asia-south1              default  10.160.0.0/20
default  us-east1                 default  10.142.0.0/20
default  us-central1              default  10.128.0.0/20
default  asia-east2               default  10.170.0.0/20
default  northamerica-northeast1  default  10.162.0.0/20
$ gcloud projects create example-foo-bar-1 --name="Happy project" \
    --labels=type=happy
$ gcloud compute forwarding-rules list \
    --filter='name:"my-app-forwarding-rules"' \
    --format='value(IPAddress)'
x.x.x.x
$ gcloud pubsub topics publish myTopic --message '{"name":"bob"}'
$ gcloud functions logs read
Cloud Storage
Storage Classes | |
---|---|
Storage Class | Name for APIs and gsutil |
Multi-Regional Storage | multi_regional |
Regional Storage | regional |
Nearline Storage | nearline |
Coldline Storage | coldline |
See: for details
- Create a bucket:
$ PROJECT_NAME=my-project
$ REGION=us-west1
$ STORAGE_CLASS=regional
$ BUCKET_NAME=xtof-test
# Basic (using defaults):
$ gsutil mb gs://${BUCKET_NAME}
# Advanced (override defaults):
$ gsutil mb -p ${PROJECT_NAME} -c ${STORAGE_CLASS} -l ${REGION} gs://${BUCKET_NAME}
# Use Cloud Shell variables:
$ gsutil mb -l US gs://${DEVSHELL_PROJECT_ID}  # <- uses your (globally unique) project ID as the bucket name
# Set the ACL of an object in your bucket:
$ gsutil acl ch -u allUsers:R gs://${DEVSHELL_PROJECT_ID}/foobar.png
Note: All buckets (and their objects) are private by default.
- Upload an object to the above bucket:
$ gsutil cp Pictures/foobar.jpg gs://${BUCKET_NAME}
- Move an object (file) from one bucket to another:
$ gsutil mv gs://${SOURCE_BUCKET}/${OBJECT_NAME} gs://${DESTINATION_BUCKET}/
- List the contents of a bucket:
$ gsutil ls gs://${BUCKET_NAME}     # basic info
$ gsutil ls -l gs://${BUCKET_NAME}  # extended info
- Identity and Access Management
- Get the IAM roles and rules for a given bucket (note: these are the default ones):
$ gsutil iam get gs://${BUCKET_NAME}
{
  "bindings": [
    {
      "members": [
        "projectEditor:my-project-123456",
        "projectOwner:my-project-123456"
      ],
      "role": "roles/storage.legacyBucketOwner"
    },
    {
      "members": [
        "projectViewer:my-project-123456"
      ],
      "role": "roles/storage.legacyBucketReader"
    }
  ],
  "etag": "CAE="
}
- Lifecycle Management
- Find all objects in a given bucket older than 2 days (i.e., when they were uploaded to the bucket or last modified) and convert them from "regional" to "nearline" storage class:
$ cat << EOF > lifecycle.json
{
  "lifecycle": {
    "rule": [
      {
        "action": {
          "type": "SetStorageClass",
          "storageClass": "NEARLINE"
        },
        "condition": {
          "age": 2,
          "matchesStorageClass": [
            "REGIONAL"
          ]
        }
      }
    ]
  }
}
EOF
$ gsutil lifecycle set lifecycle.json gs://${BUCKET_NAME}/
- Signed-URLs
First, create a service account with just enough Cloud Storage privileges for the objects you want to share, then create and download a JSON key for it (saved as key.json below).
$ gsutil cp test.txt gs://xtof-sandbox/
$ gsutil signurl -d 3m key.json gs://xtof-sandbox/test.txt
The above will return a signed URL (it will look something like https://storage.googleapis.com/xtof-sandbox/test.txt?x-goog-signature=23asd...), which you can send to users and which will only be valid for 3 minutes. After 3 minutes, they will get an "ExpiredToken" error.
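To check how long a signed URL is good for, you can read its query string. Assuming the URL uses V4 signing (as the `x-goog-signature` parameter suggests), the lifetime is carried in `x-goog-expires` (in seconds) and the signing time in `x-goog-date`. The `query_param` helper below is a hypothetical sketch that does no URL-decoding, and the sample URL is made up.

```shell
# query_param URL NAME
# Print the (still URL-encoded) value of query parameter NAME,
# matching the name case-insensitively. Prints nothing if absent.
query_param() {
  printf '%s\n' "${1#*[?]}" | tr '&' '\n' |
    awk -v n="$2" '{
      i = index($0, "=")
      if (i > 0 && tolower(substr($0, 1, i - 1)) == tolower(n)) {
        print substr($0, i + 1)
        exit
      }
    }'
}

# Made-up example of a URL signed with "-d 3m".
url='https://storage.googleapis.com/xtof-sandbox/test.txt?x-goog-signature=abc123&x-goog-date=20190307T235800Z&x-goog-expires=180'
echo "signed at $(query_param "$url" x-goog-date), valid for $(( $(query_param "$url" x-goog-expires) / 60 )) minutes"
# signed at 20190307T235800Z, valid for 3 minutes
```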
BigQuery
$ bq query "select string_field_10 as request, count(*) as requestcount from logdata.accesslog group by request order by requestcount desc"
+----------------------------------------+--------------+
|                request                 | requestcount |
+----------------------------------------+--------------+
| GET /store HTTP/1.0                    |       337293 |
| GET /index.html HTTP/1.0               |       336193 |
| GET /products HTTP/1.0                 |       280937 |
| GET /services HTTP/1.0                 |       169090 |
| GET /products/desserttoppings HTTP/1.0 |        56580 |
| GET /products/floorwaxes HTTP/1.0      |        56451 |
| GET /careers HTTP/1.0                  |        56412 |
| GET /services/turnipwinding HTTP/1.0   |        56401 |
| GET /services/spacetravel HTTP/1.0     |        56176 |
| GET /favicon.ico HTTP/1.0              |        55845 |
+----------------------------------------+--------------+
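If you still have a copy of the raw log file, the same aggregation can be approximated locally, which is handy for sanity-checking the query. The sketch below assumes the log is in Apache/NCSA common or combined log format, where the request line (the column the table exposes as `string_field_10`) is the first double-quoted field. The `top_requests` function name is hypothetical.

```shell
# top_requests LOGFILE
# Rough local equivalent of:
#   SELECT request, COUNT(*) AS requestcount ... GROUP BY request ORDER BY requestcount DESC
top_requests() {
  awk -F'"' '{ print $2 }' "$1" |   # extract the quoted request line
    sort | uniq -c |                 # GROUP BY request, COUNT(*)
    sort -rn |                       # ORDER BY requestcount DESC
    awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print $0 " | " c }'
}
```

Each output line has the form `GET /store HTTP/1.0 | 2`, mirroring the request/requestcount columns above.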
GCP vs. AWS
Note: All of the following are as of February 2017.
- Compute
- Compute Engine vs. EC2
- App Engine vs. Elastic Beanstalk
- Container Engine vs. EC2 Container Service (ECS)
- Container Registry vs. ECR
- Cloud Functions vs. Lambda
- Identity & Security
- Cloud IAM vs. IAM
- Cloud Resource Manager vs. n/a
- Cloud Security Scanner vs. Inspector
- Cloud Platform Security vs. n/a
- Networking
- Cloud Virtual Network vs. VPC
- Cloud Load Balancing vs. ELB
- Cloud CDN vs. CloudFront
- Cloud Interconnect vs. Direct Connect
- Cloud DNS vs. Route53
- Storage and Databases
- Cloud Storage vs. S3
- Cloud Bigtable vs. DynamoDB
- Cloud Datastore vs. SimpleDB
- Cloud SQL vs. RDS
- Persistent Disk vs. EBS
- Big Data
- BigQuery vs. Redshift
- Cloud Dataflow vs. EMR
- Cloud Dataproc vs. EMR
- Cloud Datalab vs. n/a
- Cloud Pub/Sub vs. Kinesis
- Genomics vs. n/a
- Machine Learning
- Cloud Machine Learning vs. Machine Learning
- Vision API vs. Rekognition
- Speech API vs. Polly
- Natural Language API vs. Lex
- Translation API vs. n/a
- Jobs API vs. n/a
- Compute Services (GCP vs. AWS):
- Infrastructure as a Service (IaaS): Compute Engine vs. EC2
- Platform as a Service (PaaS): App Engine vs. Elastic Beanstalk
- Containers as a Service: Container Engine vs. EC2 Container Service (ECS)
Compute IaaS comparison | ||
---|---|---|
Feature | Amazon EC2 | Compute Engine |
Virtual machines | Instances | Instances |
Machine images | Amazon Machine Image (AMI) | Image |
Temporary virtual machines | Spot instances | Preemptible VMs |
Firewall | Security groups | Compute Engine firewall rules |
Automatic instance scaling | Auto Scaling | Compute Engine autoscaler |
Local attached disk | Ephemeral disk | Local SSD |
VM import | Supported formats: RAW, OVA, VMDK, VHD | Supported formats: AMI, RAW, VirtualBox |
Deployment locality | Zonal | Zonal |
Networking services comparison | |||||
---|---|---|---|---|---|
Provider | Networking | Load Balancing | CDN | On-premises connection | DNS |
AWS | VPC | ELB | CloudFront | Direct Connect | Route53 |
GCP | Cloud Virtual Network [1] | Cloud Load Balancing [2] | Cloud CDN | Cloud Interconnect | Cloud DNS |
[1] GCP allows for 802.1Q tagging (aka VLAN tagging). AWS does not.
[2] GCP allows for cross-region load balancing. AWS does not.
Storage services comparison | ||||
---|---|---|---|---|
Provider | Object | Block | Cold | File |
AWS | S3 | EBS [1] | Glacier | EFS |
GCP | Cloud Storage | Compute Engine Persistent Disks [2] | Cloud Storage Nearline | ZFS/Avere |
[1] An EBS volume can be attached to only one EC2 instance at a time. Can attach up to 40 disk volumes to a Linux instance. Available in only one region by default.
[2] GCP Persistent Disks in read-only mode can be attached to multiple instances simultaneously. Can attach up to 128 disk volumes. Snapshots are global and can be used in any region without additional operations or charges.
Database services comparison | |||
---|---|---|---|
Provider | RDBMS | NoSQL (key-value) | NoSQL (indexed) |
AWS | RDS | DynamoDB | DynamoDB |
GCP | Cloud SQL [1] | Cloud Bigtable [2] | Cloud Datastore |
[1] MySQL only.
[2] 100 MB maximum item size. Does not support secondary indexes.
Big Data services comparison | ||||
---|---|---|---|---|
Provider | Streaming data ingestion | Streaming data processing | Batch data processing | Analytics |
AWS | Kinesis | Kinesis | EMR | Redshift |
GCP | Cloud Pub/Sub | Cloud Dataflow | Cloud Dataflow / Cloud Dataproc | BigQuery |
- Cloud Pub/Sub
- GCP's offering for data streaming and message queuing. It allows for secure communication between applications and can also serve as a decoupling mechanism (a good way to scale).
- Dataflow
- GCP's managed service offering for batch and streaming data processing. Apache Beam under the hood.
- Dataproc
- GCP's offering for data processing using Apache Hadoop and Apache Spark. It is a massively parallel data processing and transformation engine.
- Supported services: MapReduce, Apache Hive, Apache Pig, Apache Spark, Spark SQL, PySpark, and support for parallel jobs with YARN.
- BigQuery
- GCP's offering for a fully managed, massive data warehousing and analytics engine, allowing for data analytics using SQL.
Application services comparison | |
---|---|
Provider | Messaging |
AWS | SNS |
GCP | Cloud Pub/Sub |
- Cloud Pub/Sub (publisher/subscriber)
Management services comparison | ||
---|---|---|
Provider | Monitoring | Deployment (IaC) |
AWS | CloudWatch | CloudFormation |
GCP | Stackdriver | Deployment Manager |