From Christoph's Personal Wiki

This article will cover topics related to the AWS Certified DevOps Engineer - Professional exam and certification.



The 6 domains outlined in the AWS blueprint for the certification include:

  1. Software Development Life Cycle (SDLC) Automation [22%]
  2. Configuration Management and Infrastructure as Code [19%]
  3. Monitoring and Logging [15%]
  4. Policies and Standards Automation [10%]
  5. Incident and Event Response [18%]
  6. High Availability, Fault Tolerance, and Disaster Recovery [16%]

Domain 1: SDLC Automation

  • 1.1 Apply concepts required to automate a CI/CD pipeline
  • 1.2 Determine source control strategies and how to implement them
  • 1.3 Apply concepts required to automate and integrate testing
  • 1.4 Apply concepts required to build and manage artifacts securely
  • 1.5 Determine deployment/delivery strategies (e.g., A/B, Blue/Green, Canary, Red/Black) and how to implement them using AWS Services

Domain 2: Configuration Management and Infrastructure as Code

  • 2.1 Determine deployment services based on deployment needs
  • 2.2 Determine application and infrastructure deployment models based on business needs
  • 2.3 Apply security concepts in the automation of resource provisioning
  • 2.4 Determine how to implement lifecycle hooks on a deployment
  • 2.5 Apply concepts required to manage systems using AWS configuration management tools and services

Domain 3: Monitoring and Logging

  • 3.1 Determine how to set up the aggregation, storage, and analysis of logs and metrics
  • 3.2 Apply concepts required to automate monitoring and event management of an environment
  • 3.3 Apply concepts required to audit, log, and monitor operating systems, infrastructures, and applications
  • 3.4 Determine how to implement tagging and other metadata strategies

Domain 4: Policies and Standards Automation

  • 4.1 Apply concepts required to enforce standards for logging, metrics, monitoring, testing, and security
  • 4.2 Determine how to optimize cost through automation
  • 4.3 Apply concepts required to implement governance strategies

Domain 5: Incident and Event Response

  • 5.1 Troubleshoot issues and determine how to restore operations
  • 5.2 Determine how to automate event management and alerting
  • 5.3 Apply concepts required to implement automated healing
  • 5.4 Apply concepts required to set up event-driven automated actions

Domain 6: High Availability, Fault Tolerance, and Disaster Recovery

  • 6.1 Determine appropriate use of multi-AZ versus multi-region architectures
  • 6.2 Determine how to implement high availability, scalability, and fault tolerance
  • 6.3 Determine the right services based on business needs (e.g., RTO/RPO, cost)
  • 6.4 Determine how to design and automate disaster recovery strategies
  • 6.5 Evaluate a deployment for points of failure

SDLC Automation


What is CI/CD?

The CI/CD Pipeline
  • AWS CodePipeline
    • Source Stage
      • AWS CodeCommit (think "git")
    • Deploy Stage - Development
      • AWS CodeDeploy -> EC2 instance
    • Deploy Stage - Production
      • AWS CodeDeploy -> EC2 instance

AWS CodeCommit

AWS CodeBuild

  • A fully managed build service
  • Compiles your code
  • Runs unit tests
  • Produces artifacts that are ready to deploy
  • Eliminates the need to provision/manage/scale your own build servers
  • Provides pre-packaged build environments
  • Allows you to build your own customized build environment
  • Scales automatically to meet your build requirements
Benefits of CodeBuild
  • It is fully managed
    • You do not have to set up any build servers, nor patch, update, or maintain them.
  • It is on-demand
    • It automatically scales to meet your requirements. No more migrating to larger EC2 servers because your builds are taking too long. You pay only for the build time you consume (billed by the minute).
  • It is preconfigured
    • It comes with many pre-configured build environments for the most popular programming languages. You just need to configure it to use your build script.
SEE: AWS - Troubleshooting CodeBuild
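CodeBuild reads its build commands from a buildspec.yml file at the root of the source repository. A minimal sketch (the runtime version, build commands, and artifact pattern are illustrative, not prescriptive):

```yaml
version: 0.2

phases:
  install:
    runtime-versions:
      python: "3.9"          # pick a runtime from the pre-packaged environment
  build:
    commands:
      - pip install -r requirements.txt   # illustrative build steps
      - python -m pytest tests/
artifacts:
  files:
    - '**/*'                 # everything in the build directory becomes the artifact
```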

AWS CodeDeploy

What is CodeDeploy?
  • A fully managed deployment service that automates deployments to:
    • Amazon EC2 instances
    • On-premises instances
    • AWS Lambda functions
  • Makes it easier to:
    • Rapidly deploy new features
    • Update Lambda function versions
    • Avoid downtime during deployment
    • Handle the full complex deployment process without human intervention
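For EC2 and on-premises deployments, CodeDeploy is driven by an appspec.yml file bundled with the revision: it maps source files to destinations and wires lifecycle hooks to scripts. A minimal sketch (the destination path and hook script names are hypothetical):

```yaml
version: 0.0
os: linux
files:
  - source: /                       # copy the whole revision...
    destination: /var/www/myapp     # ...to this (illustrative) path on the instance
hooks:
  ApplicationStop:
    - location: scripts/stop_server.sh    # hypothetical hook scripts
      timeout: 60
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 60
```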

AWS CodePipeline

CodePipeline is the "CD" of CI/CD.

  • Automatic
    • From the check-in of your code to deployment on to your service, CodePipeline takes care of it all.
  • Easy to set up
    • CodePipeline has no servers to provision, it is dead simple to configure and get working. There are pre-built plugins or you can roll your own.
  • Configurable
    • You can create, configure, and modify all stages of your software release process with ease. You can implement automated testing and customize the deployment process.


Why do we test?
  • Ensure the code meets the defined requirements
  • Ensure the code performs in an acceptable period of time
  • Ensure the code is usable
  • Ensure the code responds correctly to all kinds of inputs
  • Ensure the code achieves the result the programmer desired
Types of testing (see Wikipedia)
Automated testing
  • Automatic execution of tests
  • Comparison of actual outcomes to predicted outcomes
  • Fast, continuous feedback
  • Immediate notification
  • Saves resources
Unit test example
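A minimal illustration in Python using the standard unittest module (the add function is a stand-in for real application code):

```python
import unittest

def add(a, b):
    """Function under test (a stand-in for real application code)."""
    return a + b

class TestAdd(unittest.TestCase):
    def test_adds_positive_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_adds_negative_numbers(self):
        self.assertEqual(add(-1, -1), -2)

if __name__ == "__main__":
    unittest.main(exit=False)   # exit=False lets a wrapping script continue
```

A CI stage (e.g., CodeBuild) runs tests like these on every check-in and fails the pipeline when any assertion fails.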


What are artifacts?

An artifact is a product or by-product produced during the software development process.

For example:

  • Compiled binaries
  • Source code
  • Documentation
  • Use cases
  • Class diagrams

Artifacts are stored in S3 (note: this has nothing to do with AWS Artifact!)

Deployment Strategies

Single Target Deployment (build -> target)
  • Use for small development projects, especially when legacy or non-highly-available infrastructure is involved.
  • When it is initiated, a new application version is installed on the target server.
  • A brief outage occurs during installation. There are no secondary servers, so testing is limited. Rollback involves removing the new version and installing the previous one.
All-at-Once Deployment (build -> x2 targets)
  • Deployment happens in one step, just like single target deployment.
  • With this method, the destination is multiple targets.
  • More complicated than single-target deployment; often requires orchestration tooling.
  • Shares negatives of single target. No ability to test, still has deployment outages, and less than ideal rollback.
Minimum in-service Deployment (initial build stage -> t1 t2 t3 ...)
  • Deployment happens in multiple stages
  • Deployment happens to as many targets as possible while maintaining the minimum in-service targets.
  • A few moving parts: orchestration and health checks are required.
  • Allows automated testing; deployment targets are assessed and tested prior to continuing.
  • Generally, no downtime.
  • Often quicker, with fewer stages, than a rolling deployment.
Rolling Deployment
  • Deployment happens in multiple stages. Number of targets per stage is user-defined.
  • Moving parts; orchestration and health-checks are required.
  • Overall application health is not necessarily maintained.
  • Can be the least time-efficient deployment method, depending on the time taken per stage.
  • Allows automated testing; deployment targets are assessed and tested prior to continuing.
  • Generally, no downtime, assuming the number of targets per run is not large enough to impact the application.
  • Can be paused, allowing limited multi-version testing (combined with small targets per stage).
Blue/Green Deployment (aka Red/Black)
  • Requires advanced orchestration tooling
  • Carries significant cost - maintaining two environments for the duration of deployments.
  • Deployment process is rapid - the entire environment (blue or green) is deployed all at once.
  • Cutover and migration is clean and controlled (e.g., DNS change)
  • Rollback is equally clean (e.g., DNS regression)
  • Health and performance of the entire "green" environment can be tested prior to cutover.
  • Using advanced template systems, such as CloudFormation, the entire process can be fully automated.
Canary Deployment
  • Like Blue/Green, but blue is kept active and a percentage of traffic is routed to green
  • In AWS, use Route 53 with weighted round-robin routing
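With Route 53, a canary amounts to two weighted records pointing at the blue and green endpoints; the weights (90/10), domain, and endpoint names below are illustrative. A change-batch sketch for change-resource-record-sets:

```json
{
  "Comment": "Canary: 90% blue, 10% green (illustrative values)",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "CNAME",
        "SetIdentifier": "blue",
        "Weight": 90,
        "TTL": 60,
        "ResourceRecords": [{"Value": "blue-env.example.com"}]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "CNAME",
        "SetIdentifier": "green",
        "Weight": 10,
        "TTL": 60,
        "ResourceRecords": [{"Value": "green-env.example.com"}]
      }
    }
  ]
}
```

Applied with `aws route53 change-resource-record-sets --hosted-zone-id <zone-id> --change-batch file://canary.json`; increasing the green weight over time completes the cutover.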

Configuration Management and Infrastructure as Code


Infrastructure as Code (IaC) lifecycle
  1. Resource provisioning:
    AWS CloudFormation
  2. Configuration management:
    AWS OpsWorks for Chef Automate
    Amazon EC2 Systems Manager
  3. Monitoring and performance:
    Amazon CloudWatch
  4. Governance and compliance:
    AWS Config
  5. Resource optimization:
    AWS Trusted Advisor

AWS CloudFormation, AWS OpsWorks for Chef Automate, Amazon EC2 Systems Manager, Amazon CloudWatch, AWS Config, and AWS Trusted Advisor enable you to integrate the concept of Infrastructure as Code into all phases of the project lifecycle. By using Infrastructure as Code, your organization can automatically deploy consistently built environments that, in turn, can help your organization to improve its overall maturity.

AWS CloudFormation



    # A minimal CloudFormation template skeleton (YAML):
    AWSTemplateFormatVersion: "2010-09-09"
    Parameters: {}   # set of parameters
    Mappings: {}     # set of mappings
    Resources:
      MyEC2Instance:                        # logical resource name
        Type: AWS::EC2::Instance
        Properties:
          ImageId: "ami-088ff0e3bde7b3fdf"  # region-specific AMI ID
          InstanceType: "t2.micro"
    Outputs: {}      # set of outputs

AWS CloudFormation Intrinsic Functions


FindInMap YAML example
    Mappings:
      RegionMap:
        us-east-1:            # example region keys
          HVM64: "ami-0000"
          HVMG2: "ami-0000"
        us-west-1:
          HVM64: "ami-0000"
          HVMG2: "ami-0000"
    Resources:
      MyEC2Instance:
        Type: "AWS::EC2::Instance"
        Properties:
          ImageId: !FindInMap
            - RegionMap
            - !Ref 'AWS::Region' # intrinsic function
            - HVM64
          InstanceType: m1.small

AWS CloudFormation Wait Conditions

AWS CloudFormation Nested Stacks

AWS CloudFormation Deletion Policies

AWS CloudFormation Stack Updates

AWS CloudFormation Change Sets

AWS CloudFormation Custom Resources

AWS Elastic Beanstalk

AWS Elastic Beanstalk extensions

AWS Config

AWS Config enables you to assess, audit, and evaluate the configurations of your AWS resources. AWS Config automatically builds an inventory of your resources and tracks changes made to them.

AWS Config also provides a clear view of the resource change timeline, including changes in both the resource configurations and the associations of those resources to other AWS resources.

AWS Config is a web service that performs configuration management of supported AWS resources within your account and delivers log files to you. The recorded information includes the configuration item (AWS resource), relationships between configuration items (AWS resources), and any configuration changes between resources. The AWS configuration item history captured by AWS Config enables security analysis, resource change tracking, and compliance auditing.

When many different resources are changing frequently and automatically, automating compliance can become as important as automating the delivery pipeline. To respond to changes in the environment, you can use AWS Config rules.

Best Practices

Here are some recommendations for implementing AWS Config in your environments:

  • Enable AWS Config for all regions to record the configuration item history, to facilitate auditing and compliance tracking.
  • Implement a process to respond to changes detected by AWS Config. This could include email notifications and the use of AWS Config rules to respond to changes programmatically.
  • To delete the default configuration recorder (this stops AWS Config recording):
$ aws configservice delete-configuration-recorder --configuration-recorder-name default

AWS Config extends the concept of infrastructure code into the realm of governance and compliance. AWS Config can continuously record the configuration of resources while AWS Config rules allow for event-driven responses to changes in the configuration of tracked resources. You can use this capability to assist your organization with the monitoring of compliance controls.

Amazon ECS

AWS Managed Services

AWS Lambda

AWS Lambda Step Functions

AWS OpsWorks

AWS OpsWorks for Chef Automate brings the capabilities of Chef, a configuration management platform, to AWS. OpsWorks for Chef Automate further builds on Chef's capabilities by providing additional features that support DevOps capabilities at scale. Chef is based on the concept of recipes, configuration scripts written in the Ruby language that perform tasks such as installing services. Chef recipes, like AWS CloudFormation templates, are a form of source code that can be version controlled, thereby extending the principle of Infrastructure as Code to the configuration management stage of the resource lifecycle.

OpsWorks for Chef Automate expands the capabilities of Chef to enable your organization to implement DevOps at scale. OpsWorks for Chef Automate provides three key capabilities that you can configure to support DevOps practices: workflow, compliance, and visibility.

Monitoring and Logging



Amazon CloudWatch is a set of services that ingests, interprets, and responds to runtime metrics, logs, and events. CloudWatch automatically collects metrics from many AWS services, such as Amazon EC2, Elastic Load Balancing (ELB), and Amazon DynamoDB. Responses can include built-in actions such as sending notifications or custom actions handled by AWS Lambda, a serverless event-driven compute platform. The code for Lambda functions becomes part of the infrastructure codebase, thereby extending Infrastructure as Code to the operational level. CloudWatch consists of three services: the main CloudWatch service, Amazon CloudWatch Logs, and Amazon CloudWatch Events.

The main Amazon CloudWatch service collects and tracks metrics for many AWS services such as Amazon EC2, ELB, DynamoDB, and Amazon Relational Database Service (RDS). You can also create custom metrics for services you develop, such as applications. CloudWatch issues alarms when metrics reach a given threshold over a period of time.

Here are some examples of metrics and potential responses that could apply to the situations mentioned at the start of this section:

  • If the latency of ELB exceeds five seconds over two minutes, send an email notification to the systems administrators.
  • When the average EC2 instance CPU usage exceeds 60 percent for three minutes, launch another EC2 instance.
  • Increase the capacity units of a DynamoDB table when excessive throttling occurs.

You can implement responses to metrics-based alarms using built-in notifications, or by writing custom Lambda functions in Python, Node.js, Java, or C#.
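The CPU example above can also be expressed as infrastructure code. A CloudFormation alarm resource sketch (the logical names and the referenced scaling policy are hypothetical):

```yaml
CPUAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: "Scale out when average CPU exceeds 60% for 3 minutes"
    Namespace: AWS/EC2
    MetricName: CPUUtilization
    Statistic: Average
    Period: 60               # one-minute periods...
    EvaluationPeriods: 3     # ...evaluated three times = three minutes
    Threshold: 60
    ComparisonOperator: GreaterThanThreshold
    AlarmActions:
      - !Ref ScaleOutPolicy  # hypothetical AWS::AutoScaling::ScalingPolicy
```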

$ aws cloudwatch put-metric-data \
    --metric-name randomNumber \
    --namespace Random \
    --value $(shuf -i 1-1000 -n1)

CloudWatch Custom Metrics

CloudWatch Events

Amazon CloudWatch Events produces a stream of events from changes to AWS environments, applies a rules engine, and delivers matching events to specified targets. Examples of events that can be streamed include EC2 instance state changes, Auto Scaling actions, API calls published by CloudTrail, AWS console sign-ins, AWS Trusted Advisor optimization notifications, custom application-level events, and time-scheduled actions. Targets can include built-in actions such as SNS notifications or custom responses using Lambda functions.

The ability of an infrastructure to respond to selected events offers benefits in both operations and security. From the operations perspective, events can automate maintenance activities without having to manage a separate scheduling system. With regard to information security, events can provide notifications of console logins, authentication failures, and risky API calls recorded by CloudTrail. In both realms, incorporating event responses into the infrastructure code promotes a greater degree of self-healing and a higher level of operational maturity.

Best Practices

Here are some recommendations for best practices related to monitoring:

  • Ensure that all AWS resources are emitting metrics.
  • Create CloudWatch alarms for metrics that provide the appropriate responses as metric-related events arise.
  • Send logs from AWS resources, including Amazon S3 and Amazon EC2, to CloudWatch Logs for analysis using log stream triggers and Lambda functions.
  • Schedule ongoing maintenance tasks with CloudWatch and Lambda.
  • Use CloudWatch custom events to respond to application-level issues.

CloudWatch Logs

Amazon CloudWatch Logs monitors and stores logs from Amazon EC2, AWS CloudTrail, and other sources. EC2 instances can ship logging information using the CloudWatch Logs Agent and logging tools such as Logstash, Graylog, and Fluentd. Logs stored in Amazon S3 can be sent to CloudWatch Logs by configuring an Amazon S3 event to trigger a Lambda function.

Ingested log data can be the basis for new CloudWatch metrics that can, in turn, trigger CloudWatch alarms. You can use this capability to monitor any resource that generates logs without writing any code whatsoever. If you need a more advanced response procedure, you can create a Lambda function to take the appropriate actions. For example, a Lambda function can use the SES.SendEmail or SNS.Publish APIs to publish information to a Slack channel when NullPointerException errors appear in production logs. Log processing and correlation allow for deeper analysis of application behaviours and can expose internal details that are hard to figure out from metrics. CloudWatch Logs provides both the storage and analysis of logs, and processing to enable data-driven responses to operational issues.
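A CloudWatch Logs subscription delivers log batches to Lambda as gzipped, base64-encoded payloads. A minimal Python handler sketch that decodes such a batch and picks out NullPointerException messages (the notification step is left as a comment, since the SNS topic would be account-specific):

```python
import base64
import gzip
import json

def handler(event, context):
    """Decode a CloudWatch Logs subscription event and filter error lines."""
    payload = base64.b64decode(event["awslogs"]["data"])
    batch = json.loads(gzip.decompress(payload))
    errors = [e["message"] for e in batch["logEvents"]
              if "NullPointerException" in e["message"]]
    # Here you could call SNS.Publish / SES.SendEmail to notify a channel.
    return errors
```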

$ sudo dpkg -i amazon-cloudwatch-agent.deb
$ sudo vi /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml
# create and edit config
$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:cloudwatchconfig.cfg -s


Think managed "Kiali + Jaeger"

Policies and Standards Automation


AWS Service Catalog

AWS Trusted Advisor

AWS Trusted Advisor helps you observe best practices by scanning your AWS resources and comparing their usage against AWS best practices in four categories: cost optimization, performance, security, and fault tolerance. As part of ongoing improvement to your infrastructure and applications, taking advantage of Trusted Advisor can help keep your resources provisioned optimally.


Trusted Advisor provides a variety of checks to determine if the infrastructure is following best practices. The checks include detailed descriptions of recommended best practices, alert criteria, guidelines for action, and a list of useful resources on the topic. Trusted Advisor provides the results of the checks and can also provide ongoing weekly notifications for status updates and cost savings.

All customers have access to a core set of Trusted Advisor checks. Customers with AWS Business or Enterprise support can access all Trusted Advisor checks and the Trusted Advisor APIs. Using the APIs, you can obtain information from Trusted Advisor and take corrective actions. For example, a program could leverage Trusted Advisor to examine current account service limits. If current resource usages approach the limits, you can automatically create a support case to increase the limits.

Additionally, Trusted Advisor now integrates with Amazon CloudWatch Events. You can design a Lambda function to notify a Slack channel when the status of Trusted Advisor checks changes. These examples illustrate how the concept of Infrastructure as Code can be extended to the resource optimization level of the information resource lifecycle.

Best Practices

The best practices for AWS Trusted Advisor are listed below:

  • Subscribe to Trusted Advisor notifications through email or an alternative delivery system.
  • Use distribution lists and ensure that the appropriate recipients are included on all such notifications.
  • If you have AWS Business or Enterprise support, use the AWS Support API in conjunction with Trusted Advisor notifications to create cases with AWS Support to perform remediation.

You must continuously monitor your infrastructure to optimize the infrastructure resources with regard to performance, security, and cost. AWS Trusted Advisor provides the ability to use APIs to interrogate your AWS infrastructure for recommendations, thus extending Infrastructure as Code to the optimization phase of the information resource lifecycle.

AWS Systems Manager

Amazon EC2 Systems Manager is a collection of capabilities that simplifies common maintenance, management, deployment, and execution of operational tasks on EC2 instances and servers or virtual machines (VMs) in on-premises environments. Systems Manager helps you easily understand and control the current state of your EC2 instance and OS configurations. You can track and remotely manage system configuration, OS patch levels, application configurations, and other details about deployments as they occur over time. These capabilities help with automating complex and repetitive tasks, defining system configurations, preventing drift, and maintaining software compliance of both Amazon EC2 and on-premises configurations.

Systems Manager is a management service that assists with:

  • Collecting software inventory
  • Applying OS patches
  • Creating system images
  • Configuring operating systems
  • Managing hybrid cloud systems (AWS and on-premises) from a single interface
  • Reducing costs
Run Command

Lets you run one or more commands across all of your EC2 instances (or a group of them).

The table below lists the tasks that Systems Manager simplifies:

  • Run Command: Manage the configuration of managed instances at scale by distributing commands across a fleet.
  • Inventory: Automate the collection of the software inventory from managed instances.
  • State Manager: Keep managed instances in a defined and consistent state.
  • Maintenance Window: Define a maintenance window for running administrative tasks.
  • Patch Manager: Deploy software patches automatically across groups of instances.
  • Automation: Perform common maintenance and deployment tasks, such as updating Amazon Machine Images (AMIs).
  • Parameter Store: Store, control, access, and retrieve configuration data, whether plain-text data such as database strings or secrets such as passwords, encrypted through AWS Key Management Service (KMS).

AWS Organizations

AWS Secrets Manager

Amazon Macie

Macie is a security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS.

  • Macie:
    • Can recognize Personally Identifiable Information (PII)
    • Provides a dashboard
    • Monitors data access activity for anomalies
    • Generates detailed alerts when it detects risk of unauthorized access or accidental data leaks
    • As of February 2021 <CHECK>, it only protects data in S3, with more AWS data stores planned for the future.
    • It gives you superior visibility of data
    • Simple to set up and easy to manage

AWS Certificate Manager

Incident and Event Response


Amazon GuardDuty

GuardDuty is a threat-detection service that continuously monitors for malicious or unauthorized behaviour.

Amazon Inspector

Inspector is an automated service that assesses your applications for vulnerabilities and produces a security findings report.

Amazon Kinesis

Easily collect, process, and analyze video and data streams in real time.

  • Kinesis Data Analytics
    • Analyze streaming data
    • Respond in real-time
    • Query using SQL
    • Completely managed service (no servers required)
    • Pay-as-you-go for what you use
    • Powerful real-time processing
  • Kinesis Data Firehose
    • Deliver streaming data
    • No applications to write or manage
    • Just configure the producer
    • Data can be transformed
    • Destinations such as S3, Redshift, Elasticsearch, and Splunk
    • Accepts records up to 1,000 KB in size
  • Kinesis Data Streams
    • Collect streaming data
    • Massively scalable
    • Capture gigabytes per second (from thousands of sources)
    • Data is available in milliseconds
    • Durable (data is stored across three facilities in a Region)
    • Data is stored for up to 7 days (default retention is 24 hours)
    • Elastic
  • Kinesis Video Streams
    • Collect streaming video
    • Can handle ingestion from millions of devices
    • Enables live and on-demand playback
    • Take advantage of Amazon Rekognition Video and Machine Learning frameworks for video
    • Access your data through APIs
    • Build real-time video enabled applications

Review the tutorial: "Using AWS Lambda with Amazon Kinesis".
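As in that tutorial, Kinesis delivers records to Lambda with the data payload base64-encoded. A minimal Python handler sketch that decodes each record in the batch (field names follow the Kinesis event format):

```python
import base64

def handler(event, context):
    """Decode the data payload of each Kinesis record in the batch."""
    payloads = []
    for record in event["Records"]:
        data = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        payloads.append(data)   # process or forward the payload here
    return payloads
```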

High Availability, Fault Tolerance, and Disaster Recovery

  • Introduction
  • AWS Single Sign-On
  • Amazon CloudFront
  • AutoScaling and Lifecycle hooks
  • Amazon Route53
  • Amazon RDS
  • Amazon Aurora
  • Amazon DynamoDB
  • Amazon DynamoDB Keys and Streams

Other Services You Need to Know About

  • Introduction
  • Tagging
  • Amazon Elastic File System
  • Amazon ElastiCache
  • Amazon S3 Glacier
  • AWS Direct Connect
  • AWS Lambda Function Dead Letter Queues
  • Amazon CloudSearch
  • Amazon Elasticsearch Service
  • Amazon DynamoDB Accelerator
  • AWS Server Migration Service

External links

Study tips