This article will cover topics related to the AWS Certified DevOps Engineer - Professional exam and certification.
The 6 domains outlined in the AWS blueprint for the certification include:
- Software Development LifeCycle (SDLC) Automation [22%]
- Configuration Management and Infrastructure as Code [19%]
- Monitoring and Logging [15%]
- Policies and Standards Automation [10%]
- Incident and Event Response [18%]
- High Availability, Fault Tolerance, and Disaster Recovery [16%]
Domain 1: SDLC Automation
- 1.1 Apply concepts required to automate a CI/CD pipeline
- 1.2 Determine source control strategies and how to implement them
- 1.3 Apply concepts required to automate and integrate testing
- 1.4 Apply concepts required to build and manage artifacts securely
- 1.5 Determine deployment/delivery strategies (e.g., A/B, Blue/Green, Canary, Red/Black) and how to implement them using AWS Services
Domain 2: Configuration Management and Infrastructure as Code
- 2.1 Determine deployment services based on deployment needs
- 2.2 Determine application and infrastructure deployment models based on business needs
- 2.3 Apply security concepts in the automation of resource provisioning
- 2.4 Determine how to implement lifecycle hooks on a deployment
- 2.5 Apply concepts required to manage systems using AWS configuration management tools and services
Domain 3: Monitoring and Logging
- 3.1 Determine how to set up the aggregation, storage, and analysis of logs and metrics
- 3.2 Apply concepts required to automate monitoring and event management of an environment
- 3.3 Apply concepts required to audit, log, and monitor operating systems, infrastructures, and applications
- 3.4 Determine how to implement tagging and other metadata strategies
Domain 4: Policies and Standards Automation
- 4.1 Apply concepts required to enforce standards for logging, metrics, monitoring, testing, and security
- 4.2 Determine how to optimize cost through automation
- 4.3 Apply concepts required to implement governance strategies
Domain 5: Incident and Event Response
- 5.1 Troubleshoot issues and determine how to restore operations
- 5.2 Determine how to automate event management and alerting
- 5.3 Apply concepts required to implement automated healing
- 5.4 Apply concepts required to set up event-driven automated actions
Domain 6: High Availability, Fault Tolerance, and Disaster Recovery
- 6.1 Determine appropriate use of multi-AZ versus multi-region architectures
- 6.2 Determine how to implement high availability, scalability, and fault tolerance
- 6.3 Determine the right services based on business needs (e.g., RTO/RPO, cost)
- 6.4 Determine how to design and automate disaster recovery strategies
- 6.5 Evaluate a deployment for points of failure
What is CI/CD?
- The CI/CD Pipeline
- AWS CodePipeline
- Source Stage
- AWS CodeCommit (think "git")
- Deploy Stage - Development
- AWS CodeDeploy -> EC2 instance
- Deploy Stage - Production
- AWS CodeDeploy -> EC2 instance
- Source Stage
- A fully managed build service
- Compiles your code
- Runs unit tests
- Produces artifacts that are ready to deploy
- Eliminates the need to provision/manage/scale your own build servers
- Provides pre-packaged build environments
- Allows you to build your own customized build environment
- Scales automatically to meet your build requirements
- Benefits of CodeBuild
- It is fully managed
- You do not have to set up any build servers, nor patch, update, or maintain them.
- It is on-demand
- It automatically scales to meet your requirements. No more migrating to larger EC2 servers because your builds are taking too long. You pay only for the build time you consume.
- It is preconfigured
- It comes with many pre-configured build environments for the most popular programming languages. You just need to configure it to use your build script.
- What is CodeDeploy?
- A fully managed deployment service that automates deployments to:
- Amazon EC2 instances
- On-premises instances
- AWS Lambda functions
- Makes it easier to:
- Rapidly deploy new features
- Update Lambda function versions
- Avoid downtime during deployment
- Handle the full complex deployment process without human intervention
CodePipeline is the "CD" of CI/CD.
- From the check-in of your code to deployment on to your service, CodePipeline takes care of it all.
- Easy to set up
- CodePipeline has no servers to provision and is simple to configure and get working. There are pre-built plugins, or you can build your own.
- You can create, configure, and modify all stages of your software release process with ease. You can implement automated testing and customize the deployment process.
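The stage-by-stage flow described above (source, then deploy to development, then deploy to production) can be sketched in plain Python. This is only an illustrative model and not the CodePipeline API: `run_pipeline`, the `Stage` type, and the stage names are hypothetical, and each stage action is assumed to return `True` on success.

```python
from typing import Callable, List, Tuple

# Hypothetical model: a stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, action in stages:
        if not action():
            break  # a failed stage blocks every later stage
        completed.append(name)
    return completed


# Example stages mirroring the outline above (all succeed here).
example_stages = [
    ("Source", lambda: True),
    ("Deploy-Development", lambda: True),
    ("Deploy-Production", lambda: True),
]
```

Calling `run_pipeline(example_stages)` returns all three stage names; if the development deployment failed, production would never be attempted.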
- Why do we test?
- Meet the requirements defined
- Ensure the code performs in an acceptable period of time
- Ensure the code is usable
- Ensure the code responds correctly to all kinds of inputs
- Ensure the code achieves the result the programmer desired
- Types of testing (see Wikipedia)
- Automated testing
- Automatic execution of tests
- Comparison of actual outcomes to predicted outcomes
- Fast, continuous feedback
- Immediate notification
- Save resources
- Unit test example
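A minimal unit test, using Python's standard `unittest` module, might look like the sketch below. The `add` function is a hypothetical stand-in for the code under test; the point is only to show an automated comparison of actual outcomes to predicted outcomes.

```python
import unittest


def add(a: int, b: int) -> int:
    """Hypothetical function under test."""
    return a + b


class AddTests(unittest.TestCase):
    def test_adds_two_positive_numbers(self):
        # Compare the actual outcome to the predicted outcome.
        self.assertEqual(add(2, 3), 5)

    def test_adds_two_negative_numbers(self):
        self.assertEqual(add(-2, -3), -5)
```

Saved to a file, the tests can be run with `python -m unittest <file>`, which is the kind of command a build stage would execute on every commit.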
- What are artifacts?
An artifact is a product or by-product produced during the software development process.
- Compiled binaries
- Source code
- Use cases
- Class diagrams
Artifacts are stored in S3 (note: this has nothing to do with AWS Artifact!)
- Single Target Deployment (build -> target)
- Use for small development projects, especially when legacy or non-highly-available infrastructure is involved.
- When it is initiated, a new application version is installed on the target server.
- A brief outage occurs during installation. There are no secondary servers, so testing is limited. Rollback involves removing the new version and installing the previous one.
- All-at-Once Deployment (build -> x2 targets)
- Deployment happens in one step, just like single target deployment.
- With this method, the destination is multiple targets.
- More complicated than single target; often requires orchestration tooling.
- Shares negatives of single target. No ability to test, still has deployment outages, and less than ideal rollback.
- Minimum in-service Deployment (initial build stage -> t1 t2 t3 ...)
- Deployment happens in multiple stages
- Deployment happens to as many targets as possible while maintaining the minimum in-service targets.
- A few moving parts; orchestration and health checks are required.
- Allows automated testing; deployment targets are assessed and tested prior to continuing.
- Generally, no downtime.
- Often quicker and with fewer stages than a rolling deployment.
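The staging rule for a minimum in-service deployment can be sketched as follows. This is an illustrative calculation only; `minimum_in_service_batches` is a hypothetical helper, and it assumes every target counts equally towards the in-service minimum.

```python
from typing import List


def minimum_in_service_batches(targets: List[str], min_in_service: int) -> List[List[str]]:
    """Split targets into stages, updating as many as possible per stage
    while keeping at least `min_in_service` targets serving traffic."""
    if not 0 <= min_in_service < len(targets):
        raise ValueError("min_in_service must be non-negative and smaller than the number of targets")
    # Everything above the in-service minimum can be taken out at once.
    batch_size = len(targets) - min_in_service
    return [targets[i:i + batch_size] for i in range(0, len(targets), batch_size)]
```

With five targets and a minimum of two in service, the first stage updates three targets and the second stage updates the remaining two.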
- Rolling Deployment
- Deployment happens in multiple stages. Number of targets per stage is user-defined.
- Moving parts; orchestration and health-checks are required.
- Overall application health is not necessarily maintained.
- Can be the least efficient deployment type in terms of time taken.
- Allows automated testing; deployment targets are assessed and tested prior to continuing.
- Generally, no downtime, assuming the number of targets per stage is not large enough to impact the application.
- Can be paused, allowing limited multi-version testing (combined with small targets per stage).
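A rolling deployment with a user-defined stage size and a health check between stages can be sketched like this. The `deploy` and `is_healthy` callables are hypothetical placeholders for whatever orchestration tooling is in use; the sketch only shows the control flow.

```python
from typing import Callable, List


def rolling_deploy(
    targets: List[str],
    targets_per_stage: int,
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Deploy in fixed-size stages and stop if any target in a stage is unhealthy.

    Returns the targets that were updated and passed their health check.
    """
    if targets_per_stage < 1:
        raise ValueError("targets_per_stage must be at least 1")
    updated: List[str] = []
    for start in range(0, len(targets), targets_per_stage):
        stage = targets[start:start + targets_per_stage]
        for target in stage:
            deploy(target)
        if not all(is_healthy(target) for target in stage):
            break  # pause the rollout; later targets keep the old version
        updated.extend(stage)
    return updated
```

A small stage size limits the blast radius of a bad release at the cost of a longer overall deployment.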
- Blue/Green Deployment (aka Red/Black)
- Requires advanced orchestration tooling
- Carries significant cost - maintaining 2 environments for the duration of deployments.
- Deployment process is rapid - entire environment (blue or green) is deployed all at once.
- Cutover and migration is clean and controlled (e.g., DNS change)
- Rollback is equally clean (e.g., DNS regression)
- Health and performance of entire "green" environment can be tested prior to cutover.
- Using advanced template systems, such as CloudFormation, the entire process can be fully automated.
- Canary Deployment
- Like Blue/Green, but keep blue active and route a percentage of traffic to green
- In AWS, use Route 53 with weighted routing
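With weighted routing, each record receives traffic in proportion to its weight divided by the sum of all weights. The sketch below computes that expected split; `traffic_share` and the record names are hypothetical, and the case where every weight is zero is deliberately not modelled.

```python
from typing import Dict


def traffic_share(weights: Dict[str, int]) -> Dict[str, float]:
    """Expected fraction of traffic per record: weight / sum of all weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive in this sketch")
    return {name: weight / total for name, weight in weights.items()}


# Canary: keep blue active and send a small percentage of traffic to green.
canary = traffic_share({"blue": 90, "green": 10})
```

Shifting the weights step by step (for example 90/10, 50/50, 0/100) completes the cutover, and resetting them rolls it back.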
Configuration Management and Infrastructure as Code
- Infrastructure as Code (IaC) lifecycle
- Resource provisioning:
- AWS CloudFormation
- Configuration management:
- AWS OpsWorks for Chef Automate
- Amazon EC2 Systems Manager
- Monitoring and performance:
- Amazon CloudWatch
- Governance and compliance:
- AWS Config
- Resource optimization:
- AWS Trusted Advisor
AWS CloudFormation, AWS OpsWorks for Chef Automate, Amazon EC2 Systems Manager, Amazon CloudWatch, AWS Config, and AWS Trusted Advisor enable you to integrate the concept of Infrastructure as Code into all phases of the project lifecycle. By using Infrastructure as Code, your organization can automatically deploy consistently built environments that, in turn, can help your organization to improve its overall maturity.
Parameters:
  # set of parameters
Mappings:
  # set of mappings
Resources:
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: "ami-00000000"
Outputs:
  # set of outputs

Resources:
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: "ami-088ff0e3bde7b3fdf"
      InstanceType: "t2.micro"
AWS CloudFormation Intrinsic Functions
- FindInMap YAML example
Mappings:
  RegionMap:
    us-east-1:
      HVM64: "ami-0000"
      HVMG2: "ami-0000"
    us-west-1:
      HVM64: "ami-0000"
      HVMG2: "ami-0000"
---
Resources:
  myEC2Instance:
    Type: "AWS::EC2::Instance"
    Properties:
      ImageId: !FindInMap
        - RegionMap
        - !Ref 'AWS::Region' # intrinsic function
        - HVM64
      InstanceType: m1.small
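Conceptually, `FindInMap` is a two-level key lookup: map name, then top-level key (here the region), then second-level key. The Python sketch below mimics that lookup against the same placeholder mapping; `find_in_map` is a hypothetical helper for illustration and is not how CloudFormation is implemented.

```python
from typing import Dict

# The same placeholder mapping as in the template above.
MAPPINGS: Dict[str, Dict[str, Dict[str, str]]] = {
    "RegionMap": {
        "us-east-1": {"HVM64": "ami-0000", "HVMG2": "ami-0000"},
        "us-west-1": {"HVM64": "ami-0000", "HVMG2": "ami-0000"},
    }
}


def find_in_map(map_name: str, top_level_key: str, second_level_key: str) -> str:
    """Resolve a value the way !FindInMap [MapName, TopLevelKey, SecondLevelKey] does."""
    return MAPPINGS[map_name][top_level_key][second_level_key]


# In a template, the region would come from !Ref 'AWS::Region'.
image_id = find_in_map("RegionMap", "us-west-1", "HVM64")
```

A missing key raises `KeyError` in this sketch, much as a template that references an unmapped region fails to resolve.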
AWS CloudFormation Wait Conditions
AWS CloudFormation Nested Stacks
AWS CloudFormation Deletion Policies
AWS CloudFormation Stack Updates
AWS CloudFormation Change Sets
AWS CloudFormation Custom Resources
AWS Elastic Beanstalk
AWS Elastic Beanstalk extensions
AWS Config enables you to assess, audit, and evaluate the configurations of your AWS resources. AWS Config automatically builds an inventory of your resources and tracks changes made to them.
AWS Config also provides a clear view of the resource change timeline, including changes in both the resource configurations and the associations of those resources to other AWS resources.
When many different resources are changing frequently and automatically, automating compliance can become as important as automating the delivery pipeline. To respond to changes in the environment, you can use AWS Config rules.
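A custom AWS Config rule is typically backed by a Lambda function that inspects the changed resource and decides whether it is compliant. The sketch below shows only that decision logic, under stated assumptions: the event is a simplified version of what Config delivers (a JSON string under `invokingEvent` containing a `configurationItem`), the `Owner` tag requirement is a hypothetical policy, and reporting the result back to Config (the PutEvaluations API) is omitted.

```python
import json
from typing import Any, Dict

REQUIRED_TAG = "Owner"  # hypothetical policy: every EC2 instance must carry this tag


def evaluate_compliance(configuration_item: Dict[str, Any]) -> str:
    """Return a compliance verdict for a single configuration item."""
    if configuration_item.get("resourceType") != "AWS::EC2::Instance":
        return "NOT_APPLICABLE"
    tags = configuration_item.get("tags") or {}
    return "COMPLIANT" if REQUIRED_TAG in tags else "NON_COMPLIANT"


def handler(event: Dict[str, Any], context: Any = None) -> str:
    """Lambda-style entry point; assumes the simplified event shape described above."""
    invoking_event = json.loads(event["invokingEvent"])
    return evaluate_compliance(invoking_event["configurationItem"])
```

Because the rule is ordinary code, it can be version controlled and unit tested alongside the rest of the infrastructure codebase.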
- Best Practices
Here are some recommendations for implementing AWS Config in your environments:
- Enable AWS Config for all regions to record the configuration item history, to facilitate auditing and compliance tracking.
- Implement a process to respond to changes detected by AWS Config. This could include email notifications and the use of AWS Config rules to respond to changes programmatically.
- Delete Config service:
$ aws configservice delete-configuration-recorder --configuration-recorder-name default
AWS Config extends the concept of Infrastructure as Code into the realm of governance and compliance. AWS Config can continuously record the configuration of resources while AWS Config rules allow for event-driven responses to changes in the configuration of tracked resources. You can use this capability to assist your organization with the monitoring of compliance controls.
AWS Managed Services
AWS Lambda Step Functions
AWS OpsWorks for Chef Automate brings the capabilities of Chef, a configuration management platform, to AWS. OpsWorks for Chef Automate further builds on Chef's capabilities by providing additional features that support DevOps capabilities at scale. Chef is based on the concept of recipes, configuration scripts written in the Ruby language that perform tasks such as installing services. Chef recipes, like AWS CloudFormation templates, are a form of source code that can be version controlled, thereby extending the principle of Infrastructure as Code to the configuration management stage of the resource lifecycle.
OpsWorks for Chef Automate expands the capabilities of Chef to enable your organization to implement DevOps at scale. OpsWorks for Chef Automate provides three key capabilities that you can configure to support DevOps practices: workflow, compliance, and visibility.
Monitoring and Logging
Amazon CloudWatch is a set of services that ingests, interprets, and responds to runtime metrics, logs, and events. CloudWatch automatically collects metrics from many AWS services, such as Amazon EC2, Elastic Load Balancing (ELB), and Amazon DynamoDB. Responses can include built-in actions such as sending notifications or custom actions handled by AWS Lambda, a serverless event-driven compute platform. The code for Lambda functions becomes part of the infrastructure codebase, thereby extending Infrastructure as Code to the operational level. CloudWatch consists of three services: the main CloudWatch service, Amazon CloudWatch Logs, and Amazon CloudWatch Events.
The main Amazon CloudWatch service collects and tracks metrics for many AWS services such as Amazon EC2, ELB, DynamoDB, and Amazon Relational Database Service (RDS). You can also create custom metrics for services you develop, such as applications. CloudWatch issues alarms when metrics reach a given threshold over a period of time.
Here are some examples of metrics and potential responses that could apply to the situations mentioned at the start of this section:
- If the latency of ELB exceeds five seconds over two minutes, send an email notification to the systems administrators.
- When the average EC2 instance CPU usage exceeds 60 percent for three minutes, launch another EC2 instance.
- Increase the capacity units of a DynamoDB table when excessive throttling occurs.
You can implement responses to metrics-based alarms using built-in notifications, or by writing custom Lambda functions in Python, Node.js, Java, or C#.
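The threshold-over-time logic behind an alarm such as "CPU above 60 percent for three minutes" can be sketched as below. This is a simplified model rather than the CloudWatch implementation: it assumes one datapoint per one-minute period and treats the alarm as firing only when every datapoint in the evaluation window breaches the threshold. `alarm_state` is a hypothetical helper.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float, evaluation_periods: int) -> str:
    """Return "ALARM" when the most recent `evaluation_periods` datapoints all exceed the threshold."""
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# Average CPU utilisation per minute (made-up sample values), oldest first.
cpu_percent = [40.0, 65.0, 70.0, 80.0]
state = alarm_state(cpu_percent, threshold=60.0, evaluation_periods=3)
```

In the scale-out example above, a transition to `ALARM` would be the trigger for launching another EC2 instance.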
$ aws cloudwatch put-metric-data \
    --metric-name randomNumber \
    --namespace Random \
    --value $(shuf -i 1-1000 -n1) \
    --region=us-west-2
CloudWatch Custom Metrics
Amazon CloudWatch Events produces a stream of events from changes to AWS environments, applies a rules engine, and delivers matching events to specified targets. Examples of events that can be streamed include EC2 instance state changes, Auto Scaling actions, API calls published by CloudTrail, AWS console sign-ins, AWS Trusted Advisor optimization notifications, custom application-level events, and time-scheduled actions. Targets can include built-in actions such as SNS notifications or custom responses using Lambda functions.
The ability of an infrastructure to respond to selected events offers benefits in both operations and security. From the operations perspective, events can automate maintenance activities without having to manage a separate scheduling system. With regard to information security, events can provide notifications of console logins, authentication failures, and risky API calls recorded by CloudTrail. In both realms, incorporating event responses into the infrastructure code promotes a greater degree of self-healing and a higher level of operational maturity.
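The rules engine compares each event against an event pattern. The sketch below implements a deliberately simplified subset of that matching (exact values only, where a list means "any of these" and a nested dictionary recurses into the event) to show how a rule for EC2 instance state changes selects events. The `matches` function is a hypothetical illustration, not the service's matching algorithm, and the instance ID is a made-up placeholder.

```python
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified matcher: every pattern key must be present in the event;
    a list lists the accepted values, a dict recurses into a nested field."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Pattern selecting EC2 instances that have stopped or been terminated.
PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

stopped_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
```

Here `matches(PATTERN, stopped_event)` is true, so the event would be delivered to the rule's targets, such as an SNS topic or a Lambda function.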
- Best Practices
Here are some recommendations for best practices related to monitoring:
- Ensure that all AWS resources are emitting metrics.
- Create CloudWatch alarms for metrics that provide the appropriate responses as metric-related events arise.
- Send logs from AWS resources, including Amazon S3 and Amazon EC2, to CloudWatch Logs for analysis using log stream triggers and Lambda functions.
- Schedule ongoing maintenance tasks with CloudWatch and Lambda.
- Use CloudWatch custom events to respond to application-level issues.
Amazon CloudWatch Logs monitors and stores logs from Amazon EC2, AWS CloudTrail, and other sources. EC2 instances can ship logging information using the CloudWatch Logs Agent and logging tools such as Logstash, Graylog, and Fluentd. Logs stored in Amazon S3 can be sent to CloudWatch Logs by configuring an Amazon S3 event to trigger a Lambda function.
Ingested log data can be the basis for new CloudWatch metrics that can, in turn, trigger CloudWatch alarms. You can use this capability to monitor any resource that generates logs without writing any code whatsoever. If you need a more advanced response procedure, you can create a Lambda function to take the appropriate actions. For example, a Lambda function can use the SNS.Publish APIs to publish information to a Slack channel when NullPointerException errors appear in production logs. Log processing and correlation allow for deeper analysis of application behaviours and can expose internal details that are hard to figure out from metrics. CloudWatch Logs provides both the storage and analysis of logs, and processing to enable data-driven responses to operational issues.
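Turning log lines into a metric amounts to counting the lines that match a filter term and acting when the count crosses a threshold. The sketch below shows that idea in plain Python; `count_matches`, `should_notify`, and the sample log lines are hypothetical, and the notification itself (for example via SNS) is left out.

```python
from typing import Iterable, List


def count_matches(log_lines: Iterable[str], term: str) -> int:
    """Count the log lines that contain the filter term."""
    return sum(1 for line in log_lines if term in line)


def should_notify(log_lines: Iterable[str], term: str = "NullPointerException", threshold: int = 1) -> bool:
    """Decide whether the number of matching lines warrants a notification."""
    return count_matches(log_lines, term) >= threshold


# Made-up sample log lines for illustration only.
sample_logs: List[str] = [
    "INFO  request handled in 12 ms",
    "ERROR java.lang.NullPointerException at com.example.Handler.process",
    "INFO  request handled in 9 ms",
]
```

With the sample lines, `should_notify(sample_logs)` is true, which is the point at which a Lambda function would publish its message.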
$ sudo dpkg -i amazon-cloudwatch-agent.deb
$ sudo vi /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml  # create and edit config
$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:cloudwatchconfig.cfg -s
Think managed "Kiali + Jaeger"
Policies and Standards Automation
AWS Service Catalog
AWS Trusted Advisor
AWS Trusted Advisor helps you observe best practices by scanning your AWS resources and comparing their usage against AWS best practices in four categories: cost optimization, performance, security, and fault tolerance. As part of ongoing improvement to your infrastructure and applications, taking advantage of Trusted Advisor can help keep your resources provisioned optimally.
Trusted Advisor provides a variety of checks to determine if the infrastructure is following best practices. The checks include detailed descriptions of recommended best practices, alert criteria, guidelines for action, and a list of useful resources on the topic. Trusted Advisor provides the results of the checks and can also provide ongoing weekly notifications for status updates and cost savings.
All customers have access to a core set of Trusted Advisor checks. Customers with AWS Business or Enterprise support can access all Trusted Advisor checks and the Trusted Advisor APIs. Using the APIs, you can obtain information from Trusted Advisor and take corrective actions. For example, a program could leverage Trusted Advisor to examine current account service limits. If current resource usages approach the limits, you can automatically create a support case to increase the limits.
Additionally, Trusted Advisor now integrates with Amazon CloudWatch Events. You can design a Lambda function to notify a Slack channel when the status of Trusted Advisor checks changes. These examples illustrate how the concept of Infrastructure as Code can be extended to the resource optimization level of the information resource lifecycle.
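The service-limit example above can be sketched as a simple threshold check. This is an illustrative calculation only: `limits_needing_increase`, the limit names, the usage figures, and the 80 percent threshold are all hypothetical, and fetching the data from Trusted Advisor or opening a support case is not shown.

```python
from typing import Dict, List, Tuple


def limits_needing_increase(usage: Dict[str, Tuple[int, int]], threshold: float = 0.8) -> List[str]:
    """Return the limit names whose usage has reached the given fraction of the limit.

    `usage` maps a limit name to a (current_usage, limit) pair.
    """
    return sorted(
        name
        for name, (used, limit) in usage.items()
        if limit > 0 and used / limit >= threshold
    )


# Made-up figures for illustration.
current_usage = {
    "EC2 On-Demand instances": (18, 20),
    "VPCs": (2, 5),
}
flagged = limits_needing_increase(current_usage)
```

Each flagged limit is a candidate for an automatically created support case requesting an increase.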
- Best Practices
The best practices for AWS Trusted Advisor are listed below:
- Subscribe to Trusted Advisor notifications through email or an alternative delivery system.
- Use distribution lists and ensure that the appropriate recipients are included on all such notifications.
- If you have AWS Business or Enterprise support, use the AWS Support API in conjunction with Trusted Advisor notifications to create cases with AWS Support to perform remediation.
You must continuously monitor your infrastructure to optimize the infrastructure resources with regard to performance, security, and cost. AWS Trusted Advisor provides the ability to use APIs to interrogate your AWS infrastructure for recommendations, thus extending Infrastructure as Code to the optimization phase of the information resource lifecycle.
AWS Systems Manager
Amazon EC2 Systems Manager is a collection of capabilities that simplifies common maintenance, management, deployment, and execution of operational tasks on EC2 instances and servers or virtual machines (VMs) in on-premises environments. Systems Manager helps you easily understand and control the current state of your EC2 instance and OS configurations. You can track and remotely manage system configuration, OS patch levels, application configurations, and other details about deployments as they occur over time. These capabilities help with automating complex and repetitive tasks, defining system configurations, preventing drift, and maintaining software compliance of both Amazon EC2 and on-premises configurations.
Systems Manager is a management service that assists with:
- Collecting software inventory
- Applying OS patches
- Creating system images
- Configuring operating systems
- Managing hybrid cloud systems from a single interface (AWS and on-premises)
- Reducing costs
- Run Command
Lets you run a given command(s) across all of your EC2 instances (or a group of them).
The table below lists the tasks that Systems Manager simplifies.
|Run Command||Manage the configuration of managed instances at scale by distributing commands across a fleet.|
|Inventory||Automate the collection of the software inventory from managed instances.|
|State Manager||Keep managed instances in a defined and consistent state.|
|Maintenance Window||Define a maintenance window for running administrative tasks.|
|Patch Manager||Deploy software patches automatically across groups of instances.|
|Automation||Perform common maintenance and deployment tasks, such as updating Amazon Machine Images (AMIs).|
|Parameter Store||Store, control, access, and retrieve configuration data, whether plain-text data such as database strings or secrets such as passwords, encrypted through AWS Key Management Service (KMS).|
AWS Secrets Manager
Macie is a security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS.
- Can recognize any Personally Identifiable Information (PII)
- Provides a dashboard
- Monitors data access activity for anomalies
- Generates detailed alerts when it detects risk of unauthorized access or accidental data leaks
- As of February 2021 (to be verified), it protects only data stored in S3, with more AWS data stores planned for the future.
- It gives you superior visibility of data
- Simple to set up and easy to manage
AWS Certificate Manager
Incident and Event Response
GuardDuty is a threat-detection service that continuously monitors for malicious or unauthorized behaviour.
Inspector is an automated service that assesses your applications for vulnerabilities and produces a security findings report.
Easily collect, process, and analyze video and data streams in real time.
- Kinesis Data Analytics
- Analyze streaming data
- Respond in real-time
- Query using SQL
- Completely managed service (no servers required)
- Pay-as-you-go for what you use
- Powerful real-time processing
- Kinesis Data Firehose
- Deliver streaming data
- No applications to write or manage
- Just configure the producer
- Data can be transformed
- Destinations such as S3, Redshift, Elasticsearch, and Splunk
- Accepts records in chunks of up to 1,000 KB
- Kinesis Data Streams
- Collect streaming data
- Massively scalable
- Capture gigabytes per second (from thousands of sources)
- Data is available in milliseconds
- Durable (data is stored in three data centers in a region)
- Data is retained for 24 hours by default; retention can be extended to 7 days
- Kinesis Video Streams
- Collect streaming video
- Can handle ingestion from millions of devices
- Enables live and on-demand playback
- Take advantage of Amazon Rekognition Video and Machine Learning frameworks for video
- Access your data through APIs
- Build real-time video enabled applications
Review the tutorial: "Using AWS Lambda with Amazon Kinesis".
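When Lambda reads from a Kinesis data stream, each invocation receives a batch of records whose payloads are base64-encoded under `Records[].kinesis.data`. The handler below is a minimal sketch that decodes each payload as UTF-8 text; it assumes text payloads and does no error handling or downstream processing.

```python
import base64
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any = None) -> List[str]:
    """Decode the base64-encoded payload of every Kinesis record in the batch."""
    payloads = []
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(raw.decode("utf-8"))  # assumes UTF-8 text payloads
    return payloads
```

The function can be exercised locally by passing a dictionary shaped like the event, with the test payload encoded using `base64.b64encode`.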
High Availability, Fault Tolerance, and Disaster Recovery
- AWS Single Sign-On
- Amazon CloudFront
- AutoScaling and Lifecycle hooks
- Amazon Route 53
- Amazon RDS
- Amazon Aurora
- Amazon DynamoDB
- Amazon DynamoDB Keys and Streams
Other Services You Need to Know About
- Amazon Elastic File System
- Amazon ElastiCache
- Amazon S3 Glacier
- AWS Direct Connect
- AWS Lambda Function Dead Letter Queues
- Amazon CloudSearch
- Amazon Elasticsearch Service
- Amazon DynamoDB Accelerator
- AWS Server Migration Service