Kubernetes/GKE

Google Kubernetes Engine (GKE) is a managed, production-ready environment for deploying containerized applications in Kubernetes.

Deployments

A Deployment's rollout is triggered if and only if the Deployment's Pod template (that is, .spec.template) is changed, for example, if the labels or container images of the template are updated. Other updates, such as scaling the Deployment, do not trigger a rollout.
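For example (assuming the nginx-deployment used in the commands below already exists), scaling alone records no new rollout revision, which you can confirm from the revision history:

$ kubectl scale deployment nginx-deployment --replicas=3
$ kubectl rollout history deployment nginx-deployment   # revision list is unchanged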


Trigger a deployment rollout
  • To update the version of nginx in the deployment, then monitor the rollout and review its history, execute the following commands:
$ kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1 --record
$ kubectl rollout status deployment.v1.apps/nginx-deployment
$ kubectl rollout history deployment nginx-deployment
Trigger a deployment rollback

To roll back an object's rollout, you can use the kubectl rollout undo command.

To roll back to the previous version of the nginx deployment, execute the following command:

$ kubectl rollout undo deployments nginx-deployment
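To roll back to a specific revision rather than just the previous one, the same command accepts a --to-revision flag, for example:

$ kubectl rollout undo deployment nginx-deployment --to-revision=2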
  • View the updated rollout history of the deployment.
$ kubectl rollout history deployment nginx-deployment

deployments "nginx-deployment"
REVISION  CHANGE-CAUSE
2         kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1 --record=true
3         <none>
  • View the details of the latest deployment revision:
$ kubectl rollout history deployment/nginx-deployment --revision=3

The output should look like the example. Your output might not be an exact match but it will show that the current revision has rolled back to nginx:1.7.9.

deployments "nginx-deployment" with revision #3
Pod Template:
  Labels:       app=nginx
        pod-template-hash=3123191453
  Containers:
   nginx:
    Image:      nginx:1.7.9
    Port:       80/TCP
    Host Port:  0/TCP
    Environment:        <none>
    Mounts:     <none>
  Volumes:      <none>

Perform a canary deployment

A canary deployment is a separate deployment used to test a new version of your application. A single Service targets both the canary and the normal deployments, so a subset of requests is directed to the canary version, which mitigates the risk of new releases. The manifest file nginx-canary.yaml that is provided for you deploys a single Pod running a newer version of nginx than your main deployment. In this task, you create a canary deployment using this new deployment file.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-canary
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        track: canary
        Version: 1.9.1
    spec:
      containers:
      - name: nginx
        image: nginx:1.9.1
        ports:
        - containerPort: 80

The manifest for the nginx Service you deployed in the previous task uses a label selector to target the Pods with the app: nginx label. Both the normal deployment and this new canary deployment have the app: nginx label. Inbound connections will be distributed by the service to both the normal and canary deployment Pods. The canary deployment has fewer replicas (Pods) than the normal deployment, and thus it is available to fewer users than the normal deployment.
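Because the Service balances across every Pod carrying the app: nginx label, the replica ratio determines roughly what fraction of traffic reaches the canary. As a sketch (assuming your main deployment is named nginx-deployment, as in the earlier tasks), four normal replicas plus the single canary replica means about one request in five reaches the canary:

$ kubectl scale deployment nginx-deployment --replicas=4
$ kubectl get pods -l app=nginx   # five Pods total; one runs the canary version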

  • Create the canary deployment based on the configuration file.
$ kubectl apply -f nginx-canary.yaml

When the deployment is complete, verify that both the nginx and the nginx-canary deployments are present.

$ kubectl get deployments

Switch back to the browser tab that is connected to the external LoadBalancer Service IP and refresh the page. You should continue to see the standard "Welcome to nginx" page.

Switch back to the Cloud Shell and scale down the primary deployment to 0 replicas.

$ kubectl scale --replicas=0 deployment nginx-deployment

Verify that the only running replica is now the Canary deployment:

$ kubectl get deployments

Switch back to the browser tab that is connected to the external LoadBalancer Service IP and refresh the page. You should continue to see the standard "Welcome to nginx" page, showing that the Service is automatically balancing traffic to the canary deployment.

Note: Session affinity. The Service configuration used in the lab does not ensure that all requests from a single client will always connect to the same Pod. Each request is treated separately and can connect either to the normal nginx deployment or to the nginx-canary deployment. This potential to switch between different versions may cause problems if there are significant changes in functionality in the canary release. To prevent this, set the sessionAffinity field to ClientIP in the specification of the Service if you need a client's first request to determine which Pod will be used for all subsequent connections.

For example:

apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: LoadBalancer
  sessionAffinity: ClientIP
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 60000
    targetPort: 80

Jobs and CronJobs

  • Simple example:
$ kubectl run pi --image perl --restart Never -- perl -Mbignum=bpi -wle 'print bpi(2000)'
Parallel Job with fixed completion count
$ cat << EOF > my-app-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-app-job
spec:
  completions: 3
  parallelism: 2
  template:
    spec:
[...]
EOF
Two additional spec fields bound a Job's retries and total runtime:

spec:
  backoffLimit: 4             # retry failed Pods up to 4 times before marking the Job failed
  activeDeadlineSeconds: 300  # terminate the Job if it runs longer than 300 seconds
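A complete manifest combining these fields might look like the following sketch; the busybox container and its command are illustrative assumptions, not part of the lab:

apiVersion: batch/v1
kind: Job
metadata:
  name: my-app-job
spec:
  completions: 3              # run 3 Pods to successful completion in total
  parallelism: 2              # run at most 2 Pods concurrently
  backoffLimit: 4             # retry failed Pods up to 4 times
  activeDeadlineSeconds: 300  # terminate the Job after 300 seconds
  template:
    spec:
      containers:
      - name: my-app
        image: busybox
        command: ["sh", "-c", "echo processing an item; sleep 5"]
      restartPolicy: Never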
Example #1
Create and run a Job

You will create a job using a sample manifest called example-job.yaml that has been provided for you. This Job computes the value of Pi to 2,000 places and then prints the result.

apiVersion: batch/v1
kind: Job
metadata:
  # Unique key of the Job instance
  name: example-job
spec:
  template:
    metadata:
      name: example-job
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl"]
        args: ["-Mbignum=bpi", "-wle", "print bpi(2000)"]
      # Do not restart containers after they exit
      restartPolicy: Never

To create a Job from this file, execute the following command:

$ kubectl apply -f example-job.yaml
$ kubectl describe job
...
    Host Port:  <none>
    Command:
      perl
    Args:
      -Mbignum=bpi
      -wle
      print bpi(2000)
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  17s   job-controller  Created pod: example-job-gtf7w

$ kubectl get pods
NAME                READY   STATUS      RESTARTS   AGE
example-job-gtf7w   0/1     Completed   0          43s
Clean up and delete the Job

When a Job completes, the Job stops creating Pods. The Job API object is not removed when it completes, which allows you to view its status. Pods created by the Job are not deleted, but they are terminated. Retention of the Pods allows you to view their logs and to interact with them.

To get a list of the Jobs in the cluster, execute the following command:

$ kubectl get jobs

NAME          DESIRED   SUCCESSFUL   AGE
example-job   1         1            2m

To retrieve the log file from the Pod that ran the Job, execute the following command, replacing [POD-NAME] with the Pod name you recorded in the last task:

$ kubectl logs [POD-NAME]
3.141592653589793238...

The output will show that the job wrote the first two thousand digits of pi to the Pod log.
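If you would rather not copy the Pod name by hand, one option is to look it up via the job-name label that the Job controller places on its Pods (the POD_NAME variable here is just an illustration):

$ POD_NAME=$(kubectl get pods -l job-name=example-job -o jsonpath='{.items[0].metadata.name}')
$ kubectl logs $POD_NAME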

To delete the Job, execute the following command:

$ kubectl delete job example-job

If you try to query the logs again the command will fail as the Pod can no longer be found.

Define and deploy a CronJob manifest

You can create CronJobs to perform finite, time-related tasks that run once or repeatedly at a time that you specify.

In this section, we will create and run a CronJob, and then clean up and delete the Job.

Create and run a CronJob

The CronJob manifest file example-cronjob.yaml has been provided for you. This CronJob deploys a new container every minute that prints the time, date and "Hello, World!".

apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo "Hello, World!"
          restartPolicy: OnFailure

Note:

CronJobs use the required schedule field, which accepts a time in the Unix standard crontab format. All CronJob times are in UTC:

  • The first value indicates the minute (between 0 and 59).
  • The second value indicates the hour (between 0 and 23).
  • The third value indicates the day of the month (between 1 and 31).
  • The fourth value indicates the month (between 1 and 12).
  • The fifth value indicates the day of the week (between 0 and 6).

The schedule field also accepts * and ? as wildcard values. Combining / with ranges specifies that the task should repeat at a regular interval. In the example, */1 * * * * indicates that the task should repeat every minute of every day of every month.
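A few illustrative schedules in this format (examples for this note, not from the lab):

*/15 * * * *    every 15 minutes
0 3 * * *       every day at 03:00 UTC
0 9 * * 1       every Monday at 09:00 UTC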

To create a Job from this file, execute the following command:

$ kubectl apply -f example-cronjob.yaml

To check the status of this Job, execute the following command, where [job_name] is the name of your job:
$ kubectl describe job [job_name]

    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Args:
      /bin/sh
      -c
      date; echo "Hello, World!"
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  35s   job-controller  Created pod: hello-1565824980-sgdnn

View the output of the Job by querying the logs for the Pod. Replace [POD-NAME] with the name of the Pod you recorded in the last step.

$ kubectl logs [POD-NAME]

Wed Aug 14 23:23:03 UTC 2019
Hello, World!

To view all job resources in your cluster, including all of the Pods created by the CronJob which have completed, execute the following command:

$ kubectl get jobs

NAME               COMPLETIONS   DURATION   AGE
hello-1565824980   1/1           2s         2m29s
hello-1565825040   1/1           2s         89s
hello-1565825100   1/1           2s         29s

Your job names might be different from the example output. By default, Kubernetes sets the Job history limits so that only the last three successful Jobs and the last failed Job are retained, so this list will contain only the most recent three or four Jobs.

Clean up and delete the Job

In order to stop the CronJob and clean up the Jobs associated with it, you must delete the CronJob.

To delete all these jobs, execute the following command:

$ kubectl delete cronjob hello

To verify that the jobs were deleted, execute the following command:

$ kubectl get jobs
No resources found.

All the Jobs were removed.


Cluster scaling

Think of cluster scaling as a coarse-grained operation that should happen infrequently, and pod scaling with Deployments as a fine-grained operation that should happen frequently.

Pod conditions that prevent node deletion
  • Not run by a controller
    • e.g., Pods that are not managed by a Deployment, ReplicaSet, Job, etc.
  • Has local storage
  • Restricted by constraint rules
  • Pods that have the cluster-autoscaler.kubernetes.io/safe-to-evict annotation set to "false" (see the sketch after this list)
  • Pods covered by a restrictive PodDisruptionBudget
  • At the node level, if the cluster-autoscaler.kubernetes.io/scale-down-disabled annotation is set to "true"
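A minimal sketch of the safe-to-evict annotation on a Pod (the Pod name and container are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: critical-pod   # hypothetical name
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: app
    image: nginx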
gcloud
  • Create a cluster with autoscaling enabled:
$ gcloud container clusters create <cluster-name> \
  --num-nodes 30 \
  --enable-autoscaling \
  --min-nodes 15 \
  --max-nodes 50 \
  [--zone <compute-zone>]
  • Add a node pool with autoscaling enabled:
$ gcloud container node-pools create <pool-name> \
  --cluster <cluster-name> \
  --enable-autoscaling \
  --min-nodes 15 \
  --max-nodes 50 \
  [--zone <compute-zone>]
  • Enable autoscaling for an existing node pool:
$ gcloud container clusters update \
  <cluster-name> \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 10 \
  --zone <compute-zone> \
  --node-pool <pool-name>
  • Disable autoscaling for an existing node pool:
$ gcloud container clusters update \
  <cluster-name> \
  --no-enable-autoscaling \
  --node-pool <pool-name> \
  [--zone <compute-zone> --project <project-id>]

Configuring Pod Autoscaling and NodePools

Create a GKE cluster

In Cloud Shell, type the following commands to create environment variables for the GCP zone and cluster name that will be used to create the cluster for this lab.

export my_zone=us-central1-a
export my_cluster=standard-cluster-1
  • Configure tab completion for the kubectl command-line tool.
source <(kubectl completion bash)
  • Create a VPC-native Kubernetes cluster:
$ gcloud container clusters create $my_cluster \
   --num-nodes 2 --enable-ip-alias --zone $my_zone
  • Configure access to your cluster for kubectl:
$ gcloud container clusters get-credentials $my_cluster --zone $my_zone
Deploy a sample web application to your GKE cluster

Deploy a sample application to your cluster using the web.yaml deployment file that has been created for you:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      run: web
  template:
    metadata:
      labels:
        run: web
    spec:
      containers:
      - image: gcr.io/google-samples/hello-app:1.0
        name: web
        ports:
        - containerPort: 8080
          protocol: TCP

This manifest creates a deployment of a sample web application whose container runs an HTTP server listening on port 8080.

  • To create a deployment from this file, execute the following command:
$ kubectl create -f web.yaml --save-config

  • Create a service resource of type NodePort on port 8080 for the web deployment:
$ kubectl expose deployment web --target-port=8080 --type=NodePort

  • Verify that the service was created and that a node port was allocated:
$ kubectl get service web
NAME   TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
web    NodePort   10.12.6.154   <none>        8080:30972/TCP   5m4s

Your IP address and port number might be different from the example output.

Configure autoscaling on the cluster

In this section, we will configure the cluster to automatically scale the sample application that we deployed earlier.

Configure autoscaling
  • Get the list of deployments to determine whether your sample web application is still running:
$ kubectl get deployment
NAME   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
web    1         1         1            1           94s
  • To configure your sample application for autoscaling (and to set the maximum number of replicas to four and the minimum to one, with a CPU utilization target of 1%), execute the following command:
$ kubectl autoscale deployment web --max 4 --min 1 --cpu-percent 1

When you use kubectl autoscale, you specify a maximum and minimum number of replicas for your application, as well as a CPU utilization target.
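Under the hood, this creates a HorizontalPodAutoscaler object. A roughly equivalent manifest, sketched with the autoscaling/v1 API, would be:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 1
  maxReplicas: 4
  targetCPUUtilizationPercentage: 1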

  • Get the list of deployments to verify that there is still only one deployment of the web application:
$ kubectl get deployment
Inspect the HorizontalPodAutoscaler object

The kubectl autoscale command you used in the previous task creates a HorizontalPodAutoscaler object that targets a specified resource, called the scale target, and scales it as needed. The autoscaler periodically adjusts the number of replicas of the scale target to match the average CPU utilization that you specify when creating the autoscaler.
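The adjustment follows the standard HPA formula: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the configured minimum and maximum. With the 1% target used here, a single replica observed at 68% utilization yields ceil(1 × 68 / 1) = 68, which is then clamped to the maximum of 4 replicas.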

  • To get the list of HorizontalPodAutoscaler resources, execute the following command:
$ kubectl get hpa
NAME   REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web    Deployment/web   1%/1%     1         4         1          50s
  • To inspect the details of the HorizontalPodAutoscaler configuration, execute the following command:
$ kubectl describe horizontalpodautoscaler web
Name:                                                  web
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Thu, 15 Aug 2019 12:32:37 -0700
Reference:                                             Deployment/web
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  1% (1m) / 1%
Min replicas:                                          1
Max replicas:                                          4
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:           <none>
Test the autoscale configuration

You need to create a heavy load on the web application to force it to scale out. You create a configuration file that defines a deployment with four replicas that run an infinite loop of HTTP queries against the sample application web server.

You create the load on your web application by deploying the loadgen application using the loadgen.yaml file that has been provided for you.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: loadgen
spec:
  replicas: 4
  selector:
    matchLabels:
      app: loadgen
  template:
    metadata:
      labels:
        app: loadgen
    spec:
      containers:
      - name: loadgen
        image: k8s.gcr.io/busybox
        args:
        - /bin/sh
        - -c
        - while true; do wget -q -O- http://web:8080; done
  • Get the list of deployments to verify that the load generator is running:
$ kubectl get deployment
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
loadgen   4         4         4            4           11s
web       1         1         1            1           9m9s
  • Inspect HorizontalPodAutoscaler:
$ kubectl get hpa
NAME   REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web    Deployment/web   20%/1%    1         4         1          7m58s

Once the loadgen Pods start to generate traffic, the web deployment CPU utilization begins to increase. In the example output, the targets are now at 20% CPU utilization compared to the 1% CPU target.

  • After a few minutes, inspect the HorizontalPodAutoscaler again:
$ kubectl get hpa
NAME   REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web    Deployment/web   68%/1%    1         4         4          9m39s

$ kubectl get deployment
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
loadgen   4         4         4            4           2m44s
web       4         4         4            3           11m
  • To stop the load on the web application, scale the loadgen deployment to zero replicas.
$ kubectl scale deployment loadgen --replicas 0
  • Get the list of deployments to verify that loadgen has scaled down.
$ kubectl get deployment
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
loadgen   0         0         0            0           3m25s
web       4         4         4            3           12m

The loadgen deployment should have zero replicas.

Wait 2 to 3 minutes, and then get the list of deployments again to verify that the web application has scaled down to the minimum value of 1 replica that you configured when you deployed the autoscaler.

$ kubectl get deployment
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
loadgen   0         0         0            0           4m
web       1         1         1            1           15m

You should now have one deployment of the web application.
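Note that scale-down is deliberately conservative: the HPA waits for utilization to remain low for several minutes before removing replicas. To watch the replica count change in real time, the -w (watch) flag streams updates as they happen:

$ kubectl get hpa web -w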
