Difference between revisions of "Kubernetes"

From Christoph's Personal Wiki
(Run tcpdump on containers running in Pods)
(Release history)
(32 intermediate revisions by the same user not shown)
Line 6: Line 6:
 
==Release history==
 
==Release history==
  
NOTE: There is no such thing as Kubernetes Long-Term-Support (LTS). There is a new "minor" release roughly every 3 months.
+
'''NOTE:''' I have been using Kubernetes since release 1.0 back in September 2015.
 +
 
 +
NOTE: There is no such thing as Kubernetes Long-Term-Support (LTS). There is a new "minor" release ''roughly'' every 3 months (note: changed to ''roughly'' every 4 months in 2020).
  
 
<div style="float:left; margin:0px 20px 20px 0px;">
 
<div style="float:left; margin:0px 20px 20px 0px;">
Line 56: Line 58:
 
|--bgcolor="#eeeeee"
 
|--bgcolor="#eeeeee"
 
|[https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md 1.19] || 2020-08-26 ||align="right"| 154
 
|[https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md 1.19] || 2020-08-26 ||align="right"| 154
 +
|- align="left"
 +
|[https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md 1.20] || 2020-12-08 ||align="right"| 104
 +
|--bgcolor="#eeeeee"
 +
|[https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md 1.21] || 2021-04-08 ||align="right"| 121
 +
|- align="left"
 +
|[https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md 1.22] || 2021-08-04 ||align="right"| 118
 +
|--bgcolor="#eeeeee"
 +
|[https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md 1.23] || 2021-12-07 ||align="right"| 125
 +
|- align="left"
 +
|[https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md 1.24] || 2022-05-03 ||align="right"| 147
 +
|--bgcolor="#eeeeee"
 +
|[https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md 1.25] || 2022-08-23 ||align="right"| 112
 +
|- align="left"
 +
|[https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md 1.26] || 2023-01-18 ||align="right"| 148
 +
|--bgcolor="#eeeeee"
 +
|[https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md 1.27] || 2023-04-11 ||align="right"| 83
 +
|- align="left"
 +
|[https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md 1.28] || 2023-08-15 ||align="right"| 126
 +
|- align="left"
 +
|[https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md 1.29] || 2023-12-13 ||align="right"| 120
 
|}
 
|}
 
</div>
 
</div>
Line 1,157: Line 1,179:
 
* Test install
 
* Test install
 
  $ minikube start
 
  $ minikube start
 +
#~OR~
 +
$ minikube start --memory 4096 # give it 4GB of RAM
 
  $ minikube status
 
  $ minikube status
 
  $ minikube dashboard
 
  $ minikube dashboard
 
  $ kubectl config view
 
  $ kubectl config view
 
  $ kubectl cluster-info
 
  $ kubectl cluster-info
 +
 +
NOTE: If you have an old version of minikube installed, you should probably do the following before upgrading to a much newer version:
 +
$ minikube delete --all --purge
  
 
Get the details on the CLI options for kubectl [https://kubernetes.io/docs/reference/kubectl/overview/ here].
 
Get the details on the CLI options for kubectl [https://kubernetes.io/docs/reference/kubectl/overview/ here].
Line 1,720: Line 1,747:
 
  $ kubectl get po --sort-by='{.firstTimestamp}'
 
  $ kubectl get pods --all-namespaces --sort-by=.metadata.creationTimestamp
 
  $ kubectl get pods --all-namespaces --sort-by=.metadata.creationTimestamp
 +
 +
* Backup all primitives deployed in a given k8s cluster:
 +
<pre>
 +
$ kubectl api-resources --verbs=list --namespaced -o name \
 +
    | xargs -n1 -I{} bash -c "kubectl get {} --all-namespaces -oyaml && echo ---" \
 +
    > k8s_backup.yaml
 +
</pre>
 +
 +
===kubectl explain===
 +
 +
;List the fields for supported resources.
 +
 +
* Get the documentation of a resource (aka "kind") and its fields:
 +
<pre>
 +
$ kubectl explain deployment
 +
KIND:    Deployment
 +
VERSION:  apps/v1
 +
 +
DESCRIPTION:
 +
    Deployment enables declarative updates for Pods and ReplicaSets.
 +
 +
FIELDS:
 +
  apiVersion <string>
 +
    APIVersion defines the versioned schema of this representation of an
 +
    object. Servers should convert recognized schemas to the latest internal
 +
    value, and may reject unrecognized values. More info:
 +
    https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
 +
 +
  kind <string>
 +
    Kind is a string value representing the REST resource this object
 +
    represents. Servers may infer this from the endpoint the client submits
 +
    requests to. Cannot be updated. In CamelCase. More info:
 +
    https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
 +
 +
  metadata <Object>
 +
    Standard object metadata.
 +
 +
  spec <Object>
 +
    Specification of the desired behavior of the Deployment.
 +
 +
  status <Object>
 +
    Most recently observed status of the Deployment
 +
</pre>
 +
 +
* Get a list of all the resource types and their latest supported version:
 +
<pre>
 +
$ for kind in $(kubectl api-resources | tail -n +2 | awk '{print $1}'); do
 +
    kubectl explain ${kind};
 +
  done | grep -E "^KIND:|^VERSION:"
 +
 +
KIND:    Binding
 +
VERSION:  v1
 +
KIND:    ComponentStatus
 +
VERSION:  v1
 +
KIND:    ConfigMap
 +
VERSION:  v1
 +
...
 +
</pre>
 +
 +
* Get a list of ''all'' allowable fields for a given primitive:
 +
<pre>
 +
$ kubectl explain deployment --recursive | head
 +
KIND:    Deployment
 +
VERSION:  apps/v1
 +
 +
DESCRIPTION:
 +
    Deployment enables declarative updates for Pods and ReplicaSets.
 +
 +
FIELDS:
 +
  apiVersion <string>
 +
  kind <string>
 +
  metadata <Object>
 +
</pre>
 +
 +
* Get documentation ("man page"-style) for a given field in a given primitive:
 +
<pre>
 +
$ kubectl explain deployment.status.availableReplicas
 +
KIND:    Deployment
 +
VERSION:  apps/v1
 +
 +
FIELD:    availableReplicas <integer>
 +
 +
DESCRIPTION:
 +
    Total number of available pods (ready for at least minReadySeconds)
 +
    targeted by this deployment.
 +
</pre>
 +
 +
===Merge kubeconfig files===
 +
 +
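* Reference which kubeconfig files you wish to merge (include the existing $HOME/.kube/config if you want to keep its entries):
 +
$ export KUBECONFIG=$HOME/.kube/config:$HOME/.kube/dev.yaml:$HOME/.kube/prod.yaml
 +
 +
* Flatten them into a temporary file, then move it into place (appending with ">>" to an existing config would leave duplicate top-level keys in the file):
 +
$ kubectl config view --flatten > $HOME/.kube/config.merged && mv $HOME/.kube/config.merged $HOME/.kube/config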
 +
 +
* Unset:
 +
$ unset KUBECONFIG
 +
 +
Merge complete.
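
The merge can be pictured as a union of the <code>clusters</code>, <code>users</code>, and <code>contexts</code> lists, where the first file listed in <code>KUBECONFIG</code> wins when two entries share a name. The following is only a rough sketch of that rule using plain Python dictionaries. The function <code>merge_kubeconfigs</code> and the sample data (including the server URLs) are made up for illustration; it does not read real files or call kubectl.

```python
SECTIONS = ("clusters", "users", "contexts")


def merge_kubeconfigs(*configs):
    """Merge kubeconfig-like dicts; earlier configs win on name clashes."""
    merged = {section: [] for section in SECTIONS}
    for section in SECTIONS:
        seen = set()
        for config in configs:
            for entry in config.get(section, []):
                if entry["name"] not in seen:
                    seen.add(entry["name"])
                    merged[section].append(entry)
    # current-context is taken from the first config that sets it.
    for config in configs:
        if "current-context" in config:
            merged["current-context"] = config["current-context"]
            break
    return merged


# Hypothetical sample data standing in for dev.yaml and prod.yaml.
DEV = {
    "clusters": [{"name": "dev", "cluster": {"server": "https://dev.example.internal"}}],
    "contexts": [{"name": "dev", "context": {"cluster": "dev", "user": "dev-admin"}}],
    "users": [{"name": "dev-admin", "user": {}}],
    "current-context": "dev",
}
PROD = {
    "clusters": [{"name": "prod", "cluster": {"server": "https://prod.example.internal"}}],
    "contexts": [{"name": "prod", "context": {"cluster": "prod", "user": "prod-admin"}}],
    "users": [{"name": "prod-admin", "user": {}}],
    "current-context": "prod",
}

merged = merge_kubeconfigs(DEV, PROD)
print([c["name"] for c in merged["contexts"]], merged["current-context"])
```

Because <code>DEV</code> is listed first, its <code>current-context</code> is the one that survives, which mirrors why the order of files in <code>KUBECONFIG</code> matters.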
  
 
==Namespaces==
 
==Namespaces==
Line 3,318: Line 3,444:
 
  #~OR~ (if using an older version of `jq`)
 
  #~OR~ (if using an older version of `jq`)
 
  $ kubectl get nodes -o json | jq '.items[].metadata.name' | tr -d '"'
 
  $ kubectl get nodes -o json | jq '.items[].metadata.name' | tr -d '"'
 +
 +
* Label a list of nodes:
 +
<pre>
 +
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
 +
  kubectl label nodes ${node} instancetype=ondemand;
 +
  kubectl label nodes ${node} "example.io/node-lifecycle"=od;
 +
done
 +
</pre>
 +
 +
* Delete a bunch of Pods in "Evicted" state:
 +
$ kubectl get pod -n develop | awk '/Evicted/{print $1}' | xargs kubectl delete pod -n develop
 +
#~OR~
 +
$ kubectl get po --all-namespaces -o json | \
 +
    jq  '.items[] | select(.status.reason!=null) | select(.status.reason | contains("Evicted")) |
 +
    "kubectl delete po \(.metadata.name) -n \(.metadata.namespace)"' | xargs -n 1 bash -c
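
For anyone who finds the <code>jq</code> filter above hard to read, here is roughly the same selection logic written out in Python. It is a sketch that works on an already-parsed Pod list (the kind of structure <code>kubectl get po -o json</code> returns); the sample data and the function name <code>evicted_pods</code> are invented for illustration, and nothing is deleted.

```python
# Hypothetical, heavily trimmed stand-in for `kubectl get po --all-namespaces -o json`.
SAMPLE_POD_LIST = {
    "items": [
        {"metadata": {"name": "web-1", "namespace": "develop"},
         "status": {"phase": "Running"}},
        {"metadata": {"name": "worker-1", "namespace": "develop"},
         "status": {"phase": "Failed", "reason": "Evicted"}},
        {"metadata": {"name": "job-1", "namespace": "batch"},
         "status": {"phase": "Failed", "reason": None}},
    ]
}


def evicted_pods(pod_list):
    """Return (namespace, name) for every Pod whose status.reason contains 'Evicted'."""
    return [
        (item["metadata"]["namespace"], item["metadata"]["name"])
        for item in pod_list["items"]
        # A missing or null reason is treated as an empty string.
        if "Evicted" in (item.get("status", {}).get("reason") or "")
    ]


for namespace, name in evicted_pods(SAMPLE_POD_LIST):
    # Only print the command that would be run; this sketch deletes nothing.
    print(f"kubectl delete po {name} -n {namespace}")
```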
  
 
* Get a random node:
 
* Get a random node:
Line 3,374: Line 3,515:
 
  $ kubectl get rolebindings --all-namespaces -o go-template \
     --template='<nowiki>{{range .items}}{{println}}{{.metadata.namespace}}={{range .subjects}}{{if eq .kind "User"}}{{.name}} {{end}}{{end}}{{end}}</nowiki>'
 +
 +
* Get the memory limit assigned to a container in a given Pod:
 +
<pre>
 +
$ kubectl get pod example-pod-name -n default \
 +
  -o jsonpath="{.spec.containers[*].resources.limits}"
 +
</pre>
  
 
* Get a Bash prompt of your current context and namespace:
 
* Get a Bash prompt of your current context and namespace:
Line 3,456: Line 3,603:
 
* Execute a command in every pod / replica:
 
* Execute a command in every pod / replica:
 
  $ for i in 0 1; do kubectl exec foo-$i -- sh -c 'echo $(hostname) > /usr/share/nginx/html/index.html'; done
 
  $ for i in 0 1; do kubectl exec foo-$i -- sh -c 'echo $(hostname) > /usr/share/nginx/html/index.html'; done
 +
 +
* Get a list of ''all'' container IDs running in ''all'' Pods in ''all'' namespaces for a given Kubernetes cluster:
 +
<pre>
 +
$ kubectl get pods --all-namespaces \
 +
    -o jsonpath='{range .items[*]}{"pod: "}{.metadata.name}{"\n"}{range .status.containerStatuses[*]}{"\tname: "}{.containerID}{"\n\timage: "}{.image}{"\n"}{end}{end}'
 +
 +
# Example output:
 +
pod: cert-manager-848f547974-8m2k6
 +
        name: containerd://358415173310a528a36ca2c19cdc3319f8fd96634c09957977767333b104d387
 +
        image: quay.io/jetstack/cert-manager-controller:v1.5.3
 +
</pre>
  
 
===Manage resources===
 
===Manage resources===
Line 3,588: Line 3,746:
 
KIND:    Deployment
 
KIND:    Deployment
 
VERSION:  apps/v1
 
VERSION:  apps/v1
 +
</pre>
 +
 +
===kubectl-neat===
 +
 +
: See: https://github.com/itaysk/kubectl-neat
 +
: See: [[jq]]
 +
 +
* To easily copy a certificate secret from one namespace to another namespace run:
 +
<pre>
 +
$ SOURCE_NAMESPACE=<update-me>
 +
$ DESTINATION_NAMESPACE=<update-me>
 +
$ kubectl -n ${SOURCE_NAMESPACE} get secret kafka-client-credentials -o json |\
 +
    kubectl neat |\
 +
    jq 'del(.metadata["namespace"])' |\
 +
    kubectl apply -n ${DESTINATION_NAMESPACE} -f -
 +
</pre>
 +
 +
===Get CPU/memory for each node===
 +
 +
<pre>
 +
for node in $(kubectl get nodes -o=jsonpath='{.items[*].metadata.name}'); do
 +
  echo "NODE: ${node}"; kubectl describe node ${node} | grep -E '^  cpu |^  memory ';
 +
done
 +
</pre>
 +
 +
===Get vCPU capacity===
 +
 +
<pre>
 +
$ kubectl get nodes -o=jsonpath="{range .items[*]}{.metadata.name}{\"\t\"} \
 +
    {.status.capacity.cpu}{\"\n\"}{end}"
 
</pre>
 
</pre>
  
Line 3,725: Line 3,913:
  
 
Sure enough. Each of the 3 Pods is serving the GET request roughly 33% of the time.
 
Sure enough. Each of the 3 Pods is serving the GET request roughly 33% of the time.
 +
 +
; Query selections
 +
 +
* Create a "query selection" file:
 +
<pre>
 +
$ cat << EOF >cluster-nodes-health.txt
 +
Name Kernel InternalIP MemoryPressure DiskPressure PIDPressure Ready
 +
.metadata.name .status.nodeInfo.kernelVersion .status.addresses[0].address .status.conditions[0].status .status.conditions[1].status .status.conditions[2].status .status.conditions[3].status
 +
EOF
 +
</pre>
 +
 +
* Use the above "query selection" file:
 +
<pre>
 +
$ kubectl get nodes -o custom-columns-file=cluster-nodes-health.txt
 +
Name          Kernel          InternalIP    MemoryPressure  DiskPressure  PIDPressure  Ready
 +
10.10.10.152  5.4.0-1084-aws  10.10.10.152  False            False          False        False
 +
10.10.11.12    5.4.0-1092-aws  10.10.11.12    False            False          False        False
 +
10.10.12.22    5.4.0-1039-aws  10.10.12.22    False            False          False        False
 +
</pre>
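
One caveat: the query file above picks the conditions by list position (<code>conditions[0]</code> through <code>conditions[3]</code>). I am assuming here that the order of entries in <code>.status.conditions</code> is not guaranteed to be the same on every cluster (e.g., a network plugin may add its own condition), in which case the column headers would no longer line up with the values. The sketch below, on made-up sample data, shows the alternative of looking a condition up by its <code>type</code>. The helper <code>condition_status</code> is hypothetical and not part of kubectl.

```python
# Hypothetical, trimmed stand-in for one item of `kubectl get nodes -o json`.
NODE = {
    "metadata": {"name": "10.10.10.152"},
    "status": {
        "conditions": [
            {"type": "MemoryPressure", "status": "False"},
            {"type": "DiskPressure", "status": "False"},
            {"type": "PIDPressure", "status": "False"},
            {"type": "Ready", "status": "True"},
        ]
    },
}


def condition_status(node, condition_type):
    """Return the status of the named condition, or 'Unknown' if it is absent."""
    for condition in node["status"]["conditions"]:
        if condition["type"] == condition_type:
            return condition["status"]
    return "Unknown"


for name in ("MemoryPressure", "DiskPressure", "PIDPressure", "Ready"):
    print(f"{name}: {condition_status(NODE, name)}")
```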
  
 
==Example YAML files==
 
==Example YAML files==
Line 3,814: Line 4,021:
 
       - name: pi
 
       - name: pi
 
         image: perl
 
         image: perl
         command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
+
         command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
 
       restartPolicy: Never
 
       restartPolicy: Never
 
   backoffLimit: 4
 
   backoffLimit: 4
Line 4,007: Line 4,214:
 
: FEATURE STATE: Kubernetes v1.15 (stable)
 
: FEATURE STATE: Kubernetes v1.15 (stable)
  
This guide shows you how to install and write extensions for kubectl. Usually called plugins or binary extensions, this feature allows you to extend the default set of commands available in kubectl by adding new subcommands to perform new tasks and extend the set of features available in the main distribution of kubectl.
+
This section shows you how to install and write extensions for <code>kubectl</code>. Usually called "plugins" or "binary extensions", this feature allows you to extend the default set of commands available in <code>kubectl</code> by adding new sub-commands to perform new tasks and extend the set of features available in the main distribution of <code>kubectl</code>.
  
 
Get code [https://github.com/kubernetes/kubernetes/tree/master/pkg/kubectl/plugins/examples from here].
 
Get code [https://github.com/kubernetes/kubernetes/tree/master/pkg/kubectl/plugins/examples from here].
Line 4,041: Line 4,248:
 
nginx-deployment-67594d6bf6-d8dwt: ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 6 hours and 8 minutes
 
nginx-deployment-67594d6bf6-d8dwt: ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 6 hours and 8 minutes
 
</pre>
 
</pre>
 +
 +
==Local Kubernetes==
 +
 +
<div style="float:left; margin:0px 20px 20px 0px;">
 +
{| align="center" style="border: 1px solid #999; background-color:#FFFFFF"
 +
|-
 +
! colspan="6" bgcolor="#EFEFEF" | '''Local Kubernetes Comparisons'''
 +
|-align="center" bgcolor="#1188ee"
 +
!Feature
 +
!kind
 +
!k3d
 +
!minikube
 +
!Docker Desktop
 +
!Rancher Desktop
 +
|-
 +
| Free || yes || yes || yes || Personal Small Business* || yes
 +
|--bgcolor="#eeeeee"
 +
| Install || easy || easy || easy || easy || medium (you may encounter odd scenarios)
 +
|-
 +
| Ease of Use || medium || medium || medium || easy || easy
 +
|--bgcolor="#eeeeee"
 +
| Stability || stable || stable || stable || stable || stable
 +
|-
 +
| Cross-platform || yes || yes || yes || yes || yes
 +
|--bgcolor="#eeeeee"
 +
| CI Usage || yes || yes || yes || no || no
 +
|-
 +
| Multiple clusters || yes || yes || yes || no || no
 +
|--bgcolor="#eeeeee"
 +
| Podman support || yes || yes || yes || no || no
 +
|-
 +
| Host volumes mount support || yes || yes || yes (with some performance limitations) || yes || yes (only pre-defined paths)
 +
|--bgcolor="#eeeeee"
 +
| Kubernetes service port-forwarding/mapping || yes || yes || yes || yes || yes
 +
|-
 +
| Pull-through Docker mirror/proxy || yes || yes || no || yes (can reference locally available images) || yes (can reference locally available images)
 +
|--bgcolor="#eeeeee"
 +
| Custom CNI || yes (ex: calico) || yes (ex: flannel) || yes (ex: calico) || no  || no
 +
|-
 +
| Features Gates || yes || yes || yes || yes (but not natively; requires hacky setup) || yes (but not natively; requires hacky setup)
 +
|}
 +
</div>
 +
<br clear="all"/>
 +
 +
[https://bmiguel-teixeira.medium.com/local-kubernetes-the-one-above-all-3aedbeb5f3f6 Source]
  
 
==See also==
 
==See also==
Line 4,069: Line 4,321:
  
 
===Training===
 
===Training===
 +
* [https://kubernetes.io/training/ Official Kubernetes Training Website]
 +
** Kubernetes and Cloud Native Associate (KCNA)
 +
** Certified Kubernetes Application Developer (CKAD)
 +
** Certified Kubernetes Administrator (CKA)
 +
** Certified Kubernetes Security Specialist (CKS) [note: Candidates for CKS must hold a current Certified Kubernetes Administrator (CKA) certification to demonstrate they possess sufficient Kubernetes expertise before sitting for the CKS.]
 
* [https://training.linuxfoundation.org/linux-courses/system-administration-training/kubernetes-fundamentals Kubernetes Fundamentals] (LFS258)
 
* [https://training.linuxfoundation.org/linux-courses/system-administration-training/kubernetes-fundamentals Kubernetes Fundamentals] (LFS258)
 
** ''[https://www.cncf.io/certification/expert/ Certified Kubernetes Administrator]'' (CKA) certification.
 +
* [https://killer.sh/ CKS / CKA / CKAD Simulator]
 
* [https://kubernetes.io/blog/2018/07/18/11-ways-not-to-get-hacked/ 11 Ways (Not) to Get Hacked]
 
* [https://kubernetes.io/blog/2018/07/18/11-ways-not-to-get-hacked/ 11 Ways (Not) to Get Hacked]
  

Revision as of 17:26, 19 January 2024

Kubernetes (also known by its numeronym k8s) is an open-source container cluster manager. Kubernetes' primary goal is to provide a platform for automating deployment, scaling, and operations of application containers across a cluster of hosts. Kubernetes was released by Google in July 2015.

  • Get the latest stable release of k8s with:
$ curl -sSL https://dl.k8s.io/release/stable.txt


Release history

NOTE: I have been using Kubernetes since release 1.0 back in September 2015.

NOTE: There is no such thing as Kubernetes Long-Term-Support (LTS). There is a new "minor" release roughly every 3 months (note: changed to roughly every 4 months in 2020).

Kubernetes release history
Release Date Cadence (days)
1.0 2015-07-10
1.1 2015-11-09 122
1.2 2016-03-16 128
1.3 2016-07-01 107
1.4 2016-09-26 87
1.5 2016-12-12 77
1.6 2017-03-28 106
1.7 2017-06-30 94
1.8 2017-09-28 90
1.9 2017-12-15 78
1.10 2018-03-26 101
1.11 2018-06-27 93
1.12 2018-09-27 92
1.13 2018-12-03 67
1.14 2019-03-25 112
1.15 2019-06-17 84
1.16 2019-09-18 93
1.17 2019-12-09 82
1.18 2020-03-25 107
1.19 2020-08-26 154
1.20 2020-12-08 104
1.21 2021-04-08 121
1.22 2021-08-04 118
1.23 2021-12-07 125
1.24 2022-05-03 147
1.25 2022-08-23 112
1.26 2023-01-18 148
1.27 2023-04-11 83
1.28 2023-08-15 126
1.29 2023-12-13 120
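
The "Cadence (days)" column is just the number of days between one release date and the previous one. A small sketch that recomputes the last few rows of the table, using only dates copied from it (the function name <code>cadence_days</code> is my own):

```python
from datetime import date

# (release, release date) pairs copied from the table above.
RELEASES = [
    ("1.26", date(2023, 1, 18)),
    ("1.27", date(2023, 4, 11)),
    ("1.28", date(2023, 8, 15)),
    ("1.29", date(2023, 12, 13)),
]


def cadence_days(releases):
    """Return {release: days since the previous release} for all but the first entry."""
    return {
        name: (day - previous_day).days
        for (_, previous_day), (name, day) in zip(releases, releases[1:])
    }


for name, days in cadence_days(RELEASES).items():
    print(f"{name}: {days} days")
```

This reproduces the 83, 126, and 120 day values shown for 1.27, 1.28, and 1.29.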


See: The full-time job of keeping up with Kubernetes

Providers and installers

  • Vanilla Kubernetes
  • AWS:
    • Managed: EKS
    • Kops
    • Kube-AWS
    • Kismatic
    • Kubicorn
    • Stack Point Cloud
  • Google:
  • Azure AKS
  • Ubuntu UKS
  • VMware PKS
  • Rancher RKE
  • CoreOS Tectonic

Design overview

Kubernetes is built through the definition of a set of components (building blocks or "primitives") which, when used collectively, provide a method for the deployment, maintenance, and scalability of container-based application clusters.

These "primitives" are designed to be loosely coupled (i.e., little to no knowledge of the other component definitions is needed to use them) as well as easily extensible through an API. Both the internal components of Kubernetes and the extensions and containers make use of this API.

Components

The building blocks of Kubernetes are the following (note that these are also referred to as Kubernetes "Objects" or "API Primitives"):

Cluster 
A cluster is a set of machines (physical or virtual) on which your applications are managed and run. All machines are managed as a cluster (or set of clusters, depending on the topology used).
Nodes (minions) 
You can think of these as "container clients". These are the individual hosts (physical or virtual) on which Docker is installed and which host the various containers within your managed cluster.
The master node(s) run etcd (a distributed key-value store that Kubernetes uses for exchanging messages and reporting on cluster status), while each worker node runs the kubelet and the Kubernetes Proxy (kube-proxy).
Pods 
A pod consists of one or more containers. Those containers are guaranteed (by the cluster controller) to be located on the same host machine (aka "co-located") in order to facilitate sharing of resources. For example, it makes sense to have database processes and data containers as close as possible. In fact, they really should be in the same pod.
Pods "work together", as in a multi-tiered application configuration. Each set of pods that defines and implements a service (e.g., MySQL or Apache) is identified by a label selector (see below).
Pods are assigned unique IPs within each cluster. These allow an application to use ports without having to worry about conflicting port utilization.
Pods can contain definitions of disk volumes or shares, and then provide access from those to all the members (containers) within the pod.
Finally, pod management is done through the API or delegated to a controller.
Labels 
Clients can attach key-value pairs to any object in the system (e.g., Pods or Nodes). These labels identify the objects when configuring and managing them. The key-value pairs can be used to filter, organize, and perform mass operations on a set of resources.
Selectors 
Label Selectors represent queries that are made against those labels. They resolve to the corresponding matching objects. A Selector expression matches labels to filter certain resources. For example, you may want to search for all pods that belong to a certain service, or find all containers that have a specific tier Label value as "database". Labels and Selectors are inherently two sides of the same coin. You can use Labels to classify resources and use Selectors to find them and use them for certain actions.
These two items are the primary way that grouping is done in Kubernetes, and they determine which components a given operation applies to.
Controllers 
These are used in the management of your cluster. Controllers are the mechanism by which your desired configuration state is enforced.
Controllers manage a set of pods and, depending on the desired configuration state, may engage other controllers to handle replication and scaling (Replication Controller) of X number of containers and pods across the cluster. They are also responsible for replacing any container in a pod that fails (based on the desired state of the cluster).
Replication Controllers (RC) are a subset of Controllers and are an abstraction used to manage pod lifecycles. One of the key uses of RCs is to maintain a certain number of running Pods (e.g., for scaling or ensuring that at least one Pod is running at all times, etc.). It is considered a "best practice" to use RCs to define Pod lifecycles, rather than creating Pods directly.
Other controllers that can be engaged include a DaemonSet Controller (enforces a 1-to-1 ratio of pods to Worker Nodes) and a Job Controller (that runs pods to "completion", such as in batch jobs).
The set of pods that any controller manages is determined by the label selectors that are part of its definition.
Replica Sets
These define how many replicas of each Pod will be running. They also monitor and ensure the required number of Pods are running, replacing Pods that die. Replica Sets can act as replacements for Replication Controllers.
Services 
A Service is an abstraction on top of Pods, which provides a single IP address and DNS name by which the Pods can be accessed. This load balancing configuration is much easier to manage and helps scale Pods seamlessly.
Kubernetes can then provide service discovery and handle routing with the static IP for each pod as well as load balancing (round-robin based) connections to that service among the pods that match the label selector indicated.
By default, a service is only exposed inside the cluster, but it can also be exposed outside the cluster as needed.
Volumes 
A Volume is a directory with data, which is accessible to a container. The volume co-terminates with the Pod that encloses it.
Name 
A name by which a resource is identified.
Namespace 
A Namespace provides additional qualification to a resource name. This is especially helpful when multiple teams/projects are using the same cluster and there is a potential for name collision. You can think of a Namespace as a virtual wall between multiple clusters.
Annotations 
An Annotation is a Label, but with much larger data capacity. Typically, this data is not readable by humans and is not easy to filter through. Annotation is useful only for storing data that may not be searched, but is required by the resource (e.g., storing strong keys, etc.).
Control Plane
API

Pods

A Pod is the smallest and simplest Kubernetes object. It is the unit of deployment in Kubernetes, which represents a single instance of the application. A Pod is a logical collection of one or more containers, which:

  • are scheduled together on the same host;
  • share the same network namespace; and
  • mount the same external storage (Volumes).

Pods are ephemeral in nature, and they cannot self-heal. That is why we use them with controllers, which can handle a Pod's replication, fault tolerance, self-healing, etc. Examples of controllers are Deployments, ReplicaSets, ReplicationControllers, etc. We attach the Pod's specification to other objects using Pod Templates (see below).

Labels

Labels are key-value pairs that can be attached to any Kubernetes object (e.g. Pods). Labels are used to organize and select a subset of objects, based on the requirements in place. Many objects can have the same label(s). Labels do not provide uniqueness to objects.

Label Selectors

With Label Selectors, we can select a subset of objects. Kubernetes supports two types of Selectors:

Equality-Based Selectors 
Equality-Based Selectors allow filtering of objects based on label keys and values. With this type of Selector, we can use the =, ==, or != operators. For example, with env==dev, we are selecting the objects where the "env" label is set to "dev".
Set-Based Selectors 
Set-Based Selectors allow filtering of objects based on a set of values. With this type of Selector, we can use the in, notin, and exists operators. For example, with env in (dev,qa), we are selecting objects where the "env" label is set to "dev" or "qa".
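
To make the two selector types concrete, here is a minimal sketch of how they could be evaluated against an object's labels. It is not how Kubernetes implements selectors internally; the function names and the sample Pods are made up. One behaviour I am relying on from the Kubernetes documentation: != and notin also match objects that do not have the label key at all.

```python
def matches_equality(labels, key, op, value):
    """Equality-based selector: op is '=', '==' or '!='."""
    if op in ("=", "=="):
        return labels.get(key) == value
    if op == "!=":
        # Also true when the key is missing entirely.
        return labels.get(key) != value
    raise ValueError(f"unsupported operator: {op}")


def matches_set(labels, key, op, values=()):
    """Set-based selector: op is 'in', 'notin' or 'exists'."""
    if op == "in":
        return key in labels and labels[key] in values
    if op == "notin":
        # Also true when the key is missing entirely.
        return labels.get(key) not in values
    if op == "exists":
        return key in labels
    raise ValueError(f"unsupported operator: {op}")


# Hypothetical Pods and their labels.
pods = {
    "web-1": {"env": "dev", "tier": "frontend"},
    "db-1": {"env": "qa", "tier": "database"},
    "batch-1": {"tier": "backend"},
}

# Equivalent in spirit to the selector: env in (dev,qa)
selected = [
    name for name, labels in pods.items()
    if matches_set(labels, "env", "in", ("dev", "qa"))
]
print(selected)  # ['web-1', 'db-1']
```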

Replication Controllers

A ReplicationController (rc) is a controller that is part of the Master Node's Controller Manager. It makes sure the specified number of replicas for a Pod is running at any given point in time. If there are more Pods than the desired count, the ReplicationController kills the extra Pods, and, if there are fewer Pods, the ReplicationController creates more Pods to match the desired count. Generally, we do not deploy a Pod independently, as it would not be able to restart itself if something goes wrong. We always use controllers like ReplicationController to create and manage Pods.

Replica Sets

A ReplicaSet (rs) is the next-generation ReplicationController. ReplicaSets support both equality- and set-based Selectors, whereas ReplicationControllers only support equality-based Selectors. As of January 2018, this is the only difference.

As an example, say you create a ReplicaSet with "desired replicas = 3" and all three Pods are running (so "current==desired"). Any time "current!=desired" (e.g., one of the Pods dies), the ReplicaSet detects that the current state no longer matches the desired state. In this scenario, the ReplicaSet creates one more Pod, thus ensuring that the current state matches the desired state again.
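
The "current vs. desired" comparison described above is essentially a small reconciliation step. A minimal sketch of that idea follows. The function <code>reconcile</code> and the Pod names are hypothetical, and the real controller does considerably more (e.g., it chooses which surplus Pods to remove using its own rules).

```python
def reconcile(desired_replicas, running_pods):
    """Return (number_to_create, pods_to_delete) that would restore the desired count."""
    missing = desired_replicas - len(running_pods)
    if missing > 0:
        return missing, []
    # Exactly enough or too many Pods: delete the surplus, if any.
    # Which Pods get deleted is an arbitrary choice in this sketch.
    return 0, running_pods[desired_replicas:]


# One of three Pods has died: one replacement is needed.
print(reconcile(3, ["pod-a", "pod-b"]))                    # (1, [])
# Someone started an extra Pod by hand: the surplus is removed.
print(reconcile(3, ["pod-a", "pod-b", "pod-c", "pod-d"]))  # (0, ['pod-d'])
```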

ReplicaSets can be used independently, but they are mostly used by Deployments to orchestrate the Pod creation, deletion, and updates. A Deployment automatically creates the ReplicaSets, and we do not have to worry about managing them.

Deployments

Deployment objects provide declarative updates to Pods and ReplicaSets. The DeploymentController is part of the Master Node's Controller Manager, and it makes sure that the current state always matches the desired state.

As an example, let's say we have a Deployment which creates a "ReplicaSet A". ReplicaSet A then creates 3 Pods. In each Pod, one of the containers uses the nginx:1.7.9 image.

Now, in the Deployment, we change the Pod's template and we update the image for the Nginx container from nginx:1.7.9 to nginx:1.9.1. As we have modified the Pod's template, a new "ReplicaSet B" gets created. This process is referred to as a "Deployment rollout". (A rollout is only triggered when we update the Pod's template for a deployment. Operations like scaling the deployment do not trigger a rollout.) Once ReplicaSet B is ready, the Deployment starts pointing to it.
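
The rule "only a change to the Pod's template triggers a rollout" can be sketched as a comparison of template fingerprints. The hashing below is a stand-in I chose for illustration and is not the algorithm Kubernetes uses; the function names and the trimmed-down Deployment dictionaries are likewise hypothetical.

```python
import hashlib
import json


def template_hash(pod_template):
    """Stable short fingerprint of a Pod template (illustrative only)."""
    payload = json.dumps(pod_template, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:10]


def triggers_rollout(old_spec, new_spec):
    """A rollout is needed only when the Pod template differs."""
    return template_hash(old_spec["template"]) != template_hash(new_spec["template"])


current = {
    "replicas": 3,
    "template": {"containers": [{"name": "nginx", "image": "nginx:1.7.9"}]},
}
# Scaling changes only the replica count, not the template.
scaled = {"replicas": 5, "template": current["template"]}
# Updating the image changes the template.
upgraded = {
    "replicas": 3,
    "template": {"containers": [{"name": "nginx", "image": "nginx:1.9.1"}]},
}

print(triggers_rollout(current, scaled))    # False
print(triggers_rollout(current, upgraded))  # True
```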

On top of ReplicaSets, Deployments provide features like Deployment recording, with which, if something goes wrong, we can roll back to a previously known state.

Namespaces

If we have numerous users whom we would like to organize into teams/projects, we can partition the Kubernetes cluster into sub-clusters using Namespaces. The names of the resources/objects created inside a Namespace are unique, but not across Namespaces.

To list all the Namespaces, we can run the following command:

$ kubectl get namespaces
NAME          STATUS    AGE
default       Active    2h
kube-public   Active    2h
kube-system   Active    2h

Generally, Kubernetes creates two default namespaces: kube-system and default. The kube-system namespace contains the objects created by the Kubernetes system. The default namespace contains the objects which do not belong to any other namespace. By default, we connect to the default Namespace. kube-public is a special namespace, which is readable by all users and used for special purposes, like bootstrapping a cluster.

Using Resource Quotas, we can divide the cluster resources within Namespaces.

Component services

The component services running on a standard master/worker node(s) Kubernetes setup are as follows:

  • Kubernetes Master node(s)
    kube-apiserver 
    Exposes Kubernetes APIs
    kube-controller-manager 
    Runs controllers to handle nodes, endpoints, etc.
    kube-scheduler 
    Watches for new pods and assigns them nodes
    etcd 
    Distributed key-value store
    DNS 
    [optional] DNS for Kubernetes services
  • Worker node(s)
    kubelet 
    Manages pods on a node, volumes, secrets, creating new containers, health checks, etc.
    kube-proxy 
    Maintains network rules, port forwarding, etc.

Setup a Kubernetes cluster

IMPORTANT: The following shows how to set up Kubernetes 1.2, which is, as of January 2018, a very old version. I will update this article with how to set up k8s using a much newer version (v1.9) when I have time.

In this section, I will show you how to set up a Kubernetes cluster with etcd and Docker. The cluster will consist of 1 master node and 3 worker nodes.

Setup VMs

For this demo, I will be creating 4 VMs via Vagrant (with VirtualBox).

  • Create Vagrant demo environment:
$ mkdir -p $HOME/dev/kubernetes && cd $_
  • Create Vagrantfile with the following contents:
# -*- mode: ruby -*-
# vi: set ft=ruby :

require 'yaml'
VAGRANTFILE_API_VERSION = "2"

$common_script = <<COMMON_SCRIPT
# Set verbose
set -v
# Set exit on error
set -e
echo -e "$(date) [INFO] Starting modified Vagrant..."
sudo yum update -y
# Timestamp provision
date > /etc/vagrant_provisioned_at
COMMON_SCRIPT

unless defined? CONFIG
  configuration_file = File.join(File.dirname(__FILE__), 'vagrant_config.yml')
  CONFIG = YAML.load(File.open(configuration_file, File::RDONLY).read)
end

CONFIG['box'] = {} unless CONFIG.key?('box')

def modifyvm_network(node)
  node.vm.provider "virtualbox" do |vbox|
    vbox.customize ["modifyvm", :id, "--nicpromisc1", "allow-all"]
    #vbox.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
    vbox.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
  end
end

def modifyvm_resources(node, memory, cpus)
  node.vm.provider "virtualbox" do |vbox|
    vbox.customize ["modifyvm", :id, "--memory", memory]
    vbox.customize ["modifyvm", :id, "--cpus", cpus]
  end
end

## START: Actual Vagrant process
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

  config.vm.box = CONFIG['box']['name']

  # Uncomment the following line if you wish to be able to pass files from
  # your local filesystem directly into the vagrant VM:
  #config.vm.synced_folder "data", "/vagrant"

## VM: k8s master #############################################################
  config.vm.define "master" do |node|
    node.vm.hostname = "k8s.master.dev"
    node.vm.provision "shell", inline: $common_script
    #node.vm.network "forwarded_port", guest: 80, host: 8080
    node.vm.network "private_network", ip: CONFIG['host_groups']['master']

    # Uncomment the following if you wish to define CPU/memory:
    #node.vm.provider "virtualbox" do |vbox|
    #  vbox.customize ["modifyvm", :id, "--memory", "4096"]
    #  vbox.customize ["modifyvm", :id, "--cpus", "2"]
    #end
    #modifyvm_resources(node, "4096", "2")
  end
## VM: k8s minion1 ############################################################
  config.vm.define "minion1" do |node|
    node.vm.hostname = "k8s.minion1.dev"
    node.vm.provision "shell", inline: $common_script
    node.vm.network "private_network", ip: CONFIG['host_groups']['minion1']
  end
## VM: k8s minion2 ############################################################
  config.vm.define "minion2" do |node|
    node.vm.hostname = "k8s.minion2.dev"
    node.vm.provision "shell", inline: $common_script
    node.vm.network "private_network", ip: CONFIG['host_groups']['minion2']
  end
## VM: k8s minion3 ############################################################
  config.vm.define "minion3" do |node|
    node.vm.hostname = "k8s.minion3.dev"
    node.vm.provision "shell", inline: $common_script
    node.vm.network "private_network", ip: CONFIG['host_groups']['minion3']
  end
###############################################################################

end

The above Vagrantfile uses the following configuration file:

$ cat vagrant_config.yml
---
box:
  name: centos/7
  storage_controller: 'SATA Controller'
debug: false
development: false
network:
  dns1: 8.8.8.8
  dns2: 8.8.4.4
  internal:
    network: 192.168.200.0/24
  external:
    start: 192.168.100.100
    end: 192.168.100.200
    network: 192.168.100.0/24
    bridge: wlan0
    netmask: 255.255.255.0
    broadcast: 192.168.100.255
host_groups:
  master: 192.168.200.100
  minion1: 192.168.200.101
  minion2: 192.168.200.102
  minion3: 192.168.200.103
  • In the Vagrant Kubernetes directory (i.e., $HOME/dev/kubernetes), run the following command:
$ vagrant up

Setup hosts

Note: Run the following commands/steps on all hosts (master and minions).

  • Log into the k8s master host:
$ vagrant ssh master
  • Add the Kubernetes cluster hosts to /etc/hosts:
$ cat << EOF >> /etc/hosts
192.168.200.100    k8s.master.dev
192.168.200.101    k8s.minion1.dev
192.168.200.102    k8s.minion2.dev
192.168.200.103    k8s.minion3.dev
EOF
  • Install, enable, and start NTP:
$ yum install -y ntp
$ systemctl enable ntpd && systemctl start ntpd
$ timedatectl
  • Disable any firewall rules (for now; we will add the rules back later):
$ systemctl stop firewalld && systemctl disable firewalld
$ systemctl stop iptables
  • Disable SELinux (for now; we will turn it on again later):
$ setenforce 0
$ sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/sysconfig/selinux
$ sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
$ sestatus
  • Add the Docker repo and update yum:
$ cat << EOF > /etc/yum.repos.d/virt7-docker-common-release.repo
[virt7-docker-common-release]
name=virt7-docker-common-release
baseurl=http://cbs.centos.org/repos/virt7-docker-common-release/x86_64/os/
gpgcheck=0
EOF
$ yum update
  • Install Docker, Kubernetes, and etcd:
$ yum install -y --enablerepo=virt7-docker-common-release kubernetes docker etcd
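
The /etc/hosts entries above repeat the addresses already listed under host_groups in vagrant_config.yml, so they can be derived from that file instead of typed by hand. The following is only a sketch: hosts_from_config and append_missing_hosts are hypothetical helper names, and the awk parsing assumes the simple two-space-indented layout of the config file shown earlier.

```shell
# Print "IP    k8s.<name>.dev" for every entry under host_groups: in the
# Vagrant config (assumes the simple indented layout shown earlier).
hosts_from_config() {
  awk '
    /^host_groups:/ { in_groups = 1; next }   # start of the section
    /^[^ ]/         { in_groups = 0 }         # any other top-level key ends it
    in_groups && NF == 2 {
      name = $1; sub(/:$/, "", name)
      printf "%s    k8s.%s.dev\n", $2, name
    }
  ' "$1"
}

# Append only the entries whose hostname is not already in the hosts file,
# so the step can be re-run safely.
append_missing_hosts() {
  local config=$1 hosts_file=$2
  hosts_from_config "$config" | while read -r ip name; do
    grep -q "[[:space:]]$name\$" "$hosts_file" \
      || printf '%s    %s\n' "$ip" "$name" >> "$hosts_file"
  done
}

# Example (as root, with the config file copied onto the host):
# append_missing_hosts vagrant_config.yml /etc/hosts
```

Unlike the plain heredoc with >>, re-running this does not duplicate entries.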

Install and configure master controller

Note: Run the following commands on only the master host.

  • Edit /etc/kubernetes/config and add (or make changes to) the following lines:
KUBE_MASTER="--master=http://k8s.master.dev:8080"
KUBE_ETCD_SERVERS="--etcd-servers=http://k8s.master.dev:2379"
  • Edit /etc/etcd/etcd.conf and add (or make changes to) the following lines:
[member]
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
[cluster]
ETCD_ADVERTISE_CLIENT_URLS="http://0.0.0.0:2379"
  • Edit /etc/kubernetes/apiserver and add (or make changes to) the following lines:
# The address on the local server to listen to.
#KUBE_API_ADDRESS="--insecure-bind-address=127.0.0.1"
KUBE_API_ADDRESS="--address=0.0.0.0"

# The port on the local server to listen on.
KUBE_API_PORT="--port=8080"

# Port minions listen on
KUBELET_PORT="--kubelet-port=10250"

# Comma separated list of nodes in the etcd cluster
KUBE_ETCD_SERVERS="--etcd-servers=http://127.0.0.1:2379"

# Address range to use for services
KUBE_SERVICE_ADDRESSES="--service-cluster-ip-range=10.254.0.0/16"

# default admission control policies
#KUBE_ADMISSION_CONTROL="--admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota"

# Add your own!
KUBE_API_ARGS=""
  • Enable and start the following etcd and Kubernetes services:
$ for SERVICE in etcd kube-apiserver kube-controller-manager kube-scheduler; do
      systemctl restart $SERVICE
      systemctl enable $SERVICE
      systemctl status $SERVICE 
  done
  • Check on the status of the above services (the following command should report 4 running services):
$ systemctl status etcd kube-apiserver kube-controller-manager kube-scheduler | grep "(running)" | wc -l # => 4
  • Check on the status of the Kubernetes API server:
$ kubectl cluster-info
Kubernetes master is running at http://localhost:8080
$ curl http://localhost:8080/version
#~OR~
$ curl http://k8s.master.dev:8080/version
{
  "major": "1",
  "minor": "2",
  "gitVersion": "v1.2.0",
  "gitCommit": "ec7364b6e3b155e78086018aa644057edbe196e5",
  "gitTreeState": "clean"
}
  • Get a list of Kubernetes API paths:
$ curl http://k8s.master.dev:8080/paths
{
  "paths": [
    "/api",
    "/api/v1",
    "/apis",
    "/apis/autoscaling",
    "/apis/autoscaling/v1",
    "/apis/batch",
    "/apis/batch/v1",
    "/apis/extensions",
    "/apis/extensions/v1beta1",
    "/healthz",
    "/healthz/ping",
    "/logs/",
    "/metrics",
    "/resetMetrics",
    "/swagger-ui/",
    "/swaggerapi/",
    "/ui/",
    "/version"
  ]
}
  • List all available paths (key-value stores) known to etcd:
$ etcdctl ls / --recursive

The master controller in a Kubernetes cluster must have the following services running to function as the master host in the cluster:

  • ntpd
  • etcd
  • kube-controller-manager
  • kube-apiserver
  • kube-scheduler

Note: The Docker daemon should not be running on the master host.
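
Right after a restart the API server may take a few seconds to answer, so a check like the curl calls above can fail spuriously. A minimal retry sketch (wait_for is a hypothetical helper name; the /healthz path is taken from the /paths listing above):

```shell
# Retry a command until it succeeds.
# Usage: wait_for <attempts> <delay-seconds> <command...>
wait_for() {
  local attempts=$1 delay=$2
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0        # stop as soon as the command succeeds
    sleep "$delay"
    i=$((i + 1))
  done
  return 1                  # every attempt failed
}

# Example (on the master): poll the API server for up to about a minute.
# wait_for 30 2 curl -fsS http://k8s.master.dev:8080/healthz
```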

Install and configure the minions

Note: Run the following commands/steps on all minion hosts.

  • Log into the k8s minion hosts:
$ vagrant ssh minion1  # do the same for minion2 and minion3
  • Edit /etc/kubernetes/config and add (or make changes to) the following lines:
KUBE_MASTER="--master=http://k8s.master.dev:8080"
KUBE_ETCD_SERVERS="--etcd-servers=http://k8s.master.dev:2379"
  • Edit /etc/kubernetes/kubelet and add (or make changes to) the following lines:
###
# kubernetes kubelet (minion) config

# The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces)
KUBELET_ADDRESS="--address=0.0.0.0"

# The port for the info server to serve on
KUBELET_PORT="--port=10250"

# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname-override=k8s.minion1.dev"  # ***CHANGE TO CORRECT MINION HOSTNAME***

# location of the api-server
KUBELET_API_SERVER="--api-servers=http://k8s.master.dev:8080"

# pod infrastructure container
#KUBELET_POD_INFRA_CONTAINER="--pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest"

# Add your own!
KUBELET_ARGS=""
  • Enable and start the following services:
$ for SERVICE in kube-proxy kubelet docker; do
      systemctl restart $SERVICE
      systemctl enable $SERVICE
      systemctl status $SERVICE
  done
  • Test that Docker is running and can start containers:
$ docker info
$ docker pull hello-world
$ docker run hello-world

Each minion in a Kubernetes cluster must have the following services running to function as a member of the cluster (i.e., a "Ready" node):

  • ntpd
  • kubelet
  • kube-proxy
  • docker
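
Since KUBELET_HOSTNAME has to differ on every minion, it is easy to forget to change it. A small sketch for setting it from the command line; set_kubelet_hostname is a hypothetical helper name, and it assumes GNU sed (as shipped with CentOS 7) and that the VM hostname matches the name given in the Vagrantfile.

```shell
# Rewrite the KUBELET_HOSTNAME line in a kubelet config file.
# Usage: set_kubelet_hostname <config-file> <hostname>
set_kubelet_hostname() {
  local file=$1 host=$2
  # Replace the whole line (including any trailing comment) in place.
  sed -i "s|^KUBELET_HOSTNAME=.*|KUBELET_HOSTNAME=\"--hostname-override=${host}\"|" "$file"
}

# Example (as root on each minion):
# set_kubelet_hostname /etc/kubernetes/kubelet "$(hostname)"
```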

Kubectl: Exploring our environment

Note: Run all of the following commands on the master host.

  • Get a list of nodes with kubectl:
$ kubectl get nodes
NAME              STATUS    AGE
k8s.minion1.dev   Ready     20m
k8s.minion2.dev   Ready     12m
k8s.minion3.dev   Ready     12m
  • Describe nodes with kubectl:
$ kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}'
$ kubectl get nodes -o jsonpath='{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{end}' | tr ';' "\n"
k8s.minion1.dev:OutOfDisk=False
Ready=True
k8s.minion2.dev:OutOfDisk=False
Ready=True
k8s.minion3.dev:OutOfDisk=False
Ready=True
  • Get the man page for kubectl:
$ man kubectl-get
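
To spot nodes that have dropped out of the cluster, the `kubectl get nodes` output can be filtered. A small sketch, assuming the NAME/STATUS/AGE column layout shown above; not_ready_nodes is a hypothetical helper name, not a kubectl feature.

```shell
# Read `kubectl get nodes` output on stdin and print every node whose
# STATUS column is not exactly "Ready" (the header line is skipped).
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Example (on the master):
# kubectl get nodes | not_ready_nodes
```

Note that a combined status such as "Ready,SchedulingDisabled" would also be reported by this filter.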

Working with our Kubernetes cluster

Note: The following section will be working from within the Kubernetes cluster we created above.

Create and deploy pod definitions

  • Turn off nodes 2 and 3:
minion{2,3}$ systemctl stop kubelet kube-proxy
master$ kubectl get nodes
NAME              STATUS     AGE
k8s.minion1.dev   Ready      1h
k8s.minion2.dev   NotReady   37m
k8s.minion3.dev   NotReady   39m
  • Check for any k8s Pods (there should be none):
master$ kubectl get pods
  • Create a builds directory for our Pods:
master$ mkdir builds && cd $_
  • Create a Pod running Nginx inside a Docker container:
master$ kubectl create -f - <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
EOF
  • Check on Pod creation status:
master$ kubectl get pods
NAME      READY     STATUS              RESTARTS   AGE
nginx     0/1       ContainerCreating   0          2s
master$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
nginx     1/1       Running   0          3m
minion1$ docker ps
CONTAINER ID        IMAGE        COMMAND                 CREATED        STATUS        PORTS  NAMES
a718c6c0355d        nginx:1.7.9  "nginx -g 'daemon off"  3 minutes ago  Up 3 minutes         k8s_nginx.4580025_nginx_default_699e...
master$ kubectl describe pod nginx
master$ kubectl run busybox --image=busybox --restart=Never --tty -i --generator=run-pod/v1
busybox$ wget -qO- 172.17.0.2
master$ kubectl delete pod busybox
master$ kubectl delete pod nginx
  • Port forwarding:
master$ kubectl create -f nginx.yml  # see above for YAML
master$ kubectl port-forward nginx :80 &
I1020 23:12:29.478742   23394 portforward.go:213] Forwarding from [::1]:40065 -> 80
master$ curl -I localhost:40065

Tags, labels, and selectors

master$ cat << EOF > nginx-pod-label.yml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
EOF
master$ kubectl create -f nginx-pod-label.yml
master$ kubectl get pods -l app=nginx
master$ kubectl describe pods -l app=nginx
  • Add labels or overwrite existing ones:
master$ kubectl label pods nginx new-label=mynginx
master$ kubectl describe pods/nginx | awk '/^Labels/{print $2}'
new-label=mynginx
master$ kubectl label pods nginx new-label=foo --overwrite
master$ kubectl describe pods/nginx | awk '/^Labels/{print $2}'
new-label=foo

Deployments

master$ cat << EOF > nginx-deployment-dev.yml
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment-dev
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx-deployment-dev
    spec:
      containers:
      - name: nginx-deployment-dev
        image: nginx:1.7.9
        ports:
        - containerPort: 80
EOF
master$ cat << EOF > nginx-deployment-prod.yml
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment-prod
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx-deployment-prod
    spec:
      containers:
      - name: nginx-deployment-prod
        image: nginx:1.7.9
        ports:
        - containerPort: 80
EOF
master$ kubectl create --validate -f nginx-deployment-dev.yml
master$ kubectl create --validate -f nginx-deployment-prod.yml
master$ kubectl get pods
NAME                                     READY     STATUS    RESTARTS   AGE
nginx-deployment-dev-104434401-jiiic     1/1       Running   0          5m
nginx-deployment-prod-3051195443-hj9b1   1/1       Running   0          12m
master$ kubectl describe deployments -l app=nginx-deployment-dev
Name:                   nginx-deployment-dev
Namespace:              default
CreationTimestamp:      Thu, 20 Oct 2016 23:48:46 +0000
Labels:                 app=nginx-deployment-dev
Selector:               app=nginx-deployment-dev
Replicas:               1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
OldReplicaSets:         <none>
NewReplicaSet:          nginx-deployment-dev-2568522567 (1/1 replicas created)
...
master$ kubectl get deployments
NAME                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment-prod   1         1         1            1           44s
master$ cat << EOF > nginx-deployment-dev-update.yml
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment-dev
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx-deployment-dev
    spec:
      containers:
      - name: nginx-deployment-dev
        image: nginx:1.8  # ***CHANGED***
        ports:
        - containerPort: 80
EOF
master$ kubectl apply -f nginx-deployment-dev-update.yml
master$ kubectl get pods -l app=nginx-deployment-dev
NAME                                   READY     STATUS              RESTARTS   AGE
nginx-deployment-dev-104434401-jiiic   0/1       ContainerCreating   0          27s
master$ kubectl get pods -l app=nginx-deployment-dev
NAME                                   READY     STATUS    RESTARTS   AGE
nginx-deployment-dev-104434401-jiiic   1/1       Running   0          6m
  • Cleanup:
master$ kubectl delete deployment nginx-deployment-dev
master$ kubectl delete deployment nginx-deployment-prod

Multi-Pod (container) replication controller

  • Start the other two nodes (the ones we previously stopped):
minion2$ systemctl start kubelet kube-proxy
minion3$ systemctl start kubelet kube-proxy
master$ kubectl get nodes
NAME              STATUS    AGE
k8s.minion1.dev   Ready     2h
k8s.minion2.dev   Ready     2h
k8s.minion3.dev   Ready     2h
master$ cat << EOF > nginx-multi-node.yml
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-www
spec:
  replicas: 3
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
EOF
master$ kubectl create -f nginx-multi-node.yml
master$ kubectl get pods
NAME              READY     STATUS              RESTARTS   AGE
nginx-www-2evxu   0/1       ContainerCreating   0          10s
nginx-www-416ct   0/1       ContainerCreating   0          10s
nginx-www-ax41w   0/1       ContainerCreating   0          10s
master$ kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
nginx-www-2evxu   1/1       Running   0          1m
nginx-www-416ct   1/1       Running   0          1m
nginx-www-ax41w   1/1       Running   0          1m
master$ kubectl describe pods | awk '/^Node/{print $2}'
k8s.minion2.dev/192.168.200.102
k8s.minion1.dev/192.168.200.101
k8s.minion3.dev/192.168.200.103
minion1$ docker ps # 1 nginx container running
minion2$ docker ps # 1 nginx container running
minion3$ docker ps # 1 nginx container running
minion3$ docker ps --format "{{.Image}}"
nginx
gcr.io/google_containers/pause:2.0
master$ kubectl describe replicationcontroller
Name:       nginx-www
Namespace:  default
Image(s):   nginx
Selector:   app=nginx
Labels:     app=nginx
Replicas:   3 current / 3 desired
Pods Status:    3 Running / 0 Waiting / 0 Succeeded / 0 Failed
...
  • Attempt to delete one of the three pods:
master$ kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
nginx-www-2evxu   1/1       Running   0          11m
nginx-www-416ct   1/1       Running   0          11m
nginx-www-ax41w   1/1       Running   0          11m
master$ kubectl delete pod nginx-www-2evxu
master$ kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
nginx-www-3cck4   1/1       Running   0          12s
nginx-www-416ct   1/1       Running   0          11m
nginx-www-ax41w   1/1       Running   0          11m

A new pod (nginx-www-3cck4) automatically started up. This is because the expected state, as defined in our YAML file, is for there to be 3 pods running at all times. Thus, if one or more pods go down, replacement pods start automatically to bring the cluster back to the expected state.
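
The idea behind this behavior can be shown with a toy reconciliation step. This is only a model of the concept under my own assumptions (the function name and its output strings are made up); it is not how the replication controller is actually implemented.

```shell
# Toy reconciliation step: compare the desired pod count with the actual
# one and print the action a controller would take to close the gap.
reconcile() {
  local desired=$1 actual=$2
  if [ "$actual" -lt "$desired" ]; then
    echo "create $((desired - actual))"
  elif [ "$actual" -gt "$desired" ]; then
    echo "delete $((actual - desired))"
  else
    echo "noop"
  fi
}

# After deleting one of three pods, as above:
# reconcile 3 2   # prints "create 1"
```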

  • To force-delete all pods:
master$ kubectl delete replicationcontroller nginx-www
master$ kubectl get pods  # nothing

Create and deploy service definitions

master$ cat << EOF > nginx-service.yml
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  ports:
  - port: 8000
    targetPort: 80
    protocol: TCP
  selector:
    app: nginx
EOF
master$ kubectl get services
NAME            CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
kubernetes      10.254.0.1       <none>        443/TCP    3h
master$ kubectl create -f nginx-service.yml
master$ kubectl get services
NAME            CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
kubernetes      10.254.0.1       <none>        443/TCP    3h
nginx-service   10.254.110.127   <none>        8000/TCP   10s
master$ kubectl run busybox --generator=run-pod/v1 --image=busybox --restart=Never --tty -i
busybox$ wget -qO- 10.254.110.127:8000  # works
  • Cleanup
master$ kubectl delete pod busybox
master$ kubectl delete service nginx-service
master$ kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
nginx-www-jh2e9   1/1       Running   0          13m
nginx-www-jir2g   1/1       Running   0          13m
nginx-www-w91uw   1/1       Running   0          13m
master$ kubectl delete replicationcontroller nginx-www
master$ kubectl get pods  # nothing

Creating temporary Pods at the CLI

  • Make sure we have no Pods running:
master$ kubectl get pods
  • Create temporary deployment pod:
master$ kubectl run mysample --image=foobar/apache
master$ kubectl get pods
NAME                        READY     STATUS              RESTARTS   AGE
mysample-1424711890-fhtxb   0/1       ContainerCreating   0          1s
master$ kubectl get deployment 
NAME       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
mysample   1         1         1            0           7s
  • Create a temporary deployment pod (where we know it will fail):
master$ kubectl run myexample --image=christophchamp/ubuntu_sysadmin
master$ kubectl -o wide get pods
NAME                         READY     STATUS             RESTARTS   AGE       NODE
myexample-3534121234-mpr35   0/1       CrashLoopBackOff   12         39m       k8s.minion3.dev
mysample-2812764540-74c5h    1/1       Running            0          41m       k8s.minion2.dev
  • Check on why the "myexample" pod is in status "CrashLoopBackOff":
master$ kubectl describe pods/myexample-3534121234-mpr35
master$ kubectl describe deployments/mysample
master$ kubectl describe pods/mysample-2812764540-74c5h | awk '/^Node/{print $2}'
k8s.minion2.dev/192.168.200.102
master$ kubectl delete deployment mysample
  • Run multiple replicas of the same pod:
master$ kubectl run myreplicas --image=latest123/apache --replicas=2 --labels=app=myapache,version=1.0.0
master$ kubectl describe deployment myreplicas 
Name:           myreplicas
Namespace:      default
CreationTimestamp:  Fri, 21 Oct 2016 19:10:30 +0000
Labels:         app=myapache,version=1.0.0
Selector:       app=myapache,version=1.0.0
Replicas:       2 updated | 2 total | 1 available | 1 unavailable
StrategyType:       RollingUpdate
MinReadySeconds:    0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
OldReplicaSets:     <none>
NewReplicaSet:      myreplicas-2209834598 (2/2 replicas created)
...
master$ kubectl get pods -o wide
NAME                          READY     STATUS             RESTARTS   AGE       NODE
myreplicas-2209834598-5iyer   1/1       Running            0          1m        k8s.minion1.dev
myreplicas-2209834598-cslst   1/1       Running            0          1m        k8s.minion2.dev
master$ kubectl describe pods -l version=1.0.0
  • Cleanup:
master$ kubectl delete deployment myreplicas

Interacting with Pod containers

  • Create example Apache pod definition file:
master$ cat << EOF > apache.yml
---
apiVersion: v1
kind: Pod
metadata:
  name: apache
spec:
  containers:
  - name: apache
    image: latest123/apache
    ports:
    - containerPort: 80
EOF
master$ kubectl create -f apache.yml
master$ kubectl get pods -o wide
NAME                          READY     STATUS    RESTARTS   AGE       NODE
apache                        1/1       Running   0          12m       k8s.minion3.dev
  • Test pod and make some basic configuration changes:
master$ kubectl exec apache date
master$ kubectl exec apache -i -t -- cat /var/www/html/index.html  # default apache HTML
master$ kubectl exec apache -i -t -- /bin/bash
container$ export TERM=xterm
container$ echo "xtof test" > /var/www/html/index.html
minion3$ curl 172.17.0.2
xtof test
container$ exit
master$ kubectl get pods -o wide
NAME                          READY     STATUS    RESTARTS   AGE       NODE
apache                        1/1       Running   0          12m       k8s.minion3.dev

Pod/container is still running even after we exited (as expected).

  • Cleanup:
master$ kubectl delete pod apache

Logs

  • Start our example Apache pod to use for checking Kubernetes logging features:
master$ kubectl create -f apache.yml 
master$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
apache    1/1       Running   0          9s
master$ kubectl logs apache
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
master$ kubectl logs --tail=10 apache
master$ kubectl logs --since=24h apache  # or 10s, 2m, etc.
master$ kubectl logs -f apache  # follow the logs
master$ kubectl logs -f -c apache apache  # where -c is the container name
  • Cleanup:
master$ kubectl delete pod apache

Autoscaling and scaling Pods

master$ kubectl run myautoscale --image=latest123/apache --port=80 --labels=app=myautoscale
master$ kubectl get pods -o wide
NAME                           READY     STATUS    RESTARTS   AGE       NODE
myautoscale-3243017378-kq4z7   1/1       Running   0          47s       k8s.minion3.dev
  • Create an autoscale definition:
master$ kubectl autoscale deployment myautoscale --min=2 --max=6 --cpu-percent=80
master$ kubectl get deployments
NAME          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
myautoscale   2         2         2            2           4m
master$ kubectl get pods -o wide
NAME                           READY     STATUS    RESTARTS   AGE       NODE
myautoscale-3243017378-kq4z7   1/1       Running   0          3m        k8s.minion3.dev
myautoscale-3243017378-r2f3d   1/1       Running   0          4s        k8s.minion2.dev
  • Scale up an already autoscaled deployment:
master$ kubectl scale --current-replicas=2 --replicas=4 deployment/myautoscale
master$ kubectl get deployments
NAME          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
myautoscale   4         4         4            4           8m
master$ kubectl get pods -o wide
NAME                           READY     STATUS    RESTARTS   AGE       NODE
myautoscale-3243017378-2rxhp   1/1       Running   0          8s        k8s.minion1.dev
myautoscale-3243017378-kq4z7   1/1       Running   0          7m        k8s.minion3.dev
myautoscale-3243017378-ozxs8   1/1       Running   0          8s        k8s.minion3.dev
myautoscale-3243017378-r2f3d   1/1       Running   0          4m        k8s.minion2.dev
  • Scale down:
master$ kubectl scale --current-replicas=4 --replicas=2 deployment/myautoscale

Note: You cannot scale down past the minimum number of pods/containers specified in the original autoscale definition (i.e., min=2 in our example).

  • Cleanup:
master$ kubectl delete deployment myautoscale

Failure and recovery

master$ kubectl run myrecovery --image=latest123/apache --port=80 --replicas=2 --labels=app=myrecovery
master$ kubectl get deployments
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
myrecovery   2         2         2            2           6s
master$ kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       NODE
myrecovery-563119102-5xu8f   1/1       Running   0          12s       k8s.minion1.dev
myrecovery-563119102-zw6wp   1/1       Running   0          12s       k8s.minion2.dev
  • Now stop Kubernetes- and Docker-related services on one of the minions/nodes (so we have a total of 2 nodes online):
minion1$ systemctl stop docker kubelet kube-proxy
master$ kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       NODE
myrecovery-563119102-qyi04   1/1       Running   0          7m        k8s.minion3.dev
myrecovery-563119102-zw6wp   1/1       Running   0          14m       k8s.minion2.dev

The Pod switched from minion1 to minion3.

  • Now stop Kubernetes- and Docker-related services on one of the remaining online minions/nodes (so we have a total of 1 node online):
minion2$ systemctl stop docker kubelet kube-proxy
master$ kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       NODE
myrecovery-563119102-b5tim   1/1       Running   0          2m        k8s.minion3.dev
myrecovery-563119102-qyi04   1/1       Running   0          17m       k8s.minion3.dev

Both Pods are now running on minion3, the only available node.

  • Start up Kubernetes- and Docker-related services again on minion1 and delete one of the Pods:
minion1$ systemctl start docker kubelet kube-proxy
master$ kubectl delete pod myrecovery-563119102-b5tim
master$ kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       NODE
myrecovery-563119102-8unzg   1/1       Running   0          1m        k8s.minion1.dev
myrecovery-563119102-qyi04   1/1       Running   0          20m       k8s.minion3.dev

Pods are now running on separate nodes.
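
To check the spread of Pods across nodes without reading the table by eye, the `-o wide` output can be summarized per node. A sketch, assuming the column layout shown above; pods_per_node is a hypothetical helper name.

```shell
# Read `kubectl get pods -o wide` output on stdin and print
# "<node> <pod-count>" per node. The NODE column is located by its header.
pods_per_node() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "NODE" && !col) col = i; next }
    col     { count[$col]++ }
    END     { for (node in count) print node, count[node] }
  ' | sort
}

# Example (on the master):
# kubectl get pods -o wide | pods_per_node
```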

  • Cleanup:
master$ kubectl delete deployments/myrecovery

Minikube

Minikube is a tool that makes it easy to run Kubernetes locally. Minikube runs a single-node Kubernetes cluster inside a VM on your laptop for users looking to try out Kubernetes or develop with it day-to-day.

  • Install Minikube:
$ curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 \
    && chmod +x minikube && sudo mv minikube /usr/local/bin/
  • Install kubectl
$ curl -Lo kubectl https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl \
    && chmod +x kubectl && sudo mv kubectl /usr/local/bin/
  • Test install
$ minikube start
#~OR~
$ minikube start --memory 4096 # give it 4GB of RAM
$ minikube status
$ minikube dashboard
$ kubectl config view
$ kubectl cluster-info

NOTE: If you have an old version of minikube installed, you should probably do the following before upgrading to a much newer version:

$ minikube delete --all --purge

Get the details on the CLI options for kubectl here.

Using the `kubectl proxy` command, kubectl authenticates with the API Server on the Master Node and makes the dashboard available at http://localhost:8001/ui:

$ kubectl proxy
Starting to serve on 127.0.0.1:8001

After running the above command, we can access the dashboard at http://127.0.0.1:8001/ui.

Once the kubectl proxy is configured, we can send requests to localhost on the proxy port:

$ curl http://localhost:8001/
$ curl http://localhost:8001/version
{
  "major": "1",
  "minor": "8",
  "gitVersion": "v1.8.0",
  "gitCommit": "0b9efaeb34a2fc51ff8e4d34ad9bc6375459c4a4",
  "gitTreeState": "clean",
  "buildDate": "2017-11-29T22:43:34Z",
  "goVersion": "go1.9.1",
  "compiler": "gc",
  "platform": "linux/amd64"
}

Without kubectl proxy configured, we can get the Bearer Token using kubectl, and then send it with the API request. A Bearer Token is an access token which is generated by the authentication server (the API server on the Master Node) and given back to the client. Using that token, the client can connect back to the Kubernetes API server without providing further authentication details, and then, access resources.

  • Get the k8s token:
$ TOKEN=$(kubectl describe secret $(kubectl get secrets | awk '/^default/{print $1}') | awk '/^token/{print $2}')
  • Get the k8s API server endpoint:
$ APISERVER=$(kubectl config view | awk '/https/{print $2}')
  • Access the API Server:
$ curl -k -H "Authorization: Bearer ${TOKEN}" ${APISERVER}
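
The token can also be read straight from the Secret object instead of parsing the `kubectl describe` output. When read this way the value is base64-encoded and has to be decoded first. A sketch, assuming the cluster has a default-token Secret as in the steps above; decode_b64 and secret_token are hypothetical helper names.

```shell
# Decode a base64-encoded value read from stdin.
decode_b64() {
  base64 --decode
}

# Print the decoded service-account token stored in the named Secret.
secret_token() {
  kubectl get secret "$1" -o jsonpath='{.data.token}' | decode_b64
}

# Example:
# TOKEN=$(secret_token "$(kubectl get secrets | awk '/^default/{print $1}')")
```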

Using Minikube as a local Docker registry

Sometimes it is useful to have a local Docker registry for Kubernetes to pull images from. As the Minikube README describes, you can reuse the Docker daemon running within Minikube with eval $(minikube docker-env) to build and pull images from.

To use an image without uploading it to some external registry (e.g., Docker Hub), you can follow these steps:

  • Set the environment variables with eval $(minikube docker-env)
  • Build the image with the Docker daemon of Minikube (e.g., docker build -t my-image .)
  • Set the image in the pod spec like the build tag (e.g., my-image)
  • Set the imagePullPolicy to Never, otherwise Kubernetes will try to download the image.

Important note: You have to run eval $(minikube docker-env) on each terminal you want to use since it only sets the environment variables for the current shell session.
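
The steps above can be sketched as follows. render_local_pod is a hypothetical helper name, and my-pod/my-image are placeholder names; the manifest it prints sets imagePullPolicy to Never so that Kubernetes uses the image already present in Minikube's Docker daemon.

```shell
# Print a Pod manifest that uses a locally built image and never pulls it.
# Usage: render_local_pod <pod-name> <image>
render_local_pod() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  containers:
  - name: $1
    image: $2
    imagePullPolicy: Never
EOF
}

# Example (in a shell where the Minikube Docker environment is set):
# eval $(minikube docker-env)
# docker build -t my-image .
# render_local_pod my-pod my-image | kubectl create -f -
```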

Working with our Minikube-based Kubernetes cluster

Kubernetes Object Model

Kubernetes has a very rich object model, with which it represents different persistent entities in the Kubernetes cluster. Those entities describe:

  • What containerized applications we are running and on which node
  • Application resource consumption
  • Different policies attached to applications, like restart/upgrade policies, fault tolerance, etc.

With each object, we declare our intent or desired state using the spec field. The Kubernetes system manages the status field for objects, in which it records the actual state of the object. At any given point in time, the Kubernetes Control Plane tries to match the object's actual state to the object's desired state.

Examples of Kubernetes objects are Pods, Deployments, ReplicaSets, etc.

To create an object, we need to provide the spec field to the Kubernetes API Server. The spec field describes the desired state, along with some basic information, like the name. The API request to create the object must have the spec field, as well as other details, in a JSON format. Most often, we provide an object's definition in a YAML file, which is converted by kubectl into a JSON payload and sent to the API Server.

Below is an example of a Deployment object:

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80

With the apiVersion field in the example above, we mention the API endpoint on the API Server which we want to connect to. Note that you can see what API version to use with the following call to the API server:

$ curl -k -H "Authorization: Bearer ${TOKEN}" ${APISERVER}/apis/apps

Use the preferredVersion for most cases.
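
If jq or a similar JSON tool is not at hand, the preferred group/version can be pulled out of the response with standard tools. This is a rough sketch: preferred_group_version is a hypothetical helper name, and it assumes the response contains a "preferredVersion" object whose first key is "groupVersion".

```shell
# Read an API group document (JSON) on stdin and print the groupVersion
# listed under "preferredVersion". Whitespace is stripped first so the
# pattern works on pretty-printed and compact JSON alike.
preferred_group_version() {
  tr -d '\n ' | sed -n 's/.*"preferredVersion":{"groupVersion":"\([^"]*\)".*/\1/p'
}

# Example:
# curl -k -H "Authorization: Bearer ${TOKEN}" ${APISERVER}/apis/apps | preferred_group_version
```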

With the kind field, we mention the object type — in our case, we have Deployment. With the metadata field, we attach the basic information to objects, like the name. Notice that in the above we have two spec fields (spec and spec.template.spec). With spec, we define the desired state of the deployment. In our example, we want to make sure that, at any point in time, at least 3 Pods are running, which are created using the Pod template defined in spec.template. In spec.template.spec, we define the desired state of the Pod (here, our Pod would be created using nginx:1.7.9).

Once the object is created, the Kubernetes system attaches the status field to the object.

Connecting users to Pods

To access the application, a user/client needs to connect to the Pods. As Pods are ephemeral in nature, resources like the IP addresses allocated to them cannot be static. Pods could die abruptly or be rescheduled based on existing requirements.

As an example, consider a scenario in which a user/client is connecting to a Pod using its IP address. Unexpectedly, the Pod to which the user/client is connected dies and a new Pod is created by the controller. The new Pod will have a new IP address, which will not be known automatically to the user/client of the earlier Pod. To overcome this situation, Kubernetes provides a higher-level abstraction called Service, which logically groups Pods and a policy to access them. This grouping is achieved via Labels and Selectors (see above).

So, for our example, we would use Selectors (e.g., "app==frontend" and "app==db") to group our Pods into two logical groups. We can assign a name to the logical grouping, referred to as a "service name". In our example, we have created two Services, frontend-svc and db-svc, and they have the "app==frontend" and the "app==db" Selectors, respectively.
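
The grouping by Selector can be sketched in plain shell. This is only a model of equality-based matching on strings, not how Kubernetes implements it; matches_selector and the Pod names are hypothetical.

```shell
# Sketch only: equality-based Selector matching, modelled on plain strings.
# matches_selector is a hypothetical helper, not part of kubectl.
#   $1: selector, e.g. "app=frontend"
#   $2: a Pod's labels, e.g. "app=frontend,tier=web"
matches_selector() {
  ms_selector=$1
  ms_labels=$2
  ms_old_ifs=$IFS
  IFS=,
  for ms_pair in $ms_selector; do
    case ",$ms_labels," in
      *",$ms_pair,"*) ;;                  # this requirement is satisfied
      *) IFS=$ms_old_ifs; return 1 ;;     # one miss means no match
    esac
  done
  IFS=$ms_old_ifs
  return 0
}

# Pods modelled as "name labels" strings (made-up names).
for pod in "web-1 app=frontend,tier=web" "web-2 app=frontend,tier=web" "pg-1 app=db"; do
  name=${pod%% *}
  labels=${pod#* }
  if matches_selector "app=frontend" "$labels"; then
    echo "frontend-svc selects $name"
  fi
done
```

Every key=value pair in the Selector must be present in the Pod's labels; extra labels on the Pod do not prevent a match.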

The following is an example of a Service object:

kind: Service
apiVersion: v1
metadata:
  name: frontend-svc
spec:
  selector:
    app: frontend
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000

Here, we are creating a frontend-svc Service by selecting all the Pods that have the Label "app" equal to "frontend". By default, each Service also gets an IP address, which is routable only inside the cluster. In our case, we have the IP addresses 172.17.0.4 and 172.17.0.5 for our frontend-svc and db-svc Services, respectively. The IP address attached to each Service is also known as the ClusterIP for that Service.

+------------------------------------+
| select: app==frontend              |          container (app:frontend; 10.0.1.3)
|  service=frontend-svc (172.17.0.4) |------>   container (app:frontend; 10.0.1.4)
+------------------------------------+          container (app:frontend; 10.0.1.5)
              ^
             /
            /
user/client
            \
             \
              v
+------------------------------------+
| select: app==db                    |------>   container (app:db; 10.0.1.10)
|  service=db-svc (172.17.0.5)       |
+------------------------------------+

The user/client now connects to a Service via its IP address, which forwards the traffic to one of the Pods attached to it. A Service does the load balancing while selecting the Pods for forwarding the data/traffic.

While forwarding the traffic from the Service, we can select the target port on the Pod. In our example, for frontend-svc, we will receive requests from the user/client on port 80. We will then forward these requests to one of the attached Pods on port 5000. If the target port is not defined explicitly, then traffic will be forwarded to Pods on the port on which the Service receives traffic.

The combination of a Pod's IP address and the targetPort is referred to as a Service Endpoint. In our case, frontend-svc has 3 Endpoints: 10.0.1.3:5000, 10.0.1.4:5000, and 10.0.1.5:5000.
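
As a rough illustration of how the Endpoints follow from the Pod IPs and the target port, here is a small shell sketch. service_endpoints is a hypothetical helper; it only models the rule that an unset targetPort falls back to the Service port.

```shell
# service_endpoints PORT TARGET_PORT POD_IP...   (hypothetical helper)
# Pass "" as TARGET_PORT to model a Service that leaves targetPort unset.
service_endpoints() {
  se_port=$1
  se_target=${2:-$se_port}    # targetPort defaults to the Service port
  shift 2
  for se_ip in "$@"; do
    echo "$se_ip:$se_target"
  done
}

# frontend-svc from the example: port 80, targetPort 5000, three Pods.
service_endpoints 80 5000 10.0.1.3 10.0.1.4 10.0.1.5
```

On a live cluster, kubectl get endpoints frontend-svc lists the real Endpoints of the Service.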

kube-proxy

All of the Worker Nodes run a daemon called kube-proxy, which watches the API Server on the Master Node for the addition and removal of Services and endpoints. For each new Service, on each node, kube-proxy configures iptables rules to capture the traffic for its ClusterIP and forward it to one of the endpoints. When the Service is removed, kube-proxy removes the iptables rules on all nodes as well.

Service discovery

As Services are the primary mode of communication in Kubernetes, we need a way to discover them at runtime. Kubernetes supports two methods of discovering a Service:

Environment Variables 
As soon as the Pod starts on any Worker Node, the kubelet daemon running on that node adds a set of environment variables in the Pod for all active Services. For example, if we have an active Service called redis-master, which exposes port 6379, and its ClusterIP is 172.17.0.6, then, on a newly created Pod, we can see the following environment variables:
REDIS_MASTER_SERVICE_HOST=172.17.0.6
REDIS_MASTER_SERVICE_PORT=6379
REDIS_MASTER_PORT=tcp://172.17.0.6:6379
REDIS_MASTER_PORT_6379_TCP=tcp://172.17.0.6:6379
REDIS_MASTER_PORT_6379_TCP_PROTO=tcp
REDIS_MASTER_PORT_6379_TCP_PORT=6379
REDIS_MASTER_PORT_6379_TCP_ADDR=172.17.0.6

With this solution, we need to be careful while ordering our Services, as the Pods will not have the environment variables set for Services which are created after the Pods are created.

DNS 
Kubernetes has an add-on for DNS, which creates a DNS record for each Service in the format my-svc.my-namespace.svc.cluster.local. Services within the same Namespace can reach other Services by name alone. For example, if we add a Service redis-master in the my-ns Namespace, then all the Pods in the same Namespace can reach the Service just by using its name, redis-master. Pods from other Namespaces can reach the Service by adding the respective Namespace as a suffix, like redis-master.my-ns.
This is the most common and highly recommended solution. For example, in the previous section's image, we have seen that an internal DNS is configured, which maps our services frontend-svc and db-svc to 172.17.0.4 and 172.17.0.5, respectively.
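
Both naming schemes can be derived from the Service name. The sketch below uses two hypothetical helpers and assumes the cluster domain is cluster.local, as in the record format above.

```shell
# Hypothetical helpers that reproduce the two naming schemes described above.
svc_env_prefix() {
  # redis-master -> REDIS_MASTER (upper-case, dashes become underscores)
  printf '%s\n' "$1" | tr 'a-z' 'A-Z' | sed 's/-/_/g'
}

svc_dns_name() {
  # $1: Service name, $2: Namespace; assumes the cluster domain is cluster.local
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

prefix=$(svc_env_prefix redis-master)
echo "${prefix}_SERVICE_HOST"
echo "${prefix}_SERVICE_PORT"
svc_dns_name redis-master my-ns
```

The first two lines of output are the variable names a Pod would look up; the last line is the fully qualified name that resolves from any Namespace.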

Service Type

While defining a Service, we can also choose its access scope. We can decide whether the Service:

  • is only accessible within the cluster;
  • is accessible from within the cluster and the external world; or
  • maps to an external entity which resides outside the cluster.

Access scope is decided by ServiceType, which can be mentioned when creating the Service.

ClusterIP 
(the default ServiceType.) A Service gets its Virtual IP address using the ClusterIP. That IP address is used for communicating with the Service and is accessible only within the cluster.
NodePort 
With this ServiceType, in addition to creating a ClusterIP, a port from the range 30000-32767 is mapped to the respective service from all the Worker Nodes. For example, if the mapped NodePort is 32233 for the service frontend-svc, then, if we connect to any Worker Node on port 32233, the node would redirect all the traffic to the assigned ClusterIP (172.17.0.4).
By default, while exposing a NodePort, the Kubernetes Master automatically selects a random port from the range 30000-32767. If we do not want a dynamically assigned port, we can specify a port number from that range when creating the Service.
The NodePort ServiceType is useful when we want to make our services accessible from the external world. The end-user connects to the Worker Nodes on the specified port, which forwards the traffic to the applications running inside the cluster. To access the application from the external world, administrators can configure a reverse proxy outside the Kubernetes cluster and map the specific endpoint to the respective port on the Worker Nodes.
LoadBalancer
With this ServiceType, we have the following:
  • NodePort and ClusterIP Services are automatically created, and the external load balancer will route to them;
  • The Services are exposed at a static port on each Worker Node; and
  • The Service is exposed externally using the underlying Cloud provider's load balancer feature.
The LoadBalancer ServiceType will only work if the underlying infrastructure supports the automatic creation of load balancers and has the respective support in Kubernetes, as is the case with the Google Cloud Platform and AWS.
ExternalIP 
A Service can be mapped to an ExternalIP address if that address routes to one or more of the Worker Nodes. Traffic that enters the cluster with the ExternalIP as its destination IP, on the Service port, gets routed to one of the Service endpoints. (Note that ExternalIPs are not managed by Kubernetes. The cluster administrator(s) must have configured the routing to map the ExternalIP address to one of the nodes.)
ExternalName 
a special ServiceType, which has no Selectors and does not define any endpoints. When accessed within the cluster, it returns a CNAME record of an externally configured service.
The primary use case of this ServiceType is to make externally configured services like my-database.example.com available inside the cluster, using just the name, like my-database, to other services inside the same Namespace.
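
As an illustration of the ExternalName case, the following writes a minimal manifest for the my-database example. The file name is an assumption made for this sketch.

```shell
# Write a minimal ExternalName Service manifest (file name chosen for this sketch).
cat << EOF > svc-my-database.yml
apiVersion: v1
kind: Service
metadata:
  name: my-database
spec:
  type: ExternalName
  externalName: my-database.example.com
EOF
cat svc-my-database.yml
# To create it on a cluster: kubectl create --validate -f svc-my-database.yml
```

Note that the manifest has no selector and no ports: the Service only maps the name my-database to the external DNS name.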

Deploying an application

$ kubectl create -f - <<EOF
apiVersion: apps/v1 # extensions/v1beta1 is no longer served as of Kubernetes 1.16
kind: Deployment
metadata:
  name: webserver
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webserver
  template:
    metadata:
      labels:
        app: webserver
    spec:
      containers:
      - name: webserver
        image: nginx:alpine
        ports:
        - containerPort: 80
EOF
$ kubectl create -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: web-service
  labels:
    run: web-service
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
  selector:
    app: webserver
EOF
$ kubectl get service
NAME          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kubernetes    ClusterIP   10.96.0.1        <none>        443/TCP        6h
web-service   NodePort    10.104.107.132   <none>        80:32610/TCP   7m

Note the NodePort (32610) in the PORT(S) column: port 80 of the Service is exposed on port 32610 of every node.
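
If the NodePort is needed in a script, it can be pulled out of the PORT(S) value with plain parameter expansion. node_port_of is a hypothetical helper written for this sketch.

```shell
# node_port_of "80:32610/TCP" -> 32610   (hypothetical helper)
node_port_of() {
  np_rest=${1#*:}          # drop the Service port and the colon
  echo "${np_rest%%/*}"    # drop the "/TCP" protocol suffix
}

node_port_of "80:32610/TCP"
```

On a live cluster, kubectl get service web-service -o jsonpath='{.spec.ports[0].nodePort}' returns the same value without any parsing.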

  • Get the IP address of your Minikube k8s cluster
$ minikube ip
192.168.99.100
#~OR~
$ minikube service web-service --url
http://192.168.99.100:32610
  • Now, check that your web service is serving up a default Nginx website:
$ curl -I http://192.168.99.100:32610
HTTP/1.1 200 OK
Server: nginx/1.13.8
Date: Thu, 11 Jan 2018 00:27:51 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Wed, 10 Jan 2018 04:10:03 GMT
Connection: keep-alive
ETag: "5a55921b-264"
Accept-Ranges: bytes

Looks good!

Finally, destroy the webserver deployment:

$ kubectl delete deployments webserver

Using Ingress with Minikube

  • First check that the Ingress add-on is enabled:
$ minikube addons list | grep ingress
- ingress: disabled

If it is not, enable it with:

$ minikube addons enable ingress
$ minikube addons list | grep ingress
- ingress: enabled
  • Create an Echo Server Deployment:
$ cat << EOF >deploy-echoserver.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: echoserver
  name: echoserver
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      run: echoserver
  template:
    metadata:
      labels:
        run: echoserver
    spec:
      containers:
      - image: gcr.io/google_containers/echoserver:1.4
        imagePullPolicy: IfNotPresent
        name: echoserver
        ports:
        - containerPort: 8080
          protocol: TCP
      dnsPolicy: ClusterFirst
      restartPolicy: Always
EOF
$ kubectl create --validate -f deploy-echoserver.yml
  • Create the Cheddar cheese Deployment:
$ cat << EOF >deploy-cheddar-cheese.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: cheddar-cheese
  name: cheddar-cheese
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      run: cheddar-cheese
  template:
    metadata:
      labels:
        run: cheddar-cheese
    spec:
      containers:
      - image: errm/cheese:cheddar
        imagePullPolicy: IfNotPresent
        name: cheddar-cheese
        ports:
        - containerPort: 80
          protocol: TCP
      dnsPolicy: ClusterFirst
      restartPolicy: Always
EOF
$ kubectl create --validate -f deploy-cheddar-cheese.yml
  • Create the Stilton cheese Deployment:
$ cat << EOF >deploy-stilton-cheese.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: stilton-cheese
  name: stilton-cheese
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      run: stilton-cheese
  template:
    metadata:
      labels:
        run: stilton-cheese
    spec:
      containers:
      - image: errm/cheese:stilton
        imagePullPolicy: IfNotPresent
        name: stilton-cheese
        ports:
        - containerPort: 80
          protocol: TCP
      dnsPolicy: ClusterFirst
      restartPolicy: Always
EOF
$ kubectl create --validate -f deploy-stilton-cheese.yml
  • Create the Echo Server Service:
$ cat << EOF >svc-echoserver.yml
apiVersion: v1
kind: Service
metadata:
  labels:
    run: echoserver
  name: echoserver
  namespace: default
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 31116
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    run: echoserver
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
EOF
$ kubectl create --validate -f svc-echoserver.yml
  • Create the Cheddar cheese Service:
$ cat << EOF >svc-cheddar-cheese.yml
apiVersion: v1
kind: Service
metadata:
  labels:
    run: cheddar-cheese
  name: cheddar-cheese
  namespace: default
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 32467
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: cheddar-cheese
  sessionAffinity: None
  type: NodePort
EOF
$ kubectl create --validate -f svc-cheddar-cheese.yml
  • Create the Stilton cheese Service:
$ cat << EOF >svc-stilton-cheese.yml
apiVersion: v1
kind: Service
metadata:
  labels:
    run: stilton-cheese
  name: stilton-cheese
  namespace: default
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30197
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: stilton-cheese
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
EOF
$ kubectl create --validate -f svc-stilton-cheese.yml
  • Create the Ingress for the above Services:
$ cat << EOF >ingress-cheese.yml
apiVersion: networking.k8s.io/v1 # extensions/v1beta1 is no longer served as of Kubernetes 1.22
kind: Ingress
metadata:
  name: ingress-cheese
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  defaultBackend:
    service:
      name: default-http-backend
      port:
        number: 80
  rules:
  - host: myminikube.info
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echoserver
            port:
              number: 8080
  - host: cheeses.all
    http:
      paths:
      - path: /stilton
        pathType: Prefix
        backend:
          service:
            name: stilton-cheese
            port:
              number: 80
      - path: /cheddar
        pathType: Prefix
        backend:
          service:
            name: cheddar-cheese
            port:
              number: 80
EOF
$ kubectl create --validate -f ingress-cheese.yml
  • Check that everything is up:
$ kubectl get all
NAME                                 READY     STATUS    RESTARTS   AGE
pod/cheddar-cheese-d6d6587c7-4bgcz   1/1       Running   0          12m
pod/echoserver-55f97d5bff-pdv65      1/1       Running   0          12m
pod/stilton-cheese-6d64cbc79-g7h4w   1/1       Running   0          12m

NAME                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/cheddar-cheese   NodePort    10.109.238.92    <none>        80:32467/TCP     12m
service/echoserver       NodePort    10.98.60.194     <none>        8080:31116/TCP   12m
service/kubernetes       ClusterIP   10.96.0.1        <none>        443/TCP          23h
service/stilton-cheese   NodePort    10.108.175.207   <none>        80:30197/TCP     12m

NAME                             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cheddar-cheese   1         1         1            1           12m
deployment.apps/echoserver       1         1         1            1           12m
deployment.apps/stilton-cheese   1         1         1            1           12m

NAME                                       DESIRED   CURRENT   READY     AGE
replicaset.apps/cheddar-cheese-d6d6587c7   1         1         1         12m
replicaset.apps/echoserver-55f97d5bff      1         1         1         12m
replicaset.apps/stilton-cheese-6d64cbc79   1         1         1         12m

$ kubectl get ing
NAME             HOSTS                         ADDRESS     PORTS     AGE
ingress-cheese   myminikube.info,cheeses.all   10.0.2.15   80        12m
  • Add your host aliases:
$ echo "$(minikube ip) myminikube.info cheeses.all" | sudo tee -a /etc/hosts
  • Now, either using your browser or curl, check that you can reach all of the endpoints defined in the Ingress:
$ curl -sI -w "%{http_code}\n" -o /dev/null cheeses.all/cheddar/  # Should return '200'
$ curl -sI -w "%{http_code}\n" -o /dev/null cheeses.all/stilton/  # Should return '200'
$ curl -sI -w "%{http_code}\n" -o /dev/null myminikube.info       # Should return '200'
  • You can also see the Nginx logs for the above requests with:
$ kubectl --namespace kube-system logs \
    --selector app.kubernetes.io/name=nginx-ingress-controller
  • You can also view the Nginx configuration file (and the settings created by the above Ingress) with:
$ NGINX_POD=$(kubectl --namespace kube-system get pods \
     --selector app.kubernetes.io/name=nginx-ingress-controller \
     --output jsonpath='{.items[0].metadata.name}')
$ kubectl --namespace kube-system exec -it ${NGINX_POD} -- cat /etc/nginx/nginx.conf
  • Get the version of the Nginx Ingress controller installed:
$ kubectl --namespace kube-system exec -it ${NGINX_POD} -- /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.19.0
  Build:      git-05025d6
  Repository: https://github.com/kubernetes/ingress-nginx.git
-------------------------------------------------------------------------------

Kubectl

kubectl controls the Kubernetes cluster manager.

  • View your current configuration:
$ kubectl config view
  • Switch between clusters:
$ kubectl config use-context <context_name>
  • Remove a cluster:
$ kubectl config unset contexts.<context_name>
$ kubectl config unset users.<user_name>
$ kubectl config unset clusters.<cluster_name>
  • Sort Pods by age:
$ kubectl get po --sort-by='{.metadata.creationTimestamp}'
$ kubectl get pods --all-namespaces --sort-by=.metadata.creationTimestamp
  • Backup all primitives deployed in a given k8s cluster:
$ kubectl api-resources --verbs=list --namespaced -o name \
    | xargs -I{} bash -c "kubectl get {} --all-namespaces -oyaml && echo ---" \
    > k8s_backup.yaml

kubectl explain

List the fields for supported resources.
  • Get the documentation of a resource (aka "kind") and its fields:
$ kubectl explain deployment
KIND:     Deployment
VERSION:  apps/v1

DESCRIPTION:
     Deployment enables declarative updates for Pods and ReplicaSets.

FIELDS:
   apiVersion	<string>
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind	<string>
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata	<Object>
     Standard object metadata.

   spec	<Object>
     Specification of the desired behavior of the Deployment.

   status	<Object>
     Most recently observed status of the Deployment
  • Get a list of all the resource types and their latest supported version:
$ for kind in $(kubectl api-resources | tail -n +2 | awk '{print $1}'); do
    kubectl explain ${kind};
  done | grep -E "^KIND:|^VERSION:"

KIND:     Binding
VERSION:  v1
KIND:     ComponentStatus
VERSION:  v1
KIND:     ConfigMap
VERSION:  v1
...
  • Get a list of all allowable fields for a given primitive:
$ kubectl explain deployment --recursive | head
KIND:     Deployment
VERSION:  apps/v1

DESCRIPTION:
     Deployment enables declarative updates for Pods and ReplicaSets.

FIELDS:
   apiVersion	<string>
   kind	<string>
   metadata	<Object>
  • Get documentation ("man page"-style) for a given field in a given primitive:
$ kubectl explain deployment.status.availableReplicas
KIND:     Deployment
VERSION:  apps/v1

FIELD:    availableReplicas <integer>

DESCRIPTION:
     Total number of available pods (ready for at least minReadySeconds)
     targeted by this deployment.

Merge kubeconfig files

  • Reference which kubeconfig files you wish to merge:
$ export KUBECONFIG=$HOME/.kube/dev.yaml:$HOME/.kube/prod.yaml
  • Flatten them:
$ kubectl config view --flatten > $HOME/.kube/config.merged
$ mv $HOME/.kube/config.merged $HOME/.kube/config
  • Unset:
$ unset KUBECONFIG

Merge complete.
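
When there are more than two files, the colon-separated KUBECONFIG value can be built in a loop. The sketch below assumes the files end in .yaml and sit in one directory; kubeconfig_path is a hypothetical helper.

```shell
# kubeconfig_path DIR: join every *.yaml file in DIR with ":"   (hypothetical helper)
kubeconfig_path() {
  kp_joined=""
  for kp_file in "$1"/*.yaml; do
    [ -e "$kp_file" ] || continue     # no match: the glob pattern stays literal
    kp_joined="${kp_joined:+$kp_joined:}$kp_file"
  done
  echo "$kp_joined"
}

# Demo with empty placeholder files in a scratch directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/dev.yaml" "$demo_dir/prod.yaml"
kubeconfig_path "$demo_dir"
```

For real use, something like export KUBECONFIG=$(kubeconfig_path "$HOME/.kube") would replace the hand-written list above.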

Namespaces

See: Namespaces in the official documentation.

Create a Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: dev

Pods

Create a Pod that has an Init Container

In this example, I will create a Pod that has one application Container and one Init Container. The init container runs to completion before the application container starts.

$ cat << EOF >init-demo.yml
apiVersion: v1
kind: Pod
metadata:
  name: init-demo
  labels:
    app: demo
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: workdir
      mountPath: /usr/share/nginx/html
  # These containers are run during pod initialization
  initContainers:
  - name: install
    image: busybox
    command:
    - wget
    - "-O"
    - "/work-dir/index.html"
    - https://example.com
    volumeMounts:
    - name: workdir
      mountPath: "/work-dir"
  dnsPolicy: Default
  volumes:
  - name: workdir
    emptyDir: {}
EOF

The above Pod YAML will first create the init container using the busybox image, which will download the HTML of the example.com website and save it to a file (index.html) on the Pod volume called "workdir". After the init container completes, the Nginx container starts and serves index.html on port 80 (the file is located at /usr/share/nginx/html/index.html inside the Nginx container, via the volume mount).

  • Now, create this Pod:
$ kubectl create --validate -f init-demo.yml
  • Create a Service:
$ cat << EOF >example.yml
kind: Service
apiVersion: v1
metadata:
  name: example
spec:
  ports:
  - port: 8000
    targetPort: 80
    protocol: TCP
  selector:
    app: demo
EOF
$ kubectl create --validate -f example.yml
  • Check that Nginx is serving the page downloaded from https://example.com:
$ curl -sI $(kubectl get svc/example -o jsonpath='{.spec.clusterIP}'):8000 | grep ^HTTP
HTTP/1.1 200 OK

Deployments

A Deployment controller provides declarative updates for Pods and ReplicaSets.

You describe a desired state in a Deployment object, and the Deployment controller changes the actual state to the desired state at a controlled rate. You can define Deployments to create new ReplicaSets, or to remove existing Deployments and adopt all their resources with new Deployments.

Creating a Deployment

The following is an example of a Deployment. It creates a ReplicaSet to bring up three Nginx Pods:

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
  • Check the syntax of the Deployment (YAML):
$ kubectl create -f nginx-deployment.yml --dry-run=client
deployment.apps/nginx-deployment created (dry run)
  • Create the Deployment:
$ kubectl create --record -f nginx-deployment.yml 
deployment "nginx-deployment" created

Note: By appending --record to the above command, we are telling the API to record the current command in the annotations of the created or updated resource. This is useful for future review, such as investigating which commands were executed in each Deployment revision.

  • Get information about our Deployment:
$ kubectl get deployments
NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   3         3         3            3           24s

$ kubectl describe deployment/nginx-deployment
Name:                   nginx-deployment
Namespace:              default
CreationTimestamp:      Tue, 30 Jan 2018 23:28:43 +0000
Labels:                 app=nginx
Annotations:            deployment.kubernetes.io/revision=1
                        kubernetes.io/change-cause=kubectl create --record=true --filename=nginx-deployment.yml
Selector:               app=nginx
Replicas:               3 desired | 3 updated | 3 total | 0 available | 3 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=nginx
  Containers:
   nginx:
    Image:        nginx:1.7.9
    Port:         80/TCP
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    True    ReplicaSetUpdated
OldReplicaSets:  <none>
NewReplicaSet:   nginx-deployment-6c54bd5869 (3/3 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  28s   deployment-controller  Scaled up replica set nginx-deployment-6c54bd5869 to 3
  • Get information about the ReplicaSet created by the above Deployment:
$ kubectl get rs
NAME                          DESIRED   CURRENT   READY     AGE
nginx-deployment-6c54bd5869   3         3         3         3m

$ kubectl describe rs/nginx-deployment-6c54bd5869
Name:           nginx-deployment-6c54bd5869
Namespace:      default
Selector:       app=nginx,pod-template-hash=2710681425
Labels:         app=nginx
                pod-template-hash=2710681425
Annotations:    deployment.kubernetes.io/desired-replicas=3
                deployment.kubernetes.io/max-replicas=4
                deployment.kubernetes.io/revision=1
                kubernetes.io/change-cause=kubectl create --record=true --filename=nginx-deployment.yml
Controlled By:  Deployment/nginx-deployment
Replicas:       3 current / 3 desired
Pods Status:    3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=nginx
           pod-template-hash=2710681425
  Containers:
   nginx:
    Image:        nginx:1.7.9
    Port:         80/TCP
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  SuccessfulCreate  4m    replicaset-controller  Created pod: nginx-deployment-6c54bd5869-k9mh4
  Normal  SuccessfulCreate  4m    replicaset-controller  Created pod: nginx-deployment-6c54bd5869-pphjt
  Normal  SuccessfulCreate  4m    replicaset-controller  Created pod: nginx-deployment-6c54bd5869-n4fj5
  • Get information about the Pods created by this Deployment:
$ kubectl get pods --show-labels -l app=nginx -o wide
NAME                               READY  STATUS   RESTARTS  AGE  IP          NODE               LABELS
nginx-deployment-6c54bd5869-k9mh4  1/1    Running  0         5m   10.244.1.5  k8s.worker1.local  app=nginx,pod-template-hash=2710681425
nginx-deployment-6c54bd5869-n4fj5  1/1    Running  0         5m   10.244.1.6  k8s.worker2.local  app=nginx,pod-template-hash=2710681425
nginx-deployment-6c54bd5869-pphjt  1/1    Running  0         5m   10.244.1.7  k8s.worker3.local  app=nginx,pod-template-hash=2710681425
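
The describe output above shows "25% max unavailable, 25% max surge", and the ReplicaSet carries the annotation max-replicas=4. A sketch of the arithmetic behind that number, assuming the documented rounding rules (the maxSurge percentage rounds up, the maxUnavailable percentage rounds down); rolling_update_bounds is a hypothetical helper:

```shell
# rolling_update_bounds REPLICAS MAX_SURGE_PCT MAX_UNAVAILABLE_PCT   (hypothetical helper)
# Prints "<max total Pods> <min available Pods>" during a rolling update.
rolling_update_bounds() {
  rb_surge=$(( ($1 * $2 + 99) / 100 ))    # maxSurge percentage rounds up
  rb_unavail=$(( $1 * $3 / 100 ))         # maxUnavailable percentage rounds down
  echo "$(( $1 + rb_surge )) $(( $1 - rb_unavail ))"
}

# 3 replicas with the default 25% / 25% strategy.
rolling_update_bounds 3 25 25
```

For 3 replicas this prints "4 3": at most 4 Pods exist at once, and 3 must stay available, so the rollout replaces one Pod at a time.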

Updating a Deployment

Note: A Deployment's rollout is triggered if, and only if, the Deployment's pod template (that is, .spec.template) is changed (for example, if the labels or container images of the template are updated). Other updates, such as scaling the Deployment, do not trigger a rollout.

Suppose that we want to update the Nginx Pods in the above Deployment to use the nginx:1.9.1 image instead of the nginx:1.7.9 image.

$ kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1
deployment "nginx-deployment" image updated

Alternatively, we can edit the Deployment and change .spec.template.spec.containers[0].image from nginx:1.7.9 to nginx:1.9.1:

$ kubectl edit deployment/nginx-deployment
deployment "nginx-deployment" edited
  • Check on the rollout status:
$ kubectl rollout status deployment/nginx-deployment
Waiting for rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for rollout to finish: 1 old replicas are pending termination...
Waiting for rollout to finish: 1 old replicas are pending termination...
deployment "nginx-deployment" successfully rolled out
  • Get information about the updated Deployment:
$ kubectl get deploy
NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   3         3         3            3           18m

$ kubectl get rs
NAME                          DESIRED   CURRENT   READY     AGE
nginx-deployment-5964dfd755   3         3         3         1m   # <- new ReplicaSet using nginx:1.9.1
nginx-deployment-6c54bd5869   0         0         0         17m  # <- old ReplicaSet using nginx:1.7.9

$ kubectl rollout history deployment/nginx-deployment
deployments "nginx-deployment"
REVISION  CHANGE-CAUSE
1         kubectl create --record=true --filename=nginx-deployment.yml
2         kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1
$ kubectl rollout history deployment/nginx-deployment --revision=2

deployments "nginx-deployment" with revision #2
Pod Template:
  Labels:	app=nginx
	pod-template-hash=1520898311
  Annotations:	kubernetes.io/change-cause=kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1
  Containers:
   nginx:
    Image:	nginx:1.9.1
    Port:	80/TCP
    Environment:	<none>
    Mounts:	<none>
  Volumes:	<none>

Rolling back to a previous revision

Undo the current rollout and roll back to the previous revision:

$ kubectl rollout undo deployment/nginx-deployment
deployment "nginx-deployment" rolled back

Alternatively, you can roll back to a specific revision by specifying it with --to-revision:

$ kubectl rollout undo deployment/nginx-deployment --to-revision=1
deployment "nginx-deployment" rolled back

Volume management

On-disk files in a container are ephemeral, which presents some problems for non-trivial applications when running in containers. First, when a container crashes, kubelet will restart it, but the files will be lost (i.e., the container starts with a clean state). Second, when running containers together in a Pod it is often necessary to share files between those containers. The Kubernetes Volumes abstraction solves both of these problems. A Volume is essentially a directory backed by a storage medium. The storage medium and its content are determined by the Volume Type.

In Kubernetes, a Volume is attached to a Pod and shared among the containers of that Pod. The Volume has the same life span as the Pod, and it outlives the containers of the Pod — this allows data to be preserved across container restarts.

Kubernetes resolves the problem of persistent storage with the Persistent Volume subsystem, which provides APIs for users and administrators to manage and consume storage. To manage the Volume, it uses the PersistentVolume (PV) API resource type, and to consume it, it uses the PersistentVolumeClaim (PVC) API resource type.

PersistentVolume (PV) 
a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
PersistentVolumeClaim (PVC) 
a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Persistent Volume Claims can request specific size and access modes (e.g., can be mounted once read/write or many times read-only).

A Persistent Volume is typically a piece of network-attached storage in the cluster.

Persistent Volumes can be provisioned statically by the administrator, or dynamically, based on the StorageClass resource. A StorageClass contains pre-defined provisioners and parameters to create a Persistent Volume.
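
As a rough sketch of dynamic provisioning, a StorageClass and a claim that refers to it might look like the following. The provisioner name and its parameter are placeholders (assumptions, not a real provisioner), as are the names fast and fast-claim; a real cluster would use whatever provisioner its storage backend provides:

```shell
# Write a StorageClass and a PersistentVolumeClaim that requests storage from it.
# The provisioner and its "type" parameter are hypothetical placeholders.
workdir=$(mktemp -d)
cat << 'EOF' > "$workdir/storageclass.yml"
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: example.com/hypothetical-provisioner  # placeholder, replace with a real provisioner
parameters:
  type: ssd  # hypothetical, provisioner-specific parameter
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-claim
spec:
  storageClassName: fast  # ties the claim to the StorageClass above
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
echo "Manifest written to $workdir/storageclass.yml"
# To try it on a cluster: kubectl create -f "$workdir/storageclass.yml"
```

With a working provisioner behind the class, creating the claim is what triggers creation of a matching Persistent Volume; no administrator has to pre-create it.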

A PersistentVolumeClaim (PVC) is a request for storage by a user. Users request Persistent Volume resources based on size, access modes, etc. Once a suitable Persistent Volume is found, it is bound to the Persistent Volume Claim. After a successful bind, the Persistent Volume Claim can be used in a Pod. Once the user has finished their work, the attached Persistent Volume can be released. The underlying storage can then be reclaimed and recycled for future use. See Persistent Volumes for details.

Access Modes
  • Each of the following access modes must be supported by the storage resource provider (e.g., NFS, AWS EBS, etc.) if it is to be used.
  • ReadWriteOnce (RWO) — volume can be mounted as read/write by one node only.
  • ReadOnlyMany (ROX) — volume can be mounted read-only by many nodes.
  • ReadWriteMany (RWX) — volume can be mounted read/write by many nodes.

A volume can only be mounted using one access mode at a time, regardless of the modes that are supported.

Example #1 - Using Host Volumes

As an example of how to use volumes, we can modify our previous "webserver" Deployment (see above) to look like the following:

$ cat webserver.yml
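apiVersion: apps/v1  # extensions/v1beta1 Deployments are no longer served as of Kubernetes 1.16
kind: Deployment
metadata:
  name: webserver
spec:
  replicas: 3
  selector:  # required by apps/v1; must match the Pod template labels
    matchLabels:
      app: webserver
  template:
    metadata:
      labels:
        app: webserver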
    spec:
      containers:
      - name: webserver
        image: nginx:alpine
        ports:
        - containerPort: 80
        volumeMounts:
        - name: hostvol
          mountPath: /usr/share/nginx/html
      volumes:
      - name: hostvol
        hostPath:
          path: /home/docker/vol

And use the same Service:

$ cat webserver-svc.yml
apiVersion: v1
kind: Service
metadata:
  name: web-service
  labels:
    run: web-service
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
  selector:
    app: webserver

Then create the deployment and service:

$ kubectl create -f webserver.yml
$ kubectl create -f webserver-svc.yml

Then, SSH into the minikube node and create the directory and index file that back the hostPath volume:

$ minikube ssh
minikube> mkdir -p /home/docker/vol
minikube> echo "Christoph testing" > /home/docker/vol/index.html
minikube> exit

Get the webserver IP and port:

$ minikube ip
192.168.99.100
$ kubectl get svc/web-service -o json | jq '.spec.ports[].nodePort'
32610
# OR
$ minikube service web-service --url
http://192.168.99.100:32610
$ curl http://192.168.99.100:32610
Christoph testing

Example #2 - Using NFS
  • First, set up a machine to act as your NFS server and install the NFS server package on it (e.g., `sudo apt-get install -y nfs-kernel-server`).
  • On your NFS server, do the following:
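$ sudo mkdir -p /var/nfs/general
$ sudo chown nobody:nogroup /var/nfs/general  # lets the Pod below (UID/GID 65534) write to the export
$ cat << EOF | sudo tee -a /etc/exports
/var/nfs/general  10.100.1.2(rw,sync,no_subtree_check) 10.100.1.3(rw,sync,no_subtree_check) 10.100.1.4(rw,sync,no_subtree_check)
EOF
$ sudo exportfs -ra  # re-read /etc/exports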

where the 10.x IPs are the private IPs of your k8s nodes (both Master and Worker nodes).

  • Make sure to install nfs-common on each of the k8s nodes that will be connecting to the NFS server.

Now, on the k8s Master node, create a Persistent Volume (PV) and Persistent Volume Claim (PVC):

  • Create a Persistent Volume (PV):
$ cat << EOF >pv.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mypv
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /var/nfs/general
    server: 10.100.1.10  # NFS Server's private IP
    readOnly: false
EOF
$ kubectl create --validate -f pv.yml
$ kubectl get pv
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM     STORAGECLASS   REASON    AGE
mypv      1Gi        RWX            Recycle          Available
  • Create a Persistent Volume Claim (PVC):
$ cat << EOF >pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF
$ kubectl create --validate -f pvc.yml
$ kubectl get pvc
NAME      STATUS    VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nfs-pvc   Bound     mypv      1Gi        RWX
$ kubectl get pv
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM             STORAGECLASS   REASON    AGE
mypv      1Gi        RWX            Recycle          Bound     default/nfs-pvc                            11m
  • Create a Pod:
$ cat << EOF >nfs-pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nfs-pod
  labels:
    name: nfs-pod
spec:
  containers:
  - name: nfs-ctn
    image: busybox
    command:
      - sleep
      - "3600"
    volumeMounts:
    - name: nfsvol
      mountPath: /tmp
  restartPolicy: Always
  securityContext:
    fsGroup: 65534
    runAsUser: 65534
  volumes:
    - name: nfsvol
      persistentVolumeClaim:
        claimName: nfs-pvc
EOF
$ kubectl create --validate -f nfs-pod.yml
$ kubectl get pods -o wide
NAME     READY     STATUS    RESTARTS   AGE       IP            NODE
busybox  1/1       Running   9          2d        10.244.2.22   k8s.worker01.local
  • Get a shell from the nfs-pod Pod:
$ kubectl exec -it nfs-pod -- sh
/ $ df -h
Filesystem                Size      Used Available Use% Mounted on
172.31.119.58:/var/nfs/general
                         19.3G      1.8G     17.5G   9% /tmp
...
/ $ touch /tmp/this-is-from-the-pod
  • On the NFS server:
$ ls -l /var/nfs/general/
total 0
-rw-r--r-- 1 nobody nogroup 0 Jan 18 23:32 this-is-from-the-pod

It works!

ConfigMaps and Secrets

While deploying an application, we may need to pass runtime parameters such as configuration details, passwords, etc. For example, let's assume we need to deploy ten different applications for our customers, and, for each customer, we just need to change the name of the company in the UI. Instead of creating ten different Docker images, one per customer, we can use a single template image and pass each customer's name as a runtime parameter. In such cases, we can use the ConfigMap API resource. Similarly, when we want to pass sensitive information, we can use the Secret API resource. Think Secrets (for confidential data) and ConfigMaps (for non-confidential data).

ConfigMaps allow you to decouple configuration artifacts from image content to keep containerized applications portable. Using ConfigMaps, we can pass configuration details as key-value pairs, which can be later consumed by Pods or any other system components, such as controllers. We can create ConfigMaps in two ways:

  • From literal values; and
  • From files.


ConfigMaps
  • Create a ConfigMap:
$ kubectl create configmap my-config --from-literal=key1=value1 --from-literal=key2=value2
configmap "my-config" created
$ kubectl get configmaps my-config -o yaml
apiVersion: v1
data:
  key1: value1
  key2: value2
kind: ConfigMap
metadata:
  creationTimestamp: 2018-01-11T23:57:44Z
  name: my-config
  namespace: default
  resourceVersion: "117110"
  selfLink: /api/v1/namespaces/default/configmaps/my-config
  uid: 37a43e39-f72b-11e7-8370-08002721601f
$ kubectl describe configmap/my-config
Name:         my-config
Namespace:    default
Labels:       <none>
Annotations:  <none>

Data
====
key2:
----
value2
key1:
----
value1
Events:  <none>

Create a ConfigMap from a configuration file
$ cat <<EOF | kubectl create -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: customer1
data:
  TEXT1: Customer1_Company
  TEXT2: Welcomes You
  COMPANY: Customer1 Company Technology, LLC.
EOF

We can get the values of the given key as environment variables inside a Pod. In the following example, while creating the Deployment, we are assigning values for environment variables from the customer1 ConfigMap:

....
      containers:
      - name: my-app
        image: foobar
        env:
        - name: MONGODB_HOST
          value: mongodb
        - name: TEXT1
          valueFrom:
            configMapKeyRef:
              name: customer1
              key: TEXT1
        - name: TEXT2
          valueFrom:
            configMapKeyRef:
              name: customer1
              key: TEXT2
        - name: COMPANY
          valueFrom:
            configMapKeyRef:
              name: customer1
              key: COMPANY
....

With the above, we will get the TEXT1 environment variable set to Customer1_Company, TEXT2 environment variable set to Welcomes You, and so on.

We can also mount a ConfigMap as a Volume inside a Pod. For each key, we will see a file in the mount path, and the content of that file is the respective key's value. For details, see here.
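
A minimal sketch of the Volume approach, assuming the customer1 ConfigMap from above already exists. The Pod name customer1-files, the container name, and the mount path /etc/customer1 are arbitrary choices made for this illustration:

```shell
# Write a Pod manifest that mounts the customer1 ConfigMap as a Volume.
# Pod name, container name, and mount path are illustrative assumptions.
workdir=$(mktemp -d)
cat << 'EOF' > "$workdir/configmap-vol-pod.yml"
apiVersion: v1
kind: Pod
metadata:
  name: customer1-files
spec:
  containers:
  - name: show-config
    image: busybox
    command: ["sh", "-c", "ls /etc/customer1; cat /etc/customer1/COMPANY; sleep 3600"]
    volumeMounts:
    - name: config
      mountPath: /etc/customer1
  volumes:
  - name: config
    configMap:
      name: customer1  # one file per key: TEXT1, TEXT2, COMPANY
EOF
echo "Manifest written to $workdir/configmap-vol-pod.yml"
# To try it on a cluster:
#   kubectl create -f "$workdir/configmap-vol-pod.yml"
#   kubectl logs customer1-files
```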

You can also use ConfigMaps to configure your cluster to use, as an example, 8.8.8.8 and 8.8.4.4 as its upstream DNS servers:

kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-dns
  namespace: kube-system
data:
  upstreamNameservers: |
    ["8.8.8.8", "8.8.4.4"]

Secrets

Objects of type Secret are intended to hold sensitive information, such as passwords, OAuth tokens, and SSH keys. Putting this information in a Secret is safer and more flexible than putting it verbatim in a Pod definition or in a Docker image.

As an example, assume that we have a WordPress blog application, in which our WordPress frontend connects to the MySQL database backend using a password. While creating the Deployment for WordPress, we can put the MySQL password in the Deployment's YAML file, but the password would not be protected. The password would be available to anyone who has access to the configuration file.

In situations such as the one we just mentioned, the Secret object can help. With Secrets, we can share sensitive information like passwords, tokens, or keys in the form of key-value pairs, similar to ConfigMaps; thus, we can control how the information in a Secret is used, reducing the risk for accidental exposures. In Deployments or other system components, the Secret object is referenced, without exposing its content.

It is important to keep in mind that, by default, Secret data is stored unencrypted (only base64-encoded) inside etcd. Administrators must limit access to the API Server and etcd, and can additionally enable encryption at rest.

To create a Secret using the `kubectl create secret` command, we need to first create a file with a password, and then pass it as an argument.

  • Create a file with your MySQL password:
$ echo mysqlpasswd | tr -d '\n' > password.txt
  • Create the Secret:
$ kubectl create secret generic mysql-passwd --from-file=password.txt
$ kubectl describe secret/mysql-passwd
Name:         mysql-passwd
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
password.txt:  11 bytes

We can also create a Secret manually, using a YAML configuration file. In a Secret, every value under data must be base64-encoded. If we want to have a configuration file for our Secret, we must first get the base64 encoding of our password:

$ cat password.txt | base64
bXlzcWxwYXNzd2Q=

and then use it in the configuration file:

apiVersion: v1
kind: Secret
metadata:
  name: mysql-passwd
type: Opaque
data:
  password: bXlzcWxwYXNzd2Q=

Note that base64 encoding does not do any encryption and anyone can easily decode it:

$ echo "bXlzcWxwYXNzd2Q=" | base64 -d  # => mysqlpasswd

Therefore, make sure you do not commit a Secret's configuration file to source control.
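
One detail worth checking when encoding by hand is the trailing newline, which is why the password file was created with tr -d '\n'. The following sketch (plain shell, no cluster needed) shows the difference and the round trip; the variable names are only for illustration:

```shell
# printf does not append a newline, so only the 11 password bytes are encoded.
encoded=$(printf '%s' 'mysqlpasswd' | base64)
echo "$encoded"

# echo appends a newline, which silently becomes part of the encoded value.
encoded_with_newline=$(echo 'mysqlpasswd' | base64)
echo "$encoded_with_newline"

# Decoding the first value recovers the original password.
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"
```

The two encoded strings differ in their last characters, so a Secret built from the second one would hand the application a password ending in a newline.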

We can get Secrets to be used by containers in a Pod by mounting them as data volumes, or by exposing them as environment variables.

We can reference a Secret and assign the value of its key as an environment variable (WORDPRESS_DB_PASSWORD):

.....
    spec:
      containers:
      - image: wordpress:4.7.3-apache
        name: wordpress
        env:
        - name: WORDPRESS_DB_HOST
          value: wordpress-mysql
        - name: WORDPRESS_DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-passwd
              key: password.txt  # use "password" if the Secret was created from the YAML file above
.....

Or, we can also mount a Secret as a Volume inside a Pod. A file would be created for each key mentioned in the Secret, whose content would be the respective value. See here for details.
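
A minimal sketch of the Volume variant, assuming the mysql-passwd Secret created with kubectl above (so its only key is password.txt). The Pod name secret-vol-pod, the container name, and the mount path /etc/mysql-secret are arbitrary choices for this illustration:

```shell
# Write a Pod manifest that mounts the mysql-passwd Secret read-only.
# Pod name, container name, and mount path are illustrative assumptions.
workdir=$(mktemp -d)
cat << 'EOF' > "$workdir/secret-vol-pod.yml"
apiVersion: v1
kind: Pod
metadata:
  name: secret-vol-pod
spec:
  containers:
  - name: show-secret
    image: busybox
    command: ["sh", "-c", "ls /etc/mysql-secret; sleep 3600"]
    volumeMounts:
    - name: db-secret
      mountPath: /etc/mysql-secret
      readOnly: true
  volumes:
  - name: db-secret
    secret:
      secretName: mysql-passwd  # appears as /etc/mysql-secret/password.txt
EOF
echo "Manifest written to $workdir/secret-vol-pod.yml"
# To try it on a cluster:
#   kubectl create -f "$workdir/secret-vol-pod.yml"
#   kubectl logs secret-vol-pod
```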

Ingress

Among the ServiceTypes mentioned earlier, NodePort and LoadBalancer are the most often used. For the LoadBalancer ServiceType, we need to have the support from the underlying infrastructure. Even after having the support, we may not want to use it for every Service, as LoadBalancer resources are limited and they can increase costs significantly. Managing the NodePort ServiceType can also be tricky at times, as we need to keep updating our proxy settings and keep track of the assigned ports. In this section, we will explore the Ingress API object, which is another method we can use to access our applications from the external world.

An Ingress is a collection of rules that allow inbound connections to reach the cluster Services. With Services, routing rules are attached to a given Service. They exist for as long as the Service exists. If we can somehow decouple the routing rules from the application, we can then update our application without worrying about its external access. This can be done using the Ingress resource. Ingress can provide load balancing, SSL/TLS termination, and name-based virtual hosting and/or routing.

To allow the inbound connection to reach the cluster Services, Ingress configures a Layer 7 HTTP load balancer for Services and provides the following:

  • TLS (Transport Layer Security)
  • Name-based virtual hosting
  • Path-based routing
  • Custom rules.

With Ingress, users do not connect directly to a Service. Users reach the Ingress endpoint, and, from there, the request is forwarded to the respective Service. An example Ingress definition is shown below:

apiVersion: networking.k8s.io/v1  # extensions/v1beta1 Ingress is no longer served as of Kubernetes 1.22
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
  - host: blue.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: blue-service
            port:
              number: 80
  - host: green.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: green-service
            port:
              number: 80

According to the example just provided, user requests to both blue.example.com and green.example.com would go to the same Ingress endpoint, and, from there, they would be forwarded to blue-service and green-service, respectively. Here, we have seen an example of a Name-Based Virtual Hosting Ingress rule.

We can also have Fan Out Ingress rules, in which we send requests like example.com/blue and example.com/green, which would be forwarded to blue-service and green-service, respectively.
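
A Fan Out rule for those two paths might be sketched as follows. This uses the networking.k8s.io/v1 Ingress API; the Ingress name fanout-ingress is made up for the illustration, while blue-service and green-service are the Services from the example above:

```shell
# Write a Fan Out Ingress: one host, two paths, two backend Services.
# The Ingress name is an illustrative assumption.
workdir=$(mktemp -d)
cat << 'EOF' > "$workdir/fanout-ingress.yml"
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fanout-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /blue
        pathType: Prefix
        backend:
          service:
            name: blue-service
            port:
              number: 80
      - path: /green
        pathType: Prefix
        backend:
          service:
            name: green-service
            port:
              number: 80
EOF
echo "Manifest written to $workdir/fanout-ingress.yml"
# To try it on a cluster: kubectl create -f "$workdir/fanout-ingress.yml"
```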

To secure an Ingress, you must create a Secret. The TLS secret must contain keys named tls.crt and tls.key, which contain the certificate and private key to use for TLS.
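
As a sketch of how the pieces fit together, an Ingress refers to the TLS Secret by name in its tls section. The Secret name blue-tls and the Ingress name web-ingress-tls are assumptions made for this example, and the certificate and key files are expected to exist already:

```shell
# Write an Ingress that terminates TLS for blue.example.com using a Secret
# named blue-tls (hypothetical name) holding tls.crt and tls.key.
workdir=$(mktemp -d)
cat << 'EOF' > "$workdir/tls-ingress.yml"
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress-tls
spec:
  tls:
  - hosts:
    - blue.example.com
    secretName: blue-tls
  rules:
  - host: blue.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: blue-service
            port:
              number: 80
EOF
echo "Manifest written to $workdir/tls-ingress.yml"
# To try it on a cluster (tls.crt and tls.key must already exist):
#   kubectl create secret tls blue-tls --cert=tls.crt --key=tls.key
#   kubectl create -f "$workdir/tls-ingress.yml"
```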

The Ingress resource does not do any request forwarding by itself. All of the magic is done using the Ingress Controller.

Ingress Controller

An Ingress Controller is an application which watches the Master Node's API Server for changes in the Ingress resources and updates the Layer 7 load balancer accordingly. Kubernetes has different Ingress Controllers, and, if needed, we can also build our own. GCE L7 Load Balancer and Nginx Ingress Controller are examples of Ingress Controllers.

Minikube v0.14.0 and above ships the Nginx Ingress Controller setup as an add-on. It can be easily enabled by running the following command:

$ minikube addons enable ingress

Once the Ingress Controller is deployed, we can create an Ingress resource using the kubectl create command. For example, if we create an example-ingress.yml file with the content above, then, we can use the following command to create an Ingress resource:

$ kubectl create -f example-ingress.yml

With the Ingress resource we just created, we should now be able to access blue-service and green-service using the blue.example.com and green.example.com URLs. As our current setup is on minikube, we need to map those hostnames to minikube's IP in the hosts file on our workstation:

$ cat /etc/hosts
127.0.0.1        localhost
::1              localhost
192.168.99.100   blue.example.com green.example.com 

Once this is done, we can now open blue.example.com and green.example.com in a browser and access the application.

Labels and Selectors

Labels are key-value pairs that are attached to objects, such as pods. Labels are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users, but do not directly imply semantics to the core system. Labels can be used to organize and to select subsets of objects. Labels can be attached to objects at creation time and subsequently added and modified at any time. Each object can have a set of key-value labels defined. Each key must be unique for a given object.

"labels": {
  "key1" : "value1",
  "key2" : "value2"
}

Syntax and character set

Labels are key-value pairs. Valid label keys have two segments: an optional prefix and name, separated by a slash (/). The name segment is required and must be 63 characters or less, beginning and ending with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (_), dots (.), and alphanumerics between. The prefix is optional. If specified, the prefix must be a DNS subdomain: a series of DNS labels separated by dots (.), not longer than 253 characters in total, followed by a slash (/). If the prefix is omitted, the label key is presumed to be private to the user. Automated system components (e.g. kube-scheduler, kube-controller-manager, kube-apiserver, kubectl, or other third-party automation) which add labels to end-user objects must specify a prefix. The kubernetes.io/ prefix is reserved for Kubernetes core components.

Valid label values must be 63 characters or less and must be empty or begin and end with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (_), dots (.), and alphanumerics between.
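
The value rule above can be checked mechanically. The following is a rough sketch in plain shell, not the validation code Kubernetes itself uses; the function name is_valid_label_value is made up for illustration:

```shell
# Return 0 if the argument is a syntactically valid label value, 1 otherwise.
is_valid_label_value() {
  value=$1
  # At most 63 characters.
  [ "${#value}" -le 63 ] || return 1
  # Empty, or alphanumeric at both ends with -, _, . and alphanumerics between.
  printf '%s\n' "$value" | grep -Eq '^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$'
}

for candidate in "production" "v1.2_beta" "" "-frontend" "trailing."; do
  if is_valid_label_value "$candidate"; then
    echo "valid:   '$candidate'"
  else
    echo "invalid: '$candidate'"
  fi
done
```

The first three candidates are reported as valid (including the empty value) and the last two as invalid, because they start or end with a non-alphanumeric character.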

Label selectors

Unlike names and UIDs, labels do not provide uniqueness. In general, we expect many objects to carry the same label(s).

Via a label selector, the client/user can identify a set of objects. The label selector is the core grouping primitive in Kubernetes.

The API currently supports two types of selectors: equality-based and set-based. A label selector can be made of multiple requirements which are comma-separated. In the case of multiple requirements, all must be satisfied so the comma separator acts as a logical AND (&&) operator.

An empty label selector (that is, one with zero requirements) selects every object in the collection.

A null label selector (which is only possible for optional selector fields) selects no objects.

Note: the label selectors of two controllers must not overlap within a namespace, otherwise they will fight with each other. Note that labels are not restricted to pods. You can apply them to all sorts of objects, such as nodes or services.

Examples
  • Label a given node:
$ kubectl label node k8s.worker1.local network=gigabit
  • Using equality-based requirements:
$ kubectl get pods -l environment=production,tier=frontend
  • Using set-based requirements:
$ kubectl get pods -l 'environment in (production),tier in (frontend)'
  • Implement the OR operator on values:
$ kubectl get pods -l 'environment in (production, qa)'
  • Restricting negative matching via exists operator:
$ kubectl get pods -l 'environment,environment notin (frontend)'
  • Show the current labels on your pods:
$ kubectl get pods --show-labels
NAME      READY     STATUS    RESTARTS   AGE       LABELS
busybox   1/1       Running   25         9d        <none>
nfs-pod   1/1       Running   16         6d        name=nfs-pod
  • Add a label to an already running/existing pod:
$ kubectl label pods busybox owner=christoph
pod "busybox" labeled
$ kubectl get pods --show-labels
NAME      READY     STATUS    RESTARTS   AGE       LABELS
busybox   1/1       Running   25         9d        owner=christoph
nfs-pod   1/1       Running   16         6d        name=nfs-pod
  • Select a pod by its label:
$ kubectl get pods --selector owner=christoph
#~OR~
$ kubectl get pods -l owner=christoph
NAME      READY     STATUS    RESTARTS   AGE
busybox   1/1       Running   25         9d
  • Delete/remove a given label from a given pod:
$ kubectl label pod busybox owner-
pod "busybox" labeled
$ kubectl get pods --show-labels
NAME      READY     STATUS    RESTARTS   AGE       LABELS
busybox   1/1       Running   25         9d        <none>
  • Get all pods that belong to either the production or the development environment:
$ kubectl get pods -l 'env in (production, development)'

Using Labels to select a Node on which to schedule a Pod
  • Label a Node that uses SSDs as its primary storage:
$ kubectl label node k8s.worker1.local hdd=ssd
  • Create a Pod manifest whose nodeSelector restricts scheduling to Nodes carrying that label:
$ cat << EOF >busybox.yml
kind: Pod
apiVersion: v1
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox
    command:
      - sleep
      - "300"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
  nodeSelector: 
    hdd: ssd
EOF

Annotations

With Annotations, we can attach arbitrary, non-identifying metadata to objects, in a key-value format:

"annotations": {
  "key1" : "value1",
  "key2" : "value2"
}

The metadata in an annotation can be small or large, structured or unstructured, and can include characters not permitted by labels.

In contrast to Labels, annotations are not used to identify and select objects. Annotations can be used to:

  • Build/release IDs, the git branch, etc.
  • Phone numbers of persons responsible or directory entries specifying where such information can be found
  • Pointers to logging, monitoring, analytics, audit repositories, debugging tools, etc.
  • Etc.

For example, while creating a Deployment, we can add a description like the one below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver
  annotations:
    description: Deployment based PoC dates 12 January 2018
....
....

We can look at annotations while describing an object:

$ kubectl describe deployment webserver
Name:                webserver
Namespace:           default
CreationTimestamp:   Fri, 12 Jan 2018 13:18:23 -0800
Labels:              app=webserver
Annotations:         deployment.kubernetes.io/revision=1
                     description=Deployment based PoC dates 12 January 2018
...
...

Jobs and CronJobs

Jobs

A Job creates one or more pods and ensures that a specified number of them successfully terminate. As pods successfully complete, the Job tracks the successful completions. When a specified number of successful completions is reached, the Job itself is complete. Deleting a Job will clean up the pods it created.

A simple case is to create one Job object in order to reliably run one Pod to completion. The Job object will start a new Pod if the first Pod fails or is deleted (for example due to a node hardware failure or a node reboot).

A Job can also be used to run multiple Pods in parallel.

Example
  • Below is an example Job config. It computes π to 2000 places and prints it out. It takes around 10 seconds to complete.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
$ kubectl create -f ./job-pi.yml
job "pi" created
$ kubectl describe jobs/pi
Name:           pi
Namespace:      default
Selector:       controller-uid=19aa42d0-f7df-11e7-8370-08002721601f
Labels:         controller-uid=19aa42d0-f7df-11e7-8370-08002721601f
                job-name=pi
Annotations:    <none>
Parallelism:    1
Completions:    1
Start Time:     Fri, 12 Jan 2018 13:25:23 -0800
Pods Statuses:  1 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=19aa42d0-f7df-11e7-8370-08002721601f
           job-name=pi
  Containers:
   pi:
    Image:  perl
    Port:   <none>
    Command:
      perl
      -Mbignum=bpi
      -wle
      print bpi(2000)
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  8s    job-controller  Created pod: pi-rfvvw
  • Get the result of the Job run (i.e., the value of π):
$ pods=$(kubectl get pods --selector=job-name=pi --output=jsonpath='{.items[*].metadata.name}')
$ echo $pods
pi-rfvvw
$ kubectl logs ${pods}
3.1415926535897932384626433832795028841971693...
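
The pi Job above runs a single Pod to completion. A parallel variant might be sketched as follows, using the completions and parallelism fields of the Job spec; the name pi-parallel and the values 6 and 2 are arbitrary choices for this illustration:

```shell
# Write a Job manifest that needs 6 successful Pods, running at most 2 at a time.
# The Job name and the two numbers are illustrative assumptions.
workdir=$(mktemp -d)
cat << 'EOF' > "$workdir/job-pi-parallel.yml"
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-parallel
spec:
  completions: 6   # total number of Pods that must succeed
  parallelism: 2   # upper bound on Pods running at the same time
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
EOF
echo "Manifest written to $workdir/job-pi-parallel.yml"
# To try it on a cluster:
#   kubectl create -f "$workdir/job-pi-parallel.yml"
#   kubectl get pods --selector=job-name=pi-parallel --watch
```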

CronJobs

Support for creating Jobs at specified times/dates (i.e., cron) was introduced in Kubernetes 1.4 (originally as ScheduledJob) and became stable as the batch/v1 CronJob in Kubernetes 1.21. See here for details.

Below is an example CronJob. Every minute, it runs a simple job to print current time and then echo a "hello" string:

$ cat << EOF >cronjob.yml
apiVersion: batch/v1  # batch/v1beta1 CronJob is no longer served as of Kubernetes 1.25
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
EOF

$ kubectl create -f cronjob.yml
cronjob "hello" created

$ kubectl get cronjob hello
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
hello     */1 * * * *   False     0         <none>          11s

$ kubectl get jobs --watch
NAME               DESIRED   SUCCESSFUL   AGE
hello-1515793140   1         1            7s

$ kubectl get cronjob hello
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
hello     */1 * * * *   False     0         22s             48s

$ pods=$(kubectl get pods --selector=job-name=hello-1515793140 --output=jsonpath='{.items..metadata.name}')
$ echo $pods
hello-1515793140-plp8g

$ kubectl logs $pods
Fri Jan 12 21:39:07 UTC 2018
Hello from the Kubernetes cluster
  • Clean up:
$ kubectl delete cronjob hello

Quota Management

When there are many users sharing a given Kubernetes cluster, there is always a concern for fair usage. To address this concern, administrators can use the ResourceQuota object, which provides constraints that limit aggregate resource consumption per Namespace.

We can have the following types of quotas per Namespace:

  • Compute Resource Quota: We can limit the total sum of compute resources (CPU, memory, etc.) that can be requested in a given Namespace.
  • Storage Resource Quota: We can limit the total sum of storage resources (PersistentVolumeClaims, requests.storage, etc.) that can be requested.
  • Object Count Quota: We can restrict the number of objects of a given type (pods, ConfigMaps, PersistentVolumeClaims, ReplicationControllers, Services, Secrets, etc.).
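
The three kinds of quota listed above can be combined in a single ResourceQuota object. The following is only a sketch: the namespace team-a, the object name team-quota, and all of the limits are made-up values:

```shell
# Write a ResourceQuota covering compute, storage, and object counts.
# Namespace, name, and every limit below are illustrative assumptions.
workdir=$(mktemp -d)
cat << 'EOF' > "$workdir/resourcequota.yml"
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    # Compute resource quota
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    # Storage resource quota
    requests.storage: 100Gi
    persistentvolumeclaims: "10"
    # Object count quota
    pods: "20"
    configmaps: "10"
    secrets: "10"
EOF
echo "Manifest written to $workdir/resourcequota.yml"
# To try it on a cluster (the namespace must exist first):
#   kubectl create namespace team-a
#   kubectl create -f "$workdir/resourcequota.yml"
#   kubectl describe resourcequota team-quota --namespace=team-a
```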

Daemon Sets

In some cases, like collecting monitoring data from all nodes, or running a storage daemon on all nodes, etc., we need a specific type of Pod running on all nodes at all times. A DaemonSet is the object that allows us to do just that.

Whenever a node is added to the cluster, a Pod from a given DaemonSet is created on it. When the node dies, the respective Pods are garbage collected. If a DaemonSet is deleted, all Pods it created are deleted as well.

Example DaemonSet:

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: pause-ds
spec:
  selector:
    matchLabels:
      quiet: "pod"
  template:
    metadata:
      labels:
        quiet: pod
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: pause-container
        image: k8s.gcr.io/pause:2.0

Stateful Sets

The StatefulSet controller is used for applications which require a unique identity, such as a stable name, a stable network identity, strict ordering, etc. Examples include a MySQL cluster or an etcd cluster.

The StatefulSet controller provides identity and guaranteed ordering of deployment and scaling to Pods.

Note: Before Kubernetes 1.5, the StatefulSet controller was referred to as PetSet.
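
To make the identity guarantee more concrete, here is a minimal sketch of a StatefulSet together with the headless Service it refers to. The name web, the nginx:alpine image, and the replica count are arbitrary choices for the illustration. The loop at the end just prints the Pod names such a StatefulSet would produce, since Pods are named after the StatefulSet plus an ordinal index:

```shell
# Write a headless Service and a StatefulSet with three replicas.
# The name "web", the image, and the replica count are illustrative assumptions.
workdir=$(mktemp -d)
cat << 'EOF' > "$workdir/statefulset.yml"
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  clusterIP: None  # headless: gives each Pod a stable DNS entry
  selector:
    app: web
  ports:
  - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
EOF

# Pods are named <statefulset-name>-<ordinal> and created in ordinal order.
replicas=3
names=""
ordinal=0
while [ "$ordinal" -lt "$replicas" ]; do
  names="${names:+$names }web-$ordinal"
  ordinal=$((ordinal + 1))
done
echo "$names"
# To try it on a cluster: kubectl create -f "$workdir/statefulset.yml"
```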

Role Based Access Control (RBAC)

Role-based access control (RBAC) is an authorization mechanism for managing permissions around Kubernetes resources.

Using the RBAC API, we define a role which contains a set of additive permissions. Within a Namespace, a role is defined using the Role object. For a cluster-wide role, we need to use the ClusterRole object.

Once the roles are defined, we can bind them to a user or a set of users using RoleBinding and ClusterRoleBinding.

Using RBAC with minikube

  • Start up minikube with RBAC support:
$ minikube start --kubernetes-version=v1.9.0 --extra-config=apiserver.Authorization.Mode=RBAC
  • Setup RBAC:
$ cat rbac-cluster-role-binding.yml
# kubectl create clusterrolebinding add-on-cluster-admin \
#   --clusterrole=cluster-admin --serviceaccount=kube-system:default
#
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: kube-system-sa
subjects:
- kind: Group
  name: system:serviceaccounts:kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
$ cat rbac-setup.yml 
apiVersion: v1
kind: Namespace
metadata:
  name: rbac

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: viewer
  namespace: rbac

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin
  namespace: rbac
  • Create a Role Binding:
# kubectl create rolebinding reader-binding \
#   --role=reader \
#   --serviceaccount=rbac:viewer \
#   --namespace=rbac
#
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: rbac
  name: reader-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: reader
subjects:
- kind: ServiceAccount  # ServiceAccount subjects take a namespace, not an apiGroup
  name: viewer          # the ServiceAccount created in rbac-setup.yml
  namespace: rbac
  • Create a Role:
$ cat rbac-role.yml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: rbac  # must match the namespace of the RoleBinding that references this Role
  name: reader
rules:
- apiGroups: [""]
  resources: ["*"]
  verbs: ["get", "watch", "list"]
  • Create an RBAC "core reader" Role with specific resources and "verbs" (i.e., the "core reader" role can "get"/"list"/etc. on specific resources such as Pods, Jobs, and Deployments):
$ cat rbac-role-core-reader.yml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: core-reader
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - configmaps
  - secrets
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - batch
  - apps        # Deployments are served from the apps group on current clusters
  - extensions
  resources:
  - jobs
  - deployments
  verbs:
  - get
  - watch
  - list
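  • "Gotchas" (Roles that look plausible but do not behave as intended):
$ cat rbac-gotcha-1.yml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: gotcha-1
rules:
# Gotcha: nonResourceURLs only apply in a ClusterRole, not in a namespaced Role.
- nonResourceURLs:
  - /healthz
  verbs:
  - get
  - post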
- apiGroups:
  - batch
  - extensions
  resources:
  - deployments
  verbs:
  - "*"
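$ cat rbac-gotcha-2.yml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: gotcha-2
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - "*"
  # Gotcha: resourceNames cannot restrict "create" requests, and "my_secret"
  # is not a valid object name (underscores are not allowed).
  resourceNames:
  - "my_secret"
- apiGroups:
  - ""
  resources:
  - pods/logs  # Gotcha: the subresource is "pods/log" (singular)
  verbs:
  - "get"

Privilege escalation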
  • You cannot create a Role or ClusterRole that grants permissions you do not have.
  • You cannot create a RoleBinding or ClusterRoleBinding that binds to a Role with permissions you do not have (unless you have been explicitly given "bind" permission on the role).
  • Grant explicit bind access:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: role-grantor
rules:
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["rolebindings"]
  verbs: ["create"]
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["clusterroles"]
  verbs: ["bind"]
  resourceNames: ["admin", "edit", "view"]

Testing RBAC permissions

  • Example of RBAC not allowing a verb-noun:
$ kubectl auth can-i create pods
no - Required "container.pods.create" permission.
  • Example of RBAC allowing a verb-noun:
$ kubectl auth can-i create pods
yes
  • A more complex example:
$ kubectl auth can-i update deployments.apps \
  --subresource="scale" --as-group="$group" --as="$user" -n $ns
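To check several verb/resource pairs at once, the single can-i calls above could be wrapped in a loop. The following is only a sketch: the function name rbac_matrix, the KUBECTL override variable, and the particular verbs and resources are illustrative assumptions, not part of kubectl itself.

```shell
# Sketch: print a verb/resource permission matrix for one user in one namespace.
# KUBECTL is a hypothetical override hook so the loop can be dry-run with a stub.
KUBECTL="${KUBECTL:-kubectl}"

rbac_matrix() {
  user="$1"
  ns="$2"
  for resource in pods deployments.apps secrets; do
    for verb in get list create delete; do
      # "kubectl auth can-i" prints yes/no (and may exit non-zero on "no");
      # fall back to "no" if nothing was printed at all.
      answer=$($KUBECTL auth can-i "$verb" "$resource" --as="$user" -n "$ns" 2>/dev/null)
      printf '%s\t%s\t%s\n' "$verb" "$resource" "${answer:-no}"
    done
  done
}
```

Usage would look like "rbac_matrix jane develop", where jane and develop are placeholder user and namespace names.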

Federation

With Kubernetes Cluster Federation, we can manage multiple Kubernetes clusters from a single control plane. We can sync resources across the clusters and have cross-cluster discovery. This allows us to do Deployments across regions and access them using a global DNS record.

Federation is very useful when we want to build a hybrid solution, in which we can have one cluster running inside our private datacenter and another one on the public cloud. We can also assign weights for each cluster in the Federation, to distribute the load as per our choice.

Helm

To deploy an application, we use different Kubernetes manifests, such as Deployments, Services, Volume Claims, Ingress, etc. Sometimes, it can be tiresome to deploy them one by one. We can bundle all those manifests, after templatizing them, into a well-defined format, along with other metadata. Such a bundle is referred to as a Chart. These Charts can then be served via repositories, such as those that we have for rpm and deb packages.

Helm is a package manager (analogous to yum and apt) for Kubernetes, which can install/update/delete those Charts in the Kubernetes cluster.

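Helm 2 has two components (note: Tiller was removed in Helm 3, where the client talks to the Kubernetes API server directly):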

  • A client called helm, which runs on your user's workstation; and
  • A server called tiller, which runs inside your Kubernetes cluster.

The client helm connects to the server tiller to manage Charts. Charts submitted for Kubernetes are available here.

Monitoring and logging

In Kubernetes, we have to collect resource usage data for Pods, Services, nodes, etc., to understand the overall resource consumption and to make scaling decisions for a given application. Two popular Kubernetes monitoring solutions are Heapster (since retired in favor of metrics-server) and Prometheus.

Heapster is a cluster-wide aggregator of monitoring and event data, which is natively supported on Kubernetes.

Prometheus, now part of CNCF (Cloud Native Computing Foundation), can also be used to scrape the resource usage from different Kubernetes components and objects. Using its client libraries, we can also instrument the code of our application.

Another important aspect for troubleshooting and debugging is Logging, in which we collect the logs from different components of a given system. In Kubernetes, we can collect logs from different cluster components, objects, nodes, etc. The most common way to collect the logs is using Elasticsearch, which uses fluentd with custom configuration as an agent on the nodes. fluentd is an open source data collector, which is also part of CNCF.

cAdvisor is an open source container resource usage and performance analysis agent. It auto-discovers all containers on a node and collects CPU, memory, file system, and network usage statistics. It provides overall machine usage by analyzing the "root" container on the machine. It exposes a simple UI for local containers on port 4194.

Security

Configure network policies

A Network Policy is a specification of how groups of pods are allowed to communicate with each other and other network endpoints.

NetworkPolicy resources use labels to select pods and define rules which specify what traffic is allowed to the selected pods.

  • Specification of how groups of pods may communicate
  • Use labels to select pods and define rules
  • Implemented by the network plugin
  • Pods are non-isolated by default
  • Pods are isolated when a Network Policy selects them
Example NetworkPolicy

Create a "default" isolation policy for a namespace by creating a NetworkPolicy that selects all pods but does not allow any ingress traffic to those pods:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
  - Ingress

TLS certificates for cluster components

Get easy-rsa.

$ ./easyrsa init-pki
$ MASTER_IP=10.100.1.2
$ ./easyrsa --batch "--req-cn=${MASTER_IP}@`date +%s`" build-ca nopass
$ cat rsa-request.sh
#!/bin/bash
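# MASTER_IP must be set in this script's environment (e.g., "export MASTER_IP=10.100.1.2").
# Note: no space before the continuation backslashes, so the quoted pieces join into a single argument.
./easyrsa --subject-alt-name="IP:${MASTER_IP},"\
"DNS:kubernetes,"\
"DNS:kubernetes.default,"\
"DNS:kubernetes.default.svc,"\
"DNS:kubernetes.default.svc.cluster,"\
"DNS:kubernetes.default.svc.cluster.local" \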
--days=10000 \
build-server-full server nopass
pki/
├── ca.crt
├── certs_by_serial
│   └── F3A6F7D34BC84330E7375FA20C8441DF.pem
├── index.txt
├── index.txt.attr
├── index.txt.old
├── issued
│   └── server.crt
├── private
│   ├── ca.key
│   └── server.key
├── reqs
│   └── server.req
├── serial
└── serial.old
  • Find the paths of the existing TLS certs/keys with the following command:
$ ps aux | grep [a]piserver | sed -n -e 's/^.*\(kube-apiserver \)/\1/p' | tr ' ' '\n'
kube-apiserver
--admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota
--requestheader-extra-headers-prefix=X-Remote-Extra-
--advertise-address=172.31.118.138
--kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-username-headers=X-Remote-User
--service-cluster-ip-range=10.96.0.0/12
--kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
--secure-port=6443
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--requestheader-group-headers=X-Remote-Group
--requestheader-allowed-names=front-proxy-client
--service-account-key-file=/etc/kubernetes/pki/sa.pub
--insecure-port=0
--enable-bootstrap-token-auth=true
--allow-privileged=true
--client-ca-file=/etc/kubernetes/pki/ca.crt
--tls-cert-file=/etc/kubernetes/pki/apiserver.crt
--tls-private-key-file=/etc/kubernetes/pki/apiserver.key
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--authorization-mode=Node,RBAC
--etcd-servers=http://127.0.0.1:2379
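Since only the certificate and key paths matter here, the flag list can be narrowed down. The helper below is a sketch (the name cert_flags is made up). It reads one flag per line on stdin, as produced by the tr step above, and keeps only the flags whose value ends in .crt, .key, or .pub.

```shell
cert_flags() {
  # Keep --flag=value lines whose value looks like a cert/key file,
  # then print "flag path" pairs sorted by flag name.
  grep -E '^--[a-z-]+=.*\.(crt|key|pub)$' | sed -e 's/^--//' -e 's/=/ /' | sort
}
```

It would be appended to the pipeline above, i.e., "... | tr ' ' '\n' | cert_flags".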

Security Contexts

A Security Context defines privilege and access control settings for a Pod or Container. Security context settings include:

  • Discretionary Access Control: Permission to access an object, like a file, is based on user ID (UID) and group ID (GID).
  • Security Enhanced Linux (SELinux): Objects are assigned security labels.
  • Running as privileged or unprivileged.
  • Linux Capabilities: Give a process some privileges, but not all the privileges of the root user.
  • AppArmor: Use program profiles to restrict the capabilities of individual programs.
  • Seccomp: Filter a process's system calls.
  • AllowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process. This boolean directly controls whether the no_new_privs flag gets set on the container process. AllowPrivilegeEscalation is always true when the container: 1) is run as Privileged; or 2) has CAP_SYS_ADMIN.
Example #1
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000
  volumes:
  - name: sec-ctx-vol
    emptyDir: {}
  containers:
  - name: sec-ctx-demo
    image: gcr.io/google-samples/node-hello:1.0
    volumeMounts:
    - name: sec-ctx-vol
      mountPath: /data/demo
    securityContext:
      allowPrivilegeEscalation: false
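The capability and non-root settings from the list above are set per container. The manifest below is a sketch only: the Pod name, file name, and the choice of NET_BIND_SERVICE are assumptions made for illustration.

```shell
# Write a hypothetical Pod manifest that drops all capabilities and adds one back.
cat <<'EOF' > security-context-caps.yml
apiVersion: v1
kind: Pod
metadata:
  name: security-context-caps
spec:
  containers:
  - name: sec-ctx-caps
    image: gcr.io/google-samples/node-hello:1.0
    securityContext:
      runAsNonRoot: true            # refuse to start if the container would run as UID 0
      runAsUser: 1000
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]               # start from no capabilities ...
        add: ["NET_BIND_SERVICE"]   # ... and add back only what is needed
EOF
# To try it on a live cluster:
#   kubectl apply -f security-context-caps.yml
```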

Taints and tolerations

Node affinity is a property of pods that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite – they allow a node to repel a set of pods.

Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks the node such that the node should not accept any pods that do not tolerate the taints. Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints.
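As a concrete sketch: suppose a node is tainted with dedicated=batch:NoSchedule (the key, value, node name, and file name here are invented for illustration). Only Pods that carry a matching toleration, like the one written out below, remain schedulable on that node.

```shell
# Hypothetical taint on a node (needs a live cluster, hence commented out):
#   kubectl taint nodes k8s-worker-02 dedicated=batch:NoSchedule
cat <<'EOF' > toleration-pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "batch"
    effect: "NoSchedule"
EOF
#   kubectl apply -f toleration-pod.yml
# Remove the taint again (note the trailing "-"):
#   kubectl taint nodes k8s-worker-02 dedicated=batch:NoSchedule-
```

The toleration only permits scheduling onto the tainted node. To also steer the Pod there, it would need to be combined with node affinity or a nodeSelector.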

Remove a node from a cluster

  • On the k8s Master Node:
k8s-master> $ kubectl drain k8s-worker-02 --ignore-daemonsets
  • On the k8s Worker Node (the one you wish to remove from the cluster):
k8s-worker-02> $ kubeadm reset
[preflight] Running pre-flight checks.
[reset] Stopping the kubelet service.
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers.
[reset] No etcd manifest found in "/etc/kubernetes/manifests/etcd.yaml". Assuming external etcd.
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

Networking

Useful network ranges
  • Choose ranges for the Pods and Service CIDR blocks
  • Generally, any of the RFC-1918 ranges work well
    • 10.0.0.0/8
    • 172.16.0.0/12
    • 192.168.0.0/16
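When picking CIDR blocks, it can help to sanity-check that an address really falls inside one of the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The function below is a minimal sketch: the name is_rfc1918 is made up, and it assumes a well-formed dotted-quad IPv4 address as input.

```shell
# Return 0 if the IPv4 address in $1 is inside an RFC 1918 private range.
is_rfc1918() {
  o1=${1%%.*}        # first octet
  rest=${1#*.}
  o2=${rest%%.*}     # second octet
  case "$o1" in
    10)  return 0 ;;                               # 10.0.0.0/8
    172) [ "$o2" -ge 16 ] && [ "$o2" -le 31 ] ;;   # 172.16.0.0/12
    192) [ "$o2" -eq 168 ] ;;                      # 192.168.0.0/16
    *)   return 1 ;;
  esac
}
```

For example, "is_rfc1918 10.96.0.1 && echo private" prints "private", while an address such as 172.32.0.1 is rejected.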

Every Pod can communicate directly with every other Pod

K8s Node
  • A general purpose compute that has at least one interface
    • The host OS will have a real-world IP for accessing the machine
    • K8s Pods are given virtual interfaces connected to an internal
    • Each node has a running network stack
  • Kube-proxy runs in the OS to control IPtables for:
    • Services
    • NodePorts
Networking substrate
  • Most k8s network stacks allocate subnets for each node
    • The network stack is responsible for arbitration of subnets and IPs
    • The network stack is also responsible for moving packets around the network
  • Pods have a unique, routable IP on the Pod CIDR block
    • The CIDR block is not accessed from outside the k8s cluster
    • The magic of IPtables allows the Pods to make outgoing connections
  • Ensure that k8s has the correct Pods and Service CIDR blocks

The Pod network is not seen on the physical network (i.e., it is encapsulated; you will not be able to use tcpdump on it from the physical network)

Making the setup easier — CNI
  • Use the Container Network Interface (CNI)
  • Relieves k8s from having to have a specific network configuration
  • It is activated by supplying --network-plugin=cni, --cni-conf-dir, --cni-bin-dir to kubelet
    • Typical configuration directory: /etc/cni/net.d
    • Typical bin directory: /opt/cni/bin
  • Allows for multiple backends to be used: linux-bridge, macvlan, ipvlan, Open vSwitch, network stacks
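For reference, a file in the CNI configuration directory might look roughly like the following, based on the reference bridge and host-local plugins. This is a sketch: the network name, bridge name, and subnet are made-up values, and the file is written to the current directory rather than to /etc/cni/net.d.

```shell
# Write a hypothetical CNI network configuration for the reference "bridge" plugin.
cat <<'EOF' > 10-mynet.conf
{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
EOF
# On a node, a file like this would live in the directory given by --cni-conf-dir.
```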
Kubernetes services
  • Services are crucial for service discovery and distributing traffic to Pods
  • Services act as simple internal load balancers with VIPs
    • No access controls
    • No traffic controls
  • IPtables magically route to virtual IPs
  • Internally, Services are used as inter-Pod service discovery
    • Kube-DNS publishes a DNS record (e.g., nginx.default.svc.cluster.local)
  • Services can be exposed in three different ways:
    1. ClusterIP
    2. LoadBalancer
    3. NodePort
kube-proxy
  • Each k8s node in the cluster runs a kube-proxy
  • Two modes: userspace and iptables
    • iptables is much more performant (userspace should no longer be used)
  • kube-proxy has the task of configuring iptables to expose each k8s service
    • iptables rules distribute traffic randomly across the endpoints

Network providers

In order for a CNI plugin to be considered a "Network Provider", it must provide (at the very least) the following:

  1. All containers can communicate with all other containers without NAT
  2. All nodes can communicate with all containers (and vice versa) without NAT
  3. The IP that a container sees itself as is the same IP that others see it as

Linux namespaces

  • Control group (cgroups)
  • Union File Systems

Kubernetes inbound node port requirements

Protocol | Direction | Port range  | Purpose                                               | Used by              | Notes
Master node(s)
TCP      | Inbound   | 4194        | Default cAdvisor port used to query container metrics | (optional)           | Security risk
TCP      | Inbound   | 6443*       | Kubernetes API server                                 | All                  |
TCP      | Inbound   | 2379-2380   | etcd server client API                                | kube-apiserver, etcd |
TCP      | Inbound   | 10250       | Kubelet API                                           | Self, Control plane  |
TCP      | Inbound   | 10251       | kube-scheduler                                        | Self                 |
TCP      | Inbound   | 10252       | kube-controller-manager                               | Self                 |
TCP      | Inbound   | 10255       | Read-only Kubelet API                                 | (optional)           | Security risk
Worker node(s)
TCP      | Inbound   | 4194        | Default cAdvisor port used to query container metrics | (optional)           | Security risk
TCP      | Inbound   | 10250       | Kubelet API                                           | Self, Control plane  |
TCP      | Inbound   | 10255       | Read-only Kubelet API                                 | (optional)           | Security risk
TCP      | Inbound   | 30000-32767 | NodePort Services**                                   | All                  |


** Default port range for NodePort Services.

Any port numbers marked with * are overridable, so you will need to ensure any custom ports you provide are also open.

Although etcd ports are included in master nodes, you can also host your own etcd cluster externally or on custom ports.

The pod network plugin you use (see below) may also require certain ports to be open. Since this differs with each pod network plugin, please see the documentation for the plugins about what port(s) those need.

API versions

Below is a table showing which value to use for the apiVersion key for a given k8s primitive (note: all values are for k8s 1.8.0, unless otherwise specified):

Primitive apiVersion
Pod v1
Deployment apps/v1beta2
Service v1
Job batch/v1
Ingress extensions/v1beta1
CronJob batch/v1beta1
ConfigMap v1
DaemonSet apps/v1
ReplicaSet apps/v1beta2
NetworkPolicy networking.k8s.io/v1


You can get a list of all of the API versions supported by your k8s install with:

$ kubectl api-versions

Troubleshooting

$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c kubedns
$ kubectl logs ${POD_NAME} ${CONTAINER_NAME}
  • If your container has previously crashed, you can access the previous container’s crash log with:
$ kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
$ kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}

Miscellaneous commands

  • Simple workflow (not a best practice; use manifest files {YAML} instead):
$ kubectl create deployment nginx --image=nginx:1.10.0
$ kubectl expose deployment nginx --port 80 --type LoadBalancer
$ kubectl get services  # <- wait until public IP is assigned
$ kubectl scale deployment nginx --replicas 3
  • Create an Nginx deployment with three replicas without using YAML:
$ kubectl create deployment nginx --image=nginx --replicas=3
  • Take a node out of service for maintenance:
$ kubectl cordon k8s.worker1.local
$ kubectl drain k8s.worker1.local --ignore-daemonsets
  • Return a given node to a service after cordoning and "draining" it (e.g., after a maintenance):
$ kubectl uncordon k8s.worker1.local
  • Get a list of nodes in a format useful for scripting:
$ kubectl get nodes -o jsonpath='{.items[*].metadata.name}'
#~OR~
$ kubectl get nodes -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}'
#~OR~
$ kubectl get nodes -o json | jq -crM '.items[].metadata.name'
#~OR~ (if using an older version of `jq`)
$ kubectl get nodes -o json | jq '.items[].metadata.name' | tr -d '"'
  • Label a list of nodes:
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  kubectl label nodes ${node} instancetype=ondemand;
  kubectl label nodes ${node} "example.io/node-lifecycle"=od;
done
  • Delete a bunch of Pods in "Evicted" state:
$ kubectl get pod -n develop | awk '/Evicted/{print $1}' | xargs kubectl delete pod -n develop
#~OR~
$ kubectl get po --all-namespaces -o json | \
    jq  '.items[] | select(.status.reason!=null) | select(.status.reason | contains("Evicted")) | 
    "kubectl delete po \(.metadata.name) -n \(.metadata.namespace)"' | xargs -n 1 bash -c
  • Get a random node:
$ NODES=($(kubectl get nodes -o json | jq -crM '.items[].metadata.name'))
$ NUMNODES=${#NODES[@]}
$ echo ${NODES[$(( RANDOM % NUMNODES ))]}
  • Get all recent events sorted by their timestamps:
$ kubectl get events --sort-by='.metadata.creationTimestamp'
  • Get a list of all Pods in the default namespace sorted by Node:
$ kubectl get po -o wide --sort-by=.spec.nodeName
  • Get the cluster IP for a service named "foo":
$ kubectl get svc/foo -o jsonpath='{.spec.clusterIP}'
  • List all Services in a cluster and their node ports:
$ kubectl get --all-namespaces svc -o json |\
    jq -r '.items[] | [.metadata.name,([.spec.ports[].nodePort | tostring ] | join("|"))] | @csv'
  • Print just the Pod names of those Pods with the label app=nginx:
$ kubectl get --no-headers=true pods -l app=nginx -o custom-columns=:metadata.name
#~OR~
$ kubectl get pods -l app=nginx -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}'
#~OR~
$ kubectl get --no-headers=true pods -l app=nginx -o name | awk -F "/" '{print $2}'
#~OR~
$ kubectl get pods -l app=nginx -o jsonpath='{.items[*].metadata.name}'
#~OR~
$ kubectl get pods -l app=nginx -o json | jq -crM '.items [] | .metadata.name'
  • Get a list of all container images used by the Pods in your default namespace:
$ kubectl get pods -o go-template --template='{{range .items}}{{range .spec.containers}}{{.image}}{{"\n"}}{{end}}{{end}}'
#~OR~
$ kubectl get pods -o go-template="{{range .items}}{{range .spec.containers}}{{.image}}|{{end}}{{end}}" | tr '|' '\n'
  • Get a list of Pods sorted by Node name:
$ kubectl get po -o json | jq -r '.items | sort_by(.spec.nodeName)[] | [.spec.nodeName,.metadata.name] | @tsv'
  • List all Services in a cluster with their endpoints:
$ kubectl get endpoints --all-namespaces
  • Get status transitions of each Pod in the default namespace:
$ export tpl='{range .items[*]}{"\n"}{@.metadata.name}{range @.status.conditions[*]}{"\t"}{@.type}={@.status}{end}{end}'
$ kubectl get po -o jsonpath="${tpl}" && echo

cheddar-cheese-d6d6587c7-4bgcz	Initialized=True	Ready=True	PodScheduled=True
echoserver-55f97d5bff-pdv65	Initialized=True	Ready=True	PodScheduled=True
stilton-cheese-6d64cbc79-g7h4w	Initialized=True	Ready=True	PodScheduled=True
  • Get a list of all Pods in status "Failed":
$ kubectl get pods -o go-template='{{range .items}}{{if eq .status.phase "Failed"}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}'
  • Get all users in all namespaces:
$ kubectl get rolebindings --all-namespaces -o go-template \
   --template='{{range .items}}{{println}}{{.metadata.namespace}}={{range .subjects}}{{if eq .kind "User"}}{{.name}} {{end}}{{end}}{{end}}'
  • Get the memory limit assigned to a container in a given Pod:
$ kubectl get pod example-pod-name -n default \
  -o jsonpath="{.spec.containers[*].resources.limits}" 
  • Get a Bash prompt of your current context and namespace:
NORMAL="\[\033[00m\]"
BLUE="\[\033[01;34m\]"
RED="\[\e[1;31m\]"
YELLOW="\[\e[1;33m\]"
GREEN="\[\e[1;32m\]"
PS1_WORKDIR="\w"
PS1_HOSTNAME="\h"
PS1_USER="\u"

__kube_ps1()
{
    CONTEXT=$(kubectl config current-context)
    NAMESPACE=$(kubectl config view -o jsonpath="{.contexts[?(@.name==\"${CONTEXT}\")].context.namespace}")
    if [ -z "$NAMESPACE" ]; then
        NAMESPACE="default"
    fi
    if [ -n "$CONTEXT" ]; then
        case "$CONTEXT" in
          *prod*)
            echo "${RED}(⎈ ${CONTEXT} - ${NAMESPACE})"
            ;;
          *test*)
            echo "${YELLOW}(⎈ ${CONTEXT} - ${NAMESPACE})"
            ;;
          *)
            echo "${GREEN}(⎈ ${CONTEXT} - ${NAMESPACE})"
            ;;
        esac
    fi
}

export PROMPT_COMMAND='PS1="${GREEN}${PS1_USER}@${PS1_HOSTNAME}${NORMAL}:$(__kube_ps1)${BLUE}${PS1_WORKDIR}${NORMAL}\$ "'

Client configuration

  • Set up autocomplete in bash (the bash-completion package must be installed first):
$ source <(kubectl completion bash)
  • View Kubernetes config:
$ kubectl config view
  • View specific config items by JSON path:
$ kubectl config view -o jsonpath='{.users[?(@.name == "k8s")].user.password}'
  • Set credentials for foo.kubernetes.com:
$ kubectl config set-credentials kubeuser/foo.kubernetes.com --username=kubeuser --password=kubepassword

Viewing / finding resources

  • List all services in the namespace:
$ kubectl get services
  • List all pods in all namespaces in wide format:
$ kubectl get pods -o wide --all-namespaces
  • List all pods in JSON (or YAML) format:
$ kubectl get pods -o json
  • Describe resource details (node, pod, svc):
$ kubectl describe nodes my-node
  • List services sorted by name:
$ kubectl get services --sort-by=.metadata.name
  • List pods sorted by restart count:
$ kubectl get pods --sort-by='.status.containerStatuses[0].restartCount'
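  • Rolling update pods for frontend-v1 (ReplicationControllers only; rolling-update was removed in kubectl 1.18, so use a Deployment with kubectl rollout on newer versions):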
$ kubectl rolling-update frontend-v1 -f frontend-v2.json
  • Scale a ReplicaSet named "foo" to 3:
$ kubectl scale --replicas=3 rs/foo
  • Scale a resource specified in "foo.yaml" to 3:
$ kubectl scale --replicas=3 -f foo.yaml
  • Execute a command in every pod / replica:
$ for i in 0 1; do kubectl exec foo-$i -- sh -c 'echo $(hostname) > /usr/share/nginx/html/index.html'; done
  • Get a list of all container IDs running in all Pods in all namespaces for a given Kubernetes cluster:
$ kubectl get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{"pod: "}{.metadata.name}{"\n"}{range .status.containerStatuses[*]}{"\tname: "}{.containerID}{"\n\timage: "}{.image}{"\n"}{end}{end}'

# Example output:
pod: cert-manager-848f547974-8m2k6
        name: containerd://358415173310a528a36ca2c19cdc3319f8fd96634c09957977767333b104d387
        image: quay.io/jetstack/cert-manager-controller:v1.5.3

Manage resources

  • Get documentation for a resource type (e.g., pods or services):
$ kubectl explain pods
$ kubectl explain svc
  • Create resource(s) like pods, services or DaemonSets:
$ kubectl create -f ./my-manifest.yaml
  • Apply a configuration to a resource:
$ kubectl apply -f ./my-manifest.yaml
  • Start a single instance of Nginx:
$ kubectl run nginx --image=nginx
  • Create a secret with several keys:
$ cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Secret
metadata:
 name: mysecret
type: Opaque
data:
 password: $(echo -n "s33msi4" | base64)
 username: $(echo -n "jane" | base64)
EOF
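Values under data: must be the base64 encoding of the exact bytes of the secret. A plain echo appends a newline, which then ends up inside the stored value; printf '%s' (or echo -n) avoids that. The helper name encode_secret below is made up for this illustration.

```shell
# Encode a value for a Secret's "data:" field without a trailing newline.
encode_secret() {
  printf '%s' "$1" | base64
}

encode_secret "jane"     # amFuZQ==
echo "jane" | base64     # amFuZQo=   (the trailing newline got encoded too)
```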
  • Delete a resource:
$ kubectl delete -f ./my-manifest.yaml

Monitoring and logging

  • Deploy Heapster from Github repository:
$ kubectl create -f deploy/kube-config/standalone/
  • Show metrics for nodes:
$ kubectl top node
  • Show metrics for pods:
$ kubectl top pod
  • Show metrics for a given pod and its containers:
$ kubectl top pod pod_name --containers
  • Dump pod logs (STDOUT):
$ kubectl logs pod_name
  • Stream pod container logs (STDOUT, multi-container case):
$ kubectl logs -f pod_name -c my-container


Run tcpdump on containers running in Pods

  • Find which node/host/IP the Pod in question is running on and also get the container ID:
$ kubectl describe pod busybox | grep -E "^Node:|Container ID: "
Node:         worker2/10.39.32.122
    Container ID:  docker://a42cd31e62a905739b52d36b30eca5521fd250ac54280b43423027426b031a03

#~OR~

$ containerID=$(kubectl get po busybox -o jsonpath='{.status.containerStatuses[*].containerID}' | sed -e 's|docker://||g')
$ hostIP=$(kubectl get po busybox -o jsonpath='{.status.hostIP}')

Log into the node/host running the Pod in question and then perform the following steps.

  • Get the virtual interface ID (note it will depend on which Container Network Interface you are using {e.g., veth, cali, etc.}):
$ docker exec a42cd31e62a905739b52d36b30eca5521fd250ac54280b43423027426b031a03 /bin/sh -c 'cat /sys/class/net/eth0/iflink'
12

# List all non-virtual interfaces:
$ for iface in $(find /sys/class/net/ -type l ! -lname '*/devices/virtual/net/*' -printf '%f '); do echo "$iface is not virtual"; done
ens192 is not virtual

# Check if we are using veth or cali or something else:
$ ls -1 /sys/class/net/ | awk '!/docker|lo|ens/{print substr($0,0,4);exit}'
cali

$ for i in /sys/class/net/veth*/ifindex; do grep -l 12 $i; done
#~OR~
$ for i in /sys/class/net/cali*/ifindex; do grep -l 12 $i; done
/sys/class/net/cali12d4a061371/ifindex
#~OR~
$ echo $(find /sys/class/net/ -type l -lname '*/devices/virtual/net/*' -exec grep -l 12 {}/ifindex \;) | awk -F'/' '{print $5}'
cali12d4a061371
#~OR~
$ ip link | grep ^12
12: cali12d4a061371@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP mode DEFAULT group default
#~OR~
$ ip link | awk '/^12/{print $2}' | awk -F'@' '{print $1}'
cali12d4a061371
  • Now run tcpdump on this virtual interface (note: make sure you are running tcpdump on the same host as the Pod is running on):
$ sudo tcpdump -i cali12d4a061371
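Note that the grep -l 12 loops above match any ifindex file that merely contains "12" (e.g., 112 or 120). An exact comparison is safer on busy nodes. The function below is a sketch: the name find_iface_by_index and the optional second argument (an alternative sysfs root, useful for dry runs) are assumptions for illustration.

```shell
# Print the host-side interface whose ifindex equals $1 exactly.
# $2 optionally overrides the sysfs directory (defaults to /sys/class/net).
find_iface_by_index() {
  want="$1"
  root="${2:-/sys/class/net}"
  for f in "$root"/*/ifindex; do
    [ -r "$f" ] || continue
    if [ "$(cat "$f")" = "$want" ]; then
      basename "$(dirname "$f")"
      return 0
    fi
  done
  return 1   # no interface with that index
}
```

With the iflink value from the container (12 in the example above), "sudo tcpdump -i $(find_iface_by_index 12)" would then capture on the matching interface.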
Self-signed certificates

If you are using the latest version of kubectl and are running it against a k8s cluster built with a self-signed cert, you can get around any "x509" errors with:

$ export GODEBUG=x509ignoreCN=0

API resources

  • Get a list of all the resource types and their latest supported version:
$ time for kind in $(kubectl api-resources | tail -n +2 | awk '{print $1}'); do
    kubectl explain ${kind};
  done | grep -E "^KIND:|^VERSION:"

KIND:     Binding
VERSION:  v1
KIND:     ComponentStatus
VERSION:  v1
KIND:     ConfigMap
VERSION:  v1
...

real	1m20.014s
user	0m52.732s
sys	0m17.751s
  • Note: if you just want a version for a single/given kind:
$ kubectl explain deploy | head -2
KIND:     Deployment
VERSION:  apps/v1

kubectl-neat

See: https://github.com/itaysk/kubectl-neat
See: jq
  • To easily copy a certificate secret from one namespace to another namespace run:
$ SOURCE_NAMESPACE=<update-me>
$ DESTINATION_NAMESPACE=<update-me>
$ kubectl -n ${SOURCE_NAMESPACE} get secret kafka-client-credentials -o json |\
    kubectl neat |\
    jq 'del(.metadata["namespace"])' |\
    kubectl apply -n ${DESTINATION_NAMESPACE} -f -

Get CPU/memory for each node

for node in $(kubectl get nodes -o=jsonpath='{.items[*].metadata.name}'); do
  echo "NODE: ${node}"; kubectl describe node ${node} | grep -E '^  cpu |^  memory ';
done

Get vCPU capacity

$ kubectl get nodes -o=jsonpath="{range .items[*]}{.metadata.name}{\"\t\"} \
    {.status.capacity.cpu}{\"\n\"}{end}"

Miscellaneous examples

  • Create a Namespace:
kind: Namespace
apiVersion: v1
metadata:
  name: my-namespace
Testing the load balancing capabilities of a Service
  • Create a Deployment with two replicas of Nginx (i.e., 2 x Pods with identical containers, configuration, etc.):
$ cat << EOF >nginx-deploy.yml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx-deploy
spec:
  replicas: 2
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
EOF
$ kubectl create --validate -f nginx-deploy.yml
$ kubectl get deploy
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx-deploy   2         2         2            2           1h
$ kubectl get po
NAME                           READY     STATUS    RESTARTS   AGE
nginx-deploy-8d68fb6cc-bspt8   1/1       Running   1          1h
nginx-deploy-8d68fb6cc-qdvhg   1/1       Running   1          1h
  • Create a Service:
$ cat <<EOF | kubectl create -f -
kind: Service
apiVersion: v1
metadata:
  name: nginx-svc
spec:
  ports:
  - port: 8080
    targetPort: 80
    protocol: TCP
  selector:
    app: nginx
EOF

$ kubectl get svc/nginx-svc
NAME        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
nginx-svc   ClusterIP   10.101.133.100   <none>        8080/TCP   1h
  • Overwrite the default index.html file (note: This is not persistent. The original default index.html file will be restored if the Pod fails and the Deployment brings up a new Pod and/or if you modify your Deployment {e.g., upgrade Nginx}. This is just for demonstration purposes):
$ kubectl exec -it nginx-deploy-8d68fb6cc-bspt8 -- sh -c 'echo "pod-01" > /usr/share/nginx/html/index.html'
$ kubectl exec -it nginx-deploy-8d68fb6cc-qdvhg -- sh -c 'echo "pod-02" > /usr/share/nginx/html/index.html'
  • Get the HTTP status code and server value from the header of a request to the Service endpoint:
$ curl -Is 10.101.133.100:8080 | grep -E '^HTTP|Server'
HTTP/1.1 200 OK
Server: nginx/1.7.9  # <- This is the version of Nginx we defined in the Deployment above
  • Perform a GET request on the Service endpoint (ClusterIP+Port):
$ for i in $(seq 1 10); do curl -s 10.101.133.100:8080; done
pod-02
pod-01
pod-02
pod-02
pod-02
pod-01
pod-02
pod-02
pod-02
pod-02

Sometimes pod-01 responded; sometimes pod-02 responded.

  • Perform a GET on the Service endpoint 10,000 times and sum up which Pod responded for each request:
$ time for i in $(seq 1 10000); do curl -s 10.101.133.100:8080; done | sort | uniq -c
   5018 pod-01  # <- number of times pod-01 responded to the request
   4982 pod-02  # <- number of times pod-02 responded to the request

real	1m0.639s
user	0m29.808s
sys	0m11.692s
$ awk 'BEGIN{print 5018/(5018+4982);}'
0.5018
$ awk 'BEGIN{print 4982/(5018+4982);}'
0.4982

So, our Service is "load balancing" our two Nginx Pods in a roughly 50/50 fashion.

In order to double-check that the Service is randomly selecting a Pod to serve the GET request, let's scale our Deployment from 2 to 3 replicas:

$ kubectl scale deploy/nginx-deploy --replicas=3
$ time for i in $(seq 1 10000); do curl -s 10.101.133.100:8080; done | sort | uniq -c
   3392 pod-01
   3335 pod-02
   3273 pod-03

real	0m59.537s
user	0m25.932s
sys	0m9.656s
$ awk 'BEGIN{print 3392/(3392+3335+3273);}'
0.3392
$ awk 'BEGIN{print 3335/(3392+3335+3273);}'
0.3335
$ awk 'BEGIN{print 3273/(3392+3335+3273);}'
0.3273

Sure enough. Each of the 3 Pods is serving the GET request roughly 33% of the time.
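The per-Pod awk divisions above can be folded into a single filter. The helper below is a sketch (the name hit_share is made up). It reads "sort | uniq -c" output on stdin and prints each Pod's share of the total.

```shell
# Turn "sort | uniq -c" output (count, then Pod name) into per-Pod fractions.
hit_share() {
  LC_ALL=C awk '
    { count[$2] = $1; total += $1 }
    END { for (pod in count) printf "%s %.4f\n", pod, count[pod] / total }
  ' | sort
}
```

It would be appended to the measurement loop, i.e., "... | sort | uniq -c | hit_share", which for the two-replica run above yields "pod-01 0.5018" and "pod-02 0.4982".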

Query selections
  • Create a "query selection" file:
$ cat << EOF >cluster-nodes-health.txt
Name Kernel InternalIP MemoryPressure DiskPressure PIDPressure Ready
.metadata.name .status.nodeInfo.kernelVersion .status.addresses[0].address .status.conditions[0].status .status.conditions[1].status .status.conditions[2].status .status.conditions[3].status
EOF
  • Use the above "query selection" file:
$ kubectl get nodes -o custom-columns-file=cluster-nodes-health.txt
Name           Kernel           InternalIP     MemoryPressure   DiskPressure   PIDPressure   Ready
10.10.10.152   5.4.0-1084-aws   10.10.10.152   False            False          False         False
10.10.11.12    5.4.0-1092-aws   10.10.11.12    False            False          False         False
10.10.12.22    5.4.0-1039-aws   10.10.12.22    False            False          False         False

Example YAML files

  • Basic Pod using busybox:
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
  • Basic Pod using busybox, which also prints out environment variables (including the ones defined in the YAML):
apiVersion: v1
kind: Pod
metadata:
  name: env-dump
spec:
  containers:
  - name: busybox
    image: busybox
    command:
      - env
    env:
    - name: USERNAME
      value: "Christoph"
    - name: PASSWORD
      value: "mypassword"
$ kubectl logs env-dump
...
PASSWORD=mypassword
USERNAME=Christoph
...
  • Basic Pod using alpine:
kind: Pod
apiVersion: v1
metadata:
  name: alpine
  namespace: default
spec:
  containers:
  - name: alpine
    image: alpine
    command:
      - /bin/sh
      - "-c"
      - "sleep 60m"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
  • Basic Pod running Nginx:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx
    image: nginx
  restartPolicy: Always
  • Create a Job that calculates pi up to 2000 decimal places:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
  • Create a Deployment with two replicas of Nginx running:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2 
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.9.1
        ports:
        - containerPort: 80
  • Create a basic Persistent Volume, which uses NFS:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mypv
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /var/nfs/general
    server: 172.31.119.58
    readOnly: false
  • Create a Persistent Volume Claim against the above PV:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  • Create a Pod using a custom scheduler (i.e., not the default one):
apiVersion: v1
kind: Pod
metadata:
  name: my-custom-scheduler
  annotations:
    scheduledBy: custom-scheduler
spec:
  schedulerName: custom-scheduler
  containers:
  - name: pod-container
    image: k8s.gcr.io/pause:2.0

Install k8s cluster manually in the Cloud

Note: For this example, I will be using AWS and I will assume you already have 3 x EC2 instances running CentOS 7 in your AWS account. I will install Kubernetes 1.10.x.

  • Disable services not supported (yet) by Kubernetes:
$ sudo setenforce 0 # NOTE: Not persistent!
#~OR~ Make persistent:
$ sudo sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config

$ sudo systemctl stop firewalld
$ sudo systemctl mask firewalld
$ sudo yum install -y iptables-services
  • Disable swap:
$ sudo swapoff -a  # NOTE: Not persistent!
#~OR~ Make persistent:
$ sudo vi /etc/fstab  # comment out swap line
$ sudo mount -a
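The manual /etc/fstab edit can also be scripted. The following is a sketch only: it runs against a temporary copy with made-up sample entries (the device names are hypothetical), and it assumes swap entries have the word "swap" as a whitespace-separated field. On a real node you would point the same sed expression at /etc/fstab with sudo, after checking the backup it leaves behind.

```shell
# Hypothetical sample fstab; on a real node the target would be /etc/fstab.
fstab=$(mktemp)
cat << 'EOF' > "$fstab"
/dev/mapper/centos-root /     xfs  defaults 0 0
/dev/mapper/centos-swap swap  swap defaults 0 0
EOF

# Comment out every active (non-commented) line that has "swap" as a field.
# -i.bak edits in place and keeps the original as "$fstab.bak".
sed -E -i.bak '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
cat "$fstab"
```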
  • Make sure routed traffic does not bypass iptables:
$ cat << EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
$ sudo sysctl --system
  • Install kubelet, kubeadm, and kubectl on all nodes in your cluster (both Master and Worker nodes):
$ cat << EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
$ sudo yum install -y kubelet kubeadm kubectl
$ sudo systemctl enable kubelet && sudo systemctl start kubelet
  • Configure cgroup driver used by kubelet on all nodes (both Master and Worker nodes):

Make sure that the cgroup driver used by kubelet is the same as the one used by Docker. Verify that your Docker cgroup driver matches the kubelet config:

$ docker info | grep -i cgroup
$ grep -i cgroup /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

If the Docker cgroup driver and the kubelet config do not match, change the kubelet config to match the Docker cgroup driver. The flag you need to change is --cgroup-driver. If it is already set, you can update like so:

$ sudo sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

Otherwise, you will need to open the systemd file and add the flag to an existing environment line.

Then restart kubelet:

$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet
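The cgroup driver comparison can be done mechanically. The sketch below works on two sample lines (assumed, not captured from a real node) that mimic what the docker info and grep commands above print; on a real node you would feed it the actual output. The exact format of both lines can differ between Docker and kubeadm versions, so treat the patterns as a starting point.

```shell
# Sample inputs (assumed); on a real node these would come from:
#   docker info | grep -i cgroup
#   grep -i cgroup /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
docker_line='Cgroup Driver: cgroupfs'
kubelet_line='Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"'

# Keep only the driver name from each line.
docker_driver=$(printf '%s\n' "$docker_line" | sed -E 's/.*Cgroup Driver:[[:space:]]*//')
kubelet_driver=$(printf '%s\n' "$kubelet_line" | sed -E 's/.*--cgroup-driver=([a-z]+).*/\1/')

if [ "$docker_driver" = "$kubelet_driver" ]; then
  verdict="match"
else
  verdict="mismatch: docker=$docker_driver kubelet=$kubelet_driver"
fi
echo "$verdict"
```

A "mismatch" verdict is the case in which the --cgroup-driver flag needs to be changed.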
  • Run kubeadm on Master node:

K8s requires a pod network to function. We are going to use Flannel, so we need to pass kubeadm init the Pod network CIDR that Flannel uses by default (10.244.0.0/16), so that k8s knows how to configure itself:

$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16

Note: This command might take a fair amount of time to complete.

Once it has completed, make note of the "join" command output by kubeadm init that looks something like the following (DO NOT RUN THE FOLLOWING COMMAND YET!):

# kubeadm join --token --discovery-token-ca-cert-hash sha256:

You will run that command on the other non-master nodes (aka the "Worker Nodes") to allow them to join the cluster. However, do not run that command on the worker nodes until you have completed all of the following steps.

  • Create a directory:
$ mkdir -p $HOME/.kube
  • Copy the configuration files to a location usable by the local user:
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config 
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
  • In order for your pods to communicate with one another, you will need to install pod networking. We are going to use Flannel for our Container Network Interface (CNI) because it is easy to install and reliable.
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
  • Make sure everything is coming up properly:
$ kubectl get pods --all-namespaces --watch

Once the kube-dns-xxxx containers are up (i.e., in Status "Running"), your cluster is ready to accept worker nodes.

  • On each of the Worker nodes, run the sudo kubeadm join ... command that kubeadm init created for you (see above).
  • On the Master Node, run the following command:
$ kubectl get nodes --watch

Once the Status of the Worker Nodes returns "Ready", your k8s cluster is ready to use.

  • Example output from a successfully created Kubernetes cluster:
$ kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
k8s-01    Ready     master    13m       v1.10.1
k8s-02    Ready     <none>    12m       v1.10.1
k8s-03    Ready     <none>    12m       v1.10.1

That's it! You are now ready to start deploying Pods, Deployments, Services, etc. in your Kubernetes cluster!

Bash completion

Note: The following only works on newer versions. I have tested that this works on version 1.9.1.

Add the following line to your ~/.bashrc file:

source <(kubectl completion bash)
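If you script your workstation setup, it helps to add that line only when it is missing, so repeated runs do not pile up duplicates. A minimal sketch, using a temporary file in place of ~/.bashrc (the rc_file variable and the add_completion helper are hypothetical names of mine):

```shell
# Hypothetical target file; point this at ~/.bashrc on a real machine.
rc_file=$(mktemp)
line='source <(kubectl completion bash)'

add_completion() {
  # -x: match the whole line, -F: fixed string (no regex interpretation).
  grep -qxF "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}

add_completion
add_completion   # the second call is a no-op
matches=$(grep -cxF "$line" "$rc_file")
echo "$matches"
```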

Kubectl plugins

SEE: Extend kubectl with plugins for details.

FEATURE STATE: Kubernetes v1.11 (alpha)
FEATURE STATE: Kubernetes v1.15 (stable)

This section shows you how to install and write extensions for kubectl. Usually called "plugins" or "binary extensions", this feature allows you to extend the default set of commands available in kubectl by adding new sub-commands to perform new tasks and extend the set of features available in the main distribution of kubectl.

Get code from here.

.kube/
└── plugins
    └── aging
        ├── aging.rb
        └── plugin.yaml
$ chmod 0700 .kube/plugins/aging/aging.rb
  • See options:
$ kubectl plugin aging --help
Aging shows pods from the current namespace by age.

Usage:
  kubectl plugin aging [flags] [options]
  • Usage:
$ kubectl plugin aging
The Magnificent Aging Plugin.

nginx-deployment-67594d6bf6-5t8m9: ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 6 hours and 8 minutes

nginx-deployment-67594d6bf6-6kw9j: ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 6 hours and 8 minutes

nginx-deployment-67594d6bf6-d8dwt: ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 6 hours and 8 minutes
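Note that the ~/.kube/plugins layout with a plugin.yaml and the kubectl plugin NAME invocation shown above belong to the earlier alpha mechanism. From v1.12 onward, kubectl discovers plugins as executables on your PATH whose names start with kubectl-, and runs them as kubectl NAME. The following is a minimal sketch of that newer style; the plugin name "hello" and the temporary directory are made up for illustration, and the script calls the executable directly so it does not need kubectl installed.

```shell
plugin_dir=$(mktemp -d)

# A plugin is any executable on PATH whose name starts with "kubectl-".
cat << 'EOF' > "$plugin_dir/kubectl-hello"
#!/bin/sh
echo "hello from a kubectl plugin"
EOF
chmod 0755 "$plugin_dir/kubectl-hello"

# kubectl would find it via PATH and run it as "kubectl hello";
# calling the executable directly shows the same output.
PATH="$plugin_dir:$PATH"
output=$(kubectl-hello)
echo "$output"
```

With kubectl installed, kubectl plugin list should show the new executable once its directory is on PATH.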

Local Kubernetes

Local Kubernetes Comparisons
Feature | kind | k3d | minikube | Docker Desktop | Rancher Desktop
Free | yes | yes | yes | Personal Small Business* | yes
Install | easy | easy | easy | easy | medium (you may encounter odd scenarios)
Ease of Use | medium | medium | medium | easy | easy
Stability | stable | stable | stable | stable | stable
Cross-platform | yes | yes | yes | yes | yes
CI Usage | yes | yes | yes | no | no
Multiple clusters | yes | yes | yes | no | no
Podman support | yes | yes | yes | no | no
Host volumes mount support | yes | yes | yes (with some performance limitations) | yes | yes (only pre-defined paths)
Kubernetes service port-forwarding/mapping | yes | yes | yes | yes | yes
Pull-through Docker mirror/proxy | yes | yes | no | yes (can reference locally available images) | yes (can reference locally available images)
Custom CNI | yes (ex: calico) | yes (ex: flannel) | yes (ex: calico) | no | no
Feature Gates | yes | yes | yes | yes (but not natively; requires hacky setup) | yes (but not natively; requires hacky setup)


Source

See also

External links

Playgrounds

Tools

  • minikube — Run Kubernetes locally
  • kind — Kubernetes IN Docker (local clusters for testing Kubernetes)
  • kops — Kubernetes Operations (kops) - Production Grade K8s Installation, Upgrades, and Management
  • kube-aws — a command-line tool to create/update/destroy Kubernetes clusters on AWS
  • kubespray — Deploy a production ready kubernetes cluster
  • Rook.io — File, Block, and Object Storage Services for your Cloud-Native Environments

Resources

Training

Blog posts