Kubernetes (k8s) is an open source container cluster manager. Its primary goal is to provide a platform for automating the deployment, scaling, and operation of application containers across a cluster of hosts. Google released Kubernetes 1.0 in July 2015.
- 1 Design overview
- 2 Components
- 3 Setup a Kubernetes cluster
- 4 Working with our Kubernetes cluster
- 4.1 Create and deploy pod definitions
- 4.2 Tags, labels, and selectors
- 4.3 Deployments
- 4.4 Multi-Pod (container) replication controller
- 4.5 Create and deploy service definitions
- 4.6 Creating temporary Pods at the CLI
- 4.7 Interacting with Pod containers
- 4.8 Logs
- 4.9 Autoscaling and scaling Pods
- 4.10 Failure and recovery
- 5 Minikube
- 6 External links
Design overview
Kubernetes is built through the definition of a set of components (building blocks or "primitives") which, when used collectively, provide a method for the deployment, maintenance, and scalability of container-based application clusters.
These "primitives" are designed to be loosely coupled (i.e., each can be used with little to no knowledge of how the other components are defined) and easily extensible through an API. The internal components of Kubernetes, as well as extensions and containers, all make use of this API.
The building blocks of Kubernetes are the following:
- Cluster
- A cluster is a set of machines (physical or virtual) on which your applications are managed and run. All machines are managed as a cluster (or set of clusters, depending on the topology used).
- Nodes (minions)
- You can think of these as "container clients". These are the individual hosts (physical or virtual) on which Docker is installed and which host the various containers within your managed cluster.
- Each node runs the kubelet (the agent that manages the pods on that node) as well as the Kubernetes Proxy. Cluster state is stored in etcd (a distributed key-value store, used by Kubernetes for exchanging messages and reporting on cluster status), which runs on the master in the setup described below.
- Pods
- A pod consists of one or more containers. Those containers are guaranteed (by the cluster controller) to be located on the same host machine (aka "co-located") in order to facilitate sharing of resources. For example, it makes sense to have database processes and data containers as close together as possible. In fact, they really should be in the same pod.
- Pods "work together", as in a multi-tiered application configuration. Each set of pods that defines and implements a service (like MySQL or Apache) is identified by a label selector.
- Pods are assigned unique IPs within each cluster. These allow an application to use ports without having to worry about conflicting port utilization.
- Pods can contain definitions of disk volumes or shares, and then provide access from those to all the members (containers) within the pod.
- Finally, pod management is done through the API or delegated to a controller.
- Labels
- Clients can attach "key-value pairs" to any object in the system (like Pods or Nodes). These become the labels that identify them in the configuration and management of them. The key-value pairs can be used to filter, organize, and perform mass operations on a set of resources.
- Selectors
- Label Selectors represent queries that are made against those labels. They resolve to the corresponding matching objects. A Selector expression matches labels to filter certain resources. For example, you may want to search for all pods that belong to a certain service, or find all containers that have a specific tier Label value as "database". Labels and Selectors are inherently two sides of the same coin. You can use Labels to classify resources and use Selectors to find them and use them for certain actions.
- Together, Labels and Selectors are the primary way that grouping is done in Kubernetes, and they determine which components a given operation applies to.
- Controllers
- These are used in the management of your cluster. Controllers are the mechanism by which your desired configuration state is enforced.
- Controllers manage a set of pods and, depending on the desired configuration state, may engage other controllers to handle replication and scaling (Replication Controller) of X number of containers and pods across the cluster. Controllers are also responsible for replacing any container in a pod that fails (based on the desired state of the cluster).
- Replication Controllers (RC) are a subset of Controllers and are an abstraction used to manage pod lifecycles. One of the key uses of RCs is to maintain a certain number of running Pods (e.g., for scaling or ensuring that at least one Pod is running at all times, etc.). It is considered a "best practice" to use RCs to define Pod lifecycles, rather than creating Pods directly.
- Other controllers that can be engaged include a DaemonSet Controller (enforces a 1-to-1 ratio of pods to minions) and a Job Controller (that runs pods to "completion", such as in batch jobs).
- The set of pods that a controller manages is determined by the label selectors that are part of its definition.
- Replica Sets
- These define how many replicas of each Pod will be running. They also monitor and ensure the required number of Pods are running, replacing Pods that die. Replica Sets can act as replacements for Replication Controllers.
- Services
- A Service is an abstraction on top of Pods, which provides a single IP address and DNS name by which the Pods can be accessed. This load balancing configuration is much easier to manage and helps scale Pods seamlessly.
- Kubernetes can then provide service discovery and handle routing through the stable IP assigned to each service, as well as load balancing (round robin based) connections to that service among the pods that match the indicated label selector.
- By default, a service is only exposed inside the cluster, but it can also be exposed outside the cluster as needed.
- Volumes
- A Volume is a directory with data, which is accessible to a container. The volume co-terminates with the Pod that encloses it.
- Name
- A name by which a resource is identified.
- Namespace
- A Namespace provides additional qualification to a resource name. This is especially helpful when multiple teams/projects are using the same cluster and there is a potential for name collision. You can think of a Namespace as a virtual wall between multiple clusters.
- Annotations
- An Annotation is a Label, but with much larger data capacity. Typically, this data is not readable by humans and is not easy to filter through. Annotation is useful only for storing data that may not be searched, but is required by the resource (e.g., storing strong keys, etc.).
- Control Plane
Pods
A Pod is the smallest and simplest Kubernetes object. It is the unit of deployment in Kubernetes, which represents a single instance of the application. A Pod is a logical collection of one or more containers, which:
- Are scheduled together on the same host
- Share the same network namespace
- Mount the same external storage (Volumes).
Pods are ephemeral in nature, and they cannot self-heal. That is why we use them with controllers, which handle a Pod's replication, fault tolerance, self-healing, etc. Examples of controllers are Deployments, ReplicaSets, ReplicationControllers, etc. We attach the Pod's specification to other objects using Pod Templates (see below).
Labels
Labels are key-value pairs that can be attached to any Kubernetes object (e.g., Pods). Labels are used to organize and select a subset of objects, based on the requirements in place. Many objects can have the same label(s). Labels do not provide uniqueness to objects.
Label Selectors
With Label Selectors, we can select a subset of objects. Kubernetes supports two types of Selectors:
- Equality-Based Selectors
- Equality-Based Selectors allow filtering of objects based on label keys and values. With this type of Selector, we can use the =, ==, or != operators. For example, with env==dev, we are selecting the objects where the "env" label is set to "dev".
- Set-Based Selectors
- Set-Based Selectors allow filtering of objects based on a set of values. With this type of Selector, we can use the in, notin, and exists operators. For example, with env in (dev,qa), we are selecting objects where the "env" label is set to "dev" or "qa".
Replication Controllers
A ReplicationController (rc) is a controller that is part of the Master Node's Controller Manager. It makes sure the specified number of replicas for a Pod is running at any given point in time. If there are more Pods than the desired count, the ReplicationController kills the extra Pods, and, if there are fewer Pods, the ReplicationController creates more Pods to match the desired count. Generally, we do not deploy a Pod independently, as it would not be able to restart itself if something goes wrong. We always use controllers like ReplicationController to create and manage Pods.
Replica Sets
A ReplicaSet (rs) is the next-generation ReplicationController. ReplicaSets support both equality- and set-based Selectors, whereas ReplicationControllers only support equality-based Selectors. As of January 2018, this is the only difference.
As an example, say you create a ReplicaSet where you define "desired replicas = 3". As long as "current==desired", nothing needs to happen. Any time "current!=desired" (e.g., one of the Pods dies), the ReplicaSet detects that the current state no longer matches the desired state. So, in our given scenario, the ReplicaSet will create one more Pod, thus ensuring that the current state matches the desired state.
ReplicaSets can be used independently, but they are mostly used by Deployments to orchestrate the Pod creation, deletion, and updates. A Deployment automatically creates the ReplicaSets, and we do not have to worry about managing them.
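The "current versus desired" behaviour described above can be sketched as a small reconciliation function. This is only an illustrative model, not the actual controller code: the `reconcile` function, the pod-name prefix, and the numeric suffixes are all made up for this example.

```python
import itertools
from typing import List

# Hypothetical counter used to give new pods unique names in this sketch.
_suffix = itertools.count(1)


def reconcile(running: List[str], desired: int, prefix: str = "nginx-www") -> List[str]:
    """Return the pod list after one reconciliation pass.

    Pods are created while there are too few and removed while there
    are too many, so that len(result) == desired.
    """
    pods = list(running)
    while len(pods) < desired:
        pods.append(f"{prefix}-{next(_suffix)}")  # "create" a replacement pod
    del pods[desired:]  # "kill" any extra pods beyond the desired count
    return pods


pods = reconcile([], desired=3)      # initial creation: 3 pods
pods.pop(0)                          # one pod dies
pods = reconcile(pods, desired=3)    # a replacement is created
print(len(pods))                     # 3
```

A real controller runs this comparison continuously against the state reported by the API server; the sketch only shows a single pass.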
Deployments
Deployment objects provide declarative updates to Pods and ReplicaSets. The DeploymentController is part of the Master Node's Controller Manager, and it makes sure that the current state always matches the desired state.
As an example, let's say we have a Deployment which creates a "ReplicaSet A". ReplicaSet A then creates 3 Pods. In each Pod, one of the containers uses the nginx:1.7.9 image.
Now, in the Deployment, we change the Pod's template and we update the image for the Nginx container from nginx:1.7.9 to nginx:1.9.1. As we have modified the Pod's template, a new "ReplicaSet B" gets created. This process is referred to as a "Deployment rollout". (A rollout is only triggered when we update the Pod's template for a Deployment. Operations like scaling the Deployment do not trigger a rollout.) Once ReplicaSet B is ready, the Deployment starts pointing to it.
On top of ReplicaSets, Deployments provide features like Deployment recording, with which, if something goes wrong, we can roll back to a previously known state.
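To make the rollout rules concrete, here is a toy model of a Deployment and its ReplicaSets. It is a sketch under simplifying assumptions (the class names, the "-rsN" naming, and the instant switch-over are invented for illustration; a real rolling update replaces Pods gradually). It shows that changing the image creates a new ReplicaSet, that scaling does not, and that a rollback reactivates the previous ReplicaSet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicaSet:
    name: str
    image: str
    replicas: int


@dataclass
class Deployment:
    name: str
    image: str
    replicas: int
    history: List[ReplicaSet] = field(default_factory=list)

    def __post_init__(self) -> None:
        self._rollout()  # creating the Deployment creates "ReplicaSet A"

    @property
    def active(self) -> ReplicaSet:
        return self.history[-1]

    def _rollout(self) -> None:
        # Scale every older ReplicaSet to zero, then add a new one.
        for rs in self.history:
            rs.replicas = 0
        name = f"{self.name}-rs{len(self.history) + 1}"
        self.history.append(ReplicaSet(name, self.image, self.replicas))

    def set_image(self, image: str) -> None:
        """A Pod template change: triggers a rollout."""
        if image != self.image:
            self.image = image
            self._rollout()

    def scale(self, replicas: int) -> None:
        """Scaling only resizes the active ReplicaSet: no rollout."""
        self.replicas = replicas
        self.active.replicas = replicas

    def rollback(self) -> None:
        """Reactivate the previous ReplicaSet."""
        if len(self.history) < 2:
            raise ValueError("no earlier revision to roll back to")
        previous = self.history.pop(-2)
        self.active.replicas = 0
        previous.replicas = self.replicas
        self.history.append(previous)
        self.image = previous.image


d = Deployment("nginx-deployment", "nginx:1.7.9", replicas=3)
d.scale(5)                   # still one ReplicaSet
d.set_image("nginx:1.9.1")   # second ReplicaSet created
d.rollback()                 # back to the nginx:1.7.9 ReplicaSet
print(d.active.image, d.active.replicas)
```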
Namespaces
If we have numerous users whom we would like to organize into teams/projects, we can partition the Kubernetes cluster into sub-clusters using Namespaces. The names of the resources/objects created inside a Namespace are unique, but not across Namespaces.
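The uniqueness rule can be pictured as a store keyed by namespace, kind, and name. The following is a minimal sketch, not how the API server is implemented; the `Cluster` class and its methods are hypothetical.

```python
from typing import Dict, List, Tuple


class Cluster:
    """Toy object store in which names are unique per namespace and kind."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str, str], dict] = {}

    def create(self, kind: str, name: str, namespace: str = "default") -> None:
        key = (namespace, kind, name)
        if key in self._objects:
            raise ValueError(f"{kind} {name!r} already exists in namespace {namespace!r}")
        self._objects[key] = {}

    def names(self, kind: str, namespace: str = "default") -> List[str]:
        return sorted(n for (ns, k, n) in self._objects if ns == namespace and k == kind)


cluster = Cluster()
cluster.create("Pod", "nginx")                   # goes into "default"
cluster.create("Pod", "nginx", namespace="dev")  # same name, other namespace: allowed
try:
    cluster.create("Pod", "nginx")               # same name, same namespace: rejected
except ValueError as err:
    print(err)
```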
To list all the Namespaces, we can run the following command:
$ kubectl get namespaces
NAME          STATUS    AGE
default       Active    2h
kube-public   Active    2h
kube-system   Active    2h
Generally, Kubernetes creates two default namespaces: kube-system and default. The kube-system namespace contains the objects created by the Kubernetes system. The default namespace contains the objects which do not belong to any other namespace. By default, we connect to the default Namespace. kube-public is a special namespace, which is readable by all users and used for special purposes, like bootstrapping a cluster.
Using Resource Quotas, we can divide the cluster resources within Namespaces.
Component services
The component services running on a standard master/node(s) Kubernetes setup are as follows:
- Kubernetes Master
- kube-apiserver
- Exposes Kubernetes APIs
- kube-controller-manager
- Runs controllers to handle nodes, endpoints, etc.
- kube-scheduler
- Watches for new pods and assigns them nodes
- etcd
- Distributed key-value store
- [optional] DNS for Kubernetes services
- Nodes
- kubelet
- Manages pods on node, volumes, secrets, creating new containers, health checks, etc.
- kube-proxy
- Maintains network rules, port forwarding, etc.
Setup a Kubernetes cluster
In this section, I will show you how to set up a Kubernetes cluster with etcd and Docker. The cluster will consist of 1 master host and 3 minions (aka nodes).
Setup VMs
For this demo, I will be creating 4 VMs via Vagrant (with VirtualBox).
- Create Vagrant demo environment:
$ mkdir $HOME/dev/kubernetes && cd $_
- Create Vagrantfile with the following contents:
# -*- mode: ruby -*-
# vi: set ft=ruby :

require 'yaml'

VAGRANTFILE_API_VERSION = "2"

$common_script = <<COMMON_SCRIPT
# Set verbose
set -v
# Set exit on error
set -e
echo -e "$(date) [INFO] Starting modified Vagrant..."
sudo yum update -y
# Timestamp provision
date > /etc/vagrant_provisioned_at
COMMON_SCRIPT

unless defined? CONFIG
  configuration_file = File.join(File.dirname(__FILE__), 'vagrant_config.yml')
  CONFIG = YAML.load(File.open(configuration_file, File::RDONLY).read)
end

CONFIG['box'] = {} unless CONFIG.key?('box')

def modifyvm_network(node)
  node.vm.provider "virtualbox" do |vbox|
    vbox.customize ["modifyvm", :id, "--nicpromisc1", "allow-all"]
    #vbox.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
    vbox.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
  end
end

def modifyvm_resources(node, memory, cpus)
  node.vm.provider "virtualbox" do |vbox|
    vbox.customize ["modifyvm", :id, "--memory", memory]
    vbox.customize ["modifyvm", :id, "--cpus", cpus]
  end
end

## START: Actual Vagrant process
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

  config.vm.box = CONFIG['box']['name']

  # Uncomment the following line if you wish to be able to pass files from
  # your local filesystem directly into the vagrant VM:
  #config.vm.synced_folder "data", "/vagrant"

  ## VM: k8s master #############################################################
  config.vm.define "master" do |node|
    node.vm.hostname = "k8s.master.dev"
    node.vm.provision "shell", inline: $common_script
    #node.vm.network "forwarded_port", guest: 80, host: 8080
    node.vm.network "private_network", ip: CONFIG['host_groups']['master']

    # Uncomment the following if you wish to define CPU/memory:
    #node.vm.provider "virtualbox" do |vbox|
    #  vbox.customize ["modifyvm", :id, "--memory", "4096"]
    #  vbox.customize ["modifyvm", :id, "--cpus", "2"]
    #end
    #modifyvm_resources(node, "4096", "2")
  end
  ## VM: k8s minion1 ############################################################
  config.vm.define "minion1" do |node|
    node.vm.hostname = "k8s.minion1.dev"
    node.vm.provision "shell", inline: $common_script
    node.vm.network "private_network", ip: CONFIG['host_groups']['minion1']
  end
  ## VM: k8s minion2 ############################################################
  config.vm.define "minion2" do |node|
    node.vm.hostname = "k8s.minion2.dev"
    node.vm.provision "shell", inline: $common_script
    node.vm.network "private_network", ip: CONFIG['host_groups']['minion2']
  end
  ## VM: k8s minion3 ############################################################
  config.vm.define "minion3" do |node|
    node.vm.hostname = "k8s.minion3.dev"
    node.vm.provision "shell", inline: $common_script
    node.vm.network "private_network", ip: CONFIG['host_groups']['minion3']
  end
  ###############################################################################
end
The above Vagrantfile uses the following configuration file:
$ cat vagrant_config.yml
---
box:
  name: centos/7
  storage_controller: 'SATA Controller'
debug: false
development: false
network:
  dns1:
  dns2:
  internal:
    network:
  external:
    start:
    end:
    network:
    bridge: wlan0
    netmask:
    broadcast:
host_groups:
  master:
  minion1:
  minion2:
  minion3:
- In the Vagrant Kubernetes directory created above, run the following command:
$ vagrant up
Setup hosts
Note: Run the following commands/steps on all hosts (master and minions).
- Log into the k8s master host:
$ vagrant ssh master
- Add the Kubernetes cluster hosts to /etc/hosts:
$ cat << EOF >> /etc/hosts
k8s.master.dev
k8s.minion1.dev
k8s.minion2.dev
k8s.minion3.dev
EOF
- Install, enable, and start NTP:
$ yum install -y ntp
$ systemctl enable ntpd && systemctl start ntpd
$ timedatectl
- Disable any firewall rules (for now; we will add the rules back later):
$ systemctl stop firewalld && systemctl disable firewalld
$ systemctl stop iptables
- Disable SELinux (for now; we will turn it on again later):
$ setenforce 0
$ sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/sysconfig/selinux
$ sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
$ sestatus
- Add the Docker repo and update yum:
$ cat << EOF > /etc/yum.repos.d/virt7-docker-common-release.repo
[virt7-docker-common-release]
name=virt7-docker-common-release
baseurl=http://cbs.centos.org/repos/virt7-docker-common-release/x86_64/os/
gpgcheck=0
EOF
$ yum update
- Install Docker, Kubernetes, and etcd:
$ yum install -y --enablerepo=virt7-docker-common-release kubernetes docker etcd
Install and configure master controller
Note: Run the following commands on only the master host.
- Edit
and add (or make changes to) the following lines:
KUBE_MASTER="--master=http://k8s.master.dev:8080"
KUBE_ETCD_SERVERS="--etcd-servers=http://k8s.master.dev:2379"
- Edit
and add (or make changes to) the following lines:
- Edit
and add (or make changes to) the following lines:
# The address on the local server to listen to.
#KUBE_API_ADDRESS="--insecure-bind-address="
KUBE_API_ADDRESS="--address="
# The port on the local server to listen on.
KUBE_API_PORT="--port=8080"
# Port minions listen on
KUBELET_PORT="--kubelet-port=10250"
# Comma separated list of nodes in the etcd cluster
KUBE_ETCD_SERVERS="--etcd-servers="
# Address range to use for services
KUBE_SERVICE_ADDRESSES="--service-cluster-ip-range="
# default admission control policies
#KUBE_ADMISSION_CONTROL="--admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota"
# Add your own!
KUBE_API_ARGS=""
- Enable and start the following etcd and Kubernetes services:
$ for SERVICE in etcd kube-apiserver kube-controller-manager kube-scheduler; do
    systemctl restart $SERVICE
    systemctl enable $SERVICE
    systemctl status $SERVICE
done
- Check on the status of the above services (the following command should report 4 running services):
$ systemctl status etcd kube-apiserver kube-controller-manager kube-scheduler | grep "(running)" | wc -l # => 4
- Check on the status of the Kubernetes API server:
$ kubectl cluster-info
Kubernetes master is running at http://localhost:8080
$ curl http://localhost:8080/version
#~OR~
$ curl http://k8s.master.dev:8080/version
{ "major": "1", "minor": "2", "gitVersion": "v1.2.0", "gitCommit": "ec7364b6e3b155e78086018aa644057edbe196e5", "gitTreeState": "clean" }
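If you want to check the reported version from a script, the JSON above is straightforward to parse. The sketch below works on a copy of the sample response rather than a live server; the `parse_version` helper is a hypothetical name, and stripping a trailing "+" from the minor field is a defensive assumption, since some builds are known to report values such as "2+".

```python
import json
from typing import Tuple

# Copy of the sample /version response shown above.
VERSION_RESPONSE = """
{
  "major": "1",
  "minor": "2",
  "gitVersion": "v1.2.0",
  "gitCommit": "ec7364b6e3b155e78086018aa644057edbe196e5",
  "gitTreeState": "clean"
}
"""


def parse_version(body: str) -> Tuple[int, int, str]:
    """Return (major, minor, gitVersion) from a /version response body."""
    info = json.loads(body)
    major = int(info["major"])
    minor = int(info["minor"].rstrip("+"))  # tolerate values like "2+"
    return major, minor, info["gitVersion"]


major, minor, git_version = parse_version(VERSION_RESPONSE)
print(f"API server is Kubernetes {major}.{minor} ({git_version})")
```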
- Get a list of Kubernetes API paths:
$ curl http://k8s.master.dev:8080/paths
{ "paths": [ "/api", "/api/v1", "/apis", "/apis/autoscaling", "/apis/autoscaling/v1", "/apis/batch", "/apis/batch/v1", "/apis/extensions", "/apis/extensions/v1beta1", "/healthz", "/healthz/ping", "/logs/", "/metrics", "/resetMetrics", "/swagger-ui/", "/swaggerapi/", "/ui/", "/version" ] }
- List all available paths (key-value stores) known to etcd:
$ etcdctl ls / --recursive
The master controller in a Kubernetes cluster must have the following services running to function as the master host in the cluster:
- ntpd
- etcd
- kube-controller-manager
- kube-apiserver
- kube-scheduler
Note: The Docker daemon should not be running on the master host.
Install and configure the minions
Note: Run the following commands/steps on all minion hosts.
- Log into the k8s minion hosts:
$ vagrant ssh minion1 # do the same for minion2 and minion3
- Edit
and add (or make changes to) the following lines:
KUBE_MASTER="--master=http://k8s.master.dev:8080"
KUBE_ETCD_SERVERS="--etcd-servers=http://k8s.master.dev:2379"
- Edit
and add (or make changes to) the following lines:
###
# kubernetes kubelet (minion) config
# The address for the info server to serve on (set to or "" for all interfaces)
KUBELET_ADDRESS="--address="
# The port for the info server to serve on
KUBELET_PORT="--port=10250"
# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname-override=k8s.minion1.dev" # ***CHANGE TO CORRECT MINION HOSTNAME***
# location of the api-server
KUBELET_API_SERVER="--api-servers=http://k8s.master.dev:8080"
# pod infrastructure container
#KUBELET_POD_INFRA_CONTAINER="--pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest"
# Add your own!
KUBELET_ARGS=""
- Enable and start the following services:
$ for SERVICE in kube-proxy kubelet docker; do
    systemctl restart $SERVICE
    systemctl enable $SERVICE
    systemctl status $SERVICE
done
- Test that Docker is running and can start containers:
$ docker info
$ docker pull hello-world
$ docker run hello-world
Each minion in a Kubernetes cluster must have the following services running to function as a member of the cluster (i.e., a "Ready" node):
- ntpd
- kubelet
- kube-proxy
- docker
Kubectl: Exploring our environment
Note: Run all of the following commands on the master host.
- Get a list of nodes:
$ kubectl get nodes
NAME              STATUS    AGE
k8s.minion1.dev   Ready     20m
k8s.minion2.dev   Ready     12m
k8s.minion3.dev   Ready     12m
- Describe nodes:
$ kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}'
$ kubectl get nodes -o jsonpath='{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{end}' | tr ';' "\n"
k8s.minion1.dev:OutOfDisk=False
Ready=True
k8s.minion2.dev:OutOfDisk=False
Ready=True
k8s.minion3.dev:OutOfDisk=False
Ready=True
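The JSONPath query above walks the node list and prints each condition's type and status. The same traversal can be written in a few lines of Python, which may be easier to extend. The sample document below is hand-written for illustration and only mimics the fields the query touches (items, metadata.name, status.conditions); it is not captured output.

```python
from typing import Dict, List

# Hand-written stand-in for the relevant parts of `kubectl get nodes -o json`.
NODES = {
    "items": [
        {
            "metadata": {"name": name},
            "status": {
                "conditions": [
                    {"type": "OutOfDisk", "status": "False"},
                    {"type": "Ready", "status": ready},
                ]
            },
        }
        for name, ready in [
            ("k8s.minion1.dev", "True"),
            ("k8s.minion2.dev", "True"),
            ("k8s.minion3.dev", "False"),  # made-up failure for the example
        ]
    ]
}


def node_conditions(doc: dict) -> Dict[str, Dict[str, str]]:
    """Map each node name to its {condition type: status} dictionary."""
    return {
        item["metadata"]["name"]: {
            cond["type"]: cond["status"] for cond in item["status"]["conditions"]
        }
        for item in doc["items"]
    }


def ready_nodes(doc: dict) -> List[str]:
    """Names of the nodes whose Ready condition is "True"."""
    return sorted(
        name for name, conds in node_conditions(doc).items() if conds.get("Ready") == "True"
    )


print(ready_nodes(NODES))
```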
- Get the man page for kubectl get:
$ man kubectl-get
Working with our Kubernetes cluster
Note: The following section will be working from within the Kubernetes cluster we created above.
Create and deploy pod definitions
- Turn off nodes 2 and 3:
minion{2,3}$ systemctl stop kubelet kube-proxy
master$ kubectl get nodes
NAME              STATUS     AGE
k8s.minion1.dev   Ready      1h
k8s.minion2.dev   NotReady   37m
k8s.minion3.dev   NotReady   39m
- Check for any k8s Pods (there should be none):
master$ kubectl get pods
- Create a builds directory for our Pods:
master$ mkdir builds && cd $_
- Create a Pod running Nginx inside a Docker container:
master$ kubectl create -f - <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
EOF
- Check on Pod creation status:
master$ kubectl get pods
NAME      READY     STATUS              RESTARTS   AGE
nginx     0/1       ContainerCreating   0          2s
master$ kubectl get pods
minion1$ docker ps
CONTAINER ID   IMAGE         COMMAND                  CREATED         STATUS         PORTS   NAMES
a718c6c0355d   nginx:1.7.9   "nginx -g 'daemon off"   3 minutes ago   Up 3 minutes           k8s_nginx.4580025_nginx_default_699e...
master$ kubectl describe pod nginx
master$ kubectl run busybox --image=busybox --restart=Never --tty -i --generator=run-pod/v1
busybox$ wget -qO-
master$ kubectl delete pod busybox
master$ kubectl delete pod nginx
- Port forwarding:
master$ kubectl create -f nginx.yml # see above for YAML
master$ kubectl port-forward nginx :80 &
I1020 23:12:29.478742 23394 portforward.go:213] Forwarding from [::1]:40065 -> 80
master$ curl -I localhost:40065
Tags, labels, and selectors
master$ cat << EOF > nginx-pod-label.yml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
EOF
master$ kubectl create -f nginx-pod-label.yml
master$ kubectl get pods -l app=nginx
master$ kubectl describe pods -l app=nginx2
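The -l flag in the commands above takes a label selector. As a rough illustration of how equality-based (=, ==, !=) and set-based (in, notin, exists) requirements are evaluated against an object's labels, here is a small matcher. It is a sketch of the matching rules only, not kubectl's parser, and the pod names and labels are invented.

```python
from typing import Dict, Iterable, List

Labels = Dict[str, str]


def matches(labels: Labels, key: str, op: str, values: Iterable[str] = ()) -> bool:
    """Evaluate one selector requirement against a set of labels."""
    wanted = set(values)
    if op in ("=", "==", "in"):
        # The key must be present and carry one of the wanted values.
        return key in labels and labels[key] in wanted
    if op in ("!=", "notin"):
        # Objects without the key also satisfy a negative requirement.
        return labels.get(key) not in wanted
    if op == "exists":
        return key in labels
    raise ValueError(f"unsupported operator: {op}")


def select(pods: Dict[str, Labels], key: str, op: str, values: Iterable[str] = ()) -> List[str]:
    """Names of the pods whose labels satisfy the requirement, sorted."""
    return sorted(name for name, labels in pods.items() if matches(labels, key, op, values))


# Hypothetical pods and labels for the example.
PODS = {
    "nginx": {"app": "nginx", "env": "dev"},
    "nginx-qa": {"app": "nginx", "env": "qa"},
    "db": {"app": "mysql", "env": "prod"},
    "scratch": {},
}

print(select(PODS, "app", "==", ["nginx"]))       # like: -l app=nginx
print(select(PODS, "env", "in", ["dev", "qa"]))   # like: -l 'env in (dev,qa)'
```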
- Add labels or overwrite existing ones:
master$ kubectl label pods nginx new-label=mynginx
master$ kubectl describe pods/nginx | awk '/^Labels/{print $2}'
new-label=mynginx
master$ kubectl label --overwrite pods nginx new-label=foo
master$ kubectl describe pods/nginx | awk '/^Labels/{print $2}'
new-label=foo
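Changing the value of a label that already exists requires the --overwrite flag of kubectl label. The guard behind that can be modelled as follows. This is a simplified sketch, not kubectl's implementation, and the error text is only illustrative.

```python
from typing import Dict


def set_label(labels: Dict[str, str], key: str, value: str, overwrite: bool = False) -> Dict[str, str]:
    """Return a copy of labels with key=value applied.

    Refuses to change an existing value unless overwrite is True.
    """
    current = labels.get(key)
    if current is not None and current != value and not overwrite:
        raise ValueError(f"{key!r} already has a value ({current}), and overwrite is false")
    updated = dict(labels)
    updated[key] = value
    return updated


labels = set_label({"app": "nginx"}, "new-label", "mynginx")        # new key: fine
try:
    set_label(labels, "new-label", "foo")                           # existing key: refused
except ValueError as err:
    print(err)
labels = set_label(labels, "new-label", "foo", overwrite=True)      # explicit overwrite
print(labels["new-label"])
```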
Deployments
master$ cat << EOF > nginx-deployment-dev.yml
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment-dev
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx-deployment-dev
    spec:
      containers:
      - name: nginx-deployment-dev
        image: nginx:1.7.9
        ports:
        - containerPort: 80
EOF
master$ cat nginx-deployment-prod.yml
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment-prod
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx-deployment-prod
    spec:
      containers:
      - name: nginx-deployment-prod
        image: nginx:1.7.9
        ports:
        - containerPort: 80
master$ kubectl create --validate -f nginx-deployment-dev.yml
master$ kubectl create --validate -f nginx-deployment-prod.yml
master$ kubectl get pods
NAME                                     READY     STATUS    RESTARTS   AGE
nginx-deployment-dev-104434401-jiiic     1/1       Running   0          5m
nginx-deployment-prod-3051195443-hj9b1   1/1       Running   0          12m
master$ kubectl describe deployments -l app=nginx-deployment-dev
Name:                   nginx-deployment-dev
Namespace:              default
CreationTimestamp:      Thu, 20 Oct 2016 23:48:46 +0000
Labels:                 app=nginx-deployment-dev
Selector:               app=nginx-deployment-dev
Replicas:               1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
OldReplicaSets:         <none>
NewReplicaSet:          nginx-deployment-dev-2568522567 (1/1 replicas created)
...
master$ kubectl get deployments
NAME                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment-prod   1         1         1            1           44s
master$ cat << EOF > nginx-deployment-dev-update.yml
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment-dev
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx-deployment-dev
    spec:
      containers:
      - name: nginx-deployment-dev
        image: nginx:1.8 # ***CHANGED***
        ports:
        - containerPort: 80
EOF
master$ kubectl apply -f nginx-deployment-dev-update.yml
master$ kubectl get pods -l app=nginx-deployment-dev
NAME                                   READY     STATUS              RESTARTS   AGE
nginx-deployment-dev-104434401-jiiic   0/1       ContainerCreating   0          27s
master$ kubectl get pods -l app=nginx-deployment-dev
NAME                                   READY     STATUS    RESTARTS   AGE
nginx-deployment-dev-104434401-jiiic   1/1       Running   0          6m
- Cleanup:
master$ kubectl delete deployment nginx-deployment-dev
master$ kubectl delete deployment nginx-deployment-prod
Multi-Pod (container) replication controller
- Start the other two nodes (the ones we previously stopped):
minion2$ systemctl start kubelet kube-proxy
minion3$ systemctl start kubelet kube-proxy
master$ kubectl get nodes
NAME              STATUS    AGE
k8s.minion1.dev   Ready     2h
k8s.minion2.dev   Ready     2h
k8s.minion3.dev   Ready     2h
master$ cat << EOF > nginx-multi-node.yml
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-www
spec:
  replicas: 3
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
EOF
master$ kubectl create -f nginx-multi-node.yml
master$ kubectl get pods
NAME              READY     STATUS              RESTARTS   AGE
nginx-www-2evxu   0/1       ContainerCreating   0          10s
nginx-www-416ct   0/1       ContainerCreating   0          10s
nginx-www-ax41w   0/1       ContainerCreating   0          10s
master$ kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
nginx-www-2evxu   1/1       Running   0          1m
nginx-www-416ct   1/1       Running   0          1m
nginx-www-ax41w   1/1       Running   0          1m
master$ kubectl describe pods | awk '/^Node/{print $2}'
k8s.minion2.dev/
k8s.minion1.dev/
k8s.minion3.dev/
minion1$ docker ps # 1 nginx container running
minion2$ docker ps # 1 nginx container running
minion3$ docker ps # 1 nginx container running
minion3$ docker ps --format "{{.Image}}"
nginx
gcr.io/google_containers/pause:2.0
master$ kubectl describe replicationcontroller
Name:         nginx-www
Namespace:    default
Image(s):     nginx
Selector:     app=nginx
Labels:       app=nginx
Replicas:     3 current / 3 desired
Pods Status:  3 Running / 0 Waiting / 0 Succeeded / 0 Failed
...
- Attempt to delete one of the three pods:
master$ kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
nginx-www-2evxu   1/1       Running   0          11m
nginx-www-416ct   1/1       Running   0          11m
nginx-www-ax41w   1/1       Running   0          11m
master$ kubectl delete pod nginx-www-2evxu
master$ kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
nginx-www-3cck4   1/1       Running   0          12s
nginx-www-416ct   1/1       Running   0          11m
nginx-www-ax41w   1/1       Running   0          11m
A new pod (nginx-www-3cck4) automatically started up. This is because the expected state, as defined in our YAML file, is for there to be 3 pods running at all times. Thus, if one or more of the pods were to go down, a new pod (or pods) will automatically start up to bring the state back to the expected state.
- To force-delete all pods:
master$ kubectl delete replicationcontroller nginx-www
master$ kubectl get pods # nothing
Create and deploy service definitions
master$ cat << EOF > nginx-service.yml
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  ports:
  - port: 8000
    targetPort: 80
    protocol: TCP
  selector:
    app: nginx
EOF
master$ kubectl get services
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes                <none>        443/TCP   3h
master$ kubectl create -f nginx-service.yml
master$ kubectl get services
NAME            CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
kubernetes                   <none>        443/TCP    3h
nginx-service                <none>        8000/TCP   10s
master$ kubectl run busybox --generator=run-pod/v1 --image=busybox --restart=Never --tty -i
busybox$ wget -qO- # works
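The Service definition above listens on port 8000 and forwards to port 80 (targetPort) on every Pod whose labels match the selector app: nginx. The sketch below models that mapping together with the round-robin distribution described in the design overview. The Pod IP addresses are made up for illustration, and the function names are hypothetical; this is not kube-proxy's implementation.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

Endpoint = Tuple[str, int]

# Hypothetical pods: the IPs are invented for this example.
PODS = [
    {"name": "nginx-www-jh2e9", "ip": "10.1.0.2", "labels": {"app": "nginx"}},
    {"name": "nginx-www-jir2g", "ip": "10.1.0.3", "labels": {"app": "nginx"}},
    {"name": "nginx-www-w91uw", "ip": "10.1.0.4", "labels": {"app": "nginx"}},
    {"name": "busybox", "ip": "10.1.0.9", "labels": {}},
]


def endpoints(pods: List[dict], selector: Dict[str, str], target_port: int) -> List[Endpoint]:
    """(ip, targetPort) for every pod whose labels match the whole selector."""
    return sorted(
        (pod["ip"], target_port)
        for pod in pods
        if all(pod["labels"].get(k) == v for k, v in selector.items())
    )


def round_robin(eps: List[Endpoint]) -> Iterator[Endpoint]:
    """Yield endpoints in turn, wrapping around indefinitely."""
    return itertools.cycle(eps)


backends = endpoints(PODS, selector={"app": "nginx"}, target_port=80)
chooser = round_robin(backends)
for _ in range(4):
    print("request to service port 8000 ->", next(chooser))
```

The busybox Pod is excluded because it does not carry the app=nginx label, which is why the Service keeps working as matching Pods come and go.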
- Cleanup
master$ kubectl delete pod busybox
master$ kubectl delete service nginx-service
master$ kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
nginx-www-jh2e9   1/1       Running   0          13m
nginx-www-jir2g   1/1       Running   0          13m
nginx-www-w91uw   1/1       Running   0          13m
master$ kubectl delete replicationcontroller nginx-www
master$ kubectl get pods # nothing
Creating temporary Pods at the CLI
- Make sure we have no Pods running:
master$ kubectl get pods
- Create temporary deployment pod:
master$ kubectl run mysample --image=foobar/apache
master$ kubectl get pods
NAME                        READY     STATUS              RESTARTS   AGE
mysample-1424711890-fhtxb   0/1       ContainerCreating   0          1s
master$ kubectl get deployment
- Create a temporary deployment pod (where we know it will fail):
master$ kubectl run myexample --image=christophchamp/ubuntu_sysadmin
master$ kubectl -o wide get pods
NAME                         READY     STATUS             RESTARTS   AGE       NODE
myexample-3534121234-mpr35   0/1       CrashLoopBackOff   12         39m       k8s.minion3.dev
mysample-2812764540-74c5h    1/1       Running            0          41m       k8s.minion2.dev
- Check on why the "myexample" pod is in status "CrashLoopBackOff":
master$ kubectl describe pods/myexample-3534121234-mpr35
master$ kubectl describe deployments/mysample
master$ kubectl describe pods/mysample-2812764540-74c5h | awk '/^Node/{print $2}'
k8s.minion2.dev/
master$ kubectl delete deployment mysample
- Run multiple replicas of the same pod:
master$ kubectl run myreplicas --image=latest123/apache --replicas=2 --labels=app=myapache,version=1.0.0
master$ kubectl describe deployment myreplicas
Name:                   myreplicas
Namespace:              default
CreationTimestamp:      Fri, 21 Oct 2016 19:10:30 +0000
Labels:                 app=myapache,version=1.0.0
Selector:               app=myapache,version=1.0.0
Replicas:               2 updated | 2 total | 1 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
OldReplicaSets:         <none>
NewReplicaSet:          myreplicas-2209834598 (2/2 replicas created)
...
master$ kubectl get pods -o wide
NAME                          READY     STATUS    RESTARTS   AGE       NODE
myreplicas-2209834598-5iyer   1/1       Running   0          1m        k8s.minion1.dev
myreplicas-2209834598-cslst   1/1       Running   0          1m        k8s.minion2.dev
master$ kubectl describe pods -l version=1.0.0
- Cleanup:
master$ kubectl delete deployment myreplicas
Interacting with Pod containers
- Create example Apache pod definition file:
master$ cat << EOF > apache.yml --- apiVersion: v1 kind: Pod metadata: name: apache spec: containers: - name: apache image: latest123/apache ports: - containerPort: 80 EOF
master$ kubectl create -f apache.yml master$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE NODE apache 1/1 Running 0 12m k8s.minion3.dev
- Test pod and make some basic configuration changes:
master$ kubectl exec apache date master$ kubectl exec mypod -i -t -- cat /var/www/html/index.html # default apache HTML master$ kubectl exec apache -i -t -- /bin/bash container$ export TERM=xterm container$ echo "xtof test" > /var/www/html/index.html minion3$ curl xtof test container$ exit
master$ kubectl get pods -o wide
NAME      READY     STATUS    RESTARTS   AGE       NODE
apache    1/1       Running   0          12m       k8s.minion3.dev
The Pod/container is still running even after we exited the shell (as expected).
- Cleanup:
master$ kubectl delete pod apache
Logs
- Start our example Apache pod to use for checking Kubernetes logging features:
master$ kubectl create -f apache.yml
master$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
apache    1/1       Running   0          9s
master$ kubectl logs apache
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using Set the 'ServerName' directive globally to suppress this message
master$ kubectl logs --tail=10 apache
master$ kubectl logs --since=24h apache  # or 10s, 2m, etc.
master$ kubectl logs -f apache           # follow the logs
master$ kubectl logs -f -c apache apache # where -c is the container name
- Cleanup:
master$ kubectl delete pod apache
Autoscaling and scaling Pods
master$ kubectl run myautoscale --image=latest123/apache --port=80 --labels=app=myautoscale
master$ kubectl get pods -o wide
NAME                           READY     STATUS    RESTARTS   AGE       NODE
myautoscale-3243017378-kq4z7   1/1       Running   0          47s       k8s.minion3.dev
- Create an autoscale definition:
master$ kubectl autoscale deployment myautoscale --min=2 --max=6 --cpu-percent=80
master$ kubectl get deployments
master$ kubectl get pods -o wide
NAME                           READY     STATUS    RESTARTS   AGE       NODE
myautoscale-3243017378-kq4z7   1/1       Running   0          3m        k8s.minion3.dev
myautoscale-3243017378-r2f3d   1/1       Running   0          4s        k8s.minion2.dev
- Scale up an already autoscaled deployment:
master$ kubectl scale --current-replicas=2 --replicas=4 deployment/myautoscale
master$ kubectl get deployments
master$ kubectl get pods -o wide
NAME                           READY     STATUS    RESTARTS   AGE       NODE
myautoscale-3243017378-2rxhp   1/1       Running   0          8s        k8s.minion1.dev
myautoscale-3243017378-kq4z7   1/1       Running   0          7m        k8s.minion3.dev
myautoscale-3243017378-ozxs8   1/1       Running   0          8s        k8s.minion3.dev
myautoscale-3243017378-r2f3d   1/1       Running   0          4m        k8s.minion2.dev
- Scale down:
master$ kubectl scale --current-replicas=4 --replicas=2 deployment/myautoscale
Note: You cannot scale down past the minimum number of Pods specified in the autoscale definition (i.e., min=2 in our example).
- Cleanup:
master$ kubectl delete deployment myautoscale
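The note about the autoscale minimum can be read as a simple bounding rule: whatever replica count is requested, the autoscaled deployment ends up between `--min` and `--max`. The following is a rough model of that rule only, not the autoscaler's actual algorithm (which also takes CPU utilization into account); the function name `clamp_replicas` is hypothetical.

```shell
# Bound a requested replica count by the autoscaler's min and max.
# Usage: clamp_replicas <requested> <min> <max>
clamp_replicas() {
  requested=$1; min=$2; max=$3
  if [ "$requested" -lt "$min" ]; then
    echo "$min"
  elif [ "$requested" -gt "$max" ]; then
    echo "$max"
  else
    echo "$requested"
  fi
}

# Using --min=2 --max=6 from the autoscale definition above:
clamp_replicas 1 2 6   # prints: 2 (below the minimum)
clamp_replicas 4 2 6   # prints: 4 (within bounds)
clamp_replicas 9 2 6   # prints: 6 (above the maximum)
```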
Failure and recovery
master$ kubectl run myrecovery --image=latest123/apache --port=80 --replicas=2 --labels=app=myrecovery
master$ kubectl get deployments
master$ kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       NODE
myrecovery-563119102-5xu8f   1/1       Running   0          12s       k8s.minion1.dev
myrecovery-563119102-zw6wp   1/1       Running   0          12s       k8s.minion2.dev
- Now stop Kubernetes- and Docker-related services on one of the minions/nodes (so we have a total of 2 nodes online):
minion1$ systemctl stop docker kubelet kube-proxy
master$ kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       NODE
myrecovery-563119102-qyi04   1/1       Running   0          7m        k8s.minion3.dev
myrecovery-563119102-zw6wp   1/1       Running   0          14m       k8s.minion2.dev
The Pod that was running on minion1 has been replaced by a new Pod on minion3.
- Now stop Kubernetes- and Docker-related services on one of the remaining online minions/nodes (so we have a total of 1 node online):
minion2$ systemctl stop docker kubelet kube-proxy
master$ kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       NODE
myrecovery-563119102-b5tim   1/1       Running   0          2m        k8s.minion3.dev
myrecovery-563119102-qyi04   1/1       Running   0          17m       k8s.minion3.dev
Both Pods are now running on minion3, the only available node.
- Start up Kubernetes- and Docker-related services again on minion1 and delete one of the Pods:
minion1$ systemctl start docker kubelet kube-proxy
master$ kubectl delete pod myrecovery-563119102-b5tim
master$ kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       NODE
myrecovery-563119102-8unzg   1/1       Running   0          1m        k8s.minion1.dev
myrecovery-563119102-qyi04   1/1       Running   0          20m       k8s.minion3.dev
Pods are now running on separate nodes.
- Cleanup:
master$ kubectl delete deployments/myrecovery
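During a failure test like the one above, a quick per-node tally makes it easier to see where the Pods ended up. The following is a minimal sketch, assuming the `kubectl get pods -o wide` output has a `NODE` header and single-token columns as in the listings above; the function name `pods_per_node` is hypothetical, and the sample data is copied from this section so the snippet runs without a cluster.

```shell
# Count pods per node from `kubectl get pods -o wide` style output on stdin.
# The NODE column is located by its header name rather than a fixed position.
pods_per_node() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "NODE") col = i; next }
       col     { count[$col]++ }
       END     { for (n in count) print n, count[n] }' | sort
}

# Listing taken while only minion3 was online.
one_node='NAME READY STATUS RESTARTS AGE NODE
myrecovery-563119102-b5tim 1/1 Running 0 2m k8s.minion3.dev
myrecovery-563119102-qyi04 1/1 Running 0 17m k8s.minion3.dev'

# Listing taken after minion1 came back and one Pod was deleted.
after_restart='NAME READY STATUS RESTARTS AGE NODE
myrecovery-563119102-8unzg 1/1 Running 0 1m k8s.minion1.dev
myrecovery-563119102-qyi04 1/1 Running 0 20m k8s.minion3.dev'

printf '%s\n' "$one_node" | pods_per_node
# prints: k8s.minion3.dev 2
printf '%s\n' "$after_restart" | pods_per_node
# prints: k8s.minion1.dev 1
#         k8s.minion3.dev 1
```

On a live cluster the same function would be fed with `kubectl get pods -o wide | pods_per_node`.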
Minikube
Minikube is a tool that makes it easy to run Kubernetes locally. Minikube runs a single-node Kubernetes cluster inside a VM on your laptop for users looking to try out Kubernetes or develop with it day-to-day.
- Install Minikube:
$ curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 \
    && chmod +x minikube && sudo mv minikube /usr/local/bin/
- Install kubectl
$ curl -Lo kubectl https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl \
    && chmod +x kubectl && sudo mv kubectl /usr/local/bin/
- Test install
$ minikube start
$ minikube status
$ minikube dashboard
$ kubectl config view
$ kubectl cluster-info
See the official kubectl documentation for details on its CLI options.
Using the `kubectl proxy` command, kubectl authenticates with the API Server on the Master Node and makes the dashboard available on http://localhost:8001/ui:
$ kubectl proxy
Starting to serve on
After running the above command, we can access the dashboard at
Once the kubectl proxy is configured, we can send requests to localhost on the proxy port:
$ curl http://localhost:8001/
$ curl http://localhost:8001/version
{
  "major": "1",
  "minor": "8",
  "gitVersion": "v1.8.0",
  "gitCommit": "0b9efaeb34a2fc51ff8e4d34ad9bc6375459c4a4",
  "gitTreeState": "clean",
  "buildDate": "2017-11-29T22:43:34Z",
  "goVersion": "go1.9.1",
  "compiler": "gc",
  "platform": "linux/amd64"
}
Without kubectl proxy configured, we can get the Bearer Token using kubectl, and then send it with the API request. A Bearer Token is an access token which is generated by the authentication server (the API server on the Master Node) and given back to the client. Using that token, the client can connect back to the Kubernetes API server without providing further authentication details, and then, access resources.
- Get the k8s token:
$ TOKEN=$(kubectl describe secret $(kubectl get secrets | awk '/^default/{print $1}') | awk '/^token/{print $2}')
- Get the k8s API server endpoint:
$ APISERVER=$(kubectl config view | awk '/https/{print $2}')
- Access the API Server:
$ curl -k -H "Authorization: Bearer ${TOKEN}" ${APISERVER}
Working with our Minikube-based Kubernetes cluster
- Kubernetes Object Model
Kubernetes has a very rich object model, with which it represents different persistent entities in the Kubernetes cluster. Those entities describe:
- What containerized applications we are running and on which node
- Application resource consumption
- Different policies attached to applications, like restart/upgrade policies, fault tolerance, etc.
With each object, we declare our intent or desired state using the spec field. The Kubernetes system manages the status field for objects, in which it records the actual state of the object. At any given point in time, the Kubernetes Control Plane tries to match the object's actual state to the object's desired state.
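The idea of matching actual state to desired state can be illustrated for a single number, the replica count. The following is a toy sketch only: the function name `reconcile` and its output strings are hypothetical, and real Kubernetes controllers are considerably more involved.

```shell
# Decide what a (hypothetical) controller would do for one object,
# given the replica count requested in spec and the count observed in status.
# Usage: reconcile <desired> <actual>
reconcile() {
  desired=$1
  actual=$2
  if [ "$actual" -lt "$desired" ]; then
    echo "create $((desired - actual))"
  elif [ "$actual" -gt "$desired" ]; then
    echo "delete $((actual - desired))"
  else
    echo "in sync"
  fi
}

reconcile 3 1   # prints: create 2
reconcile 3 5   # prints: delete 2
reconcile 3 3   # prints: in sync
```

The point of the sketch is that the user only ever states the desired number; the corrective action is derived from the difference.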
Examples of Kubernetes objects are Pods, Deployments, ReplicaSets, etc.
To create an object, we need to provide the spec field to the Kubernetes API Server. The spec field describes the desired state, along with some basic information, like the name. The API request to create the object must have the spec field, as well as other details, in JSON format. Most often, we provide an object's definition in a YAML file, which kubectl converts into a JSON payload and sends to the API Server.
Below is an example of a Deployment object:
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
With the apiVersion field in the example above, we mention the API endpoint on the API Server which we want to connect to. Note that you can see what API version to use with the following call to the API server:
$ curl -k -H "Authorization: Bearer ${TOKEN}" ${APISERVER}/apis/apps
Use the preferredVersion for most cases.
With the kind field, we mention the object type; in our case, it is a Deployment. With the metadata field, we attach basic information to objects, like the name. You may have noticed that the example above has two spec fields (spec and spec.template.spec). With spec, we define the desired state of the Deployment. In our example, we want to make sure that, at any point in time, 3 Pods are running, which are created using the Pod template defined in spec.template. In spec.template.spec, we define the desired state of the Pod (here, our Pod would be created using nginx:1.7.9).
Once the object is created, the Kubernetes system attaches the status field to the object.
- Connecting users to Pods
To access the application, a user/client needs to connect to the Pods. As Pods are ephemeral in nature, resources like IP addresses allocated to them cannot be static. Pods could die abruptly or be rescheduled based on existing requirements.
As an example, consider a scenario in which a user/client is connected to a Pod using its IP address. Unexpectedly, the Pod to which the user/client is connected dies, and a new Pod is created by the controller. The new Pod will have a new IP address, which will not be known automatically to the user/client of the earlier Pod. To overcome this situation, Kubernetes provides a higher-level abstraction called Service, which logically groups Pods and a policy to access them. This grouping is achieved via Labels and Selectors (see above).
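How a Selector picks Pods by Label can be sketched with a few lines of shell. This is an illustration of equality-based matching only; the function name `select_pods`, the pod names, and the simplified two-column input format are all hypothetical and not something kubectl produces in this exact form.

```shell
# Print the names of pods whose label list contains the given key=value pair.
# Input lines: "<pod-name> <comma-separated key=value labels>".
# Usage: select_pods <key=value>
select_pods() {
  awk -v want="$1" '{
    n = split($2, labels, ",")
    for (i = 1; i <= n; i++)
      if (labels[i] == want) { print $1; break }
  }'
}

# Hypothetical pods and labels for illustration.
pods='web-1 app=frontend,tier=web
web-2 app=frontend,tier=web
db-1 app=db,tier=data'

printf '%s\n' "$pods" | select_pods app=frontend
# prints: web-1
#         web-2
printf '%s\n' "$pods" | select_pods app=db
# prints: db-1
```

On a live cluster, the equivalent query would be `kubectl get pods -l app=frontend`.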
So, for our example, we would use Selectors (e.g., "app==frontend" and "app==db") to group our Pods into two logical groups. We can assign a name to the logical grouping, referred to as a "service name". In our example, we have created two Services, frontend-svc and db-svc, and they have the "app==frontend" and the "app==db" Selectors, respectively.
The following is an example of a Service object:
kind: Service
apiVersion: v1
metadata:
  name: frontend-svc
spec:
  selector:
    app: frontend
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000
This example creates a frontend-svc Service by selecting all the Pods that have the Label "app" set to "frontend". By default, each Service also gets an IP address, which is routable only inside the cluster. In our case, frontend-svc and db-svc each have their own IP address. The IP address attached to each Service is also known as the ClusterIP for that Service.
                      +-------------------------+
                      | select: app==frontend   |         container (app:frontend)
               +----> | service=frontend-svc    | ------> container (app:frontend)
               |      +-------------------------+         container (app:frontend)
 user/client --+
               |      +-------------------------+
               +----> | select: app==db         | ------> container (app:db)
                      | service=db-svc          |
                      +-------------------------+
The user/client now connects to a Service via the IP address, which forwards the traffic to one of the Pods attached to it. A Service does the load balancing while selecting the Pods for forwarding the data/traffic.
While forwarding the traffic from the Service, we can select the target port on the Pod. In our example, for frontend-svc, we will receive requests from the user/client on Port 80. We will then forward these requests to one of the attached Pods on Port 5000. If the target port is not defined explicitly, then traffic will be forwarded to Pods on the Port on which the Service receives traffic.
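The defaulting rule for the target port can be written down in a couple of lines. This is only a sketch of the rule as described above; the function name `target_port` is hypothetical.

```shell
# Resolve the port on the Pod that a Service forwards to.
# Usage: target_port <service-port> [<targetPort>]
# If no targetPort is given, the Service port itself is used.
target_port() {
  port=$1
  target=${2:-$port}
  echo "$target"
}

target_port 80 5000   # prints: 5000 (as for frontend-svc above)
target_port 80        # prints: 80   (targetPort not defined explicitly)
```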
The combination of a Pod's IP address and the targetPort is referred to as a Service Endpoint. In our case, frontend-svc has 3 Endpoints, one for each frontend Pod.
All of the Worker Nodes run a daemon called kube-proxy, which watches the API Server on the Master Node for the addition and removal of Services and endpoints. For each new Service, on each node, kube-proxy configures iptables rules to capture the traffic for its ClusterIP and forward it to one of the endpoints. When the Service is removed, kube-proxy removes the iptables rules on all nodes as well.
Service discovery
As Services are the primary mode of communication in Kubernetes, we need a way to discover them at runtime. Kubernetes supports two methods of discovering a Service:
- Environment Variables
- As soon as the Pod starts on any Worker Node, the kubelet daemon running on that node adds a set of environment variables in the Pod for all active Services. For example, for an active Service that exposes port 6379, a newly created Pod gets variables of the form {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT, holding the Service's ClusterIP and port, where the Service name is upper-cased and dashes are converted to underscores.
With this solution, we need to be careful while ordering our Services, as the Pods will not have the environment variables set for Services which are created after the Pods are created.
- DNS
- Kubernetes has an add-on for DNS, which creates a DNS record for each Service. Services within the same Namespace can reach other Services with just their name. For example, if we add a Service redis-master in the my-ns Namespace, then all the Pods in the same Namespace can reach the redis Service just by using its name, redis-master. Pods from other Namespaces can reach the Service by adding the respective Namespace as a suffix, like redis-master.my-ns.
- This is the most common and highly recommended solution. For example, in the previous section's image, we have seen that an internal DNS is configured, which maps our Services to their respective IP addresses.
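The naming conventions behind both discovery methods can be sketched in shell. This is an illustration under stated assumptions: the function names are hypothetical, the ClusterIP 10.0.0.11 is a made-up value, only the two `_SERVICE_HOST` and `_SERVICE_PORT` variables are shown, and the cluster domain is assumed to be the common default `cluster.local`.

```shell
# Environment-variable prefix for a Service name:
# upper-case the name and turn dashes into underscores.
svc_env_prefix() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | tr '-' '_'
}

# Print the host/port variables a Pod would see for a Service.
# Usage: svc_env_vars <service-name> <cluster-ip> <port>
svc_env_vars() {
  prefix=$(svc_env_prefix "$1")
  echo "${prefix}_SERVICE_HOST=$2"
  echo "${prefix}_SERVICE_PORT=$3"
}

# Fully qualified DNS name of a Service.
# Usage: svc_dns_name <service-name> <namespace> [<cluster-domain>]
svc_dns_name() {
  echo "$1.$2.svc.${3:-cluster.local}"
}

svc_env_vars redis-master 10.0.0.11 6379   # 10.0.0.11 is a hypothetical ClusterIP
# prints: REDIS_MASTER_SERVICE_HOST=10.0.0.11
#         REDIS_MASTER_SERVICE_PORT=6379
svc_dns_name redis-master my-ns
# prints: redis-master.my-ns.svc.cluster.local
```

Inside a running Pod, `env | grep SERVICE` would show the real variables for the Services that existed when the Pod started.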
Service Type
While defining a Service, we can also choose its access scope. We can decide whether the Service:
- Is only accessible within the cluster;
- Is accessible from within the cluster and the external world; or
- Maps to an external entity which resides outside the cluster.
Access scope is decided by ServiceType, which can be mentioned when creating the Service.
ClusterIP is the default ServiceType. A Service gets its Virtual IP address using the ClusterIP. That IP address is used for communicating with the Service and is accessible only within the cluster.
With the NodePort ServiceType, in addition to creating a ClusterIP, a port from the range 30000-32767 is mapped to the respective Service, from all the Worker Nodes. For example, if the mapped NodePort is 32233 for the Service frontend-svc, then, if we connect to any Worker Node on port 32233, the node redirects all the traffic to the assigned ClusterIP.
By default, while exposing a NodePort, a random port is automatically selected by the Kubernetes Master from the port range 30000-32767. If we do not want a dynamically assigned port value for NodePort, then, while creating the Service, we can also give a port number from that same range.
The NodePort ServiceType is useful when we want to make our services accessible from the external world. The end-user connects to the Worker Nodes on the specified port, which forwards the traffic to the applications running inside the cluster. To access the application from the external world, administrators can configure a reverse proxy outside the Kubernetes cluster and map the specific endpoint to the respective port on the Worker Nodes.
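When picking a NodePort by hand, it is worth checking that the value falls into the range mentioned above before putting it into a Service definition. The following is a minimal sketch, assuming the default range of 30000-32767 and a numeric argument; the function name `valid_node_port` is hypothetical.

```shell
# Succeed (exit status 0) if the given port lies in the default NodePort range.
# Usage: valid_node_port <port>
valid_node_port() {
  [ "$1" -ge 30000 ] && [ "$1" -le 32767 ]
}

for p in 32233 8080; do
  if valid_node_port "$p"; then
    echo "$p: ok"
  else
    echo "$p: outside 30000-32767"
  fi
done
# prints: 32233: ok
#         8080: outside 30000-32767
```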
With the LoadBalancer ServiceType:
- NodePort and ClusterIP Services are automatically created, and the external load balancer will route to them
- The Services are exposed at a static port on each Worker Node
- The Service is exposed externally using the underlying Cloud provider's load balancer feature.
The LoadBalancer ServiceType will only work if the underlying infrastructure supports the automatic creation of Load Balancers and has the respective support in Kubernetes, as is the case with the Google Cloud Platform and AWS.
A Service can be mapped to an ExternalIP address if it can route to one or more of the Worker Nodes. Traffic that enters the cluster with the ExternalIP (as destination IP) on the Service port gets routed to one of the Service endpoints. (Note that ExternalIPs are not managed by Kubernetes. The cluster administrator is responsible for configuring the routing that maps the ExternalIP address to one of the nodes.)
ExternalName is a special ServiceType that has no Selectors and does not define any endpoints. When accessed within the cluster, it returns a CNAME record of an externally configured service.
The primary use case of this ServiceType is to make externally configured services like my-database.example.com available inside the cluster, using just the name, like my-database, to other services inside the same Namespace.
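An ExternalName Service for the my-database example could be defined along the following lines, using the same heredoc style as the pod definitions earlier in this article. This is a sketch: the file name `my-database-svc.yml` is hypothetical, and the `kubectl create` step is left as a comment so the snippet runs without a cluster.

```shell
# Write a Service definition of type ExternalName to a local file.
# It has no selector and no endpoints, only the external DNS name to alias.
cat << EOF > my-database-svc.yml
---
apiVersion: v1
kind: Service
metadata:
  name: my-database
spec:
  type: ExternalName
  externalName: my-database.example.com
EOF

# To create it on a live cluster (not run here):
#   kubectl create -f my-database-svc.yml
cat my-database-svc.yml
```

Pods in the same Namespace would then resolve the name my-database to a CNAME record pointing at my-database.example.com.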
External links
- Official website
- Kubernetes code (via GitHub)