Rancher
Latest revision as of 22:30, 20 August 2020
Rancher is a container management platform.
- Rancher 1.6 natively supports and manages all of your Cattle, Kubernetes, Mesos, and Swarm clusters. Note: Rancher 1.6 has been deprecated.
- Rancher 2.x is Kubernetes-as-a-Service.
Container management
- App Catalog
- Orchestration: Compose, Kubernetes, Marathon, etc.
- Scheduling: Swarm, Kubernetes, Mesos, etc.
- Monitoring: cAdvisor, Sysdig, Datadog, etc.
- Access Control: LDAP, AD, GitHub, etc.
- Registry: DockerHub, Quay.io, etc.
- Engine: Docker, Rkt, etc.
- Security: Notary, Vault, etc.
- Network: VXLAN, IPSEC, HAProxy, etc.
- Storage: Ceph, Gluster, Swift, etc.
- Distributed DB: Etcd, Consul, MongoDB, etc.
Setup Rancher HA with AWS
For my Rancher HA with AWS setup, I will use the following:
Virtual Private Cloud (VPC)
- Virtual Private Cloud (VPC): rancher-vpc (w/3 subnets)
- VPC CIDR: 172.22.0.0/16
- Rancher management subnet: 172.22.1.0/24 (us-west-2a)
Rancher management server nodes (EC2 instances)
- Rancher management server nodes (EC2 instances running CentOS 7):
- mgmt-host-1 (172.22.1.210)
- mgmt-host-2 (172.22.1.211)
- mgmt-host-3 (172.22.1.212)
Each of the Rancher management server nodes (referred to as "server nodes" from now on) will have Docker 1.10.3 installed and running.
Each of the server nodes will have the following security group inbound rules:
Security group inbound rules | ||||
---|---|---|---|---|
Type | Protocol | Port | Source | Purpose |
SSH | TCP | 22 | 0.0.0.0/0 | ssh |
HTTP | TCP | 80 | 0.0.0.0/0 | http |
HTTPS | TCP | 443 | 0.0.0.0/0 | https |
TCP | TCP | 81 | 0.0.0.0/0 | proxy_to_http |
TCP | TCP | 444 | 0.0.0.0/0 | proxy_to_https |
TCP | TCP | 6379 | 172.22.1.0/24 | redis |
TCP | TCP | 2376 | 172.22.1.0/24 | swarm |
TCP | TCP | 2181 | 0.0.0.0/0 | zookeeper_client |
TCP | TCP | 2888 | 172.22.1.0/24 | zookeeper_quorum |
TCP | TCP | 3888 | 172.22.1.0/24 | zookeeper_leader |
TCP | TCP | 3306 | 172.22.1.0/24 | mysql (RDS) |
TCP | TCP | 8080 | 0.0.0.0/0 | |
TCP | TCP | 18080 | 0.0.0.0/0 | <optional> |
UDP | UDP | 500 | 172.22.1.0/24 | access between nodes |
UDP | UDP | 4500 | 172.22.1.0/24 | access between nodes |
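The inbound rules above can be read as a small allow-list. The following is a minimal Python sketch (not part of the original setup) that encodes a subset of the rules from the table and checks whether a source address may reach a given port. The names `INBOUND_RULES` and `is_allowed` are hypothetical; real enforcement is done by the AWS security group, not by this code.

```python
import ipaddress

# (protocol, port, source CIDR) tuples copied from the table above (subset).
INBOUND_RULES = [
    ("tcp", 22, "0.0.0.0/0"),
    ("tcp", 80, "0.0.0.0/0"),
    ("tcp", 443, "0.0.0.0/0"),
    ("tcp", 6379, "172.22.1.0/24"),
    ("tcp", 3306, "172.22.1.0/24"),
    ("udp", 500, "172.22.1.0/24"),
    ("udp", 4500, "172.22.1.0/24"),
]


def is_allowed(protocol, port, source_ip, rules=INBOUND_RULES):
    """Return True if any rule permits source_ip to reach protocol/port."""
    addr = ipaddress.ip_address(source_ip)
    for rule_proto, rule_port, cidr in rules:
        if rule_proto == protocol and rule_port == port:
            if addr in ipaddress.ip_network(cidr):
                return True
    return False
```

For example, `is_allowed("tcp", 6379, "172.22.1.211")` is true because redis is only open to the management subnet, while the same call with an address outside 172.22.1.0/24 is false.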
External database (RDS)
The external database (DB) will run on the AWS Relational Database Service (RDS). We will call this RDS instance "rancher-ext-db". It will listen on port 3306 on 172.22.1.26, reside in the VPC "rancher-vpc", and run MariaDB 10.0.24.
External load balancer (ELB)
The external load balancer (LB) will be an AWS Elastic Load Balancer (ELB) named "rancher-ext-lb". It will be in the VPC "rancher-vpc" and will have the following listeners configured:
ELB listeners | |||||
---|---|---|---|---|---|
Load Balancer Protocol | Load Balancer Port | Instance Protocol | Instance Port | Cipher | SSL Certificate |
TCP | 80 | TCP | 81 | N/A | N/A |
TCP | 443 | TCP | 444 | N/A | N/A |
HTTP | 8080 | HTTP | 8080 | N/A | N/A |
- Create ELB policies:
$ AWS_PROFILE=dev
$ LB_NAME=rancher-ext-lb
$ POLICY_NAME=rancher-ext-lb-ProxyProtocol-policy
$ aws --profile ${AWS_PROFILE} elb create-load-balancer-policy \
    --load-balancer-name ${LB_NAME} \
    --policy-name ${POLICY_NAME} \
    --policy-type-name ProxyProtocolPolicyType \
    --policy-attributes AttributeName=ProxyProtocol,AttributeValue=true
$ aws --profile ${AWS_PROFILE} elb set-load-balancer-policies-for-backend-server \
    --load-balancer-name ${LB_NAME} \
    --instance-port 81 \
    --policy-names ${POLICY_NAME}
$ aws --profile ${AWS_PROFILE} elb set-load-balancer-policies-for-backend-server \
    --load-balancer-name ${LB_NAME} \
    --instance-port 444 \
    --policy-names ${POLICY_NAME}
Rancher HA management stack
A fully functioning Rancher HA setup will have the following Docker containers running:
Rancher management stack | |||||
---|---|---|---|---|---|
Service | Containers | IPs | Traffic to | Ports (TCP unless noted) | Traffic flow |
6 x cattle | |||||
rancher-ha-parent (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | zookeeper, redis | 3306/tcp 0.0.0.0:18080->8080/tcp 0.0.0.0:2181->12181/tcp 0.0.0.0:2888->12888/tcp 0.0.0.0:3888->13888/tcp 0.0.0.0:6379->16379/tcp | ||
rancher-ha-cattle (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | zookeeper, redis | |||
2 x go-machine-service | |||||
management_go-machine-service_{1,2} | 172.22.1.210, 172.22.1.211 | cattle | 3306, 8080 | ||
3 x load-balancer | |||||
management_load-balancer_{1,2,3} | 172.22.1.210, 172.22.1.211, 172.22.1.212 | websocket-proxy, cattle | 80, 443, 81, 444 | 0.0.0.0:80-81->80-81/tcp 0.0.0.0:443-444->443-444/tcp | |
3 x load-balancer-swarm | |||||
management_load-balancer-swarm_{1,2,3} | 172.22.1.210, 172.22.1.211, 172.22.1.212 | websocket-proxy-ssl | 2376 | 0.0.0.0:2376->2376/tcp | |
2 x rancher-compose-executor | |||||
management_rancher-compose-executor_{1,2} | 172.22.1.211, 172.22.1.212 | cattle | |||
3 x redis | |||||
rancher-ha-redis | 172.22.1.210, 172.22.1.211, 172.22.1.212 | tunnel | |||
36 x tunnel | |||||
rancher-ha-tunnel-redis-1 (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | redis | 6379 | 0.0.0.0:16379->127.0.0.1:6379/tcp | |
rancher-ha-tunnel-redis-2 (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | redis | 6379 | 127.0.0.1:6380->172.22.1.211:6379/tcp | |
rancher-ha-tunnel-redis-3 (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | redis | 6379 | 127.0.0.1:6381->172.22.1.212:6379/tcp | |
rancher-ha-tunnel-zk-client-1 (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | zookeeper | 2181 | 0.0.0.0:12181->127.0.0.1:2181/tcp | |
rancher-ha-tunnel-zk-client-2 (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | zookeeper | 2181 | 127.0.0.1:2182->172.22.1.211:2181/tcp | |
rancher-ha-tunnel-zk-client-3 (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | zookeeper | 2181 | 127.0.0.1:2183->172.22.1.212:2181/tcp | |
rancher-ha-tunnel-zk-leader-1 (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | zookeeper | 3888 | 0.0.0.0:13888->127.0.0.1:3888/tcp | |
rancher-ha-tunnel-zk-leader-2 (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | zookeeper | 3888 | 127.0.0.1:3889->172.22.1.211:3888/tcp | |
rancher-ha-tunnel-zk-leader-3 (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | zookeeper | 3888 | 127.0.0.1:3890->172.22.1.212:3888/tcp | |
rancher-ha-tunnel-zk-quorum-1 (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | zookeeper | 2888 | 0.0.0.0:12888->127.0.0.1:2888/tcp | |
rancher-ha-tunnel-zk-quorum-2 (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | zookeeper | 2888 | 127.0.0.1:2889->172.22.1.211:2888/tcp | |
rancher-ha-tunnel-zk-quorum-3 (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | zookeeper | 2888 | 127.0.0.1:2890->172.22.1.212:2888/tcp | |
2 x websocket-proxy | |||||
management_websocket-proxy_{1,2} | 172.22.1.210, 172.22.1.212 | cattle | |||
2 x websocket-proxy-ssl | |||||
management_websocket-proxy-ssl_{1,2} | 172.22.1.210, 172.22.1.211 | cattle | |||
3 x zookeeper | |||||
rancher-ha-zk | 172.22.1.210, 172.22.1.211, 172.22.1.212 | tunnel | |||
3 x rancher-ha (cluster-manager) | |||||
rancher-ha (x3) | 172.22.1.210, 172.22.1.211, 172.22.1.212 | host | 80, 18080, 3306 | 172.22.1.x:x->172.22.1.26:3306 | |
3 x NetworkAgent | |||||
NetworkAgent | 172.22.1.210, 172.22.1.211, 172.22.1.212 | all | 500/udp, 4500/udp | 0.0.0.0:500->500/udp 0.0.0.0:4500->4500/udp |
Note: Ports in the table above are TCP unless otherwise specified.
Setup Rancher HA on bare-metal
This section shows how to set up Rancher in High Availability (HA) mode on bare-metal servers. We will also set up a Kubernetes cluster managed by Rancher.
Since a given version of Rancher requires specific versions of Docker and Kubernetes, we will use the following:
- Hardware: 4 x bare-metal servers (rack-mounted):
- rancher01.dev # Rancher HA Master #1 + Worker Node #1
- rancher02.dev # Rancher HA Master #2 + Worker Node #2
- rancher03.dev # Rancher HA Master #3 + Worker Node #3
- rancher04.dev # Worker Node #4
- OS and software:
- CentOS 7.4
- Rancher 1.6
- Docker 17.03.x-ce
- Kubernetes 1.8
Install and configure Docker
Note: Perform all of the actions in this section on all 4 bare-metal servers.
- Install Docker 17.03 (CE):
$ sudo yum update -y
$ curl https://releases.rancher.com/install-docker/17.03.sh | sudo sh
$ sudo systemctl enable docker
$ sudo usermod -aG docker $(whoami)
# logout and then log back in
- Check that Docker has been successfully installed:
$ docker --version
Docker version 17.03.2-ce, build f5ec1e2
$ docker run hello-world
...
This message shows that your installation appears to be working correctly.
...
- Cleanup unused containers:
$ docker rm $(docker ps -a -q)
- Prevent Docker from being upgraded (i.e., lock it to always use Docker 17.03):
$ sudo yum -y install yum-versionlock
$ sudo yum versionlock add docker-ce docker-ce-selinux
$ yum versionlock list
Loaded plugins: fastestmirror, versionlock
0:docker-ce-17.03.2.ce-1.el7.centos.*
0:docker-ce-selinux-17.03.2.ce-1.el7.centos.*
Note: If you ever need to remove this version lock, you can run `sudo yum versionlock delete docker-ce-*`.
Install and configure Network Time Protocol (NTP)
- see Network Time Protocol for details.
Note: Perform all of the actions in this section on all 4 bare-metal servers.
- Install NTP:
$ sudo yum install -y ntp
$ sudo systemctl start ntpd && sudo systemctl enable ntpd
- Configure NTP (note: add the closest NTP pool of servers to your bare-metal server's location) by editing /etc/ntp.conf and adding/updating the following lines:
$ sudo vi /etc/ntp.conf
restrict default nomodify notrap nopeer noquery kod limited
#...
server 0.north-america.pool.ntp.org iburst
server 1.north-america.pool.ntp.org iburst
server 2.north-america.pool.ntp.org iburst
server 3.north-america.pool.ntp.org iburst
- Restart NTP and check status:
$ sudo systemctl restart ntpd
$ ntpq -p    # list NTP pools stats
$ ntpdc -l   # list NTP clients
Install and configure external database
Note: Perform all of the actions in this section on rancher04.dev (i.e., Worker Node #4) only. I will use MariaDB 5.5.x.
- Install MariaDB Server:
$ sudo yum install -y mariadb-server
$ sudo systemctl start mariadb && sudo systemctl enable mariadb
- Configure MariaDB Server:
$ sudo mysql_secure_installation # Follow the recommendations
- Edit /etc/my.cnf and add the following under the [mysqld] section:
max_allowed_packet=16M
- Restart MariaDB Server:
$ sudo systemctl restart mariadb
- Log into MariaDB Server and create database and user for Rancher:
$ mysql -u root -p
mysql> CREATE DATABASE IF NOT EXISTS <DB_NAME> COLLATE = 'utf8_general_ci' CHARACTER SET = 'utf8';
mysql> GRANT ALL ON <DB_NAME>.* TO '<DB_USER>'@'%' IDENTIFIED BY '<DB_PASSWD>';
mysql> GRANT ALL ON <DB_NAME>.* TO '<DB_USER>'@'localhost' IDENTIFIED BY '<DB_PASSWD>';
Replace <DB_NAME>, <DB_USER>, and <DB_PASSWD> with values of your choice.
Install and configure Rancher HA Master nodes
Note: Perform all of the actions in this section on all 3 x Rancher HA Master servers (do not perform any of these actions on rancher04.dev).
- Make sure all of your Rancher HA Master servers have the following ports opened between themselves:
9345
8080
- Make sure all of your Rancher HA Master servers can reach port 3306 on the server where MariaDB Server is running (i.e., rancher04.dev).
- Start Rancher on all three Rancher HA Master servers:
$ HOST_IP=$(ip addr show eth0 | awk '/inet /{print $2}' | cut -d'/' -f1)
$ DB_HOST=10.x.x.x       # <- replace with the private IP address of the host where MariaDB is running
$ DB_PORT=3306
$ DB_NAME=<DB_NAME>      # <- replace with actual value
$ DB_USER=<DB_USER>      # <- replace with actual value
$ DB_PASSWD=<DB_PASSWD>  # <- replace with actual value
$ docker run -d --restart=unless-stopped -p 8080:8080 -p 9345:9345 rancher/server \
    --db-host ${DB_HOST} --db-port ${DB_PORT} --db-user ${DB_USER} --db-pass ${DB_PASSWD} --db-name ${DB_NAME} \
    --advertise-address ${HOST_IP}
- Check the logs for the container started by the above command:
$ docker logs -f <container_id>
Once you see the following message:
msg="Listening on :8090"
Rancher should be set up (in HA mode). You should now be able to bring up the Rancher UI by using the public IP of any one of your Rancher HA Master nodes in your browser with port 8080 (e.g., http://1.2.3.4:8080).
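The port requirements in this section (9345 and 8080 between the Master servers, 3306 towards the MariaDB host) can be checked before starting Rancher. The following is a minimal sketch using only the Python standard library; `port_open` and `check_ports` are hypothetical helper names, and a successful TCP connect only shows that something is listening and reachable, not that the service behind it is healthy.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(targets, timeout=3.0):
    """Map each (host, port) pair to its reachability."""
    return {(host, port): port_open(host, port, timeout) for host, port in targets}
```

Run from rancher01.dev, a call such as `check_ports([("rancher04.dev", 3306), ("rancher02.dev", 9345), ("rancher02.dev", 8080)])` would report which of the required paths are open.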
Setup Nginx reverse proxy to act as a Load Balancer
Since we have 3 x Rancher Master nodes for our High Availability (HA) setup, we want some kind of load balancer (LB) to act as a single point of entry to the Rancher UI. We have various options available: use a hardware LB, use some external software LB, use an external Cloud-based LB (e.g., AWS ELB), or set up a simple Nginx reverse proxy residing on one of our bare-metal servers. Since we are already using the non-Master node (i.e., rancher04.dev) as our "external database", we can also use it for our Nginx reverse proxy. Note that this is not something you want to do in production. However, since we are just setting up a Proof-of-Concept (POC) and are limited to these 4 bare-metal servers for our entire setup, using the very light-weight Nginx reverse proxy as our "external load balancer" will do the job just fine.
Note: All of the actions performed in this section will be done on rancher04.dev only.
- Install Nginx:
$ sudo yum install -y epel-release
$ sudo yum install -y nginx
- Update the /etc/nginx/nginx.conf file to look like the following:
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 1024;
}

http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    include /etc/nginx/conf.d/*.conf;
}
- Create the reverse proxy with (the heredoc delimiter is quoted so that the shell does not expand Nginx variables such as $host, and tee is run via sudo because /etc/nginx is owned by root):
$ sudo tee /etc/nginx/conf.d/rancher.conf << 'EOF'
upstream rancher_ui {
    # Replace with actual _private_ IPs of your Rancher Master nodes
    server x.x.x.x:8080;
    server y.y.y.y:8080;
    server z.z.z.z:8080;
}
server {
    listen 80 default_server;
    listen [::]:80;
    server_name _;
    #index index.html index.htm;
    access_log /var/log/nginx/rancher.log;
    error_log /var/log/nginx/rancher.err;
    location / {
        proxy_pass http://rancher_ui;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_redirect default;
        proxy_cache off;
    }
}
EOF
- Start and enable Nginx:
$ sudo systemctl start nginx && sudo systemctl enable nginx
- Make sure Nginx has started successfully:
$ sudo systemctl status nginx
- Tail (with follow) the Nginx Rancher error log:
$ sudo tail -f /var/log/nginx/rancher.err
Open a browser and go to the public IP address of the rancher04.dev server.
If you get a "502 Bad Gateway" and/or if you see an error in the Nginx Rancher error log that looks something like the following:
failed (13: Permission denied) while connecting to upstream
you probably have SELinux set to "enforcing" mode. You can fix this by one of the following methods:
$ sudo setenforce 0    # This changes SELinux to "permissive" mode, but not a good idea for production
#~OR~
$ sudo setsebool -P httpd_can_network_connect 1
Now, put the public IP address of rancher04.dev into your browser and you should see the Rancher UI.
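The upstream block in rancher.conf lists one server line per Rancher Master node. As an illustration only, the following Python sketch renders such a block from a list of private IPs; `render_upstream` is a hypothetical helper, and the addresses passed to it are placeholders you would replace with your own.

```python
def render_upstream(name, addresses, port=8080):
    """Build an Nginx upstream block listing one server per address."""
    lines = ["upstream {} {{".format(name)]
    for address in addresses:
        lines.append("    server {}:{};".format(address, port))
    lines.append("}")
    return "\n".join(lines)
```

Generating the block this way keeps the node list in one place if Master nodes are added or replaced later.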
Install and configure Rancher Worker nodes
Note: Perform all of the actions in this section on all 4 x bare-metal servers.
We will now add our Worker Nodes. Since the Master nodes will also be acting as Worker nodes and the 4th node (rancher04.dev) is just a Worker node, we need to do the following on all 4 servers.
Follow the instructions for adding a host in the Rancher UI. After working through the steps in the Rancher UI, it should provide you with a Docker command you should run on a given host, which looks something like the following:
$ HOST_IP=x.x.x.x
$ sudo docker run -e CATTLE_AGENT_IP="${HOST_IP}" \
    --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock \
    -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.2.10 \
    http://z.z.z.z/v1/scripts/xxxx:yyyy:zzzz
Setup Rancher 2.0 HA in AWS
This section will show how to install Rancher 2.0 HA by using self-signed certificates (+intermediate) and a Layer 4 Load balancer (TCP).
- IMPORTANT
- The following is for Rancher 2.0.6. Rancher 2.0.7 has changed the way you accomplish this. I will update this section with the new way when I find the time.
Requirements
- Linux OS (Ubuntu 16.04.5 LTS; 3 x AWS EC2 instances)
- Docker 17.03.2-ce (Storage Driver: aufs; Cgroup Driver: cgroupfs)
- OS Binaries
- RKE (https://github.com/rancher/rke/releases/latest)
- kubectl (https://kubernetes.io/docs/tasks/tools/install-kubectl/)
Security group for EC2 instances
|Protocol|Port  |Source       |
|--------|-----:|------------:|
| TCP    |    22| 0.0.0.0/0   |
| TCP    |    80| 0.0.0.0/0   |
| TCP    |   443| 0.0.0.0/0   |
| TCP    |  6443| 0.0.0.0/0   |
| TCP    |  2376| sg-xxxxxxxx |
| TCP    |  2379| sg-xxxxxxxx |
| TCP    |  2380| sg-xxxxxxxx |
| TCP    | 10250| sg-xxxxxxxx |
| TCP    | 10251| sg-xxxxxxxx |
| TCP    | 10252| sg-xxxxxxxx |
| UDP    |  8472| 0.0.0.0/0   |
| ICMP   | All  | sg-xxxxxxxx |
Variables
Variables used in this guide:
- FQDN: rancher.yourdomain.com
- Node 1 IP: 10.10.0.167
- Node 2 IP: 10.10.1.90
- Node 3 IP: 10.10.2.61
Create self signed certificates
Follow this guide.
Configure the RKE template
This guide is based on the 3-node-certificate.yml template, which is intended for self-signed certificates together with a Layer 4 load balancer (TCP).
- Download the Rancher cluster config template:
$ wget -O /root/3-node-certificate.yml https://raw.githubusercontent.com/rancher/rancher/master/rke-templates/3-node-certificate.yml
- Edit the values (FQDN, BASE64_CRT, BASE64_KEY, BASE64_CA)
This command will replace the values/variables needed:
$ sed -i -e "s/<FQDN>/rancher.yourdomain.com/" \
    -e "s/<BASE64_CRT>/$(cat /root/ca/rancher/base64/cert.base64)/" \
    -e "s/<BASE64_KEY>/$(cat /root/ca/rancher/base64/key.base64)/" \
    -e "s/<BASE64_CA>/$(cat /root/ca/rancher/base64/cacerts.base64)/" \
    /root/3-node-certificate.yml
- Validate that the FQDN is replaced correctly:
$ cat 3-node-certificate.yml | grep rancher.yourdomain.com
- Configure nodes
At the top of the 3-node-certificate.yml file, configure your nodes that will be used for the cluster.
Example:
nodes:
  - address: 10.10.0.167                              # hostname or IP to access nodes
    user: ubuntu                                      # root user (usually 'root')
    role: [controlplane,etcd,worker]                  # K8s roles for node
    ssh_key_path: /home/ubuntu/.ssh/rancher-ssh-key   # path to PEM file
  - address: 10.10.1.90
    user: ubuntu
    role: [controlplane,etcd,worker]
    ssh_key_path: /home/ubuntu/.ssh/rancher-ssh-key
  - address: 10.10.2.61
    user: ubuntu
    role: [controlplane,etcd,worker]
    ssh_key_path: /home/ubuntu/.ssh/rancher-ssh-key
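The placeholder substitution done with sed earlier in this section can also be sketched in Python. This is an illustrative alternative, not the procedure the guide relies on: `fill_template` is a hypothetical helper, and plain string replacement is used so that the inserted values are never interpreted as patterns or delimiters.

```python
def fill_template(text, values):
    """Replace each placeholder in text with its (whitespace-stripped) value.

    Raises ValueError if a placeholder is missing from the template, which
    usually means the template was already filled in or is the wrong file.
    """
    for placeholder, value in values.items():
        if placeholder not in text:
            raise ValueError("placeholder not found: " + placeholder)
        text = text.replace(placeholder, value.strip())
    return text
```

In practice the values would be read from the files under /root/ca/rancher/base64/ and the result written back to 3-node-certificate.yml.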
Run RKE to setup the cluster
Run RKE to setup the cluster (run from only one of the 3 x EC2 instances that will host the Rancher 2.0 HA setup):
$ ./rke_linux-amd64 up --config 3-node-certificate.yml
It should finish with the following line, which indicates that it was successful:
INFO[0XXX] Finished building Kubernetes cluster successfully
Validate cluster
- Nodes
All nodes should be in Ready status (it can take a few minutes before they get Ready):
$ kubectl --kubeconfig kube_config_3-node-certificate.yml get nodes
NAME          STATUS   ROLES                      AGE   VERSION
10.10.0.167   Ready    controlplane,etcd,worker   5d    v1.10.5
10.10.1.90    Ready    controlplane,etcd,worker   5d    v1.10.5
10.10.2.61    Ready    controlplane,etcd,worker   5d    v1.10.5
- Pods
All pods must be in Running status, and names containing job should be in Completed status:
$ kubectl --kubeconfig kube_config_3-node-certificate.yml get pods --all-namespaces
NAMESPACE       NAME                                      READY   STATUS      RESTARTS   AGE
cattle-system   cattle-859b6cdc6b-vrlp9                   1/1     Running     0          17h
default         alpine-hrqx8                              1/1     Running     0          17h
default         alpine-vghs6                              1/1     Running     0          17h
default         alpine-wtxjl                              1/1     Running     0          17h
ingress-nginx   default-http-backend-564b9b6c5b-25c7h     1/1     Running     0          17h
ingress-nginx   nginx-ingress-controller-2jcqx            1/1     Running     0          17h
ingress-nginx   nginx-ingress-controller-2mqkj            1/1     Running     0          17h
ingress-nginx   nginx-ingress-controller-nftl9            1/1     Running     0          17h
kube-system     canal-7mhn9                               3/3     Running     0          17h
kube-system     canal-hkkhm                               3/3     Running     0          17h
kube-system     canal-hms2n                               3/3     Running     0          17h
kube-system     kube-dns-5ccb66df65-6nm78                 3/3     Running     0          17h
kube-system     kube-dns-autoscaler-6c4b786f5-bjp5m       1/1     Running     0          17h
kube-system     rke-ingress-controller-deploy-job-dzf8t   0/1     Completed   0          17h
kube-system     rke-kubedns-addon-deploy-job-fh288        0/1     Completed   0          17h
kube-system     rke-network-plugin-deploy-job-ltdfj       0/1     Completed   0          17h
kube-system     rke-user-addon-deploy-job-5wgdb           0/1     Completed   0          17h
- Ingress created
The created Ingress should match your FQDN:
$ kubectl --kubeconfig kube_config_3-node-certificate.yml get ingress -n cattle-system
NAME                  HOSTS                    ADDRESS                             PORTS     AGE
cattle-ingress-http   rancher.yourdomain.com   10.10.0.167,10.10.1.90,10.10.2.61   80, 443   17h
Overlay network
To test the overlay network:
$ cat << EOF >ds-alpine.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: alpine
spec:
  selector:
    matchLabels:
      name: alpine
  template:
    metadata:
      labels:
        name: alpine
    spec:
      tolerations:
      - effect: NoExecute
        key: "node-role.kubernetes.io/etcd"
        value: "true"
      - effect: NoSchedule
        key: "node-role.kubernetes.io/controlplane"
        value: "true"
      containers:
      - image: alpine
        imagePullPolicy: Always
        name: alpine
        command: ["sh", "-c", "tail -f /dev/null"]
        terminationMessagePath: /dev/termination-log
EOF
Run the following commands:
$ kubectl --kubeconfig kube_config_3-node-certificate.yml create -f ds-alpine.yml
$ kubectl --kubeconfig kube_config_3-node-certificate.yml rollout status ds/alpine -w
Wait until it returns: daemon set "alpine" successfully rolled out.
Check that these alpine Pods are running:
$ kubectl --kubeconfig kube_config_3-node-certificate.yml get pods -l name=alpine
NAME           READY   STATUS    RESTARTS   AGE
alpine-hrqx8   1/1     Running   0          17h
alpine-vghs6   1/1     Running   0          17h
alpine-wtxjl   1/1     Running   0          17h
Then execute the following script to test network connectivity:
echo "=> Start"
kubectl --kubeconfig kube_config_3-node-certificate.yml get pods -l name=alpine \
    -o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.nodeName}{"\n"}{end}' | \
while read spod shost; do
  kubectl --kubeconfig kube_config_3-node-certificate.yml get pods -l name=alpine \
      -o jsonpath='{range .items[*]}{@.status.podIP}{" "}{@.spec.nodeName}{"\n"}{end}' | \
  while read tip thost; do
    kubectl --kubeconfig kube_config_3-node-certificate.yml \
        --request-timeout='10s' exec $spod -- /bin/sh -c "ping -c2 $tip > /dev/null 2>&1"
    RC=$?
    if [ $RC -ne 0 ]; then
      echo $shost cannot reach $thost
    fi
  done
done
echo "=> End"
If you see the following:
=> Start
command terminated with exit code 1
10.10.1.90 cannot reach 10.10.0.167
command terminated with exit code 1
10.10.1.90 cannot reach 10.10.2.61
command terminated with exit code 1
10.10.0.167 cannot reach 10.10.1.90
command terminated with exit code 1
10.10.0.167 cannot reach 10.10.2.61
command terminated with exit code 1
10.10.2.61 cannot reach 10.10.1.90
command terminated with exit code 1
10.10.2.61 cannot reach 10.10.0.167
=> End
then something is mis-configured (see the Troubleshooting section below for tips).
However, if all you see is:
=> Start
=> End
All is good!
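The logic of the test script above is an all-pairs check: every alpine Pod pings the Pod IP of every alpine Pod, and each failure is reported as "source node cannot reach target node". The following Python sketch shows that logic in isolation. `unreachable_pairs` and the `can_ping` callback are hypothetical names; in a real run the callback would wrap the `kubectl exec ... ping` call from the script.

```python
def unreachable_pairs(pods, can_ping):
    """Return (source_node, target_node) for every failed ping.

    pods:     list of (pod_name, pod_ip, node_name) tuples
    can_ping: callable(source_pod_name, target_pod_ip) -> bool
    """
    failures = []
    for src_name, _src_ip, src_node in pods:
        for _dst_name, dst_ip, dst_node in pods:
            if not can_ping(src_name, dst_ip):
                failures.append((src_node, dst_node))
    return failures
```

An empty result corresponds to the "=> Start / => End" output with nothing in between. With three nodes and a broken overlay network, up to six cross-node pairs are reported, which matches the failing output shown above.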
Troubleshooting
First, make sure ICMP is allowed between the EC2 instances (on their private network):
10.10.0.167> ping -c 2 10.10.1.90
They should all be able to reach each other.
However, what the above alpine test Pods are trying to do is reach each other via the Flannel network:
# From within each Pod, ping the Flannel net of the other Pods:
alpine-hrqx8 10.10.1.90 => ping -c2 10.42.0.3
alpine-vghs6 10.10.0.167 => ping -c2 10.42.2.4
alpine-wtxjl 10.10.2.61 => ping -c2 10.42.1.3
ubuntu@ip-10-10-0-167:~$ ip addr show flannel.1
28: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UNKNOWN group default
    link/ether 12:77:ec:05:a4:cd brd ff:ff:ff:ff:ff:ff
    inet 10.42.2.0/32 scope global flannel.1
Should those Flannel IPs be pingable?
Validate Rancher
Run the following to validate the accessibility to Rancher:
- Validate certificates
To validate the certificates:
$ sudo openssl s_client \
    -CAfile /root/ca/rancher/cacerts.pem \
    -connect 10.10.0.167:443 \
    -servername rancher.yourdomain.com
This should result in the following, indicating the chain is correct. You can repeat this for the other hosts (10.10.1.90 and 10.10.2.61):
Start Time: 1533924359
Timeout : 300 (sec)
Verify return code: 0 (ok)
- Validate connection
Use the following command to see if you can reach the Rancher server:
$ sudo curl --cacert /root/ca/rancher/cacerts.pem \
    --resolve rancher.yourdomain.com:443:10.10.0.167 \
    https://rancher.yourdomain.com/ping
The response should be pong.
Set up load balancer
If this is all functioning correctly, you can put a load balancer in front.
I am using a "classic" load balancer (ELB; layer 4) in AWS.
- Security group for ELB
|Protocol|Port  |Source       |
|--------|-----:|-------------|
| TCP    |    80| 0.0.0.0/0   |
| TCP    |   443| 0.0.0.0/0   |
- ELB Listeners
| Load Balancer Protocol | Load Balancer Port | Instance Protocol | Instance Port | SSL Certificate |
|------------------------|-------------------:|-------------------|--------------:|-----------------|
| TCP                    |                 80 | TCP               |            80 | N/A             |
| TCP                    |                443 | TCP               |           443 | N/A             |
$ aws elb describe-load-balancers \
    --region us-west-2 \
    --load-balancer-names rancher-elb-dev | \
  jq '.LoadBalancerDescriptions[] | .ListenerDescriptions[].Listener'
{
  "InstancePort": 80,
  "LoadBalancerPort": 80,
  "Protocol": "TCP",
  "InstanceProtocol": "TCP"
}
{
  "InstancePort": 443,
  "LoadBalancerPort": 443,
  "Protocol": "TCP",
  "InstanceProtocol": "TCP"
}
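Because this setup depends on the ELB passing traffic through at layer 4, it can be useful to confirm that every listener uses TCP on both sides. The following Python sketch applies that check to the JSON printed by `aws elb describe-load-balancers`, walking the same path as the jq filter above. `non_tcp_listeners` is a hypothetical helper name.

```python
import json


def non_tcp_listeners(describe_output):
    """Return listeners whose front-end or back-end protocol is not TCP."""
    data = json.loads(describe_output)
    bad = []
    for lb in data["LoadBalancerDescriptions"]:
        for desc in lb["ListenerDescriptions"]:
            listener = desc["Listener"]
            if listener["Protocol"] != "TCP" or listener["InstanceProtocol"] != "TCP":
                bad.append(listener)
    return bad
```

An empty list means all listeners are plain TCP pass-through, as in the output shown above.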
From one of the EC2 hosts, run:
$ curl -IkL $(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
HTTP/1.1 404 Not Found
Server: nginx/1.13.8
$ curl -IkL https://$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
HTTP/1.1 404 Not Found
Server: nginx/1.13.8
That 404 Not Found is expected. If you see 504 Gateway Time-out instead, something is mis-configured.
See if you can reach the public domain:
$ curl -IkL https://rancher.yourdomain.com
HTTP/2 200
server: nginx/1.13.8
Works!
Try passing it the CA certificates:
$ sudo curl -IL --cacert /root/ca/rancher/cacerts.pem https://rancher.yourdomain.com
HTTP/1.1 200 OK
Server: nginx/1.13.8
Works!
$ kubectl --kubeconfig kube_config_3-node-certificate.yml logs -l app=ingress-nginx -n ingress-nginx
The ingress controller logs should show the above 200 OK responses.
Your Rancher 2.0 HA cluster is now ready to use.
Encryption at rest
This section shows how to set up encryption-at-rest for Kubernetes Secrets.
- Before encryption
- Check if the Kubernetes API Server (aka "kube-apiserver") is already using encryption at rest:
$ ps aux | \grep [k]ube-apiserver | tr ' ' '\n' | grep encryption
If the above command returns nothing, it is not.
- Create a test Secret:
$ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: secret-before
  namespace: default
data:
  foo: $(echo bar | base64)
EOF
$ kubectl get secret secret-before -o yaml | grep -A1 ^data
data:
  foo: YmFyCg==
$ echo "YmFyCg==" | base64 -d
bar
- Check what ETCD knows about your Secret:
$ docker exec -it etcd /bin/sh
/ # ETCDCTL_API=3 etcdctl get /registry/secrets/default/secret-before -w json {"header":{"cluster_id":10635379586599678010,"member_id":17312218561102223823,"revision":13935321,"raft_term":5},"kvs":[{"key":"L3JlZ2lzdHJ5L3NlY3JldHMvZGVmYXVsdC9zZWNyZXQtYmVmb3Jl","create_revision":13935228,"mod_revision":13935228,"version":1, "value":"azhzAAoMCgJ2MRIGU2VjcmV0EqsCCpMCCg1zZWNyZXQtYmVmb3JlEgAaB2RlZmF1bHQiACokNWJhYTZhM2QtYjU3Yi0xMWU5LWFiOGYtMDJmOGZjNjdmNGI4MgA4AEIICPeHk+oFEABivgEKMGt1YmVjdGwua3ViZXJuZXRlcy5pby9sYXN0LWFwcGxpZWQtY29uZmlndXJhdGlvbhKJAXsiYXBpVmVyc2lvbiI6InYxIiwiZGF0YSI6eyJmb28iOiJZbUZ5Q2c9PSJ9LCJraW5kIjoiU2VjcmV0IiwibWV0YWRhdGEiOnsiYW5ub3RhdGlvbnMiOnt9LCJuYW1lIjoic2VjcmV0LWJlZm9yZSIsIm5hbWVzcGFjZSI6ImRlZmF1bHQifX0KegASCwoDZm9vEgRiYXIKGgZPcGFxdWUaACIA"}],"count":1} / # echo "azhzAAoM..." | base64 -d #<- show the mangled contents # Better way: / # ETCDCTL_API=3 etcdctl get /registry/secrets/default/secret-before -w fields | grep Value "Value" : "k8s\x00\n\f\n\x02v1\x12\x06Secret\x12\xab\x02\n\x93\x02\n\rsecret-before\x12\x00\x1a\adefault\"\x00*$5baa6a3d-b57b-11e9-ab8f-02f8fc67f4b82\x008\x00B\b\b\xf7\x87\x93\xea\x05\x10\x00b\xbe\x01\n0kubectl.kubernetes.io/last-applied-configuration\x12\x89\x01{\"apiVersion\":\"v1\",\"data\":{\"foo\":\"YmFyCg==\"},\"kind\":\"Secret\",\"metadata\":{\"annotations\":{},\"name\":\"secret-before\",\"namespace\":\"default\"}}\nz\x00\x12\v\n\x03foo\x12\x04bar\n\x1a\x06Opaque\x1a\x00\"\x00" / # echo "YmFyCg==" | base64 -d bar
As you can see, anyone with access to your ETCD cluster (aka your distributed key-value store) can easily view your Secrets. That is, your Kubernetes cluster is not using encryption-at-rest for your Secrets.
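To make the point concrete, the two base64 strings that appear in the etcd output above can be decoded with nothing more than the Python standard library. This is a small illustration using values copied from this section; `decode_b64` is a hypothetical helper name.

```python
import base64


def decode_b64(text):
    """Decode a base64 string to raw bytes."""
    return base64.b64decode(text)


# The "key" field from the etcdctl JSON output above.
etcd_key = decode_b64("L3JlZ2lzdHJ5L3NlY3JldHMvZGVmYXVsdC9zZWNyZXQtYmVmb3Jl").decode()

# The Secret's data value as stored by Kubernetes.
secret_value = decode_b64("YmFyCg==")
```

Here `etcd_key` is the etcd path of the Secret and `secret_value` is the original input to `echo bar | base64`, including its trailing newline. No key material is needed to recover either one.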
- After encryption
- Create the following file on all of your Rancher HA master nodes (aka the "controller" nodes). The key must be identical on every node, so generate the file once and copy it to the other nodes rather than re-running the command (which would produce a different random key each time):
$ sudo tee /etc/kubernetes/encryption.yaml << EOF
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
    - secrets
    providers:
    - aescbc:
        keys:
        - name: key1
          secret: $(head -c 32 /dev/urandom | base64 -i -)
    - identity: {}
EOF
$ sudo chown root:root /etc/kubernetes/encryption.yaml
$ sudo chmod 0600 /etc/kubernetes/encryption.yaml
- Edit the rancher-cluster.yaml configuration file and add the following to the services section:
services:
  kube-api:
    extra_args:
      encryption-provider-config: /etc/kubernetes/encryption.yaml
- Restart the Rancher HA cluster:
$ rke up --config rancher-cluster.yaml
- Now, check that the Kubernetes API Server is using encryption-at-rest:
$ ps aux | \grep [k]ube-apiserver | tr ' ' '\n' | grep encryption
--encryption-provider-config=/etc/kubernetes/encryption.yaml
Looks good!
- Create a new Secret:
$ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: secret-after
  namespace: default
data:
  foo: $(echo bar | base64)
EOF
$ kubectl get secret secret-after -o yaml | grep -A1 ^data
data:
  foo: YmFyCg==
$ echo "YmFyCg==" | base64 -d
bar
As you can see, you are still able to access, view, and work with Kubernetes Secrets.
However, let's check if we can still view/decode the above Secret from within ETCD:
$ docker exec -it etcd /bin/sh
/ # ETCDCTL_API=3 etcdctl get /registry/secrets/default/secret-after -w fields | grep Value "Value" : "k8s:enc:aescbc:v1:key1:\xd5\xef8\xd8Qnhq>\xb7\xf26m5\x9c6\xb0\xe3\xd2uC\xf3H\xfc\x95e\xb6\x03j\xadl\x9a\xdc\xefF\x04\xa0F\xfc\xa2\xe0\xe6\x89\xcfa\xfc?x\xe1\xa2\xe5\xbd\\:d\xea&\xbbE\x81\xb4#G%\xe3\x84\x01\xfd\x1eȇR]\x160\x96\xfc\x8a\xc4\xc8#@\xe5\xe2\xb1\xe7^\xe63>\xdf+\x91b8*B)\x05\xb7\xa0\xe4\xa2Y\x8d5Ԁ$\xb7@-1\xf9-\xccϡ\x96\b\x02\n\xa7@\xcfOK`\xedU\xff\xd2\xc3\xcbQQ\x89\xd6G\xf2\xd4\xd5$M(\xad*l$F\xefH\xb7%`\xe4\f\x06\x8db\x83fL\xad\x9e\xf5\xc3\xe1&N\xa6Jh\n\x1e6;_Rq]~\x12\xb5\xb6%\xdd\x16\x97\x89\x1c%8¨IaB\xe8\x10\x97\xb6e\\\x18؇E滧p\xb1ќ)\xcb\aS\xc9B;\xa6\xd5 \x15\u007fv\xff\x8f\x98\x9by\f\x87}y\xfb\xcf-\x01ݦ\u007f'-ͻ\xb2\xbbr\x12\x8d\xdf\x04\xa2\x89\x16J\xaa\x95\xd9\x0f\xf5\x05\x91-\xeav\xb5r\x88\fj\x91C{HfĐ\x16l\x19)\b\xcf+q\x03m\xe4\xb7a''&*\xe8@\xb8\xa9\xa4\xbe\x15\xf5\xe5\x03\xa9\x01\x1f\x10l\xf7:\x865=ѽt\x1fN\xea7su\xe3\xcf\xe0\xd6\x013\x02/\xa7=,\xcan\x01\xad\xdb\xf9\x0e\x8aM\xe83\x8f0^"
We definitely cannot! In fact, not only is the secret value encrypted, the entire contents of the data section are encrypted!
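The encrypted value above starts with a readable prefix, `k8s:enc:aescbc:v1:key1:`, while the unencrypted value seen earlier starts with `k8s\x00`. A quick way to tell the two apart is to inspect that prefix. The following is a minimal sketch based only on the layout visible in the output above; `encryption_provider` is a hypothetical helper name.

```python
def encryption_provider(value):
    """Return (provider, key_name) if value carries the k8s:enc: prefix, else None.

    Layout assumed from the etcd output above:
    k8s:enc:<provider>:v1:<key name>:<ciphertext>
    """
    if not value.startswith(b"k8s:enc:"):
        return None
    parts = value.split(b":", 5)
    if len(parts) < 6:
        return None
    return parts[2].decode(), parts[4].decode()
```

Running this over the raw values of all Secrets in etcd would show which ones still need to be rewritten after enabling encryption.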
Note: Any previously created Secrets (i.e., those created before encryption-at-rest was enabled) will not be encrypted. You can update them to start using encryption with the following command:
$ kubectl get secrets --all-namespaces -o json | kubectl replace -f -
IMPORTANT NOTE: Be careful with the above command! If you have mis-configured something with your encryption at rest, you could lock yourself out of your entire Kubernetes cluster. A better way to test that you have everything set up correctly is to first run the above command (i.e., update your Secrets) for the default namespace only:
$ kubectl -n default get secrets -o json | kubectl replace -f -
If the above works on the default namespace, you should be okay updating all Secrets in all namespaces (including kube-system).
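The `secret:` value written to encryption.yaml earlier in this section is 32 random bytes, base64-encoded (the `head -c 32 /dev/urandom | base64` part of the command). If you prefer to generate the key once and distribute the same file to every master node, an equivalent key can be produced as in the following sketch; `new_encryption_key` is a hypothetical helper name.

```python
import base64
import os


def new_encryption_key(num_bytes=32):
    """Return num_bytes of OS-provided randomness, base64-encoded as text."""
    return base64.b64encode(os.urandom(num_bytes)).decode("ascii")
```

A 32-byte key always encodes to 44 base64 characters, which is a quick sanity check on the value that ends up in the file.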
Miscellaneous
- Install Rancher using Helm with default registry set (useful for private registries):
$ helm install rancher \
    --name rancher \
    --namespace cattle-system \
    --set hostname=rancher.example.com \
    --set rancherImageTag=master \
    --set 'extraEnv[0].name=CATTLE_SYSTEM_DEFAULT_REGISTRY' \
    --set 'extraEnv[0].value=http://private-registry.example.com/'
- Get the randomly generated password the Rancher Terraform provider stores in the TF state file:
jq -crM '.resources[] | select(.provider == "module.rancher.provider.rancher2.bootstrap") | {instances: .instances[]|.attributes.current_password} | .[]' terraform.tfstate
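For readers who prefer Python to jq, the same lookup can be sketched as follows. It mirrors the filter above: select the resources whose `provider` matches, then read `attributes.current_password` from each instance. The function names are hypothetical, and the structure of the state file is assumed to be exactly what the jq filter implies.

```python
import json

PROVIDER = "module.rancher.provider.rancher2.bootstrap"


def bootstrap_passwords(state, provider=PROVIDER):
    """Return current_password for every instance of the matching provider."""
    return [
        inst.get("attributes", {}).get("current_password")
        for res in state.get("resources", [])
        if res.get("provider") == provider
        for inst in res.get("instances", [])
    ]


def load_state(path):
    """Read a Terraform state file (JSON) from disk."""
    with open(path) as handle:
        return json.load(handle)
```

Usage would be `bootstrap_passwords(load_state("terraform.tfstate"))`. As with the jq version, treat the output as a credential and avoid writing it to logs.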
Rancher State File
- Get the Rancher State File of a given cluster:
$ kubectl --kubeconfig=kube_config_rancher-cluster.yml \
    --namespace kube-system \
    get configmap full-cluster-state -o json | \
  python -c 'import sys,json;data=json.loads(sys.stdin.read());print(data["data"]["full-cluster-state"])' \
  > rancher-cluster.rkestate_bkup_$(date +%F)
#~OR~
$ kubectl --kubeconfig=kube_config_rancher-cluster.yml \
    get configmap -n kube-system full-cluster-state -o json | \
  jq -r .data.\"full-cluster-state\" > rancher-cluster.rkestate_bkup_$(date +%F)
#~OR~
$ kubectl --kubeconfig $(docker inspect kubelet \
    --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl/kubecfg-kube-node.yaml \
    get configmap -n kube-system full-cluster-state -o json | \
  jq -r .data.\"full-cluster-state\" > rancher-cluster.rkestate_bkup_$(date +%F)
- Get the Rancher (RKE) current state file directly from etcd:
$ docker exec etcd etcdctl get /registry/configmaps/kube-system/full-cluster-state |\
  tail -n1 | tr -c '[:print:]\t\r\n' '[ *]' | sed 's/^.*{"desiredState/{"desiredState/' |\
  docker run -i oildex/jq:1.6 jq -r '.currentState.rkeConfig' |\
  python -c 'import sys,json,yaml;data=json.loads(sys.stdin.read());print(yaml.dump(yaml.safe_load(json.dumps(data)),default_flow_style=False))' \
  > rancher-cluster.rkestate_bkup_$(date +%F) 2>/dev/null
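The pipelines above all unwrap the same nesting: the ConfigMap's `data["full-cluster-state"]` field is itself a JSON string, and the RKE configuration sits under `currentState.rkeConfig` inside it. The following Python sketch shows that unwrapping with the standard library only; `rke_config_from_configmap` is a hypothetical helper name, and the field layout is taken from the commands above rather than from RKE documentation.

```python
import json


def rke_config_from_configmap(configmap_json):
    """Extract currentState.rkeConfig from `kubectl get configmap ... -o json` output."""
    configmap = json.loads(configmap_json)
    # The state is stored as a JSON document embedded in a string field.
    full_state = json.loads(configmap["data"]["full-cluster-state"])
    return full_state["currentState"]["rkeConfig"]
```

The returned dictionary can then be dumped as YAML (for example with PyYAML, as in the one-liner above) to reconstruct a cluster configuration file.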