K8s Admin

Cluster Architecture

Roles

Master (1... *)

Can be replicated for HA; initiates work and drives the cluster toward the spec

  • API server: communication hub for all components; exposes the k8s API
  • Scheduler: assigns apps to worker nodes; decides where each pod goes
  • Controller manager: maintains the cluster; handles node failures, replicates components, keeps the correct number of pods
  • etcd: data store for the cluster config

Worker (0... *)

Runs the apps, monitors them, and provides services

  • kubelet: runs and manages containers on the node, talks to the API server
  • kube-proxy: load-balances traffic between app components
  • container runtime: the program that runs the containers (Docker, rkt, containerd)

A single-node and a 5000-node cluster are deployed the same way; app components can be relocated to another node, letting you focus on deployment and scaling while staying transparent to users

K8s Takes care of

  • service discovery
  • scaling
  • load balancing
  • self healing
  • leader election
kubectl get nodes
# roles: master or worker
kubectl get pods --all-namespaces -o wide
#get all with details (IP and node)

kubectl get namespaces
# show namespaces

kubectl describe pod nginx
# all details about a pod

kubectl delete pod nginx
# delete the pod

Example: multiple application components using Docker as container runtime

  • Multiple images on the containers 👌
  • Multiple containers running in the same pod (dependencies) 👌
  • Multiple replicas of the same image 👌
  1. Human operator creates a description of the app, its dependencies, and replicas
  2. Scheduler (master) notifies the nodes (worker kubelet)
  3. Kubelet tells Docker (container runtime) to download images from registry (Docker hub)
  4. Docker provisions the container and runs it in the worker
  5. Control plane makes sure the application stays as specified in the description or fix it: Declarative intent

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx

---

kubectl create -f nginx.yaml

kubectl get pods

kubectl describe pod nginx
# handy for troubleshooting

kubectl delete pod nginx

Kubernetes API Primitives

Intercluster communication happens with a call to the API Server

Every component communicates with the API server (not directly)

Only the API server connects to etcd ⚠️

The API server exposes the API; componentstatus shows all control plane components and their status

kubectl get componentstatus: scheduler, controller manager, etcd

Use the API directly with client libraries (Go or Python), or through kubectl proxy as below
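
For example, kubectl proxy handles authentication locally and lets you query the API with plain curl (a minimal sketch; the port is arbitrary):

kubectl proxy --port=8001 &
# query the API directly through the proxy
curl http://127.0.0.1:8001/api/v1/namespaces/default/pods
curl http://127.0.0.1:8001/healthz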

Pods, Services, etc. are persistent entities that represent the state of the cluster = a record of intent

k8s works to maintain that state → express record of intent with yaml files → kubectl

YAML: overcomes kubectl's limits on settable properties and allows version control

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec: # how the object should look; status is reported back to the control plane, which corrects drift if needed
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
          - containerPort: 80

kubectl create -f nginx.yaml

kubectl get deployment nginx-deployment -o yaml

apiVersion: apps/v1  # Kubernetes API version; software version and API version are not the same
kind: Deployment  # kind of object to create: Pod, Job, ReplicaSet, DaemonSet, ...
metadata:  # identifiers for the object: name, UID, namespace
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"nginx-deployment","namespace":"default"},"spec":{"replicas":2,"selector":{"matchLabels":{"app":"nginx"}},"template":{"metadata":{"labels":{"app":"nginx"}},"spec":{"containers":[{"image":"nginx:1.7.9","name":"nginx","ports":[{"containerPort":80}]}]}}}}
  creationTimestamp: "2020-09-28T12:50:14Z"
  generation: 1
  name: nginx-deployment
  namespace: default
  resourceVersion: "259373"
  selfLink: /apis/apps/v1/namespaces/default/deployments/nginx-deployment
  uid: 2bb0620d-25cd-4625-a7a2-9b95cf815d1f
spec:                # desired state of the object, with nested fields
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx          # arbitrary, used as selectors, added any time
    spec:                   # container spec, pods container image, vols, exposed ports
      containers:
      - image: nginx:1.7.9
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:               # observed state of the object; the control plane works to match spec
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2020-09-28T12:50:14Z"
    lastUpdateTime: "2020-09-28T12:50:14Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2020-09-28T12:50:14Z"
    lastUpdateTime: "2020-09-28T12:50:48Z"
    message: ReplicaSet "nginx-deployment-54f57cf6bf" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 2
  unavailableReplicas: 1
  updatedReplicas: 2

kubectl get pods --show-labels

NAME                                READY   STATUS    RESTARTS   AGE   LABELS
nginx-deployment-54f57cf6bf-lwdqz   1/1     Running   0          42m   app=nginx,pod-template-hash=54f57cf6bf
nginx-deployment-54f57cf6bf-xzt5f   1/1     Running   0          42m   app=nginx,pod-template-hash=54f57cf6bf

Adding a label to an existing pod

kubectl label pods nginx-deployment-54f57cf6bf-lwdqz env=prod

kubectl get pods -L env

Use the label selector to target certain pods, e.g.:
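
A few label-selector queries against the env label added above (the in / != forms are standard selector syntax):

kubectl get pods -l env=prod
kubectl get pods -l 'env in (prod,dev)'
kubectl get pods -l env!=prod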

Adding an annotation to an existing deployment

kubectl annotate deployment nginx-deployment mycompany.com/someannotation="yeah"

Filter objects based on fields

All running

kubectl get pods --field-selector status.phase=Running

All in the current namespace

kubectl get services --field-selector metadata.namespace=default

Multiple selectors

kubectl get pods --field-selector status.phase=Running,metadata.namespace!=default

Service and Network primitives

All pods communicate in the same way via IP (one flat layer)

Service: allows dynamic access to a group of replicas; the pod structure is unimportant (pods are deleted and recreated with new IPs)

apiVersion: v1
kind: Service
metadata:
  name: nginx-nodeport
spec:
  type: NodePort
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
      nodePort: 30080
  selector:
    app: nginx

kubectl get services nginx-nodeport

curl localhost:30080

Kubeproxy

handles traffic associated with a service by creating iptables rules — load balances traffic

  1. a service gets a virtual IP and port pair
  2. the API server notifies kube-proxy about the new service
  3. kube-proxy creates new iptables rules to ensure the service is reachable
  4. when a pod sends a packet addressed to the service, the iptables rules reroute it directly to a destination pod (see the check below)
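
To check the rules kube-proxy programmed on a node (assuming the default iptables mode):

sudo iptables-save | grep KUBE-SERVICES
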
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
    - name: busybox
      image: radial/busyboxplus:curl
      args:
      - sleep
      - "1000"

kubectl apply -f busybox.yaml

kubectl get pods -o wide

kubectl get services

$ nginx-nodeport NodePort 10.109.202.65 <none> 80:30080/TCP 15m

kubectl exec busybox -- curl 10.109.202.65:80

$ Welcome to Nginx

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx
  nodeSelector:
    disk: ssd

Example: create a new pod and assign it to a node with the label disk=ssd

Building the cluster

Release Binaries, Provisioning, and Types of Clusters

TODO: missing diagram

Manual installation

  • own network fabric
  • configure without using flannel, weave
  • find binaries
  • control plane: etcd, container runtime, kubelet, kubeadm, kube-apiserver
  • kube-proxy → runs as a pod or on the node itself
  • kubelet must not run as a pod (it is the only such component)
  • build own images
  • secure cluster communication
  • Access API via https or secure cluster comms

Pre-built

  • minikube
  • easiest for testing; local, single node
  • minishift
  • microK8s
  • Ubuntu on LXD
  • AWS, Azure, GC, etc

Useful commands

kubectl cluster-info
kubectl describe pods --all-namespaces
kubectl get services --all-namespaces
kubectl api-resources

Build a cluster

Creating a cluster on Ubuntu Focal Fossa

# Get the Docker gpg key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# Add the Docker repository
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

# Get the Kubernetes gpg key
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

# Add the Kubernetes repository
cat << EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF

# Update your packages
sudo apt-get update

# Install Docker, kubelet, kubeadm, and kubectl
sudo apt-get install -y docker-ce=5:19.03.9~3-0~ubuntu-focal kubelet=1.17.8-00 kubeadm=1.17.8-00 kubectl=1.17.8-00

# Hold them at the current version
sudo apt-mark hold docker-ce kubelet kubeadm kubectl

# Add the iptables rule to sysctl.conf
echo "net.bridge.bridge-nf-call-iptables=1" | sudo tee -a /etc/sysctl.conf

# Enable iptables immediately
sudo sysctl -p

Master only

sudo kubeadm init --pod-network-cidr=10.244.0.0/16

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# implement calico cni 
kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml

Workers only

kubeadm join 172.31.31.60:6443 --token jeq3mr.2psex1eqezqwavq5     --discovery-token-ca-cert-hash sha256:991326ed19b971b27db06ab65aa530e21efb642cef6954145129b18c4cf6fd57

Verify all nodes are Ready with kubectl get nodes

Build a Highly Available and Fault-tolerant cluster

TODO: missing diagram: three masters (API server, etcd, scheduler, controller manager; one active, the others standby) behind a load balancer, serving kubelets on workers 1-4

Replicas of app and copies in multiple nodes + duplicate K8s components

kubectl get pods -o custom-columns=POD:metadata.name,NODE:spec.nodeName --sort-by spec.nodeName -n kube-system

custom query to check config and state of pod/nodes

In an HA setup you will have multiple master nodes; to prevent the scheduler and controller manager racing and possibly duplicating objects, only one pair (one master) is active at any time.

Use the LeaderElect option to select the active component; it informs the others which server is the leader. This is achieved by creating an endpoint.

Check via the scheduler endpoint: kubectl get endpoints kube-scheduler -n kube-system -o yaml; the leader periodically updates the resource, every 2 seconds by default

metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity": "f2c3d18a641c.mylabserver.com_

Replicating etcd

Stacked (each node has its own etcd) or external (to the cluster) topologies

When external to the k8s cluster, etcd uses a consensus algorithm to progress state. A state change requires a majority (odd number of members); 3 to 7 etcd instances are typical.

# 1. get binaries

# 2. put in place
/usr/local/bin

# 3. create dir
/etc/etcd and /var/lib/etcd

# 4. create the systemd unit file for etcd

# 5. enable and start etcd service
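
A minimal sketch for steps 4 and 5, assuming a single etcd member; the member name, IP (10.0.1.10) and certificate paths are example values to adjust:

# /etc/systemd/system/etcd.service (example values)
[Unit]
Description=etcd
Documentation=https://github.com/etcd-io/etcd

[Service]
ExecStart=/usr/local/bin/etcd \
  --name etcd-1 \
  --data-dir /var/lib/etcd \
  --cert-file=/etc/etcd/server.crt \
  --key-file=/etc/etcd/server.key \
  --trusted-ca-file=/etc/etcd/ca.crt \
  --peer-cert-file=/etc/etcd/server.crt \
  --peer-key-file=/etc/etcd/server.key \
  --peer-trusted-ca-file=/etc/etcd/ca.crt \
  --listen-client-urls https://10.0.1.10:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://10.0.1.10:2379 \
  --listen-peer-urls https://10.0.1.10:2380 \
  --initial-advertise-peer-urls https://10.0.1.10:2380 \
  --initial-cluster etcd-1=https://10.0.1.10:2380 \
  --initial-cluster-state new
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

# step 5: enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl start etcd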

sudo kubeadm init --config=kubeadm-config.yaml  # initialize cluster with stacked etcd

kubectl get pod -n kube-system -w  # watch pods

Securing cluster communications

TODO: missing diagram: Roles, RoleBindings, and ServiceAccounts across namespaces (admins bound to a Role for pods in namespace X; pods authenticate via ServiceAccounts)

Set all comms to https

API Server → CRUD interface to modify the cluster state over a RESTful API

  1. Authentication: determined from the HTTP header or client certificate, then passed back to the API server
  2. Authorization: determine if the action is available to the authenticated user, e.g., can they create the resource?
  3. Admission: plugins can modify the resource, e.g., applying defaults
  4. Resource validation: validates the object before putting it into etcd
  5. Change etcd: make the changes and return the response

Bypassed (by default) by kubectl, which uses the self-signed client certificate at .kube/config

User has Roles (allows actions on Resources)

  • Role: what can be done
  • Role binding: Who can do it

Example:

Admin Group ↔Role binding ↔ Role ↔✅ Pod

ServiceAccount → Pod authenticates to the API server, identity of the app running in the Pod

Must be in the same namespace as the Pod
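
A minimal sketch of the Role / RoleBinding idea (names are examples): the Role says what can be done with pods, the RoleBinding grants it to a ServiceAccount in the same namespace.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io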

Running end-to-end tests

kubetest is a common tool for end-to-end testing

A preventive checklist

  • Deployments can run
  • Pods can run
    kubectl create deployment nginx --image=nginx && kubectl get deployments && kubectl get pods
  • Pods can be accessed directly
    kubectl port-forward nginx-6dasaghug 8081:80 && curl 127.0.0.1:8081
  • Logs can be collected
    kubectl get pods && kubectl logs nginx-6db489d4b7-fl869
  • Commands run from pod
    kubectl exec -it nginx-6db489d4b7-fl869 -- nginx -v
    >> nginx version: nginx/1.19.2
  • Services can provide access
    kubectl expose deployment nginx --port 80 --type NodePort
    >> worker node:  curl -I localhost:31297
  • Nodes are healthy
    kubectl describe nodes
  • Pods are healthy
    kubectl describe pods

Installing and testing components

Install 3 node cluster

  • all above

Expose port on the pod
kubectl create deployment nginx --image=nginx
kubectl get deployments -o wide
kubectl get pods
kubectl port-forward nginx-86c57db685-qwv5s 8081:80

Verify nginx version
kubectl exec -it nginx-86c57db685-qwv5s -- nginx -v

Create a service
kubectl expose deployment nginx --port 80 --type NodePort
kubectl get svc
kubectl get pods -o wide
curl -I ip-10-0-1-102:30142

Documentation: Creating a cluster with kubeadm

Managing the cluster

Upgrading

kubectl get no
kubectl version --short
kubeadm version

unhold kubectl and kubeadm

upgrading

sudo kubeadm upgrade apply v1.18.5
sudo apt-mark unhold kubectl

update kubeadm

apt-mark unhold kubeadm kubelet
apt install -y kubeadm=1.18.5-00
kubeadm upgrade plan
kubeadm upgrade apply v1.18.9
kubectl get no

apt-mark unhold kubectl && apt-mark hold kubectl

each node

apt-mark unhold kubelet
sudo apt install -y kubelet=1.18.5-00
apt-mark hold kubelet

Upgrading OS

Move pods off that node. The kubelet might attempt to restart pods on the same node; for longer downtime, the controller will reschedule them onto other nodes.

kubectl drain <node-identifier>

kubectl get pods -o wide
kubectl drain ed092a70c01c.mylabserver.com --ignore-daemonsets

$ node/ed092a70c01c.mylabserver.com evicted

kubectl get no
# do maintenance here on disabled server (no pods will run here)

Re-enable with kubectl uncordon <node-identifier>

Remove completely from cluster

kubectl drain ed092a70c01c.mylabserver.com --ignore-daemonsets
kubectl delete node ed092a70c01c.mylabserver.com

Add a new node

  • In master

kubeadm token generate

kubeadm token create xxxxx --ttl 2h --print-join-command

  • In new worker node, run the join command from previous step

Backing up and restoring

etcd holds the entire cluster state, so it is the only thing that needs backing up


# get etcd client binaries, unzip, and move
wget https://github.com/etcd-io/etcd/releases/download/v3.3.25/etcd-v3.3.25-linux-amd64.tar.gz
tar xvf etcd-v3.3.25-linux-amd64.tar.gz
mv etcd-v3.3.25-linux-amd64/* /usr/local/bin

# run backup command using certificates
ETCDCTL_API=3 etcdctl snapshot save snapshot.db --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key

# verify backup
ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshot.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 478dcbd4 |    73771 |       1704 |     3.8 MB |
+----------+----------+------------+------------+

# backup, externally (ideal) 
## snapshot.db 
## certificates at /etc/kubernetes/pki/etcd

Restore via etcdctl snapshot restore (see the sketch after the steps below)

  • creates a new etcd data directory
  • restore all nodes using the same snapshot
  • overwrites the member ID and cluster ID, creating a new cluster
  • new servers must have the same IPs as the old ones for the restore to succeed
  1. create new etcd data dirs on each node
  2. specify the cluster IPs and token
  3. start the new cluster with the new data dir
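
A sketch of the restore on a single member, reusing the example name/IP from the etcd unit file above; a multi-member cluster repeats this on every node with the full --initial-cluster list:

ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --name etcd-1 \
  --initial-cluster etcd-1=https://10.0.1.10:2380 \
  --initial-cluster-token new-cluster-token \
  --initial-advertise-peer-urls https://10.0.1.10:2380 \
  --data-dir /var/lib/etcd-restored

# point etcd (systemd unit or static pod manifest) at the new data dir, then start it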

Update a cluster via kubeadm

TODO: missing diagram

Use kubeadm to update (1) the control plane, then (2) kubelet and kubectl

master

sudo su
apt-mark unhold kubeadm
apt install -y kubeadm=1.18.9-00
kubeadm upgrade plan

# COMPONENT   CURRENT       AVAILABLE
# Kubelet     3 x v1.17.8   v1.18.9

# Upgrade to the latest stable version:

# COMPONENT            CURRENT    AVAILABLE
# API Server           v1.17.12   v1.18.9
# Controller Manager   v1.17.12   v1.18.9
# Scheduler            v1.17.12   v1.18.9
# Kube Proxy           v1.17.12   v1.18.9
# CoreDNS              1.6.5      1.6.7
# Etcd                 3.4.3      3.4.3-0

# You can now apply the upgrade by executing the following command:

# 	kubeadm upgrade apply v1.18.9

kubeadm upgrade apply v1.18.9
## downloads images and swaps them with minimal downtime

## might mess up k8s config file, use
kubectl --kubeconfig .kube/config get no

apt-mark unhold kubelet kubectl
apt install -y kubelet=1.18.9-00
apt install -y kubectl=1.18.9-00

kubectl --kubeconfig .kube/config version --short
# Client Version: v1.18.9
# Server Version: v1.18.9

kubectl --kubeconfig .kube/config get no
# NAME            STATUS   ROLES    AGE     VERSION
# ip-10-0-1-101   Ready    master   3h15m   v1.18.9
# ip-10-0-1-102   Ready    <none>   3h15m   v1.17.8
# ip-10-0-1-103   Ready    <none>   3h15m   v1.17.8

repeat kubelet upgrade on all worker nodes

sudo su
apt-mark unhold kubelet && apt install -y kubelet=1.18.9-00

on master

kubectl --kubeconfig .kube/config get no
# NAME            STATUS   ROLES    AGE     VERSION
# ip-10-0-1-101   Ready    master   3h19m   v1.18.9
# ip-10-0-1-102   Ready    <none>   3h18m   v1.18.9
# ip-10-0-1-103   Ready    <none>   3h18m   v1.18.9

Cluster communications

Pod and node

Missing diagrams

Communication is handled by a Container Network Interface (CNI) plugin

# PID 8149 of the pod's container, found via docker ps and docker inspect --format '{{ .State.Pid }}' <container-id>
sudo nsenter -t 8149 -n ip addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
    link/ether aa:04:93:82:cc:90 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.79.10/32 scope global eth0
       valid_lft forever preferred_lft forever

# to note:
# eth0 IN THE POD -> if9 interface 
# inet 10.244.79.10/32

This should map to an interface on the host, visible via ifconfig or ip link; see the check below
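
To confirm the pairing on the host: the @if9 suffix means the pod's eth0 is paired with host interface index 9, so list the host interfaces and look for that index (run on the node hosting the pod):

ip link | grep '^9:'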

Container Network Interface

TODO: Missing diagram

plugin options:

  • calico
  • romana
  • flannel
  • weave net
  • more...
  1. Once installed, a network agent runs on each node,
  2. it ties into the CNI interface,
  3. and the kubelet is configured to use it (network plugin flag = cni)
  • kubeadm init --pod-network-cidr=10.244.0.0/16 >> the CIDR is different per plugin

The container runtime calls the CNI executable to add/remove an instance to/from the network

cni

  • creates an IP address and assigns it to the pod
  • IP address management (tracking which addresses are available) over time; see the check below
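
To see what the plugin installed on a node, inspect the standard CNI paths (the Calico file name below is only an example; it differs per plugin):

# CNI configuration read by the container runtime
ls /etc/cni/net.d/
cat /etc/cni/net.d/10-calico.conflist

# plugin binaries the runtime calls
ls /opt/cni/bin/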

Service Networking

TODO: Missing diagram

Communication outside the cluster (internet?)

Services

  • Simplify locating critical infrastructure components, move or create additional replicas
  • Provides one virtual IP; traffic is evenly distributed across the assigned pods
  • Pods' IP addresses change but from the outside all goes via the same Network interface
  • Node Port service
apiVersion: v1
kind: Service
metadata:
  name: nginx-nodeport
spec:
  type: NodePort
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
      nodePort: 30080
  selector:
    app: nginx

A ClusterIP Service (kubernetes) is created by default at cluster creation.

kubectl get service -o yaml

apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: "2020-09-29T06:58:42Z"
    labels:
      component: apiserver
      provider: kubernetes
    name: kubernetes
    namespace: default
    resourceVersion: "148"
    selfLink: /api/v1/namespaces/default/services/kubernetes
    uid: dd606201-09e0-450b-b551-521987d97dcf
  spec:
    clusterIP: 10.96.0.1
    ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: 6443
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: "2020-09-29T09:20:18Z"
    labels:
      run: nginx
    name: nginx
    namespace: default
    resourceVersion: "21051"
    selfLink: /api/v1/namespaces/default/services/nginx
    uid: 4c817d34-9d1b-4519-8a97-d5a6ce71c24b
  spec:
    clusterIP: 10.103.151.135
    externalTrafficPolicy: Cluster
    ports:
    - nodePort: 31297
      port: 80
      protocol: TCP
      targetPort: 80
    selector:
      run: nginx
    sessionAffinity: None
    type: NodePort
  status:
    loadBalancer: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

When a service is created, all kube-proxy agents are notified

kube-proxy is not an actual proxy; it is a controller that keeps track of endpoints and updates iptables for traffic routing

endpoints are API objects created automatically with the services; they hold a cache of the IPs of the pods in that service

Traffic is redirected to a pod in the service using iptables:

root@f2c3d18a641c:~# kubectl get services
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP        7d
nginx        NodePort    10.103.151.135   <none>        80:31297/TCP   6d22h
root@f2c3d18a641c:~# kubectl get endpoints
NAME         ENDPOINTS           AGE
kubernetes   172.31.31.60:6443   7d
nginx        10.244.79.12:80     6d22h

# iptables-save | grep KUBE | grep nginx
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.103.151.135/32 -p tcp -m comment --comment "default/nginx: cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ

Ingress rules and load balancer

A load balancer helps route traffic if any node goes down (no downtime)

A LoadBalancer-type service can be provisioned automatically, instead of a NodePort type

Creating a load balancer service

kubectl expose deployment <name> --port 80 --target-port 8080 --type LoadBalancer

Check the yaml: kubectl get services <name> -o yaml

NodePort assigned, sessionAffinity=None (load balancing), ingress IP assigned
  • Not aware of pods within each node; uses iptables

Add annotations to avoid node hopping (and increased latency)

kubectl describe services <name>

kubectl annotate service <name> externalTrafficPolicy=Local

Ingress resource: Operates at the application layer, single point of communication for clients.

Ingress >> App >> Service >> Pod

Ingress controller and ingress resource

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: service-ingress
spec:
  rules:
  - host:
    http:
      paths:
      ...
  - host:
    http:
      paths:
      ...
  - host:
    http:
      paths:
      ...
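
A filled-in version of the skeleton above (host names and backend services are hypothetical), using the same extensions/v1beta1 API:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: service-ingress
spec:
  rules:
  - host: kubeserve.example.com
    http:
      paths:
      - backend:
          serviceName: kubeserve
          servicePort: 80
  - host: app.example.com
    http:
      paths:
      - backend:
          serviceName: nginx
          servicePort: 80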

kubectl create -f ingress.yaml

kubectl edit ingress

kubectl describe ingress

Cluster DNS

CoreDNS

The native solution: a flexible DNS server written in Go, a memory-safe executable; supports DNS over TLS, integrates with etcd and cloud providers, plugin architecture.

kubectl get pods -n kube-system | grep coredns
# coredns-66bff467f8-vckc2                               1/1     Running   2          19h
# coredns-66bff467f8-vtz59                               1/1     Running   1          5h41m

kubectl get deployments -n kube-system | grep coredns
# coredns                   2/2     2            2           7d1h

kubectl get services -n kube-system
# NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
# kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   7d1h

---
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
    - image: busybox:1.28.4
      command:
        - sleep
        - "3600"
      imagePullPolicy: IfNotPresent
      name: busybox
  restartPolicy: Always

kubectl create -f busybox.yaml

kubectl get pods

kubectl exec -t busybox -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local us-east-1.compute.internal
options ndots:5

kubectl exec -it busybox -- nslookup kubernetes

Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
# service-name.namespace.svc.base-domain-name
kubernetes.default.svc.cluster.local

# pod-IP-with-dashes.namespace.pod.base-domain-name
10-96-0-1.default.pod.cluster.local

kubectl exec -it busybox -- nslookup 10-244-194-199.default.pod.cluster.local

kubectl exec -it busybox -- nslookup kube-dns.kube-system.svc.cluster.local

check the dns server is reachable within the busybox pod (good for troubleshooting)

Check dns pod

kubectl logs -n kube-system coredns-66bff467f8-vckc2
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b

Headless service: DNS returns the individual pod IPs instead of a single service IP

---
apiVersion: v1
kind: Service
metadata:
  name: kube-headless
spec:
  clusterIP: None
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: kubeserve2

The default dnsPolicy is ClusterFirst: the pod uses the cluster DNS first, inheriting the node's upstream servers; otherwise customize it:

custom-dns.yaml

---
apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 8.8.8.8
    searches:
      - ns1.svc.cluster.local
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"
      - name: edns0

use busybox pod to query dns

kubectl exec -it busybox -- nslookup 10-244-194-200.default.pod.cluster.local

Then check resolv.conf to verify DNS has been configured correctly

kubectl exec -t dns-example -- cat /etc/resolv.conf

nameserver 8.8.8.8
search ns1.svc.cluster.local my.dns.search.suffix
options ndots:2 edns0

Hands on

# create an nginx deployment
kubectl --kubeconfig .kube/config create deployment nginx --image=nginx
kubectl --kubeconfig .kube/config get pods

# create a service
kubectl expose deployment nginx --port 80 --type NodePort
kubectl get services

# a pod to check on dns
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
    - image: busybox:1.28.4
      command:
        - sleep
        - "3600"
      name: busybox
  restartPolicy: Always

kubectl create -f busybox.yaml

# query dns for nginx
kubectl exec busybox -- nslookup nginx
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      nginx
Address 1: 10.103.132.129 nginx.default.svc.cluster.local

# FQDN: nginx.default.svc.cluster.local

Scheduling in Cluster

The scheduler determines which node should host a given pod, using these rules by default (customizable):

  1. Does the node have enough hardware resources?
  2. Is the node running out of resources?
  3. Is the pod marked as destined to a specific node?
  4. Do the pod and node have a matching selector?
  5. Is the pod linked to a specific nodePort, and is it available?
  6. Is the pod linked to a specific volume, and can it be mounted?
  7. Does the pod tolerate taints of the node?
  8. Does the pod specify node or pod affinity rules?

This results in a set of candidate nodes that are prioritized; the best one is chosen (round-robin if all have the same priority)

Affinity: scheduling preferences without having to specify hard selectors; nice-to-have rules.

Example adding label to nodes

kubectl label node fc9ccdd4e21c.mylabserver.com availability-zone=zone2

kubectl label node ed092a70c01c.mylabserver.com share-type=dedicated

Creating a deployment and setting up affinity to labels (pref-deployment.yaml)

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pref
spec:
  replicas: 5
  selector:
    matchLabels:
      app: pref
  template:
    metadata:
      labels:
        app: pref
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 80
              preference:
                matchExpressions:
                  - key: availability-zone
                    operator: In
                    values:
                      - zone1
            - weight: 20
              preference:
                matchExpressions:
                  - key: share-type
                    operator: In
                    values:
                      - dedicated
      containers:
        - args:
            - sleep
            - "99999"
          image: busybox
          name: main

  • preferredDuringSchedulingIgnoredDuringExecution: doesn't affect pods already running; it applies only during scheduling
  • gives more weight to the availability zone than to the share type

kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP               NODE                           NOMINATED NODE   READINESS GATES
busybox                  1/1     Running   3          22h     10.244.194.201   ed092a70c01c.mylabserver.com   <none>           <none>
dns-example              1/1     Running   1          22h     10.244.194.202   ed092a70c01c.mylabserver.com   <none>           <none>
nginx-6db489d4b7-rz56v   1/1     Running   2          28h     10.244.79.15     fc9ccdd4e21c.mylabserver.com   <none>           <none>
pref-646c88c576-58vhm    1/1     Running   0          4m16s   10.244.194.203   ed092a70c01c.mylabserver.com   <none>           <none>
pref-646c88c576-d2mw2    1/1     Running   0          4m16s   10.244.194.207   ed092a70c01c.mylabserver.com   <none>           <none>
pref-646c88c576-qmmw4    1/1     Running   0          4m16s   10.244.194.204   ed092a70c01c.mylabserver.com   <none>           <none>
pref-646c88c576-srz8l    1/1     Running   0          4m16s   10.244.194.205   ed092a70c01c.mylabserver.com   <none>           <none>
pref-646c88c576-xprgq    1/1     Running   0          4m16s   10.244.194.206   ed092a70c01c.mylabserver.com   <none>           <none>

Pods are assigned to the node with higher affinity, unless the scheduler decides to spread them out
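
For comparison, a hard rule would use requiredDuringSchedulingIgnoredDuringExecution; pods that cannot satisfy it stay Pending (a fragment of the pod template spec, reusing the labels above):

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: availability-zone
                operator: In
                values:
                - zone1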

Multiple schedulers for multiple pods

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-scheduler-as-kube-scheduler
subjects:
- kind: ServiceAccount
  name: my-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-scheduler-as-volume-scheduler
subjects:
- kind: ServiceAccount
  name: my-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:volume-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: scheduler
    tier: control-plane
  name: my-scheduler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      component: scheduler
      tier: control-plane
  replicas: 1
  template:
    metadata:
      labels:
        component: scheduler
        tier: control-plane
        version: second
    spec:
      serviceAccountName: my-scheduler
      containers:
      - command:
        - /usr/local/bin/kube-scheduler
        - --address=0.0.0.0
        - --leader-elect=false
        - --scheduler-name=my-scheduler
        image: chadmcrowell/custom-scheduler
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10251
          initialDelaySeconds: 15
        name: kube-second-scheduler
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10251
        resources:
          requests:
            cpu: '0.1'
        securityContext:
          privileged: false
        volumeMounts: []
      hostNetwork: false
      hostPID: false
      volumes: []
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # "namespace" omitted since ClusterRoles are not namespaced
  name: csinodes-admin
rules:
- apiGroups: ["storage.k8s.io"]
  resources: ["csinodes"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-csinodes-global
subjects:
- kind: ServiceAccount
  name: my-scheduler # Name is case sensitive
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: csinodes-admin
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: system:serviceaccount:kube-system:my-scheduler
  namespace: kube-system
rules:
- apiGroups: ["storage.k8s.io"] # "" indicates the core API group
  resources: ["csinodes"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
# This role binding allows "jane" to read pods in the "default" namespace.
# You need to already have a Role named "pod-reader" in that namespace.
kind: RoleBinding
metadata:
  name: read-csinodes
  namespace: kube-system
subjects:
# You can specify more than one "subject"
- kind: User
  name: kubernetes-admin # "name" is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  # "roleRef" specifies the binding to a Role / ClusterRole
  kind: Role #this must be Role or ClusterRole
  name: system:serviceaccount:kube-system:my-scheduler # this must match the name of the Role or ClusterRole you wish to bind to
  apiGroup: rbac.authorization.k8s.io

kubectl create -f clusterrole.yaml && kubectl create -f clusterrolebinding.yaml

kubectl create -f role.yaml && kubectl create -f rolebinding.yaml

editing cluster role kubectl edit clusterrole system:kube-scheduler

resourceNames:
  - kube-scheduler
  - my-scheduler  # <------NEW

kubectl create -f myscheduler.yaml

check scheduler is running

kubectl get pods -n kube-system | grep scheduler
kube-scheduler-f2c3d18a641c.mylabserver.com            1/1     Running   3          43h
my-scheduler-5b986674f-w7sc6                           1/1     Running   0          2m9s

Create a pod assigned to default-scheduler or my-scheduler, or leave the field empty (the default scheduler is used)

apiVersion: v1
kind: Pod
metadata:
  name: annotation-scheduler-type
  labels:
    name: multischeduler-example
spec:
  schedulerName: <default-scheduler | my-scheduler | remove line> # <--- HERE scheduler ?
  containers:
  - name: pod-with-scheduler-or-not
    image: k8s.gcr.io/pause:2.0

The easiest way to tell which scheduler handled a pod in this example is the pod's metadata name, which should match the scheduler set in its spec.

Resource limits and label selectors for scheduling

Taints: Repel work

  • e.g., the master node has a NoSchedule taint: kubectl describe node master-node-name, section Taints

Tolerations: permit a pod to potentially schedule onto a tainted node

  • e.g., kube-proxy: kubectl get pods kube-proxy-1111 -n <namespace> -o yaml, section tolerations
  • it tolerates unschedulable nodes; see the toleration effect NoSchedule

Priority functions: assign pods to the nodes with the most requested resources (to optimize the number of nodes in use)

  1. Find node capacity with kubectl describe node, sections Capacity & Allocatable
  2. If the pod fits, use the node selector to assign it: kubectl create -f resource-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: resource-pod1
spec:
  nodeSelector:
    kubernetes.io/hostname: "name-of-node"
  containers:
  - image: busybox
    command: ["dd", "if=/dev/zero", "of=/dev/null"]
    name: pod1
    resources:
      requests:
        cpu: 800m
        memory: 20Mi #<---- needed mem

If there are not enough resources on the node, the pod will fail with an Insufficient resources message; verify via kubectl describe pod

Set hard limits on the resources a pod may use

apiVersion: v1
kind: Pod
metadata:
  name: resource-pod-limit
spec:
  containers:
  - image: busybox
    command: ["dd", "if=/dev/zero", "of=/dev/null"]
    name: main
    resources:
      limits: #<--- how much can be used
        cpu: 1
        memory: 20Mi

With limits, a pod may be scheduled even if the sum of limits exceeds node capacity; CPU gets throttled and the container is killed if it exceeds its memory limit. Verify with kubectl exec -it resource-pod-limit -- top

DaemonSets and Manual Schedules

TODO: Add daemonSet pods diagrams

A DaemonSet doesn't use the scheduler.

land a pod per node

Good fit: Run exactly one replica of a pod in every node, e.g.,  kube-proxy, calico-node

  • Runs a pod on each node
  • Makes sure the replicas are running immediately, recreating them automatically

Create a daemonSet to monitor SSD

# label the target node
kubectl label node ed092a70c01c.mylabserver.com disk=ssd

# create daemon set
kubectl create -f ssd-monitor.yaml

# check daemonsets and related pod
kubectl get daemonsets
kubectl get pod -o wide

# add label to other nodes and daemonSet will create new pods
kubectl label node fc9ccdd4e21c.mylabserver.com disk=ssd

# remove/change label; will make daemonset terminate ssd-monitor pod
kubectl label node fc9ccdd4e21c.mylabserver.com disk-
kubectl label node ed092a70c01c.mylabserver.com disk=hdd --overwrite
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ssd-monitor
spec:
  template:
    metadata:
      labels:
        app: ssd-monitor
    spec:
      nodeSelector:
        disk: ssd
      containers:
      - name: main
        image: linuxacademycontent/ssd-monitor
  selector:
    matchLabels:
      app: ssd-monitor

Scheduler events

Pod: check the Events section with kubectl describe pods kube-scheduler-f2c3d18a641c.mylabserver.com -n kube-system

  • Check Events: kubectl get event -n kube-system -w
  • Check Log: kubectl logs kube-scheduler-f2c3d18a641c.mylabserver.com -n kube-system
  • More logs at /var/log/pods/kube-system_*

Hands on

Taint node to repel work

kubectl taint node ip-node2 node-type=prod:NoSchedule && kubectl describe node ip-node2

Taints: node-type=prod:NoSchedule

Schedule a pod to the dev environment: kubectl create -f dev-pod.yaml

---
apiVersion: v1
kind: Pod
metadata:
  name: dev-pod
  labels:
    app: busybox
spec:
  containers:
  - name: dev
    image: busybox
    command: ['sh', '-c', 'echo Hello K8s! && sleep 3600']

Allow a pod to be scheduled to production

  • Create a deployment: kubectl create -f prod-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prod
  template:
    metadata:
      labels:
        app: prod
    spec:
      containers:
      - args: [sleep, "3600"]
        image: busybox
        name: main
      tolerations:
      - key: node-type
        operator: Equal
        value: prod
        effect: NoSchedule

Verify pod schedule and check toleration

  • kubectl scale deployment/prod --replicas=3
  • kubectl get pods prod-7654d444bc-slzlx -o yaml and look for
tolerations:
  - effect: NoSchedule
    key: node-type
    operator: Equal
    value: prod

Deploying  Applications

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubeserve
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kubeserve
  template:
    metadata:
      name: kubeserve
      labels:
        app: kubeserve
    spec:
      containers:
      - image: linuxacademycontent/kubeserve:v1
        name: app

kubectl create -f kubeserve-deployment.yaml --record

  • --record: store CHANGE-CAUSE in the rollout history (see below)

check status of deployment

kubectl rollout status deployments kubeserve && kubectl get pods

Waiting for deployment "kubeserve" rollout to finish: 0 of 3 updated replicas are available...
Waiting for deployment "kubeserve" rollout to finish: 1 of 3 updated replicas are available...
Waiting for deployment "kubeserve" rollout to finish: 2 of 3 updated replicas are available...
deployment "kubeserve" successfully rolled out


kubeserve-6b7cdb8ddc-4hc2b   1/1     Running   0          93s
kubeserve-6b7cdb8ddc-v5g48   1/1     Running   0          93s
kubeserve-6b7cdb8ddc-vnx8f   1/1     Running   0          93s

  • Pod names come from the deployment name plus the pod-template hash of its ReplicaSet
  • ReplicaSet created automatically to manage these pods
  • kubectl get replicasets
  • kubeserve-6b7cdb8ddc 3 3 3 3m58s

Scale via kubectl scale deployment kubeserve --replicas=5

kubectl expose deployment kubeserve --port 80 --target-port 80 --type NodePort
service/kubeserve exposed

# app exposed to the internet
kubectl get services

NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kubeserve    NodePort    10.111.181.169   <none>        80:31347/TCP   42s

Updating app

Change the spec to slow down the rollout and visualize it better

kubectl patch deployment kubeserve -p '{"spec": {"minReadySeconds":10}}'
deployment.apps/kubeserve patched

Method 1

Update the yaml file, e.g., image version, and kubectl apply -f kubeserve-deployment.yaml

kubectl describe deployments | grep linuxacademycontent
Image: linuxacademycontent/kubeserve:v2

kubectl replace -f kubeserve-deployment.yaml (will fail if deployment doesn't exist)

Method 2 (preferred)

Rolling update ( no downtime)

kubectl set image deployments/kubeserve app=linuxacademycontent/kubeserve:v2 --v 6

  • verbose level 6
  • will replace all pod images with v2 gradually
  • creates new replicaSet
  • As new ReplicaSet pods are created, pods in the old ReplicaSet are terminated

kubectl get replicasets
NAME DESIRED CURRENT READY AGE
kubeserve-6b7cdb8ddc 0 0 0 46m
kubeserve-7fd7b74ffd 3 3 3 31m

kubectl describe replicaset kubeserve-7fd7b74ffd

Events:
  Type    Reason            Age    From                   Message
  ----    ------            ----   ----                   -------
  Normal  SuccessfulCreate  32m    replicaset-controller  Created pod: kubeserve-7fd7b74ffd-n98tm
  Normal  SuccessfulCreate  31m    replicaset-controller  Created pod: kubeserve-7fd7b74ffd-q86n2
  Normal  SuccessfulCreate  31m    replicaset-controller  Created pod: kubeserve-7fd7b74ffd-p9p26
  Normal  SuccessfulDelete  10m    replicaset-controller  Deleted pod: kubeserve-7fd7b74ffd-p9p26
  Normal  SuccessfulDelete  10m    replicaset-controller  Deleted pod: kubeserve-7fd7b74ffd-q86n2
  Normal  SuccessfulDelete  10m    replicaset-controller  Deleted pod: kubeserve-7fd7b74ffd-n98tm
  Normal  SuccessfulCreate  6m58s  replicaset-controller  Created pod: kubeserve-7fd7b74ffd-cnzgg
  Normal  SuccessfulCreate  6m45s  replicaset-controller  Created pod: kubeserve-7fd7b74ffd-ghthr
  Normal  SuccessfulCreate  6m32s  replicaset-controller  Created pod: kubeserve-7fd7b74ffd-ckmj

Undo a deployment

kubectl rollout undo deployments kubeserve

The deployment keeps a revision history in the underlying ReplicaSets, which is why this method is preferred over patching the current deployment (method 1)

# rollback last deployment

kubectl rollout history deployment kubeserve

deployment.apps/kubeserve
REVISION  CHANGE-CAUSE
3         kubectl create --filename=kubeserve-deployment.yaml --record=true
4         kubectl create --filename=kubeserve-deployment.yaml --record=true

# rollback to specific version
kubectl rollout undo deployments kubeserve --to-revision=3

Pause a rollout with kubectl rollout pause deployment kubeserve, leaving a mix of new and old pods (canary release); kubectl rollout resume deployment kubeserve unpauses the rollout.
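
For example, with the kubeserve deployment above:

kubectl rollout pause deployment kubeserve
# part of the pods now run the new image, the rest still run the old one
kubectl get replicasets
kubectl rollout resume deployment kubeserve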

Configuring Highly Available and Scalable Applications

TODO: configuration data passing to apps
  • minimize opportunity for errors when deploying apps
  • prevent faulty versions from being released

minReadySeconds property: how many seconds a pod should be in the Ready state before it is considered available; the rollout does not proceed until pods are available

readinessProbe = success: pod will receive client requests

Checked every second until it returns success

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubeserve
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kubeserve
  template:
    metadata:
      name: kubeserve
      labels:
        app: kubeserve
    spec:
      containers:
      - image: linuxacademycontent/kubeserve:v1
        name: app
        readinessProbe:
          periodSeconds: 1
          httpGet:
            path: /
            port: 80

kubectl apply -f readiness.yaml

kubectl rollout status deployment kubeserve
deployment "kubeserve" successfully rolled out

# if it doesn't pass the readiness probe, expect a 'progress deadline exceeded' error

How to pass configuration options

Method 1: Env variables

  • Commonly used
  • store things like keys and passwords
  • use a ConfigMap and pass it to the container via an environment variable
  • may use a Secret as an environment variable
  • just update the ConfigMap or Secret instead of rebuilding the image

create a configmap from literal


kubectl create configmap appconfig --from-literal=key1=value1 --from-literal=key2=value2

passing config map into Pod

apiVersion: v1
kind: Pod
metadata:
  name: configmap-pod
spec:
  containers:
  - name: app-container
    image: busybox:1.28.4
    command: ["sh", "-c", "echo $(MY_VAR) && sleep 3600"]
    env:
    - name: MY_VAR
      valueFrom:
        configMapKeyRef:
          name: appconfig
          key: key1

kubectl logs configmap-pod # retrieve via logs
$~ value1

Method 2: Mounted volume

apiVersion: v1
kind: Pod
metadata:
  name: configmap-pod
spec:
  containers:
  - name: app-container
    image: busybox:1.28.4
    command: ["sh", "-c", "echo $(MY_VAR) && sleep 3600"]
    volumeMounts:
    - name: configmapvolume
      mountPath: /etc/config
  volumes:
    - name: configmapvolume
      configMap: # <-------- mark volume as a configmap
        name: appconfig

kubectl exec configmap-pod -- ls /etc/config
$~ key1
$~ key2
kubectl exec configmap-pod -- cat /etc/config/key1
$~ value1

Method 3: Secret

for sensitive data use Secret

apiVersion: v1
kind: Secret
metadata: 
  name: appsecret
stringData:
  cert: value
  key: value

apiVersion: v1
kind: Pod
metadata:
  name: secret-pod
spec:
  containers:
  - name: app-container
    image: busybox:1.28.4
    command: ["sh", "-c", "echo $(MY_CERT) && sleep 3600"]
    env:
    - name: MY_CERT
      valueFrom:
        secretKeyRef:
          name: appsecret
          key: cert

kubectl exec -it secret-pod -- sh
echo $MY_CERT
$~ val

Method 4: Secrets in a volume

apiVersion: v1
kind: Pod
metadata:
  name: secret-mount-pod
spec:
  containers:
  - name: app-container
    image: busybox:1.28.4
    command: ["sh", "-c", "echo $(MY_VAR) && sleep 3600"]
    volumeMounts:
      - name: secretvolume
        mountPath: /etc/cert
  volumes:
    - name: secretvolume
      secret:
        secretName: appsecret

kubectl exec secret-mount-pod -- ls /etc/cert
$~ cert
$~ key

Self-healing applications

  • Reduce the need to monitor for errors
  • Take error-prone servers or applications out of the cluster
  • via ReplicaSet => no downtime; replicas are created on healthy nodes

e.g., anything labeled tier=frontend will be picked up by the ReplicaSet automatically

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: myreplicaset
  labels:
    app: app
    tier: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      tier: frontend
  template:
    metadata:
      labels:
        tier: frontend
    spec:
      containers:
      - name: main
        image: linuxacademycontent/kubeserve

apiVersion: v1
kind: Pod
metadata:
  name: pod1
  labels:
    tier: frontend
spec:
  containers:
  - name: main
    image: linuxacademycontent/kubeserve

kubectl create -f replicaset.yaml

kubectl apply -f pod-replica.yaml && kubectl get pods -o wide

# the pod is terminated as it is managed by the replicaset and there are already 3
pod1 0/1 Terminating 0

The ReplicaSet manages the pod but ideally we should be using a Deployment (adds deploying, scaling, updating)

Remove a pod from a ReplicaSet by changing its label, e.g.:
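
Overwriting the label takes pod1 out of the ReplicaSet's selector; the ReplicaSet spins up a replacement while the original pod keeps running for debugging (the label value is arbitrary):

kubectl label pod pod1 tier=debug --overwrite
kubectl get pods --show-labels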

StatefulSet pods are unique; they are not replaced with arbitrary new pods, an identical pod (same name and storage) is recreated when needed.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
  • the service is headless, since every pod is unique (specific traffic goes to specific pods)
  • volumeClaimTemplates: each pod gets its own storage

kubectl get statefulsets
kubectl describe statefulsets

Hands on

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubeserve
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kubeserve
  template:
    metadata:
      name: kubeserve
      labels:
        app: kubeserve
    spec:
      containers:
        - image: linuxacademycontent/kubeserve:v1
          name: app

  • Create a deployment (with --record), check its status, and verify the version
kubectl create -f deployment.yaml --record

kubectl rollout status deployments kubeserve
$ deployment "kubeserve" successfully rolled out

kubectl describe deployments/kubeserve | grep Image
$ Image:        linuxacademycontent/kubeserve:v1
  • Make highly available by scaling up

kubectl scale deployment/kubeserve --replicas=5 && kubectl get pods

  • Make it accessible via Service, NodePort

kubectl expose deployment kubeserve --port 80 --target-port 80 --type NodePort && kubectl get services

  • Update image and verify

On a separate terminal: while true; do curl http://<node-port-ip>; done

# on main terminal, switch image to v2
kubectl set image deployments/kubeserve app=linuxacademycontent/kubeserve:v2 --v 6
deployment.extensions/kubeserve image updated

# old and new replicasets
kubectl get replicasets
NAME                   DESIRED   CURRENT   READY   AGE
kubeserve-5589d5cb58   5         5         5       4m4s
kubeserve-968646c97    0         0         0       23m

kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
kubeserve-5589d5cb58-59pds   1/1     Running   0          5m4s
kubeserve-5589d5cb58-94vxx   1/1     Running   0          4m59s
kubeserve-5589d5cb58-dcsdx   1/1     Running   0          5m4s
kubeserve-5589d5cb58-n7nnv   1/1     Running   0          4m59s
kubeserve-5589d5cb58-tqxmp   1/1     Running   0          5m4s

# rollout versions if needed
kubectl rollout history deployment kubeserve
deployment.extensions/kubeserve
REVISION  CHANGE-CAUSE
0
1

Managing Data

Persistent volumes

Storage that is accessible even if the pods are gone and can be reassigned dynamically

gcloud compute disks create mongodb --size=10GB --zone=us-central1-c

without persistent volumes (just storage)

apiVersion: v1
kind: Pod
metadata:
  name: mongodb
spec:
  volumes:
  - name: mongodb-data
    gcePersistentDisk:
      pdName: mongodb
      fsType: ext4
  containers:
  - image: mongo
    name: mongodb
    volumeMounts:
    - name: mongodb-data
      mountPath: /data/db
    ports:
    - containerPort: 27017
      protocol: TCP

kubectl create -f mongodb-pod.yaml && kubectl get pods

Insert something via the shell: kubectl exec -it mongodb -- mongo

use mystore
db.hello.insert({name: 'hello'})
WriteResult({ "nInserted" : 1 })
db.hello.find()
{ "_id" : ObjectId("5f85864fa02484221b37a386"), "name" : "hello" }
exit

Then delete the pod and drain the node: kubectl delete pod mongodb && kubectl drain gke-kca-lab-default-pool-eb962a64-c7q9

recreate from same yaml and retrieve data with exec on the new node (The data persists)

{ "_id" : ObjectId("5f85864fa02484221b37a386"), "name" : "hello" }

Using persistent volumes

apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongodb-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  gcePersistentDisk:
    pdName: mongodb
    fsType: ext4

kubectl create -f mongo-persistentvolume.yaml

kubectl get pv
NAME         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
mongodb-pv   10Gi       RWO,ROX        Retain           Available

Access modes

Whether a PersistentVolume can be accessed by multiple nodes concurrently:

  • RWO: ReadWriteOnce, only one node at a time
  • ROX: ReadOnlyMany, multiple nodes can mount for reading
  • RWX: ReadWriteMany, multiple nodes can read and write

These modes apply to the node, not the pod! And only one mode applies at a time.

Persistent Volume Claims

A request for storage by a pod; reserves existing (provisioned) storage for use by the pod

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-pvc
spec:
  resources:
    requests:
      storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: ""

kubectl create -f pvc.yaml

kubectl get pvc
NAME          STATUS   VOLUME       CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongodb-pvc   Bound    mongodb-pv   10Gi       RWO,ROX                       8s

kubectl get pv
NAME         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                 STORAGECLASS   REASON   AGE
mongodb-pv   10Gi       RWO,ROX        Retain           Bound    default/mongodb-pvc                           14m

Verify status has changed to Bound for both pv and pvc

TODO: missing diagram: persistent volume claims

apiVersion: v1
kind: Pod
metadata:
  name: mongodb
spec:
  volumes:
  - name: mongodb-data
    persistentVolumeClaim:
      claimName: mongodb-pvc
  containers:
  - image: mongo
    name: mongodb
    volumeMounts:
    - name: mongodb-data
      mountPath: /data/db
    ports:
    - containerPort: 27017
      protocol: TCP

kubectl create -f pvc-pod.yaml && kubectl describe pod mongodb

...
Volumes:
  mongodb-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mongodb-pvc
    ReadOnly:   false

kubectl exec -it mongodb mongo => use mystore => db.hello.find() => {...}

kubectl delete pod mongodb && kubectl delete pvc mongodb-pvc

and since persistentVolumeReclaimPolicy = Retain in pv.yaml (other options Recycle, Delete) the volume is kept and status is Released

kubectl get pv
NAME         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                 STORAGECLASS   REASON   AGE
mongodb-pv   10Gi       RWO,ROX        Retain           Released   default/mongodb-pvc                           26m

Storage Objects

Protects against data loss while a PersistentVolumeClaim is bound and in use: storage object in use protection prevents a PVC from being removed prematurely, and if a PV is deleted while PVCs still use it, the PV removal is postponed.

kubectl describe pv mongodb-pv
Name:              mongodb-pv
Labels:            failure-domain.beta.kubernetes.io/region=us-central1
                   failure-domain.beta.kubernetes.io/zone=us-central1-c
Annotations:       pv.kubernetes.io/bound-by-controller: yes
Finalizers:        [kubernetes.io/pv-protection] # <----- removal is postponed until the PVC is no longer actively
StorageClass:
Status:            Released
Claim:             default/mongodb-pvc
Reclaim Policy:    Retain
Access Modes:      RWO,ROX
VolumeMode:        Filesystem
Capacity:          10Gi
Node Affinity:
  Required Terms:
    Term 0:        failure-domain.beta.kubernetes.io/zone in [us-central1-c]
                   failure-domain.beta.kubernetes.io/region in [us-central1]
Message:
Source:
    Type:       GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:     mongodb
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:         <none>

kubectl describe pvc
Name:          mongodb-pvc
Namespace:     default
StorageClass:
Status:        Pending
Volume:
Labels:        <none>
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection] # <---removal is postponed until the PVC is no longer actively

Finalizers: [kubernetes.io/pvc-protection]:

Because of this finalizer, a deleted PVC remains in the Terminating state until the pod using it is deleted, which prevents data loss
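To inspect the protection finalizers directly, a quick hedged check against the objects above:

kubectl get pvc mongodb-pvc -o jsonpath='{.metadata.finalizers}'
# expected to include kubernetes.io/pvc-protection
kubectl get pv mongodb-pv -o jsonpath='{.metadata.finalizers}'
# expected to include kubernetes.io/pv-protection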

Alternatively, use a StorageClass for dynamic provisioning

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd

kubectl create -f sc-f.yaml && kubectl get sc

NAME                 PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
fast                 kubernetes.io/gce-pd   Delete          Immediate           false                  8s
standard (default)   kubernetes.io/gce-pd   Delete          Immediate           true                   116m
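Which class acts as the default is controlled by an annotation; a hedged sketch of switching it (not required for the steps below):

kubectl patch storageclass fast -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
kubectl patch storageclass standard -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'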

update pvc.yaml

...
  accessModes:
    - ReadWriteOnce
  storageClassName: "fast" #<---- update this
kubectl create -f pvc.yaml && kubectl get pvc
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongodb-pvc   Bound    pvc-7b2d28a2-ee88-4a2e-8718-81b72ac668a3   1Gi        RWO            fast           6s

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                 STORAGECLASS   REASON   AGE
mongodb-pv                                 10Gi       RWO,ROX        Retain           Released   default/mongodb-pvc                           48m
pvc-7b2d28a2-ee88-4a2e-8718-81b72ac668a3   1Gi        RWO            Delete           Bound      default/mongodb-pvc   fast                    76s
  • the new PV was provisioned automatically with storageClass=fast
  • there are different kinds of storage classes, including FileSystem
  • local storage and emptyDir are also available
  • gitRepo volumes are deprecated

Applications with Persistent Storage

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kubeserve-pvc
spec:
  resources:
    requests:
      storage: 100Mi
  accessModes:
    - ReadWriteOnce
  storageClassName: "fast"

kubectl create -f storageclass-fast.yaml && kubectl create -f kubeserve-pvc.yaml

kubectl get sc && kubectl get pvc
NAME                 PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
fast                 kubernetes.io/gce-pd   Delete          Immediate           false                  37s
standard (default)   kubernetes.io/gce-pd   Delete          Immediate           true                   137m

NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
kubeserve-pvc   Bound    pvc-6a28b319-90dd-4d2c-a210-1fec1cef20a2   1Gi        RWO            fast           36s

# automatically provisioned storage
kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                   STORAGECLASS   REASON   AGE
pvc-6a28b319-90dd-4d2c-a210-1fec1cef20a2   1Gi        RWO            Delete           Bound      default/kubeserve-pvc   fast                    78s

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubeserve
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kubeserve
  template:
    metadata:
      name: kubeserve
      labels:
        app: kubeserve
    spec:
      containers:
        - env:
          - name: app
            value: "1"
          image: linuxacademycontent/kubeserve:v1
          name: app
          volumeMounts:
          - mountPath: /data
            name: volume-data
      volumes:
      - name: volume-data
        persistentVolumeClaim:
          claimName: kubeserve-pvc
TODO: diagram PV, PVC, SC

Hands on

1. create a PersistentVolume

apiVersion: v1
kind: PersistentVolume
metadata:
  name: redis-pv
spec:
  storageClassName: ""
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"

kubectl create -f redis-pv.yaml

2. create PersistentVolumeClaim

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redisdb-pvc
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

kubectl create -f redis-pvc.yaml

3. Create a pod with the volume mounted at /data

apiVersion: v1
kind: Pod
metadata:
  name: redispod
spec:
  containers:
  - image: redis
    name: redisdb
    volumeMounts:
    - name: redis-data
      mountPath: /data
    ports:
    - containerPort: 6379
      protocol: TCP
  volumes:
  - name: redis-data
    persistentVolumeClaim:
      claimName: redisdb-pvc

kubectl create -f redispod.yaml

4. Write some data in the container

kubectl exec -it redispod -- redis-cli

SET server:name "redis server"
GET server:name
"redis server"
QUIT

5. Delete pod and create redispod2

kubectl delete pod redispod
# edit name in redispod.yaml to redispod2
kubectl create -f redispod.yaml

6. Verify data persists

kubectl exec -it redispod2 -- redis-cli
127.0.0.1:6379> GET server:name
"redis server"
QUIT

Security

Primitives

When accessing the API, requests are evaluated:

  • Normal user: authenticated with a private key, user store, or username/password — can't be added via an API call
  • ServiceAccount: managed identity for processes running in pods; uses a Secret for API authentication
kubectl get serviceaccounts
NAME      SECRETS   AGE
default   1         16d

# create a new sa
kubectl create sa jenkins
serviceaccount/jenkins created

# see internals including secrets 
kubectl get sa jenkins -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2020-10-15T08:11:13Z"
  name: jenkins
  namespace: default
  resourceVersion: "303206"
  selfLink: /api/v1/namespaces/default/serviceaccounts/jenkins
  uid: bb1d17cd-9700-4ea8-a9fb-32b64f12ed7b
secrets:
- name: jenkins-token-4fpvn

# get secret used to authenticate to the API
kubectl get secret jenkins-token-4fpvn
NAME                  TYPE                                  DATA   AGE
jenkins-token-4fpvn   kubernetes.io/service-account-token   3      68s

Every new pod will be assigned to the default ServiceAccount unless specified otherwise

kubectl get pod web-pod -o yaml | grep serviceAccount
serviceAccount: default
serviceAccountName: default

Create  a new pod with a specific ServiceAccount

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  serviceAccountName: 'jenkins' # <----- add this
  containers:
  - image: busybox:1.28.4
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always

kubectl get pods busybox -o yaml | grep serviceAccount
serviceAccountName: jenkins

Check location of cluster and credentials via kubectl config view or at ~/.kube/config

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: <https://172.31.31.60:6443> #<------- location (ip-for-master)
  name: kubernetes
contexts:
- context:
    cluster: kubernetes #<---------- context
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

Allow a specific username to access the cluster remotely (from a machine other than the master; basic auth like this is not recommended in production)

  1. kubectl config set-credentials joe --username=joe --password=password

correct way: generate a certificate with cfssl, see https://kubernetes.io/docs/concepts/cluster-administration/certificates/

copy /etc/kubernetes/pki/ca.crt to the remote server via scp

on the remote server: kubectl config set-cluster kubernetes --server=https://ip-for-master:6443 --certificate-authority=ca.crt --embed-certs=true

on the remote server: kubectl config set-credentials joe --username=joe --password=password

on the remote server, create a new context and use it: kubectl config set-context kubernetes --cluster=kubernetes --user=joe --namespace=default && kubectl config use-context kubernetes

on the remote server, verify via kubectl get nodes

Authentication and Authorization

Authorization: What users are allowed to do, configured via RBAC (Role based access control)

  • Roles and ClusterRoles: what actions can be performed on which resources
  • RoleBindings and ClusterRoleBindings: who can perform them
  • Roles/RoleBindings apply at the namespace level
  • ClusterRoles/ClusterRoleBindings apply at the cluster level
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: service-reader
  namespace: web
rules:
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list"]

kubectl create namespace web && kubectl create -f role.yaml
kubectl create rolebinding test --role=service-reader --serviceaccount=web:default -n web

rolebinding can bind the role to more than one serviceAccount, user, group
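For reference, the same binding written as yaml with several subjects (the User and Group names here are hypothetical):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: test
  namespace: web
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: service-reader
subjects:
- kind: ServiceAccount
  name: default
  namespace: web
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: joe
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: dev-team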

list the services in the web namespace, from the default namespace

  • separate console kubectl proxy
  • main console
curl localhost:8001/api/v1/namespaces/web/services
{
"kind": "ServiceList",
"apiVersion": "v1",
"metadata": {
"selfLink": "/api/v1/namespaces/web/services",
"resourceVersion": "321662"
},
"items": []
}

ClusterRole for viewing persistentVolumes

cat curl-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: curlpod
  namespace: web
spec:
  containers:
  - image: tutum/curl
    command: ["sleep", "999999"]
    name: main
  - image: linuxacademycontent/kubectl-proxy
    name: proxy
  restartPolicy: Always
kubectl create clusterrole pv-reader --verb=get,list --resource=persistentvolumes && 
kubectl create clusterrolebinding pv-test --clusterrole=pv-reader --serviceaccount=web:default

clusterrole.rbac.authorization.k8s.io/pv-reader created
clusterrolebinding.rbac.authorization.k8s.io/pv-test created

# access at the cluster level
kubectl create -f curl-pod.yaml
kubectl get pods -n web
NAME      READY   STATUS    RESTARTS   AGE
curlpod   2/2     Running   0          106s

kubectl exec -it curlpod -n web -- sh
~ curl localhost:8001/api/v1/persistentvolumes
{
  "kind": "PersistentVolumeList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/persistentvolumes",
    "resourceVersion": "324617"
  },
  "items": []
}

Can access the cluster level roles with clusterRole and clusterRoleBinding

Network policies

Govern how pods communicate with each other (default is open and accessible), matched by label selectors or namespace

  • ingress rules: who can access the pod
  • egress rules: what the pod can access (may use an IP block / CIDR range)

A network plugin that supports NetworkPolicy is needed; Calico supports it, Canal is another option.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
kubectl create -f deny-all-np.yaml  
kubectl run nginx --image=nginx --replicas=2
kubectl expose deployment nginx --port=80
kubectl run busybox --rm -it --image=busybox -- /bin/sh
# in the busybox pod
~ wget --spider --timeout=1 nginx
Connecting to nginx (10.103.197.57:80) #<--- can resolve nginx to 10.103... OK
wget: download timed out               #<--- has been denied by net policy OK

2 pods talking to each other via NetworkPolicy (pod selector)

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-netpolicy
spec:
  podSelector:
    matchLabels:
      app: db
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web  #<--- pods with app=web can talk to app=db pods using port 5432
    ports:
    - port: 5432

using a namespace selector

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ns-netpolicy
spec:
  podSelector:
    matchLabels:
      app: db
  ingress:
  - from:
    - namespaceSelector:  #<--- pods in namespaces labeled tenant=web can access pods with app=db
        matchLabels:
          tenant: web 
    ports:
    - port: 5432

using IP block specification

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ipblock-netpolicy
spec:
  podSelector:
    matchLabels:
      app: db
  ingress:
  - from:
    - ipBlock:
        cidr: 192.168.1.0/24

kubectl get netpol

egress policy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-netpolicy
spec:
  podSelector:
    matchLabels:
      app: web # <--- web pods can communicate with pods labeled db on 5432
  egress: #<---
  - to:   #<---
    - podSelector:
        matchLabels:
          app: db  
    ports:
    - port: 5432

TLS Certificates

The cluster CA is used to generate TLS certificates and to authenticate with the API server.

The CA certificate bundle is automatically mounted into pods using the default service account at /var/run/secrets/kubernetes.io/serviceaccount

kubectl exec nginx -- ls /var/run/secrets/kubernetes.io/serviceaccount
ca.crt
namespace
token
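These mounted files are enough to call the API server from inside a pod; a minimal sketch, assuming curl is available in the image:

TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
     -H "Authorization: Bearer $TOKEN" \
     https://kubernetes.default.svc/api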
  1. Generate a CSR (Certificate signing request)
  • ca-csr.json
{
  "CN": "my-pod.my-namespace.pod.cluster.local",
  "hosts": [
    "172.168.0.24",
    "10.0.34.2",
    "my-svc.my-namespace.svc.cluster.local",
    "my-pod.my-namespace.pod.cluster.local"
  ],
  "key": {
    "algo": "ecdsa",
    "size": 256
  }
}
  • csr.yaml
# install
wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
chmod +x cfssl*
mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
mv cfssl_linux-amd64 /usr/local/bin/cfssl
cfssl version
$~ Version: 1.2.0
$~ Revision: dev
$~ Runtime: go1.6

# generate
cfssl genkey ca-csr.json  | cfssljson -bare server
ls server*
$~ server.csr
$~ server-key.pem

# create k8s csr object, note command line substition in request
cat <<EOF | kubectl create -f -
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: pod-csr.web
spec:
  signerName: kubernetes.io/kube-apiserver-client
  groups:
  - system:authenticated
  request: $(cat server.csr | base64 | tr -d '\n')
  usages:
  - digital signature
  - key encipherment
  - server auth
EOF

$~ certificatesigningrequest.certificates.k8s.io/pod-csr.web created
  2. Verify the CSR object; it remains in Pending state until an admin approves it
kubectl get csr pod-csr.web
NAME          AGE   SIGNERNAME                            REQUESTOR          CONDITION
pod-csr.web   4m    kubernetes.io/kube-apiserver-client   kubernetes-admin   Pending

kubectl describe csr pod-csr.web
Name:               pod-csr.web
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Tue, 20 Oct 2020 04:04:45 +0000
Requesting User:    kubernetes-admin
Signer:             kubernetes.io/kube-apiserver-client
Status:             Pending
Subject:
  Common Name:    my-pod.my-namespace.pod.cluster.local
  Serial Number:
Subject Alternative Names:
         DNS Names:     my-svc.my-namespace.svc.cluster.local
                        my-pod.my-namespace.pod.cluster.local
         IP Addresses:  172.168.0.24
                        10.0.34.2
Events:  <none>

kubectl certificate approve pod-csr.web

pod-csr.web   6m15s   kubernetes.io/kube-apiserver-client   kubernetes-admin   Approved

kubectl get csr pod-csr.web -o yaml

apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  creationTimestamp: "2020-10-20T04:04:45Z"
  managedFields:
  - apiVersion: certificates.k8s.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:groups: {}
        f:request: {}
        f:signerName: {}
        f:usages: {}
      f:status:
        f:conditions: {}
    manager: kubectl
    operation: Update
    time: "2020-10-20T04:10:52Z"
  name: pod-csr.web
  resourceVersion: "350881"
  selfLink: /apis/certificates.k8s.io/v1beta1/certificatesigningrequests/pod-csr.web
  uid: b8823cac-3afb-4547-a217-46df2757b92e
spec:
  groups:
  - system:masters
  - system:authenticated
  request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQllqQ0NBUWdDQVFBd01ERXVNQ3dHQTFVRUF4TWxiWGt0Y0c5a0xtMTVMVzVoYldWemNHRmpaUzV3YjJRdQpZMngxYzNSbGNpNXNiMk5oYkRCWk1CTUdCeXFHU000OUFnRUdDQ3FHU000OUF3RUhBMElBQkhrOG41OEU0eHJuCnh6NU9FcENLRExUSHZiaW1NaUZ0anNLL2NPcVpJTUtnQnlhamVIcDBmOVZrZEJPM09hby9RbTlMaHBYcU5qVGoKZVhWN1pKOEtOclNnZGpCMEJna3Foa2lHOXcwQkNRNHhaekJsTUdNR0ExVWRFUVJjTUZxQ0pXMTVMWE4yWXk1dAplUzF1WVcxbGMzQmhZMlV1YzNaakxtTnNkWE4wWlhJdWJHOWpZV3lDSlcxNUxYQnZaQzV0ZVMxdVlXMWxjM0JoClkyVXVjRzlrTG1Oc2RYTjBaWEl1Ykc5allXeUhCS3lvQUJpSEJBb0FJZ0l3Q2dZSUtvWkl6ajBFQXdJRFNBQXcKUlFJZ01aODZGWXpQcTQ5VDZoYTFFaXRrZWFNbUJzMXQ2bmcrVzlGTlhNb1k0VFFDSVFDZU01NlppVmJ2UVhrVAoyOWQwWllua05zZ09pR3Z3ZDRvdjRGT2NmQ1dmT3c9PQotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - digital signature
  - key encipherment
  - server auth
  username: kubernetes-admin
status:
  conditions:
  - lastUpdateTime: "2020-10-20T04:10:52Z"
    message: This CSR was approved by kubectl certificate approve.
    reason: KubectlApprove
    type: Approved

retrieve the request back (this decodes .spec.request, i.e., the original CSR)

kubectl get csr pod-csr.web -o jsonpath='{.spec.request}' | base64 --decode > server.crt
-----BEGIN CERTIFICATE REQUEST-----
MIIBYjCCAQgCAQAwMDEuMCwGA1UEAxMlbXktcG9kLm15LW5hbWVzcGFjZS5wb2Qu
Y2x1c3Rlci5sb2NhbDBZMBMGByqGSM49AgEGCCqGSM49AwEHA0IABHk8n58E4xrn
xz5OEpCKDLTHvbimMiFtjsK/cOqZIMKgByajeHp0f9VkdBO3Oao/Qm9LhpXqNjTj
eXV7ZJ8KNrSgdjB0BgkqhkiG9w0BCQ4xZzBlMGMGA1UdEQRcMFqCJW15LXN2Yy5t
eS1uYW1lc3BhY2Uuc3ZjLmNsdXN0ZXIubG9jYWyCJW15LXBvZC5teS1uYW1lc3Bh
Y2UucG9kLmNsdXN0ZXIubG9jYWyHBKyoABiHBAoAIgIwCgYIKoZIzj0EAwIDSAAw
RQIgMZ86FYzPq49T6ha1EitkeaMmBs1t6ng+W9FNXMoY4TQCIQCeM56ZiVbvQXkT
29d0ZYnkNsgOiGvwd4ov4FOcfCWfOw==
-----END CERTIFICATE REQUEST-----
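Once the CSR has been approved and the controller has issued it, the signed certificate itself can be pulled from .status.certificate; a hedged sketch:

kubectl get csr pod-csr.web -o jsonpath='{.status.certificate}' | base64 --decode > server-signed.crt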

Secure images

Images come from the container registry (docker hub by default)

AWS, Azure, etc, Custom also  possible

After logging in to a private Docker Hub account, the Docker config is updated:

# "~/.docker/config.json" [noeol] 10L, 174C                                                                                                                                               1,1           All
{
        "auths": {
                "<https://index.docker.io/v1/>": {
                        "auth": "Y2R2ZWw6WjIxxxxxZXSlczRFE2QlQ="
                }
        },
        "HttpHeaders": {
                "User-Agent": "Docker-Client/19.03.13 (linux)"
        }
}

docker images && docker pull busybox

tag and push (after logging in to the container registry)

docker tag busybox:1.28.4 privatecr.io/busybox:latest && docker push privatecr.io/busybox:latest

specify a secret container registry named acr (avoid using unverified registries)

kubectl create secret docker-registry acr --docker-server=https://privatecr.io --docker-username=usr --docker-password='xxx' --docker-email=usr@mail.com

set service account to use new acr

kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "acr"}]}'

serviceaccount/default patched

kubectl get sa default -o yaml

apiVersion: v1
imagePullSecrets: #<----- new
- name: acr #<----- new
kind: ServiceAccount
metadata:
  creationTimestamp: "2020-09-29T06:59:00Z"
  name: default
  namespace: default
  resourceVersion: "379"
  selfLink: /api/v1/namespaces/default/serviceaccounts/default
  uid: 8ccd9095-db8b-4f3a-892f-6e5c65caf419
secrets:
- name: default-token-6sw8s

in new pods:

...
spec:
  containers:
  - name: busybox
    image: privatecr.io/busybox:latest #<---- good practice to keep the cr but optional
    imagePullPolicy: Always #<--- always get the image, even if on disk
...
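Alternatively, instead of patching the default ServiceAccount, the registry secret can be referenced directly in the pod spec; a minimal sketch:

...
spec:
  imagePullSecrets:
  - name: acr
  containers:
  - name: busybox
    image: privatecr.io/busybox:latest
...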

Security contexts

Access control for a pod or containers, access to a file or process.

Can be set at the pod level (applies to all containers in the pod) or per container, as described in the yaml

kubectl run pod-default --image alpine --restart Never -- /bin/sleep 999999

# running as user root
kubectl exec pod-default -- id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),20(dialout),26(tape),27(video)

running a pod as a different user

apiVersion: v1
kind: Pod
metadata:
  name: alpine-user-context
spec:
  containers:
  - name: main
    image: alpine
    command: ["/bin/sleep", "999999"]
    securityContext:
      runAsUser: 405
#     runAsNonRoot: true  #<---- may not work if the image is intended to run as root
#     privileged: true    #<---- grants the container access to the node's devices
kubectl exec alpine-user-context -- id
uid=405(guest) gid=100(users)

Running a container in privileged mode with securityContext privileged: true

kubectl exec -it privileged-pod -- ls /dev | wc -l
38 #<--- only 4 if not privileged
...
securityContext:
  capabilities:
    add:
    - ["NET_ADMIN", "SYS_TIME"]

Add specific capabilities (e.g., to modify the kernel time):

kubectl exec -it kernel-pod -- date +%T -s "10:00:00"

Removing specific capabilities (e.g., changing file ownership with CHOWN)

...
securityContext:
  capabilities:
    drop:
    - CHOWN

Writing only to volumes (read only file system)

apiVersion: v1
kind: Pod
metadata:
  name: readonly-pod
spec:
  containers:
  - name: main
    image: alpine
    command: ["/bin/sleep", "999999"]
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: my-volume
      mountPath: /volume
      readOnly: false
  volumes:
  - name: my-volume
    emptyDir: {}
kubectl exec -it readonly-pod -- touch /file.md
touch: /file.md: Read-only file system
command terminated with exit code 1

kubectl exec -it readonly-pod -- touch /volume/file.md
kubectl exec -it readonly-pod -- ls /volume
file.md

Security context at the pod level

apiVersion: v1
kind: Pod
metadata:
  name: group-context
spec:
  securityContext:
    fsGroup: 555 #<--- default group is 555
    supplementalGroups: [666, 777]
  containers:
  - name: first
    image: alpine
    command: ["/bin/sleep", "999999"]
    securityContext:
      runAsUser: 111  #<--- first container run as user 111
    volumeMounts:
    - name: shared-volume
      mountPath: /volume
      readOnly: false
  - name: second
    image: alpine
    command: ["/bin/sleep", "999999"]
    securityContext:
      runAsUser: 2222   #<--- second container run as user 2222
    volumeMounts:
    - name: shared-volume
      mountPath: /volume
      readOnly: false
  volumes:
  - name: shared-volume
    emptyDir: {}

Files created in the mounted volume get the fsGroup (555) as group; outside the volume, the default group is root

kubectl exec -it group-context -c first -- sh

/ $ id
uid=111 gid=0(root) groups=555,666,777

/ $ touch /volume/file && ls -l /volume
total 0
-rw-r--r--    1 111      555              0 Oct 20 07:33 file

touch /tmp/file && ls -l /tmp
total 0
-rw-r--r--    1 111      root             0 Oct 20 07:35 file

Securing persistent key value stores

Information passed into containers might be sensitive and may need to persist

Use secrets: maps with key values

Not best practice to expose secrets in environment variables

Kept in memory (tmpfs), not written to physical storage

Secret values are base64-encoded in the API object and decoded before use
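A minimal sketch of creating a generic Secret and consuming it as a volume instead of an environment variable (the names here are hypothetical):

kubectl create secret generic db-creds --from-literal=password=S3cr3t

apiVersion: v1
kind: Pod
metadata:
  name: secret-demo
spec:
  containers:
  - name: main
    image: alpine
    command: ["/bin/sleep", "999999"]
    volumeMounts:
    - name: creds
      mountPath: /etc/creds
      readOnly: true
  volumes:
  - name: creds
    secret:
      secretName: db-creds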

kubectl get secrets

use kubectl describe to check secrets in a pod

Volumes:
  default-token-6sw8s:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-6sw8s

...
Mounts:
  /var/run/secrets/kubernetes.io/serviceaccount from default-823js

kubectl describe secret default-token-6sw8s

Name:         default-token-6sw8s
Namespace:    default
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: default
              kubernetes.io/service-account.uid: 8ccd9095-db8b-4f3a-892f-6e5c65caf419

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1025 bytes
namespace:  7 bytes
token:      eyJhbGciOiJ...

Create a Secret from a certificate and a key

openssl genrsa -out https.key 2048
openssl req -new -x509 -key https.key -out https.cert -days 3650 -subj /CN=www.example.com
touch file

kubectl create secret generic example-https --from-file=https.key --from-file=https.cert --from-file=file

kubectl get secrets example-https -o yaml

apiVersion: v1
data:
  file: ""
  https.cert: LS0tLS1CRUdJTiBDRVJU==  #<--- base64 encoded
  https.key: LS0tLS1CRUdJTiBDRVJU==   #<--- base64 encoded
kind: Secret
metadata:
  creationTimestamp: "2020-10-20T07:51:34Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:file: {}
        f:https.cert: {}
        f:https.key: {}
      f:type: {}
    manager: kubectl
    operation: Update
    time: "2020-10-20T07:51:34Z"
  name: example-https
  namespace: default
  resourceVersion: "382860"
  selfLink: /api/v1/namespaces/default/secrets/example-https
  uid: 6a395fdd-ef7e-47ea-87cd-34c5e7101a59
type: Opaque

Mount secret on the pod

- image: nginx:alpine
  name: web-server
  volumeMounts:
  - name: certs
    mountPath: /etc/nginx/certs #<-- where is mounted
    readOnly: true

volumes: 
...
- name: certs
  secret:
    secretName: example-https

kubectl exec example-https -c web-server -- mount | grep cert
tmpfs on /etc/nginx/certs type tmpfs (ro, relatime)

tmpfs: not written on disk

Hands on

Create a clusterRole to access a PersistentVolume

  1. Check persistent volumes
kubectl get pv
NAME          CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS    REASON   AGE
database-pv   1Gi        RWO            Retain           Available           local-storage            55m
  2. Create ClusterRole and ClusterRoleBinding
kubectl create clusterrole pv-reader --resource=pv --verb=get,list
clusterrole.rbac.authorization.k8s.io/pv-reader created

kubectl create clusterrolebinding pv-test --clusterrole=pv-reader --serviceaccount=web:default
clusterrolebinding.rbac.authorization.k8s.io/pv-test created
  3. Create and access a pod to curl the resource: kubectl create -f pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: curlpod
  namespace: web
spec:
  containers:
  - name: main
    image: tutum/curl
    command: ["sleep", "3600"]
  - name: proxy
    image: linuxacademycontent/kubectl-proxy
  restartPolicy: Always
kubectl exec -it curlpod -n web -- sh
# defaults to main container
~ curl localhost:8001/api/v1/persistentvolumes
{
  "kind": "PersistentVolumeList",
  "apiVersion": "v1",
...

Monitoring

Components

Check application resource usage at the pod, node, and cluster levels

Increase performance and reduce bottlenecks

Metrics server: Accessed via API, query each kubelet for CPU, Mem usage

Check metrics configuration

kubectl get --raw /apis/metrics.k8s.io
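Assuming the metrics-server is installed, the raw metrics API can also be queried directly; a hedged sketch:

kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods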

kubectl top collects info about nodes and the current usage of all pods (describe only shows limits)

  • node
  • pod
  • --all-namespaces
  • -n NAMESPACE
  • -l key=value
  • name-of-pod
  • name-of-pod --containers
kubectl top node
NAME                           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ed092a70c01c.mylabserver.com   154m         7%     1190Mi          31%
f2c3d18a641c.mylabserver.com   196m         9%     1602Mi          42%
fc9ccdd4e21c.mylabserver.com   132m         6%     1194Mi          31%

kubectl top pod
NAME                         CPU(cores)   MEMORY(bytes)
alpine-user-context          0m           0Mi
configmap-pod                0m           0Mi
dns-example                  0m           7Mi
group-context                0m           1Mi

kubectl top pod --all-namespaces
...

kubectl top pod -n kube-system
...

kubectl top pod -l env=dev
...

kubectl top pod nginx
NAME    CPU(cores)   MEMORY(bytes)
nginx   0m           2Mi

kubectl top pods pod-example --containers
POD             NAME     CPU(cores)   MEMORY(bytes)
pod-example     second   0m           0Mi
pod-example     first    0m           0Mi

Application in a cluster

Detect resource utilization automatically

  • Liveness probe: Check if container is alive (if fails, restarts container)

httpGet, tcpSocket, exec (run arbitrary code)

  • Readiness probe: Ready to receive client requests (if fails, no restart, remove from endpoints)
apiVersion: v1
kind: Pod
metadata:
  name: liveness
spec:
  containers:
  - image: linuxacademycontent/kubeserve
    name: kubeserve
    livenessProbe:
      httpGet:
        path: /
        port: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5

if the readiness probe fails, the pod is not restarted but is removed from the service endpoints (check with kubectl get ep)
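For reference, the tcpSocket and exec probe types mentioned above look like this as container-level fragments (the port and file path are illustrative):

    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5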

Cluster component logs

TODO: Cluster logs diagram

Check /var/log/containers: it may consume all disk space; use the sidecar pattern with a logging container.

kubelet (process) logs at /var/log

Use the sidecar pattern to mount for each container and query logs separately

apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - image: busybox
    name: count
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/1.log;
        echo "$(date) $i" >> /var/log/2.log;
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: count-log-1
    image: busybox
    args: [/bin/sh, -c, 'tail -n+1 -f /var/log/1.log']
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: count-log-2
    image: busybox
    args: [/bin/sh, -c, 'tail -n+1 -f /var/log/2.log']
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  volumes:
  - name: varlog
    emptyDir: {}

Use kubectl logs

kubectl logs counter count-log-1 && kubectl logs counter count-log-2

Log rotation may be needed, but it is not handled by k8s itself

Application Logs

Containers write logs to stdout, but the runtime (Docker) redirects them to files. K8s picks these up and makes them accessible via:

kubectl logs

  • pod-name
  • pod-name -c container-name
  • pod-name --all-containers=true
  • -l run=nginx

Logs from a terminated pod: kubectl logs -p pod-name -c container-name

  • streaming  flag -f
  • tail --tail=20
  • last hour --since=1h
  • from deployment kubectl logs deployment/nginx
  • from container in deployment kubectl logs deployment/nginx -c nginx > file.txt

Hands on

  1. Find failing Pod kubectl get pods --all-namespaces
  2. Find logs and save to file for that pod kubectl logs pod4 -n web > error4.log

Failure identification & troubleshooting

Ideas for troubleshooting:

  • check pods for events and conditions
  • check the coredns pod
  • list events for the pod
  • describe the pod to spot image pull errors

Application

Application developer should put all info needed in log files

AppComErr, CrashLoopBackOff, FailedMountErr, Pending, RbacErr, ImagePullErr, DiscoveryErr

k8s allows a termination message to be written to a file

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - image: busybox
    name: main
    command:
    - sh
    - -c
    - 'echo "It''s enough" > /var/termination-reason; exit 1'
    terminationMessagePath: /var/termination-reason #<--- file to write to

kubectl describe pod my-pod

...
Last State:  Terminated
      Reason:    Error
      Message:   It's enough

Built-in liveness probe in kubernetes

livenessProbe:
  httpGet:
    path: /healthz  #<--- not all images have a healthz endpoint
    port: 8081

Troubleshooting checklist

  1. Find pods with Error status
  2. kubectl describe the pod and check events and state.reason
  • On running pods, modify some attributes with kubectl edit; otherwise recreate (step 5)
  3. kubectl logs my-pod
  4. kubectl get po my-pod -o yaml and scan for reasons or inconsistencies
  5. If not managed by a deployment, recreate the pod: kubectl get po my-pod -o yaml --export > file.yaml

Control plane

		control plane
		+------------------------------------------+
		|                                          |
		|   +------------+           +------+      |
		|   |            |           |      |      |
		|   | API server +---------->+ etcd |      |
		|   |            |           |      |      |
		|   +-----+-+----+           +------+      |
		|         ^ ^                              |
		|         | +-------------------+          |
		|         |                     |          |
		|   +-----+-------+     +-------+-----+    |
		|   |scheduler    |     | controller  |    |
		|   |             |     | manager     |    |
		|   +-------------+     +-------------+    |
		|                                          |
		+------------------------------------------+

Considerations

  • choose infra vendor with SLA guarantees
  • take snapshot from storage
  • use deployments and services to load balance
  • use federation to join cluster together
  • review events & schedulers
  • kubectl get events -n kube-system
  • kubectl logs -n kube-system kube-scheduler-f2c3d18a641c.mylabserver.com
  • make sure services start automatically when the server boots
  • systemctl start docker && systemctl enable kubelet
  • disable swap to run kubelet: swapoff -a and comment out the swap line in /etc/fstab (see the sketch after this list)
  • firewall issues: disable it with systemctl disable firewalld
  • public IPs in kubeconfig files can cause issues later; tie the API server to a static IP instead
  • check the defaults used by kubeadm: see kubectl config view
  • Make API server HA (replicate)
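A minimal sketch for the swap item above, assuming a standard /etc/fstab layout:

swapoff -a                           # turn swap off immediately
sed -i '/ swap / s/^/#/' /etc/fstab  # comment out the swap line so it stays off after reboot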

Worker node

checklist

  1. Check for faulty nodes kubectl get no
  2. Review kubectl describe no node_name
  3. Find details like IP address with kubectl get nodes -o wide
  4. Recreate a node if irresponsive and troubleshooting fails
  5. View journalctl -u kubelet logs
  6. May check /var/log/syslog for more details on kubelet status (see the sketch below)
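A short sketch for steps 5 and 6, assuming systemd and a Debian/Ubuntu-style syslog:

systemctl status kubelet
journalctl -u kubelet --since "1 hour ago" --no-pager
grep -i kubelet /var/log/syslog | tail -n 50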

Networking

Intercluster communication and services

  • Create deployment

kubectl run hostnames --image=k8s.gcr.io/serve_hostname --replicas=3

  • Expose ports

kubectl expose deployment hostnames --port=80 --target-port=9376

  • Run a busybox pod to test DNS

kubectl run -it --rm --restart=Never busybox3 --image=busybox:1.28 -- sh

### checking for dns resolution, hostnames and default

/ # nslookup hostnames
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      hostnames
Address 1: 10.104.243.218 hostnames.default.svc.cluster.local
/ #

/ # nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
/ #
  • Verify target port, container port, protocol kubectl get svc hostnames -o json
...
    "spec": {
        "clusterIP": "10.104.243.218",
        "ports": [
            {
                "port": 80,
                "protocol": "TCP",
                "targetPort": 9376
  • check endpoints with kubectl get ep && use wget to check access via target port
NAME         ENDPOINTS                                            AGE
hostnames    10.244.79.31:9376                                    10m
kubernetes   172.31.31.60:6443                                    22d
kubeserve    10.244.194.194:80,10.244.194.196:80,10.244.79.7:80   9d
nginx        10.244.194.241:80,10.244.79.8:80                     6d16h
  • check kube-proxy runs on the nodes ps auxw | grep kube-proxy
  • check iptables are up to date in kube-proxy
kubectl exec -it kube-proxy-29v2s -n kube-system -- sh
> iptables-save | grep hostnames
> -A KUBE-SEP-54E5H2EQMA5FL7LP -s 10.244.79.31/32 -m comment --comment "default/hostnames:" -j KUBE-MARK-MASQ
...
  • You may replace network plugin, e.g., from flannel to calico via kubectl delete and apply, check docs.

Hands-on

1. Identifying broken pods: kubectl get po --all-namespaces


web nginx-856876659f-2fskf 0/1 ErrImagePull 0 3m14s
web nginx-856876659f-5k8lr 0/1 ImagePullBackOff 0 3m15s
web nginx-856876659f-nfxt7 0/1 ErrImagePull 0 3m14s
web nginx-856876659f-v7mjr 0/1 ImagePullBackOff 0 3m14s
web nginx-856876659f-vtmxd 0/1 ImagePullBackOff 0 3m14s

2. Find failure reason: kubectl describe po nginx-xxxx -n web

error
Failed to pull image "nginx:191

solved by:

# check events
kubectl describe po nginx-xxxx -n web 

# edit running pod spec image
kubectl edit po nginx-xxxx -n web
#OR (better)
kubectl edit deploy nginx -n web

#switch yaml to image=nginx
:wq

3. Verify the replica set: kubectl get rs -n web -w

4. Access pod directly using interactive busybox pod

kubectl run busybox --image=busybox --rm -it --restart=Never -- sh
wget -qO- 10.244.1.2:80  #<--- get ip using kubectl get -o wide