Working through the CKA simulator questions is fairly hard, but every one of them feels worth learning from.

After finishing a simulator run, this post is my way of reviewing the questions.

Pre-exam tips

alias k=kubectl                         # will already be pre-configured

export do="--dry-run=client -o yaml"    # k create deploy nginx --image=nginx $do

export now="--force --grace-period 0"   # k delete pod x $now

Edit the ~/.vimrc:

set tabstop=2
set expandtab
set shiftwidth=2

Common resource short names

  • deploy
  • ds
  • sts
  • sa

The questions

Q1

List which kube contexts the current environment has. This is basically a free point; the second command is worth noting for its sed usage.

kubectl config get-contexts -o name > /opt/course/1/contexts
cat ~/.kube/config | grep current | sed -e "s/current-context: //"

Q2

Create a single Pod of image httpd:2.4.41-alpine in Namespace default. The Pod should be named pod1 and the container should be named pod1-container. This Pod should only be scheduled on a controlplane node, do not add new labels any nodes.

This one uses node-selection knowledge: a nodeSelector plus a toleration, one to select the node and one to tolerate the taint on it. Use the commands below to inspect the node and get the label to select on and the taint to tolerate.

k get node # find controlplane node
k describe node cluster1-controlplane1 | grep Taint -A1 # get controlplane node taints
k get node cluster1-controlplane1 --show-labels # get controlplane node labels

Dry-run the Pod with the command below to get a base YAML file to trim, then add the toleration and nodeSelector shown underneath.

k run pod1 --image=httpd:2.4.41-alpine $do > 2.yaml

# 2.yaml (add under the Pod spec:)
  tolerations:                                 # add
  - effect: NoSchedule                         # add
    key: node-role.kubernetes.io/control-plane # add
  nodeSelector:                                # add
    node-role.kubernetes.io/control-plane: ""  # add

Extra notes

Set taints on node1:

kubectl taint nodes node1 key1=value1:NoSchedule
kubectl taint nodes node1 key1=value1:NoExecute
kubectl taint nodes node1 key2=value2:NoSchedule

Remove those taints:

kubectl taint nodes node1 key1:NoSchedule-
kubectl taint nodes node1 key1:NoExecute-
kubectl taint nodes node1 key2:NoSchedule-

Check the taints on node1:

kubectl describe nodes node1

The effect field controls what happens to Pods that don't tolerate the taint; the possible values are:

  • NoExecute
  • NoSchedule
  • PreferNoSchedule

If a Pod was already running on the node before a NoSchedule taint was added, it can keep running there.

If a taint with effect NoExecute is added to a node, any Pod that cannot tolerate it is evicted immediately, while Pods that do tolerate it stay. However, if a NoExecute toleration also sets the optional tolerationSeconds field, that value is how long the Pod may keep running on the node after the taint is added. For example:
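A minimal toleration sketch showing tolerationSeconds (the key, value and 3600 seconds are illustrative values of mine, not from the exam):

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600   # the Pod may keep running for up to 1 hour after the taint appears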

Q3

Use context: kubectl config use-context k8s-c1-H

There are two Pods named o3db-* in Namespace project-c13. C13 management asked you to scale the Pods down to one replica to save resources.

Looking at the Pods, the ordinal suffixes on the names show they are managed by a StatefulSet.

➜ k -n project-c13 get pod | grep o3db
o3db-0                                  1/1     Running   0          52s
o3db-1                                  1/1     Running   0          42s

Then list all the workload types; o3db turns out to be a StatefulSet:

➜ k -n project-c13 get deploy,ds,sts | grep o3db
statefulset.apps/o3db   2/2     2m56s

After that, simply scale it down:

➜ k -n project-c13 scale sts o3db --replicas 1
statefulset.apps/o3db scaled

➜ k -n project-c13 get sts o3db
NAME   READY   AGE
o3db   1/1     4m39s

Q4

Do the following in Namespace default. Create a single Pod named ready-if-service-ready of image nginx:1.16.1-alpine. Configure a LivenessProbe which simply executes command true. Also configure a ReadinessProbe which does check if the url http://service-am-i-ready:80 is reachable, you can use wget -T2 -O- http://service-am-i-ready:80 for this. Start the Pod and confirm it isn’t ready because of the ReadinessProbe.

Create a second Pod named am-i-ready of image nginx:1.16.1-alpine with label id: cross-server-ready. The already existing Service service-am-i-ready should now have that second Pod as endpoint.

Now the first Pod should be in ready state, confirm that.

This one tests the Pod probes. Configure the liveness and readiness probes as the question asks:

    livenessProbe:                                      # add from here
      exec:
        command:
        - 'true'
    readinessProbe:
      exec:
        command:
        - sh
        - -c
        - 'wget -T2 -O- http://service-am-i-ready:80'   # to here

Since the Service already exists, just run a second Pod with the matching label:

k run am-i-ready --image=nginx:1.16.1-alpine --labels="id=cross-server-ready"

Extra notes

A note on the difference between the two probes:

  • LivenessProbe: probes the container in the specified way to check whether the application is still running properly. If the check fails, the container is considered unhealthy and the kubelet decides, based on the Pod's restartPolicy, whether to restart it. If no livenessProbe is configured, the kubelet treats the liveness probe as permanently successful.
  • ReadinessProbe: checks whether the application inside the container has finished starting. Only after the probe succeeds is the container marked Ready and the Pod allowed to receive traffic; if it fails, Ready is set to false. For Pods behind a Service, the Service's Endpoint list is maintained according to the Pods' Ready state: if a running Pod's Ready state turns false it is removed from the Endpoint list automatically, and it is added back once it becomes Ready again. This mechanism keeps traffic away from unavailable Pods.

Probes support the following check mechanisms:

  • ExecAction: run a command; the command exiting successfully means success
  • HTTPGet: send an HTTP request; a successful response means success
  • TCPSocketAction: open a TCP connection; a successful connection means success

The two probes use the same check mechanisms; they differ only in what happens to the Pod when a probe fails:

  • ReadinessProbe: on failure, the Pod's IP:Port is removed from the Endpoint list of the associated Service.
  • LivenessProbe: on failure, the container is killed and the Pod's restart policy decides what happens next.

Question 5 | Kubectl sorting

There are various Pods in all namespaces. Write a command into /opt/course/5/find_pods.sh which lists all Pods sorted by their AGE (metadata.creationTimestamp).

Write a second command into /opt/course/5/find_pods_uid.sh which lists all Pods sorted by field metadata.uid. Use kubectl sorting for both commands.

A simple command-usage check: just pass the right field to --sort-by.

kubectl get pod -A --sort-by=.metadata.creationTimestamp
kubectl get pod -A --sort-by=.metadata.uid

Question 6 | Storage, PV, PVC, Pod volume

Create a new PersistentVolume named safari-pv. It should have a capacity of 2Gi, accessMode ReadWriteOnce, hostPath /Volumes/Data and no storageClassName defined.

Next create a new PersistentVolumeClaim in Namespace project-tiger named safari-pvc . It should request 2Gi storage, accessMode ReadWriteOnce and should not define a storageClassName. The PVC should bound to the PV correctly.

Finally create a new Deployment safari in Namespace project-tiger which mounts that volume at /tmp/safari-data. The Pods of that Deployment should be of image httpd:2.4.41-alpine.

Tests basic PV and PVC creation. I know how to build these, but the YAML templates have to be fetched from the docs, the exam browser isn't great, and copy-pasting costs time.

# 6_pv.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: safari-pv
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/Volumes/Data"

# 6_pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: safari-pvc
  namespace: project-tiger
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
     storage: 2Gi

Generate the Deployment template with:

k -n project-tiger create deploy safari \
  --image=httpd:2.4.41-alpine $do > 6_dep.yaml

In the Pod template's spec, add the volume and its mount:

    spec:
      volumes:                                      # add
      - name: data                                  # add
        persistentVolumeClaim:                      # add
          claimName: safari-pvc                     # add
      containers:
      - image: httpd:2.4.41-alpine
        name: container
        volumeMounts:                               # add
        - name: data                                # add
          mountPath: /tmp/safari-data               # add

Afterwards, describe the Pod to check the volume mount:

➜ k -n project-tiger describe pod safari-5cbf46d6d-mjhsb  | grep -A2 Mounts:   
    Mounts:
      /tmp/safari-data from data (rw) # there it is
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-n2sjj (ro)

Question 7 | Node and Pod Resource Usage

The metrics-server has been installed in the cluster. Your colleague would like to know the kubectl commands to:

show Nodes resource usage
show Pods and their containers resource usage
Please write the commands into /opt/course/7/node.sh and /opt/course/7/pod.sh.

This is a free point; at most a --sort-by might be added (a sketch follows the commands).

kubectl top node
# the following shows per-container usage inside the Pods; it needs an extra flag
kubectl top pod --containers=true
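If sorting is also asked for, kubectl top supports --sort-by as well (a small sketch; cpu and memory can be swapped):

kubectl top node --sort-by=cpu
kubectl top pod -A --sort-by=memory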

Question 8 | Get Controlplane Information

Ssh into the controlplane node with ssh cluster1-controlplane1. Check how the controlplane components kubelet, kube-apiserver, kube-scheduler, kube-controller-manager and etcd are started/installed on the controlplane node. Also find out the name of the DNS application and how it’s started/installed on the controlplane node.

Write your findings into file /opt/course/8/controlplane-components.txt. The file should be structured like:

# /opt/course/8/controlplane-components.txt
kubelet: [TYPE]
kube-apiserver: [TYPE]
kube-scheduler: [TYPE]
kube-controller-manager: [TYPE]
etcd: [TYPE]
dns: [TYPE] [NAME]

Choices of [TYPE] are: not-installed, process, static-pod, pod

Check how each component is installed. I didn't manage this one; I didn't know what to look at. The solution's walkthrough goes like this:

First check the running processes (ps aux piped through grep) to see which components are present as plain processes.

Then use find on /etc/systemd/system to see whether a systemd service is registered:

➜ root@cluster1-controlplane1:~# find /etc/systemd/system/ | grep kube
➜ root@cluster1-controlplane1:~# find /etc/systemd/system/ | grep etcd

Then check the kubernetes manifests directory (the resource files installed by default); everything found there runs as a static Pod on the node.

find /etc/kubernetes/manifests/

As the extra notes below explain, static Pods follow a fixed naming pattern (the node name is appended), so they can also be spotted by name:

kubectl -n kube-system get pod -o wide | grep controlplane1
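With the kubeadm defaults those checks usually lead to a findings file like this (a sketch; confirm the DNS name and each type in your own cluster):

# /opt/course/8/controlplane-components.txt
kubelet: process
kube-apiserver: static-pod
kube-scheduler: static-pod
kube-controller-manager: static-pod
etcd: static-pod
dns: pod coredns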

Extra notes

A static Pod is managed directly by the kubelet daemon on a specific node, without the API server supervising it. Unlike Pods managed by the control plane (for example through a Deployment), the kubelet itself watches each static Pod and restarts it if it fails.

You cannot control static Pods through kubectl, but the resources are visible: the Pod name gets the node hostname appended as a suffix, starting with a hyphen.

They are useful for running a specific service on a specific node; if something must run on every node, use a DaemonSet instead.

Question 9 | Kill Scheduler, Manual Scheduling

Ssh into the controlplane node with ssh cluster2-controlplane1. Temporarily stop the kube-scheduler, this means in a way that you can start it again afterwards.

Create a single Pod named manual-schedule of image httpd:2.4-alpine, confirm it’s created but not scheduled on any node.

Now you’re the scheduler and have all its power, manually schedule that Pod on node cluster2-controlplane1. Make sure it’s running.

Start the kube-scheduler again and confirm it’s running correctly by creating a second Pod named manual-schedule2 of image httpd:2.4-alpine and check if it’s running on cluster2-node1.

This one is harder. Step one is to work out how the scheduler is deployed: it is a static Pod, found the same way as above, so move its manifest file out of the manifests directory. With that, the scheduler is gone: temporarily killed.
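A minimal sketch of the stop/restore steps (parking the manifest one directory up is my own choice):

ssh cluster2-controlplane1
cd /etc/kubernetes/manifests/
mv kube-scheduler.yaml ..      # stop: the kubelet removes the static Pod once its manifest is gone
# ... later, to start the scheduler again:
mv ../kube-scheduler.yaml .    # the kubelet recreates the static Pod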

The manual scheduling part is the advanced bit; here is the official walkthrough.

k run manual-schedule --image=httpd:2.4-alpine
# the pod stays Pending because nothing schedules it
k get pod manual-schedule -o wide
NAME              READY   STATUS    ...   NODE     NOMINATED NODE
manual-schedule   0/1     Pending   ...   <none>   <none>        

Without a scheduler the Pod will never be scheduled. Doing it manually is simple: first grab the Pod's YAML.

k get pod manual-schedule -o yaml > 9.yaml

Manually set its nodeName, forcing the node it should run on; that alone completes the manual scheduling.

As the official explanation puts it: the only thing the scheduler does is set the nodeName on a Pod declaration; how it finds the correct node to schedule onto is a far more complicated matter that takes many variables into account.

# 9.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2020-09-04T15:51:02Z"
  labels:
    run: manual-schedule
  managedFields:
...
    manager: kubectl-run
    operation: Update
    time: "2020-09-04T15:51:02Z"
  name: manual-schedule
  namespace: default
  resourceVersion: "3515"
  selfLink: /api/v1/namespaces/default/pods/manual-schedule
  uid: 8e9d2532-4779-4e63-b5af-feb82c74a935
spec:
  nodeName: cluster2-controlplane1        # add the controlplane node name
  containers:
  - image: httpd:2.4-alpine
    imagePullPolicy: IfNotPresent
    name: manual-schedule
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-nxnc7
      readOnly: true
  dnsPolicy: ClusterFirst
...

Then force-replace the resource with the edited file and the manual scheduling is done:

k -f 9.yaml replace --force

Finally, move the manifest file back to bring the scheduler up again.

Question 10 | RBAC ServiceAccount Role RoleBinding

Create a new ServiceAccount processor in Namespace project-hamster. Create a Role and RoleBinding, both named processor as well. These should allow the new SA to only create Secrets and ConfigMaps in that Namespace.

This is a fairly simple RBAC question; understanding how Roles, RoleBindings and ServiceAccounts relate is enough.

A ClusterRole|Role defines a set of permissions and where they are available, cluster-wide or in a single Namespace. A ClusterRoleBinding|RoleBinding connects a set of permissions to an account and defines where they apply, cluster-wide or in a single Namespace.

That makes four possible RBAC combinations, three of which are valid:

  1. Role + RoleBinding (available in a single Namespace, applied in a single Namespace)
  2. ClusterRole + ClusterRoleBinding (available cluster-wide, applied cluster-wide)
  3. ClusterRole + RoleBinding (available cluster-wide, applied in a single Namespace)
  4. Role + ClusterRoleBinding (impossible: available in a single Namespace, applied cluster-wide)

Here is the creation procedure:

➜ k -n project-hamster create sa processor
# check the help for create role
k -n project-hamster create role -h
# grant the verbs and resources according to the help output
k -n project-hamster create role processor \
  --verb=create \
  --resource=secret \
  --resource=configmap

Written as a YAML file it looks like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: processor
  namespace: project-hamster
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  - configmaps
  verbs:
  - create

Then create the RoleBinding to bind the SA to the Role:

k -n project-hamster create rolebinding processor \
  --role processor \
  --serviceaccount project-hamster:processor
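For reference, the equivalent RoleBinding YAML (a sketch matching the imperative command above):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: processor
  namespace: project-hamster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: processor
subjects:
- kind: ServiceAccount
  name: processor
  namespace: project-hamster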

A trick learned here: kubectl auth can-i can be used to test the SA's permissions.

k auth can-i -h # examples
➜ k -n project-hamster auth can-i create secret \
  --as system:serviceaccount:project-hamster:processor
yes

➜ k -n project-hamster auth can-i create configmap \
  --as system:serviceaccount:project-hamster:processor
yes

➜ k -n project-hamster auth can-i create pod \
  --as system:serviceaccount:project-hamster:processor
no

➜ k -n project-hamster auth can-i delete secret \
  --as system:serviceaccount:project-hamster:processor
no

➜ k -n project-hamster auth can-i get configmap \
  --as system:serviceaccount:project-hamster:processor
no

Question 11 | DaemonSet on all Nodes

Use Namespace project-tiger for the following. Create a DaemonSet named ds-important with image httpd:2.4-alpine and labels id=ds-important and uuid=18426a0b-5f59-4e10-923f-c0e078e82462. The Pods it creates should request 10 millicore cpu and 10 mebibyte memory. The Pods of that DaemonSet should run on all nodes, also controlplanes.

Create a DaemonSet carrying the id and uuid labels. There is no kubectl create daemonset, so dry-run a Deployment and adapt the output:

k -n project-tiger create deployment --image=httpd:2.4-alpine ds-important $do > 11.yaml

The full YAML is pasted below; several of these commonly used fields are worth memorizing.

# 11.yaml
apiVersion: apps/v1
kind: DaemonSet                                     # change from Deployment to Daemonset
metadata:
  creationTimestamp: null
  labels:                                           # add
    id: ds-important                                # add
    uuid: 18426a0b-5f59-4e10-923f-c0e078e82462      # add
  name: ds-important
  namespace: project-tiger                          # important
spec:
  #replicas: 1                                      # remove
  selector:
    matchLabels:
      id: ds-important                              # add
      uuid: 18426a0b-5f59-4e10-923f-c0e078e82462    # add
  #strategy: {}                                     # remove
  template:
    metadata:
      creationTimestamp: null
      labels:
        id: ds-important                            # add
        uuid: 18426a0b-5f59-4e10-923f-c0e078e82462  # add
    spec:
      containers:
      - image: httpd:2.4-alpine
        name: ds-important
        resources:
          requests:                                 # add
            cpu: 10m                                # add
            memory: 10Mi                            # add
      tolerations:                                  # add
      - effect: NoSchedule                          # add
        key: node-role.kubernetes.io/control-plane  # add
#status: {}                                         # remove

The Pods must run on all nodes, controlplanes included, so the tolerations for the control-plane taint have to be configured; that block is the important part here.

Extra notes

The difference between kubectl apply and kubectl create

kubectl apply and kubectl create are both commands for creating Kubernetes objects. The main differences between them:

Imperative vs. declarative: kubectl create is an imperative command: it tells kubectl exactly which resource or object to create. kubectl apply is declarative: it doesn't tell kubectl what to do directly; instead it compares and merges the YAML given with -f against the corresponding object in the cluster, reconciling the desired state with the actual state.

Existence check: kubectl create makes a new resource and returns an error if a resource with that name already exists; kubectl apply updates an existing resource instead of failing (and creates it if it doesn't exist yet).

Partial updates: kubectl create only works with fully specified resources; to change a single field you have to fetch the resource, modify it and resubmit. kubectl apply can update just part of a resource: define only the fields you want to change in the file.

In short, kubectl apply suits creating and updating deployed objects and makes partial updates convenient, while kubectl create suits a brand-new creation when no resource exists yet.

My take: since apply updates against the existing definition and only changes the diff between the current and desired definitions, it fits continuous and automated deployment; create is better suited to an application's first deployment or to rarely created resources.

Question 13 | Multi Containers and Pod shared Volume

Create a Pod named multi-container-playground in Namespace default with three containers, named c1, c2 and c3. There should be a volume attached to that Pod and mounted into every container, but the volume shouldn’t be persisted or shared with other Pods.

Container c1 should be of image nginx:1.17.6-alpine and have the name of the node where its Pod is running available as environment variable MY_NODE_NAME.

Container c2 should be of image busybox:1.31.1 and write the output of the date command every second in the shared volume into file date.log. You can use while true; do date >> /your/vol/path/date.log; sleep 1; done for this.

Container c3 should be of image busybox:1.31.1 and constantly send the content of file date.log from the shared volume to stdout. You can use tail -f /your/vol/path/date.log for this.

Check the logs of container c3 to confirm correct setup.

Create a Pod with three containers, each doing something different:

  1. an environment variable filled from the node name (downward API)
  2. an emptyDir volume shared between the containers
  3. custom container commands

The key parts of the YAML are pasted below; a complete Pod sketch follows the snippets.

  • Import the node name into the container as an environment variable:

      env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
    
  • An emptyDir volume mounted into the containers:

    # pod
    volumes:
    - name: vol
      emptyDir: {}
    # container
    volumeMounts:
    - name: vol
      mountPath: /vol
    
  • Define the containers' commands to implement the required behaviour:

        # c2
        command: ["sh", "-c", "while true; do date >> /vol/date.log; sleep 1; done"]
        # c3
        command: ["sh", "-c", "tail -f /vol/date.log"]
    

Question 14 | Find out Cluster Information

You’re asked to find out the following information about the cluster k8s-c1-H:

  1. How many controlplane nodes are available?
  2. How many worker nodes are available?
  3. What is the Service CIDR?
  4. Which Networking (or CNI Plugin) is configured and where is its config file?
  5. Which suffix will static pods have that run on cluster1-node1?

Tests the ability to pull basic information out of a cluster. Not hard, but a few of the terms were unfamiliar, so a quick record here.

  1. kubectl get node; the ROLES column shows the controlplane nodes.

  2. The same command shows how many worker nodes are available.

  3. The Service CIDR comes from the kube-apiserver configuration; the apiserver runs as a static Pod, so SSH to the controlplane node and check its manifest:

➜ ssh cluster1-controlplane1

➜ root@cluster1-controlplane1:~# cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep range
    - --service-cluster-ip-range=10.96.0.0/12
  4. Find out which CNI plugin is configured and where its config file lives:

    $find /etc/cni/net.d/
    /etc/cni/net.d/
    /etc/cni/net.d/10-weave.conflist
    
    $cat /etc/cni/net.d/10-weave.conflist
    {
        "cniVersion": "0.3.0",
        "name": "weave",
    ...
    
  5. The suffix is the node hostname with a leading hyphen.

    You can see this directly by listing all the Pods.

Question 15 | Cluster Event Logging

Write a command into /opt/course/15/cluster_events.sh which shows the latest events in the whole cluster, ordered by time (metadata.creationTimestamp). Use kubectl for it.

Now kill the kube-proxy Pod running on node cluster2-node1 and write the events this caused into /opt/course/15/pod_kill.log.

Finally kill the containerd container of the kube-proxy Pod on node cluster2-node1 and write the events into /opt/course/15/container_kill.log.

Do you notice differences in the events both actions caused?

Basic event-viewing commands, useful for troubleshooting the cluster.

kubectl get events -A --sort-by=.metadata.creationTimestamp
k -n kube-system delete pod kube-proxy-z64cg

# crictl is used here; it works much like the docker CLI
crictl ps | grep kube-proxy
crictl rm 1e020b43c4423

Question 16 | Namespaces and Api Resources

Write the names of all namespaced Kubernetes resources (like Pod, Secret, ConfigMap…) into /opt/course/16/resources.txt.

Find the project-* Namespace with the highest number of Roles defined in it and write its name and amount of Roles into /opt/course/16/crowded-namespace.txt.

Free points: list all namespaced resources, then find the Namespace with the most Roles.

k api-resources    # shows all
k api-resources -h # help always good
k api-resources --namespaced -o name > /opt/course/16/resources.txt

Count the Roles in a namespace:

k -n project-c13 get role --no-headers | wc -l
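Rather than checking every namespace by hand, a small loop sketch (assuming the relevant namespaces are all named project-*):

for ns in $(kubectl get ns -o name | grep project- | cut -d/ -f2); do
  echo -n "$ns: "
  kubectl -n "$ns" get role --no-headers | wc -l
done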

Question 17 | Find Container of Pod and check info

In Namespace project-tiger create a Pod named tigers-reunite of image httpd:2.4.41-alpine with labels pod=container and container=pod. Find out on which node the Pod is scheduled. Ssh into that node and find the containerd container belonging to that Pod.

Using command crictl:

  1. Write the ID of the container and the info.runtimeType into /opt/course/17/pod-container.txt
  2. Write the logs of the container into /opt/course/17/pod-container.log

As the question says, first create the Pod, then find the node it was scheduled onto.

k -n project-tiger run tigers-reunite \
  --image=httpd:2.4.41-alpine \
  --labels "pod=container,container=pod"
  
k -n project-tiger get pod -o wide

SSH to that node to get the container info; the crictl commands used here behave essentially like docker commands.

➜ ssh cluster1-node2

➜ root@cluster1-node2:~# crictl ps | grep tigers-reunite
b01edbe6f89ed    54b0995a63052    5 seconds ago    Running        tigers-reunite ...

➜ root@cluster1-node2:~# crictl inspect b01edbe6f89ed | grep runtimeType
    "runtimeType": "io.containerd.runc.v2",
    
# write the container logs into the required file
ssh cluster1-node2 'crictl logs b01edbe6f89ed' &> /opt/course/17/pod-container.log

Question 18 | Fix Kubelet

There seems to be an issue with the kubelet not running on cluster3-node1. Fix it and confirm that cluster has node cluster3-node1 available in Ready state afterwards. You should be able to schedule a Pod on cluster3-node1 afterwards.

Write the reason of the issue into /opt/course/18/reason.txt.

Find out why the kubelet on the node isn't running.

service kubelet status
➜ root@cluster3-node1:~# /usr/local/bin/kubelet
-bash: /usr/local/bin/kubelet: No such file or directory

➜ root@cluster3-node1:~# whereis kubelet
kubelet: /usr/bin/kubelet

# /opt/course/18/reason.txt
wrong path to kubelet binary specified in service config
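A sketch of the actual fix, assuming the standard kubeadm layout where the kubelet drop-in lives at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:

# change the ExecStart path from /usr/local/bin/kubelet to /usr/bin/kubelet
vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
systemctl daemon-reload
service kubelet restart
service kubelet status    # should now be active (running)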

Question 19 | Create Secret and mount into Pod

Do the following in a new Namespace secret. Create a Pod named secret-pod of image busybox:1.31.1 which should keep running for some time.

There is an existing Secret located at /opt/course/19/secret1.yaml, create it in the Namespace secret and mount it readonly into the Pod at /tmp/secret1.

Create a new Secret in Namespace secret called secret2 which should contain user=user1 and pass=1234. These entries should be available inside the Pod’s container as environment variables APP_USER and APP_PASS.

Confirm everything is working.

A quick read of the question: it's about using Secrets, one mounted read-only into the Pod and one injected as environment variables.

The Secret file:

# 19_secret1.yaml
apiVersion: v1
data:
  halt: IyEgL2Jpbi9zaAo...
kind: Secret
metadata:
  creationTimestamp: null
  name: secret1
  namespace: secret           # change

Then create both Secrets:

k -f 19_secret1.yaml create

k -n secret create secret generic secret2 --from-literal=user=user1 --from-literal=pass=1234

Use a dry run to generate the Pod YAML template:

k -n secret run secret-pod --image=busybox:1.31.1 $do -- sh -c "sleep 5d" > 19.yaml

Modify the YAML as below; there are two additions: one pulls Secret keys into environment variables, the other mounts the Secret read-only.

# 19.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: secret-pod
  name: secret-pod
  namespace: secret                       # add
spec:
  containers:
  - args:
    - sh
    - -c
    - sleep 1d
    image: busybox:1.31.1
    name: secret-pod
    resources: {}
    env:                                  # add
    - name: APP_USER                      # add
      valueFrom:                          # add
        secretKeyRef:                     # add
          name: secret2                   # add
          key: user                       # add
    - name: APP_PASS                      # add
      valueFrom:                          # add
        secretKeyRef:                     # add
          name: secret2                   # add
          key: pass                       # add
    volumeMounts:                         # add
    - name: secret1                       # add
      mountPath: /tmp/secret1             # add
      readOnly: true                      # add
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  volumes:                                # add
  - name: secret1                         # add
    secret:                               # add
      secretName: secret1                 # add
status: {}

Question 20 | Update Kubernetes Version and join cluster

Your coworker said node cluster3-node2 is running an older Kubernetes version and is not even part of the cluster. Update Kubernetes on that node to the exact version that’s running on cluster3-controlplane1. Then add this node to the cluster. Use kubeadm for this.

A node Kubernetes upgrade. kubeadm upgrade node fails with the error below; by the look of it the node isn't part of the cluster at all, so the right package versions have to be installed and the node joined manually.

couldn't create a Kubernetes client from file "/etc/kubernetes/kubelet.conf": failed to load admin kubeconfig: open /etc/kubernetes/kubelet.conf: no such file or directory
To see the stack trace of this error execute with --v=5 or higher

Install the exact matching version:

root@cluster3-node2:~# apt show kubectl -a | grep 1.26
apt install kubectl=1.26.0-00 kubelet=1.26.0-00
kubelet --version
service kubelet restart
service kubelet status
# the status output shows errors because the node hasn't joined the cluster yet

Create a new join token on the controlplane:

➜ ssh cluster3-controlplane1
➜ root@cluster3-controlplane1:~# kubeadm token create --print-join-command
kubeadm join 192.168.100.31:6443 --token rbhrjh.4o93r31o18an6dll --discovery-token-ca-cert-hash sha256:d94524f9ab1eed84417414c7def5c1608f84dbf04437d9f5f73eb6255dafdb18
➜ root@cluster3-controlplane1:~# kubeadm token list

Then run the join command on the node to add it to the cluster:

kubeadm join 192.168.100.31:6443 --token rbhrjh.4o93r31o18an6dll --discovery-token-ca-cert-hash sha256:d94524f9ab1eed84417414c7def5c1608f84dbf04437d9f5f73eb6255dafdb18

Question 21 | Create a Static Pod and Service

Create a Static Pod named my-static-pod in Namespace default on cluster3-controlplane1. It should be of image nginx:1.16-alpine and have resource requests for 10m CPU and 20Mi memory.

Then create a NodePort Service named static-pod-service which exposes that static Pod on port 80 and check if it has Endpoints and if it’s reachable through the cluster3-controlplane1 internal IP address. You can connect to the internal node IPs from your main terminal.

Creating a static Pod; the background was covered earlier. A static Pod isn't controlled through the kube API, the node's kubelet manages it directly. Creating one is simple: put the manifest into the manifests directory and it comes up.

➜ ssh cluster3-controlplane1

➜ root@cluster1-controlplane1:~# cd /etc/kubernetes/manifests/

➜ root@cluster1-controlplane1:~# kubectl run my-static-pod \
    --image=nginx:1.16-alpine \
    -o yaml --dry-run=client > my-static-pod.yaml
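For the second part, a sketch of exposing the static Pod through a NodePort Service. The mirror Pod gets the node name appended, so check its exact name first (the Pod name and the run=my-static-pod label below are assumptions based on kubectl run defaults):

# back on the main terminal
k get pod | grep my-static-pod                  # find the mirror Pod name
k expose pod my-static-pod-cluster3-controlplane1 \
  --name static-pod-service \
  --type=NodePort \
  --port 80
k get svc,ep -l run=my-static-pod               # confirm the Service has an Endpoint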

Question 22 | Check how long certificates are valid

Check how long the kube-apiserver server certificate is valid on cluster2-controlplane1. Do this with openssl or cfssl. Write the expiration date into /opt/course/22/expiration.

Also run the correct kubeadm command to list the expiration dates and confirm both methods show the same date.

Write the correct kubeadm command that would renew the apiserver server certificate into /opt/course/22/kubeadm-renew-certs.sh.

➜ ssh cluster2-controlplane1

➜ root@cluster2-controlplane1:~# find /etc/kubernetes/pki | grep apiserver

# check it with openssl
openssl x509  -noout -text -in /etc/kubernetes/pki/apiserver.crt | grep Validity -A2

Use kubeadm to check the certificate expiration dates and to renew:

➜ root@cluster2-controlplane1:~# kubeadm certs check-expiration | grep apiserver
kubeadm certs renew apiserver

Question 23 | Kubelet client/server cert info

Node cluster2-node1 has been added to the cluster using kubeadm and TLS bootstrapping.

Find the “Issuer” and “Extended Key Usage” values of the cluster2-node1:

  1. kubelet client certificate, the one used for outgoing connections to the kube-apiserver.
  2. kubelet server certificate, the one used for incoming connections from the kube-apiserver.

Write the information into file /opt/course/23/certificate-info.txt.

Compare the “Issuer” and “Extended Key Usage” fields of both certificates and make sense of these.
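A sketch of how to check this, assuming the default kubeadm kubelet certificate locations under /var/lib/kubelet/pki/:

ssh cluster2-node1

# kubelet client certificate (outgoing connections to the kube-apiserver)
openssl x509 -noout -text -in /var/lib/kubelet/pki/kubelet-client-current.pem | grep -A1 -E "Issuer|Extended Key Usage"

# kubelet server certificate (incoming connections from the kube-apiserver)
openssl x509 -noout -text -in /var/lib/kubelet/pki/kubelet.crt | grep -A1 -E "Issuer|Extended Key Usage"

The client certificate should list TLS Web Client Authentication and the server certificate TLS Web Server Authentication under Extended Key Usage.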

Question 24 | NetworkPolicy

There was a security incident where an intruder was able to access the whole cluster from a single hacked backend Pod.

To prevent this create a NetworkPolicy called np-backend in Namespace project-snake. It should allow the backend-* Pods only to:

  • connect to db1-* Pods on port 1111
  • connect to db2-* Pods on port 2222

Use the app label of Pods in your policy.

After implementation, connections from backend-* Pods to vault-* Pods on port 3333 should for example no longer work.

Configure a NetworkPolicy to restrict the backend Pods' egress to the allowed targets.

# first look at the current pods and their app labels
➜ k -n project-snake get pod -L app
# get the current pod IPs
➜ k -n project-snake get pod -o wide

### test access with exec; everything is currently reachable
➜ k -n project-snake exec backend-0 -- curl -s 10.44.0.25:1111
database one
➜ k -n project-snake exec backend-0 -- curl -s 10.44.0.23:2222
database two
➜ k -n project-snake exec backend-0 -- curl -s 10.44.0.22:3333
vault secret storage

Write the NetworkPolicy; the configuration is listed below. Since it targets the backend Pods, it restricts their egress.

# 24_np.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: np-backend
  namespace: project-snake
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Egress                    # policy is only about Egress
  egress:
    -                           # first rule
      to:                           # first condition "to"
      - podSelector:
          matchLabels:
            app: db1
      ports:                        # second condition "port"
      - protocol: TCP
        port: 1111
    -                           # second rule
      to:                           # first condition "to"
      - podSelector:
          matchLabels:
            app: db2
      ports:                        # second condition "port"
      - protocol: TCP
        port: 2222

Question 25 | Etcd Snapshot Save and Restore

Make a backup of etcd running on cluster3-controlplane1 and save it on the controlplane node at /tmp/etcd-backup.db.

Then create a Pod of your kind in the cluster.

Finally restore the backup, confirm the cluster is still working and that the created Pod is no longer with us.

Etcd backup and restore. Etcd may be installed as a host service or run inside the cluster as a static Pod, so check both the systemd directories and the manifests directory, find the configuration, and pull out the key details: certificate paths, endpoints and so on.

➜ root@cluster3-controlplane1:~# cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep etcd
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
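A sketch of the snapshot save and restore, assuming etcd runs as a static Pod and using the certificate paths under /etc/kubernetes/pki/etcd/ (the apiserver-etcd-client certs found above also work; the restore data dir is my own choice):

ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key

# restore into a fresh data dir (no certs needed, it is a local operation)
ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup.db \
  --data-dir /var/lib/etcd-backup

# then edit /etc/kubernetes/manifests/etcd.yaml and change the hostPath volume
# from /var/lib/etcd to /var/lib/etcd-backup; the kubelet restarts etcd with the restored data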

TIPS

Components

  • Understanding Kubernetes components and being able to fix and investigate clusters: https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster
  • Know advanced scheduling: https://kubernetes.io/docs/concepts/scheduling/kube-scheduler
  • When you have to fix a component (like kubelet) in one cluster, just check how it’s setup on another node in the same or even another cluster. You can copy config files over etc
  • If you like you can look at Kubernetes The Hard Way once. But it’s NOT necessary to do, the CKA is not that complex. But KTHW helps understanding the concepts
  • You should install your own cluster using kubeadm (one controlplane, one worker) in a VM or using a cloud provider and investigate the components
  • Know how to use Kubeadm to for example add nodes to a cluster
  • Know how to create an Ingress resources
  • Know how to snapshot/restore ETCD from another machine

CKA Preparation

Read the Curriculum

https://github.com/cncf/curriculum

Read the Handbook

https://docs.linuxfoundation.org/tc-docs/certification/lf-candidate-handbook

Read the important tips

https://docs.linuxfoundation.org/tc-docs/certification/tips-cka-and-ckad

Read the FAQ

https://docs.linuxfoundation.org/tc-docs/certification/faq-cka-ckad