pod调度将 Pod 指派给节点
人生的哲理 人气:0一.系统环境
服务器版本 | docker软件版本 | Kubernetes(k8s)集群版本 | CPU架构 |
---|---|---|---|
CentOS Linux release 7.4.1708 (Core) | Docker version 20.10.12 | v1.21.9 | x86_64 |
Kubernetes集群架构:k8scloude1作为master节点,k8scloude2,k8scloude3作为worker节点
服务器 | 操作系统版本 | CPU架构 | 进程 | 功能描述 |
---|---|---|---|---|
k8scloude1/192.168.110.130 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kube-apiserver,etcd,kube-scheduler,kube-controller-manager,kubelet,kube-proxy,coredns,calico | k8s master节点 |
k8scloude2/192.168.110.129 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kubelet,kube-proxy,calico | k8s worker节点 |
k8scloude3/192.168.110.128 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kubelet,kube-proxy,calico | k8s worker节点 |
二.前言
本文介绍pod的调度,即如何让pod运行在Kubernetes集群的指定节点。
进行pod的调度的前提是已经有一套可以正常运行的Kubernetes集群,关于Kubernetes(k8s)集群的安装部署,可以查看博客《Centos7 安装部署Kubernetes(k8s)集群》
三.pod的调度
3.1 pod的调度概述
你可以约束一个 Pod 以便 限制 其只能在特定的节点上运行, 或优先在特定的节点上运行。 有几种方法可以实现这点,推荐的方法都是用 标签选择算符来进行选择。 通常这样的约束不是必须的,因为调度器将自动进行合理的放置(比如,将 Pod 分散到节点上, 而不是将 Pod 放置在可用资源不足的节点上等等)。但在某些情况下,你可能需要进一步控制 Pod 被部署到哪个节点。例如,确保 Pod 最终落在连接了 SSD 的机器上, 或者将来自两个不同的服务且有大量通信的 Pods 被放置在同一个可用区。
你可以使用下列方法中的任何一种来选择 Kubernetes 对特定 Pod 的调度:
- 与节点标签匹配的 nodeSelector
- 亲和性与反亲和性
- nodeName 字段
- Pod 拓扑分布约束
3.2 pod自动调度
如果不手动指定pod运行在哪个节点上,k8s会自动调度pod的,k8s自动调度pod在哪个节点上运行考虑的因素有:
- 待调度的pod列表
- 可用的node列表
- 调度算法:主机过滤,主机打分
3.2.1 创建3个主机端口为80的pod
查看hostPort字段的解释,hostPort字段表示把pod的端口映射到节点,即在节点上公开 Pod 的端口。
#主机端口映射:hostPort: 80 [root@k8scloude1 pod]# kubectl explain pods.spec.containers.ports.hostPort KIND: Pod VERSION: v1 FIELD: hostPort <integer> DESCRIPTION: Number of port to expose on the host. If specified, this must be a valid port number, 0 < x < 65536. If HostNetwork is specified, this must match ContainerPort. Most containers do not need this.
创建第一个pod,hostPort: 80表示把容器的80端口映射到节点的80端口
[root@k8scloude1 pod]# vim schedulepod.yaml #kind: Pod表示资源类型为Pod labels指定pod标签 metadata下面的name指定pod名字 containers下面全是容器的定义 #image指定镜像名字 imagePullPolicy指定镜像下载策略 containers下面的name指定容器名 #resources指定容器资源(CPU,内存等) env指定容器里的环境变量 dnsPolicy指定DNS策略 #restartPolicy容器重启策略 ports指定容器端口 containerPort容器端口 hostPort节点上的端口 [root@k8scloude1 pod]# cat schedulepod.yaml apiVersion: v1 kind: Pod metadata: creationTimestamp: null labels: run: pod name: pod namespace: pod spec: containers: - image: nginx imagePullPolicy: IfNotPresent name: pod resources: {} ports: - name: http containerPort: 80 protocol: TCP hostPort: 80 dnsPolicy: ClusterFirst restartPolicy: Always status: {} [root@k8scloude1 pod]# kubectl apply -f schedulepod.yaml pod/pod created [root@k8scloude1 pod]# kubectl get pods NAME READY STATUS RESTARTS AGE pod 1/1 Running 0 6s
可以看到pod创建成功。
接下来创建第二个pod,hostPort: 80表示把容器的80端口映射到节点的80端口,两个pod只有pod名字不一样。
[root@k8scloude1 pod]# cp schedulepod.yaml schedulepod1.yaml [root@k8scloude1 pod]# vim schedulepod1.yaml [root@k8scloude1 pod]# cat schedulepod1.yaml apiVersion: v1 kind: Pod metadata: creationTimestamp: null labels: run: pod1 name: pod1 namespace: pod spec: containers: - image: nginx imagePullPolicy: IfNotPresent name: pod1 resources: {} ports: - name: http containerPort: 80 protocol: TCP hostPort: 80 dnsPolicy: ClusterFirst restartPolicy: Always status: {} [root@k8scloude1 pod]# kubectl apply -f schedulepod1.yaml pod/pod1 created [root@k8scloude1 pod]# kubectl get pods NAME READY STATUS RESTARTS AGE pod 1/1 Running 0 11m pod1 1/1 Running 0 5s
第二个pod创建成功,现在创建第三个pod。
开篇我们已经介绍过集群架构,Kubernetes集群架构:k8scloude1作为master节点,k8scloude2,k8scloude3作为worker节点
,k8s集群只有2个worker节点,master节点默认不运行应用pod,主机端口80已经被占用两台worker节点全部占用,所以pod2无法运行。
[root@k8scloude1 pod]# sed 's/pod1/pod2/' schedulepod1.yaml | kubectl apply -f - pod/pod2 created #主机端口80已经被占用两台worker节点全部占用,pod2无法运行 [root@k8scloude1 pod]# kubectl get pods NAME READY STATUS RESTARTS AGE pod 1/1 Running 0 16m pod1 1/1 Running 0 5m28s pod2 0/1 Pending 0 5s
观察pod在k8s集群的分布情况,NODE
显示pod运行在哪个节点
[root@k8scloude1 pod]# kubectl get pods NAME READY STATUS RESTARTS AGE pod 1/1 Running 0 18m pod1 1/1 Running 0 7m28s [root@k8scloude1 pod]# kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES pod 1/1 Running 0 29m 10.244.251.208 k8scloude3 <none> <none> pod1 1/1 Running 0 18m 10.244.112.156 k8scloude2 <none> <none>
删除pod
[root@k8scloude1 pod]# kubectl delete pod pod2 pod "pod2" deleted [root@k8scloude1 pod]# kubectl delete pod pod1 pod pod "pod1" deleted pod "pod" deleted
上面三个pod都是k8s自动调度的,下面我们手动指定pod运行在哪个节点。
3.3 使用nodeName 字段指定pod运行在哪个节点
使用nodeName 字段指定pod运行在哪个节点,这是一种比较直接的方式,nodeName 是 Pod 规约中的一个字段。如果 nodeName 字段不为空,调度器会忽略该 Pod, 而指定节点上的 kubelet 会尝试将 Pod 放到该节点上。
使用 nodeName 规则的优先级会高于使用 nodeSelector 或亲和性与非亲和性的规则。
使用 nodeName 来选择节点的方式有一些局限性:
- 如果所指代的节点不存在,则 Pod 无法运行,而且在某些情况下可能会被自动删除。
- 如果所指代的节点无法提供用来运行 Pod 所需的资源,Pod 会失败, 而其失败原因中会给出是否因为内存或 CPU 不足而造成无法运行。
- 在云环境中的节点名称并不总是可预测的,也不总是稳定的。
创建pod,nodeName: k8scloude3表示pod要运行在名为k8scloude3
的节点
[root@k8scloude1 pod]# vim schedulepod2.yaml #kind: Pod表示资源类型为Pod labels指定pod标签 metadata下面的name指定pod名字 containers下面全是容器的定义 #image指定镜像名字 imagePullPolicy指定镜像下载策略 containers下面的name指定容器名 #resources指定容器资源(CPU,内存等) env指定容器里的环境变量 dnsPolicy指定DNS策略 #restartPolicy容器重启策略 ports指定容器端口 containerPort容器端口 hostPort节点上的端口 #nodeName: k8scloude3指定pod在k8scloude3上运行 [root@k8scloude1 pod]# cat schedulepod2.yaml apiVersion: v1 kind: Pod metadata: creationTimestamp: null labels: run: pod1 name: pod1 namespace: pod spec: nodeName: k8scloude3 containers: - image: nginx imagePullPolicy: IfNotPresent name: pod1 resources: {} ports: - name: http containerPort: 80 protocol: TCP hostPort: 80 dnsPolicy: ClusterFirst restartPolicy: Always status: {} [root@k8scloude1 pod]# kubectl apply -f schedulepod2.yaml pod/pod1 created
可以看到pod运行在k8scloude3节点
[root@k8scloude1 pod]# kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES pod1 1/1 Running 0 7s 10.244.251.209 k8scloude3 <none> <none> [root@k8scloude1 pod]# kubectl delete pod pod1 --force warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. [root@k8scloude1 pod]# kubectl get pods No resources found in pod namespace.
创建pod,nodeName: k8scloude1让pod运行在k8scloude1节点
[root@k8scloude1 pod]# vim schedulepod3.yaml #kind: Pod表示资源类型为Pod labels指定pod标签 metadata下面的name指定pod名字 containers下面全是容器的定义 #image指定镜像名字 imagePullPolicy指定镜像下载策略 containers下面的name指定容器名 #resources指定容器资源(CPU,内存等) env指定容器里的环境变量 dnsPolicy指定DNS策略 #restartPolicy容器重启策略 ports指定容器端口 containerPort容器端口 hostPort节点上的端口 #nodeName: k8scloude1让pod运行在k8scloude1节点 [root@k8scloude1 pod]# cat schedulepod3.yaml apiVersion: v1 kind: Pod metadata: creationTimestamp: null labels: run: pod1 name: pod1 namespace: pod spec: nodeName: k8scloude1 containers: - image: nginx imagePullPolicy: IfNotPresent name: pod1 resources: {} ports: - name: http containerPort: 80 protocol: TCP hostPort: 80 dnsPolicy: ClusterFirst restartPolicy: Always status: {} [root@k8scloude1 pod]# kubectl apply -f schedulepod3.yaml pod/pod1 created
可以看到pod运行在k8scloude1,注意k8scloude1是master节点,master节点一般不运行应用pod,并且k8scloude1有污点,一般来说,pod是不运行在有污点的主机上的,如果强制调度上去的话,pod的状态应该是pending,但是通过nodeName可以把一个pod调度到有污点的主机上正常运行的,比如nodeName指定pod运行在master上
[root@k8scloude1 pod]# kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES pod1 1/1 Running 0 47s 10.244.158.81 k8scloude1 <none> <none> [root@k8scloude1 pod]# kubectl delete pod pod1 --force warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. pod "pod1" force deleted
3.4 使用节点标签nodeSelector指定pod运行在哪个节点
与很多其他 Kubernetes 对象类似,节点也有标签。 你可以手动地添加标签。 Kubernetes 也会为集群中所有节点添加一些标准的标签。
通过为节点添加标签,你可以准备让 Pod 调度到特定节点或节点组上。 你可以使用这个功能来确保特定的 Pod 只能运行在具有一定隔离性,安全性或监管属性的节点上。
nodeSelector 是节点选择约束的最简单推荐形式。你可以将 nodeSelector 字段添加到 Pod 的规约中设置你希望目标节点所具有的节点标签。 Kubernetes 只会将 Pod 调度到拥有你所指定的每个标签的节点上。nodeSelector 提供了一种最简单的方法来将 Pod 约束到具有特定标签的节点上。
3.4.1 查看标签
查看节点node的标签,标签的格式:键值对:xxxx/yyyy.aaaa=456123,xxxx1/yyyy1.aaaa=456123,--show-labels参数显示标签
[root@k8scloude1 pod]# kubectl get nodes --show-labels NAME STATUS ROLES AGE VERSION LABELS k8scloude1 Ready control-plane,master 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers= k8scloude2 Ready <none> 7d v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux k8scloude3 Ready <none> 7d v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux
查看namespace的标签
[root@k8scloude1 pod]# kubectl get ns --show-labels NAME STATUS AGE LABELS default Active 7d1h kubernetes.io/metadata.name=default kube-node-lease Active 7d1h kubernetes.io/metadata.name=kube-node-lease kube-public Active 7d1h kubernetes.io/metadata.name=kube-public kube-system Active 7d1h kubernetes.io/metadata.name=kube-system ns1 Active 6d5h kubernetes.io/metadata.name=ns1 ns2 Active 6d5h kubernetes.io/metadata.name=ns2 pod Active 4d2h kubernetes.io/metadata.name=pod
查看pod的标签
[root@k8scloude1 pod]# kubectl get pod -A --show-labels NAMESPACE NAME READY STATUS RESTARTS AGE LABELS kube-system calico-kube-controllers-6b9fbfff44-4jzkj 1/1 Running 12 7d k8s-app=calico-kube-controllers,pod-template-hash=6b9fbfff44 kube-system calico-node-bdlgm 1/1 Running 7 7d controller-revision-hash=6b57d9cd54,k8s-app=calico-node,pod-template-generation=1 kube-system calico-node-hx8bk 1/1 Running 7 7d controller-revision-hash=6b57d9cd54,k8s-app=calico-node,pod-template-generation=1 kube-system calico-node-nsbfs 1/1 Running 7 7d controller-revision-hash=6b57d9cd54,k8s-app=calico-node,pod-template-generation=1 kube-system coredns-545d6fc579-7wm95 1/1 Running 7 7d1h k8s-app=kube-dns,pod-template-hash=545d6fc579 kube-system coredns-545d6fc579-87q8j 1/1 Running 7 7d1h k8s-app=kube-dns,pod-template-hash=545d6fc579 kube-system etcd-k8scloude1 1/1 Running 7 7d1h component=etcd,tier=control-plane kube-system kube-apiserver-k8scloude1 1/1 Running 11 7d1h component=kube-apiserver,tier=control-plane kube-system kube-controller-manager-k8scloude1 1/1 Running 7 7d1h component=kube-controller-manager,tier=control-plane kube-system kube-proxy-599xh 1/1 Running 7 7d1h controller-revision-hash=6795549d44,k8s-app=kube-proxy,pod-template-generation=1 kube-system kube-proxy-lpj8z 1/1 Running 7 7d1h controller-revision-hash=6795549d44,k8s-app=kube-proxy,pod-template-generation=1 kube-system kube-proxy-zxlk9 1/1 Running 7 7d1h controller-revision-hash=6795549d44,k8s-app=kube-proxy,pod-template-generation=1 kube-system kube-scheduler-k8scloude1 1/1 Running 7 7d1h component=kube-scheduler,tier=control-plane kube-system metrics-server-bcfb98c76-k5dmj 1/1 Running 6 6d5h k8s-app=metrics-server,pod-template-hash=bcfb98c76
3.4.2 创建标签
以node-role.kubernetes.io/control-plane= 标签为例,键是node-role.kubernetes.io/control-plane,值为空。
创建标签的语法:kubectl label 对象类型 对象名 键=值
给k8scloude2节点设置标签
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2 node/k8scloude2 labeled [root@k8scloude1 pod]# kubectl get nodes --show-labels NAME STATUS ROLES AGE VERSION LABELS k8scloude1 Ready control-plane,master 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers= k8scloude2 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,k8snodename=k8scloude2,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux k8scloude3 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux
k8scloude2节点删除标签
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename- node/k8scloude2 labeled [root@k8scloude1 pod]# kubectl get nodes --show-labels NAME STATUS ROLES AGE VERSION LABELS k8scloude1 Ready control-plane,master 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers= k8scloude2 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux k8scloude3 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux
列出含有标签k8snodename=k8scloude2的节点
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2 #列出含有标签k8snodename=k8scloude2的节点 [root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2 NAME STATUS ROLES AGE VERSION k8scloude2 Ready <none> 7d1h v1.21.0 [root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename- node/k8scloude2 labeled
对所有节点设置标签
[root@k8scloude1 pod]# kubectl label nodes --all k8snodename=cloude node/k8scloude1 labeled node/k8scloude2 labeled node/k8scloude3 labeled
列出含有标签k8snodename=cloude的节点
#列出含有标签k8snodename=cloude的节点 [root@k8scloude1 pod]# kubectl get nodes -l k8snodename=cloude NAME STATUS ROLES AGE VERSION k8scloude1 Ready control-plane,master 7d1h v1.21.0 k8scloude2 Ready <none> 7d1h v1.21.0 k8scloude3 Ready <none> 7d1h v1.21.0 #删除标签 [root@k8scloude1 pod]# kubectl label nodes --all k8snodename- node/k8scloude1 labeled node/k8scloude2 labeled node/k8scloude3 labeled [root@k8scloude1 pod]# kubectl get nodes -l k8snodename=cloude No resources found
--overwrite参数,标签的覆盖
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2 node/k8scloude2 labeled #标签的覆盖 [root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude error: 'k8snodename' already has a value (k8scloude2), and --overwrite is false #--overwrite参数,标签的覆盖 [root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude --overwrite node/k8scloude2 labeled [root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2 No resources found [root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude NAME STATUS ROLES AGE VERSION k8scloude2 Ready <none> 7d1h v1.21.0 [root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename- node/k8scloude2 labeled
Tips:如果不想在k8scloude1的ROLES里看到control-plane,则可以通过取消标签达到目的:kubectl label nodes k8scloude1 node-role.kubernetes.io/control-plane- 进行取消标签
[root@k8scloude1 pod]# kubectl get nodes --show-labels NAME STATUS ROLES AGE VERSION LABELS k8scloude1 Ready control-plane,master 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers= k8scloude2 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux k8scloude3 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux [root@k8scloude1 pod]# kubectl label nodes k8scloude1 node-role.kubernetes.io/control-plane-
3.4.3 通过标签控制pod在哪个节点运行
给k8scloude2节点打上标签k8snodename=k8scloude2
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2 node/k8scloude2 labeled [root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2 NAME STATUS ROLES AGE VERSION k8scloude2 Ready <none> 7d1h v1.21.0 [root@k8scloude1 pod]# kubectl get pods No resources found in pod namespace.
创建pod,nodeSelector:k8snodename: k8scloude2 指定pod运行在标签为k8snodename=k8scloude2的节点上
[root@k8scloude1 pod]# vim schedulepod4.yaml [root@k8scloude1 pod]# cat schedulepod4.yaml apiVersion: v1 kind: Pod metadata: creationTimestamp: null labels: run: pod1 name: pod1 namespace: pod spec: nodeSelector: k8snodename: k8scloude2 containers: - image: nginx imagePullPolicy: IfNotPresent name: pod1 resources: {} ports: - name: http containerPort: 80 protocol: TCP hostPort: 80 dnsPolicy: ClusterFirst restartPolicy: Always status: {} [root@k8scloude1 pod]# kubectl apply -f schedulepod4.yaml pod/pod1 created
可以看到pod运行在k8scloude2节点
[root@k8scloude1 pod]# kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES pod1 1/1 Running 0 21s 10.244.112.158 k8scloude2 <none> <none>
删除pod,删除标签
[root@k8scloude1 pod]# kubectl get pod --show-labels NAME READY STATUS RESTARTS AGE LABELS pod1 1/1 Running 0 32m run=pod1 [root@k8scloude1 pod]# kubectl delete pod pod1 --force warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. pod "pod1" force deleted [root@k8scloude1 pod]# kubectl get pod --show-labels No resources found in pod namespace. [root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename- node/k8scloude2 labeled [root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2 No resources found [root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude No resources found
注意:如果两台主机的标签是一致的,那么通过在这两台机器上进行打分,哪个机器分高,pod就运行在哪个pod上
给k8s集群的master节点打标签
[root@k8scloude1 pod]# kubectl label nodes k8scloude1 k8snodename=k8scloude1 node/k8scloude1 labeled [root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude1 NAME STATUS ROLES AGE VERSION k8scloude1 Ready control-plane,master 7d2h v1.21.0
创建pod,nodeSelector:k8snodename: k8scloude1 指定pod运行在标签为k8snodename=k8scloude1的节点上
[root@k8scloude1 pod]# vim schedulepod5.yaml [root@k8scloude1 pod]# cat schedulepod5.yaml apiVersion: v1 kind: Pod metadata: creationTimestamp: null labels: run: pod1 name: pod1 namespace: pod spec: nodeSelector: k8snodename: k8scloude1 containers: - image: nginx imagePullPolicy: IfNotPresent name: pod1 resources: {} ports: - name: http containerPort: 80 protocol: TCP hostPort: 80 dnsPolicy: ClusterFirst restartPolicy: Always status: {} [root@k8scloude1 pod]# kubectl apply -f schedulepod5.yaml pod/pod1 created
因为k8scloude1上有污点,所以pod不能运行在k8scloude1上,pod状态为Pending
[root@k8scloude1 pod]# kubectl get pod NAME READY STATUS RESTARTS AGE pod1 0/1 Pending 0 9s
删除pod,删除标签
[root@k8scloude1 pod]# kubectl delete pod pod1 pod "pod1" deleted [root@k8scloude1 pod]# kubectl get pod No resources found in pod namespace. [root@k8scloude1 pod]# kubectl label nodes k8scloude1 k8snodename- node/k8scloude1 labeled [root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude1 No resources found
3.5 使用亲和性与反亲和性调度pod
nodeSelector 提供了一种最简单的方法来将 Pod 约束到具有特定标签的节点上。 亲和性和反亲和性扩展了你可以定义的约束类型。使用亲和性与反亲和性的一些好处有:
- 亲和性、反亲和性语言的表达能力更强。nodeSelector 只能选择拥有所有指定标签的节点。 亲和性、反亲和性为你提供对选择逻辑的更强控制能力。
- 你可以标明某规则是“软需求”或者“偏好”,这样调度器在无法找到匹配节点时仍然调度该 Pod。
- 你可以使用节点上(或其他拓扑域中)运行的其他 Pod 的标签来实施调度约束, 而不是只能使用节点本身的标签。这个能力让你能够定义规则允许哪些 Pod 可以被放置在一起。
亲和性功能由两种类型的亲和性组成:
- 节点亲和性功能类似于 nodeSelector 字段,但它的表达能力更强,并且允许你指定软规则。
- Pod 间亲和性/反亲和性允许你根据其他 Pod 的标签来约束 Pod。
节点亲和性概念上类似于 nodeSelector, 它使你可以根据节点上的标签来约束 Pod 可以调度到哪些节点上。 节点亲和性有两种:
- requiredDuringSchedulingIgnoredDuringExecution: 调度器只有在规则被满足的时候才能执行调度。此功能类似于 nodeSelector, 但其语法表达能力更强。
- preferredDuringSchedulingIgnoredDuringExecution: 调度器会尝试寻找满足对应规则的节点。如果找不到匹配的节点,调度器仍然会调度该 Pod。
在上述类型中,IgnoredDuringExecution 意味着如果节点标签在 Kubernetes 调度 Pod 后发生了变更,Pod 仍将继续运行。
你可以使用 Pod 规约中的 .spec.affinity.nodeAffinity 字段来设置节点亲和性。
查看nodeAffinity字段解释
[root@k8scloude1 pod]# kubectl explain pods.spec.affinity.nodeAffinity KIND: Pod VERSION: v1 RESOURCE: nodeAffinity <Object> DESCRIPTION: Describes node affinity scheduling rules for the pod. Node affinity is a group of node affinity scheduling rules. FIELDS: #软策略 preferredDuringSchedulingIgnoredDuringExecution <[]Object> The scheduler will prefer to schedule pods to nodes that satisfy the affinity expressions specified by this field, but it may choose a node that violates one or more of the expressions. The node that is most preferred is the one with the greatest sum of weights, i.e. for each node that meets all of the scheduling requirements (resource request, requiredDuringScheduling affinity expressions, etc.), compute a sum by iterating through the elements of this field and adding "weight" to the sum if the node matches the corresponding matchExpressions; the node(s) with the highest sum are the most preferred. #硬策略 requiredDuringSchedulingIgnoredDuringExecution <Object> If the affinity requirements specified by this field are not met at scheduling time, the pod will not be scheduled onto the node. If the affinity requirements specified by this field cease to be met at some point during pod execution (e.g. due to an update), the system may or may not try to eventually evict the pod from its node.
3.5.1 使用硬策略requiredDuringSchedulingIgnoredDuringExecution
创建pod,requiredDuringSchedulingIgnoredDuringExecution参数表示:节点必须包含一个键名为 kubernetes.io/hostname
的标签, 并且该标签的取值必须为 k8scloude2
或 k8scloude3
。
你可以使用 operator 字段来为 Kubernetes 设置在解释规则时要使用的逻辑操作符。 你可以使用 In、NotIn、Exists、DoesNotExist、Gt 和 Lt 之一作为操作符。NotIn 和 DoesNotExist 可用来实现节点反亲和性行为。 你也可以使用节点污点 将 Pod 从特定节点上驱逐。
注意:
- 如果你同时指定了 nodeSelector 和 nodeAffinity,两者 必须都要满足, 才能将 Pod 调度到候选节点上。
- 如果你指定了多个与 nodeAffinity 类型关联的 nodeSelectorTerms, 只要其中一个 nodeSelectorTerms 满足的话,Pod 就可以被调度到节点上。
- 如果你指定了多个与同一 nodeSelectorTerms 关联的 matchExpressions, 则只有当所有 matchExpressions 都满足时 Pod 才可以被调度到节点上。
[root@k8scloude1 pod]# vim requiredDuringSchedule.yaml #硬策略 [root@k8scloude1 pod]# cat requiredDuringSchedule.yaml apiVersion: v1 kind: Pod metadata: creationTimestamp: null labels: run: pod1 name: pod1 namespace: pod spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: kubernetes.io/hostname operator: In values: - k8scloude2 - k8scloude3 containers: - image: nginx imagePullPolicy: IfNotPresent name: pod1 resources: {} ports: - name: http containerPort: 80 protocol: TCP hostPort: 80 dnsPolicy: ClusterFirst restartPolicy: Always status: {} [root@k8scloude1 pod]# kubectl apply -f requiredDuringSchedule.yaml pod/pod1 created
可以看到pod运行在k8scloude3节点
[root@k8scloude1 pod]# kubectl get pods NAME READY STATUS RESTARTS AGE pod1 1/1 Running 0 6s [root@k8scloude1 pod]# kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES pod1 1/1 Running 0 10s 10.244.251.212 k8scloude3 <none> <none> [root@k8scloude1 pod]# kubectl delete pod pod1 --force warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. pod "pod1" force deleted
创建pod,requiredDuringSchedulingIgnoredDuringExecution参数表示:节点必须包含一个键名为 kubernetes.io/hostname
的标签, 并且该标签的取值必须为 k8scloude4
或 k8scloude5
。
[root@k8scloude1 pod]# vim requiredDuringSchedule1.yaml [root@k8scloude1 pod]# cat requiredDuringSchedule1.yaml apiVersion: v1 kind: Pod metadata: creationTimestamp: null labels: run: pod1 name: pod1 namespace: pod spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: kubernetes.io/hostname operator: In values: - k8scloude4 - k8scloude5 containers: - image: nginx imagePullPolicy: IfNotPresent name: pod1 resources: {} ports: - name: http containerPort: 80 protocol: TCP hostPort: 80 dnsPolicy: ClusterFirst restartPolicy: Always status: {} [root@k8scloude1 pod]# kubectl apply -f requiredDuringSchedule1.yaml pod/pod1 created
由于requiredDuringSchedulingIgnoredDuringExecution是硬策略,k8scloude4,k8scloude5不满足条件,所以pod创建失败
[root@k8scloude1 pod]# kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES pod1 0/1 Pending 0 7s <none> <none> <none> <none> [root@k8scloude1 pod]# kubectl delete pod pod1 --force warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. pod "pod1" force deleted
3.5.2 使用软策略preferredDuringSchedulingIgnoredDuringExecution
给节点打标签
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 xx=72 node/k8scloude2 labeled [root@k8scloude1 pod]# kubectl label nodes k8scloude3 xx=59 node/k8scloude3 labeled
创建pod,preferredDuringSchedulingIgnoredDuringExecution参数表示:节点最好具有一个键名为 xx
且取值大于 60
的标签。
[root@k8scloude1 pod]# vim preferredDuringSchedule.yaml [root@k8scloude1 pod]# cat preferredDuringSchedule.yaml apiVersion: v1 kind: Pod metadata: creationTimestamp: null labels: run: pod1 name: pod1 namespace: pod spec: affinity: nodeAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 2 preference: matchExpressions: - key: xx operator: Gt values: - "60" containers: - image: nginx imagePullPolicy: IfNotPresent name: pod1 resources: {} ports: - name: http containerPort: 80 protocol: TCP hostPort: 80 dnsPolicy: ClusterFirst restartPolicy: Always status: {} [root@k8scloude1 pod]# kubectl apply -f preferredDuringSchedule.yaml pod/pod1 created
可以看到pod运行在k8scloude2,因为k8scloude2标签为 xx=72,72大于60
[root@k8scloude1 pod]# kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES pod1 1/1 Running 0 13s 10.244.112.159 k8scloude2 <none> <none> [root@k8scloude1 pod]# kubectl delete pod pod1 --force warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. pod "pod1" force deleted
创建pod,preferredDuringSchedulingIgnoredDuringExecution参数表示:节点最好具有一个键名为 xx
且取值大于 600
的标签。
[root@k8scloude1 pod]# vim preferredDuringSchedule1.yaml [root@k8scloude1 pod]# cat preferredDuringSchedule1.yaml apiVersion: v1 kind: Pod metadata: creationTimestamp: null labels: run: pod1 name: pod1 namespace: pod spec: affinity: nodeAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 2 preference: matchExpressions: - key: xx operator: Gt values: - "600" containers: - image: nginx imagePullPolicy: IfNotPresent name: pod1 resources: {} ports: - name: http containerPort: 80 protocol: TCP hostPort: 80 dnsPolicy: ClusterFirst restartPolicy: Always status: {} [root@k8scloude1 pod]# kubectl apply -f preferredDuringSchedule1.yaml pod/pod1 created
因为preferredDuringSchedulingIgnoredDuringExecution是软策略,尽管k8scloude2,k8scloude3都不满足xx>600,但是还是能成功创建pod
[root@k8scloude1 pod]# kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES pod1 1/1 Running 0 7s 10.244.251.213 k8scloude3 <none> <none> [root@k8scloude1 pod]# kubectl delete pod pod1 --force warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. pod "pod1" force deleted
3.5.3 节点亲和性权重
你可以为 preferredDuringSchedulingIgnoredDuringExecution
亲和性类型的每个实例设置 weight 字段,其取值范围是 1 到 100。 当调度器找到能够满足 Pod 的其他调度请求的节点时,调度器会遍历节点满足的所有的偏好性规则, 并将对应表达式的 weight 值加和。最终的加和值会添加到该节点的其他优先级函数的评分之上。 在调度器为 Pod 作出调度决定时,总分最高的节点的优先级也最高。
给节点打标签
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 yy=59 node/k8scloude2 labeled [root@k8scloude1 pod]# kubectl label nodes k8scloude3 yy=72 node/k8scloude3 labeled
创建pod,preferredDuringSchedulingIgnoredDuringExecution指定了2条软策略,但是权重不一样:weight: 2 和 weight: 10
[root@k8scloude1 pod]# vim preferredDuringSchedule2.yaml [root@k8scloude1 pod]# cat preferredDuringSchedule2.yaml apiVersion: v1 kind: Pod metadata: creationTimestamp: null labels: run: pod1 name: pod1 namespace: pod spec: affinity: nodeAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 2 preference: matchExpressions: - key: xx operator: Gt values: - "60" - weight: 10 preference: matchExpressions: - key: yy operator: Gt values: - "60" containers: - image: nginx imagePullPolicy: IfNotPresent name: pod1 resources: {} ports: - name: http containerPort: 80 protocol: TCP hostPort: 80 dnsPolicy: ClusterFirst restartPolicy: Always status: {} [root@k8scloude1 pod]# kubectl apply -f preferredDuringSchedule2.yaml pod/pod1 created
存在两个候选节点,因为yy>60这条规则的weight权重大,所以pod运行在k8scloude3
[root@k8scloude1 pod]# kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES pod1 1/1 Running 0 10s 10.244.251.214 k8scloude3 <none> <none> [root@k8scloude1 pod]# kubectl delete pod pod1 --force warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. pod "pod1" force deleted
3.6 Pod 拓扑分布约束
你可以使用 拓扑分布约束(Topology Spread Constraints) 来控制 Pod 在集群内故障域之间的分布, 故障域的示例有区域(Region)、可用区(Zone)、节点和其他用户自定义的拓扑域。 这样做有助于提升性能、实现高可用或提升资源利用率。
加载全部内容