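Three Core K8s Controllers
In K8s, the controller you choose determines how an application's Pods behave. Pick the wrong one in production and you get, at best, Pods that keep restarting and, at worst, lost data.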
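1. Deployment: Stateless
1.1 How it works
A Deployment manages Pod replicas through a ReplicaSet: it keeps the specified number of replicas running, and the Pods are interchangeable.
Use cases:
- Web services (nginx / Spring Boot)
- API services
- Stateless microservices
- Any application that can be killed and recreated at any time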
1.2 Example
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-container
        image: nginx:latest
        ports:
        - containerPort: 80
```
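2. StatefulSet: Stateful
2.1 How it works
A StatefulSet gives each Pod:
- A stable network identity: my-sts-0, my-sts-1, my-sts-2
- Stable persistent storage: each Pod is bound to its own PVC
- Ordered deployment and scaling: by default my-sts-1 is created only after my-sts-0 is Ready, and scale-down removes Pods in reverse order, highest ordinal first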
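The ordering above is the default behavior (`podManagementPolicy: OrderedReady`). Below is a minimal sketch of the two StatefulSet fields that change it. It assumes Pods that do not depend on their lower-numbered peers at startup, and the `partition` value 2 is only an illustration:

```yaml
spec:
  # Default is OrderedReady: create my-sts-0, wait until it is Ready, then my-sts-1, ...
  # Parallel creates and deletes all Pods at once; it does not affect rolling updates.
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2   # only Pods with ordinal >= 2 (here my-sts-2) receive the new template
```

`podManagementPolicy` cannot be changed once the StatefulSet exists, so it has to be decided up front. Lowering `partition` to 0 later rolls the update out to the remaining Pods, which makes it usable as a simple canary.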
Use cases:
- Databases (MySQL, MongoDB, PostgreSQL)
- Message queues (Kafka, RabbitMQ in cluster mode)
- Distributed storage nodes
- Any application that needs a persistent identity
2.2 Example
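```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-sts                  # Pods are named my-sts-0, my-sts-1, my-sts-2
spec:
  replicas: 3
  serviceName: my-service       # headless Service that provides the per-Pod DNS names
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-container
        image: nginx:latest
        ports:
        - containerPort: 80
        volumeMounts:
        - name: data            # must match the volumeClaimTemplates name below
          mountPath: /usr/share/nginx/html   # illustrative mount path
  volumeClaimTemplates:         # one PVC per Pod: data-my-sts-0, data-my-sts-1, ...
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 10Gi
```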
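2.3 Key point: serviceName is required
A StatefulSet must set serviceName, the name of a headless Service (one with clusterIP: None). The StatefulSet does not create this Service for you, so it has to exist separately. Through it, K8s gives each Pod its own DNS record (default namespace shown):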
```
my-sts-0.my-service.default.svc.cluster.local
my-sts-1.my-service.default.svc.cluster.local
```
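Below is a minimal sketch of the headless Service the StatefulSet example expects. It assumes the default namespace and reuses that example's Pod label (`app: my-app`) and port 80:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service        # must equal the StatefulSet's spec.serviceName
spec:
  clusterIP: None         # headless: no virtual IP; DNS resolves to the individual Pods
  selector:
    app: my-app           # matches the StatefulSet's Pod template labels
  ports:
  - name: http
    port: 80
    targetPort: 80
```

Apply the Service together with the StatefulSet. Without it the Pods still start, but the per-Pod DNS names above do not resolve.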
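3. DaemonSet: Per-Node Daemons
3.1 How it works
A DaemonSet runs one copy of a Pod on every Node (or on every Node its selection rules allow, see 3.3). When a new node joins the cluster, a Pod is created on it automatically; when a node is removed, its Pod is cleaned up.
Use cases:
- Log collection (Promtail / Filebeat / Fluentd)
- Node monitoring (node-exporter / cadvisor)
- Network components (calico-node / flannel / cilium)
- Storage daemons
3.2 Example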
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-daemonset
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-container
        image: nginx:latest
        ports:
        - containerPort: 80
```
A DaemonSet has no replicas field. It does not track a replica count, only that every eligible node runs one Pod.
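Instead of a replica count, the main rollout knob on a DaemonSet is `updateStrategy`. A minimal sketch that spells out the default values explicitly:

```yaml
spec:
  updateStrategy:
    type: RollingUpdate      # default; OnDelete replaces a Pod only after you delete it manually
    rollingUpdate:
      maxUnavailable: 1      # default: update the Pod on one node at a time
      maxSurge: 0            # default: stop the old Pod on a node before starting the new one
```

Raising `maxUnavailable` (a number or a percentage) speeds up rollouts on large clusters, at the cost of more nodes running without the daemon at the same moment.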
3.3 Node selection
```yaml
spec:
  template:
    spec:
      nodeSelector:
        node-role: worker      # run only on nodes that carry this label
      tolerations:
      - key: dedicated
        operator: Equal
        value: ceph
        effect: NoSchedule     # tolerates the dedicated=ceph:NoSchedule taint
```
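Log and monitoring agents usually need to run on control-plane nodes too. On kubeadm-style clusters those nodes typically carry the taint `node-role.kubernetes.io/control-plane:NoSchedule`; treat that key as an assumption and verify it with `kubectl describe node`. The sketch below shows two options: tolerate just that taint, or tolerate every taint:

```yaml
spec:
  template:
    spec:
      tolerations:
      # Option A: tolerate only the control-plane taint (assumed key; check your nodes)
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      # Option B: an empty key with operator Exists tolerates every taint,
      # so the DaemonSet Pod can land on any node
      # - operator: Exists
```

Option B is the broadest choice. It also covers dedicated nodes such as the ceph ones above, which is usually what a cluster-wide log collector wants.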
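4. Comparing the three
| Dimension | Deployment | StatefulSet | DaemonSet |
|---|---|---|---|
| Replica count | Any (replicas) | Any (replicas) | One per eligible node |
| Pod names | Random suffix | Ordered and stable (-0, -1, ...) | Random suffix |
| Storage | A PVC in the Pod template is shared by all replicas | One PVC per Pod (volumeClaimTemplates) | Usually hostPath, or a shared volume |
| Scaling | No ordering | Ordered; scale-down in reverse order | Follows nodes automatically |
| Rolling update | Paced by maxSurge / maxUnavailable | One Pod at a time, highest ordinal first | Node by node, paced by maxUnavailable |
| Deleting the controller | Pods deleted, no ordering | No ordering guarantee (scale to 0 first for an ordered shutdown); PVCs are kept by default | Pods deleted; a node's Pod also goes away when the node leaves |
| Typical use | Web / API | Databases / message queues | Monitoring / logging / networking |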
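5. Health Probes
5.1 The three probes
| Probe | What it checks | What happens on failure |
|---|---|---|
| livenessProbe | Is the container still alive? | The container is restarted |
| readinessProbe | Is the container ready to receive traffic? | The Pod is removed from the Service's Endpoints (no restart) |
| startupProbe | Has startup finished? | The container is restarted; until it succeeds, liveness and readiness checks are not run, which keeps slow-starting apps from failing liveness too early |
5.2 Spring Boot application example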
```yaml
spec:
  containers:
  - name: springboot-app
    image: my-app:1.0
    ports:
    - name: http
      containerPort: 8080
    - name: management          # Actuator served on a separate management port
      containerPort: 9090
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: management
      initialDelaySeconds: 180  # wait 3 minutes after the container starts
      periodSeconds: 10
      timeoutSeconds: 1
      successThreshold: 1
      failureThreshold: 6       # restart only after 6 consecutive failures
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: management
      initialDelaySeconds: 180
      periodSeconds: 10
      timeoutSeconds: 5
      successThreshold: 1
      failureThreshold: 9       # removed from Endpoints only after 9 consecutive failures
```
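Pitfall: an initialDelaySeconds that is too short (e.g. 10s) gets slow-starting apps restarted over and over. One that is too long (over 10 min) slows recovery, because every restarted container waits that long before it is probed again. With the readinessProbe above, the Pod also receives no traffic for at least 180s, even if it was ready sooner.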
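A startupProbe avoids this trade-off: liveness and readiness checks only begin once it succeeds, so the long initialDelaySeconds can be dropped. The sketch below reuses the endpoints and the `management` port from the example above. The startup budget of 30 × 10s is an assumed value to tune per application:

```yaml
spec:
  containers:
  - name: springboot-app
    image: my-app:1.0
    ports:
    - name: http
      containerPort: 8080
    - name: management
      containerPort: 9090
    startupProbe:               # liveness/readiness are held off until this succeeds
      httpGet:
        path: /actuator/health/liveness
        port: management
      periodSeconds: 10
      failureThreshold: 30      # up to 30 x 10s = 300s to start, then the container is restarted
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: management
      periodSeconds: 10
      failureThreshold: 6       # no long initialDelaySeconds needed any more
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: management
      periodSeconds: 10
```

A fast start then means fast readiness, while a slow start still gets its full 300s before any restart.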
5.3 tcpSocket probe
```yaml
livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
```
5.4 exec probe
```yaml
livenessProbe:
  exec:
    command:        # healthy as long as the command exits with status 0 (here: the file exists)
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5
```
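6. Scheduling with hostNetwork
6.1 Why use it
Some Pods have to use the host's network namespace:
- Ingress-nginx (a DaemonSet with hostNetwork, listening on 80/443)
- Monitoring agents (collecting host-level metrics)
- Services with a fixed port that must own that port on the host
6.2 Example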
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: safety-powerjob-worker
  namespace: test
spec:
  replicas: 1
  selector:                     # required in apps/v1; must match the template labels
    matchLabels:
      app: safety-powerjob-worker
  template:
    metadata:
      labels:
        app: safety-powerjob-worker
    spec:
      hostNetwork: true         # the Pod uses the node's network namespace and ports
      hostAliases:              # extra /etc/hosts entries inside the Pod
      - ip: 10.100.99.252
        hostnames: ["extend-service.test"]
      affinity:
        podAntiAffinity:        # soft: prefer not to share a node with another replica
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values: [safety-powerjob-worker]
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: safety-powerjob-worker
        image: my-app:latest
```
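6.3 Key risk
Port conflicts: with hostNetwork, a Pod binds ports directly on the host. If two such Pods are scheduled onto the same node and both want port 8080, the one that starts later fails to bind and goes into CrashLoopBackOff. A soft (preferred) anti-affinity like the one in 6.2 only makes this less likely; it does not prevent it.
Symptoms: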
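```shell
kubectl describe pod <pod>
# Events: ... Back-off restarting failed container
kubectl logs <pod> --previous
# the application's own bind error, e.g. "failed to bind port 8080"
```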
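Fixes:
- Deployment + hard pod anti-affinity on the node (requiredDuringSchedulingIgnoredDuringExecution with topologyKey: kubernetes.io/hostname), so no two replicas share a node (recommended)
- Use a DaemonSet instead of a Deployment (at most one Pod per node)
- Pin the Pod to a single node with nodeName (only workable with replicas: 1, and it bypasses the scheduler)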
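Below is a sketch of the DaemonSet route, reusing the names from 6.2. The `powerjob-worker` node label and port 8080 are assumptions. Declaring `containerPort` is worthwhile with hostNetwork: the port is then also recorded as a host port, so the scheduler can see it is taken instead of starting a Pod that fails to bind. `dnsPolicy: ClusterFirstWithHostNet` keeps cluster DNS working; with the default policy, a hostNetwork Pod falls back to the node's own resolver.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: safety-powerjob-worker
  namespace: test
spec:
  selector:
    matchLabels:
      app: safety-powerjob-worker
  template:
    metadata:
      labels:
        app: safety-powerjob-worker
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet   # resolve cluster Service names from the host network
      nodeSelector:
        powerjob-worker: "true"            # hypothetical label: limits which nodes get a copy
      containers:
      - name: safety-powerjob-worker
        image: my-app:latest
        ports:
        - containerPort: 8080              # assumed port; declared so the scheduler knows it is taken
```

With the DaemonSet's default rolling update (maxSurge: 0), the old Pod on a node is stopped before the new one starts. An upgrade therefore does not collide with itself on the port.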
6.4 In practice: troubleshooting hostNetwork port conflicts
```shell
journalctl -u kubelet | grep "<pod-name>"
# look for "port already in use"
```
A common cause: calico-kube-controllers ran with hostNetwork and another Pod had taken its port.
7. Affinity and anti-affinity
7.1 Soft affinity / anti-affinity (preferred)
```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: [my-app]
        topologyKey: "kubernetes.io/hostname"
```
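Soft constraint: the scheduler tries to keep these Pods off the same node but does not guarantee it. This is the form used most in production.
7.2 Hard affinity / anti-affinity (required)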
```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values: [my-app]
      topologyKey: "kubernetes.io/hostname"
```
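Hard constraint: these Pods must never share a node; if no node qualifies, the new Pod stays Pending. The IgnoredDuringExecution suffix means Pods that are already running are not evicted if the rule later stops being satisfied.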
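In both forms, `topologyKey` defines what "the same place" means: `kubernetes.io/hostname` means the same node. Below is a sketch that spreads replicas across zones instead. It assumes the nodes carry the well-known `topology.kubernetes.io/zone` label; cloud providers usually set it, while bare-metal clusters may need to add it themselves.

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: [my-app]
        topologyKey: "topology.kubernetes.io/zone"   # avoid sharing a zone, not just a node
```

The preferred form is the safer choice here. A required zone rule with 3 replicas leaves the third Pod Pending in a cluster that only has 2 zones.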
7.3 Node affinity (nodeAffinity)
```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: dedicated
          operator: In
          values: [ml]
```
This schedules the Pod only onto nodes labeled dedicated=ml (for example, dedicated machine-learning nodes).
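The required form is a hard filter. For a mere preference, nodeAffinity also has a weighted, preferred form. Below is a sketch that assumes some nodes carry a `disktype=ssd` label; the Pod still schedules elsewhere if none of them fits:

```yaml
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 50                # 1-100; higher weights count more in node scoring
      preference:
        matchExpressions:
        - key: disktype
          operator: In
          values: [ssd]
```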
8. Node selection and taint tolerations
```yaml
spec:
  nodeSelector:
    disktype: ssd
  tolerations:
  - key: dedicated
    operator: Equal
    value: ml
    effect: NoSchedule
```
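nodeSelector restricts which nodes the Pod may run on. A toleration only allows the Pod onto nodes with a matching taint; it does not attract the Pod there. To reserve nodes for a workload, combine the two: label and taint the nodes, then give the Pod both a selector and a toleration.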
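The Pod-side fields only take effect together with matching node-side setup. Below is a sketch of the relevant part of a Node object; the node name `ml-node-1` is hypothetical. In practice these values are usually set with `kubectl label nodes ml-node-1 disktype=ssd` and `kubectl taint nodes ml-node-1 dedicated=ml:NoSchedule` rather than by editing the Node directly:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: ml-node-1          # hypothetical node name
  labels:
    disktype: ssd          # matched by the Pod's nodeSelector
spec:
  taints:
  - key: dedicated         # repels every Pod without a matching toleration
    value: ml
    effect: NoSchedule
```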
9. In practice: a complete Spring Boot microservice YAML
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  labels:
    app: user-service
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # at most 1 extra Pod (4 in total) during a rollout
      maxUnavailable: 0      # never fewer than 3 available Pods
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      affinity:
        podAntiAffinity:     # soft: prefer spreading replicas across nodes
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values: [user-service]
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: user-service
        image: my-harbor.example.com/base/user-service:v1.2.3
        ports:
        - name: http
          containerPort: 8080
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: prod
        - name: NACOS_SERVER
          value: nacos.example.com:8848
        resources:
          requests:          # what the scheduler reserves on the node
            cpu: 500m
            memory: 512Mi
          limits:            # CPU is throttled above this; exceeding memory gets the container OOM-killed
            cpu: 2000m
            memory: 2048Mi
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          initialDelaySeconds: 120
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 5
      imagePullSecrets:      # Pod-level: credentials for the private Harbor registry
      - name: harbor-secret
---
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
  - port: 80                 # port exposed inside the cluster
    targetPort: 8080         # container port
    name: http
  type: ClusterIP
```
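10. Common problems
10.1 A Deployment's Pod RESTARTS count keeps climbing
A failing livenessProbe makes the kubelet restart the container. A crash or an OOM kill looks the same from the outside: kubectl describe then shows Last State: Terminated with reason OOMKilled or a non-zero exit code. To check: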
```shell
kubectl describe pod <pod>
# Events: ... Liveness probe failed
kubectl logs <pod> --previous
```
10.2 StatefulSet Pod stuck in Pending after scaling up
Usually the new Pod's PVC could not be provisioned or bound. To check:
```shell
kubectl get pvc -n <ns>
kubectl describe pvc <pvc-name>
# e.g. FailedBinding: the node does not meet the StorageClass's label requirements
```
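If the PVCs stay Pending because the cluster has no default StorageClass, one fix is to name a StorageClass explicitly in volumeClaimTemplates. In the sketch below, `fast-ssd` is a hypothetical name; list the real ones with `kubectl get storageclass`:

```yaml
spec:
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: fast-ssd   # hypothetical; must exist in the cluster
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 10Gi
```

volumeClaimTemplates cannot be changed on an existing StatefulSet, so this has to go into the manifest when the StatefulSet is created (or recreated).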
10.3 Some nodes have no DaemonSet Pod
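```shell
kubectl describe daemonset <name>
# compare Desired / Current / Ready / Available
kubectl get nodes --show-labels
# do the nodes carry the labels required by nodeSelector?
kubectl describe node <node> | grep -i taints
# is there a taint the DaemonSet does not tolerate?
```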
10.4 hostNetwork port conflicts
```shell
journalctl -u kubelet -n 50 --no-pager
# look for "address already in use"
```
10.5 Pod cannot be scheduled because of node taints
```shell
kubectl describe pod <pod>
# Events: ... 0/3 nodes are available: 3 node(s) had taint {dedicated: ceph}, that the pod didn't tolerate.
```
Add a matching toleration, or run the Pod on other nodes.
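11. Summary
The three controllers are the basics of deploying applications on K8s:
- Deployment is the default choice (stateless)
- StatefulSet is for stateful workloads (requires serviceName)
- DaemonSet is for per-node daemons
- Probes (health), hostNetwork (networking), and affinity (placement) are the three advanced tools layered on top
- The RollingUpdate strategy (maxSurge / maxUnavailable) controls the pace of rolling updates
Next up, in other parts of this series: a dual-datacenter, dual-cluster setup and a 6-node Redis Cluster architecture in practice.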