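K8s in Practice: Deployment vs. StatefulSet vs. DaemonSet, Probes, and hostNetwork

A comparison of the three K8s controllers, readinessProbe/livenessProbe health probes, hostNetwork port conflicts, and affinity-based scheduling

The three core K8s controllers

The controller behind a K8s application determines how its Pods behave. Picking the wrong one in production means Pods that restart endlessly in the mild case and total data loss in the severe case.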


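1. Deployment: stateless

1.1 How it works

A Deployment manages Pod replicas through a ReplicaSet. It guarantees that the specified number of Pod replicas is running, and the Pods are interchangeable.

Use cases:

  • Web services (nginx / Spring Boot)
  • API services
  • Stateless microservices
  • Any application that can be killed and recreated at any time

1.2 Example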

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-container
        image: nginx:latest
        ports:
        - containerPort: 80

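2. StatefulSet: stateful

2.1 How it works

A StatefulSet gives each Pod:

  • A stable network identity: my-sts-0, my-sts-1, my-sts-2 (for a StatefulSet named my-sts)
  • Stable persistent storage: each Pod is bound to its own PVC
  • Ordered deployment, scale-up, and scale-down: my-sts-1 is created only after my-sts-0 is Ready

Use cases:

  • Databases (MySQL, MongoDB, PostgreSQL)
  • Message queues (Kafka, RabbitMQ in cluster mode)
  • Distributed storage nodes
  • Any application that needs a persistent identity

2.2 Example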

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-statefulset
spec:
  replicas: 3
  serviceName: my-service
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-container
          image: nginx:latest
          ports:
            - containerPort: 80
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 10Gi

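2.3 Key point: serviceName is required

A StatefulSet must set serviceName to the name of a Headless Service. The StatefulSet does not create that Service; it has to exist in the same namespace. Through it, K8s gives each Pod a DNS record of the form <pod-name>.<service-name>.<namespace>.svc.cluster.local. For a StatefulSet named my-sts in the default namespace: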

my-sts-0.my-service.default.svc.cluster.local
my-sts-1.my-service.default.svc.cluster.local
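
The manifest in 2.2 references my-service but does not define it. A minimal sketch of the matching Headless Service, assuming the Pods carry the app: my-app label from that example and expose port 80 (the port name web is made up for illustration):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service        # must match spec.serviceName of the StatefulSet
spec:
  clusterIP: None         # "None" is what makes the Service headless
  selector:
    app: my-app           # same label as the StatefulSet's Pod template
  ports:
  - name: web             # hypothetical port name
    port: 80
```

Because the Service has no cluster IP, a DNS lookup of my-service returns the individual Pod IPs rather than a single virtual IP.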

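3. DaemonSet: node-level daemons

3.1 How it works

A DaemonSet runs one Pod replica on every eligible Node. When a node joins the cluster, a Pod is created on it automatically; when a node is removed, its Pod is cleaned up.

Use cases:

  • Log collection (Promtail / Filebeat / Fluentd)
  • Node monitoring (node-exporter / cadvisor)
  • Network components (calico-node / flannel / cilium)
  • Storage daemons

3.2 Example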

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-daemonset
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-container
        image: nginx:latest
        ports:
        - containerPort: 80

A DaemonSet has no replicas field; it only ensures that every eligible node runs a Pod.

3.3 Node selector

spec:
  template:
    spec:
      nodeSelector:
        node-role: worker          # run only on nodes that carry this label
      tolerations:
      - key: dedicated
        operator: Equal
        value: ceph
        effect: NoSchedule         # tolerates the NoSchedule taint

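4. Comparison

| Dimension | Deployment | StatefulSet | DaemonSet |
|---|---|---|---|
| Replica count | Any | Any | One per eligible node |
| Pod names | Random suffix | Ordered and stable | Random suffix |
| Storage | Shared PVC | Dedicated PVC per Pod | hostPath / shared |
| Scaling | Unordered | Ordered | Follows the node count automatically |
| Rolling update | Any order | Ordered | Any order |
| Pod removal | All at once | Ordered (on scale-down) | Only when the node goes away |
| Typical use | Web / API | Databases / message queues | Monitoring / logging / networking |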

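5. Health probes

5.1 The three probe types

| Probe | What it checks | On failure |
|---|---|---|
| livenessProbe | Whether the container is still alive | The container is restarted |
| readinessProbe | Whether the container is ready to receive traffic | The Pod is removed from the Service Endpoints |
| startupProbe | Whether startup has finished | The container is restarted; liveness and readiness checks wait until it succeeds, so slow starters are not killed by an early liveness failure |

5.2 Spring Boot example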

spec:
  containers:
  - name: springboot-app
    image: my-app:1.0
    ports:
    - name: http
      containerPort: 8080
    - name: management
      containerPort: 9090
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: management
      initialDelaySeconds: 180   # wait 3 minutes after the container starts
      periodSeconds: 10
      timeoutSeconds: 1
      successThreshold: 1
      failureThreshold: 6         # restart only after 6 consecutive failures
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: management
      initialDelaySeconds: 180
      periodSeconds: 10
      timeoutSeconds: 5
      successThreshold: 1
      failureThreshold: 9         # take out of rotation only after 9 consecutive failures

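Setting initialDelaySeconds too low (for example 10s) gets slow-starting applications restarted over and over. Setting it too high (more than 10 minutes) slows down failure recovery, because probing only begins once the delay has elapsed after each container start.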
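
For slow-starting applications, the startupProbe listed in 5.1 is an alternative to a large initialDelaySeconds. A minimal sketch, assuming the same management port and Actuator path as the example above; the thresholds are illustrative values, not tuned recommendations:

```yaml
spec:
  containers:
  - name: springboot-app
    image: my-app:1.0
    ports:
    - name: management
      containerPort: 9090
    startupProbe:
      httpGet:
        path: /actuator/health/liveness
        port: management
      periodSeconds: 10
      failureThreshold: 30        # allows up to 30 x 10s = 300s for startup
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: management
      periodSeconds: 10
      failureThreshold: 3         # takes effect only after the startupProbe has succeeded
```

With this layout the container gets up to five minutes to start, while a hang after startup is still detected within roughly 30 seconds.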

5.3 tcpSocket probe

livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

5.4 exec probe

livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5

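6. hostNetwork scheduling

6.1 Why use it

In some scenarios a Pod has to run in the host's network namespace:

  • Ingress-nginx (DaemonSet + hostNetwork listening on 80/443)
  • Monitoring agents (collecting host-level metrics)
  • Applications bound to a fixed port that they must own exclusively on the node

6.2 Example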

apiVersion: apps/v1
kind: Deployment
metadata:
  name: safety-powerjob-worker
  namespace: test
spec:
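  replicas: 1
  selector:                      # required by apps/v1; must match the template labels
    matchLabels:
      app: safety-powerjob-worker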
  template:
    metadata:
      labels:
        app: safety-powerjob-worker
    spec:
      hostNetwork: true
      hostAliases:
        - ip: 10.100.99.252
          hostnames: ["extend-service.test"]
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values: [safety-powerjob-worker]
                topologyKey: "kubernetes.io/hostname"
      containers:
        - name: safety-powerjob-worker
          image: my-app:latest

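6.3 Key risk

Port conflicts: with hostNetwork, the Pod's processes bind ports directly on the host. If two Pods land on the same node and both try to listen on 8080, the one that starts second fails to bind and ends up in CrashLoopBackOff.

Symptoms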

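kubectl describe pod <pod>
# Events: ... Back-off restarting failed container
kubectl logs <pod> --previous
# the application's own bind error, e.g. "address already in use" on port 8080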

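Fixes

  1. Deployment + podAntiAffinity (recommended); use the required form when two replicas must never share a node
  2. Use a DaemonSet instead of a Deployment (at most one Pod per node)
  3. Set nodeName to pin the Pod to a single node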
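
Another safeguard worth considering is to declare the listening port in the Pod spec. In current Kubernetes versions, a containerPort declared on a hostNetwork Pod is treated as a host port, which should let the scheduler refuse to place two such Pods on one node, so the second Pod stays Pending instead of crash-looping. A minimal sketch, assuming the worker listens on 8080 (the port number and the dnsPolicy line are assumptions, not part of the original manifest):

```yaml
spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet   # assumption: the Pod still needs cluster DNS
      containers:
        - name: safety-powerjob-worker
          image: my-app:latest
          ports:
            - containerPort: 8080          # assumed port; under hostNetwork it is also the host port
```

This only covers conflicts between Pods that declare their ports; a process started outside K8s on the same port is still invisible to the scheduler.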

6.4 In practice: troubleshooting a hostNetwork port conflict

journalctl -u kubelet | grep "<pod-name>"
# look for "port already in use"

A common cause: calico-kube-controllers runs with hostNetwork and another Pod has already taken its port.


7. Affinity and anti-affinity

7.1 Soft affinity / anti-affinity (preferred)

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values: [my-app]
          topologyKey: "kubernetes.io/hostname"

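A soft constraint: the scheduler tries to keep the matching Pods apart but does not guarantee it. This is the form used most often in production.

7.2 Hard affinity / anti-affinity (required)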

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values: [my-app]
        topologyKey: "kubernetes.io/hostname"

A hard constraint: matching Pods must not be scheduled together, and a Pod that cannot be placed stays Pending.

7.3 Node affinity (nodeAffinity)

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: dedicated
              operator: In
              values: [ml]

This schedules the Pod onto nodes labeled dedicated=ml (dedicated machine-learning nodes).
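
nodeAffinity also has a soft form, analogous to the preferred anti-affinity in 7.1. A minimal sketch, assuming the same dedicated=ml label; the weight of 50 is an arbitrary illustrative value:

```yaml
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50                 # 1-100; higher weights count more in node scoring
        preference:
          matchExpressions:
            - key: dedicated
              operator: In
              values: [ml]
```

With this form the Pod favors dedicated=ml nodes but can still be scheduled elsewhere when none of them has capacity.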


8. Node selection and taint tolerations

spec:
  nodeSelector:
    disktype: ssd
  tolerations:
  - key: dedicated
    operator: Equal
    value: ml
    effect: NoSchedule

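nodeSelector chooses which nodes the Pod may run on; tolerations let the Pod onto nodes that carry a matching taint. A toleration only permits scheduling onto tainted nodes; it does not attract the Pod to them.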
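
For reference, the node side of this pairing might look as follows. This is a sketch of the relevant fields of a Node object, assuming a node named node-1 (hypothetical); in practice the label and taint are usually applied with kubectl label and kubectl taint rather than by editing the manifest:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: node-1              # hypothetical node name
  labels:
    disktype: ssd           # matched by the Pod's nodeSelector
spec:
  taints:
  - key: dedicated
    value: ml
    effect: NoSchedule      # repels Pods that lack a matching toleration
```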


9. In practice: a complete YAML for a Spring Boot microservice

apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  labels:
    app: user-service
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values: [user-service]
                topologyKey: "kubernetes.io/hostname"
      containers:
      - name: user-service
        image: my-harbor.example.com/base/user-service:v1.2.3
        ports:
        - name: http
          containerPort: 8080
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: prod
        - name: NACOS_SERVER
          value: nacos.example.com:8848
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 2000m
            memory: 2048Mi
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          initialDelaySeconds: 120
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 5
      imagePullSecrets:          # Pod-level field, a sibling of containers
      - name: harbor-secret
---
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
  - port: 80
    targetPort: 8080
    name: http
  type: ClusterIP

10. Common problems

10.1 A Deployment's RestartCount keeps growing

The livenessProbe fails, so the kubelet restarts the container. Check:

kubectl describe pod <pod>
# Events: ... Liveness probe failed
kubectl logs <pod> --previous

10.2 Pods stay Pending after scaling up a StatefulSet

PVC creation or binding failed. Check:

kubectl get pvc -n <ns>
kubectl describe pvc <pvc-name>
# FailedBinding: the nodes do not satisfy the StorageClass's label requirements
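
One frequent cause of an unbound PVC is that the claim names no StorageClass and the cluster has no default one. A hedged sketch of the volumeClaimTemplates from 2.2 with an explicit class, assuming a StorageClass called fast-ssd exists in the cluster (the name is hypothetical):

```yaml
volumeClaimTemplates:
- metadata:
    name: data
  spec:
    storageClassName: fast-ssd    # hypothetical; list real ones with `kubectl get storageclass`
    accessModes: [ReadWriteOnce]
    resources:
      requests:
        storage: 10Gi
```

Note that volumeClaimTemplates cannot be changed on an existing StatefulSet, so the class has to be set when the StatefulSet is created.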

10.3 DaemonSet Pods missing on some nodes

kubectl describe daemonset <name>
# check Desired / Current / Ready / Available
kubectl get nodes --show-labels
# do the nodes carry the labels required by nodeSelector?

10.4 hostNetwork port conflicts

journalctl -u kubelet -n 50 --no-pager
# look for "address already in use"

10.5 Pod fails to schedule onto tainted nodes

kubectl describe pod <pod>
# Events: ... 0/3 nodes are available: 3 node(s) had taint {dedicated: ceph}, that the pod didn't tolerate.

Add matching tolerations, or schedule the Pod onto other nodes.
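
A toleration matching the taint in that event could look like the following sketch. It uses operator: Exists, which tolerates the dedicated key whatever its value; the NoSchedule effect is an assumption, since the event message does not show it:

```yaml
spec:
  tolerations:
  - key: dedicated
    operator: Exists      # matches any value of the "dedicated" key, including ceph
    effect: NoSchedule    # assumed effect; must match the taint's actual effect
```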


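11. Summary

The three K8s controllers are the fundamentals of deploying applications:

  1. Deployment is the default choice (stateless)
  2. StatefulSet is for stateful workloads (serviceName is required)
  3. DaemonSet is for node-level daemons
  4. Probes + hostNetwork + affinity are the three advanced scheduling tools
  5. The RollingUpdate strategy controls the pace of rolling updates

Next up (in other installments): dual-datacenter, dual-cluster setups and a 6-node Redis Cluster architecture in practice.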
