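Installing the Three K8s Master Components: apiserver / scheduler / controller-manager

A deep dive into the three core components on a K8s master node: 30+ key kube-apiserver flags, controller-manager leader election, custom scheduler policies, systemd startup order, and a troubleshooting guide.

Written in 2020-12, against the backdrop of K8s 1.20 deprecating dockershim and starting the countdown to its removal. This article focuses on flag tuning for the three master components, running them under systemd, and how they cooperate with etcd and kubelet.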

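1. Roles of the three master components

Besides etcd, the K8s control plane consists of three components that run on the master nodes:

| Component | Role | Port |
| --- | --- | --- |
| kube-apiserver | Cluster gateway; every other component talks to it | 6443 (HTTPS) |
| kube-scheduler | Scheduler; decides which node each Pod runs on | No API port of its own; health and metrics on 10259 (HTTPS) in current releases, 10251 (HTTP) in older ones |
| kube-controller-manager | Collection of controllers (30+: Deployment / Node / Endpoint …) | No API port of its own; health and metrics on 10257 (HTTPS) in current releases, 10252 (HTTP) in older ones |

How they cooperate: kubectl → apiserver (persists the object to etcd) → controller-manager sees the new workload and creates its Pod objects → scheduler sees a Pod with no node bound, picks a node, and writes the binding back to apiserver → kubelet on that node sees the Pod assigned to it and creates the containers.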

2. Preparation

Create these directories on every master node:

mkdir -p /etc/kubernetes/manifests/ /etc/systemd/system/kubelet.service.d /var/lib/kubelet /var/log/kubernetes

Prerequisites:

3. kube-apiserver.service

3.1 systemd unit file

[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes
After=network.target

[Service]
ExecStart=/usr/local/bin/kube-apiserver \
  --v=2 \
  --allow-privileged=true \
  --bind-address=0.0.0.0 \
  --secure-port=6443 \
  --advertise-address=192.168.139.133 \
  --service-cluster-ip-range=10.96.0.0/12,fd00:1111::/112 \
  --service-node-port-range=30000-32767 \
  --etcd-servers=https://192.168.139.133:2379,https://192.168.139.134:2379,https://192.168.139.135:2379 \
  --etcd-cafile=/etc/etcd/ssl/etcd-ca.pem \
  --etcd-certfile=/etc/etcd/ssl/etcd.pem \
  --etcd-keyfile=/etc/etcd/ssl/etcd-key.pem \
  --client-ca-file=/etc/kubernetes/pki/ca.pem \
  --tls-cert-file=/etc/kubernetes/pki/apiserver.pem \
  --tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem \
  --kubelet-client-certificate=/etc/kubernetes/pki/apiserver.pem \
  --kubelet-client-key=/etc/kubernetes/pki/apiserver-key.pem \
  --service-account-key-file=/etc/kubernetes/pki/sa.pub \
  --service-account-signing-key-file=/etc/kubernetes/pki/sa.key \
  --service-account-issuer=https://kubernetes.default.svc.cluster.local \
  --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname \
  --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota \
  --authorization-mode=Node,RBAC \
  --enable-bootstrap-token-auth=true \
  --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.pem \
  --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.pem \
  --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client-key.pem \
  --requestheader-allowed-names=aggregator \
  --requestheader-group-headers=X-Remote-Group \
  --requestheader-extra-headers-prefix=X-Remote-Extra- \
  --requestheader-username-headers=X-Remote-User \
  --enable-aggregator-routing=true
Restart=on-failure
RestartSec=10s
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target

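3.2 Key flags in detail

| Flag | Meaning | Tuning advice |
| --- | --- | --- |
| --advertise-address | IP the apiserver advertises to the rest of the cluster | Different on every master |
| --service-cluster-ip-range | Service IP range | A /12 holds about 1M Service IPs, which is plenty |
| --etcd-servers | etcd cluster endpoints | Comma-separated when there are several |
| --enable-admission-plugins | Admission controllers | Include at least NamespaceLifecycle,LimitRanger,ServiceAccount |
| --authorization-mode | Authorization modes | Node,RBAC is the standard choice |
| --requestheader-allowed-names | Client certificate names accepted from the aggregation layer (metrics-server etc.) | Must be aggregator (it has to match the CN of the front-proxy client certificate) |
| --enable-bootstrap-token-auth | Bootstrap token authentication for kubelet TLS bootstrap | Required for kubeadm-style deployments |
| --service-account-issuer | Issuer of ServiceAccount tokens | Required since 1.20, together with --service-account-signing-key-file; apiserver refuses to start without it |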
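
The "/12 is about 1M" figure for --service-cluster-ip-range is simply 2^(32 - prefix). A minimal sketch of that arithmetic, IPv4 only; the helper name cidr_size is made up for illustration:

```shell
# Hypothetical helper: number of addresses in an IPv4 CIDR block.
cidr_size() {
  local prefix="${1#*/}"          # keep only what follows the slash
  echo $(( 1 << (32 - prefix) ))  # 2^(32 - prefix)
}

cidr_size 10.96.0.0/12   # prints 1048576
```

The first usable address of the range (10.96.0.1 here) is the one the built-in kubernetes Service receives, which is why it has to appear in the apiserver certificate.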

3.3 Startup

# On every master node
systemctl daemon-reload
systemctl enable --now kube-apiserver
systemctl status kube-apiserver
journalctl -f -u kube-apiserver

Verify (on master1):

curl -k https://192.168.139.133:6443/version
# {
#   "major": "1",
#   "minor": "28",
#   "gitVersion": "v1.28.5",
#   ...
# }

4. kube-controller-manager.service

4.1 systemd unit file

[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes
After=network.target

[Service]
ExecStart=/usr/local/bin/kube-controller-manager \
  --v=2 \
  --bind-address=0.0.0.0 \
  --root-ca-file=/etc/kubernetes/pki/ca.pem \
  --cluster-signing-cert-file=/etc/kubernetes/pki/ca.pem \
  --cluster-signing-key-file=/etc/kubernetes/pki/ca-key.pem \
  --service-account-private-key-file=/etc/kubernetes/pki/sa.key \
  --kubeconfig=/etc/kubernetes/controller-manager.kubeconfig \
  --leader-elect=true \
  --use-service-account-credentials=true \
  --controllers=*,bootstrapsigner,tokencleaner
Restart=on-failure
RestartSec=10s
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target

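4.2 Key flags

| Flag | Meaning |
| --- | --- |
| --leader-elect=true | Enables leader election; with several masters only the leader does the work |
| --use-service-account-credentials=true | Each controller runs with its own ServiceAccount (recommended, so RBAC can be scoped per controller) |
| --controllers=* | Enables every on-by-default controller (the default); entries can be added or removed. bootstrapsigner and tokencleaner are off by default, which is why the unit file lists them explicitly |
| --cluster-signing-* | CA used to sign certificates for new nodes (kubelet certificate issuance) |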

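4.3 Controller categories

kube-controller-manager is really a bundle of 30+ controllers:

| Controller | What it does |
| --- | --- |
| Deployment controller | Maintains the replica count a Deployment asks for |
| ReplicaSet controller | Maintains ReplicaSet replicas |
| Node controller | Watches node heartbeats and marks nodes NotReady |
| Endpoint controller | Maintains the Service ↔ Pod mapping |
| ServiceAccount controller | Creates the default ServiceAccount in every namespace |
| Token controller | Issues tokens for ServiceAccounts |
| Job controller | Maintains one-off tasks |
| CronJob controller | Maintains scheduled tasks |
| PV controller | Handles PV / PVC binding |
| Namespace controller | Cleans up the resources of deleted namespaces |
| ResourceQuota controller | Enforces namespace resource quotas |
| HorizontalPodAutoscaler | HPA autoscaling |

And more…

Why not prune? K8s defaults to --controllers=* and almost every controller is needed; pruning only makes sense for customized clusters (for example dropping serviceaccount-controller because an external ServiceAccount system is in use).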
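
To make the --controllers syntax concrete, here is a small sketch of how a value is interpreted: a bare name enables a controller, a leading "-" disables it, and "*" covers everything that is on by default. The function name controller_enabled is hypothetical, and the list of off-by-default controllers (bootstrapsigner, tokencleaner) is an assumption based on the unit file above; exact controller names vary between releases, so check kube-controller-manager --help before relying on them.

```shell
# Hypothetical helper: would controller NAME run under a given --controllers value?
# Usage: controller_enabled '<flag value>' <name>
controller_enabled() (
  set -f                 # keep the literal "*" from glob-expanding
  IFS=','
  star=no
  for item in $1; do
    case "$item" in
      "$2")  exit 0 ;;   # named explicitly: enabled
      "-$2") exit 1 ;;   # prefixed with "-": disabled
      "*")   star=yes ;;
    esac
  done
  [ "$star" = yes ] || exit 1
  case "$2" in
    # assumed off-by-default controllers, not covered by "*"
    bootstrapsigner|tokencleaner) exit 1 ;;
  esac
  exit 0
)

# Example: with '*,-serviceaccount' everything on-by-default runs except serviceaccount
#   controller_enabled '*,-serviceaccount' serviceaccount || echo "disabled"
```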

5. kube-scheduler.service

5.1 systemd unit file

[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/kubernetes/kubernetes
After=network.target

[Service]
ExecStart=/usr/local/bin/kube-scheduler \
  --v=2 \
  --bind-address=0.0.0.0 \
  --leader-elect=true \
  --kubeconfig=/etc/kubernetes/scheduler.kubeconfig \
  --config=/etc/kubernetes/scheduler-config.yaml
Restart=on-failure
RestartSec=10s
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target

5.2 Scheduling configuration

/etc/kubernetes/scheduler-config.yaml

apiVersion: kubescheduler.config.k8s.io/v1
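kind: KubeSchedulerConfiguration
# kube-scheduler documents --kubeconfig as ignored once --config is given,
# so the kubeconfig path is repeated here
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.kubeconfig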
leaderElection:
  leaderElect: true
  resourceNamespace: kube-system
  resourceName: kube-scheduler
profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        enabled:
          - name: NodeResourcesFit
          - name: NodeAffinity
          - name: PodTopologySpread
          - name: InterPodAffinity
          - name: TaintToleration
          - name: ImageLocality

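5.3 Scheduling flow (two phases)

1. Filtering: walk through all nodes and keep only those that can run the Pod

  • Enough resources (CPU / memory requests)
  • Requested host ports are free
  • nodeSelector / nodeAffinity matches
  • The Pod tolerates the node's taints
  • The Pod's PVCs can be satisfied on the node

2. Scoring: score the remaining nodes and pick the highest

  • Balanced resource usage (avoids hot nodes)
  • Image locality (nodes that already hold the Pod's images score higher)
  • Topology spreading (Pod anti-affinity)
  • Custom plugins (for example bin-packing to raise utilization)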
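
The two phases can be sketched as a toy in shell. This is only an illustration of "filter, then score", not how kube-scheduler computes real scores: the function name pick_node, the input columns, and the flat 10-point image bonus are all assumptions, and only CPU is considered.

```shell
# Toy two-phase scheduler.
# stdin: one node per line as "NAME ALLOCATABLE_CPU FREE_CPU HAS_IMAGE(0|1)" (millicores)
# $1:    CPU requested by the Pod (millicores)
# Prints the chosen node; exits non-zero when no node passes filtering.
pick_node() {
  awk -v req="$1" '
    # Phase 1, filtering: skip nodes whose free CPU cannot fit the request
    $2 <= 0 || $3 < req { next }
    # Phase 2, scoring: share of CPU still free after placement (0-100),
    # plus a flat bonus when the image is already cached on the node
    {
      score = ($3 - req) / $2 * 100 + ($4 == 1 ? 10 : 0)
      if (best == "" || score > best_score) { best = $1; best_score = score }
    }
    END { if (best == "") exit 1; print best }
  '
}

# Example:
#   printf 'node-a 4000 500 0\nnode-b 4000 3000 0\nnode-c 2000 1500 1\n' | pick_node 1000
```

With a 1000m request node-a is filtered out and node-b wins on free capacity; with a 200m request the image bonus tips the choice to node-c.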

6. Startup order

Start the services strictly in this order:

# 1. etcd
systemctl start etcd
etcdctl endpoint health --cluster   # verify

# 2. kube-apiserver
systemctl start kube-apiserver
curl -k https://localhost:6443/healthz   # verify

# 3. controller-manager and scheduler (can start in parallel)
systemctl start kube-controller-manager
systemctl start kube-scheduler

Why this order? apiserver depends on etcd, and both controller-manager and scheduler do all their work through apiserver.
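
If the sequence is scripted, each step should wait until the previous component actually answers instead of just firing systemctl start. A minimal sketch, assuming the unit names and health checks used in this article; the function names wait_until and start_control_plane and the 30 x 2 s retry budget are arbitrary choices for illustration.

```shell
# wait_until TRIES DELAY CMD...: re-run CMD until it succeeds or TRIES runs out
wait_until() {
  local tries="$1" delay="$2"
  shift 2
  while [ "$tries" -gt 0 ]; do
    "$@" >/dev/null 2>&1 && return 0
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# Start the control plane in dependency order, stopping at the first failure
start_control_plane() {
  systemctl start etcd
  wait_until 30 2 etcdctl endpoint health --cluster || return 1

  systemctl start kube-apiserver
  wait_until 30 2 curl -fsk https://localhost:6443/healthz || return 1

  # both only talk to apiserver, so they can start together
  systemctl start kube-controller-manager kube-scheduler
}

# Run on a master as root:
#   start_control_plane || echo "control plane did not come up" >&2
```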

7. Verification

# Verify on master1 (componentstatuses is deprecated since v1.19 and prints a warning, but still answers)
kubectl get componentstatuses
# or
kubectl get cs
# NAME                 STATUS    MESSAGE             ERROR
# controller-manager   Healthy   ok
# scheduler            Healthy   ok
# etcd-0               Healthy   {"health":"true"}

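Detailed verification

# apiserver performance metrics (normally scraped by Prometheus; kubectl get --raw is enough for a spot check)
kubectl get --raw=/metrics | grep apiserver_request_total

# scheduler health on its secure port (10259). kubectl get --raw only reaches apiserver
# paths, so the scheduler's own endpoint has to be queried directly
curl -k https://localhost:10259/healthz
# queue metrics such as scheduler_pending_pods live under /metrics on the same port,
# which requires an authenticated and authorized request

# controller-manager health on its secure port (10257); /metrics works the same way
curl -k https://localhost:10257/healthz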

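8. Leader election in HA setups

With three masters for high availability, both scheduler and controller-manager use leader election (apiserver needs none: every apiserver instance serves traffic):

  • With --leader-elect=true, each instance competes for a distributed lock on startup
  • If the current leader dies, another instance takes over roughly 5 to 15 seconds later (the default --leader-elect-lease-duration is 15 seconds)
  • In current releases the lock is a Lease object in kube-system, named kube-scheduler or kube-controller-manager, that records the current leader; older releases stored it on an Endpoints or ConfigMap object

Show the current leader:

kubectl get lease -n kube-system kube-scheduler -o yaml
kubectl get lease -n kube-system kube-controller-manager -o yaml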
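
For scripting it is handier to get just the node name than the full YAML. The sketch below assumes the lock is a Lease whose spec.holderIdentity looks like "<hostname>_<uuid>"; both the format and the helper name holder_node are assumptions, so check the raw value on your own cluster first.

```shell
# Hypothetical helper: strip the trailing "_<uuid>" from a holder identity
holder_node() {
  echo "${1%_*}"
}

# Usage against a live cluster:
#   id=$(kubectl get lease -n kube-system kube-scheduler -o jsonpath='{.spec.holderIdentity}')
#   holder_node "$id"
```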

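9. Common pitfalls

  1. apiserver reports “x509: certificate is not valid for 10.96.0.1” at startup: the SAN list of the apiserver certificate is missing 10.96.0.1 (the Service cluster IP) or another IP; reissue the certificate
  2. apiserver fails to start with “etcd cluster is unavailable”: check etcd first with etcdctl endpoint health; a wrong certificate path is the usual cause (/etc/etcd/ssl/ vs /etc/kubernetes/pki/etcd/)
  3. Pods are not created after controller-manager starts: check the cluster-signing-cert-file path, or look for missing RBAC permissions (on the identity in controller-manager's own kubeconfig)
  4. scheduler logs “leader election lost”: caused by network jitter; with several masters the interruption is brief and harmless
  5. Configuration changes do not take effect: run systemctl daemon-reload && systemctl restart kube-apiserver; a reload is not enough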
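
For pitfall 1, it is quicker to inspect the certificate than to wait for the error. A small sketch that checks whether an IP is listed as a SAN; the helper name san_has_ip is hypothetical, and it matches whole entries so that 10.96.0.10 is not mistaken for 10.96.0.1.

```shell
# Hypothetical helper: reads `openssl x509 -noout -text` output on stdin and
# succeeds only if the given IP appears as an exact "IP Address:" SAN entry.
san_has_ip() {
  tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -qxF "IP Address:$1"
}

# Usage on a master (certificate path taken from the apiserver unit file):
#   openssl x509 -in /etc/kubernetes/pki/apiserver.pem -noout -text | san_has_ip 10.96.0.1 \
#     || echo "10.96.0.1 missing from SANs, reissue the certificate"
```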

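10. Prerequisites / next steps

Prerequisites

Next steps

  1. API Server high availability (2021-03-15): Nginx layer-4 proxy + Keepalived VIP
  2. K8s cluster add-ons (2021-09-15): CNI / CoreDNS / Metrics / Dashboard
  3. K8s cluster management (2021-12-15): upgrades, node isolation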
