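When a K8s cluster needs to scale out
After a cluster has been running for a while, resources run short and new workers have to be added. The new machines usually come from one of two sources:
- A brand-new machine (bare metal + Ubuntu 22.04)
- A machine that previously had kubeadm installed, broke, and now needs to rejoin
The second case is particularly tricky: certificates, config files, and iptables rules left behind by kubeadm can all make the node join fail.
Applies to: K8s 1.28.5 / Ubuntu 22.04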
1. Joining a brand-new worker (recommended)
1.1 On master1
Set up hosts
```bash
cat << "EOF" >> /etc/hosts
<new worker1 ip> worker1
<new worker2 ip> worker2
EOF
```
Passwordless login
```bash
sshpass -p <YOUR_SSH_PASSWORD> ssh-copy-id -o StrictHostKeyChecking=no worker1
# If the worker has a conflicting known_hosts entry, remove it and retry
ssh-keygen -f "/root/.ssh/known_hosts" -R "worker1"
sshpass -p <YOUR_SSH_PASSWORD> ssh-copy-id -o StrictHostKeyChecking=no worker1
```
The password is shown as the placeholder <YOUR_SSH_PASSWORD>: replace it with the real SSH password before running, or run export SSHPASS=... first and pass it with sshpass -p "$SSHPASS".
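To keep the password off the command line entirely, sshpass can also read it from the environment with its -e option. The following is a minimal sketch, assuming SSHPASS has been exported beforehand; the function name push_key is made up for illustration and is not part of the original scripts.

```bash
# push_key HOST: copy the local SSH public key to HOST.
# Assumption: the password was exported as SSHPASS; `sshpass -e` reads it from there.
push_key() {
  local host=$1
  if [ -z "${SSHPASS:-}" ]; then
    echo "SSHPASS is not set" >&2
    return 1
  fi
  # Drop a stale host key, e.g. after the worker was reinstalled
  ssh-keygen -R "$host" >/dev/null 2>&1 || true
  sshpass -e ssh-copy-id -o StrictHostKeyChecking=no "$host"
}
```

A typical call would be `export SSHPASS=...` followed by `push_key worker1`; the function refuses to run when the variable is missing rather than prompting.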
Push binaries, certificates, and service config
```bash
# 1. Push binaries
scp /usr/local/bin/kube{let,-proxy} worker1:/usr/local/bin/
scp /usr/local/bin/cri-dockerd worker1:/usr/local/bin/
# 2. Push certificates and config files
ssh worker1 mkdir -p /etc/kubernetes/{manifests,pki}
cd /etc/kubernetes
for FILE in pki/ca.pem pki/ca-key.pem pki/front-proxy-ca.pem kubelet-conf.yml kube-proxy.yaml bootstrap-kubelet.kubeconfig kubelet.kubeconfig kube-proxy.kubeconfig; do
  scp "/etc/kubernetes/$FILE" "worker1:/etc/kubernetes/${FILE}"
done
scp -r /etc/kubernetes/manifests worker1:/etc/kubernetes
# 3. Push systemd units
for FILE in /etc/systemd/system/cri-docker.service /etc/systemd/system/cri-docker.socket /etc/systemd/system/kubelet.service /etc/systemd/system/kube-proxy.service; do
  scp "$FILE" "worker1:${FILE}"
done
```
1.2 One-shot scripts (recommended)
Wrap the three push steps above into join_k8s.sh on master1:
```bash
#!/bin/bash
# join_k8s.sh <worker-name>
# Abort with a usage message if no worker name is given
WORKER=${1:?usage: join_k8s.sh <worker-name>}
echo "=== Pushing binaries to $WORKER ==="
scp /usr/local/bin/kube{let,-proxy} "$WORKER":/usr/local/bin/
scp /usr/local/bin/cri-dockerd "$WORKER":/usr/local/bin/
echo "=== Pushing certs ==="
ssh "$WORKER" "mkdir -p /etc/kubernetes/{manifests,pki}"
for FILE in pki/ca.pem pki/ca-key.pem pki/front-proxy-ca.pem kubelet-conf.yml kube-proxy.yaml bootstrap-kubelet.kubeconfig kubelet.kubeconfig kube-proxy.kubeconfig; do
  scp "/etc/kubernetes/$FILE" "$WORKER:/etc/kubernetes/${FILE}"
done
scp -r /etc/kubernetes/manifests "$WORKER":/etc/kubernetes
echo "=== Pushing systemd units ==="
for FILE in /etc/systemd/system/cri-docker.service /etc/systemd/system/cri-docker.socket /etc/systemd/system/kubelet.service /etc/systemd/system/kube-proxy.service; do
  scp "$FILE" "$WORKER:${FILE}"
done
echo "=== Done. Now SSH to $WORKER and start services ==="
```
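Because the script copies a fixed list of files, one missing file on master1 only shows up as an scp error halfway through. A small preflight helper could check the list first. This is a sketch only; missing_files is a hypothetical helper, not part of the original join_k8s.sh.

```bash
# missing_files FILE...: print every argument that is not a readable regular file.
# Hypothetical helper for a preflight check before any scp runs.
missing_files() {
  local f
  for f in "$@"; do
    if [ ! -f "$f" ] || [ ! -r "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

Called as `missing_files /usr/local/bin/kubelet /usr/local/bin/kube-proxy /etc/kubernetes/pki/ca.pem`, an empty result means everything in the list is present; any printed path is a reason to stop before pushing.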
worker.sh, run on the worker:
```bash
#!/bin/bash
# worker.sh [hostname]
# Use the name passed on the command line, or keep the current hostname
NAME=${1:-$(hostname)}
echo "=== Disable swap ==="
swapoff -a
# Comment out swap entries that are not already commented
sed -ri '/^#/!s/.*swap.*/#&/' /etc/fstab
echo "=== Set hostname ==="
hostnamectl set-hostname "$NAME"
echo "=== Install nfs-common (for NFS PVs) ==="
apt-get install -y nfs-common
echo "=== Reload systemd ==="
systemctl daemon-reload
echo "=== Start services ==="
systemctl enable --now cri-docker.socket
systemctl enable --now cri-docker.service
systemctl enable --now kubelet.service
systemctl enable --now kube-proxy.service
# Restart in case the services were already running with an old config
systemctl restart kubelet.service
systemctl restart kube-proxy.service
echo "=== Status ==="
systemctl --no-pager status cri-docker.socket cri-docker.service kubelet.service kube-proxy.service
echo "=== Logs (Ctrl-C to stop) ==="
journalctl -f -u kubelet
```
Run them. The files have to be pushed before the services can start, so master1 goes first:
```bash
# On master1
bash /data/softs/join_k8s.sh worker1
# On the worker
bash worker.sh worker1
```
1.3 Verify
```bash
kubectl get node
# worker1 should appear in the list, with STATUS going from NotReady to Ready
```
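For scripted checks it helps to pull a single node's STATUS out of that table. A minimal sketch, assuming the default column layout of `kubectl get node` (NAME first, STATUS second); node_status is an illustrative name.

```bash
# node_status NAME: read `kubectl get node` output on stdin and
# print the STATUS column for the node called NAME.
node_status() {
  awk -v n="$1" 'NR > 1 && $1 == n { print $2 }'
}
```

Usage would look like `kubectl get node | node_status worker1`. To block until the node is ready instead of polling, `kubectl wait --for=condition=Ready node/worker1 --timeout=120s` is an alternative.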
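2. Cleaning up a worker that previously had kubeadm installed
kubeadm reset is the cleanest way to clean up, provided kubeadm is still installed on the system:
```bash
# If kubeadm is still present
kubeadm reset
```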
If kubeadm has already been removed, clean up by hand:
```bash
# Stop and disable the services first
systemctl stop kubelet.service kube-proxy.service
systemctl disable kubelet.service kube-proxy.service
# Remove the packages (CentOS/RHEL)
sudo yum remove -y kubeadm kubectl kubelet kubernetes-cni 'kube*'
sudo yum autoremove -y
# Remove the packages (Ubuntu)
sudo apt-get purge -y kubeadm kubectl kubelet kubernetes-cni 'kube*'
sudo apt-get autoremove -y
# Remove leftover state
rm -rf ~/.kube
rm -rf /var/lib/kube*
# Flush the iptables rules kubeadm leaves behind
# (this flushes every rule in these tables, not only Kubernetes ones)
iptables -F
iptables -t nat -F
iptables -t mangle -F
iptables -X
```
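Before rejoining, it can be worth confirming that nothing is left behind. The helper below is a sketch that reports which of the given paths still exist; the function name is hypothetical, and the path list in the usage note is an assumption about where leftovers typically live, not something taken from the steps above.

```bash
# leftovers PATH...: print each given path that still exists on disk.
leftovers() {
  local p
  for p in "$@"; do
    if [ -e "$p" ]; then
      printf '%s\n' "$p"
    fi
  done
}
```

For example, `leftovers /etc/kubernetes /var/lib/kubelet /etc/cni/net.d "$HOME/.kube"` prints nothing on a clean machine.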
Then follow the brand-new worker procedure.
3. Common pitfalls
3.1 Logs cannot be collected
```
failed to tail file, stat failed
stat /var/log/pods/kube-system_calico-node-95qkq_d7cbeb3d-5f17-40b9-bbb4-6e4cdb04a42e/calico-node/0.log:
no such file or directory
```
Cause: Docker's data directory was moved (for example via data-root).
Fix: move it back to /var/lib/docker, or switch log collection to journal mode, which does not depend on files.
Temporary workaround: delete the calico-node Pod and let it be recreated automatically:
```bash
kubectl delete pod -n kube-system -l k8s-app=calico-node
```
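To see how many log paths are affected, one option is to look for dangling symlinks under /var/log/pods. This sketch assumes that, as is typical with a Docker-backed runtime, the entries there are symlinks into Docker's data directory; broken_links is an illustrative name.

```bash
# broken_links [DIR]: list symlinks under DIR (default /var/log/pods)
# whose targets no longer exist.
broken_links() {
  # `test -e` follows the link, so it fails exactly for dangling symlinks
  find "${1:-/var/log/pods}" -type l ! -exec test -e {} \; -print
}
```

The current data directory can be compared against the expected one with `docker info --format '{{.DockerRootDir}}'`.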
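3.2 kubelet fails on startup with kubelet.go:2286] "Error updating node status"
This usually means the wrong node IP was picked (multi-NIC hosts). Pin the node IP in the kubelet configuration (kubelet-conf.yml).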
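Whichever way the address is pinned, the first step is finding the address of the cluster-facing interface. A small sketch follows; the interface name eth0 and the function name are placeholders, and handing the result to kubelet through its --node-ip command-line flag is one assumed option rather than the setup described here.

```bash
# iface_ipv4 IFNAME: read `ip -4 -o addr show` output on stdin and
# print the IPv4 address (without the prefix length) of IFNAME.
iface_ipv4() {
  awk -v ifname="$1" '$2 == ifname && $3 == "inet" { sub(/\/.*/, "", $4); print $4 }'
}
```

Usage: `ip -4 -o addr show | iface_ipv4 eth0`.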
3.3 kubelet: failed to create kubelet: cgroup driver "cgroupfs" is different from docker "systemd"
Docker's cgroup driver does not match kubelet's. Set it in /etc/docker/daemon.json:
```json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
```
Then restart Docker:
```bash
systemctl restart docker
```
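The two sides can be compared before restarting anything. This sketch reads the cgroupDriver field from a kubelet config file; the helper name is made up, and it assumes the field sits on a single top-level line as in a plain KubeletConfiguration YAML.

```bash
# kubelet_cgroup_driver FILE: print the cgroupDriver value from a
# kubelet config file, with any surrounding quotes removed.
kubelet_cgroup_driver() {
  awk '$1 == "cgroupDriver:" { print $2; exit }' "$1" | tr -d "\"'"
}
```

On a worker, `kubelet_cgroup_driver /etc/kubernetes/kubelet-conf.yml` should print the same value as `docker info --format '{{.CgroupDriver}}'`; if the two differ, kubelet will refuse to start as shown in the error above.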
3.4 kubelet: failed to start ContainerManager ... cgroup ... not found
Swap is still enabled, or the required kernel pieces are not loaded:
```bash
swapoff -a
# Load kernel modules
modprobe br_netfilter
modprobe ip_vs
```
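Whether swap is really off can be read from /proc/swaps, which lists one active swap area per line after a header. A minimal sketch; swap_devices is an illustrative name and the file argument exists only so the function can be tried against a sample file.

```bash
# swap_devices [FILE]: print the active swap areas listed in FILE
# (default /proc/swaps), skipping the header line.
swap_devices() {
  awk 'NR > 1 { print $1 }' "${1:-/proc/swaps}"
}
```

An empty result from `swap_devices` means no swap is active.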
3.5 kubelet.go:1380] "Failed to start cAdvisor" ... Failed to get cgroup stats
The host boots with cgroup v2 and cAdvisor cannot work with it. Fall back to cgroup v1:
```bash
# Add systemd.unified_cgroup_hierarchy=0 to the kernel command line
# in /etc/default/grub:
#   GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"
# then regenerate the GRUB config and reboot
update-grub
reboot
```
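It is worth checking which cgroup version the host is actually on before and after the reboot. The filesystem type of /sys/fs/cgroup tells them apart: cgroup2fs for v2, tmpfs for v1. The sketch below only maps that type to a label; the function name is hypothetical.

```bash
# cgroup_version_of FSTYPE: map the filesystem type of /sys/fs/cgroup
# to a cgroup version label.
cgroup_version_of() {
  case "$1" in
    cgroup2fs) echo v2 ;;
    tmpfs)     echo v1 ;;
    *)         echo unknown ;;
  esac
}
```

Usage: `cgroup_version_of "$(stat -fc %T /sys/fs/cgroup/)"`.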
3.6 kubelet: node "<name>" not found
The node has not registered with the apiserver. Check:
```bash
kubectl get csr
# There should be a Pending CSR (the certificate request sent by the worker)
kubectl certificate approve <csr-name>
```
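When several workers join at once, picking the pending requests out by eye gets tedious. A sketch that filters them, assuming the default `kubectl get csr` table where CONDITION is the last column; pending_csrs is an illustrative name.

```bash
# pending_csrs: read `kubectl get csr` output on stdin and print the
# names of requests whose CONDITION column is exactly "Pending".
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}
```

Usage: `kubectl get csr | pending_csrs`. Each printed name can then be reviewed and passed to `kubectl certificate approve`.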
3.7 Container exits immediately after start
```bash
kubectl describe pod <pod>
# Events show "Back-off pulling image" / "Failed to pull image"
# Fix: check insecure-registries in Docker's daemon.json and the Pod's imagePullSecrets
```
3.8 Node is NotReady
```bash
kubectl describe node <name>
# Conditions:
#   Ready               False
#   NetworkUnavailable  True   (calico not installed)
#   MemoryPressure      False
#   DiskPressure        False
#   PIDPressure         False
```
NetworkUnavailable means the calico DaemonSet is not up yet; restart it.
3.9 Node is Ready but Pods do not start
```bash
kubectl get events -A | grep -i error
```
A common case: calico-kube-controllers is in CrashLoopBackOff. Fix it by re-applying calico.yaml.
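4. Scaling in
To take a worker out of the cluster:
```bash
# Mark the node unschedulable (drain does this too; kept as an explicit first step)
kubectl cordon <node>
# Evict the Pods
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
# Delete the node
kubectl delete node <node>
```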
Clean up on the worker:
```bash
systemctl stop kubelet
systemctl stop kube-proxy
systemctl disable kubelet
systemctl disable kube-proxy
rm -rf /etc/kubernetes
rm -rf /var/lib/kubelet
rm -f /usr/local/bin/kubelet
rm -f /usr/local/bin/kube-proxy
```
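5. In practice: joining worker10 (with troubleshooting)
```bash
# 1. Passwordless SSH
sshpass -p <YOUR_SSH_PASSWORD> ssh-copy-id -o StrictHostKeyChecking=no worker10
# 2. Run the one-shot script on master1
bash join_k8s.sh worker10
# 3. Start the services on worker10
bash worker.sh worker10
# 4. Wait about 30 seconds, then check from master1
kubectl get node
# worker10 shows up
```
Logs were not being collected, so calico-node was deleted and left to be recreated.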
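6. Summary
Scaling nodes out and in looks simple, but there are plenty of hidden pitfalls:
- Brand-new machine: use the join_k8s.sh + worker.sh scripts
- kubeadm leftovers: run kubeadm reset, or purge the packages by hand and flush iptables
- calico-node CrashLoopBackOff: deleting the Pod so it is recreated is the easiest fix
- Matching cgroup drivers are a prerequisite for the node going Ready
- cgroup v2: before K8s 1.24, systemd.unified_cgroup_hierarchy=0 is required
Next: [cert-manager 1.13 automatic certificate management + Ingress TLS automation](/p/k8s-cert-manager-zi dong-zhengshu-guanli-ingress-tls-zidonghua/).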