
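K8s Node Scale-Out: A One-Command Script for Joining Workers to the Cluster

Joining a worker node to an existing K8s cluster: pushing certificates, starting kubelet/kube-proxy, and cleaning up kubeadm leftovers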

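When a K8s Cluster Needs to Scale Out

The cluster has been running for a while, resources are running short, and you need to add new workers. New nodes usually come from one of two places:

  1. A brand-new machine (bare metal + Ubuntu 22.04)
  2. A machine where an earlier kubeadm install failed and that now needs to rejoin

The second case is the tricky one: certificates, configuration files, and iptables rules left behind by kubeadm can all make the node join fail.

Applies to: K8s 1.28.5 / Ubuntu 22.04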


1. Joining a brand-new worker (recommended)

1.1 On master1

Set up hosts

cat << "EOF" >> /etc/hosts
<new worker1 IP> worker1
<new worker2 IP> worker2
EOF
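Running the heredoc above twice leaves duplicate lines in /etc/hosts. A minimal sketch of an idempotent alternative follows; the function name add_host_entry and the 192.0.2.x addresses are hypothetical and only serve as an illustration.

```shell
#!/bin/bash
# add_host_entry FILE IP NAME
# Appends "IP NAME" to FILE unless a non-comment line already maps NAME.
# (add_host_entry is a hypothetical helper, not part of the original scripts.)
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # Compare NAME against every hostname field (columns 2..NF) of non-comment lines.
  if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
    echo "skip: $name already present in $file"
  else
    printf '%s %s\n' "$ip" "$name" >> "$file"
    echo "added: $ip $name"
  fi
}
```

On master1 this would be called as add_host_entry /etc/hosts 192.0.2.11 worker1, with the real worker IP substituted. Matching whole fields rather than substrings keeps worker1 from being confused with worker10.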

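Passwordless SSH login

sshpass -p <YOUR_SSH_PASSWORD> ssh-copy-id -o StrictHostKeyChecking=no worker1

# If worker1 has a stale entry in known_hosts, remove it and retry
ssh-keygen -f "/root/.ssh/known_hosts" -R "worker1"
sshpass -p <YOUR_SSH_PASSWORD> ssh-copy-id -o StrictHostKeyChecking=no worker1

<YOUR_SSH_PASSWORD> is a placeholder: replace it with the real SSH password before running, or export the password first (export SSHPASS=...) and pass it with sshpass -p "$SSHPASS". sshpass -e reads SSHPASS from the environment directly, which keeps the password off the command line.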

Push binaries, certificates, and service configuration

# 1. Push binaries
scp /usr/local/bin/kube{let,-proxy} worker1:/usr/local/bin/
scp /usr/local/bin/cri-dockerd worker1:/usr/local/bin/

# 2. Push certificates and config files
# Note: pki/ca-key.pem is the CA private key; a worker normally does not need it,
# so consider dropping it from the list.
ssh worker1 mkdir -p /etc/kubernetes/{manifests,pki}
for FILE in pki/ca.pem pki/ca-key.pem pki/front-proxy-ca.pem kubelet-conf.yml kube-proxy.yaml bootstrap-kubelet.kubeconfig kubelet.kubeconfig kube-proxy.kubeconfig; do
  scp "/etc/kubernetes/$FILE" "worker1:/etc/kubernetes/${FILE}"
done
scp -r /etc/kubernetes/manifests worker1:/etc/kubernetes

# 3. Push systemd units
for FILE in /etc/systemd/system/cri-docker.service /etc/systemd/system/cri-docker.socket /etc/systemd/system/kubelet.service /etc/systemd/system/kube-proxy.service; do
  scp "$FILE" "worker1:${FILE}"
done

1.2 One-command script (recommended)

Wrap the three steps above in join_k8s.sh and run it on master1:

#!/bin/bash
# join_k8s.sh <worker-name>
# Pushes binaries, certificates, and systemd units from master1 to the worker.
set -euo pipefail
WORKER=${1:?usage: join_k8s.sh <worker-name>}

echo "=== Pushing binaries to $WORKER ==="
scp /usr/local/bin/kube{let,-proxy} "$WORKER":/usr/local/bin/
scp /usr/local/bin/cri-dockerd "$WORKER":/usr/local/bin/

echo "=== Pushing certs ==="
# Spell out both directories so the command does not rely on brace expansion in the remote shell
ssh "$WORKER" "mkdir -p /etc/kubernetes/manifests /etc/kubernetes/pki"
for FILE in pki/ca.pem pki/ca-key.pem pki/front-proxy-ca.pem kubelet-conf.yml kube-proxy.yaml bootstrap-kubelet.kubeconfig kubelet.kubeconfig kube-proxy.kubeconfig; do
  scp "/etc/kubernetes/$FILE" "$WORKER:/etc/kubernetes/${FILE}"
done
scp -r /etc/kubernetes/manifests "$WORKER":/etc/kubernetes

echo "=== Pushing systemd units ==="
for FILE in /etc/systemd/system/cri-docker.service /etc/systemd/system/cri-docker.socket /etc/systemd/system/kubelet.service /etc/systemd/system/kube-proxy.service; do
  scp "$FILE" "$WORKER:${FILE}"
done
echo "=== Done. Now SSH to $WORKER and start services ==="
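If one of the listed files is missing on master1, the push stops partway and the worker ends up with an incomplete set of files. A small pre-flight check, sketched here under the assumption that a helper named missing_files (hypothetical) is acceptable in the script, reports every missing file before anything is copied.

```shell
#!/bin/bash
# missing_files BASE FILE...
# Prints each FILE (relative to BASE) that is not a regular file.
# Returns 1 if anything is missing, 0 otherwise.
# (missing_files is a hypothetical helper, not part of the original join_k8s.sh.)
missing_files() {
  local base="$1" rc=0 f
  shift
  for f in "$@"; do
    if [ ! -f "$base/$f" ]; then
      echo "$f"
      rc=1
    fi
  done
  return "$rc"
}
```

In join_k8s.sh it could be called before the first scp, for example: missing_files /etc/kubernetes pki/ca.pem kubelet-conf.yml kube-proxy.yaml || exit 1.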

worker.sh on the worker

#!/bin/bash
# worker.sh [hostname]
# Run as root on the worker after join_k8s.sh has pushed the files.
echo "=== Disable swap ==="
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab

echo "=== Set hostname ==="
# Use the name passed as the first argument; fall back to the current hostname
hostnamectl set-hostname "${1:-$(hostname)}"

echo "=== Install nfs-common (for NFS PVs) ==="
apt-get install -y nfs-common

echo "=== Reload systemd ==="
systemctl daemon-reload

echo "=== Start services ==="
systemctl enable --now cri-docker.socket
systemctl enable --now cri-docker.service
systemctl enable --now kubelet.service
systemctl enable --now kube-proxy.service
# Restart so that freshly pushed config is picked up even if the services were already running
systemctl restart kubelet.service
systemctl restart kube-proxy.service

echo "=== Status ==="
systemctl status --no-pager cri-docker.socket
systemctl status --no-pager cri-docker.service
systemctl status --no-pager kubelet.service
systemctl status --no-pager kube-proxy.service

echo "=== Logs ==="
# Follows the kubelet log; press Ctrl-C to exit
journalctl -f -u kubelet
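The sed line in worker.sh prefixes every line that contains "swap", including lines that are already commented out, so each rerun adds another #. A rerun-safe variant might look like the sketch below; comment_swap_entries is a hypothetical name, and the pattern assumes GNU sed and fstab lines where swap appears as its own whitespace-delimited field.

```shell
#!/bin/bash
# comment_swap_entries FILE
# Comments out active swap entries in an fstab-style file.
# Lines that already start with '#' are left untouched, so reruns are harmless.
# (comment_swap_entries is a hypothetical helper; requires GNU sed for -r and -i.)
comment_swap_entries() {
  local file="$1"
  sed -ri '/^[[:space:]]*#/! s/^(.*[[:space:]]swap[[:space:]].*)$/#\1/' "$file"
}
```

On the worker it would replace the sed line with comment_swap_entries /etc/fstab.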

Run them in this order (the worker can only start its services after the files have been pushed):

# On master1
bash /data/softs/join_k8s.sh worker1

# On the worker
bash worker.sh worker1

1.3 Verify

kubectl get node
# worker1 should appear in the list, and its STATUS should change from NotReady to Ready
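Instead of rerunning kubectl get node by hand, the check can be polled. The sketch below assumes the default kubectl get node column layout (NAME first, STATUS second); node_status and wait_for_ready are hypothetical helper names, and the 5-second interval and 24 attempts are arbitrary defaults.

```shell
#!/bin/bash
# node_status NAME
# Reads `kubectl get node` output on stdin and prints the STATUS column for NAME.
node_status() {
  awk -v n="$1" 'NR > 1 && $1 == n { print $2 }'
}

# wait_for_ready NAME [TRIES]
# Polls every 5 seconds until NAME reports exactly "Ready" (default: 24 tries, about 2 minutes).
wait_for_ready() {
  local name="$1" tries="${2:-24}" i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$(kubectl get node 2>/dev/null | node_status "$name")" = "Ready" ]; then
      echo "$name is Ready"
      return 0
    fi
    sleep 5
    i=$((i + 1))
  done
  echo "$name did not become Ready" >&2
  return 1
}
```

On master1, wait_for_ready worker1 returns once the node is Ready, or fails after the timeout so a wrapper script can stop early.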

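2. Cleaning up a worker that previously had kubeadm installed

If kubeadm is still installed, kubeadm reset is the cleanest way to clean up:

# If kubeadm is still present (add -f to skip the confirmation prompt)
kubeadm reset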

If kubeadm has already been removed, clean up manually:

# Stop and disable the services first
systemctl stop kubelet.service
systemctl stop kube-proxy.service
systemctl disable kubelet.service
systemctl disable kube-proxy.service

# Remove the packages (CentOS/RHEL)
sudo yum remove -y kubeadm kubectl kubelet kubernetes-cni 'kube*'
sudo yum autoremove -y

# Remove the packages (Ubuntu)
sudo apt-get purge -y kubeadm kubectl kubelet kubernetes-cni 'kube*'
sudo apt-get autoremove -y

# Remove leftovers (state, old certificates and config files, CNI config)
rm -rf ~/.kube
rm -rf /var/lib/kube*
rm -rf /etc/kubernetes /etc/cni/net.d

# Flush iptables rules (kubeadm leaves these behind)
iptables -F
iptables -t nat -F
iptables -t mangle -F
iptables -X

Then follow the "brand-new worker" procedure in section 1.
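Before rejoining, it can help to confirm that nothing from the old install is still on disk. The sketch below checks a handful of typical locations; the path list and the name list_leftovers are assumptions made for illustration, not an exhaustive inventory of what kubeadm writes.

```shell
#!/bin/bash
# list_leftovers [ROOT]
# Prints the kubeadm-era paths that still exist under ROOT (default: the real filesystem root).
# The path list is an assumption covering common locations only.
list_leftovers() {
  local root="${1:-}" p
  for p in /etc/kubernetes /var/lib/kubelet /etc/cni/net.d /root/.kube; do
    if [ -e "$root$p" ]; then
      echo "$p"
    fi
  done
}
```

Running list_leftovers on the worker should print nothing after a complete cleanup. The optional ROOT argument exists only so the function can be tried against a scratch directory.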


3. Common pitfalls

3.1 Logs cannot be collected

failed to tail file, stat failed
stat /var/log/pods/kube-system_calico-node-95qkq_d7cbeb3d-5f17-40b9-bbb4-6e4cdb04a42e/calico-node/0.log:
no such file or directory

Cause: Docker's data directory was changed (for example via data-root).

Fix: change it back to /var/lib/docker, or switch log collection to journal mode, which does not depend on log files.

Temporary workaround: delete the calico-node Pods so that they are recreated automatically:

kubectl delete pod -n kube-system -l k8s-app=calico-node

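3.2 kubelet reports kubelet.go:2286] "Error updating node status"

This usually means the wrong node IP was selected (multi-NIC hosts). KubeletConfiguration has no nodeIP field in 1.28, so instead of editing kubelet-conf.yml, add the --node-ip flag to the kubelet command line in kubelet.service, then run systemctl daemon-reload and restart kubelet:

--node-ip=<fixed IP>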

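3.3 kubelet: failed to create kubelet: cgroup driver "cgroupfs" is different from docker "systemd"

The cgroup driver used by Docker does not match the one used by kubelet. Make both use systemd: set cgroupDriver: systemd in kubelet-conf.yml, and make sure /etc/docker/daemon.json contains:

{
  "exec-opts": ["native.cgroupdriver=systemd"]
}

systemctl restart docker
systemctl restart kubelet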
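To confirm the two drivers agree before restarting anything, both values can be read and compared. This is a rough sketch: kubelet_cgroup_driver and drivers_match are hypothetical helpers, and the parser assumes cgroupDriver appears as a top-level "cgroupDriver: value" line in the kubelet config file.

```shell
#!/bin/bash
# kubelet_cgroup_driver FILE
# Prints the cgroupDriver value from a KubeletConfiguration YAML file.
# Prints nothing if the key is absent.
kubelet_cgroup_driver() {
  awk '$1 == "cgroupDriver:" { gsub(/"/, "", $2); print $2; exit }' "$1"
}

# drivers_match DOCKER_DRIVER KUBELET_DRIVER
# Succeeds only when both values are non-empty and identical.
drivers_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}
```

On the worker, a check could read: drivers_match "$(docker info --format '{{.CgroupDriver}}')" "$(kubelet_cgroup_driver /etc/kubernetes/kubelet-conf.yml)" || echo "cgroup driver mismatch". An empty kubelet value is treated as a mismatch here, which forces the setting to be made explicit.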

3.4 kubelet: failed to start ContainerManager ... cgroup ... not found

Swap is still enabled, or the cgroup subsystem or required kernel modules are not loaded:

swapoff -a
# Load kernel modules
modprobe br_netfilter
modprobe ip_vs

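3.5 kubelet.go:1380] "Failed to start cAdvisor" ... Failed to get cgroup stats

The kernel is running cgroup v2, which is incompatible with the cAdvisor in kubelet versions before K8s 1.24. Fall back to cgroup v1:

# Add systemd.unified_cgroup_hierarchy=0 to the kernel boot parameters
# in /etc/default/grub (append it to any parameters already on this line)
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"

update-grub
reboot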
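Before editing GRUB, it is worth checking which cgroup version the node is actually running. The filesystem type of /sys/fs/cgroup tells them apart: cgroup2fs means v2 and tmpfs means v1. The wrapper below is a small sketch; cgroup_version is a hypothetical name.

```shell
#!/bin/bash
# cgroup_version FSTYPE
# Maps the filesystem type reported for /sys/fs/cgroup to a cgroup version.
# (cgroup_version is a hypothetical helper.)
cgroup_version() {
  case "$1" in
    cgroup2fs) echo v2 ;;
    tmpfs)     echo v1 ;;
    *)         echo unknown ;;
  esac
}
```

On the node: cgroup_version "$(stat -fc %T /sys/fs/cgroup/)". Rerunning it after the reboot confirms whether the kernel parameter took effect.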

3.6 kubelet: node "<name>" not found

The node has not registered with the apiserver. Check for pending certificate requests:

kubectl get csr
# There should be a Pending CSR (the certificate request sent by the worker)

kubectl certificate approve <csr-name>
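When several workers join at once, copying CSR names by hand gets tedious. The sketch below extracts the pending names from the default kubectl get csr table, assuming CONDITION is the last column; pending_csrs is a hypothetical helper name.

```shell
#!/bin/bash
# pending_csrs
# Reads `kubectl get csr` output on stdin and prints the names whose CONDITION is exactly "Pending".
# (pending_csrs is a hypothetical helper; it assumes CONDITION is the last column.)
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}
```

A possible use on master1 is kubectl get csr | pending_csrs | xargs -r kubectl certificate approve. Review the list first, because this approves every pending request, not only those from the new worker.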

3.7 Containers exit right after starting

kubectl describe pod <pod>
# Events show "Back-off pulling image" / "Failed to pull image"
# Fix: check insecure-registries in Docker's daemon.json and the Pod's imagePullSecrets

3.8 Node is NotReady

kubectl describe node <name>
# Conditions:
#   Ready               False
#   NetworkUnavailable  True (calico is not installed)
#   MemoryPressure      False
#   DiskPressure        False
#   PIDPressure         False

NetworkUnavailable means the calico DaemonSet is not up yet; restart it.

3.9 Node is Ready but Pods will not start

kubectl get events -A | grep -i error

A common cause: calico-kube-controllers is in CrashLoopBackOff; re-apply calico.yaml.


4. Scaling in

To remove a worker from the cluster:

# Mark the node unschedulable
kubectl cordon <node>
# Evict Pods
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
# Delete the node
kubectl delete node <node>
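The three commands can be wrapped so that a failed drain never leads to a deleted node. The sketch below adds a dry-run switch for previewing; run, remove_node, and the DRY_RUN variable are hypothetical names introduced for this example.

```shell
#!/bin/bash
# run CMD...
# Prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# remove_node NODE
# Cordons, drains, then deletes the node; stops at the first step that fails.
remove_node() {
  local node="$1"
  run kubectl cordon "$node" &&
    run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data &&
    run kubectl delete node "$node"
}
```

DRY_RUN=1 remove_node worker1 prints the three commands without touching the cluster; dropping DRY_RUN executes them.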

Clean up on the worker:

systemctl stop kubelet
systemctl stop kube-proxy
systemctl disable kubelet
systemctl disable kube-proxy

rm -rf /etc/kubernetes
rm -rf /var/lib/kubelet
rm -f /usr/local/bin/kubelet
rm -f /usr/local/bin/kube-proxy

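5. Hands-on: joining worker10 (with troubleshooting)

# 1. Passwordless SSH
sshpass -p <YOUR_SSH_PASSWORD> ssh-copy-id -o StrictHostKeyChecking=no worker10

# 2. Run the one-command script on master1
bash join_k8s.sh worker10

# 3. Start services on worker10
bash worker.sh worker10

# 4. Wait about 30 seconds, then check from master1
kubectl get node
# worker10 appears

If logs cannot be collected, delete the calico-node Pods so that they are recreated.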


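6. Summary

Scaling nodes out or in looks simple, but there are plenty of hidden pitfalls:

  1. Brand-new machine: use the join_k8s.sh + worker.sh scripts
  2. kubeadm leftovers: run kubeadm reset, or manually remove the packages and flush iptables
  3. calico-node CrashLoopBackOff: deleting the Pod so that it is recreated is the easiest fix
  4. Matching cgroup drivers are a prerequisite for the node to become Ready
  5. cgroup v2: K8s versions before 1.24 need systemd.unified_cgroup_hierarchy=0

Next: [cert-manager 1.13 automatic certificate management + Ingress TLS automation](/p/k8s-cert-manager-zi dong-zhengshu-guanli-ingress-tls-zidonghua/).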
