
K8s Kubeadm One-Command Deployment: From Node Planning to Application Rollout

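kubeadm is the cluster bootstrap tool recommended by the Kubernetes project; a single kubeadm init brings up a control plane. This post covers host planning, containerd installation, kubeadm init/join, the Calico CNI, cleanup with kubeadm reset, common pitfalls, and the version upgrade path.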

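Written 2022-06. Background: K8s 1.24 removed dockershim for good, and 1.25 is GA. The post was drafted around kubeadm 1.25 + containerd 1.7 + Ubuntu 22.04; the commands below pin Kubernetes v1.28.2 and containerd 1.7.11. It walks through the complete kubeadm flow, from a single machine to a multi-node cluster.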

1. Kubeadm vs. binary deployment

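| Dimension | kubeadm | Binary |
|---|---|---|
| Learning cost | Low (one-line init) | High (30+ steps) |
| Flexibility | Medium (kubeadm configuration options are tunable) | Very high (you write every systemd unit yourself) |
| Upgrade | Official kubeadm upgrade, one command | Replace binaries and edit configs by hand |
| Best for | First choice for production | Learning / deep customization |
| Certificate management | Issued automatically by kubeadm; valid for 1 year and renewed during kubeadm upgrade | Sign and renew yourself |
| Cluster bootstrap | kubeadm deploys all control-plane components automatically | Start 6+ systemd units yourself |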

Conclusion: use kubeadm in production and the binary route for learning. Combining the two is ideal.

2. Cluster planning

3 master + N worker HA cluster (recommended for production):

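| Hostname | IP | Role | OS |
|---|---|---|---|
| master1 | 192.168.139.133 | master | Ubuntu 22.04 |
| master2 | 192.168.139.134 | master | Ubuntu 22.04 |
| master3 | 192.168.139.135 | master | Ubuntu 22.04 |
| worker1 | 192.168.139.136 | worker | Ubuntu 22.04 |
| worker2 | 192.168.139.137 | worker | Ubuntu 22.04 |
| VIP | 192.168.139.150 | floating (Keepalived) | - |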
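
The choice of 3 masters follows from etcd's majority quorum: with the stacked etcd that kubeadm sets up by default, each master runs one etcd member, and the cluster stays writable only while a majority of members is up. A minimal sketch of that arithmetic (the function names are illustrative, not part of any tool):

```shell
# Quorum size for an etcd cluster with N members: floor(N/2) + 1
etcd_quorum() { echo $(( $1 / 2 + 1 )); }

# Number of members that can fail while a quorum survives: N - quorum
etcd_fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 5; do
  echo "masters=$n quorum=$(etcd_quorum "$n") tolerates=$(etcd_fault_tolerance "$n")"
done
```

Two masters tolerate no more failures than one, which is why the plan jumps straight from 1 to 3.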

Single-master cluster (testing):

| Hostname | IP |
|---|---|
| master1 | 192.168.139.133 |
| worker1 | 192.168.139.136 |

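Network plan

  • Node network: 192.168.139.0/24
  • Service network: 10.96.0.0/12
  • Pod network: 172.218.0.0/16 (Calico). Note that this range lies outside RFC 1918 private space (172.16.0.0/12 ends at 172.31.255.255), so pods cannot reach public hosts that happen to live in 172.218.0.0/16.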
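
The three ranges must not overlap, otherwise routing between nodes, Services, and pods becomes ambiguous. A minimal sketch of an overlap check in plain shell arithmetic (the helper names are hypothetical; IPv4 only):

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Print "first last" (as integers) for a CIDR such as 10.96.0.0/12.
cidr_range() (
  base=$(ip_to_int "${1%/*}")
  bits=${1#*/}
  size=$(( 1 << (32 - bits) ))
  first=$(( base / size * size ))   # align to the network boundary
  echo "$first $(( first + size - 1 ))"
)

# Succeed (exit 0) when the two CIDRs share at least one address.
cidr_overlap() (
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

# Check every pair from the plan above
for pair in "192.168.139.0/24 10.96.0.0/12" \
            "192.168.139.0/24 172.218.0.0/16" \
            "10.96.0.0/12 172.218.0.0/16"; do
  set -- $pair
  if cidr_overlap "$1" "$2"; then echo "OVERLAP: $1 $2"; else echo "ok: $1 $2"; fi
done
```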

3. System configuration on all nodes

3.1 Hostname + hosts

# On each node
hostnamectl set-hostname master1  # use that node's own name

# On all nodes
cat >> /etc/hosts <<EOF
192.168.139.133 master1
192.168.139.134 master2
192.168.139.135 master3
192.168.139.136 worker1
192.168.139.137 worker2
EOF
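
Appending with cat >> adds the same lines again every time the step is rerun. A minimal sketch of an idempotent alternative, assuming a hypothetical helper add_host_entry whose third argument (the file path) defaults to /etc/hosts:

```shell
# Append "IP NAME" to a hosts file unless NAME already has an entry.
add_host_entry() (
  ip=$1; name=$2; file=${3:-/etc/hosts}
  if ! grep -Eq "[[:space:]]${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
)

# Demo against a scratch file so nothing on the system is touched
demo=$(mktemp)
add_host_entry 192.168.139.133 master1 "$demo"
add_host_entry 192.168.139.133 master1 "$demo"   # second call is a no-op
cat "$demo"
rm -f "$demo"
```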

3.2 Disable swap / firewall / SELinux

# swap (by default kubelet refuses to start while swap is enabled)
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab

# Firewall (K8s manages iptables rules itself)
systemctl stop ufw && systemctl disable ufw
# CentOS: systemctl stop firewalld && systemctl disable firewalld

# SELinux (CentOS only; skip on Ubuntu)
setenforce 0
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
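
swapoff -a only affects the running system and the sed edit only affects the next boot, so it is worth confirming both. A minimal sketch, assuming hypothetical helper names and the standard /proc/swaps and /etc/fstab layouts:

```shell
# Count active swap areas: /proc/swaps has one header line, then one line per area.
active_swap_count() {
  awk 'NR > 1 { n++ } END { print n + 0 }' "${1:-/proc/swaps}"
}

# Count fstab entries of type swap that are not commented out.
fstab_swap_count() {
  awk '$1 !~ /^#/ && $3 == "swap" { n++ } END { print n + 0 }' "${1:-/etc/fstab}"
}

if [ -r /proc/swaps ]; then echo "active swap areas: $(active_swap_count)"; fi
if [ -r /etc/fstab ]; then echo "fstab swap entries: $(fstab_swap_count)"; fi
```

Both counts should be 0 before kubeadm init is run.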

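3.3 Time synchronization

timedatectl set-timezone Asia/Shanghai
apt install -y ntpdate
# Sync once an hour. Install the job through crontab rather than writing the spool file directly,
# and use the full path because cron runs with a minimal PATH.
(crontab -l 2>/dev/null; echo "0 * * * * /usr/sbin/ntpdate time1.aliyun.com") | crontab -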

3.4 Kernel modules and parameters

# Load modules
cat <<EOF > /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

# Kernel parameters
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sysctl --system
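
sysctl --system applies the file, but if br_netfilter is not loaded the net.bridge.* keys simply do not exist. A minimal sketch that compares a sysctl.d file with the live values; the helper name and its second argument (an alternate /proc/sys root, handy for dry runs) are assumptions:

```shell
# Compare each "key = value" line of a sysctl.d file with the live value.
# Prints one MISMATCH line per difference and exits non-zero if any is found.
check_sysctl_file() (
  conf=$1; procroot=${2:-/proc/sys}; rc=0
  while IFS='=' read -r key want; do
    key=$(printf '%s' "$key" | tr -d ' \t')
    want=$(printf '%s' "$want" | tr -d ' \t')
    case $key in ''|'#'*) continue ;; esac          # skip blanks and comments
    path="$procroot/$(printf '%s' "$key" | tr . /)"  # net.ipv4.ip_forward -> net/ipv4/ip_forward
    have=$(cat "$path" 2>/dev/null || true)
    if [ "$have" != "$want" ]; then
      echo "MISMATCH $key: want $want, have ${have:-<missing>}"
      rc=1
    fi
  done < "$conf"
  exit "$rc"
)

# Usage: check_sysctl_file /etc/sysctl.d/k8s.conf
```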

3.5 ulimit / IPVS (master nodes)

ulimit -SHn 65535
# The hard nofile limit must be at least as large as the soft limit
cat >> /etc/security/limits.conf <<EOF
* soft nofile 655360
* hard nofile 655360
* soft nproc 655350
* hard nproc 655350
EOF

apt install -y ipset ipvsadm
cat <<EOF | tee /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
systemctl restart systemd-modules-load.service
lsmod | grep ip_vs
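
In limits.conf a soft limit above its hard limit cannot take effect as written, so the pairs are worth checking after editing. A minimal sketch (check_limits is a hypothetical helper; it compares numeric values only and ignores entries such as unlimited):

```shell
# Report domain/item pairs whose soft limit exceeds the hard limit.
# Exits non-zero when at least one such pair is found.
check_limits() {
  awk '
    $1 !~ /^#/ && NF == 4 && ($2 == "soft" || $2 == "hard") {
      val[$1 " " $3 " " $2] = $4     # e.g. val["* nofile soft"] = 655360
      pair[$1 " " $3] = 1
    }
    END {
      bad = 0
      for (k in pair) {
        s = val[k " soft"]; h = val[k " hard"]
        if (s ~ /^[0-9]+$/ && h ~ /^[0-9]+$/ && s + 0 > h + 0) {
          print "soft > hard for " k ": " s " > " h
          bad = 1
        }
      }
      exit bad
    }' "${1:-/etc/security/limits.conf}"
}

# Usage: check_limits            (reads /etc/security/limits.conf)
```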

4. Install containerd (K8s 1.24+)

K8s 1.24 removed dockershim, so a CRI runtime such as containerd or CRI-O has to be installed.

# Download
wget https://github.com/containerd/containerd/releases/download/v1.7.11/cri-containerd-1.7.11-linux-amd64.tar.gz

# Extract into the root directory
tar Cxzvf / cri-containerd-1.7.11-linux-amd64.tar.gz

# Generate the default config
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml

# Point sandbox_image at the Aliyun mirror (matches whatever default the generated file contains)
sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"#' /etc/containerd/config.toml

# Switch to the systemd cgroup driver
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml

# Start
systemctl daemon-reload
systemctl enable --now containerd
systemctl status containerd
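
A sed substitution fails silently when its pattern does not match the generated file, so it helps to read the two values back. A minimal sketch, assuming a hypothetical helper that takes the config path as an optional argument:

```shell
# Fail if SystemdCgroup is not true; otherwise print the configured sandbox image.
check_containerd_config() (
  file=${1:-/etc/containerd/config.toml}
  if ! grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$file"; then
    echo "SystemdCgroup is not set to true" >&2
    exit 1
  fi
  # Extract the quoted value of sandbox_image
  sed -n -E 's/^[[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*"([^"]*)".*/\1/p' "$file"
)

# Usage: check_containerd_config   (should print the Aliyun pause image)
```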

5. Install kubeadm / kubelet / kubectl

5.1 Aliyun mirror repository (recommended for the cn region)

curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | \
  gpg --dearmor > /usr/share/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/usr/share/keyrings/kubernetes-apt-keyring.gpg] https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main' | \
  sudo tee /etc/apt/sources.list.d/kubernetes.list
apt update

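5.2 Install

# List the available versions
apt-cache madison kubeadm

# Install a pinned version (1.28.2 as published on the Aliyun mirror; copy the exact string from the madison output)
apt install -y kubelet=1.28.2-00 kubeadm=1.28.2-00 kubectl=1.28.2-00

# Hold the packages (so apt upgrade cannot bump them by accident)
apt-mark hold kubelet kubeadm kubectl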

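5.3 Shell completion (all nodes)

echo 'source <(kubectl completion bash)' >> ~/.bashrc
# Persist the alias as well; a bare alias/complete only lasts for the current shell
echo 'alias k=kubectl' >> ~/.bashrc
echo 'complete -o default -F __start_kubectl k' >> ~/.bashrc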

6. Initialize the master node

6.1 Single-master cluster

# Pre-pull the images (optional, but recommended for production)
kubeadm config images pull \
  --image-repository=registry.aliyuncs.com/google_containers \
  --kubernetes-version=v1.28.2

# Initialize (the admin kubeconfig is written to /etc/kubernetes/admin.conf)
kubeadm init \
  --image-repository=registry.aliyuncs.com/google_containers \
  --kubernetes-version=v1.28.2 \
  --pod-network-cidr=172.218.0.0/16 \
  --service-cidr=10.96.0.0/12

6.2 High-availability cluster (3 masters)

First put the VIP into kubeadm-config.yaml:

kubeadm config print init-defaults > kubeadm-config.yaml

# Edit kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.139.133   # master1 IP
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.2
controlPlaneEndpoint: "192.168.139.150:6443"  # VIP (floats via Keepalived)
imageRepository: registry.aliyuncs.com/google_containers
networking:
  podSubnet: 172.218.0.0/16
  serviceSubnet: 10.96.0.0/12
# The etcd client URLs are left to kubeadm, which derives them per node.
# ClusterConfiguration is shared by every control-plane node, so hard-coding
# master1's IP under etcd.local.extraArgs would also be applied to master2/master3.
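
# Initialize
kubeadm init --config kubeadm-config.yaml --upload-certs

# Note: --upload-certs stores the control-plane certificates, encrypted, in the
# kubeadm-certs Secret in kube-system; that Secret expires after 2 hours
# Other masters then pass --certificate-key to kubeadm join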

Save the join commands

# The end of the init output contains the join commands; copy and save them
# Other masters:
kubeadm join 192.168.139.150:6443 --token xxx --discovery-token-ca-cert-hash sha256:xxx --control-plane --certificate-key xxx

# Workers:
kubeadm join 192.168.139.150:6443 --token xxx --discovery-token-ca-cert-hash sha256:xxx

6.3 Configure kubectl

mkdir -p $HOME/.kube
cp -f /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# Verify
kubectl get nodes
# master1   NotReady   control-plane   1m   v1.28.2  <- NotReady because no CNI is installed yet

7. Install the CNI (Calico)

# Tigera Operator
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/tigera-operator.yaml

# Custom Resources
cat <<EOF | kubectl apply -f -
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
      - blockSize: 26
        cidr: 172.218.0.0/16
        encapsulation: VXLANCrossSubnet
        natOutgoing: Enabled
        nodeSelector: all()
---
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
EOF

# Wait until all Calico pods are Running
kubectl get pods -n calico-system -w
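
Watching with -w needs a person at the terminal. For scripted installs, a small retry loop works with any readiness command. A minimal sketch; retry_until is a hypothetical helper, and the kubectl wait call in the usage comment is one possible readiness check:

```shell
# retry_until TRIES DELAY CMD...
# Run CMD until it succeeds, at most TRIES times, sleeping DELAY seconds
# between attempts. Returns non-zero if every attempt fails.
retry_until() (
  tries=$1; delay=$2; shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$tries" ]; then exit 1; fi
    n=$((n + 1))
    sleep "$delay"
  done
)

# Usage (on a node with a working kubeconfig):
#   retry_until 30 10 kubectl wait --for=condition=Ready pods --all -n calico-system --timeout=10s
```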

8. Join the remaining nodes

8.1 Join master nodes

# Run on master2
kubeadm join 192.168.139.150:6443 \
  --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:xxx \
  --control-plane \
  --certificate-key xxx

8.2 Join worker nodes

# Run on worker1
kubeadm join 192.168.139.150:6443 \
  --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:xxx
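
A mistyped token makes kubeadm join fail early. Bootstrap tokens have a fixed shape (6 lowercase alphanumerics, a dot, then 16 lowercase alphanumerics), so the shape can be checked before the join is attempted. A minimal sketch with a hypothetical helper name:

```shell
# Succeed when the argument has the bootstrap-token shape [a-z0-9]{6}.[a-z0-9]{16}
is_bootstrap_token() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

for t in abcdef.0123456789abcdef xxx; do
  if is_bootstrap_token "$t"; then echo "$t: looks valid"; else echo "$t: malformed"; fi
done
```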

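8.3 Expired token

Regenerate on a master node:

kubeadm token create --print-join-command
# Prints a ready-to-use worker join command
# For an additional master, also re-upload the certificates and append
# --control-plane --certificate-key <key> to the join command:
kubeadm init phase upload-certs --upload-certs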
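
The --discovery-token-ca-cert-hash value is the SHA-256 digest of the cluster CA's public key, so it can be recomputed on a master at any time. A sketch of the openssl pipeline from the kubeadm documentation, wrapped in a hypothetical helper; it assumes an RSA CA key (the kubeadm default) and openssl on the PATH:

```shell
# Print the hex SHA-256 of the CA public key (DER-encoded SubjectPublicKeyInfo).
ca_cert_hash() {
  openssl x509 -pubkey -noout -in "${1:-/etc/kubernetes/pki/ca.crt}" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'
}

# Usage: echo "sha256:$(ca_cert_hash)"
```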

8.4 Verify the cluster

kubectl get nodes
# NAME      STATUS   ROLES           AGE     VERSION
# master1   Ready    control-plane   10m     v1.28.2
# master2   Ready    control-plane   5m      v1.28.2
# master3   Ready    control-plane   3m      v1.28.2
# worker1   Ready    <none>          1m      v1.28.2
# worker2   Ready    <none>          1m      v1.28.2
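
Reading the table by eye stops scaling after a handful of nodes. A minimal sketch of a filter that lists every node that is not Ready (the helper name is hypothetical; it reads the kubectl output from stdin):

```shell
# Reads `kubectl get nodes --no-headers` output on stdin and prints every node
# whose STATUS column is not exactly "Ready". Cordoned nodes show up as
# "Ready,SchedulingDisabled" and are therefore listed too.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage: kubectl get nodes --no-headers | not_ready_nodes
```

Empty output means every node reports Ready.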

9. Deploy a test application

# 1. Deploy a Deployment + Service (Nginx)
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 30080
EOF

# 2. Inspect
kubectl get pods
kubectl get svc nginx
# NAME    TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
# nginx   NodePort   10.108.108.103  <none>        80:30080/TCP   30s

# 3. Test access
curl http://<any-node-IP>:30080/
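
nodePort: 30080 is accepted because it falls inside the API server's default NodePort range, 30000-32767; a value outside that range is rejected when the Service is applied. A minimal sketch of the bounds check (the helper name and the overridable bounds are illustrative):

```shell
# valid_nodeport PORT [MIN] [MAX]: succeed when PORT is a number within the range.
valid_nodeport() (
  port=$1; min=${2:-30000}; max=${3:-32767}
  case $port in ''|*[!0-9]*) exit 1 ;; esac   # digits only
  [ "$port" -ge "$min" ] && [ "$port" -le "$max" ]
)

for p in 30080 8080 32768; do
  if valid_nodeport "$p"; then echo "$p: in range"; else echo "$p: outside 30000-32767"; fi
done
```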

10. Tear down the cluster

10.1 Gracefully reset a master

# Drain the node first
kubectl drain master1 --delete-emptydir-data --force --ignore-daemonsets

# Delete all Node objects
kubectl delete node --all

# Reset
kubeadm reset
rm -rf /etc/cni/net.d/*
rm -rf /var/lib/cni/calico
rm -rf $HOME/.kube
systemctl daemon-reload && systemctl restart kubelet

10.2 Clean up a worker node

kubeadm reset
# Clear the IPVS rules
ipvsadm --clear

10.3 Manual cleanup (when kubeadm reset fails)

# Stop services (under kubeadm, kube-proxy runs as a DaemonSet, so it has no systemd unit)
systemctl stop kubelet containerd
systemctl disable kubelet containerd

# Remove files
rm -rf /var/lib/kubelet
rm -rf /var/lib/etcd          # control-plane nodes only
rm -rf /var/lib/containerd
rm -rf /etc/kubernetes
rm -rf /etc/cni/net.d
rm -rf /opt/cni/bin
rm -rf /var/log/pods
rm -rf /var/log/containers

# Flush iptables
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm --clear
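
After a manual cleanup it is easy to miss a directory. A minimal sketch that lists which of the paths from the cleanup step still exist; the helper name and the optional root-prefix argument (useful for trying it against a scratch directory) are assumptions:

```shell
# Print every Kubernetes-related path that still exists under an optional root prefix.
leftover_k8s_paths() (
  root=${1:-}
  for p in /var/lib/kubelet /var/lib/containerd /etc/kubernetes /etc/cni/net.d \
           /opt/cni/bin /var/log/pods /var/log/containers; do
    if [ -e "$root$p" ]; then echo "$p"; fi
  done
)

# Usage: leftover_k8s_paths     (no output means the node is clean)
```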

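11. Upgrade the cluster

# 1. Upgrade kubeadm (start with the first master)
# The legacy kubernetes-xenial repository may not carry the target version;
# confirm the exact version string with apt-cache madison kubeadm first
apt-mark unhold kubeadm
apt install -y kubeadm=1.28.5-00
apt-mark hold kubeadm

# 2. Review the upgrade plan
kubeadm upgrade plan

# 3. Drain the node
kubectl drain master1 --ignore-daemonsets

# 4. Upgrade the control plane
kubeadm upgrade apply v1.28.5

# 5. Upgrade kubelet + kubectl (held packages must be released first)
apt-mark unhold kubelet kubectl
apt install -y kubelet=1.28.5-00 kubectl=1.28.5-00
apt-mark hold kubelet kubectl
systemctl daemon-reload
systemctl restart kubelet

# 6. uncordon
kubectl uncordon master1

# 7. Repeat on the other masters and workers, using "kubeadm upgrade node" in step 4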
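
kubeadm upgrades move within a minor release or forward by exactly one minor release; skipping a minor version is not supported. A minimal sketch of that rule as a pre-check (the helper names are hypothetical; versions are expected in the form v1.28.5):

```shell
# Split "v1.28.5" into "1 28 5".
split_version() ( IFS=.; set -- ${1#v}; echo "$1 $2 $3" )

# can_upgrade CURRENT TARGET: succeed for a newer patch of the same minor,
# or for any patch of the next minor; fail for everything else.
can_upgrade() (
  set -- $(split_version "$1") $(split_version "$2")
  # $1.$2.$3 is the current version, $4.$5.$6 the target
  [ "$1" -eq "$4" ] || exit 1
  if [ "$5" -eq "$2" ]; then
    [ "$6" -gt "$3" ]
  else
    [ "$5" -eq $(( $2 + 1 )) ]
  fi
)

for target in v1.28.5 v1.29.0 v1.30.0; do
  if can_upgrade v1.28.2 "$target"; then
    echo "v1.28.2 -> $target: ok"
  else
    echo "v1.28.2 -> $target: not possible in one step"
  fi
done
```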

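12. Common pitfalls

  1. Expired token: tokens last 24h by default; once expired, nodes cannot join until a new one is generated (see 8.3)
  2. Converting a single master to HA: kubeadm cannot do this for a cluster initialized without controlPlaneEndpoint; the cluster has to be reset and reinstalled
  3. Joining workers before the CNI is installed: workers stay NotReady; install Calico first
  4. controlPlaneEndpoint set to master1's IP: the cluster loses its API endpoint as soon as master1 goes down; use a VIP or a DNS name
  5. image-repository not set: image pulls time out from networks in mainland China
  6. kubeadm reset fails: clean up manually (see 10.3)
  7. Calico / Calico Typha short on memory: nodes with less than 2 GB of RAM stall; add a --memory limit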
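
Pitfall 7 can be caught before installation by reading MemTotal from /proc/meminfo. A minimal sketch; the helper names are hypothetical, and the 2048 MB default is an assumption taken from the 2 GB figure above. A machine sold as 2 GB usually reports slightly less, which is why the threshold is a parameter:

```shell
# Print total memory in MB as reported by a meminfo-style file.
mem_total_mb() {
  awk '$1 == "MemTotal:" { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# has_enough_memory [MIN_MB] [MEMINFO_FILE]: succeed when MemTotal >= MIN_MB.
has_enough_memory() (
  need=${1:-2048}
  have=$(mem_total_mb "${2:-/proc/meminfo}")
  [ "${have:-0}" -ge "$need" ]
)

# Usage: has_enough_memory 2048 || echo "less than 2 GB of RAM: Calico may stall"
```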

13. Prerequisites / next steps

Prerequisites

Next steps

  1. K8s cluster add-ons (2021-09-15): CNI/CoreDNS/Metrics/Dashboard
  2. K8s resource limits and probes (2022-03-15): Pod tuning
  3. K8s cluster management (2021-12-15): upgrades / node isolation
