
K8s API Server High Availability: Nginx Layer 4 + Keepalived VIP Failover

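How does a 3-master cluster expose a single entry point? This post uses an Nginx stream (Layer 4) proxy plus Keepalived VRRP to float a VIP between nodes, covering the full build, configuration, systemd units, health-check script, and failure drills.

Written in 2021-03. Background: K8s 1.21 was at the GA stage, and binary (non-packaged) HA deployments were still mainstream at large Chinese companies. This post uses source-built Nginx 1.24 + Keepalived 2.2 to give the API Server a single entry point with VIP failover.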

1. Why a VIP is needed

In a 3-master cluster, the apiserver listens on three different IP:6443 endpoints:

master1 → 192.168.139.133:6443
master2 → 192.168.139.134:6443
master3 → 192.168.139.135:6443

But kubectl can only be configured with one --server:

kubectl --server=https://192.168.139.133:6443 get nodes
# What if master1 goes down?

The fix: put a virtual IP (VIP) in front of the master nodes. The nodes use VRRP to elect a VIP holder, so the VIP always sits on one node that is still healthy.

kubectl → VIP:6443 (192.168.139.150)
              ↓ VRRP
     ┌────┴────┬────────┐
   master1   master2  master3
   :6443     :6443    :6443

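2. Choosing an approach

| Option | Pros | Cons |
|---|---|---|
| Nginx + Keepalived | Mature and stable; handles both Layer 4 and Layer 7 | Requires compiling from source |
| HAProxy + Keepalived | Commonly regarded as faster than Nginx | Slightly more complex configuration |
| Envoy + Keepalived | Advanced routing | Too heavyweight |
| Public-cloud SLB | Zero operations | Vendor lock-in |

This post uses Nginx stream + Keepalived: it is the most common combination and the configuration is easy to read.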

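3. Cluster plan

| Hostname | IP | Role |
|---|---|---|
| master1 | 192.168.139.133 | nginx + keepalived MASTER |
| master2 | 192.168.139.134 | nginx + keepalived BACKUP |
| master3 | 192.168.139.135 | nginx + keepalived BACKUP |
| VIP | 192.168.139.150 | floating address |

Port conventions

  • apiserver: 6443
  • nginx stream listener: 8443 (clients connect to VIP:8443, which is forwarded to the apiserver on 6443)
  • keepalived VRRP: IP protocol number 112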

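4. Compile Nginx (master1 + master2)

Installing nginx + keepalived on two nodes is enough for this walkthrough (master3 can be skipped to save resources). If all three nodes run them, as the plan table lists, Keepalived election still works.

4.1 Install build dependencies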

apt install -y make g++ openssl libssl-dev libpcre3 libpcre3-dev zlib1g-dev libgd-dev

4.2 Compile Nginx

cd /k8s/softs
tar xf nginx-1.24.0.tar.gz && cd nginx-1.24.0

./configure \
  --prefix=/usr/local/nginx \
  --with-http_ssl_module \
  --with-http_stub_status_module \
  --with-pcre \
  --with-http_gzip_static_module \
  --with-stream \
  --with-stream_ssl_module

make -j$(nproc) && make install

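Key flags: --with-stream enables the TCP (Layer 4) proxy. --with-stream_ssl_module is only needed when Nginx itself terminates or originates TLS inside the stream block; the configuration in this post forwards the apiserver's TLS traffic untouched, so the flag is optional here.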
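After `make install`, it can be worth confirming that the binary really carries the stream module before writing any stream configuration. The sketch below assumes the `--prefix` used above; `has_configure_flag` is a hypothetical helper name, not part of Nginx. It relies only on the fact that `nginx -V` prints the configure arguments to stderr.

```shell
#!/bin/sh
# has_configure_flag FLAG: reads `nginx -V` output on stdin and succeeds
# only if FLAG appears as a whole word, so --with-stream does not match
# --with-stream_ssl_module by accident.
has_configure_flag() {
    tr ' ' '\n' | grep -qx -- "$1"
}

# Assumed install path (matches --prefix=/usr/local/nginx above).
NGINX_BIN=/usr/local/nginx/sbin/nginx
if [ -x "$NGINX_BIN" ]; then
    for flag in --with-stream --with-stream_ssl_module; do
        # nginx -V writes to stderr, hence 2>&1
        if "$NGINX_BIN" -V 2>&1 | has_configure_flag "$flag"; then
            echo "ok: $flag"
        else
            echo "MISSING: $flag"
        fi
    done
fi
```

If `--with-stream` is reported missing, `nginx -t` will reject the `stream { }` block with an unknown-directive error.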

4.3 Nginx main configuration

cat > /usr/local/nginx/conf/nginx.conf << 'EOF'
worker_processes  1;
events {
    worker_connections  1024;
}
http {
    include       mime.types;
    default_type  application/octet-stream;
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    sendfile        on;
    keepalive_timeout  65;
    autoindex  on;
    gzip  on;
    gzip_buffers 32 4K;
    gzip_comp_level 6;
    gzip_min_length 1k;
    gzip_types application/javascript text/css text/xml;
    gzip_disable "MSIE [1-6]\.";
    gzip_http_version 1.1;
    gzip_vary on;

    client_body_buffer_size 1024k;
    client_max_body_size 2048m;

    client_header_buffer_size    128k;
    large_client_header_buffers  4  128k;

    proxy_buffer_size  256k;
    proxy_buffering  on;
    proxy_buffers 64 128k;
    proxy_busy_buffers_size 512k;

    include conf.d/*.conf;
}

stream {
    log_format proxy '[$time_local] $protocol status:$status, '
                     'client: $remote_addr, upstream:"$upstream_addr", '
                     'bytes: $bytes_sent/$bytes_received '
                     'time: $session_time/$upstream_connect_time';

    include stream/*.conf;
}
EOF

mkdir -p /usr/local/nginx/conf/{conf.d,stream}

4.4 stream configuration

cat > /usr/local/nginx/conf/stream/stream.conf << 'EOF'
upstream k8sapiserver {
    hash $remote_addr consistent;
    server 192.168.139.133:6443  max_fails=3 fail_timeout=30s;
    server 192.168.139.134:6443  max_fails=3 fail_timeout=30s;
    server 192.168.139.135:6443  max_fails=3 fail_timeout=30s;
}

server {
    error_log  logs/8443_error.log;
    access_log logs/8443_access.log proxy;
    listen 8443;
    proxy_connect_timeout 1s;
    proxy_pass k8sapiserver;
}
EOF

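Key parameters

  • hash $remote_addr consistent: consistent hashing on the client address, so a given client keeps landing on the same backend (session affinity)
  • max_fails=3 fail_timeout=30s: after 3 failed attempts within 30 seconds the server is marked unavailable for 30 seconds, then tried again
  • proxy_connect_timeout 1s: give up on an unresponsive backend after 1 second so Nginx can move on to the next server in the upstream. Keepalived is not involved at this level; it only reacts to the nginx process itself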
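To see how the hash and the failure handling behave in practice, the stream access log can be summarized per backend. This is a minimal sketch that parses the `proxy` log_format defined in nginx.conf above; `count_by_upstream` is a hypothetical helper, and the sample lines are made up purely to show the shape of the log.

```shell
#!/bin/sh
# count_by_upstream: reads stream access-log lines (the `proxy` log_format)
# on stdin and prints "<connections> <upstream_addr>" for each backend.
count_by_upstream() {
    sed -n 's/.*upstream:"\([^"]*\)".*/\1/p' | sort | uniq -c | sed 's/^ *//'
}

# Made-up sample lines in the `proxy` format, for illustration only.
count_by_upstream <<'LOG'
[10/Mar/2021:10:00:01 +0800] TCP status:200, client: 192.168.139.1, upstream:"192.168.139.133:6443", bytes: 2048/512 time: 1.204/0.001
[10/Mar/2021:10:00:02 +0800] TCP status:200, client: 192.168.139.1, upstream:"192.168.139.133:6443", bytes: 1024/256 time: 0.310/0.001
[10/Mar/2021:10:00:03 +0800] TCP status:200, client: 192.168.139.2, upstream:"192.168.139.134:6443", bytes: 4096/128 time: 2.050/0.002
LOG
```

On a real node the input would be the log written by the stream server, for example `count_by_upstream < /usr/local/nginx/logs/8443_access.log`. A backend that disappears from the summary while clients are active is a hint that Nginx has marked it unavailable.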

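5. systemd unit

Save the following as kube-nginx.service (the name the systemctl commands use), for example in /etc/systemd/system/: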

[Unit]
Description=kube-apiserver nginx proxy
After=network.target network-online.target
Wants=network-online.target

[Service]
Type=forking
ExecStartPre=/usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/nginx.conf -p /usr/local/nginx -t
ExecStart=/usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/nginx.conf -p /usr/local/nginx
ExecReload=/bin/kill -s HUP $MAINPID
PrivateTmp=true
Restart=always
RestartSec=5
StartLimitInterval=0
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Start it:

systemctl daemon-reload
systemctl enable --now kube-nginx
systemctl status kube-nginx
curl 192.168.139.133:8443  # any response rather than "connection refused" means the proxy is listening

6. Keepalived configuration

6.1 Compile Keepalived (master1 + master2)

cd /k8s/softs
tar xf keepalived-2.2.8.tar.gz && cd keepalived-2.2.8
./configure --prefix=/usr/local/keepalived
make -j$(nproc) && make install
mkdir -p /usr/local/keepalived/conf

6.2 Health-check script

cat > /usr/local/keepalived/conf/check_nginx.sh << 'EOF'
#!/bin/bash
# Count running nginx processes (master + workers).
count=$(ps -C nginx --no-headers | wc -l)
if [ "$count" -eq 0 ]; then
    # nginx is down: try one restart through systemd.
    /usr/bin/systemctl start kube-nginx
    sleep 2
    # Still down: stop keepalived so this node releases the VIP.
    if [ "$(ps -C nginx --no-headers | wc -l)" -eq 0 ]; then
        /usr/bin/systemctl stop kube-keepalived
    fi
fi
EOF
chmod +x /usr/local/keepalived/conf/check_nginx.sh

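Logic: nginx is down → try to restart it → if it will not start, stop keepalived so the VIP moves to another node. Note that the script exits 0 whenever nginx is running or the restart succeeds, so Keepalived never records a failed check and the weight in vrrp_script is never applied; the VIP moves only because keepalived itself is stopped.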
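A process count says nginx exists, not that it accepts connections. As an alternative sketch (not the script used in this post), a check can probe the listener port and report the result through its exit code, which lets Keepalived's own fall/rise counters do the work. `tcp_port_open` and `check_listener` are hypothetical names; the probe assumes bash (for `/dev/tcp`) and coreutils `timeout` are installed.

```shell
#!/bin/bash
# tcp_port_open HOST PORT [TIMEOUT_SECONDS]
# Succeeds only if a TCP connection can be opened within the timeout.
# Uses bash's /dev/tcp redirection inside a child bash process.
tcp_port_open() {
    timeout "${3:-1}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Hypothetical wiring for a vrrp_script: returns non-zero when the nginx
# stream listener from stream.conf (port 8443) refuses connections.
check_listener() {
    tcp_port_open 127.0.0.1 8443 1
}
```

A check script built this way would simply end with `check_listener`, so its exit status becomes the script's exit status. Whether a failed check then moves the VIP still depends on the weight and preemption settings in the vrrp_instance.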

6.3 master1 configuration (priority 100, elected MASTER)

cat > /usr/local/keepalived/conf/kube-keepalived.conf << 'EOF'
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_nginx {
    script "/usr/local/keepalived/conf/check_nginx.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
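    # nopreempt only takes effect when the initial state is BACKUP;
    # priority 100 still wins the election against master2 (90)
    state BACKUP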
    interface ens33
    mcast_src_ip 192.168.139.133
    virtual_router_id 51
    priority 100
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.139.150
    }
    track_script {
        check_nginx
    }
}
EOF

6.4 master2 configuration (priority 90, BACKUP)

cat > /usr/local/keepalived/conf/kube-keepalived.conf << 'EOF'
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_nginx {
    script "/usr/local/keepalived/conf/check_nginx.sh"
    interval 5
    weight -5
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    mcast_src_ip 192.168.139.134
    virtual_router_id 51
    priority 90
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        192.168.139.150
    }
    track_script {
        check_nginx
    }
}
EOF

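Key parameters

  • virtual_router_id 51: must be identical on every node of the same VRRP group, and different from any other VRRP group on the same network segment
  • priority 100 > 90: the larger number wins the election
  • nopreempt: a recovered node does not take the VIP back, which avoids a second, unnecessary switchover. Keepalived honors it only when the instance's initial state is BACKUP
  • track_script with weight -5: a failing check would lower the priority by 5, but 100 - 5 = 95 is still above master2's 90, so this alone does not move the VIP. The actual failover path is check_nginx.sh stopping keepalived
  • auth_pass K8SHA_KA_AUTH: VRRP PASS authentication is sent in clear text and Keepalived only uses the first 8 characters; use a random value in production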
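The priority arithmetic behind track_script is easy to get wrong, so here is a small worked sketch using the numbers from the two configurations above. `effective_priority` is a hypothetical helper that only adds the weight to the base priority; it does not model nopreempt or any other Keepalived behavior.

```shell
#!/bin/sh
# effective_priority BASE WEIGHT: priority a node would advertise while a
# tracked script with a negative weight is failing.
effective_priority() {
    echo $(( $1 + $2 ))
}

master1=$(effective_priority 100 -5)   # values from 6.3: priority 100, weight -5
master2=90                             # value from 6.4

if [ "$master1" -gt "$master2" ]; then
    echo "master1 still outranks master2 ($master1 > $master2)"
else
    echo "master2 outranks master1 ($master2 >= $master1)"
fi
```

With these values the first branch is taken. For a weight to matter at all, its magnitude has to exceed the gap between the two priorities (more than 10 here).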

6.5 systemd unit

Save the following as kube-keepalived.service, for example in /etc/systemd/system/:

[Unit]
Description=VRRP High Availability Monitor
After=network-online.target syslog.target
Wants=network-online.target
Documentation=https://keepalived.org/manpage.html

[Service]
Type=forking
KillMode=process
ExecStart=/usr/local/keepalived/sbin/keepalived -D -f /usr/local/keepalived/conf/kube-keepalived.conf
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

Start it:

systemctl daemon-reload
systemctl enable --now kube-keepalived
systemctl status kube-keepalived
journalctl -f -u kube-keepalived
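When following the journal, the lines worth watching are the VRRP state transitions. The sketch below pulls the most recent one out of a log stream; `last_vrrp_state` is a hypothetical helper, and the sample lines are illustrative only, since the exact log wording can differ between Keepalived versions.

```shell
#!/bin/sh
# last_vrrp_state: reads keepalived log lines on stdin and prints the most
# recent state named in an "Entering <STATE> STATE" message, if any.
last_vrrp_state() {
    grep -o 'Entering [A-Z]* STATE' | tail -n 1 | awk '{print $2}'
}

# Illustrative sample lines (assumed wording, not captured from a real host).
last_vrrp_state <<'LOG'
Keepalived_vrrp[1234]: (VI_1) Entering BACKUP STATE (init)
Keepalived_vrrp[1234]: (VI_1) Entering MASTER STATE
LOG
```

On a node, the input would come from the unit's journal, for example `journalctl -u kube-keepalived --no-pager | last_vrrp_state`.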

7. Verification

# On master1: check that the VIP is bound
ip addr show ens33
# inet 192.168.139.133/24 ...
# inet 192.168.139.150/32 ...  ← this line means the VIP is up

# Check that the apiserver answers through the VIP
curl -k https://192.168.139.150:8443/version
# {
#   "major": "1",
#   "gitVersion": "v1.28.5",
#   ...
# }
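Checking the VIP by eye works for one node; for drills it helps to have a yes/no answer per node. This is a minimal sketch, assuming the one-line output format of `ip -o -4 addr show`; `holds_vip` is a hypothetical helper name.

```shell
#!/bin/sh
# holds_vip VIP: reads `ip -o -4 addr show` output on stdin and succeeds
# if the given address is bound. Fixed-string match, so dots are literal.
holds_vip() {
    grep -qF "inet $1/"
}

VIP=192.168.139.150
if command -v ip >/dev/null 2>&1; then
    if ip -o -4 addr show | holds_vip "$VIP"; then
        echo "this node holds $VIP"
    else
        echo "this node does not hold $VIP"
    fi
fi
```

Run on each master in turn, exactly one node should report that it holds the VIP. Two positives at once would point to a split election (see the pitfalls section).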

Point kubectl at the VIP

The apiserver serving certificate must list the VIP (192.168.139.150) in its Subject Alternative Names; otherwise kubectl rejects the connection with a certificate error.

mkdir -p ~/.kube
cat > ~/.kube/config << EOF
apiVersion: v1
kind: Config
clusters:
- cluster:
    server: https://192.168.139.150:8443
    certificate-authority-data: <base64-encoded ca.pem>
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: default
current-context: default
users:
- name: kubernetes-admin
  user:
    client-certificate-data: <base64-encoded admin.pem>
    client-key-data: <base64-encoded admin-key.pem>
EOF

kubectl get nodes
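The three `*-data` fields in the kubeconfig expect the PEM files encoded as a single base64 line. A minimal sketch of producing those values follows; `b64_oneline` is a hypothetical helper, and the file names are the ones mentioned in the placeholders, assumed to be in the current directory.

```shell
#!/bin/sh
# b64_oneline: base64-encodes stdin and removes line wrapping, which is the
# form the kubeconfig *-data fields expect.
b64_oneline() {
    base64 | tr -d '\n'
}

# Assumed file names, taken from the placeholders in the kubeconfig.
for f in ca.pem admin.pem admin-key.pem; do
    if [ -f "$f" ]; then
        printf '%s: %s\n' "$f" "$(b64_oneline < "$f")"
    fi
done
```

The output for admin-key.pem is the private key in another encoding, so it deserves the same care as the key file itself.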

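8. Failure drills

8.1 Simulate a full outage of master1

# On master1
shutdown -h now

Watch master2:

# On master2
ip addr show ens33
# inet 192.168.139.134/24 ...
# inet 192.168.139.150/32 ...  ← the VIP has moved to master2

# Check that the VIP still answers
curl -k https://192.168.139.150:8443/version

Expect the VIP to move to master2 within roughly 5 to 10 seconds. New kubectl requests then succeed again; connections that were open through master1 are dropped and have to be retried.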
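The failover delay can be estimated from the configuration. The sketch below applies the VRRPv2 formula from RFC 3768, Master_Down_Interval = 3 × Advertisement_Interval + (256 − Priority) / 256 seconds, and assumes Keepalived is running VRRPv2 (no vrrp_version 3 in the configuration). `master_down_interval` is a hypothetical helper name.

```shell
#!/bin/sh
# master_down_interval ADVERT_INT PRIORITY
# Seconds a backup waits without advertisements before taking over,
# per the VRRPv2 formula: 3 * advert_int + (256 - priority) / 256.
master_down_interval() {
    awk -v adv="$1" -v prio="$2" \
        'BEGIN { printf "%.2f\n", 3 * adv + (256 - prio) / 256 }'
}

# master2 from section 6.4: advert_int 2, priority 90
master_down_interval 2 90   # prints 6.65
```

About 6.65 seconds of silence is what master2 waits for after an abrupt failure, which is in line with the 5 to 10 seconds quoted above once address announcement is added. A clean shutdown can be faster, because a stopping VRRP master may announce priority 0 and let the backup skip most of the wait.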

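8.2 Simulate an nginx crash on master1

# On master1
pkill nginx

What to observe:

  • check_nginx.sh sees that nginx is gone and runs systemctl start kube-nginx (the unit's Restart=always would also restart it after RestartSec=5)
  • If nginx comes back: keepalived keeps running and the VIP stays put
  • If it does not (unlikely): the script runs systemctl stop kube-keepalived, which forces the VIP to move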

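8.3 Bring master1 back

# master1
systemctl start kube-apiserver kube-nginx kube-keepalived

Because of nopreempt, master1 does not take the VIP back; it stays on master2. This is intentional: it avoids a second switchover and the dropped connections that come with it.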

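9. Production hardening

  1. Replace mcast_src_ip with unicast (cloud networks usually block VRRP multicast):

    vrrp_instance VI_1 {
        ...
        unicast_src_ip 192.168.139.133
        unicast_peer {
            192.168.139.134
        }
    }

  2. Use a random authentication password. Keepalived only uses the first 8 characters of auth_pass, so 6 random bytes (8 base64 characters) are enough:

    openssl rand -base64 6

  3. Run keepalived on all three masters rather than two. VRRP has no quorum, so two instances already survive one failure; a third keeps the VIP reachable even when master1 and master2 are both down. With unicast, list every other node in unicast_peer.

  4. Monitor the VIP: add 192.168.139.150 to Prometheus blackbox_exporter.

  5. DNS: resolve kubernetes-api.example.com to the VIP and have kubectl use the name instead of the IP. The name must also appear in the apiserver certificate's SANs.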
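With three nodes on unicast, each node's unicast_peer block has to list the other two, which is easy to mistype by hand. The sketch below prints the fragment for one node; `unicast_block` is a hypothetical helper, and the addresses are the ones from the cluster plan.

```shell
#!/bin/sh
# unicast_block SELF_IP NODE_IP...
# Prints the unicast_src_ip / unicast_peer lines for SELF_IP, listing every
# other node as a peer. The output is meant to be pasted into vrrp_instance.
unicast_block() {
    self=$1
    shift
    printf '    unicast_src_ip %s\n' "$self"
    printf '    unicast_peer {\n'
    for ip in "$@"; do
        [ "$ip" = "$self" ] || printf '        %s\n' "$ip"
    done
    printf '    }\n'
}

# Fragment for master1, given all three masters from the plan.
unicast_block 192.168.139.133 192.168.139.133 192.168.139.134 192.168.139.135
```

Running it once per node with a different first argument gives three fragments that are consistent with each other by construction.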

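10. Common pitfalls

  1. Mismatched virtual_router_id: the nodes do not see each other as one VRRP group, so the election breaks and more than one node may claim the VIP
  2. Multicast blocked by a firewall: VRRP is IP protocol 112 sent to 224.0.0.18, and a restrictive iptables policy drops it
  3. nopreempt missing: master1 takes the VIP back as soon as it recovers, causing an extra switchover and dropped connections
  4. check_nginx.sh not executable: keepalived cannot run the script, so the VIP does not move when nginx dies
  5. VIP bound to the wrong interface: if interface ens33 names the wrong NIC, the VIP never comes up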
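For the firewall pitfall, the rules that let multicast VRRP through look roughly like the following. This sketch only prints the commands so they can be reviewed before anyone runs them as root; `vrrp_allow_rules` is a hypothetical helper, and the rules assume the multicast setup from section 6 (a unicast setup would match on the peer addresses instead of 224.0.0.18).

```shell
#!/bin/sh
# vrrp_allow_rules IFACE
# Prints iptables commands that accept VRRP (IP protocol 112) to and from
# the VRRP multicast group on the given interface. Nothing is applied.
vrrp_allow_rules() {
    printf 'iptables -I INPUT -i %s -p 112 -d 224.0.0.18 -j ACCEPT\n' "$1"
    printf 'iptables -I OUTPUT -o %s -p 112 -d 224.0.0.18 -j ACCEPT\n' "$1"
}

# ens33 is the interface used in the keepalived configuration above.
vrrp_allow_rules ens33
```

Printing instead of executing keeps the example safe to run anywhere; on a host where the rules look right, the same lines can be executed by hand.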

11. Prerequisites / Next steps

Prerequisites

Next steps

  1. K8s cluster add-ons (2021-09-15): CNI / CoreDNS / Metrics / Dashboard
  2. K8s cluster management (2021-12-15): upgrades, node isolation, troubleshooting
