The Role of GlusterFS in the Kubernetes Era
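GlusterFS is a user-space distributed file system. Red Hat acquired Gluster in 2011, and GlusterFS has been used as one of the storage backends for OpenStack. Counting the legacy striped variants, it has defined nine volume types over its history, from the simple distributed volume (capacity aggregation without redundancy, loosely comparable to RAID 0) to the distributed replicated volume (comparable to RAID 10). Together they cover most file storage scenarios.
Applies to: GlusterFS 10.5 / Kadalu Operator 1.3.0 / K8s 1.28.5
1. GlusterFS volume types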
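| Volume type | Supported in 10.5 | Characteristics |
|---|---|---|
| Distributed | ✅ | Files are hashed across all bricks; no redundancy; capacity adds up as in RAID 0, although each file is stored whole rather than striped |
| Replicated | ✅ | Synchronous multi-copy replication; fault tolerant; similar to RAID 1 |
| Distributed Replicated | ✅ | Combines distribution and replication; recommended for production |
| Striped | ❌ (removed upstream) | File data is split into chunks placed round-robin across bricks; similar to RAID 0 |
| Distributed Striped | ❌ | Intended for large files |
| Striped Replicated | ❌ | Similar to RAID 10 |
| Distributed Striped Replicated | ❌ | Combination of the three basic types |
| Dispersed | ✅ | Erasure coded; similar to RAID 5/6 |
| Distributed Dispersed | ✅ | Combines distribution and erasure coding |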
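The RAID analogies in the table translate into simple capacity arithmetic. The sketch below is only an illustration: `usable_capacity` is a hypothetical helper (not part of the gluster CLI), it assumes all bricks have the same size, and it treats a dispersed volume as a single erasure-coded set.

```shell
#!/bin/sh
# Hypothetical helper: estimate usable capacity in GB for the supported layouts.
# usage: usable_capacity <layout> <brick_count> <brick_size_gb> [replica_or_redundancy]
usable_capacity() (
  layout=$1 bricks=$2 size=$3
  case "$layout" in
    distributed)
      # Every brick contributes its full size; nothing is stored twice.
      echo $(( bricks * size )) ;;
    replicated)
      # Each file is stored <replica> times, so capacity is divided by it.
      replica=${4:?replica count required}
      echo $(( bricks * size / replica )) ;;
    dispersed)
      # <redundancy> bricks' worth of space holds the erasure-code fragments.
      redundancy=${4:?redundancy count required}
      echo $(( (bricks - redundancy) * size )) ;;
    *)
      echo "unknown layout: $layout" >&2
      exit 1 ;;
  esac
)

usable_capacity distributed 6 100    # six 100 GB bricks, no redundancy
usable_capacity replicated 4 100 2   # four bricks as a 2 x 2 distributed replicated volume
usable_capacity dispersed 6 100 2    # 4 data + 2 redundancy
```

With these inputs the three calls print 600, 200 and 400, which shows how much raw capacity each layout trades for fault tolerance.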
2. Node and directory preparation
Example four-node cluster:
```
10.0.0.5 worker5
10.0.0.6 worker6
10.0.0.7 worker7
10.0.0.8 worker8
```
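Every node provides a shared directory /gfs (100 GB). Set up passwordless SSH login to all nodes:
```shell
apt install -y sshpass
ssh-keygen -f /root/.ssh/id_rsa -P ''
export HOSTS="worker5 worker6 worker7 worker8"
# Prompt for the password so that it never lands in the shell history
read -rs -p 'SSH password: ' SSHPASS && export SSHPASS
for HOST in $HOSTS; do
  sshpass -e ssh-copy-id -o StrictHostKeyChecking=no "$HOST"
done
unset SSHPASS
```
The password is read interactively instead of being typed as export SSHPASS=<YOUR_SSH_PASSWORD>, because a literal export would be recorded in the shell history. sshpass -e then takes the password from the SSHPASS environment variable, so it does not appear on the command line either.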
3. Installing GlusterFS
3.1 Run on all four nodes
```shell
mkdir -p /gfs
# Add the GlusterFS 10 PPA on Ubuntu 22.04
apt install software-properties-common -y
add-apt-repository ppa:gluster/glusterfs-10 -y
apt update
apt install glusterfs-server -y
systemctl start glusterd
systemctl enable glusterd
gluster --version
# glusterfs 10.5
```
3.2 Joining the trusted pool (run on any one node)
```shell
gluster peer probe 10.0.0.6
gluster peer probe 10.0.0.7
gluster peer probe 10.0.0.8
gluster pool list
# UUID                                  Hostname   State
# 6c831f26-52ee-4895-9df8-ea9f16670cab  worker6    Connected
# 48b796bf-3d08-43c6-976c-4e3f586345e5  worker7    Connected
# 69d9b73c-d8e5-42fa-865f-5db204b9b14f  worker8    Connected
# 4ea470e9-710a-458d-acd1-1be3c3937609  localhost  Connected
```
Note: each peer only needs to be probed once, from any single node; pool membership is synchronized to the other nodes automatically.
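Before creating volumes it helps to confirm that no peer is in a state other than Connected. A minimal sketch, assuming the three-column `gluster pool list` layout shown above; `disconnected_peers` is a hypothetical helper name, and the heredoc is made-up sample input in which worker7 is marked Disconnected for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `gluster pool list` output on stdin and print the
# hostname of every peer whose State column is not "Connected".
disconnected_peers() {
  # NR > 1 skips the header row; NF >= 3 skips blank or truncated lines.
  awk 'NR > 1 && NF >= 3 && $NF != "Connected" { print $2 }'
}

# Sample input for illustration only (worker7 is deliberately Disconnected).
disconnected_peers <<'EOF'
UUID                                  Hostname   State
6c831f26-52ee-4895-9df8-ea9f16670cab  worker6    Connected
48b796bf-3d08-43c6-976c-4e3f586345e5  worker7    Disconnected
4ea470e9-710a-458d-acd1-1be3c3937609  localhost  Connected
EOF
```

On a real node the input would come from the CLI, for example `gluster pool list | disconnected_peers`; empty output means every peer is connected.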
4. Creating a distributed volume (no redundancy)
On the storage nodes (the four nodes worker5/6/7/8 in this demo), first format the data disks as XFS and mount them:
```shell
mkfs.xfs -f /dev/sdb
mkfs.xfs -f /dev/sdc
mkdir -p /mnt/gluster/sdb
mkdir -p /mnt/gluster/sdc
mount /dev/sdb /mnt/gluster/sdb
mount /dev/sdc /mnt/gluster/sdc
echo "/dev/sdb /mnt/gluster/sdb xfs defaults,noatime 0 0" >> /etc/fstab
echo "/dev/sdc /mnt/gluster/sdc xfs defaults,noatime 0 0" >> /etc/fstab
# Create the brick subdirectories inside the mounted file systems
mkdir -p /mnt/gluster/sdb/brick
mkdir -p /mnt/gluster/sdc/brick
# Leave both disks mounted: unmounting here would hide the brick directories
# and the volume would end up on the root partition
```
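Appending to /etc/fstab with `>>` adds a duplicate line every time the preparation script is re-run. The following is a sketch of an idempotent variant; `add_fstab_entry` is a hypothetical helper, and the fstab path is a parameter so the function can be tried against a scratch file before touching /etc/fstab.

```shell
#!/bin/sh
# Hypothetical helper: append an XFS fstab line only if the mount point is
# not listed yet.
# usage: add_fstab_entry <device> <mountpoint> [fstab_path]
add_fstab_entry() (
  device=$1 mountpoint=$2 fstab=${3:-/etc/fstab}
  # Field 2 of an fstab line is the mount point; comment lines are ignored.
  if awk -v mp="$mountpoint" \
       '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"; then
    echo "already present: $mountpoint"
  else
    echo "$device $mountpoint xfs defaults,noatime 0 0" >> "$fstab"
    echo "added: $mountpoint"
  fi
)

# Try it on a scratch file: the second call must not add a second line.
scratch=$(mktemp)
add_fstab_entry /dev/sdb /mnt/gluster/sdb "$scratch"
add_fstab_entry /dev/sdb /mnt/gluster/sdb "$scratch"
cat "$scratch"
```

The first call reports that the entry was added and the second that it is already present, so the scratch file ends up with a single line.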
Create the distributed volume:
```shell
gluster volume create dv \
  transport tcp \
  10.0.0.5:/mnt/gluster/sdb/brick \
  10.0.0.5:/mnt/gluster/sdc/brick \
  10.0.0.6:/mnt/gluster/sdb/brick \
  10.0.0.6:/mnt/gluster/sdc/brick \
  10.0.0.7:/mnt/gluster/sdb/brick \
  10.0.0.7:/mnt/gluster/sdc/brick
gluster volume start dv
gluster volume quota dv enable
gluster volume info dv
```
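The brick arguments follow a regular host-by-disk pattern, so they can be generated rather than typed by hand. A small sketch, assuming the /mnt/gluster/<disk>/brick layout used above; `brick_list` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print one "host:/mnt/gluster/<disk>/brick" argument per
# host/disk pair, with hosts in the outer loop (the order used above).
# usage: brick_list "<hosts>" "<disks>"
brick_list() (
  # $1 and $2 are intentionally unquoted so they split on whitespace.
  for host in $1; do
    for disk in $2; do
      printf '%s:/mnt/gluster/%s/brick\n' "$host" "$disk"
    done
  done
)

brick_list "10.0.0.5 10.0.0.6 10.0.0.7" "sdb sdc"
```

The output could then be spliced into the create command, for example `gluster volume create dv transport tcp $(brick_list "10.0.0.5 10.0.0.6 10.0.0.7" "sdb sdc")`. Brick order is irrelevant for a plain distributed volume but becomes significant once replication is involved.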
4.1 Starting the volume at boot
```shell
cat << "EOF" > /etc/systemd/system/gdv.service
[Unit]
Description=Start GlusterFS Volume dv on Boot
After=glusterd.service
Requires=glusterd.service

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/usr/sbin/gluster --mode=script volume start dv
ExecStop=/usr/sbin/gluster --mode=script volume stop dv force
# systemd does not run commands through a shell, so "&&" cannot chain them;
# multiple ExecReload= lines are executed in order instead
ExecReload=/usr/sbin/gluster --mode=script volume stop dv force
ExecReload=/usr/sbin/gluster --mode=script volume start dv

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable gdv
systemctl start gdv
```
4.2 Using the volume from a client
```shell
# Install the client
apt install glusterfs-client -y
# Mount the volume
mkdir -p /mnt/glusterfs_dv
mount -t glusterfs 10.0.0.5:/dv /mnt/glusterfs_dv
df -h
# 10.0.0.5:/dv 5.3T 92G 5.2T 2% /mnt/glusterfs_dv
# Mount automatically at boot
echo "10.0.0.5:/dv /mnt/glusterfs_dv glusterfs defaults,_netdev 0 0" >> /etc/fstab
mount -a
```
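The mount above fetches the volume layout from 10.0.0.5 only, so a client cannot mount while that node is down. The GlusterFS mount helper accepts a `backup-volfile-servers` option listing fallback hosts separated by colons. The sketch below builds such an fstab line; `gluster_fstab_line` is a hypothetical helper, and the choice of 10.0.0.6 and 10.0.0.7 as fallbacks is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: build a GlusterFS fstab line with optional fallback
# volfile servers.
# usage: gluster_fstab_line <volume> <mountpoint> <primary> [backup...]
gluster_fstab_line() (
  volume=$1 mountpoint=$2 primary=$3
  shift 3
  opts="defaults,_netdev"
  if [ $# -gt 0 ]; then
    # Join the remaining hosts with ":" and drop the trailing separator.
    backups=$(printf '%s:' "$@")
    opts="$opts,backup-volfile-servers=${backups%:}"
  fi
  echo "$primary:/$volume $mountpoint glusterfs $opts 0 0"
)

gluster_fstab_line dv /mnt/glusterfs_dv 10.0.0.5 10.0.0.6 10.0.0.7
```

The fallback hosts only matter while the client is fetching the volume file at mount time; once mounted, the client talks to all bricks directly.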
4.3 Benchmarking with fio
```shell
apt install -y fio
fio --name=randread \
  --filename=/mnt/glusterfs_dv/testfile \
  --bs=16k \
  --size=1G \
  --time_based \
  --runtime=60 \
  --rw=randread \
  --ioengine=libaio \
  --direct=1 \
  --iodepth=16
# Result: 108 MiB/s throughput, 6886 IOPS, 2.3 ms average latency
```
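The three reported figures can be cross-checked against each other: bandwidth should equal IOPS times block size, and with a single job keeping the queue full, mean latency is roughly queue depth divided by IOPS (Little's law). A quick sketch of that arithmetic; `fio_sanity` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: derive bandwidth and mean latency from IOPS.
# usage: fio_sanity <iops> <block_size_kib> <iodepth>
fio_sanity() {
  awk -v iops="$1" -v bs="$2" -v qd="$3" 'BEGIN {
    # KiB per second divided by 1024 gives MiB per second.
    printf "bandwidth: %.1f MiB/s\n", iops * bs / 1024
    # Little'"'"'s law: requests in flight / completion rate, shown in milliseconds.
    printf "latency: %.2f ms\n", qd / iops * 1000
  }'
}

fio_sanity 6886 16 16
```

For the run above this yields about 107.6 MiB/s and 2.32 ms, consistent with the 108 MiB/s and 2.3 ms that fio reported.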
5. Distributed replicated volume (recommended for production)
```shell
gluster volume create gv replica 2 transport tcp \
  10.0.0.5:/gfs \
  10.0.0.6:/gfs \
  10.0.0.7:/gfs \
  10.0.0.8:/gfs \
  force
# gluster prints this warning:
# Replica 2 volumes are prone to split-brain. Use Arbiter or Replica 3 to avoid this
gluster volume start gv
gluster volume quota gv enable
gluster volume info gv
```
Important warnings:
- Without force, creation fails with "volume create: gv: failed: The brick worker5:/gfs is being created in the root partition. ... use 'force'"; appending force bypasses the check.
- Replica 2 is prone to split-brain; for production use replica 3 (a distributed replica 3 layout starts at six nodes).
To delete the volume when it is no longer needed:
```shell
gluster volume stop gv
gluster volume delete gv
gluster peer detach worker6
gluster peer detach worker7
gluster peer detach worker8
```
6. Verifying from a client
```shell
# Mount on worker5
mkdir -p /mnt/glusterfs_gv
mount -t glusterfs 10.0.0.5:/gv /mnt/glusterfs_gv
# Write 9 files through the mount point
touch /mnt/glusterfs_gv/test{1..9}
# Check the distribution on each node (the first two nodes form one replica
# set, the last two form another)
ls /gfs
# worker5 shows test1 test2 test4 test5 test8 test9
# worker6 shows test1 test2 test4 test5 test8 test9
# worker7 shows test3 test6 test7
# worker8 shows test3 test6 test7
```
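The pairing seen above follows from brick order: GlusterFS groups consecutive bricks of the create command into replica sets, then distributes files across those sets. The sketch below reproduces that grouping for any brick list; `replica_sets` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: show which bricks end up in the same replica set.
# usage: replica_sets <replica_count> <brick>...
replica_sets() (
  replica=$1
  shift
  i=0 set_no=1 line=""
  for brick in "$@"; do
    line="$line $brick"
    i=$(( i + 1 ))
    # Every <replica> consecutive bricks close one replica set.
    if [ $(( i % replica )) -eq 0 ]; then
      echo "set $set_no:$line"
      set_no=$(( set_no + 1 ))
      line=""
    fi
  done
)

replica_sets 2 10.0.0.5:/gfs 10.0.0.6:/gfs 10.0.0.7:/gfs 10.0.0.8:/gfs
```

For the gv volume this prints worker5 and worker6 as set 1 and worker7 and worker8 as set 2. Listing two bricks of the same host next to each other would place both copies on one machine, so brick order deserves a check before the volume is created.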
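7. Connecting to K8s with the Kadalu Operator
Kadalu is a community project that provides a GlusterFS-backed CSI driver and is commonly recommended for Kubernetes; it exposes a GlusterFS cluster as a K8s StorageClass.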
7.1 Preparing the images
```shell
# Replace both placeholders with your Harbor address and password first
REGISTRY='<harbor-ip>:13001'
printf '%s' '{{HARBOR_PASSWORD}}' | docker login -u admin --password-stdin "$REGISTRY"

# Mirror every image into the base/ project, keeping its original path
for IMAGE in \
  kadalu/kadalu-operator:1.3.0 \
  kadalu/kadalu-csi:1.3.0 \
  raspbernetes/csi-node-driver-registrar:2.0.1 \
  raspbernetes/csi-external-provisioner:2.0.2 \
  raspbernetes/csi-external-attacher:3.0.0 \
  raspbernetes/csi-external-resizer:1.0.0
do
  docker pull "$IMAGE"
  docker tag "$IMAGE" "$REGISTRY/base/$IMAGE"
  docker push "$REGISTRY/base/$IMAGE"
done

# busybox is stored under the library/ path in the mirror
docker pull busybox
docker tag busybox "$REGISTRY/base/library/busybox"
docker push "$REGISTRY/base/library/busybox"
```
7.2 SSH key
```shell
kubectl create ns kadalu
kubectl create secret generic glusterquota-ssh-secret \
  --from-literal=glusterquota-ssh-username=root \
  --from-file=ssh-privatekey=/root/.ssh/id_rsa \
  -n kadalu
```
7.3 Deploying the Operator
```shell
curl -L -o kadalu-operator.yaml https://github.com/kadalu/kadalu/releases/download/1.3.0/kadalu-operator.yaml
# Point the operator image at the private registry
sed -i "s#docker.io/kadalu/kadalu-operator:1.3.0#<harbor-ip>:13001/base/kadalu/kadalu-operator:1.3.0#g" kadalu-operator.yaml
kubectl apply -f kadalu-operator.yaml
```
7.4 Connecting an external GlusterFS cluster
```yaml
# /data/k8scnf/kadalu/kadalu.yml
apiVersion: kadalu-operator.storage/v1alpha1
kind: KadaluStorage
metadata:
  name: gfs
spec:
  type: External
  kadalu_format: native
  details:
    gluster_hosts:
      - 10.0.0.5
      - 10.0.0.6
      - 10.0.0.7
      - 10.0.0.8
    gluster_volname: gv
    gluster_options: log-level=DEBUG
```
```shell
kubectl apply -f /data/k8scnf/kadalu/kadalu.yml
```
7.5 Setting the default StorageClass
```shell
kubectl patch storageclass kadalu.gfs -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```
7.6 Testing
```yaml
# gfs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-gfs
  namespace: testgfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: kadalu.gfs
```
```shell
kubectl create ns testgfs
kubectl apply -f gfs-pvc.yaml
kubectl apply -f gfs-nginx.yaml
# Access nginx through the NodePort:
# http://<vip-internal>:30052
```
Upload a file to /mnt/subvol/22/88/pvc-xxx/ (the physical path inside the GlusterFS volume) and it shows up on the page immediately.
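The gfs-nginx.yaml manifest is applied above but not listed. The following is a hypothetical minimal version, not the author's file: the names `gfs-nginx` and `web`, the `nginx` image and the mount path are assumptions, while the namespace, the claim name `pvc-gfs` and NodePort 30052 come from the steps above.

```shell
#!/bin/sh
# Write a hypothetical gfs-nginx.yaml: one nginx Deployment that serves the
# PVC contents, plus a NodePort Service on 30052.
cat > gfs-nginx.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gfs-nginx
  namespace: testgfs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gfs-nginx
  template:
    metadata:
      labels:
        app: gfs-nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - name: web
              mountPath: /usr/share/nginx/html
      volumes:
        - name: web
          persistentVolumeClaim:
            claimName: pvc-gfs
---
apiVersion: v1
kind: Service
metadata:
  name: gfs-nginx
  namespace: testgfs
spec:
  type: NodePort
  selector:
    app: gfs-nginx
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30052
EOF
```

In an air-gapped cluster the image field would need to point at the private registry instead.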
8. Common problems
8.1 chown: changing ownership of '/data': Transport endpoint is not connected
```shell
gluster pool list
# Look for peers in the Disconnected state
```
Fix: check the firewall, port 24007, and the glusterd service.
8.2 volume create: failed: The brick ... is being created in the root partition
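A brick is expected to live on a dedicated mount point (or a subdirectory of one), not directly on the root partition. Appending force bypasses the check, which is acceptable for a demo but leaves the brick competing with the operating system for space.
8.3 Replica 2 split-brain
Replica 2 volumes are prone to split-brain. Use replica 3 or an arbiter volume in production.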
8.4 Expanding a volume
```shell
gluster volume add-brick gv node03:/data/brick1 node04:/data/brick1
```
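On a replicated volume, bricks have to be added in multiples of the replica count, and existing data only spreads onto the new bricks after a rebalance. The sketch below checks the count and prints the two commands as a dry run instead of executing them; `add_brick_cmd` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: validate the brick count, then print the add-brick and
# rebalance commands (dry run only, nothing is executed).
# usage: add_brick_cmd <volume> <replica_count> <brick>...
add_brick_cmd() (
  volume=$1 replica=$2
  shift 2
  if [ $# -eq 0 ] || [ $(( $# % replica )) -ne 0 ]; then
    echo "brick count ($#) must be a positive multiple of replica count ($replica)" >&2
    exit 1
  fi
  echo "gluster volume add-brick $volume $*"
  echo "gluster volume rebalance $volume start"
)

add_brick_cmd gv 2 node03:/data/brick1 node04:/data/brick1
```

Passing a single brick for the replica 2 volume makes the helper fail with an error message. After starting a real rebalance, progress can be followed with `gluster volume rebalance gv status`.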
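8.5 Troubleshooting Kadalu
```shell
# Check that the node can reach glusterd
nc -zv 10.0.0.5 24007
# Open a shell in the provisioner pod to inspect things from inside the container
kubectl exec -it kadalu-csi-provisioner-0 -n kadalu -c kadalu-logging -- sh
```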
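9. Summary
GlusterFS still has a place on K8s, being simpler than Ceph and more capable than NFS:
- Distributed volumes have no redundancy; their strength is large capacity
- Distributed replicated volumes balance capacity and safety and are the recommended choice for production
- Kadalu is the route used here to connect a GlusterFS cluster to K8s
- Replica 3 (or an arbiter) is what protects against data loss from split-brain
Next: MinIO / JuiceFS / OpenEBS: a full tour of object storage and local PV options on K8s.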