
GlusterFS 10 Distributed File System: Seven Volume Types + Kadalu CSI Integration

The five volume types supported in GlusterFS 10, bricks and the peer pool, fio benchmarking, and connecting to K8s with the Kadalu Operator

GlusterFS's role in the K8s era

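GlusterFS is a userspace distributed file system. Red Hat acquired Gluster in 2011, and GlusterFS was once a common storage backend for OpenStack. Its volume types range from the simplest distributed volume (roughly like RAID0) to the distributed replicated volume (roughly like RAID10), which covers almost every file-storage scenario.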

Versions: GlusterFS 10.5 / Kadalu Operator 1.3.0 / K8s 1.28.5


1. GlusterFS volume types

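| Volume type | Supported in 10.5 | Characteristics |
|---|---|---|
| Distributed | ✅ | Files are hashed across all bricks; no redundancy; similar to RAID0 |
| Replicated | ✅ | Synchronous multi-replica copies; fault tolerant; similar to RAID1 |
| Distributed Replicated | ✅ | Combines distribution and replication; recommended for production |
| Stripe | ❌ (stripe translator removed) | Data split into chunks written round-robin; similar to RAID0 |
| Distributed Stripe | ❌ | For large files |
| Stripe Replica | ❌ | Similar to RAID10 |
| Distributed Stripe Replica | ❌ | Combines the three basic types |
| Dispersed | ✅ | Erasure coding; similar to RAID5/6 |
| Distributed Dispersed | ✅ | Combines distribution and erasure coding |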

2. Node and directory preparation

Example 4-node cluster (every node must resolve these hostnames, e.g. via /etc/hosts):

10.0.0.5  worker5
10.0.0.6  worker6
10.0.0.7  worker7
10.0.0.8  worker8

Shared directory /gfs (100 GB). Set up passwordless SSH to all nodes:

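apt install -y sshpass
ssh-keygen -t rsa -f /root/.ssh/id_rsa -N ''
export IP="worker5 worker6 worker7 worker8"
# Read the password without echoing it, so it never lands in shell history
read -rs -p "SSH password: " SSHPASS; echo; export SSHPASS
for HOST in $IP; do
  sshpass -e ssh-copy-id -o StrictHostKeyChecking=no $HOST
done
unset SSHPASS

sshpass -e reads the password from the SSHPASS environment variable, so the real password appears neither on the command line nor in shell history.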


3. Install GlusterFS

3.1 Run on all 4 nodes

mkdir -p /gfs

# Ubuntu 22.04: add the GlusterFS 10 PPA
apt install software-properties-common -y
add-apt-repository ppa:gluster/glusterfs-10 -y
apt update
apt install glusterfs-server -y
systemctl start glusterd
systemctl enable glusterd
gluster --version
# glusterfs 10.5

3.2 Build the trusted storage pool (run on any one node)

gluster peer probe 10.0.0.6
gluster peer probe 10.0.0.7
gluster peer probe 10.0.0.8

gluster pool list
# UUID                                    Hostname        State
# 6c831f26-52ee-4895-9df8-ea9f16670cab    worker6         Connected
# 48b796bf-3d08-43c6-976c-4e3f586345e5    worker7         Connected
# 69d9b73c-d8e5-42fa-865f-5db204b9b14f    worker8         Connected
# 4ea470e9-710a-458d-acd1-1be3c3937609    localhost       Connected

Note: each peer only needs to be probed once, from any single node; the other nodes pick up the pool membership automatically.


4. Create a distributed volume (no redundancy)

On each storage node (the demo uses the four nodes worker5/6/7/8), first format the data disks as XFS:

mkfs.xfs -f /dev/sdb
mkfs.xfs -f /dev/sdc

mkdir -p /mnt/gluster/sdb
mkdir -p /mnt/gluster/sdc

mount /dev/sdb /mnt/gluster/sdb
mount /dev/sdc /mnt/gluster/sdc

echo "/dev/sdb /mnt/gluster/sdb xfs defaults,noatime 0 0" >> /etc/fstab
echo "/dev/sdc /mnt/gluster/sdc xfs defaults,noatime 0 0" >> /etc/fstab

# Create the brick subdirectories on the mounted disks
# (keep the disks mounted: unmounting would hide the brick directories
#  and leave the brick path on the root partition)
mkdir -p /mnt/gluster/sdb/brick
mkdir -p /mnt/gluster/sdc/brick
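Re-running the echo ... >> /etc/fstab lines above appends duplicate entries. A minimal sketch of an idempotent alternative, assuming a hypothetical helper add_fstab_entry that takes the fstab path as a parameter, so it can be tried on a scratch file before touching /etc/fstab:

```shell
#!/usr/bin/env bash
# add_fstab_entry (hypothetical helper): append an fstab line only if the
# mount point is not already listed in the given file.
# Usage: add_fstab_entry <fstab-file> <device> <mount-point> <fstype> <options>
add_fstab_entry() {
  local fstab="$1" dev="$2" mnt="$3" fstype="$4" opts="$5"
  # Field 2 of an fstab line is the mount point; comment lines are ignored
  if awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"; then
    echo "skip: $mnt already in $fstab"
    return 0
  fi
  printf '%s %s %s %s 0 0\n' "$dev" "$mnt" "$fstype" "$opts" >> "$fstab"
  echo "added: $mnt"
}

# Try it on a scratch file first; the second call is a no-op
tmp_fstab=$(mktemp)
add_fstab_entry "$tmp_fstab" /dev/sdb /mnt/gluster/sdb xfs defaults,noatime
add_fstab_entry "$tmp_fstab" /dev/sdb /mnt/gluster/sdb xfs defaults,noatime
cat "$tmp_fstab"
```

Because device names such as /dev/sdb can change between boots, passing a UUID=... spec (as printed by blkid) as the device argument works the same way and is more robust.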

Create the distributed volume:

gluster volume create dv \
  transport tcp \
  10.0.0.5:/mnt/gluster/sdb/brick \
  10.0.0.5:/mnt/gluster/sdc/brick \
  10.0.0.6:/mnt/gluster/sdb/brick \
  10.0.0.6:/mnt/gluster/sdc/brick \
  10.0.0.7:/mnt/gluster/sdb/brick \
  10.0.0.7:/mnt/gluster/sdc/brick

gluster volume start dv
gluster volume quota dv enable
gluster volume info dv
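Every brick argument above follows the host:path pattern, one per host/disk pair. With more hosts or disks, typing them by hand is error-prone; a minimal sketch, assuming a hypothetical helper brick_list and the /mnt/gluster/<disk>/brick layout used above, that generates them host by host (the same order as the command above; for a plain distributed volume the order does not matter, but for replicated volumes it does):

```shell
#!/usr/bin/env bash
# brick_list (hypothetical helper): print one "host:/mnt/gluster/<disk>/brick"
# argument per host/disk pair, iterating hosts in the outer loop.
# Usage: brick_list "<host> <host> ..." "<disk> <disk> ..."
brick_list() {
  local hosts="$1" disks="$2" h d
  for h in $hosts; do
    for d in $disks; do
      printf '%s:/mnt/gluster/%s/brick\n' "$h" "$d"
    done
  done
}

brick_list "10.0.0.5 10.0.0.6 10.0.0.7" "sdb sdc"

# Feeding the list to the CLI (left unquoted on purpose so it splits into arguments):
#   gluster volume create dv transport tcp $(brick_list "10.0.0.5 10.0.0.6 10.0.0.7" "sdb sdc")
```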

4.1 Start the volume automatically at boot

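cat << "EOF" > /etc/systemd/system/gdv.service
[Unit]
Description=Start GlusterFS Volume dv on Boot
After=glusterd.service
Requires=glusterd.service

[Service]
Type=oneshot
RemainAfterExit=true
# "-" prefix: do not mark the unit failed when dv is already started
ExecStart=-/usr/sbin/gluster --mode=script volume start dv
ExecStop=/usr/sbin/gluster --mode=script volume stop dv force
# systemd does not run Exec lines through a shell, so "&&" needs sh -c;
# the CLI syntax is "volume stop <VOLNAME> [force]"
ExecReload=/bin/sh -c '/usr/sbin/gluster --mode=script volume stop dv force && /usr/sbin/gluster --mode=script volume start dv'

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable gdv
systemctl start gdv

Note: glusterd already restarts the bricks of every started volume when it comes up, so this unit is mostly a safety net. volume stop acts on the whole trusted pool, so with this ExecStop, shutting down any one node stops dv for every client; drop the ExecStop line if that is not intended.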

4.2 Client usage

# Install the client
apt install glusterfs-client -y

# Mount
mkdir -p /mnt/glusterfs_dv
mount -t glusterfs 10.0.0.5:/dv /mnt/glusterfs_dv

df -h
# 10.0.0.5:/dv  5.3T  92G  5.2T  2% /mnt/glusterfs_dv

# Mount automatically at boot
echo "10.0.0.5:/dv /mnt/glusterfs_dv glusterfs defaults,_netdev 0 0" >> /etc/fstab
mount -a

4.3 Benchmarking (fio)

apt install -y fio
fio --name=randread \
    --filename=/mnt/glusterfs_dv/testfile \
    --bs=16k \
    --size=1G \
    --time_based \
    --runtime=60 \
    --rw=randread \
    --ioengine=libaio \
    --direct=1 \
    --iodepth=16

# Result: 108 MiB/s throughput, 6886 IOPS, 2.3 ms average latency
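The three reported numbers are not independent: throughput is IOPS times block size, and with a single job keeping the queue full, average latency is roughly iodepth divided by IOPS (Little's law). A minimal sketch, assuming a hypothetical helper fio_check, that recomputes them from the 16k block size, iodepth 16 and 6886 IOPS of the run above:

```shell
#!/usr/bin/env bash
# fio_check (hypothetical helper): sanity-check a fio random-I/O summary.
#   throughput  ~= IOPS x block size
#   avg latency ~= iodepth / IOPS   (Little's law, one job, queue kept full)
# Usage: fio_check <iops> <block-size-KiB> <iodepth>
fio_check() {
  awk -v iops="$1" -v bs="$2" -v qd="$3" 'BEGIN {
    printf "throughput: %.1f MiB/s\n", iops * bs / 1024   # KiB/s -> MiB/s
    printf "avg latency: %.2f ms\n", qd / iops * 1000     # s -> ms
  }'
}

# Parameters of the run above: --bs=16k, --iodepth=16, 6886 IOPS measured
fio_check 6886 16 16
```

It prints 107.6 MiB/s and 2.32 ms, consistent with the 108 MiB/s and 2.3 ms fio reported; a large mismatch in your own runs usually means the job was not doing what the parameters suggest.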

5. Distributed replicated volume (recommended for production)

gluster volume create gv replica 2 transport tcp \
  10.0.0.5:/gfs \
  10.0.0.6:/gfs \
  10.0.0.7:/gfs \
  10.0.0.8:/gfs \
  force

# CLI warning: Replica 2 volumes are prone to split-brain. Use Arbiter or Replica 3 to avoid this
gluster volume start gv
gluster volume quota gv enable
gluster volume info gv

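Important warnings

  • volume create: gv: failed: The brick worker5:/gfs is being created in the root partition. ... use 'force' → append force to bypass the check (as in the command above)
  • Replica 2 is prone to split-brain; for production use Replica 3 (a distributed Replica 3 layout starts at 6 nodes)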
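Because bricks are grouped into replica sets, the CLI only accepts a brick count that is a multiple of the replica count. A minimal sketch, assuming a hypothetical helper replica_layout, that runs the same check before volume create and prints the layout in the form gluster volume info uses for distributed replicated volumes:

```shell
#!/usr/bin/env bash
# replica_layout (hypothetical helper): verify that the number of bricks is a
# multiple of the replica count, then print "<sets> x <replica> = <bricks>".
# Usage: replica_layout <replica-count> <brick>...
replica_layout() {
  local replica="$1"; shift
  local bricks=$#
  if (( bricks == 0 || bricks % replica != 0 )); then
    echo "error: $bricks bricks is not a multiple of replica $replica" >&2
    return 1
  fi
  echo "Number of Bricks: $(( bricks / replica )) x $replica = $bricks"
}

# The four bricks of gv above: two replica sets of two
replica_layout 2 10.0.0.5:/gfs 10.0.0.6:/gfs 10.0.0.7:/gfs 10.0.0.8:/gfs
```

For gv this prints "Number of Bricks: 2 x 2 = 4"; the same four bricks with replica 3 are rejected.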

Delete the volume and remove the peers from the pool:

gluster volume stop gv
gluster volume delete gv
gluster peer detach worker6
gluster peer detach worker7
gluster peer detach worker8

6. Client-side verification

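# Mount on worker5
mkdir -p /mnt/glusterfs_gv
mount -t glusterfs 10.0.0.5:/gv /mnt/glusterfs_gv

# Create 9 files through the mount point
touch /mnt/glusterfs_gv/test{1..9}

# Check the distribution: run on each node to list its brick
# (worker5 + worker6 form one replica set, worker7 + worker8 the other)
ls /gfs
# worker5 shows test1 test2 test4 test5 test8 test9
# worker6 shows test1 test2 test4 test5 test8 test9
# worker7 shows test3 test6 test7
# worker8 shows test3 test6 test7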
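Why worker5 and worker6 hold identical files: with replica 2, GlusterFS groups consecutive bricks, in the order they were passed to volume create, into replica sets of two, and the distribute layer then hashes each file name to one replica set, where every brick gets a copy. A minimal sketch, assuming a hypothetical helper replica_sets, that prints the grouping for the gv command in section 5:

```shell
#!/usr/bin/env bash
# replica_sets (hypothetical helper): group bricks into replica sets the way
# volume create does: consecutive bricks, <replica-count> at a time.
# Usage: replica_sets <replica-count> <brick>...
replica_sets() {
  local replica="$1"; shift
  local idx=0 count=0 members="" b
  for b in "$@"; do
    members+="${members:+ }$b"        # space-separate after the first brick
    count=$(( count + 1 ))
    if (( count % replica == 0 )); then
      echo "replica set $idx: $members"
      idx=$(( idx + 1 ))
      members=""
    fi
  done
}

replica_sets 2 10.0.0.5:/gfs 10.0.0.6:/gfs 10.0.0.7:/gfs 10.0.0.8:/gfs
```

Listing the bricks as 10.0.0.5, 10.0.0.7, 10.0.0.6, 10.0.0.8 instead would pair worker5 with worker7, which is why brick order matters when replicas must land on different failure domains.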

7. Connecting to K8s with the Kadalu Operator

Kadalu is a GlusterFS-based CSI solution for K8s; it turns a GlusterFS cluster into a K8s StorageClass.

7.1 Prepare the images

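# HARBOR_PASSWORD must be set beforehand; --password-stdin keeps it off the command line
echo "$HARBOR_PASSWORD" | docker login -u admin --password-stdin <harbor-ip>:13001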

docker pull kadalu/kadalu-operator:1.3.0
docker tag kadalu/kadalu-operator:1.3.0 <harbor-ip>:13001/base/kadalu/kadalu-operator:1.3.0
docker push <harbor-ip>:13001/base/kadalu/kadalu-operator:1.3.0

docker pull kadalu/kadalu-csi:1.3.0
docker tag kadalu/kadalu-csi:1.3.0 <harbor-ip>:13001/base/kadalu/kadalu-csi:1.3.0
docker push <harbor-ip>:13001/base/kadalu/kadalu-csi:1.3.0

docker pull raspbernetes/csi-node-driver-registrar:2.0.1
docker tag raspbernetes/csi-node-driver-registrar:2.0.1 <harbor-ip>:13001/base/raspbernetes/csi-node-driver-registrar:2.0.1
docker push <harbor-ip>:13001/base/raspbernetes/csi-node-driver-registrar:2.0.1

docker pull raspbernetes/csi-external-provisioner:2.0.2
docker tag raspbernetes/csi-external-provisioner:2.0.2 <harbor-ip>:13001/base/raspbernetes/csi-external-provisioner:2.0.2
docker push <harbor-ip>:13001/base/raspbernetes/csi-external-provisioner:2.0.2

docker pull raspbernetes/csi-external-attacher:3.0.0
docker tag raspbernetes/csi-external-attacher:3.0.0 <harbor-ip>:13001/base/raspbernetes/csi-external-attacher:3.0.0
docker push <harbor-ip>:13001/base/raspbernetes/csi-external-attacher:3.0.0

docker pull raspbernetes/csi-external-resizer:1.0.0
docker tag raspbernetes/csi-external-resizer:1.0.0 <harbor-ip>:13001/base/raspbernetes/csi-external-resizer:1.0.0
docker push <harbor-ip>:13001/base/raspbernetes/csi-external-resizer:1.0.0

docker pull busybox
docker tag busybox <harbor-ip>:13001/base/library/busybox
docker push <harbor-ip>:13001/base/library/busybox
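The 21 commands above repeat one pull/tag/push pattern per image. A minimal sketch that drives the same list from a loop, assuming a hypothetical helper mirror_target that reproduces the naming used above (upstream path under the Harbor project base, with single-name images such as busybox under library/); HARBOR is a placeholder you export first:

```shell
#!/usr/bin/env bash
# HARBOR is a placeholder: export it to your Harbor address:port before running
HARBOR="${HARBOR:-<harbor-ip>:13001}"

# mirror_target (hypothetical helper): upstream image -> Harbor image name
mirror_target() {
  case "$1" in
    */*) echo "$HARBOR/base/$1" ;;           # e.g. kadalu/kadalu-csi:1.3.0
    *)   echo "$HARBOR/base/library/$1" ;;   # e.g. busybox (Docker Hub library image)
  esac
}

images=(
  kadalu/kadalu-operator:1.3.0
  kadalu/kadalu-csi:1.3.0
  raspbernetes/csi-node-driver-registrar:2.0.1
  raspbernetes/csi-external-provisioner:2.0.2
  raspbernetes/csi-external-attacher:3.0.0
  raspbernetes/csi-external-resizer:1.0.0
  busybox
)

for img in "${images[@]}"; do
  dst=$(mirror_target "$img")
  # Dry run: prints the commands; remove "echo" to execute them
  echo docker pull "$img"
  echo docker tag "$img" "$dst"
  echo docker push "$dst"
done
```

Keeping the dry-run echo first makes it easy to compare the generated names with the image references in kadalu-operator.yaml before pushing anything.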

7.2 SSH key

kubectl create ns kadalu
kubectl create secret generic glusterquota-ssh-secret \
  --from-literal=glusterquota-ssh-username=root \
  --from-file=ssh-privatekey=/root/.ssh/id_rsa \
  -n kadalu

7.3 Deploy the Operator

curl -L -o kadalu-operator.yaml https://github.com/kadalu/kadalu/releases/download/1.3.0/kadalu-operator.yaml
# Point the operator image at the private Harbor registry
sed -i "s#docker.io/kadalu/kadalu-operator:1.3.0#<harbor-ip>:13001/base/kadalu/kadalu-operator:1.3.0#g" kadalu-operator.yaml

kubectl apply -f kadalu-operator.yaml

7.4 Attach the external GlusterFS cluster

apiVersion: kadalu-operator.storage/v1alpha1
kind: KadaluStorage
metadata:
  name: gfs
spec:
  type: External
  kadalu_format: native
  details:
    gluster_hosts:
      - 10.0.0.5
      - 10.0.0.6
      - 10.0.0.7
      - 10.0.0.8
    gluster_volname: gv
    gluster_options: log-level=DEBUG

Save the manifest as /data/k8scnf/kadalu/kadalu.yml and apply it:

kubectl apply -f /data/k8scnf/kadalu/kadalu.yml

7.5 Set the default StorageClass

kubectl patch storageclass kadalu.gfs -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

7.6 Test

# gfs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-gfs
  namespace: testgfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: kadalu.gfs

Create the namespace and apply the manifests:

kubectl create ns testgfs
kubectl apply -f gfs-pvc.yaml
kubectl apply -f gfs-nginx.yaml

# Open nginx through the NodePort
curl http://<vip-internal>:30052

Upload a file to /mnt/subvol/22/88/pvc-xxx/ (the PV's physical path on the GlusterFS volume) and it shows up on the page immediately.
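The gfs-nginx.yaml applied above is not shown. A hypothetical version, assuming one nginx Deployment that mounts pvc-gfs as its web root and a NodePort Service on 30052; the names, labels and the nginx:1.25 image are illustrative (in an offline cluster, point the image at your Harbor mirror):

```shell
#!/usr/bin/env bash
# Write a hypothetical gfs-nginx.yaml: nginx serving files from pvc-gfs,
# exposed on NodePort 30052 in the testgfs namespace.
cat > gfs-nginx.yaml << "EOF"
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-gfs
  namespace: testgfs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-gfs
  template:
    metadata:
      labels:
        app: nginx-gfs
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
          ports:
            - containerPort: 80
          volumeMounts:
            - name: html
              mountPath: /usr/share/nginx/html
      volumes:
        - name: html
          persistentVolumeClaim:
            claimName: pvc-gfs
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-gfs
  namespace: testgfs
spec:
  type: NodePort
  selector:
    app: nginx-gfs
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30052
EOF
```

With the default nginx configuration (autoindex off), requesting / returns 403 until the volume contains an index.html, so uploading one is the quickest way to see the shared storage at work.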


8. Common problems

8.1 chown: changing ownership of '/data': Transport endpoint is not connected

gluster pool list
# Look for peers in the Disconnected state

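Fix: check the firewall, TCP port 24007 (glusterd) and the brick ports (49152 and up by default), and make sure the glusterd service is running.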
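To spot the failing peer without reading the table by eye, a minimal sketch, assuming a hypothetical helper disconnected_peers, that reads gluster pool list output on stdin and prints every peer whose State column is not Connected (the sample input mirrors the pool list in section 3.2, with worker7 marked Disconnected for illustration):

```shell
#!/usr/bin/env bash
# disconnected_peers (hypothetical helper): read "gluster pool list" output on
# stdin, print the Hostname of every peer whose State is not "Connected",
# and return non-zero when at least one such peer exists.
disconnected_peers() {
  # NR > 1 skips the header line; $2 is Hostname, $NF is State
  awk 'NR > 1 && NF >= 3 && $NF != "Connected" { print $2; bad = 1 }
       END { exit bad }'
}

# Sample input (illustrative): worker7 is down
printf '%s\n' \
  'UUID                                    Hostname        State' \
  '6c831f26-52ee-4895-9df8-ea9f16670cab    worker6         Connected' \
  '48b796bf-3d08-43c6-976c-4e3f586345e5    worker7         Disconnected' \
  '4ea470e9-710a-458d-acd1-1be3c3937609    localhost       Connected' \
  | disconnected_peers || echo "at least one peer is down"

# On a real node:
#   gluster pool list | disconnected_peers
```

The non-zero exit status lets the same check gate a cron job or a pre-flight script before volume operations.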

8.2 volume create: failed: The brick ... is being created in the root partition

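A brick should sit on a dedicated mount point (or a subdirectory of one), not directly on the root partition. Append force to bypass the check.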

8.3 Replica 2 split-brain

Replica 2 is prone to split-brain. In production, use Replica 3 or an arbiter volume.

8.4 Expanding a volume

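# For a Replica 2 volume, add bricks in multiples of 2 (one new replica set here)
gluster volume add-brick gv node03:/data/brick1 node04:/data/brick1
# Spread existing files onto the new bricks
gluster volume rebalance gv start
gluster volume rebalance gv status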

8.5 Kadalu troubleshooting

# From a K8s node: check that glusterd is reachable
nc -zv 10.0.0.5 24007

# Inspect from inside the CSI provisioner pod
kubectl exec -it kadalu-csi-provisioner-0 -n kadalu -c kadalu-logging -- sh

9. Summary

GlusterFS still has its place on K8s: simpler than Ceph, more capable than NFS.

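  1. Distributed volume: no redundancy; its strength is capacity
  2. Distributed replicated volume: balances capacity and safety; recommended for production
  3. Kadalu is a practical way to bring a GlusterFS cluster into K8s as a StorageClass
  4. Only Replica 3 (or an arbiter volume) reliably protects data against split-brain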

Next: MinIO / JuiceFS / OpenEBS: complete options for object storage and local PVs on K8s
