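K8s Cluster Operations Command Reference: kubectl Cheat Sheet + Node Maintenance + Harbor Private Registry

kubectl is the "Swiss Army knife" of K8s operations

On a K8s 1.28 cluster, most day-to-day operations go through kubectl. This article is a cheat sheet organized by scenario; every command can be copied and pasted once placeholders such as <ns>, <pod>, and <name> are replaced with real values.

Applies to: kubectl ≥ 1.28 / K8s 1.28.5

1. Basic environment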

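# Enable shell completion (Debian/Ubuntu)
sudo apt install -y bash-completion
echo 'source <(kubectl completion bash)' >> ~/.bashrc
source ~/.bashrc

# Enable completion for all users
kubectl completion bash | sudo tee /etc/bash_completion.d/kubectl > /dev/null

# Set an alias and make completion work for it
echo 'alias k=kubectl' >> ~/.bashrc
echo 'complete -o default -F __start_kubectl k' >> ~/.bashrc
source ~/.bashrc

# Set the default namespace for the current context
kubectl config set-context --current --namespace=rook-ceph

2. Viewing resources

2.1 kubectl get short names

Resource	Short name	Example	Kind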
`pods`	`po`	`k get po -A`	Pod
`services`	`svc`	`k get svc -A`	Service
`deployments`	`deploy`	`k get deploy -A`	Deployment
`statefulsets`	`sts`	`k get sts -A`	StatefulSet
`daemonsets`	`ds`	`k get ds -A`	DaemonSet
`nodes`	`no`	`k get no`	Node
`namespaces`	`ns`	`k get ns`	Namespace
`configmaps`	`cm`	`k get cm -A`	ConfigMap
`secrets`	(none)	`k get secret -A`	Secret
`persistentvolumes`	`pv`	`k get pv`	PV
`persistentvolumeclaims`	`pvc`	`k get pvc -A`	PVC
`ingresses`	`ing`	`k get ing -A`	Ingress
`storageclasses`	`sc`	`k get sc`	StorageClass
`events`	`ev`	`k get ev -A`	Event

2.2 Common queries

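# List namespaced resource types that support list
kubectl api-resources --verbs=list --namespaced -o name

# Common workload resources in all namespaces ("all" does not include ConfigMaps, Secrets, PVCs, or Ingresses)
kubectl get all -o wide -A

# Every resource in one namespace
kubectl api-resources --verbs=list --namespaced -o name | \
  xargs -n 1 kubectl get --show-kind --ignore-not-found -n <namespace>

# Image used by a Deployment (first container)
kubectl get deployment <name> -n <ns> -o jsonpath='{.spec.template.spec.containers[0].image}'

# Supported API groups and versions
kubectl api-versions

2.3 Node load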

# Requires metrics-server to be running in the cluster
kubectl top node
kubectl top pod -A

3. Node management

3.1 Node status

# List nodes
kubectl get node

# Node labels
kubectl label nodes worker2 mysql-node=tenant1    # add
kubectl label nodes worker2 mysql-node-           # remove
kubectl get nodes --show-labels
kubectl label nodes worker9 --list

3.2 Node taints

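# Add a taint (control-plane is the taint key kubeadm uses since K8s 1.24)
kubectl taint nodes master3 node-role.kubernetes.io/control-plane=:NoSchedule
# Remove it
kubectl taint nodes master3 node-role.kubernetes.io/control-plane-

# Dedicated taint for Rook-Ceph (keeps ordinary Pods off the mon node)
kubectl taint nodes master3 dedicated=ceph:NoSchedule --overwrite

# Add a matching toleration to the Rook-Ceph operator
# (Ceph daemon Pods such as mon and osd take their tolerations from spec.placement in the CephCluster resource)
# kubectl -n rook-ceph edit deploy rook-ceph-operator
# Under spec.template.spec.tolerations add:
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "ceph"
    effect: "NoSchedule"

3.3 Cordoning and draining nodes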

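# Mark the node unschedulable
kubectl cordon master1

# Evict Pods (--force also deletes Pods that no controller manages; those are not recreated)
kubectl drain master1 --delete-emptydir-data --ignore-daemonsets --force

# Resume scheduling
kubectl uncordon master1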
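
The cordon, drain, and uncordon steps above are usually run as one routine. Below is a minimal sketch of a wrapper, assuming a POSIX shell. The function name `maintain_node`, the `KUBECTL` override variable, and the 300s timeout are illustrative choices, not part of kubectl. `--force` is left out on purpose so that Pods without a controller block the drain instead of being deleted.

```shell
# Command used to talk to the cluster; override it for a dry run.
KUBECTL=${KUBECTL:-kubectl}

# maintain_node <node> [command...]
# Cordons and drains the node, runs the optional maintenance command,
# then uncordons the node and returns the command's exit status.
maintain_node() {
  node=$1
  shift
  $KUBECTL cordon "$node" || return 1
  # On drain failure the node is left cordoned so it can be inspected.
  $KUBECTL drain "$node" --delete-emptydir-data --ignore-daemonsets --timeout=300s || return 1
  rc=0
  if [ "$#" -gt 0 ]; then
    "$@" || rc=$?
  fi
  $KUBECTL uncordon "$node"
  return "$rc"
}

# Example (hypothetical task): maintain_node worker2 ssh worker2 'sudo apt -y upgrade'
```

Setting `KUBECTL="echo kubectl"` before calling the function prints the three kubectl commands without touching the cluster, which is a cheap way to review what would run.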

3.4 Deleting a node

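kubectl delete node master1
# This only removes the Node object. The kubelet on that machine keeps running and can
# re-register the node, so stop it there (systemctl stop kubelet) or, on kubeadm clusters, run kubeadm reset.

4. Pod operations

4.1 Finding Pods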

kubectl get po -owide -n <ns>
kubectl get pods -n <ns> | grep redis | awk '{print $1}'

# Details
kubectl describe po -n kube-system metrics-server-7745c6dfc5-8sjjc

4.2 Logs

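# Single-container Pod
kubectl logs -f <pod> -n <ns>

# Multi-container Pod
kubectl logs -f <pod> -c <container> -n <ns>

# By label selector
kubectl logs -f --tail=100 -l app=nginx -n test

# By time range (with -l, --tail defaults to 10 lines per Pod; --tail=-1 shows everything in the range)
kubectl logs --since=9h --tail=-1 -l app=app -n test
kubectl logs --since-time="2024-12-15T00:00:00Z" --tail=-1 -l app=app -n test

4.3 Running commands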

# One-off command
kubectl exec <pod> -n <ns> -- nslookup kubernetes

# Interactive shell
kubectl exec -it <pod> -- sh
kubectl exec -it <pod> -c <container> -- sh

# Ephemeral debug container (enabled by default since K8s 1.23, stable in 1.25; very handy for CrashLoopBackOff)
kubectl debug -it <pod> --image=busybox:1.28 --target=<container>

4.4 Deleting

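# Normal delete
kubectl delete po <pod> -n <ns>
kubectl delete -f deployment.yaml

# Force delete (skips graceful shutdown; the container may still be running on the node)
kubectl delete po <pod> -n <ns> --grace-period=0 --force
kubectl delete po calico-kube-controllers-546dcc6886-mf8rt -n kube-system --force --grace-period=0

# Bulk-delete Pods in Error state
kubectl delete po -n <ns> $(kubectl get po -n <ns> --no-headers | grep Error | awk '{print $1}')

# Bulk-delete Pods that are not Running (--no-headers keeps the NAME header row out of the list; Completed Pods are deleted too)
kubectl delete pod -n <ns> $(kubectl get pods -n <ns> --no-headers | grep -v "Running" | awk '{print $1}') --force --grace-period=0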
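
A grep-based filter matches text anywhere in the row, so it can pick up the header line, Completed Pods, or a Pod whose name happens to contain the word. A hedged alternative is to compare the STATUS column exactly. The sketch below assumes the default column layout of `kubectl get pods -n <ns> --no-headers` (NAME, READY, STATUS, ...). The function name `not_running` and the sample Pod names are made up for illustration.

```shell
# Print the names of Pods whose STATUS column is neither Running nor Completed.
# Expects `kubectl get pods --no-headers` output on stdin.
not_running() {
  awk '$3 != "Running" && $3 != "Completed" {print $1}'
}

# Sample input standing in for real kubectl output (hypothetical Pods)
not_running <<'EOF'
web-7d4b9c-abcde     1/1   Running            0     3d
job-28391-xk2lp      0/1   Completed          0     5h
api-5f6c8d-qwert     0/1   CrashLoopBackOff   12    1h
cache-0              0/1   Error              3     2m
EOF
```

Against a real cluster the input would come from `kubectl get pods -n <ns> --no-headers | not_running`. With `-A` the namespace becomes the first column, so the field numbers shift by one.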

4.5 Replica count

kubectl scale --replicas=1 deployment/<name> -n <ns>
kubectl -n kube-system scale --replicas=10 deployment/coredns

4.6 Forcing a restart

kubectl rollout restart deployment/<name> -n <ns>
kubectl rollout status deployment/<name> -n <ns>

5. Advanced Deployment operations

5.1 Rolling back a version

# Record a change cause (--record is deprecated; use annotate instead)
kubectl annotate deployment/<name> kubernetes.io/change-cause="image updated"

# History
kubectl rollout history deployment/<name>

# Roll back to the previous revision, or to a specific one
kubectl rollout undo deployment/<name>
kubectl rollout undo deployment/<name> --to-revision=2

5.2 HPA (autoscaling)

# Create an HPA (a manual "kubectl scale" still works, but the HPA overrides it at its next sync)
kubectl autoscale deployment <name> -n <ns> --min=1 --max=10 --cpu-percent=80
kubectl edit hpa <name> -n <ns>
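
To see why the --min and --max bounds matter, the replica count an HPA aims for can be approximated by hand. Per the Kubernetes documentation, the controller computes desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric) and then clamps the result to the min/max range. The sketch below does this in integer shell arithmetic. The function name `hpa_desired` and the sample numbers are illustrative, and the controller's tolerance band and stabilization window are ignored.

```shell
# hpa_desired <current_replicas> <current_metric> <target_metric> <min> <max>
# Prints ceil(current_replicas * current_metric / target_metric), clamped to [min, max].
hpa_desired() {
  current=$1; metric=$2; target=$3; min=$4; max=$5
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * metric + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

hpa_desired 4 90 50 1 10   # 4 replicas at 90% CPU, target 50%: prints 8
hpa_desired 4 90 50 1 1    # same load with min=1 and max=1: prints 1
```

The second call shows that when min and max are equal the HPA can never change the replica count, whatever the load.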

5.3 Load test

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

6. Secrets and Harbor

6.1 Creating the Harbor secret

kubectl create secret docker-registry harbor-secret \
  --docker-server=<harbor-host>:13001 \
  --docker-username=admin \
  --docker-password={{HARBOR_PASSWORD}} \
  -n <ns>

The real password is replaced here by the placeholder {{HARBOR_PASSWORD}}; substitute your own.
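
When an image pull still fails after the secret is created, it helps to know what the secret contains. `kubectl create secret docker-registry` stores a `.dockerconfigjson` document whose per-registry `auth` field is the base64 encoding of `username:password`. The sketch below reproduces that value locally so it can be compared with the stored one. The helper name `registry_auth` and the credentials `admin` / `secret` are placeholders for illustration, not values from this cluster.

```shell
# Print base64("<username>:<password>"), the form used in the "auth" field
# of a .dockerconfigjson registry entry.
registry_auth() {
  # tr removes the line wrapping that some base64 implementations add
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
  echo
}

registry_auth admin secret   # prints YWRtaW46c2VjcmV0
```

The stored document can be decoded for comparison with `kubectl get secret harbor-secret -n <ns> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d`.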

6.2 Referencing it from a Pod

apiVersion: v1
kind: Pod
metadata:
  name: pod-harbor
spec:
  containers:
  - name: c1
    image: <harbor-host>:13001/test/nginx:v1
  imagePullSecrets:
  - name: harbor-secret

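6.3 docker daemon.json (configure in advance)

On all nodes (this file configures Docker Engine; dockershim was removed in K8s 1.24, so nodes that run containerd directly need the equivalent registry settings in containerd instead):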

{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "50m", "max-file": "1" },
  "registry-mirrors": [
    "https://<your-mirror>.mirror.aliyuncs.com",
    "https://docker.m.daocloud.io",
    "https://hub-mirror.c.163.com"
  ],
  "insecure-registries": ["<harbor-host>:13001"]
}

6.4 Token (admin login)

# Assumes a ServiceAccount named admin-user already exists in kube-system
kubectl -n kube-system create token admin-user

7. In-cluster network tests

# One-off test Pod (removed again when the command exits)
kubectl run -it --rm busybox --image=busybox:1.28 --restart=Never -- nslookup mine-backend.test

# DNS tests from a long-running Pod
kubectl run busybox --image=busybox:1.28 -- sleep 3600
kubectl exec -it busybox -- sh
# Inside the Pod shell:
nslookup kubernetes.default
nslookup extend-service.test

# Cross-namespace resolution
kubectl exec busybox -n default -- nslookup safety-auth.safety-hnpre.svc.cluster.local
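
Short names such as extend-service.test resolve because the Pod's resolver appends search domains. The sketch below assumes the usual Pod /etc/resolv.conf (search <pod-namespace>.svc.cluster.local svc.cluster.local cluster.local, with ndots:5) and the default cluster domain cluster.local. It lists the names that would be tried, in order, for a name with fewer than five dots. `dns_candidates` is an illustrative helper, not a kubectl feature, and node-level search domains, if present, would add further entries.

```shell
# dns_candidates <name> <pod-namespace> [cluster-domain]
# Prints, in lookup order, the names tried for <name> from a Pod in <pod-namespace>.
dns_candidates() {
  name=$1
  pod_ns=$2
  domain=${3:-cluster.local}
  # Search domains are appended first because the name has fewer dots than ndots
  for suffix in "$pod_ns.svc.$domain" "svc.$domain" "$domain"; do
    printf '%s.%s\n' "$name" "$suffix"
  done
  # Finally the name is tried as given
  printf '%s\n' "$name"
}

dns_candidates extend-service.test default
```

The second candidate, extend-service.test.svc.cluster.local, has the `<service>.<namespace>.svc.<cluster-domain>` form of a Service name, which is why the short `<service>.<namespace>` spelling works from any namespace.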

After changing the CoreDNS configuration, restart it:

kubectl edit configmap coredns -n kube-system

# Edit the forward block
forward . __PILLAR__CLUSTER__DNS__ {
    prefer_udp
    max_concurrent 1000
}

kubectl rollout restart deploy/coredns -n kube-system

8. Namespaces and finalizers

8.1 Namespace management

kubectl create ns <name>
kubectl delete ns <name>
kubectl get ns

8.2 Namespace stuck in Terminating

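ns=cattle-fleet-system
# spec.finalizers (normally ["kubernetes"]) can only be changed through the finalize subresource
kubectl get namespace "$ns" -o json | jq '.spec.finalizers=[] | .metadata.finalizers=[]' | \
  kubectl replace --raw "/api/v1/namespaces/$ns/finalize" -f -

# Clear metadata.finalizers (added by controllers) with a patch
kubectl patch ns rook-ceph --type=merge -p '{"metadata":{"finalizers":null}}'

8.3 PVC stuck in Terminating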

kubectl patch pvc jenkins-agent-pvc -n kube-ops --type=merge -p '{"metadata":{"finalizers":null}}'

kubectl patch pv pvc-c87554c7-6b20-42ed-8cdd-8be1d630a744 \
  -p '{"metadata":{"finalizers":null}}' --type=merge

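9. Troubleshooting quick reference

Symptom	Command	What to check
Node NotReady	`kubectl describe node <n>`	Check Conditions
Pod Pending	`kubectl describe po <p>`	Check Events
Pod CrashLoopBackOff	`kubectl logs <p> --previous`	Read the previous container's logs
Service unreachable	`kubectl get ep`	Whether the Endpoints list any backend Pods
NodePort unreachable from outside	`kubectl get svc -o yaml`	type=NodePort and the nodePort range (default 30000-32767)
DNS failures	`kubectl exec <p> -- nslookup kubernetes`	CoreDNS Pod status
Image pull failures	`kubectl describe po <p>`	Check imagePullSecrets
PVC Pending	`kubectl describe pvc`	StorageClass, quota, node affinity
HPA does not scale down	`kubectl describe hpa`	Whether metrics-server is healthy
Failed update	`kubectl rollout undo`	Roll back

10. Summary

kubectl covers most day-to-day K8s operations:

get + describe + logs are the three basic troubleshooting tools
rollout undo is the go-to command for rolling back a version
cordon + drain is the standard routine for node maintenance
drain + delete node is the standard routine for scaling in
patch … finalizers=null is the last resort for a namespace stuck in Terminating

Next: K8s cluster upgrade and teardown: version upgrades + finalizers cleanup.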