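Three ways to collect logs in K8s
| Method | Resource usage | Best for |
|---|---|---|
| DaemonSet collection (default) | Low | Small and medium clusters |
| Sidecar collection | High | Large clusters, multi-tenant isolation |
| Application pushes its own logs | Depends on the application | Custom requirements |
This article deploys Loki + Promtail, the most widely used logging stack of the K8s era. It is far lighter than ELK because Loki indexes only a small set of labels rather than the full log text, and its query language, LogQL, is modeled on PromQL.
Versions used: Loki 2.9.3 / Promtail 2.9.3 / K8s 1.28.5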
1. Loki architecture
```
          ┌──────────────────┐
          │     Grafana      │ ← query
          └────────┬─────────┘
                   │ LogQL
          ┌────────▼─────────┐
          │       Loki       │ ← storage + index
          │  (write / read / │
          │ backend/gateway) │
          └────────▲─────────┘
                   │ push
      ┌────────────┴────────────┐
      │                         │
 ┌────┴─────┐              ┌────┴─────┐
 │ Promtail │              │ Promtail │ ← DaemonSet, one per node
 │ (Node1)  │              │ (Node2)  │
 └────▲─────┘              └────▲─────┘
      │ tails                   │ tails
 /var/log/pods/            /var/log/pods/
 /var/lib/docker/          /var/lib/docker/
   containers/               containers/
```
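Three main components:
- Grafana Agent / Promtail: one Pod per node via a DaemonSet, tailing container logs
- Loki: storage + query. In simple scalable mode (the grafana/loki Helm chart) it is split into write / read / backend targets plus a gateway; the loki-stack chart and the manifests in section 3 run it as a single process (loki-0)
- Grafana: visual querying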
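The "push" arrow in the diagram is a plain HTTP POST: Promtail batches log lines and sends them to Loki's /loki/api/v1/push endpoint (the same URL used in the Promtail ConfigMap in section 4.1). The sketch below builds that JSON body by hand to show what a Loki stream is: one label set plus a list of [timestamp, line] pairs, with timestamps as Unix epoch nanoseconds encoded as strings. The helper names build_push_payload and push are hypothetical, and the base URL assumes the port-forward from section 5.1; treat it as an illustration of the wire format, not as a replacement for Promtail.

```python
import json
import time
import urllib.request
from typing import Dict, List, Optional


# Hypothetical helper: builds the JSON body accepted by /loki/api/v1/push.
def build_push_payload(labels: Dict[str, str], lines: List[str],
                       ts_ns: Optional[int] = None) -> dict:
    """Build a Loki push body: one stream (a label set) plus [timestamp, line] pairs."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Timestamps are Unix epoch nanoseconds as strings; +i keeps the lines in order.
    values = [[str(ts_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}


# Hypothetical helper: sends the body; Loki answers 204 No Content on success.
def push(base_url: str, payload: dict) -> int:
    """POST the payload to /loki/api/v1/push and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url}/loki/api/v1/push",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    body = build_push_payload({"namespace": "default", "pod": "demo"}, ["hello loki"])
    print(json.dumps(body))
    # push("http://localhost:3100", body)  # needs a reachable Loki, e.g. the 5.1 port-forward
```

A line pushed this way can then be found in Grafana with {namespace="default", pod="demo"}.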
2. Helm deployment (recommended)
2.1 Add the repository
```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Pull the chart locally; the tarball name depends on the chart version you get
helm pull grafana/loki-stack
tar xvf loki-stack-2.10.0.tgz
cd loki-stack
```
2.2 Copy the default values
```bash
cp values.yaml values-prod.yaml
```
2.3 Edit values-prod.yaml
```yaml
loki:
  enabled: true
  persistence:
    enabled: true
    storageClassName: nfs-client
    accessModes:
      - ReadWriteOnce
    size: 10Gi
promtail:
  enabled: true
grafana:
  enabled: true
  service:
    type: NodePort
  persistence:
    enabled: true
    storageClassName: nfs-client
    accessModes:
      - ReadWriteOnce
    size: 10Gi
```
2.4 Resolve conflicting RBAC objects
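```bash
# Leftover ClusterRoles from an earlier install make helm fail with:
# ClusterRole "loki-promtail" in namespace "" exists and cannot be imported into the current release
kubectl get clusterrole | grep loki
# Delete the leftovers (only if they belong to an old Loki install, not another live release)
kubectl delete clusterrole loki-promtail
kubectl delete clusterrole loki-grafana-clusterrole
```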
2.5 Deploy
```bash
kubectl create ns logging
helm upgrade --install loki . \
  -f values-prod.yaml \
  -n logging
kubectl logs -f -n logging loki-0
```
2.6 Verify
```bash
kubectl get pod -n logging
# loki-0              1/1   Running
# loki-promtail-xxx   1/1   Running
# loki-grafana-xxx    1/1   Running
```
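3. Plain YAML deployment (more control)
If you don't want the pile of components that Helm pulls in, you can run Loki directly with a StatefulSet + ConfigMap.
3.1 Prepare the image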
```bash
docker pull grafana/loki:2.9.3
docker tag grafana/loki:2.9.3 <harbor>/base/grafana/loki:2.9.3
docker push <harbor>/base/grafana/loki:2.9.3
```
3.2 Full YAML
/data/k8scnf/loki.yaml (key parts only):
```yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: loki
  namespace: logging
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki
  namespace: logging
  labels:
    app: loki
data:
  loki.yaml: |
    auth_enabled: false
    ingester:
      chunk_idle_period: 3m
      chunk_block_size: 262144
      chunk_retain_period: 1m
      max_transfer_retries: 0
      lifecycler:
        ring:
          kvstore:
            store: inmemory
          replication_factor: 1
      wal:
        enabled: true
        dir: /data/wal
    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 8h
    schema_config:
      configs:
        - from: "2024-01-19"
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h
    server:
      http_listen_port: 3100
    storage_config:
      boltdb_shipper:
        active_index_directory: /data/loki/boltdb-shipper-active
        cache_location: /data/loki/boltdb-shipper-cache
        cache_ttl: 24h
        shared_store: filesystem
      filesystem:
        directory: /data/loki/chunks
    chunk_store_config:
      max_look_back_period: 0s
    table_manager:
      retention_deletes_enabled: true
      retention_period: 48h
    compactor:
      working_directory: /data/loki/boltdb-shipper-compactor
      shared_store: filesystem
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: loki
  namespace: logging
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  serviceName: loki
  selector:              # required by apps/v1; must match the template labels
    matchLabels:
      app: loki
  template:
    metadata:
      labels:
        app: loki
    spec:
      serviceAccountName: loki
      securityContext:
        fsGroup: 10001
        runAsGroup: 10001
        runAsNonRoot: true
        runAsUser: 10001
      initContainers:
        - name: fix-permissions
          image: busybox:latest
          securityContext:
            privileged: true
            runAsGroup: 0
            runAsNonRoot: false
            runAsUser: 0
          command:
            - sh
            - -c
            - >-
              id;
              mkdir -p /data/loki;
              chown 10001:10001 /data -R;
              ls -la /data/
          volumeMounts:
            - mountPath: /data
              name: loki-storage
      containers:
        - name: loki
          image: grafana/loki:2.9.3   # or the <harbor> image pushed in 3.1
          args:
            - -config.file=/etc/loki/config/loki.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/loki/config/loki.yaml
              subPath: loki.yaml
            - name: loki-storage
              mountPath: /data
          ports:
            - name: http-metrics
              containerPort: 3100
          livenessProbe:
            httpGet:
              path: /ready
              port: http-metrics
            initialDelaySeconds: 45
          readinessProbe:
            httpGet:
              path: /ready
              port: http-metrics
            initialDelaySeconds: 45
          securityContext:
            readOnlyRootFilesystem: true
      volumes:           # the "config" volume mounted above comes from the ConfigMap
        - name: config
          configMap:
            name: loki
  volumeClaimTemplates:
    - metadata:
        name: loki-storage
      spec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 3Gi
---
apiVersion: v1
kind: Service
metadata:
  name: loki
  namespace: logging
spec:
  type: ClusterIP
  ports:
    - port: 3100
      name: http-metrics
  selector:
    app: loki
---
apiVersion: v1
kind: Service
metadata:
  name: loki-outer
  namespace: logging
spec:
  type: NodePort
  ports:
    - port: 3100
      nodePort: 32537
  selector:
    app: loki
```
3.3 Deploy
```bash
kubectl create ns logging
kubectl apply -f /data/k8scnf/loki.yaml
```
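4. Deploying Promtail
Promtail is Loki's "client": a DaemonSet runs one Promtail Pod per node, which tails the container log files on that node and pushes them to Loki.
4.1 ConfigMap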
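```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail
  namespace: logging
data:
  promtail.yaml: |
    server:
      http_listen_port: 9080
    positions:
      filename: /tmp/positions.yaml
    clients:
      - url: http://loki:3100/loki/api/v1/push
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        pipeline_stages:
          # Docker json-file logs; use "- cri: {}" instead on containerd nodes
          - docker: {}
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_name]
            target_label: __service__
          - source_labels: [__meta_kubernetes_pod_node_name]
            target_label: __host__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - action: replace
            source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - action: replace
            source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          # Without __path__ Promtail does not know which files to tail
          - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
            separator: /
            target_label: __path__
            replacement: /var/log/pods/*$1/*.log
          # Set the job label explicitly so {job="kubernetes-pods"} queries match
          - target_label: job
            replacement: kubernetes-pods
```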
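To see what the relabel_configs above do to one Pod, here is a simplified Python model of a subset of the rules (__service__, __host__, the labelmap on pod labels, and the namespace/pod replaces). It follows Prometheus-style relabeling semantics: the regex must match the whole label name, and any label still starting with __ is dropped after relabeling, which is why __service__ and __host__ never appear as queryable labels. This is an illustration, not Promtail's code; the function name relabel and the sample Pod metadata are hypothetical.

```python
import re
from typing import Dict

# labelmap regex from the ConfigMap; relabel regexes are anchored (full match)
POD_LABEL = re.compile(r"__meta_kubernetes_pod_label_(.+)")


# Hypothetical helper modelling a subset of the relabel_configs above.
def relabel(meta: Dict[str, str]) -> Dict[str, str]:
    """Apply simplified versions of the Promtail relabel rules to one Pod's meta labels."""
    out = dict(meta)
    # source_labels: [__meta_kubernetes_pod_label_name] -> __service__
    if "__meta_kubernetes_pod_label_name" in meta:
        out["__service__"] = meta["__meta_kubernetes_pod_label_name"]
    # source_labels: [__meta_kubernetes_pod_node_name] -> __host__
    if "__meta_kubernetes_pod_node_name" in meta:
        out["__host__"] = meta["__meta_kubernetes_pod_node_name"]
    # action: labelmap copies every Pod label to a label named by the capture group
    for key, value in meta.items():
        m = POD_LABEL.fullmatch(key)
        if m:
            out[m.group(1)] = value
    # action: replace for namespace and pod
    if "__meta_kubernetes_namespace" in meta:
        out["namespace"] = meta["__meta_kubernetes_namespace"]
    if "__meta_kubernetes_pod_name" in meta:
        out["pod"] = meta["__meta_kubernetes_pod_name"]
    # Labels beginning with "__" are internal and removed after relabeling.
    return {k: v for k, v in out.items() if not k.startswith("__")}


if __name__ == "__main__":
    sample = {
        "__meta_kubernetes_namespace": "default",
        "__meta_kubernetes_pod_name": "my-app-7d9f",
        "__meta_kubernetes_pod_node_name": "node1",
        "__meta_kubernetes_pod_label_app": "my-app",
    }
    print(relabel(sample))  # {'app': 'my-app', 'namespace': 'default', 'pod': 'my-app-7d9f'}
```

The resulting label set is exactly what the LogQL selectors in section 5.3, such as {app="my-app"} or {namespace="default"}, match against.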
4.2 DaemonSet + RBAC
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: promtail
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: promtail
rules:
  - apiGroups: [""]
    resources: [nodes, nodes/proxy, services, endpoints, pods]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: promtail
subjects:
  - kind: ServiceAccount
    name: promtail
    namespace: logging
roleRef:
  kind: ClusterRole
  name: promtail
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: promtail
  namespace: logging
spec:
  selector:
    matchLabels:
      name: promtail
  template:
    metadata:
      labels:
        name: promtail
    spec:
      serviceAccountName: promtail
      containers:
        - name: promtail
          image: grafana/promtail:2.9.3
          args:
            - -config.file=/etc/promtail/promtail.yaml
          env:
            # Promtail only tails Pods whose __host__ matches its HOSTNAME, so set it to the node name
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: config
              mountPath: /etc/promtail
            - name: docker
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: pods
              mountPath: /var/log/pods
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: promtail
        - name: docker
          hostPath:
            path: /var/lib/docker/containers
        - name: pods
          hostPath:
            path: /var/log/pods
```
5. Verification and queries
5.1 Loki health
```bash
kubectl -n logging port-forward svc/loki 3100:3100
# Then open in a browser:
# http://localhost:3100/ready
# A response of "ready" means Loki is up
```
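For scripts or CI gates, the same /ready check can be polled programmatically. A minimal standard-library sketch, assuming the port-forward above is running on localhost:3100: Loki answers HTTP 200 with the body "ready" once it has started, and anything else (a 503, a refused connection, a timeout) is treated as not ready. The helper names is_ready and wait_until_ready are hypothetical.

```python
import time
import urllib.error
import urllib.request


# Hypothetical helper: one readiness probe against Loki's /ready endpoint.
def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/ready answers 200 with body 'ready'."""
    try:
        with urllib.request.urlopen(f"{base_url}/ready", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().decode().strip() == "ready"
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status (HTTPError is a URLError).
        return False


# Hypothetical helper: retry is_ready a fixed number of times.
def wait_until_ready(base_url: str, attempts: int = 30, interval: float = 2.0) -> bool:
    """Poll /ready up to `attempts` times, sleeping `interval` seconds between tries."""
    for _ in range(attempts):
        if is_ready(base_url):
            return True
        time.sleep(interval)
    return False
```

Usage: wait_until_ready("http://localhost:3100") returns True as soon as Loki reports ready, or False after roughly a minute (30 tries at 2-second intervals). The 45-second initialDelaySeconds on the probes in section 3.2 suggests allowing at least that long after a restart.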
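5.2 Add Loki as a Grafana data source
Grafana → Configuration → Data sources → Add data source → Loki
URL: http://loki:3100 (the short name works when Grafana runs in the logging namespace; from another namespace use http://loki.logging.svc.cluster.local:3100)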
5.3 Query logs
In Grafana Explore, select the Loki data source and run queries such as:
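```logql
# All logs from the kube-system namespace
{namespace="kube-system"}
# Lines in the default namespace that contain "error"
{namespace="default"} |= "error"
# Parse JSON log lines and print only the msg field
{app="my-app"} | json | line_format "{{.msg}}"
```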
{job="kubernetes-pods"} |= "ERROR" returns every log line containing ERROR. The |= filter is case-sensitive; use |~ "(?i)error" to match any case.
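The same LogQL can be sent straight to Loki's HTTP API, which is handy for scripts. The sketch below builds a GET /loki/api/v1/query_range URL for the last N minutes; start and end are Unix epoch nanoseconds, limit caps the number of returned lines, and urlencode percent-escapes the braces, quotes and pipes in the query. The helper name build_query_range_url is hypothetical, and the base URL assumes the port-forward from 5.1.

```python
import time
from typing import Optional
from urllib.parse import urlencode


# Hypothetical helper: builds a query_range URL for a LogQL expression.
def build_query_range_url(base_url: str, logql: str, minutes: int = 60,
                          limit: int = 100, now_ns: Optional[int] = None) -> str:
    """Return a /loki/api/v1/query_range URL covering the last `minutes` minutes."""
    end = now_ns if now_ns is not None else time.time_ns()
    start = end - minutes * 60 * 1_000_000_000  # minutes -> nanoseconds
    params = {"query": logql, "start": start, "end": end, "limit": limit}
    # urlencode escapes {, }, ", |, = and turns spaces into '+'.
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_range_url("http://localhost:3100",
                                '{job="kubernetes-pods"} |= "ERROR"', minutes=15))
```

Fetching that URL (with curl or urllib) returns JSON whose data.result holds one entry per matching stream, each with its labels and [timestamp, line] values.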
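6. Common issues
6.1 loki-0 1/2 CrashLoopBackOff
A permissions problem: Loki runs as UID 10001 and cannot write to the mounted volume (for example an NFS-backed PV). Add an initContainer to the YAML that fixes ownership: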
```yaml
initContainers:
  - name: fix-permissions
    image: busybox:latest
    securityContext:
      privileged: true
      runAsGroup: 0
      runAsNonRoot: false   # override the Pod-level runAsNonRoot: true
      runAsUser: 0
    command:
      - sh
      - -c
      - >-
        mkdir -p /data/loki;
        chown 10001:10001 /data -R;
        ls -la /data/
    volumeMounts:
      - mountPath: /data
        name: loki-storage   # must match the volumeClaimTemplates name
```
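In my tests, 2.7.3 hit this bug while 2.6.1 worked fine; I suspect a Loki configuration change altered the container startup order. Switching to 2.6.1 was the quick fix.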
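6.2 no such file or directory: /var/log/pods/...
The Docker data directory was moved, so Promtail cannot find the container logs: with Docker, the files under /var/log/pods are symlinks into Docker's data directory, and the new location is not mounted into the Promtail Pod.
Fix: point Promtail's /var/lib/docker/containers volume at the actual path, using the same path for both hostPath and mountPath because the symlink targets are absolute; or reinstall Docker with its default data path.
Temporary workaround: restart the failing Pod: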
```bash
kubectl delete po <pod> -n kube-system
```
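To confirm the 6.2 diagnosis, check where the log symlinks actually point. Inside the Promtail Pod a /var/log/pods/.../*.log link only resolves if its target directory is mounted too, so a link whose target lies outside the directories the DaemonSet mounts (/var/log/pods and /var/lib/docker/containers) shows the real Docker data path you need to mount. A sketch to run on the affected node, assuming Docker via cri-dockerd, where kubelet's log files are symlinks; on containerd they are regular files and the function returns nothing. The helper name find_unmounted_log_links is hypothetical.

```python
import os
from typing import Iterable, List, Tuple


# Hypothetical helper: lists log symlinks whose targets Promtail cannot see.
def find_unmounted_log_links(root: str = "/var/log/pods",
                             mounted: Iterable[str] = ("/var/log/pods",
                                                       "/var/lib/docker/containers"),
                             ) -> List[Tuple[str, str]]:
    """Return (link, target) for *.log symlinks under root pointing outside the mounted dirs."""
    prefixes = [m.rstrip("/") + "/" for m in mounted]
    result = []
    # os.walk lists symlinks to files (even broken ones) among the file names.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not name.endswith(".log") or not os.path.islink(path):
                continue  # regular files (e.g. containerd logs) are read directly
            # Targets are normally absolute; join also handles a relative target.
            target = os.path.normpath(os.path.join(dirpath, os.readlink(path)))
            if not any(target.startswith(p) for p in prefixes):
                result.append((path, target))
    return sorted(result)


if __name__ == "__main__":
    for link, target in find_unmounted_log_links():
        print(f"{link} -> {target}")
```

If every hit points under the same directory, for example a moved Docker data root, that directory is what the docker volume in section 4.2 should mount, at an identical path on both sides.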
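6.3 Promtail keeps reporting "Buffer filling up"
When Loki cannot accept pushes fast enough, Promtail's pending batches back up, and once the client's retries are exhausted those entries are dropped. Sending fewer, larger batches helps. Note that 1s and 1048576 are already Promtail's defaults, so raise them above these values: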
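```yaml
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
    batchwait: 1s        # Promtail default; raise it (e.g. 2s) for fewer, larger batches
    batchsize: 1048576   # bytes, Promtail default (1 MiB); raise it (e.g. 2097152) to match
```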
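How batchwait and batchsize interact: the client appends entries to the current batch, sends the batch first if the next entry would push it past batchsize bytes, and also sends any non-empty batch once it is older than batchwait. The toy model below makes that rule concrete. The Batcher class is hypothetical and deliberately simplified (no retries, backoff or per-tenant batches); it is not Promtail's client code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Batcher:
    """Hypothetical toy model of the batchwait/batchsize rule (no retries or tenants)."""
    batchsize: int = 1_048_576   # bytes; Promtail's default
    batchwait: float = 1.0       # seconds; Promtail's default
    lines: List[str] = field(default_factory=list)
    size: int = 0
    started_at: float = 0.0
    sent: List[List[str]] = field(default_factory=list)

    def add(self, line: str, now: float) -> None:
        n = len(line.encode())
        # If this entry would push the batch past batchsize, send the current batch first.
        if self.lines and self.size + n > self.batchsize:
            self.flush()
        if not self.lines:
            self.started_at = now
        self.lines.append(line)
        self.size += n

    def tick(self, now: float) -> None:
        # Checked periodically: a non-empty batch older than batchwait is sent.
        if self.lines and now - self.started_at >= self.batchwait:
            self.flush()

    def flush(self) -> None:
        self.sent.append(self.lines)
        self.lines, self.size = [], 0


if __name__ == "__main__":
    b = Batcher(batchsize=10, batchwait=1.0)
    for t, line in [(0.0, "aaaa"), (0.1, "bbbb"), (0.2, "cccc")]:
        b.add(line, now=t)
    b.tick(now=1.3)
    print(b.sent)  # [['aaaa', 'bbbb'], ['cccc']]
```

A larger batchsize means fewer pushes under heavy load, and a longer batchwait lets quiet streams accumulate more lines per push; both reduce request pressure on Loki at the cost of slightly higher latency.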
6.4 Log retention is too short
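```yaml
limits_config:
  retention_period: 168h   # 7 days; enforced by the compactor only when compactor.retention_enabled is true
table_manager:
  retention_deletes_enabled: true
  retention_period: 168h   # must be a multiple of the index period (24h in schema_config)
```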
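With the Table Manager, Loki's documentation requires retention_period to be a whole multiple of the index period (period: 24h in the schema_config of section 3.2), so 168h and 48h are fine while something like 36h is not. A quick sketch for checking a value before editing the ConfigMap; parse_duration and retention_ok are hypothetical helpers, and parse_duration only understands the h/m/s units used in this article, not Loki's full duration syntax.

```python
import re

_UNITS = {"h": 3600, "m": 60, "s": 1}


# Hypothetical helper: parses durations such as "168h", "90m" or "1h30m".
def parse_duration(text: str) -> int:
    """Parse an h/m/s duration string into seconds; raise ValueError otherwise."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything the pattern did not fully consume (e.g. "100ms", "7d").
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


# Hypothetical helper: checks the Table Manager retention rule.
def retention_ok(retention: str, index_period: str = "24h") -> bool:
    """True if retention is a positive whole multiple of the index period."""
    r, p = parse_duration(retention), parse_duration(index_period)
    return r > 0 and r % p == 0


if __name__ == "__main__":
    for value in ("48h", "168h", "36h"):
        print(value, retention_ok(value))
```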
6.5 RBAC conflicts during the Helm install
```bash
kubectl delete ClusterRole loki-promtail
kubectl delete ClusterRole loki-grafana-clusterrole
```
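7. Summary
Loki + Promtail has become the de facto "lightweight logging stack" of the K8s era:
- DaemonSet collection uses the fewest resources and suits small and medium clusters
- In simple scalable mode Loki is split into write / read / backend / gateway components that scale horizontally
- LogQL looks like PromQL, so anyone used to Prometheus can pick it up right away
- Fix container permission problems with an initContainer
- The config in this article keeps logs for 48h; for production, 7-30 days is a better target
Next: K8s in practice: Deployment vs. StatefulSet vs. DaemonSet, probes, and hostNetwork scheduling.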