[TOC]
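When running Redis, you need to monitor its state so that you can tell whether the service is healthy and troubleshoot failures. The cloud monitoring Prometheus service monitors Redis through an Exporter and ships with out-of-the-box Grafana dashboards. A real production environment usually runs N Redis clusters, so building one monitoring setup that covers all N clusters is a fairly challenging task.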
Prerequisites
- A Prometheus instance has been created.
- A Redis instance or Redis cluster has been created.
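Deploy redis-exporter
The YAML manifest for Redis is given directly below; redis-exporter runs as a second container in the same Pod. If you have multiple clusters, add one such manifest per cluster.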
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: qa
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: redis
  serviceName: redis
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: redis
    spec:
      containers:
      - args:
        - /etc/redis/redis.conf
        env:
        - name: TZ
          value: Asia/Shanghai
        image: redis:5.0
        imagePullPolicy: IfNotPresent
        name: redis
        ports:
        - containerPort: 6379
          protocol: TCP
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data
          name: redis-data
        - mountPath: /etc/redis/redis.conf
          name: redis-configmap
          subPath: redis.conf
      - name: redis-exporter
        env:
        - name: TZ
          value: Asia/Shanghai
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: redispasswd
        image: oliver006/redis_exporter:v1.15.1
        ports:
        - containerPort: 9121
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - /bin/sh
        - -c
        - echo never > /sys/kernel/mm/transparent_hugepage/enabled && echo 65535 >
          /proc/sys/net/core/somaxconn && sleep 1
        image: busybox
        imagePullPolicy: Always
        name: disable-thp
        securityContext:
          privileged: true
          procMount: Default
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /sys
          name: sys
      nodeSelector:
        env: non_prod
      restartPolicy: Always
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: group
        operator: Equal
        value: non_prod
      volumes:
      - configMap:
          defaultMode: 420
          name: redis-redis.conf
        name: redis-configmap
      - hostPath:
          path: /sys
        name: sys
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: redis-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: alicloud-disk-common-hangzhou-b
      volumeMode: Filesystem
    status:
      phase: Pending
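The StatefulSet above references a Secret named `redispasswd` (key `password`) and a ConfigMap named `redis-redis.conf` (mounted through `subPath: redis.conf`), but neither object is shown. The following is only a minimal sketch of what they might look like. The password value and the contents of `redis.conf` are assumptions made for illustration and should be replaced with your own settings.

```yaml
# Sketch only: objects referenced by the StatefulSet but not shown in this article.
apiVersion: v1
kind: Secret
metadata:
  name: redispasswd            # matches secretKeyRef.name in the redis-exporter container
  namespace: qa
type: Opaque
stringData:
  password: change-me          # placeholder value, not from the original article
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-redis.conf       # matches volumes[].configMap.name
  namespace: qa
data:
  # The key must be redis.conf because the volume mount uses subPath: redis.conf.
  redis.conf: |
    # Assumed minimal content; the password must equal the Secret value above,
    # otherwise redis-exporter cannot authenticate.
    requirepass change-me
    dir /data
```

Keeping the password in a Secret and injecting it through `REDIS_PASSWORD` keeps the credential out of the exporter's command line and out of the manifest itself.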
Service manifest. The important part is the annotations added to the Service.
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9121"
  name: redis-svc
  namespace: qa
spec:
  clusterIP: None
  ports:
  - port: 6379
    name: redis
  - name: prom
    port: 9121
    targetPort: 9121
  selector:
    app: redis
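For an additional cluster, the same pattern can be repeated with a second Service that carries the same annotations. The sketch below assumes a hypothetical second cluster whose Pods are labelled `app: redis-b`; the names `redis-b-svc` and `redis-b` are illustrative and do not come from the original setup. It also sets the optional `prometheus.io/path` annotation, which the scrape job in this article reads when present.

```yaml
# Sketch only: Service for a hypothetical second cluster (all "redis-b" names are assumptions).
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"     # selects this Service for scraping
    prometheus.io/port: "9121"       # redis-exporter port
    prometheus.io/path: "/metrics"   # optional; /metrics is already the default path
  name: redis-b-svc
  namespace: qa
spec:
  clusterIP: None
  ports:
  - name: redis
    port: 6379
  - name: prom
    port: 9121
    targetPort: 9121
  selector:
    app: redis-b
```

Because discovery is driven by the annotations, no change to the Prometheus configuration should be needed when such a Service is added.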
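Once Redis and redis-exporter are deployed, add a service-discovery scrape job to the Prometheus configuration. The advantage of dynamic discovery over static targets is that a single job can cover multiple clusters.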
- job_name: 'redis_exporter'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    regex: (redis.*)
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: keep
    regex: (.*redis.*)
    target_label: kubernetes_pod_name
At this point each target is shown as the endpoint IP plus port, which is not very readable. The Service name can be used instead.
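One possible way to do this is to add relabel rules that copy the Service name from the discovery metadata. The fragment below is a sketch, not the original author's configuration: the target label name `kubernetes_name` and the decision to overwrite `instance` are assumptions. The `__meta_kubernetes_*` source labels are the ones Prometheus provides for the `endpoints` role.

```yaml
# Sketch only: extra entries to append to relabel_configs of the redis_exporter job.
relabel_configs:
  # Expose the Service name as its own label so dashboards can filter or group by it.
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name        # label name chosen for this sketch
  # Optionally replace the default ip:port value of "instance" with
  # <namespace>/<service>/<pod>, which stays unique for each Pod.
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_pod_name]
    action: replace
    separator: /
    target_label: instance
```

Including the Pod name in `instance`, rather than the Service name alone, avoids several Pods behind one Service ending up with identical label sets.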