Preface

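Both our business services and our ELK stack use Kafka as their message queue, so for the sake of business stability and availability we monitor the Kafka cluster with Prometheus. The monitoring stack is kafka_exporter + Prometheus.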


Notes

  • To monitor a Kafka cluster, kafka_exporter only needs to be installed on one node of the cluster
  • Prometheus is deployed on Kubernetes (k8s)

Project page
https://github.com/danielqsj/kafka_exporter

Download (v1.4.2, linux-amd64)
https://github.com/danielqsj/kafka_exporter/releases/download/v1.4.2/kafka_exporter-1.4.2.linux-amd64.tar.gz


1. Deploy kafka_exporter

[root@kafka ~]# tar xf kafka_exporter-1.4.2.linux-amd64.tar.gz -C /usr/local
[root@kafka ~]# mv /usr/local/kafka_exporter-1.4.2.linux-amd64/ /usr/local/kafka_exporter
[root@kafka ~]# useradd -s /sbin/nologin kafka
[root@kafka ~]# chown -R kafka:kafka /usr/local/kafka_exporter
[root@kafka ~]# vim /usr/lib/systemd/system/kafka_exporter.service
[Unit]
Description=kafka_exporter
After=network.target

[Service]
User=kafka
Group=kafka
Type=simple
ExecStart=/usr/local/kafka_exporter/kafka_exporter --kafka.server=172.18.244.164:9092 --web.listen-address=:9308 --zookeeper.server=172.18.244.164:2181

[Install]
WantedBy=multi-user.target

--kafka.server=172.18.244.164:9092      # address of the Kafka broker to monitor
--web.listen-address=:9308              # address and port kafka_exporter listens on
--zookeeper.server=172.18.244.164:2181  # address of the ZooKeeper to monitor
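Before starting the service, it can help to confirm that the exporter host can actually reach both addresses passed in ExecStart, since a typo in either one is easy to miss. Below is a minimal Python sketch, assuming Python 3 is available on the Kafka node; the script and its function names are illustrative and not part of kafka_exporter.

```python
import socket

def parse_host_port(addr):
    """Split "host:port" into (host, port); ":9308" gives ("", 9308)."""
    host, _, port = addr.rpartition(":")
    return host, int(port)

def is_reachable(addr, timeout=3.0):
    """True if a TCP connection to addr opens within `timeout` seconds."""
    host, port = parse_host_port(addr)
    try:
        # create_connection resolves the host and tries each address it gets
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False

# Addresses copied from the ExecStart line; change them for your cluster.
TARGETS = {
    "kafka": "172.18.244.164:9092",
    "zookeeper": "172.18.244.164:2181",
}

if __name__ == "__main__":
    for name, addr in TARGETS.items():
        status = "reachable" if is_reachable(addr) else "NOT reachable"
        print(f"{name:<10} {addr:<22} {status}")
```

This only checks TCP connectivity; it does not prove that the Kafka or ZooKeeper protocol handshake will succeed.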

[root@kafka ~]# systemctl daemon-reload
[root@kafka ~]# systemctl enable kafka_exporter
[root@kafka ~]# systemctl start kafka_exporter
[root@kafka ~]# netstat -lntup | grep 9308
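netstat only proves that port 9308 is open. A more useful check is the exporter's /metrics page (kafka_exporter's default telemetry path), where, for example, kafka_brokers should equal the number of brokers in the cluster. The following is a minimal Python sketch that fetches and parses that page. The parser is deliberately simplified (it keeps each label set as a raw string), and EXPORTER_URL just combines the host from ExecStart with the --web.listen-address port.

```python
import urllib.request

EXPORTER_URL = "http://172.18.244.164:9308/metrics"  # host from ExecStart, port from --web.listen-address

def parse_metrics(text):
    """Parse Prometheus text exposition lines into {(name, raw_labels): value}.

    Simplified: skips # HELP / # TYPE comment lines, ignores optional
    timestamps, and keeps labels as the raw string inside {...}.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            labels, rest = rest.rsplit("}", 1)
        else:
            name, _, rest = line.partition(" ")
            labels = ""
        samples[(name, labels)] = float(rest.split()[0])
    return samples

def fetch_metrics(url=EXPORTER_URL, timeout=5):
    """Download the exporter's /metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    try:
        metrics = parse_metrics(fetch_metrics())
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print("cannot reach kafka_exporter:", exc)
    else:
        print("kafka_brokers =", metrics.get(("kafka_brokers", "")))
```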

2. Configure Prometheus

[root@k8s-master ~]# vim prometh_configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-configmap
  namespace: monitoring
data:
  prometheus.yml: |
    # my global config
    global:
      scrape_interval:     5s # Scrape targets every 5 seconds. The default is every 1 minute.
      evaluation_interval: 5s # Evaluate rules every 5 seconds. The default is every 1 minute.

    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
    scrape_configs:
      # Monitor the business Kafka cluster
      - job_name: 'kafka'
        static_configs:
        - targets:
          - 172.18.244.164:9308
[root@k8s-master ~]# kubectl apply -f prometh_configmap.yaml
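kubectl apply only updates the ConfigMap object. The kubelet refreshes the file mounted into the Prometheus pod after a short delay (and never does so for subPath mounts), and Prometheus does not re-read its configuration by itself. Either restart the pod or, if Prometheus was started with --web.enable-lifecycle, send an HTTP POST to /-/reload. The Python sketch below performs that reload and then reports the health of the kafka job's targets from the /api/v1/targets API. The address http://prometheus.monitoring.svc:9090 is a hypothetical placeholder for your Prometheus Service; only the monitoring namespace comes from the ConfigMap above.

```python
import json
import urllib.request

# Hypothetical in-cluster Prometheus address; replace with your own Service/URL.
PROMETHEUS_URL = "http://prometheus.monitoring.svc:9090"

def reload_prometheus(base_url=PROMETHEUS_URL, timeout=5):
    """POST /-/reload; only accepted when Prometheus runs with --web.enable-lifecycle."""
    req = urllib.request.Request(base_url + "/-/reload", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status

def fetch_targets(base_url=PROMETHEUS_URL, timeout=5):
    """Return the decoded JSON body of GET /api/v1/targets."""
    with urllib.request.urlopen(base_url + "/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)

def job_health(payload, job):
    """Map scrapeUrl -> health ("up"/"down"/"unknown") for active targets of `job`."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return {t["scrapeUrl"]: t["health"]
            for t in targets if t.get("labels", {}).get("job") == job}

if __name__ == "__main__":
    try:
        reload_prometheus()
    except OSError as exc:  # HTTPError/URLError are OSError subclasses
        print("reload failed (is --web.enable-lifecycle set?):", exc)
    try:
        print(job_health(fetch_targets(), "kafka"))
    except OSError as exc:
        print("could not query targets:", exc)
```

The reload and the target query sit in separate try blocks so that a disabled lifecycle API does not prevent the health check.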

3. Grafana dashboards

The official dashboard, ID 7589, can be imported and used as-is.

However, I modified it to show the data our business needs. These are my dashboard's variables:

Variable        Query
job             label_values(kafka_consumergroup_current_offset, job)
instance        label_values(kafka_consumergroup_current_offset{job=~"$job"}, instance)
consumergroup   label_values(kafka_consumergroup_current_offset{instance="$instance"}, consumergroup)
topic           label_values(kafka_consumergroup_current_offset{instance="$instance",consumergroup=~"$consumergroup"}, topic)
time            1m,2m,3m,5m,10m,30m,1h,6h,12h,1d,7d,14d,30d (a fixed list of values rather than a query; used as the rate() range)
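Each variable's query is filtered by the variables before it, so the drop-downs narrow from job to instance to consumer group to topic. The Python sketch below models that label_values() chaining over a few in-memory label sets. The series, consumer group names, and topic names are hypothetical. PromQL regex matchers are fully anchored, which is why the sketch uses re.fullmatch. Grafana expands a multi-value or "All" selection into a regex alternation, and the =~ matchers on job and consumergroup are what allow that to match several values.

```python
import re

# Hypothetical label sets of kafka_consumergroup_current_offset series.
SERIES = [
    {"job": "kafka", "instance": "172.18.244.164:9308",
     "consumergroup": "logstash", "topic": "app-log"},
    {"job": "kafka", "instance": "172.18.244.164:9308",
     "consumergroup": "order-svc", "topic": "orders"},
    {"job": "kafka", "instance": "172.18.244.164:9308",
     "consumergroup": "order-svc", "topic": "payments"},
]

def matches(labels, matchers):
    """A str matcher means label="value"; a compiled regex means label=~"regex"."""
    for name, want in matchers.items():
        have = labels.get(name, "")  # a missing label behaves like an empty one
        if isinstance(want, re.Pattern):
            if want.fullmatch(have) is None:  # PromQL regexes are fully anchored
                return False
        elif have != want:
            return False
    return True

def label_values(series, label, **matchers):
    """Sorted distinct values of `label` among series that satisfy all matchers."""
    return sorted({s[label] for s in series if label in s and matches(s, matchers)})

if __name__ == "__main__":
    # Walk the chain the way the dashboard does: each choice filters the next list.
    job = label_values(SERIES, "job")[0]
    instance = label_values(SERIES, "instance", job=re.compile(job))[0]
    groups = label_values(SERIES, "consumergroup", instance=instance)
    print("consumergroup options:", groups)
    for group in groups:
        topics = label_values(SERIES, "topic", instance=instance,
                              consumergroup=re.compile(re.escape(group)))
        print(f"  topics for {group}:", topics)
```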


Panels

Kafka status (up is 1 when Prometheus can scrape kafka_exporter, 0 when it cannot):
up{instance="$instance"}

Number of brokers:
kafka_brokers{instance="$instance"}

Partitions per topic:
sum by(topic) (kafka_topic_partitions{instance="$instance",topic=~"$topic"})

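Messages consumed per second (CURRENT-OFFSET):
sum(rate(kafka_consumergroup_current_offset{instance="$instance",topic=~"$topic"}[$time])) by (consumergroup, topic)
Note: kafka_topic_partition_current_offset is the broker-side latest offset of each partition, so its rate measures messages produced per second, not consumed.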
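rate() turns an ever-growing offset into a per-second figure over the $time window, so the window must cover at least two scrapes (with the 5s scrape_interval configured earlier, even 1m is plenty). Below is a simplified Python model of that calculation. Unlike the real rate(), it does not extrapolate to the edges of the window, so treat it as an approximation rather than Prometheus's exact algorithm.

```python
def simple_rate(samples):
    """Approximate per-second rate from [(unix_seconds, value), ...] in time order.

    As in PromQL rate(), a decrease is treated as a counter reset, so the
    new value is counted as the increase since the reset. Unlike rate(),
    there is no extrapolation to the window edges.
    """
    if len(samples) < 2:
        return None  # rate() needs at least two samples in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

if __name__ == "__main__":
    # Hypothetical offsets of one partition sampled 30 s apart over a 1m window.
    window = [(0, 1000), (30, 1090), (60, 1180)]
    print(simple_rate(window), "messages/s")
```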

Current consumer backlog (LAG):
sum(kafka_consumergroup_lag{instance="$instance",topic=~"$topic"}) by (consumergroup, topic)
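kafka_consumergroup_lag is exported per partition, and sum(...) by (consumergroup, topic) adds the partitions up. Conceptually, each partition's lag is the partition's latest offset minus the group's committed offset, which is the same LOG-END-OFFSET minus CURRENT-OFFSET that kafka-consumer-groups.sh reports. The following is a small Python sketch of that arithmetic with hypothetical offsets. Clamping negative values to 0 is this sketch's own choice and not necessarily what kafka_exporter does.

```python
from collections import defaultdict

def lag_by_group_topic(end_offsets, committed):
    """Total lag per (consumergroup, topic).

    end_offsets: {(topic, partition): latest offset on the broker}
    committed:   {(consumergroup, topic, partition): committed offset}
    Per-partition lag = latest offset - committed offset; this sketch
    clamps it at 0 and skips partitions with no known latest offset.
    """
    totals = defaultdict(int)
    for (group, topic, partition), offset in committed.items():
        end = end_offsets.get((topic, partition))
        if end is None:
            continue
        totals[(group, topic)] += max(end - offset, 0)
    return dict(totals)

if __name__ == "__main__":
    # Hypothetical offsets for two partitions of "orders" and one of "app-log".
    end_offsets = {("orders", 0): 150, ("orders", 1): 200, ("app-log", 0): 1000}
    committed = {
        ("order-svc", "orders", 0): 140,
        ("order-svc", "orders", 1): 190,
        ("logstash", "app-log", 0): 1000,
    }
    for (group, topic), lag in sorted(lag_by_group_topic(end_offsets, committed).items()):
        print(f"{group:<10} {topic:<8} lag={lag}")
```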

The YAML files and the Grafana dashboard JSON used in this article are available in my GitHub project: https://github.com/shaxiaozz/prometheus
