
DingTalk Alerts for System Monitoring with Prometheus, Grafana, and Alertmanager

Posted by guest, 2024-11-23


I. Install Go

Download: https://studygolang.com/dl

1. Extract the archive


# tar -xvf go1.13.linux-amd64.tar.gz -C /usr/local/

2. Configure the environment variables


# echo 'export PATH=$PATH:/usr/local/go/bin' >> /etc/profile
# source /etc/profile

3. Test

Run go version to confirm that the installation succeeded:

# go version

II. Configure the DingTalk robot

1. Open robot management

2. Select Webhook

3. Select the group

4. View the robot settings
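Before wiring the robot into Alertmanager, it can help to confirm that the webhook URL shown in the robot settings actually delivers messages. The following is a minimal sketch, assuming the robot accepts the standard DingTalk "text" message body; the function names are hypothetical helpers, not part of any tool. If the robot was created with a keyword security setting, the message content presumably has to contain that keyword.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of DingTalk or the plugin.

# Build the JSON body of a DingTalk "text" message.
# The content is inserted as-is, so it must not contain double quotes or backslashes.
dingtalk_text_payload() {
  printf '{"msgtype":"text","text":{"content":"%s"}}' "$1"
}

# POST a text message to the robot webhook URL copied from the robot settings page.
send_dingtalk_text() {
  local webhook_url="$1"
  local content="$2"
  curl -sS -H 'Content-Type: application/json' \
    -d "$(dingtalk_text_payload "$content")" \
    "$webhook_url"
}
```

For example, send_dingtalk_text 'https://oapi.dingtalk.com/robot/send?access_token=<your token>' 'test message' should make the message appear in the group. DingTalk replies with a small JSON body, in which an errcode of 0 indicates that the message was accepted.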

II. Connect DingTalk to the Prometheus Alertmanager webhook

Plugin download: https://github.com/timonwong/prometheus-webhook-dingtalk

1. Install the webhook

-- Build from source (note: create the directory under Go's src directory)
mkdir -p /usr/local/go/src/github.com/timonwong/
cd /usr/local/go/src/github.com/timonwong/
git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
cd prometheus-webhook-dingtalk
make
-- Or install the binary package
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

2. Extract the archive

# tar -xvf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

After installation, the template file that prometheus-webhook-dingtalk uses for DingTalk alerts is generated at:

/usr/local/dingtalk/prometheus-webhook-dingtalk-0.3.0.linux-amd64/default.tmpl

3. Start prometheus-webhook-dingtalk

nohup ./prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=de544211xxxx96f" >dingding.log 2>&1 &

5. Configure a system service

# vim /etc/systemd/system/prometheus-webhook-dingtalk.service
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/usr/local/dingtalk/prometheus-webhook-dingtalk-0.3.0.linux-amd64/prometheus-webhook-dingtalk --ding.profile=sre=https://oapi.dingtalk.com/robot/send?access_token=de544xxx8ebc04e8da096f

[Install]
WantedBy=multi-user.target

# chmod u+x /etc/systemd/system/prometheus-webhook-dingtalk.service
# systemctl daemon-reload
# systemctl start prometheus-webhook-dingtalk
# systemctl status prometheus-webhook-dingtalk

III. Configure the Alertmanager email sender and connect it to the DingTalk webhook

/usr/local/alertmanager/alertmanager.yml

global:
  resolve_timeout: 5m
  # Email sender settings
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '1275758000@qq.com'
  smtp_auth_username: '1275758000@qq.com'
  smtp_auth_password: 'nxxxegb'
  smtp_require_tls: false
route:
  group_by: ['alertname', 'cluster', 'service']
  receiver: default-receiver
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 30m
receivers:
- name: 'default-receiver'
  email_configs:
  - to: '1430985018@qq.com,644642050@qq.com'
  # Connect to the service started by prometheus-webhook-dingtalk
  webhook_configs:
  # The path segment after /dingtalk/ must match the profile name passed to
  # --ding.profile when the webhook was started: sre in the systemd unit,
  # ops_dingding in the nohup example
  - url: 'http://localhost:8060/dingtalk/sre/send'
    send_resolved: true
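The receiver URL is tied to the profile name given to --ding.profile, which is easy to get wrong when the name changes between the nohup and systemd variants. As a small illustration (the helper names are hypothetical, and the host and port defaults are simply the ones used in this article), the relationship can be sketched as:

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Extract the profile name from a --ding.profile value such as "sre=https://...".
# Everything before the first "=" is the profile name.
profile_name() {
  printf '%s\n' "${1%%=*}"
}

# Build the URL that Alertmanager should call for a given profile.
# Host and port default to localhost and 8060, the values used in this article.
receiver_url() {
  local profile="$1"
  local host="${2:-localhost}"
  local port="${3:-8060}"
  printf 'http://%s:%s/dingtalk/%s/send\n' "$host" "$port" "$profile"
}
```

Calling receiver_url "$(profile_name 'sre=https://oapi.dingtalk.com/robot/send?access_token=...')" yields the URL to put under webhook_configs.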

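repeat_interval: how often a still-firing alert is sent again. Set it to suit your needs; a somewhat shorter value is convenient while debugging.

Check the status: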

IV. Prometheus configuration (for reference)

Rules file rules.yml:

groups:
- name: host_monitoring
  rules:
  - alert: Memory alert
    expr: netdata_system_ram_MiB_average{chart="system.ram",dimension="free",family="ram"} < 800
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: Memory alert
      Server: '{{$labels.instance}}'
      #summary: "{{$labels.instance}}: High Memory usage detected"
      explain: "Memory usage is above 90%; remaining: {{ $value }}M"
      #description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
  - alert: CPU alert
    expr: netdata_system_cpu_percentage_average{chart="system.cpu",dimension="idle",family="cpu"} < 20
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: CPU alert
      Server: '{{$labels.instance}}'
      explain: "CPU usage is above 80%; remaining: {{ $value }}"
      #summary: "{{$labels.instance}}: High CPU usage detected"
      #description: "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})"
  - alert: Disk alert
    expr: netdata_disk_space_GiB_average{chart="disk_space._",dimension="avail",family="/"} < 4
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: Disk alert
      Server: '{{$labels.instance}}'
      explain: "Disk usage is above 90%; remaining: {{ $value }}G"
  - alert: Service alert
    expr: up == 0
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: Service alert
      Server: '{{$labels.instance}}'
      explain: "The netdata service is down"

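This file has been modified from the original. YAML is stricter about formatting than most other file types, indentation in particular, so check that the format is correct after editing.

The following online formatter is mainly useful for checking whether your file is valid:

http://www.bejson.com/validators/yaml_editor/

V. View the alerts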

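Stop cadvisor: docker stop cadvisor

Logs:

After restarting the service:

The alert template is admittedly a bit ugly; I will improve it later. That is all for this round of testing.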
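While iterating on the template, it may be handy to trigger a DingTalk message without stopping a real service each time. The sketch below builds a hand-written alert in the shape of Alertmanager's webhook body and posts it straight to the prometheus-webhook-dingtalk endpoint. It assumes the plugin accepts that standard body at the /dingtalk/<profile>/send path; the function names and every field value are made up for testing.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and all field values are made up for testing.

# Build a minimal Alertmanager-style webhook body with one firing alert.
test_alert_payload() {
  local alertname="$1"
  local instance="$2"
  printf '{"version":"4","status":"firing","receiver":"default-receiver",'
  printf '"groupLabels":{"alertname":"%s"},' "$alertname"
  printf '"commonLabels":{"alertname":"%s","instance":"%s"},' "$alertname" "$instance"
  printf '"commonAnnotations":{"explain":"test message"},'
  printf '"externalURL":"http://localhost:9093",'
  printf '"alerts":[{"status":"firing",'
  printf '"labels":{"alertname":"%s","instance":"%s"},' "$alertname" "$instance"
  printf '"annotations":{"explain":"test message"},'
  printf '"startsAt":"2024-11-23T00:00:00Z","endsAt":"0001-01-01T00:00:00Z",'
  printf '"generatorURL":"http://localhost:9090"}]}\n'
}

# POST the test body to the prometheus-webhook-dingtalk endpoint.
send_test_alert() {
  local url="$1"
  curl -sS -H 'Content-Type: application/json' \
    -d "$(test_alert_payload TestAlert localhost:9100)" \
    "$url"
}
```

With the service from this article running, send_test_alert http://localhost:8060/dingtalk/sre/send should produce a message in the group rendered with the current template.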

I will share more about Prometheus in later posts; follow along if you are interested.
