Download: https://studygolang.com/dl
1. Extract the archive
# tar -xvf go1.13.linux-amd64.tar.gz -C /usr/local/
2. Configure the environment variable
# echo 'export PATH=$PATH:/usr/local/go/bin' >> /etc/profile
# source /etc/profile
3. Test
Run go version to confirm that the installation succeeded:
# go version
II. Configure the DingTalk robot
1. Open robot management
2. Choose Webhook
3. Choose the group
4. View the robot settings
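Before connecting the robot to Alertmanager, it can help to confirm that the webhook URL shown in the robot settings accepts messages. The following is a minimal sketch rather than part of the original setup: the function names and the DINGTALK_WEBHOOK variable are hypothetical, and it assumes the DingTalk custom-robot "text" message format and a message containing no double quotes or backslashes. If the robot's security settings require a keyword, the message would presumably need to contain it.

```shell
# Build the JSON body for a DingTalk "text" message.
# $1: message content (assumed to contain no double quotes or backslashes).
build_text_payload() {
  printf '{"msgtype":"text","text":{"content":"%s"}}' "$1"
}

# Post the message to the robot.
# DINGTALK_WEBHOOK (hypothetical variable) must hold the full robot URL,
# including its access_token parameter.
send_dingtalk_text() {
  curl -sS -H 'Content-Type: application/json' \
    -d "$(build_text_payload "$1")" \
    "$DINGTALK_WEBHOOK"
}

# Usage (not run automatically):
#   DINGTALK_WEBHOOK='https://oapi.dingtalk.com/robot/send?access_token=<your token>'
#   send_dingtalk_text 'test message from the command line'
```

If the message shows up in the group, the token is valid and later failures are more likely to come from the Alertmanager or webhook configuration.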
II. Connect DingTalk to the Prometheus AlertManager webhook
Plugin download: https://github.com/timonwong/prometheus-webhook-dingtalk
1. Install the webhook
-- Build from source (note: create the directory under Go's src directory)
mkdir -p /usr/local/go/src/github.com/timonwong/
cd /usr/local/go/src/github.com/timonwong/
git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
cd prometheus-webhook-dingtalk
make
-- Install from the binary package
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
2. Extract the archive
# tar -xvf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
After installation, the template file that prometheus-webhook-dingtalk uses for DingTalk alerts is available at:
/usr/local/dingtalk/prometheus-webhook-dingtalk-0.3.0.linux-amd64/default.tmpl
3. Start prometheus-webhook-dingtalk
nohup ./prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=de544211xxxx96f" >dingding.log 2>&1 &
5. Configure the system service
# vim /etc/systemd/system/prometheus-webhook-dingtalk.service
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/usr/local/dingtalk/prometheus-webhook-dingtalk-0.3.0.linux-amd64/prometheus-webhook-dingtalk --ding.profile=sre=https://oapi.dingtalk.com/robot/send?access_token=de544xxx8ebc04e8da096f

[Install]
WantedBy=multi-user.target

# chmod u+x /etc/systemd/system/prometheus-webhook-dingtalk.service
# systemctl daemon-reload
# systemctl start prometheus-webhook-dingtalk
# systemctl status prometheus-webhook-dingtalk
III. Configure the alertmanager mail sender and connect it to the DingTalk webhook
File: /usr/local/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  # Mail sender settings
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '1275758000@qq.com'
  smtp_auth_username: '1275758000@qq.com'
  smtp_auth_password: 'nxxxegb'
  smtp_require_tls: false
route:
  group_by: ['alertname', 'cluster', 'service']
  receiver: default-receiver
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 30m
receivers:
  - name: 'default-receiver'
    email_configs:
      - to: '1430985018@qq.com,644642050@qq.com'
    # Connect to the service started by prometheus-webhook-dingtalk
    webhook_configs:
      # The path segment after /dingtalk/ must match the profile name passed to
      # --ding.profile when the webhook was started (sre in the systemd unit,
      # ops_dingding in the nohup example)
      - url: 'http://localhost:8060/dingtalk/sre/send'
        send_resolved: true
repeat_interval: this field controls how often a notification is re-sent while an alert keeps firing. Set it to suit your needs; a shorter value is convenient while debugging.
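To check the webhook_configs target without waiting for a real alert, one option is to post a hand-made notification to prometheus-webhook-dingtalk. The sketch below is illustrative only: the function names and the WEBHOOK_URL variable are hypothetical, the body follows the general shape of Alertmanager's version 4 webhook payload with made-up values, and it assumes the service listens on localhost:8060 with the sre profile as in the configuration above.

```shell
# Build a minimal Alertmanager-style webhook body (version 4 format).
# $1: alert name, $2: instance label; both assumed free of JSON special characters.
# All values below are placeholders for a manual test.
build_alert_payload() {
  cat <<EOF
{"version":"4","status":"firing","receiver":"default-receiver",
 "groupLabels":{"alertname":"$1"},
 "commonLabels":{"alertname":"$1","instance":"$2"},
 "commonAnnotations":{"explain":"manual test"},
 "externalURL":"http://localhost:9093",
 "alerts":[{"status":"firing",
   "labels":{"alertname":"$1","instance":"$2"},
   "annotations":{"explain":"manual test"},
   "startsAt":"2020-01-01T00:00:00Z","endsAt":"0001-01-01T00:00:00Z",
   "generatorURL":"http://localhost:9090/graph"}]}
EOF
}

# Post the test body to the webhook service.
# WEBHOOK_URL (hypothetical variable) overrides the default target.
send_test_alert() {
  build_alert_payload "$1" "$2" |
    curl -sS -H 'Content-Type: application/json' --data-binary @- \
      "${WEBHOOK_URL:-http://localhost:8060/dingtalk/sre/send}"
}

# Usage (not run automatically):
#   send_test_alert TestAlert localhost:9100
```

If the message reaches the DingTalk group, the remaining piece to verify is the path from Prometheus to Alertmanager.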
Check the status:
IV. Prometheus configuration (for reference)
Configuration file rules.yml:
groups:
  - name: host_monitoring
    rules:
      - alert: Memory alert
        expr: netdata_system_ram_MiB_average{chart="system.ram",dimension="free",family="ram"} < 800
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: Memory alert
          Server: '{{$labels.instance}}'
          #summary: "{{$labels.instance}}: High Memory usage detected"
          explain: "Memory usage is above 90%; remaining: {{ $value }}M"
          #description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
      - alert: CPU alert
        expr: netdata_system_cpu_percentage_average{chart="system.cpu",dimension="idle",family="cpu"} < 20
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: CPU alert
          Server: '{{$labels.instance}}'
          explain: "CPU usage is above 80%; remaining: {{ $value }}"
          #summary: "{{$labels.instance}}: High CPU usage detected"
          #description: "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})"
      - alert: Disk alert
        expr: netdata_disk_space_GiB_average{chart="disk_space._",dimension="avail",family="/"} < 4
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: Disk alert
          Server: '{{$labels.instance}}'
          explain: "Disk usage is above 90%; remaining: {{ $value }}G"
      - alert: Service alert
        expr: up == 0
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: Service alert
          Server: '{{$labels.instance}}'
          explain: "The netdata service is down"
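This configuration file has been adapted for my environment. YAML is stricter about formatting than most other file types, indentation in particular, so read up on the details if needed and check that the file is well-formed after every change.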
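A local check is also possible with promtool, which ships with Prometheus 2.x releases and validates rule syntax and expressions, not only YAML layout. The wrapper below is a small sketch; the function name check_rules and the default file name are assumptions made for illustration.

```shell
# Validate a Prometheus rules file with promtool, if it is installed.
# $1: path to the rules file (defaults to rules.yml in the current directory).
check_rules() {
  rules_file="${1:-rules.yml}"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 127
  fi
  promtool check rules "$rules_file"
}

# Usage (not run automatically):
#   check_rules /path/to/rules.yml && echo "rules file is valid"
```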
The following online formatter is mainly useful for checking whether your YAML file is valid:
http://www.bejson.com/validators/yaml_editor/
V. View the alerts
Stop cadvisor: docker stop cadvisor
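Because the rules use for: 2m, the alert only fires about two minutes after the target goes down. While waiting, the alert state can be inspected through the Prometheus HTTP API. This is a sketch under assumptions: the function names are hypothetical, and Prometheus is assumed to listen on its default address http://localhost:9090.

```shell
# Build the URL of the Prometheus alerts endpoint.
# $1: Prometheus base URL (defaults to http://localhost:9090, an assumption).
prometheus_alerts_url() {
  printf '%s/api/v1/alerts' "${1:-http://localhost:9090}"
}

# Print the pending and firing alerts as JSON.
show_alerts() {
  curl -sS "$(prometheus_alerts_url "${1:-}")"
}

# Usage (not run automatically):
#   show_alerts
#   show_alerts http://prometheus.example.internal:9090
```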
Logs:
After restarting the service:
The alert template is admittedly a bit ugly. I will improve it later; the test ends here for now.