phpleveldb技巧_Prometheus事理详解

文章目录 [+]

我们从上面的架构图可以看出 Prometheus 的紧张模块包含:Server, Exporters, Pushgateway, PromQL, Alertmanager, WebUI 等。
我们逐一认识一下各个模块的功能浸染。

模块先容

phpleveldb技巧_Prometheus事理详解

Retrieval是卖力定时去暴露的目标页面上去抓取采样指标数据。
Storage 是卖力将采样数据写入指定的时序数据库存储。
PromQL 是Prometheus供应的查询措辞模块。
可以和一些webui比如grfana集成。
Jobs / Exporters:Prometheus 可以从 Jobs 或 Exporters 中拉取监控数据。
Exporter 以 Web API 的形式对外暴露数据采集接口。
Prometheus Server:Prometheus 还可以从其他的 Prometheus Server 中拉取数据。
Pushgateway:对付一些以临时性 Job 运行的组件，Prometheus 可能还没有来得及从中 pull 监控数据的情形下，这些 Job 已经结束了，Job 运行时可以在运行时将监控数据推送到 Pushgateway 中，Prometheus 从 Pushgateway 中拉取数据，防止监控数据丢失。
Service discovery:是指 Prometheus 可以动态的创造一些做事，拉取数据进行监控，如从DNS，Kubernetes，Consul 中创造, file_sd 是静态配置的文件。
AlertManager:是一个独立于 Prometheus 的外部组件，用于监控系统的告警，通过配置文件可以配置一些告警规则，Prometheus 会把告警推送到 AlertManager。
4）事情流程

大概的事情流程如下：

（图片来自网络侵删）

Prometheus server 定期从配置好的 jobs 或者 exporters 中拉 metrics，或者吸收来自 Pushgateway 发过来的 metrics，或者从其他的 Prometheus server 中拉 metrics。
Prometheus server 在本地存储网络到的 metrics，并运行已定义好的 alert.rules，记录新的韶光序列或者向 Alertmanager 推送警报。
Alertmanager 根据配置文件，对吸收到的警报进行处理，发出告警。
在图形界面中，可视化采集数据。
二、安装

Prometheus+Grafana+Altermanager等组件的安装可以看这里

Prometheus：http://192.168.0.113:30081/Grafana：http://192.168.0.113:30081/Alertmanager：http://192.168.0.113:30082/三、Prometheus干系观点1）内部存储机制

Prometheus有着非常高效的韶光序列数据存储方法，每个采样数据仅仅占用3.5byte旁边空间，上百万条韶光序列，30秒间隔，保留60天，大概花了200多G（引用官方PPT）。

Prometheus内部紧张分为三大块：

Retrieval是卖力定时去暴露的目标页面上去抓取采样指标数据Storage是卖力将采样数据写磁盘PromQL是Prometheus供应的查询措辞模块。

2）数据模型

Prometheus 存储的所有数据都是韶光序列数据（Time Serie Data，简称时序数据）。
时序数据是具有韶光戳的数据流，该数据流属于某个度量指标（Metric）和该度量指标下的多个标签（Label）。

每个Metric name代表了一类的指标，他们可以携带不同的Labels，每个Metric name + Label组合成代表了一条韶光序列的数据。

在Prometheus的天下里面，所有的数值都是64bit的。
每条韶光序列里面记录的实在便是64bit timestamp(韶光戳) + 64bit value(采样值)。

Metric name（指标名称）：该名字该当具有语义，一样平常用于表示 metric 的功能，例如：http_requests_total, 表示 http 要求的总数。
个中，metric 名字由 ASCII 字符，数字，下划线，以及冒号组成，且必须知足正则表达式 [a-zA-Z_:][a-zA-Z0-9_:]。
Lables（标签）：使同一个韶光序列有了不同维度的识别。
例如 http_requests_total{method=“Get”} 表示所有 http 要求中的 Get 要求。
当 method=“post” 时，则为新的一个 metric。
标签中的键由 ASCII 字符，数字，以及下划线组成，且必须知足正则表达式 [a-zA-Z_:][a-zA-Z0-9_:]。
timestamp(韶光戳)：数据点的韶光，表示数据记录的韶光。
Sample Value（采样值）：实际的韶光序列，每个序列包括一个 float64 的值和一个毫秒级的韶光戳。

例如图上的数据：

http_requests_total{status="200",method="GET"}http_requests_total{status="404",method="GET"}

根据上面的剖析，韶光序列的存储彷佛可以设计成key-value存储的办法（基于BigTable）。

进一步拆分，可以像下面这样子：

上图的第二条样式便是现在Prometheus内部的表现形式了，__name__是特定的label标签，代表了metric name。

再回顾一下Prometheus的整体流程：

上面提到了K-V存储，当然是利用了LevelDB的引擎，它的特点是顺序读写性能非常高，这是非常符合韶光序列的存储的。

官方资料

链接：https://pan.baidu.com/s/1NL0iGYf-7ReqqhUEyr9Oiw提取码：8888

3）Metric类型

Prometheus定义了4种不同的指标类型(metric type)：Counter（计数器）、Gauge（仪表盘）、Histogram（直方图）、Summary（择要）

1、Counter（计数器）

一种累加的 metric，范例的运用如：要求的个数，结束的任务数，涌现的缺点数等等。

【例如】查询 http_requests_total{method=“get”, job=“Prometheus”, handler=“query”} 返回 8，10 秒后，再次查询，则返回 14。

2、Gauge（仪表盘）

数据是一个瞬市价，如果当前内存用量，它随着韶光变革忽高忽低。

【例如】go_goroutines{instance=“172.17.0.2”, job=“Prometheus”} 返回值 147，10 秒后返回 124。

3、Histogram（直方图）

Histogram 取样不雅观测的结果（一样平常是要求持续韶光或相应大小）并在一个可配置的分布区间（bucket）内打算这些结果。
其也供应所有不雅观测结果的总和。
Histogram 有一个基本 metric名称 <basename>，在一次抓取中展现多个韶光序列：累加的 counter，代表不雅观测区间：<basename>_bucket{le=""}所有不雅观测值的总数：<basename>_sum不雅观测到的事宜数量：<basenmae>_count

例如 Prometheus server 中prometheus_local_storage_series_chunks_persisted, 表示 Prometheus 中每个时序须要存储的 chunks 数量，我们可以用它打算待持久化的数据的分位数。

4、Summary（择要）

和 histogram 相似，summary 取样不雅观测的结果（一样平常是要求持续韶光或相应大小）。
但是它还供应不雅观测的次数和所有值的总和，它通过一个滑动的韶光窗口打算可配置的分位数。
Summary 有一个基本的 metric名称 <basename>，在一次抓取中展现多个韶光序列：不雅观测事宜的流式φ-分位数（0 ≤ φ ≤ 1）：{quantile="φ"}所有不雅观测值的总和：<basename>_sum不雅观测的事宜数量：<basename>_count

例如 Prometheus server 中 prometheus_target_interval_length_seconds。

4）Histogram 和Summary的比拟

序号

histogram

Summary

配置

区间配置

分位数和滑动窗口

客户端性能

只需增加counters代价小

须要流式打算代价高

做事端性能

打算分位数花费大，可能会耗时

无需打算，代价小

时序数量

_sum、_count、bucket

_sum、_count、quantile

分位数偏差

bucket的大小有关

φ的配置有关

φ和滑动窗口

Prometheus 表达式设置

客户端设置

聚合

根据表达式聚合

一样平常不可聚合

以下是类型为histogram和summary的样本输出示例：

# A histogram, which has a pretty complex representation in the text format:# HELP http_request_duration_seconds A histogram of the request duration.# TYPE http_request_duration_seconds histogramhttp_request_duration_seconds_bucket{le="0.05"} 24054http_request_duration_seconds_bucket{le="0.1"} 33444http_request_duration_seconds_bucket{le="0.2"} 100392http_request_duration_seconds_bucket{le="+Inf"} 144320http_request_duration_seconds_sum 53423http_request_duration_seconds_count 144320# Finally a summary, which has a complex representation, too:# HELP rpc_duration_seconds A summary of the RPC duration in seconds.# TYPE rpc_duration_seconds summaryrpc_duration_seconds{quantile="0.01"} 3102rpc_duration_seconds{quantile="0.05"} 3272rpc_duration_seconds{quantile="0.5"} 4773rpc_duration_seconds_sum 1.7560473e+07rpc_duration_seconds_count 26935）任务（JOBS）与实例（INSTANCES）用Prometheus术语来说，可以抓取的端点称为instance，常日对应于单个进程。
具有相同目的的instances 的凑集（例如，出于可伸缩性或可靠性而复制的过程）称为job。

例如，一个具有四个复制实例的API做事器作业:

job: api-serverinstance 1: 1.2.3.4:5670instance 2: 1.2.3.4:5671instance 3: 5.6.7.8:5670instance 4: 5.6.7.8:5671instance: 一个单独 scrape 的目标，一样平常对应于一个进程。
:jobs: 一组同种类型的 instances（紧张用于担保可扩展性和可靠性）5）Node exporter

Node exporter 紧张用于暴露 metrics 给 Prometheus，个中 metrics 包括：cpu 的负载，内存的利用情形，网络等。

6）Pushgateway

Pushgateway 是 Prometheus 生态中一个主要工具，利用它的缘故原由紧张是：

Prometheus 采取 pull 模式，可能由于不在一个子网或者防火墙缘故原由，导致Prometheus 无法直接拉取各个 target数据。
在监控业务数据的时候，须要将不同数据汇总, 由 Prometheus 统一网络。

由于以上缘故原由，不得不该用 pushgateway，但在利用之前，有必要理解一下它的一些弊端：

将多个节点数据汇总到 pushgateway, 如果 pushgateway 挂了，受影响比多个 target 大。
Prometheus 拉取状态 up 只针对 pushgateway, 无法做到对每个节点有效。
Pushgateway 可以持久化推送给它的所有监控数据。
因此，纵然你的监控已经下线，prometheus 还会拉取到旧的监控数据，须要手动清理 pushgateway 不要的数据。
三、TSDB简介

TSDB(Time Series Database)时序列数据库，我们可以大略的理解为一个优化后用来处理韶光序列数据的软件，并且数据中的数组是由韶光进行索引的。

1）韶光序列数据库的特点大部分韶光都是写入操作。
写入操作险些是顺序添加，大多数时候数据到达后都以韶光排序。
写操作很少写入良久之前的数据，也很少更新数据。
大多数情形在数据被采集到数秒或者数分钟后就会被写入数据库。
删除操作一样平常为区块删除，选定开始的历史韶光并指定后续的区块。
很少单独删除某个韶光或者分开的随机韶光的数据。
基本数据大，一样平常超过内存大小。
一样平常选取的只是其一小部分且没有规律，缓存险些不起任何浸染。
读操作是十分范例的升序或者降序的顺序读。
高并发的读操作十分常见。
2）常见的韶光序列数据库

TSDB项目

官网

influxDB

https://influxdata.com/

RRDtool

http://oss.oetiker.ch/rrdtool/

Graphite

http://graphiteapp.org/

OpenTSDB

http://opentsdb.net/

Kdb+

http://kx.com/

Druid

http://druid.io/

KairosDB

http://kairosdb.github.io/

Prometheus

https://prometheus.io/

四、PromQL查询表达式

PromQL的四种数据类型：

即时向量（Instant vector）：包含每个韶光序列单个样品的一组韶光序列，共享相同的韶光戳范围向量（Range vector）：包含一个范围内数据点的一组韶光序列标量（Scalar）：一个大略的数字浮点值字符串（String）：一个大略的字符串值；当前未利用1）即时矢量选择器

即时向量选择器许可选择一组韶光序列，或者某个给定的韶光戳的样本数据。
下面这个例子选择了具有http_requests_total的韶光序列：

http_requests_total

你可以通过附加一组标签，并用{}括起来，来进一步筛选这些韶光序列。
下面这个例子只选择有http_requests_total名称的、有prometheus事情标签的、有canary组标签的韶光序列：

http_requests_total{job="prometheus",group="canary"}

其余，也可以也可以将标签值反向匹配，或者对正则表达式匹配标签值。
下面列举匹配操作符：

=：选择恰好相等的字符串标签!=：选择不相等的字符串标签=~：选择匹配正则表达式的标签（或子标签）!~：选择不匹配正则表达式的标签（或子标签）

例如，选择staging、testing、development环境下的，GET之外的HTTP方法的http_requests_total的韶光序列：

http_requests_total{environment=~"staging|testing|development",method!="GET"}

2）范围矢量选择器

范围向量表达式正如即时向量表达式一样运行，但是前者返回从当前时候开始的一定韶光范围的韶光序列凑集回来。
语法是，在一个向量表达式之后添加[]来表示韶光范围，持续韶光用数字表示，后接下面单元之一：

s：secondsm：minutesh：hoursd：daysw：weeksy：years

不才面这个例子中，我们选择末了5分钟的记录，metric名称为http_requests_total、作业标签为prometheus的韶光序列的所有值：

http_requests_total{job="prometheus"}[5m]

3）偏移量修正器

所述offset可以改变韶光为查询中的个别时候和范围矢量偏移。
例如，以下表达式返回http_requests_total相对付当前查询评估韶光的过去5分钟值：

http_requests_total offset 5m

同样适用于范围向量。
这将返回http_requests_total一周前的5分钟费率：

rate(http_requests_total[5m] offset 1w)

更多操作符，请参考官方文档

4）利用聚合操作

PromQL供应的聚合操作可以用来对这些韶光序列进行处理，形成一条新的韶光序列

# 查询系统所有http要求的总量sum(http_request_total)# 按照mode打算主机CPU的均匀利用韶光avg(node_cpu) by (mode)# 按照主机查询各个主机的CPU利用率sum(sum(irate(node_cpu{mode!='idle'}[5m])) / sum(irate(node_cpu[5m]))) by (instance)

常见的聚合函数

sum (求和)min (最小值)max (最大值)avg (均匀值)stddev (标准差)stdvar (标准方差)count (计数)count_values (对value进行计数)bottomk (后n条时序)topk (前n条时序)quantile (分位数)

更多函数，请参考官方文档

五、Exporter先容

Exporter是prometheus监控中主要的组成部分，卖力数据指标的采集。
广义上讲所有可以向Prometheus供应监控样本数据的程序都可以被称为一个Exporter。
而Exporter的一个实例称为target。
官方给出的插件有blackbox_exporter、node_exporter、mysqld_exporter、snmp_exporter等，第三方的插件有redis_exporter，cadvisor等。

官方和一些社区供应好多exproter, 我们可以直接拿过来采集我们的数据。
官方的exporter地址： https://prometheus.io/docs/instrumenting/exporters/

1）常见的Exporter简介1、blackbox_exporter

GitHub地址：https://github.com/prometheus/blackbox_exporter

bloackbox exporter是prometheus社区供应的黑盒监控办理方案，运行用户通过HTTP、HTTPS、DNS、TCP以及ICMP的办法对网络进行探测。
这里通过blackbox对我们的站点信息进行采集。

2、node_exporter（本章重点讲解）

GitHub地址：https://github.com/prometheus/node_exporter

node_exporter紧张用来采集机器的性能指标数据，包括cpu，内存，磁盘，io等基本信息。

3、mysqld_exporter

mysql_exporter是用来网络MysQL或者Mariadb数据库干系指标的，mysql_exporter须要连接到数据库并有干系权限。

GitHub地址：https://github.com/prometheus/mysqld_exporter

4、snmp_exporter

SNMP Exporter 从 SNMP 做事中采集信息供应给 Promethers 监控系统利用。

GitHub地址：https://github.com/prometheus/snmp_exporter

2）Exporter的来源

从Exporter的来源上来讲，紧张分为两类：

1、社区供应

Prometheus社区供应了丰富的Exporter实现，涵盖了从根本举动步伐，中间件以及网络等各个方面的监控功能。
这些Exporter可以实现大部分通用的监控需求。
下表列举一些社区中常用的Exporter：

范围

常用Exporter

数据库

MySQL Exporter, Redis Exporter, MongoDB Exporter, MSSQL Exporter等

硬件

Apcupsd Exporter，IoT Edison Exporter， IPMI Exporter, Node Exporter等

行列步队

Beanstalkd Exporter, Kafka Exporter, NSQ Exporter, RabbitMQ Exporter等

存储

Ceph Exporter, Gluster Exporter, HDFS Exporter, ScaleIO Exporter等

HTTP做事

Apache Exporter, HAProxy Exporter, Nginx Exporter等

API做事

AWS ECS Exporter， Docker Cloud Exporter, Docker Hub Exporter, GitHub Exporter等

日志

Fluentd Exporter, Grok Exporter等

监控系统

Collectd Exporter, Graphite Exporter, InfluxDB Exporter, Nagios Exporter, SNMP Exporter等

其它

Blockbox Exporter, JIRA Exporter, Jenkins Exporter， Confluence Exporter等

2、用户自定义

除了直策应用社区供应的Exporter程序以外，用户还可以基于Prometheus供应的Client Library创建自己的Exporter程序，目前Promthues社区官方供应了对以下编程措辞的支持：Go、Java/Scala、Python、Ruby。
同时还有第三方实现的如：Bash、C++、Common Lisp、Erlang,、Haskeel、Lua、Node.js、PHP、Rust等。

3）Exporter的运行办法

从Exporter的运行办法上来讲，又可以分为：

1、独立运行

由于操作系统本身并不直接支持Prometheus，同时用户也无法通过直接从操作系统层面上供应对Prometheus的支持。
因此，用户只能通过独立运行一个程序的办法，通过操作系统供应的干系接口，将系统的运行状态数据转换为可供Prometheus读取的监控数据。
除了Node Exporter以外，比如MySQL Exporter、Redis Exporter等都是通过这种办法实现的。
这些Exporter程序扮演了一个中间代理人的角色（数据转换）。

2、集成到运用中（推举）

为了能够更好的监控系统的内部运行状态，有些开源项目如Kubernetes，ETCD等直接在代码中利用了Prometheus的Client Library，供应了对Prometheus的直接支持。
这种办法冲破的监控的界线，让运用程序可以直接将内部的运行状态暴露给Prometheus，适宜于一些须要更多自定义监控指标需求的项目。

4）Exporter规范

所有的Exporter程序都须要按照Prometheus的规范，返回监控的样本数据。
以Node Exporter为例，当访问/metrics地址时会返回以下内容：

直接curl拿不到数据，就得授权，可以参考我之前的文章

# 取前面10行$ curl -s -k --header "Authorization: Bearer $TOKEN" https://192.168.0.113:6443/metrics|head -10

# HELP aggregator_openapi_v2_regeneration_count [ALPHA] Counter of OpenAPI v2 spec regeneration count broken down by causing APIService name and reason.# TYPE aggregator_openapi_v2_regeneration_count counteraggregator_openapi_v2_regeneration_count{apiservice="",reason="startup"} 0aggregator_openapi_v2_regeneration_count{apiservice="k8s_internal_local_delegation_chain_0000000002",reason="update"} 0aggregator_openapi_v2_regeneration_count{apiservice="v1beta1.metrics.k8s.io",reason="add"} 0aggregator_openapi_v2_regeneration_count{apiservice="v1beta1.metrics.k8s.io",reason="update"} 0# HELP aggregator_openapi_v2_regeneration_duration [ALPHA] Gauge of OpenAPI v2 spec regeneration duration in seconds.# TYPE aggregator_openapi_v2_regeneration_duration gaugeaggregator_openapi_v2_regeneration_duration{reason="add"} 0.929158077aggregator_openapi_v2_regeneration_duration{reason="startup"} 0.509336209

Exporter返回的样本数据，紧张由三个部分组成：样本的一样平常注释信息（HELP），样本的类型注释信息（TYPE）和样本。

Prometheus会对Exporter相应的内容逐行解析：

如果当前行以# HELP开始，Prometheus将会按照以下规则对内容进行解析，得到当前的指标名称以及相应的解释信息：

# HELP <metrics_name> <doc_string>

如果当前行以# TYPE开始，Prometheus会按照以下规则对内容进行解析，得到当前的指标名称以及指标类型:

# TYPE <metrics_name> <metrics_type>

TYPE注释行必须涌如今指标的第一个样本之前。
如果没有明确的指标类型须要返回为untyped。
除了# 开头的所有行都会被视为是监控样本数据。
每一行样本须要知足以下格式规范:

metric_name ["{" label_name "=" " label_value " { "," label_name "=" " label_value " } [ "," ] "}"] value [ timestamp ]

五、node-exporter简介

Exporter是Prometheus的指标数据网络组件。
它卖力从目标Jobs网络数据，并把网络到的数据转换为Prometheus支持的时序数据格式。
和传统的指标数据网络组件不同的是，他只卖力网络，并不向Server端发送数据，而是等待Prometheus Server 主动抓取。

node-exporter用于采集node的运行指标，包括node的cpu、load、filesystem、meminfo、network等根本监控指标，类似于zabbix监控系统的的zabbix-agent

事理图如下：

1）检讨node-exporter做事

$ kubectl get pods -n monitoring -o wide|grep node-exporter# 查看pod内的node_exporter进程$ kubectl exec -it node-exporter-dc65j -n monitoring -- ps -ef|grep node_exporter# 获取容器ID$ docker ps |grep node_exporter# 查看docker 容器的pid$ docker inspect -f {{.State.Pid}} 8b3f0c3ea055# 再通过pid进入命名空间$ nsenter -n -t8303# 再查看进程$ ps -ef|grep node_exporter# 退出当前命名空间$ exit

设计到yaml文件

node-exporter-clusterRoleBinding.yaml # 角色绑定node-exporter-clusterRole.yaml # 角色node-exporter-daemonset.yaml # daemonset，容器配置，node-exporter配置node-exporter-prometheusRule.yaml # 采集规则node-exporter-serviceAccount.yaml # 做事账号# K8s集群内的Prometheus抓取监测数据是通过servicemonitor这个crd来完成的。 # 每个servicemonitor对应Prometheus中的一个target。 # 每个servicemonitor对应一个或多个service，卖力获取这些service上指定端口暴露的监测数据，并向Prometheus上报。 node-exporter-serviceMonitor.yaml node-exporter-service.yaml # 做事

2）做事自动创造

任何被监控的目标都须要事先纳入到监控系统中才能进行时序数据采集、存储、告警和展示，监控目标可以通过配置信息以静态形式指定，也可以让Prometheus通过做事创造的机制进行动态管理。

讲做事创造之前，先来讲一下传统配置办法

首先须要安装node-exporter，获取node metrics，并且暴露一个端口；然后去Prometheus Server的prometheus.yaml文件中在scarpe_config中添加node-exporter的job，配置node-exporter的地址和端口等信息；再然后，须要重启Prometheus做事；末了等待prometheus做事来拉取监控信息，就完成添加一个node-exporter监控的任务。

示例配置如下（prometheus.yml）：

- job_name: 'node-exporter' static_configs: - targets: ['192.168.0.113:9090'] #这里我修正了端口为9090

重启做事

$ systemctl restart prometheus

kube-prometheus做事自动创造

首先第一步和传统办法一样，支配一个node-exporter来获取监控项；然后编写一个ServiceMonitor通过labelSelector选择刚才支配的node-exporter，由于Operator在支配Prometheus的时候默认指定了Prometheus选择label为：prometheus: kube-prometheus的ServiceMonitor，以是只须要在ServiceMonitor上打上prometheus: kube-prometheus标签就可以被Prometheus选择了；完成以上两步就完成了对主机资源的监控，不须要改Prometheus配置文件，也不须要重启Prometheus做事，是不是很方便，Operator不雅观察到ServiceMonitor发生变革，会动态天生Prometheus配置文件，并担保配置文件实时生效。
六、添加k8s外部监控1）配置过程

一个项目开始可能很难实现全部容器化，比如数据库、CDH集群。
但是我们依然须要监控他们，如果分成两套prometheus不利于管理，以是我们统一添加这些监控到kube-prometheus中。

关于 additionalScrapeConfigs 属性的详细先容，我们可以利用 kubectl explain 命令来理解详细信息：

$ kubectl explain prometheus.spec.additionalScrapeConfigs

那么接下来我们新建 prometheus-additional.yaml 文件，添加额外监控组件配置scrape_configs。

$ cat << EOF > prometheus-additional.yaml- job_name: 'node-exporter-others' static_configs: - targets: - ...113:31190 - ...114:31190 - ...115:31190- job_name: 'mysql-exporter' static_configs: - targets: - ...104:9592 - ...125:9592 - ...128:9592- job_name: 'nacos-exporter' metrics_path: '/nacos/actuator/prometheus' static_configs: - targets: - ...113:8848 - ...114:8848 - ...115:8848- job_name: 'elasticsearch-exporter' static_configs: - targets: - ...113:9597 - ...114:9597 - ...115:9597- job_name: 'zookeeper-exporter' static_configs: - targets: - ...113:9595 - ...114:9595 - ...115:9595- job_name: 'nginx-exporter' static_configs: - targets: - ...113:9593 - ...114:9593 - ...115:9593- job_name: 'redis-exporter' static_configs: - targets: - ...113:9594- job_name: 'redis-exporter-targets' static_configs: - targets: - redis://...113:7090 - redis://...114:7090 - redis://...115:7091 metrics_path: /scrape relabel_configs: - source_labels: [__address__] target_label: __param_target - source_labels: [__param_target] target_label: instance - target_label: __address__ replacement: ...113:9594EOF

然后我们须要将这些监控配置以secret资源类型存储到k8s集群中。

$ kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring2）修正 prometheus 文件

additionalScrapeConfigs：增加额外监控项配置，详细配置查看第五部分“添加k8s外部监控”。

$ vi prometheus-prometheus.yaml

添加如下内容

additionalScrapeConfigs: name: additional-scrape-configs key: prometheus-additional.yaml

检讨

$ grep -n -C5 'additionalScrapeConfigs' prometheus-prometheus.yaml

是配置生效

$ kubectl apply -f prometheus-prometheus.yaml七、Pushgateway 简介

Pushgateway 是 Prometheus 生态中一个主要工具，利用它的缘故原由紧张是：

Prometheus 采取 pull 模式，可能由于不在一个子网或者防火墙缘故原由，导致Prometheus 无法直接拉取各个 target 数据。
在监控业务数据的时候，须要将不同数据汇总, 由 Prometheus 统一网络。

由于以上缘故原由，不得不该用 pushgateway，但在利用之前，有必要理解一下它的一些弊端：

将多个节点数据汇总到 pushgateway, 如果 pushgateway 挂了，受影响比多个 target 大。
Prometheus 拉取状态 up 只针对 pushgateway, 无法做到对每个节点有效。
Pushgateway 可以持久化推送给它的所有监控数据。
纵然你的监控已经下线，prometheus 还会拉取到旧的监控数据，须要手动清理 pushgateway 不要的数据。

未完待续~