Developing a Prometheus Exporter for Interface Monitoring

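Hi everyone, I'm Joker (乔克), an ops engineer who loves tinkering and a cloud-native enthusiast so ugly I wake myself up in my sleep.
Author: Joker (乔克)
WeChat official account: 运维开发故事 (Ops Dev Stories)
Blog: www.jokerbai.com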

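Most of you are probably familiar with black-box monitoring. We usually use blackbox_exporter for it; for black-box monitoring in K8s, see here.
Since a mature tool already exists, why try to build one myself?
Would you believe me if I said it's for learning?
Since it is for learning, the logic doesn't need to be complicated. It mainly needs to do the following:
Add monitored targets through a configuration file
Expose metrics that Prometheus can scrape
Support TCP and HTTP probes
Support a configurable check interval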
Before We Start
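Before getting started, here is a brief introduction to Prometheus and Prometheus Exporters.
Prometheus is an open-source monitoring tool under the CNCF and one of the most popular open-source projects of recent years. In cloud-native environments it is the usual choice for metrics monitoring.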
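Prometheus supports four metric types:
Counter: a cumulative metric that only goes up (or resets to zero when the process restarts), such as a request count that increases by 1 for every request.
Gauge: a value that can go up and down arbitrarily, such as CPU usage.
Histogram: sorts observations into configurable buckets and exposes a cumulative count per bucket plus the total count and sum of all observations; averages and approximate quantiles can be derived from these in PromQL.
Summary: like Histogram, it describes the distribution of observations, but it computes configurable quantiles (for example the median or the 90th percentile) on the client side, along with the count and sum.
In practice these types are often combined to get a fuller picture of a system's running state and performance.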
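To make the Counter/Gauge distinction concrete, here is a toy model in plain Go with no client library involved. The type names toyCounter and toyGauge are made up for illustration. It enforces the rule that a counter never decreases; note that client_golang's real Counter.Add panics on a negative value, whereas this sketch returns an error instead.

```go
package main

import (
	"errors"
	"fmt"
)

// toyCounter models Counter semantics: the value may only go up.
type toyCounter struct{ v float64 }

// Inc adds 1 to the counter.
func (c *toyCounter) Inc() { c.v++ }

// Add rejects negative deltas, since a counter must never decrease.
func (c *toyCounter) Add(d float64) error {
	if d < 0 {
		return errors.New("counter cannot decrease")
	}
	c.v += d
	return nil
}

// toyGauge models Gauge semantics: the value can be set, raised, or lowered.
type toyGauge struct{ v float64 }

func (g *toyGauge) Set(x float64) { g.v = x }
func (g *toyGauge) Inc()          { g.v++ }
func (g *toyGauge) Dec()          { g.v-- }

func main() {
	requests := &toyCounter{}
	requests.Inc()
	_ = requests.Add(2)
	err := requests.Add(-1) // rejected: counters only go up
	fmt.Println(requests.v, err)

	cpu := &toyGauge{}
	cpu.Set(10)
	cpu.Dec() // gauges may go down
	fmt.Println(cpu.v)
}
```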
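Where do these metrics come from?
That is the job of a Prometheus Exporter, a tool that collects and exposes metrics. Typically the exporter collects and exposes the metrics, Prometheus scrapes and stores them, and Grafana or the Prometheus UI is used to query and visualize them.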
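A Prometheus Exporter consists of two main components:
Collector: gathers metrics from an application or another system and converts them into metrics Prometheus can recognize.
Exporter: takes the metric data from the Collector and serves it in a format Prometheus can read.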
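In the simplest case, the "format Prometheus can read" is the plain-text exposition format. The sketch below renders one metric family by hand using only the standard library, to show roughly what promhttp.Handler ends up serving. The render and escapeLabelValue helpers are hypothetical; a real exporter should let client_golang produce this output, since it also handles HELP-text escaping and other format details.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is one time series: a label set and its current value.
type sample struct {
	labels map[string]string
	value  float64
}

// escapeLabelValue escapes backslashes, double quotes and newlines in label values.
func escapeLabelValue(s string) string {
	return strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`).Replace(s)
}

// render writes one metric family in the text exposition format.
// The help text is assumed to contain no backslashes or newlines.
func render(name, help, typ string, samples []sample) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", name, help)
	fmt.Fprintf(&b, "# TYPE %s %s\n", name, typ)
	for _, s := range samples {
		b.WriteString(name)
		if len(s.labels) > 0 {
			keys := make([]string, 0, len(s.labels))
			for k := range s.labels {
				keys = append(keys, k)
			}
			sort.Strings(keys) // deterministic label order
			pairs := make([]string, len(keys))
			for i, k := range keys {
				pairs[i] = fmt.Sprintf(`%s="%s"`, k, escapeLabelValue(s.labels[k]))
			}
			b.WriteString("{" + strings.Join(pairs, ",") + "}")
		}
		fmt.Fprintf(&b, " %v\n", s.value)
	}
	return b.String()
}

func main() {
	out := render("interface_health_status", "Health status of the interfaces", "gauge",
		[]sample{{labels: map[string]string{"name": "demo", "protocol": "tcp"}, value: 1}})
	fmt.Print(out)
}
```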
So how does an exporter produce the four metric types Prometheus supports (Counter, Gauge, Histogram, Summary)?
Prometheus provides the client library github.com/prometheus/client_golang, which lets you declare metrics of each type, for example:
(1) Counter
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Create a Counter metric
	counterMetric := prometheus.NewCounter(prometheus.CounterOpts{
		Name: "example_counter",            // metric name
		Help: "An example counter metric.", // help text
	})

	// Register the metric with the default registry
	prometheus.MustRegister(counterMetric)

	// Increment the counter by 1
	counterMetric.Inc()

	// Expose the metrics over HTTP at /metrics
	http.Handle("/metrics", promhttp.Handler())

	// Start the web server; log.Fatal reports the error if it fails
	log.Fatal(http.ListenAndServe(":8080", nil))
}
(2) Gauge
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Create a Gauge metric
	gaugeMetric := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "example_gauge",            // metric name
		Help: "An example gauge metric.", // help text
	})

	// Register the metric with the default registry
	prometheus.MustRegister(gaugeMetric)

	// Set the gauge to an absolute value
	gaugeMetric.Set(100)

	// Expose the metrics over HTTP at /metrics
	http.Handle("/metrics", promhttp.Handler())

	// Start the web server; log.Fatal reports the error if it fails
	log.Fatal(http.ListenAndServe(":8080", nil))
}
(3) Histogram
package main

import (
	"log"
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Create a Histogram metric
	histogramMetric := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "example_histogram",            // metric name
		Help: "An example histogram metric.", // help text
		// 10 buckets, each 10 wide, starting at 0 (upper bounds 0, 10, ..., 90)
		Buckets: prometheus.LinearBuckets(0, 10, 10),
	})

	// Register the metric with the default registry
	prometheus.MustRegister(histogramMetric)

	// Record a random observation in [0, 100) every second
	go func() {
		for {
			time.Sleep(time.Second)
			histogramMetric.Observe(rand.Float64() * 100)
		}
	}()

	// Expose the metrics over HTTP at /metrics
	http.Handle("/metrics", promhttp.Handler())

	// Start the web server; log.Fatal reports the error if it fails
	log.Fatal(http.ListenAndServe(":8080", nil))
}
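To see what LinearBuckets(0, 10, 10) produces and how observations land in buckets, here is a standard-library-only sketch. linearBuckets and cumulativeCounts are hypothetical helpers, not client_golang functions. Histogram buckets are cumulative: each bucket (exposed with an le label) counts every observation less than or equal to its upper bound, and an implicit +Inf bucket counts all observations.

```go
package main

import (
	"fmt"
	"math"
)

// linearBuckets mirrors the layout of prometheus.LinearBuckets(start, width, count):
// count upper bounds, the first equal to start, each width apart.
func linearBuckets(start, width float64, count int) []float64 {
	b := make([]float64, count)
	for i := range b {
		b[i] = start + float64(i)*width
	}
	return b
}

// cumulativeCounts returns, for each upper bound plus an implicit +Inf bucket,
// how many observations are <= that bound (the "le" semantics of a histogram).
func cumulativeCounts(bounds, observations []float64) []int {
	all := append(append([]float64{}, bounds...), math.Inf(1))
	counts := make([]int, len(all))
	for _, v := range observations {
		for i, ub := range all {
			if v <= ub {
				counts[i]++
			}
		}
	}
	return counts
}

func main() {
	bounds := linearBuckets(0, 10, 10)
	fmt.Println(bounds)
	obs := []float64{5, 12, 95}
	fmt.Println(cumulativeCounts(bounds, obs))
}
```

With observations drawn from [0, 100) as in the example above, values between 90 and 100 are counted only by the +Inf bucket, so an extra bound at 100 may be worth adding if that range matters.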
(4) Summary
package main

import (
	"log"
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Create a Summary metric
	summaryMetric := prometheus.NewSummary(prometheus.SummaryOpts{
		Name: "example_summary",            // metric name
		Help: "An example summary metric.", // help text
		// quantile -> allowed absolute error
		Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
	})

	// Register the metric with the default registry
	prometheus.MustRegister(summaryMetric)

	// Record a random observation in [0, 100) every second
	go func() {
		for {
			time.Sleep(time.Second)
			summaryMetric.Observe(rand.Float64() * 100)
		}
	}()

	// Expose the metrics over HTTP at /metrics
	http.Handle("/metrics", promhttp.Handler())

	// Start the web server; log.Fatal reports the error if it fails
	log.Fatal(http.ListenAndServe(":8080", nil))
}
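The Objectives map says which quantiles to track and how much error is acceptable: an entry 0.9: 0.01 means the value reported for the 0.9 quantile is the true value of some quantile between 0.89 and 0.91. As a reference point, the sketch below computes exact nearest-rank quantiles by sorting all samples (the quantile helper is hypothetical). client_golang does not work this way: it uses a streaming approximation over a sliding time window precisely so it does not have to keep every observation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the nearest-rank q-quantile (0 < q <= 1) of a non-empty sample.
func quantile(samples []float64, q float64) float64 {
	s := append([]float64(nil), samples...) // copy so the caller's slice is not reordered
	sort.Float64s(s)
	rank := int(math.Ceil(q*float64(len(s)))) - 1
	if rank < 0 {
		rank = 0
	}
	return s[rank]
}

func main() {
	data := make([]float64, 0, 100)
	for i := 1; i <= 100; i++ {
		data = append(data, float64(i))
	}
	for _, q := range []float64{0.5, 0.9, 0.99} {
		fmt.Printf("q=%.2f -> %v\n", q, quantile(data, q))
	}
}
```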
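The examples above declare the metric description at the same time as the metric. You can also declare the description first and produce the metric values later inside a custom Collector, for example: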
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// 1. Define a struct that holds the metric descriptions
type Exporter struct {
	summaryDesc *prometheus.Desc
}

// 2. Exporter must satisfy client_golang's prometheus.Collector interface,
// which requires two methods: Describe and Collect.
// This line is a compile-time check that it does.
var _ prometheus.Collector = (*Exporter)(nil)

// 3. Implement Describe and Collect
func (e *Exporter) Describe(ch chan<- *prometheus.Desc) {
	// Send every description this collector can produce
	ch <- e.summaryDesc
}

func (e *Exporter) Collect(ch chan<- prometheus.Metric) {
	// Build the metric values at scrape time (hard-coded here for demonstration)
	ch <- prometheus.MustNewConstSummary(
		e.summaryDesc, // bind the values to the description defined in NewExporter
		4711, 403.34,  // sample count and sample sum of the Summary
		// quantile -> value: 50% of samples are <= 42.3, 90% are <= 323.3
		map[float64]float64{0.5: 42.3, 0.9: 323.3},
		// values for the variable labels "code" and "method" (third argument of NewDesc)
		"200", "get",
	)
}

// 4. A constructor that creates the Exporter and its descriptions
func NewExporter() *Exporter {
	return &Exporter{
		summaryDesc: prometheus.NewDesc(
			"example_summary",                   // metric name
			"An example summary metric.",        // help text
			[]string{"code", "method"},          // variable label names; values set per sample
			prometheus.Labels{"owner": "joker"}, // constant labels, fixed for every sample
		),
	}
}

func main() {
	// Instantiate the exporter
	exporter := NewExporter()

	// Register it with the default registry
	prometheus.MustRegister(exporter)

	// Expose the metrics over HTTP at /metrics
	http.Handle("/metrics", promhttp.Handler())

	// Start the web server; log.Fatal reports the error if it fails
	log.Fatal(http.ListenAndServe(":8080", nil))
}
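After this introduction, you should have a rough idea of how to build a Prometheus Exporter. The main steps are:
Define an Exporter struct that holds the metric descriptions
Implement the Collector interface
Instantiate the exporter
Register it
Expose the metrics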
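Why does Collect push metrics into a channel instead of returning a slice? The toy model below shows the general idea. It uses plain strings instead of prometheus.Metric; the names toyCollector, staticCollector and gather are made up, and this is not client_golang's actual registry code. The caller runs each collector's Collect concurrently, drains one shared channel, and closes it only after every Collect call has returned. This is also why a Collect method that spawns its own goroutines must wait for them before returning.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// toyCollector mirrors the shape of prometheus.Collector's Collect method,
// but sends plain strings instead of prometheus.Metric values.
type toyCollector interface {
	Collect(ch chan<- string)
}

// staticCollector emits a fixed list of "metrics".
type staticCollector []string

func (s staticCollector) Collect(ch chan<- string) {
	for _, m := range s {
		ch <- m
	}
}

// gather runs every collector concurrently and drains the shared channel.
func gather(cs ...toyCollector) []string {
	ch := make(chan string)
	var wg sync.WaitGroup
	for _, c := range cs {
		wg.Add(1)
		go func(c toyCollector) {
			defer wg.Done()
			c.Collect(ch)
		}(c)
	}
	// Close the channel only once every Collect call has returned.
	go func() {
		wg.Wait()
		close(ch)
	}()
	var out []string
	for m := range ch {
		out = append(out, m)
	}
	sort.Strings(out) // arrival order is nondeterministic
	return out
}

func main() {
	fmt.Println(gather(staticCollector{"b 1"}, staticCollector{"a 0", "c 2"}))
}
```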
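Let's Build It
With these basics in place, let's develop our own exporter.
To recap, it needs to:
Add monitored targets through a configuration file
Expose metrics that Prometheus can scrape
Support TCP and HTTP probes
Support a configurable check interval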
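(1) Targets are loaded from a configuration file, so let's first settle on its format. I'd like it to be a YAML list like this:

- url: "http://www.baidu.com"
  name: "Baidu test"
  protocol: "http"
  check_interval: 2s
- url: "localhost:2222"
  name: "Local port 2222 check"
  protocol: "tcp"
check_interval is the check interval; if it is omitted, it defaults to 1s.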
To parse the file, we first define a struct for each configuration entry:
// InterfaceConfig describes one monitored interface
type InterfaceConfig struct {
	Name          string        `yaml:"name"`
	URL           string        `yaml:"url"`
	Protocol      string        `yaml:"protocol"`
	CheckInterval time.Duration `yaml:"check_interval,omitempty"`
}
The configuration is a YAML file saved as config.yaml, so we read that file and then unmarshal its contents:
// loadConfig loads the interface configuration from a YAML file
func loadConfig(configFile string) ([]InterfaceConfig, error) {
	config := []InterfaceConfig{}

	// Read the file (ioutil.ReadFile is deprecated since Go 1.16; os.ReadFile is the modern equivalent)
	data, err := ioutil.ReadFile(configFile)
	if err != nil {
		return nil, err
	}

	// Parse the YAML list into a slice of InterfaceConfig
	err = yaml.Unmarshal(data, &config)
	if err != nil {
		return nil, err
	}

	// Default the check interval to 1s when it is not set
	for i := range config {
		if config[i].CheckInterval == 0 {
			config[i].CheckInterval = time.Second
		}
	}
	return config, nil
}
Because there can be several targets, a []InterfaceConfig slice holds them all.
(2) Define HealthCollector, which implements the Prometheus Collector interface for the interface probes
type HealthCollector struct {
	interfaceConfigs []InterfaceConfig
	healthStatus     *prometheus.Desc
}
The configuration is stored in the struct as well, so that it is loaded when HealthCollector is initialized.
// NewHealthCollector creates a HealthCollector instance
func NewHealthCollector(configFile string) (*HealthCollector, error) {
	// Load the interface configuration from the config file
	config, err := loadConfig(configFile)
	if err != nil {
		return nil, err
	}

	// Initialize the HealthCollector
	collector := &HealthCollector{
		interfaceConfigs: config,
		healthStatus: prometheus.NewDesc(
			"interface_health_status",
			"Health status of the interfaces",
			[]string{"name", "url", "protocol"},
			nil,
		),
	}
	return collector, nil
}
Here []string{"name", "url", "protocol"} declares variable labels, which makes it easy to query the metric with PromQL and to build alerts on it.
(3) Implement the Describe and Collect methods of the Prometheus Collector interface
// Describe implements the Describe method of prometheus.Collector
func (c *HealthCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.healthStatus
}

// Collect implements the Collect method of prometheus.Collector
func (c *HealthCollector) Collect(ch chan<- prometheus.Metric) {
	var wg sync.WaitGroup

	for _, iface := range c.interfaceConfigs {
		wg.Add(1)

		// Probe each interface concurrently
		go func(iface InterfaceConfig) {
			defer wg.Done()

			// Check the interface's health
			healthy := c.checkInterfaceHealth(iface)

			// Convert the result into a metric value: 1 = healthy, 0 = unhealthy
			var metricValue float64
			if healthy {
				metricValue = 1
			}
			ch <- prometheus.MustNewConstMetric(
				c.healthStatus,
				prometheus.GaugeValue,
				metricValue,
				iface.Name,
				iface.URL,
				iface.Protocol,
			)
		}(iface)
	}

	// Collect must not return before every goroutine has sent its metric
	wg.Wait()
}
In Collect, checkInterfaceHealth reports the health of each target, and the result is turned into the corresponding Prometheus metric: 1 means up, 0 means down.
(4) Implement the HTTP and TCP checks
// checkInterfaceHealth dispatches to the check for the configured protocol
func (c *HealthCollector) checkInterfaceHealth(iface InterfaceConfig) bool {
	switch iface.Protocol {
	case "http":
		return c.checkHTTPInterfaceHealth(iface)
	case "tcp":
		return c.checkTCPInterfaceHealth(iface)
	default:
		// Unknown protocols are reported as unhealthy
		return false
	}
}

// checkHTTPInterfaceHealth checks the health of an HTTP interface
func (c *HealthCollector) checkHTTPInterfaceHealth(iface InterfaceConfig) bool {
	client := &http.Client{
		Timeout: 5 * time.Second,
	}

	resp, err := client.Get(iface.URL)
	if err != nil {
		return false
	}
	defer resp.Body.Close()

	return resp.StatusCode == http.StatusOK
}

// checkTCPInterfaceHealth checks the health of a TCP interface
func (c *HealthCollector) checkTCPInterfaceHealth(iface InterfaceConfig) bool {
	conn, err := net.DialTimeout("tcp", iface.URL, 5*time.Second)
	if err != nil {
		return false
	}
	defer conn.Close()

	return true
}
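Both checks are deliberately crude: the HTTP check sends one request and looks at the status code, and the TCP check only verifies that a connection can be established.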
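(5) Write main to finish the program
// Package clause and imports shared by all the snippets above (assuming they live in one main.go).
// The article does not say which YAML library it uses; gopkg.in/yaml.v3 is assumed here.
package main

import (
	"flag"
	"fmt"
	"io/ioutil"
	"net"
	"net/http"
	"os"
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"gopkg.in/yaml.v3"
)

func main() {
	// Parse command-line flags
	configFile := flag.String("config", "", "Path to the config file")
	flag.Parse()

	if *configFile == "" {
		// Default to config.yaml in the current directory
		*configFile = "config.yaml"
	}

	// Create the collector, loading the config file
	collector, err := NewHealthCollector(*configFile)
	if err != nil {
		fmt.Println("Failed to create collector:", err)
		os.Exit(1)
	}

	// Register the HealthCollector
	prometheus.MustRegister(collector)

	// Start the HTTP server that exposes the Prometheus metrics
	http.Handle("/metrics", promhttp.Handler())
	err = http.ListenAndServe(":2112", nil)
	if err != nil {
		fmt.Println("Failed to start HTTP server:", err)
		os.Exit(1)
	}
}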
A command-line flag is added so the config file can be specified with --config; if it is omitted, config.yaml is used.
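The empty-string check in main can be folded into the flag definition itself by giving flag.String a default value, which also makes the default visible in the -h output. The sketch uses a flag.NewFlagSet so parsing can be exercised with explicit arguments; the helper name parseConfigFlag is hypothetical. Go's flag package accepts both -config and --config.

```go
package main

import (
	"flag"
	"fmt"
)

// parseConfigFlag puts the default directly in the flag definition,
// so no separate empty-string check is needed.
func parseConfigFlag(args []string) (string, error) {
	fs := flag.NewFlagSet("go-interface-health-check", flag.ContinueOnError)
	config := fs.String("config", "config.yaml", "Path to the config file")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *config, nil
}

func main() {
	for _, args := range [][]string{nil, {"--config=/opt/config.yaml"}, {"-config", "a.yaml"}} {
		path, err := parseConfigFlag(args)
		fmt.Println(path, err)
	}
}
```

One behavioral difference: an explicit but empty --config= now yields an empty path instead of falling back to config.yaml.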
That completes the development. It did not strictly follow the steps outlined in "Before We Start", but it is close enough.
Deploying the Application
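Something developed but never shipped might as well not exist: your KPI is zero, and management only looks at results, not the process. So whether it is good or not, getting it running comes first.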
(1) Write a Dockerfile; naturally, the application runs in a container.
FROM golang:1.19 AS build-env
ENV GOPROXY=https://goproxy.cn
ADD . /go/src/app
WORKDIR /go/src/app
RUN go mod tidy
RUN GOOS=linux GOARCH=386 go build -v -o /go/src/app/go-interface-health-check

FROM alpine
COPY --from=build-env /go/src/app/go-interface-health-check /usr/local/bin/go-interface-health-check
COPY --from=build-env /go/src/app/config.yaml /opt/
WORKDIR /opt
EXPOSE 2112
CMD [ "go-interface-health-check", "--config=/opt/config.yaml" ]
(2) Write a docker-compose file. Deploying directly with docker-compose is simpler and quicker than writing K8s manifests.
version: '3.8'
services:
  interface-health-check:
    image: go-interface-health-check:v0.3
    container_name: interface-health-check
    network_mode: host
    restart: unless-stopped
    command: [ "go-interface-health-check", "--config=/opt/config.yaml" ]
    volumes:
      - /u01/interface-health-check:/opt
      - /etc/localtime:/etc/localtime:ro
    user: root
    logging:
      driver: json-file
      options:
        max-size: "20m"
        max-file: "100"
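After starting the container with docker-compose up -d, you can view the metrics with curl http://127.0.0.1:2112/metrics. Note that the volume mounts /u01/interface-health-check over /opt, so config.yaml must be placed in that host directory.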

Collection and Visualization
Setting up Prometheus is not covered here; if you're unsure how, see here.
Add a scrape job for the exporter to the Prometheus configuration:
scrape_configs:
  - job_name: 'interface-health-check'
    static_configs:
      - targets: ['127.0.0.1:2112']
After reloading Prometheus, check whether the scraped target is up.

Finally, for easier viewing, you can build a Grafana dashboard, for example:

You can of course also create alerting rules as needed: interface_health_status == 0 means the interface is down.
Wrapping Up
That completes a home-grown Prometheus Exporter. The example above is fairly simple and crude; adjust it to fit your actual needs.
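A couple of days ago I came across a quote from Feng Tang (冯唐): "The lower people are on the ladder, the worse they are at handling relationships. The higher you climb, the more you find that the people you assumed spent all day studying problems are actually studying people."
How do you interpret this?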
Links
[1] https://www.yuque.com/coolops/kubernetes/dff1cg
[2] https://www.yuque.com/coolops/kubernetes/wd2vts
[3] https://github.com/prometheus/client_golang/blob/main/prometheus/examples_test.go
[4] https://www.cnblogs.com/0x00000/p/17557743.html
