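Deployment health checks and alerting with Spring Boot Actuator and Jenkins
Background
The project currently deploys through Jenkins, and services are restarted with systemctl restart xxx.service. Jenkins has no way of knowing whether the service actually came up, so someone has to watch it manually.
Goal
Have Jenkins check the service's startup state automatically and raise an alert if startup fails.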
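Approach
1. Use Spring Boot Actuator to expose the service's state through an HTTP endpoint.
2. Request the health check endpoint with curl and parse the response to determine the service state.
3. If the check fails, send an alert through DingTalk.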
Implementation
Integrating Spring Boot Actuator into the service
Maven coordinates
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
To show the details of each component, add the following to the configuration file:
management:
  endpoint:
    health:
      show-details: always
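Health check URL: http://ip:port${server.servlet.context-path}/actuator/health (the context path already starts with a slash)
For example, for the manager-api service with IP 127.0.0.1, port 8086 and context-path /manager, the health check URL is http://127.0.0.1:8086/manager/actuator/health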
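The URL rule above can be sketched as a small shell helper. This is only an illustration: the function name health_url and its argument order are made up for this example and are not part of Actuator or of the Jenkins script in this post.

```shell
# health_url HOST PORT CONTEXT_PATH
# Hypothetical helper: prints the Actuator health URL for a service.
health_url() {
  host=$1
  port=$2
  context_path=$3
  # Drop a trailing slash so "/" or "/manager/" does not produce "//actuator".
  context_path=${context_path%/}
  printf 'http://%s:%s%s/actuator/health\n' "$host" "$port" "$context_path"
}

# Usage with the manager-api values from the example above
health_url 127.0.0.1 8086 /manager
```

An empty context path or a bare "/" both fall back to http://host:port/actuator/health.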
You can request it with curl; the response looks like this:
{"status":"UP","components":{"db":{"status":"UP","details":{"database":"MySQL","validationQuery":"isValid()"}},"discoveryComposite":{"status":"UP","components":{"discoveryClient":{"status":"UP","details":{"services":["api-gateway","infrastructure-api","manager-api","ticket-api","user-api"]}}}},"diskSpace":{"status":"UP","details":{"total":62799216640,"free":51376033792,"threshold":10485760,"exists":true}},"nacosConfig":{"status":"UP"},"nacosDiscovery":{"status":"UP"},"ping":{"status":"UP"},"reactiveDiscoveryClients":{"status":"UP","components":{"Simple Reactive Discovery Client":{"status":"UP","details":{"services":[]}}}},"redis":{"status":"UP","details":{"version":"6.2.6"}},"refreshScope":{"status":"UP"},"sentinel":{"status":"UP","details":{"dataSource":{},"enabled":true,"dashboard":{"description":"dashboard isn't configured","status":"UNKNOWN"}}}}}
If show-details is not enabled, the response is much simpler:
{"status":"UP"}
A status of UP means the service started successfully.
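For a quick manual check from a shell, the top-level status can be pulled out of the body without a JSON parser. The sketch below is rough: the function name health_status is hypothetical, and it assumes "status" is the first key of the object, as in both responses above. A real JSON parser is more robust when one is available.

```shell
# health_status BODY
# Hypothetical helper: prints the top-level "status" value of an Actuator health body.
# Assumption: "status" is the first key of the JSON object.
# Prints nothing when the body does not match that shape.
health_status() {
  printf '%s' "$1" |
    sed -n 's/^[[:space:]]*{[[:space:]]*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage
health_status '{"status":"UP"}'
```

Because the pattern is anchored at the start of the body, a nested component status such as "db" is never picked up in place of the top-level one.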
Jenkins check script
print("${server_name} health-check begin") // debug
// Build the health check URL. Because of the firewall, Jenkins does not curl the service directly; the request runs on the target host over ssh (see below).
def url = "http://127.0.0.1:" + service_map[server_name]["port"] + service_map[server_name]["path"] + "/actuator/health"
def i = 0
def success = false
while (i < 60) { // the service needs time to start, so retry: up to 60 attempts, 5 seconds apart
    i += 1
    print("${server_name} health-check try $i")
    // returnStdout: true returns the command's standard output so it can be stored in a variable.
    // While the service is down, curl exits with status 7, which would abort the script, so "|| true" forces exit status 0.
    def response = sh(script: """ssh -p ${__Deploy_SSH_Port} ${__Deploy_Account}@${__Deploy_IP} "curl -s -w'%{http_code}' '$url' || true" """, returnStdout: true, returnStatus: false).trim()
    print(response)
    // curl appends the HTTP status code as the last three characters; it is 000 when the request failed
    def status_code = response.size() >= 3 ? response.substring(response.size() - 3) : "000"
    if (status_code == "000") {
        print("${server_name} health-check response null")
        sleep(5)
        continue
    }
    success = true // the service answered, so the "no response" alert after the loop is skipped
    if (status_code == "404") {
        print("${server_name} has no health-check configured")
        break
    }
    if (status_code != "200" && status_code != "503") { // Actuator returns 200 when UP and 503 when DOWN
        print("${server_name} health-check error")
        break
    }
    def body = response.substring(0, response.size() - 3) // extract the body
    def json = readJSON text: body // parse the JSON
    if ("UP" == json.status) {
        print("${server_name} health-check passed")
        break
    }
    // Startup failed: the service answered but its status is not UP
    def message = "[Monitoring alert]\nService deployment failed\njob=${JOB_NAME}\nserver_name=${server_name}\nip=${__Deploy_IP}\ndetail=${body}"
    def param = ["msgtype": "text", "text": ["content": message]]
    def data = writeJSON returnText: true, json: param // convert the object to JSON
    data = data.replaceAll(/'/, """'"'"'""") // escape each single quote as '"'"' so the payload can sit inside the single-quoted shell argument
    sh """curl -H 'Content-Type: application/json; charset=utf-8' -d '${data}' ${ding_url} """ // send the alert through the DingTalk webhook
    break
}
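if (!success) {
    // Startup failed: the health endpoint never responded within the retry limit
    def message = "[Monitoring alert]\nService deployment failed\njob=${JOB_NAME}\nserver_name=${server_name}\nip=${__Deploy_IP}\ndetail=${url}"
    def param = ["msgtype": "text", "text": ["content": message]]
    def data = writeJSON returnText: true, json: param
    data = data.replaceAll(/'/, """'"'"'""") // same single-quote escaping as in the alert inside the loop
    sh """curl -H 'Content-Type: application/json; charset=utf-8' -d '${data}' ${ding_url} """
}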
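The two string manipulations the script relies on can be tried out in plain shell. The first splits the body from the status code in curl's -w '%{http_code}' output; the second escapes single quotes for a single-quoted argument. The function names split_response and escape_single_quotes, and the BODY and HTTP_CODE variables, are made up for this sketch. It is not a replacement for the pipeline code.

```shell
# split_response RESPONSE
# Splits "<body><3-digit code>", as printed by: curl -s -w '%{http_code}' URL
# Sets BODY and HTTP_CODE; output shorter than 3 characters is treated as "no response" (000).
split_response() {
  response=$1
  if [ "${#response}" -lt 3 ]; then
    BODY=
    HTTP_CODE=000
    return 0
  fi
  BODY=${response%???}            # everything except the last three characters
  HTTP_CODE=${response#"$BODY"}   # the last three characters
}

# escape_single_quotes TEXT
# Replaces every ' with '"'"' so TEXT can be placed inside '...' on a shell command line.
escape_single_quotes() {
  printf '%s' "$1" | sed "s/'/'\"'\"'/g"
}

# Usage
split_response '{"status":"UP"}200'
printf 'code=%s body=%s\n' "$HTTP_CODE" "$BODY"
escape_single_quotes "it's down"
```

The escaping works because '"'"' closes the current single-quoted string, adds a double-quoted single quote, and reopens the single-quoted string, so the shell concatenates the pieces back into the original text.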