Spring Cloud Health Check Mechanism: An In-Depth Analysis
Author: 薪火铺子
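1. Background
In a microservice architecture, the health status of each service instance is the key input for traffic scheduling and failure eviction:
- The registry needs to know which instances are available
- The load balancer must avoid sending requests to failed instances
- The operations platform needs to monitor service status in real time
- Autoscaling decides based on the number of healthy instances
Spring Cloud, building on Spring Boot Actuator, provides a complete health check system that covers application health, service discovery health, and custom health checks.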
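To make the load-balancing point concrete, here is a minimal, dependency-free sketch of routing only to healthy instances. The class `HealthyInstancePicker`, the instance addresses, and the plain string statuses are hypothetical and are not part of Spring Cloud; in a real deployment the registry or the load balancer performs this filtering.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: round-robin selection restricted to instances reported as UP.
final class HealthyInstancePicker {
    private final AtomicInteger counter = new AtomicInteger();

    // instances maps "host:port" to the last reported status ("UP", "DOWN", ...).
    String pick(Map<String, String> instances) {
        List<String> healthy = new ArrayList<>();
        for (Map.Entry<String, String> entry : instances.entrySet()) {
            if ("UP".equals(entry.getValue())) {
                healthy.add(entry.getKey());
            }
        }
        if (healthy.isEmpty()) {
            throw new IllegalStateException("no healthy instance available");
        }
        // Rotate over the healthy subset only; failed instances never receive traffic.
        int index = Math.floorMod(counter.getAndIncrement(), healthy.size());
        return healthy.get(index);
    }

    public static void main(String[] args) {
        Map<String, String> instances = new LinkedHashMap<>();
        instances.put("10.0.0.1:8080", "UP");
        instances.put("10.0.0.2:8080", "DOWN");
        instances.put("10.0.0.3:8080", "UP");
        HealthyInstancePicker picker = new HealthyInstancePicker();
        for (int i = 0; i < 4; i++) {
            System.out.println(picker.pick(instances));
        }
    }
}
```

The instance marked DOWN is skipped on every rotation, which is exactly why an accurate health status matters upstream.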
2. Overall Health Check Architecture
2.1 Component Relationships
2.2 Health Check Endpoints
GET /actuator/health
GET /actuator/health/liveness
GET /actuator/health/readiness
Example response:
{
  "status": "UP",
  "components": {
    "discovery": {
      "status": "UP",
      "details": {
        "clients": {
          "consul": {
            "status": "UP"
          }
        }
      }
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 500000000000,
        "free": 450000000000,
        "threshold": 10485760
      }
    },
    "ping": {
      "status": "UP"
    }
  }
}
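Probes and registry HTTP checks usually act on the HTTP status code rather than on the JSON body. With Actuator's default mapping, DOWN and OUT_OF_SERVICE are served with 503 and the other statuses with 200. The sketch below models that default; `HealthHttpStatus` is a hypothetical helper, and the mapping can be reconfigured in a real application.

```java
// Hypothetical helper that models Actuator's default status-to-HTTP mapping.
final class HealthHttpStatus {
    static int httpStatusFor(String statusCode) {
        switch (statusCode) {
            case "DOWN":
            case "OUT_OF_SERVICE":
                return 503; // Service Unavailable: callers treat the instance as failed
            default:
                return 200; // UP, UNKNOWN, and unmapped custom statuses
        }
    }

    public static void main(String[] args) {
        for (String code : new String[] {"UP", "UNKNOWN", "OUT_OF_SERVICE", "DOWN"}) {
            System.out.println(code + " -> " + httpStatusFor(code));
        }
    }
}
```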
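3. Core Interfaces
3.1 The HealthIndicator Interface
public interface HealthIndicator {
    /**
     * Returns the health status.
     * @return the health information
     */
    Health health();
}
This is a functional interface with a single abstract method, health(); every blocking (non-reactive) health check implements it.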
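3.2 The Health Response Model
// Simplified view of the model; the real classes carry more methods.
public final class Health {
    // Health status
    private final Status status;
    // Additional details
    private final Map<String, Object> details;
    // Instances are built through the Builder
    public static Builder status(Status status) {
        return new Builder(status);
    }
}
// Status is a class with predefined constants rather than an enum,
// so custom statuses can be created with new Status("CODE").
public final class Status {
    public static final Status UNKNOWN = new Status("UNKNOWN");               // unknown
    public static final Status UP = new Status("UP");                         // healthy
    public static final Status DOWN = new Status("DOWN");                     // failed
    public static final Status OUT_OF_SERVICE = new Status("OUT_OF_SERVICE"); // taken out of service
    // A status such as UPGRADED is not predefined; it would have to be a custom Status.
}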
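The builder style used throughout this article (`Health.up().withDetail(...).build()`) can be illustrated without Spring on the classpath. The following is a minimal sketch of the same pattern; `MiniHealth` is a hypothetical stand-in and not Spring Boot's `Health` class.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the Health model: an immutable result built step by step.
final class MiniHealth {
    private final String status;
    private final Map<String, Object> details;

    private MiniHealth(String status, Map<String, Object> details) {
        this.status = status;
        // Defensive copy so later builder changes cannot leak into this instance.
        this.details = Collections.unmodifiableMap(new LinkedHashMap<>(details));
    }

    String getStatus() { return status; }
    Map<String, Object> getDetails() { return details; }

    static Builder up() { return new Builder("UP"); }
    static Builder down() { return new Builder("DOWN"); }

    static final class Builder {
        private String status;
        private final Map<String, Object> details = new LinkedHashMap<>();

        private Builder(String status) { this.status = status; }

        Builder withDetail(String key, Object value) {
            details.put(key, Objects.requireNonNull(value, "value"));
            return this;
        }

        // Lets a check start optimistic and downgrade once a failure is seen.
        Builder down() { this.status = "DOWN"; return this; }

        MiniHealth build() { return new MiniHealth(status, details); }
    }

    public static void main(String[] args) {
        MiniHealth health = MiniHealth.up().withDetail("service", "custom-service").build();
        System.out.println(health.getStatus() + " " + health.getDetails());
    }
}
```

Rejecting null detail values mirrors a common cause of failing indicators: passing `e.getMessage()` when the message is null.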
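3.3 The ReactiveHealthIndicator Interface
public interface ReactiveHealthIndicator extends ReactiveHealthContributor {
    /**
     * Returns the health status asynchronously.
     * @return a Mono that emits the health information
     */
    Mono<Health> health();
    // Default method: strips the details when they must not be exposed.
    // Turning an error into a DOWN result is left to implementations
    // such as AbstractReactiveHealthIndicator.
    default Mono<Health> getHealth(boolean includeDetails) {
        Mono<Health> health = health();
        return includeDetails ? health : health.map(Health::withoutDetails);
    }
}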
4. Aggregation
4.1 CompositeHealthIndicator and the Composite Pattern
4.2 Aggregation Strategy
// Simplified sketch of priority-based aggregation; build(...) and getOrder(...) are
// illustrative helpers. HealthAggregator was deprecated in Spring Boot 2.2 in favor
// of StatusAggregator.
public class OrderedHealthAggregator implements HealthAggregator {
    @Override
    public Health aggregate(Map<String, Health> healths) {
        // Sort the contributors by priority
        List<String> sortedKeys = healths.keySet().stream()
                .sorted(Comparator.comparingInt(this::getOrder))
                .collect(Collectors.toList());
        // Default severity order: DOWN, OUT_OF_SERVICE, UP, then UNKNOWN.
        // The first status in that order that any contributor reports wins.
        for (Status candidate : List.of(Status.DOWN, Status.OUT_OF_SERVICE, Status.UP)) {
            for (String key : sortedKeys) {
                if (candidate.equals(healths.get(key).getStatus())) {
                    return build(candidate, healths, key);
                }
            }
        }
        // Nothing reported DOWN, OUT_OF_SERVICE, or UP
        return build(Status.UNKNOWN, healths, null);
    }
    private int getOrder(String key) {
        // Look up the sort order of a contributor
        return Ordered.LOWEST_PRECEDENCE;
    }
}
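The core of the aggregation is a severity ranking. Spring Boot's default ordering ranks DOWN first, then OUT_OF_SERVICE, UP, and UNKNOWN, and the most severe status present becomes the overall status. The following dependency-free sketch models that rule; `StatusOrderAggregator` is a hypothetical name and plain strings stand in for `Status`.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical model of order-based status aggregation.
final class StatusOrderAggregator {
    // Most severe first; this mirrors the default order described above.
    private static final List<String> ORDER =
            Arrays.asList("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    static String aggregate(Collection<String> statuses) {
        String worst = null;
        int worstRank = Integer.MAX_VALUE;
        for (String status : statuses) {
            int rank = ORDER.indexOf(status);
            if (rank < 0) {
                continue; // statuses outside the configured order are ignored
            }
            if (rank < worstRank) {
                worstRank = rank;
                worst = status;
            }
        }
        // With no recognized status at all, fall back to UNKNOWN.
        return worst != null ? worst : "UNKNOWN";
    }

    public static void main(String[] args) {
        System.out.println(aggregate(Arrays.asList("UP", "UNKNOWN")));
        System.out.println(aggregate(Arrays.asList("UP", "OUT_OF_SERVICE", "DOWN")));
    }
}
```

Note that UP outranks UNKNOWN in this order, so one contributor reporting UNKNOWN does not hide an otherwise healthy application.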
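4.3 Aggregation Flow
5. Built-in Health Indicators
5.1 Overview of Built-in Indicators
| Indicator | Status | Description |
|---|---|---|
| PingHealthIndicator | ✅ | Always reports UP; confirms the application responds (no ICMP ping is sent) |
| DiskSpaceHealthIndicator | ✅ | Free disk space against a threshold |
| DataSourceHealthIndicator | ✅ | Database connectivity through the DataSource |
| RedisHealthIndicator | ✅ | Redis connection |
| RabbitHealthIndicator | ✅ | RabbitMQ connection |
| MongoHealthIndicator | ✅ | MongoDB connection |
| ElasticsearchHealthIndicator | ✅ | Elasticsearch cluster status |
| DiscoveryClientHealthIndicator | ✅ | Service discovery status (provided by Spring Cloud Commons) |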
5.2 DiscoveryClientHealthIndicator
// Simplified illustration of the idea, not the actual Spring Cloud source.
public class DiscoveryClientHealthIndicator
        implements HealthIndicator, Ordered {
    private final DiscoveryClient discoveryClient;
    public DiscoveryClientHealthIndicator(DiscoveryClient discoveryClient) {
        this.discoveryClient = discoveryClient;
    }
    @Override
    public Health health() {
        try {
            // Query the registry
            String description = discoveryClient.description();
            // Try to look up this service's own instances
            List<ServiceInstance> instances =
                    discoveryClient.getInstances("sc-cloud-discovery");
            if (instances.isEmpty()) {
                return Health.unknown()
                        .withDetail("error", "no instances found")
                        .build();
            }
            return Health.up()
                    .withDetail("description", description)
                    .withDetail("services", discoveryClient.getServices().size())
                    .build();
        } catch (Exception e) {
            // Health.down(e) already records the exception under the "error" detail
            return Health.down(e).build();
        }
    }
    @Override
    public int getOrder() {
        return Ordered.HIGHEST_PRECEDENCE;
    }
}
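5.3 ServiceRegistryHealthIndicator
// Illustrative sketch. The ServiceRegistry interface has no getRegistration() method,
// so the Registration is injected directly.
public class ServiceRegistryHealthIndicator
        implements HealthIndicator, Ordered {
    private final Registration registration; // null when nothing has been registered
    public ServiceRegistryHealthIndicator(Registration registration) {
        this.registration = registration;
    }
    @Override
    public Health health() {
        if (registration == null) {
            return Health.unknown()
                    .withDetail("status", "UNKNOWN")
                    .withDetail("error", "No registration found")
                    .build();
        }
        return Health.up()
                .withDetail("status", "REGISTERED")
                .withDetail("serviceId", registration.getServiceId())
                // getInstanceId() may return null, and detail values must not be null
                .withDetail("instanceId", String.valueOf(registration.getInstanceId()))
                .build();
    }
    @Override
    public int getOrder() {
        return Ordered.LOWEST_PRECEDENCE;
    }
}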
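6. Health Status Propagation
6.1 Registry Health Checks
6.2 Nacos Health Check Mechanism
// Simplified sketch: registration ends in a NamingService.registerInstance call.
// For an ephemeral instance the Nacos client then keeps the instance alive on its
// own (periodic heartbeats in 1.x clients).
public class NacosAutoServiceRegistration
        extends AbstractAutoServiceRegistration<Registration> {
    @Override
    public void start() {
        // Register the instance; keep-alive begins after this call
        Instance instance = new Instance();
        instance.setIp(registration.getHost());
        instance.setPort(registration.getPort());
        instance.setMetadata(registration.getMetadata());
        try {
            namingService.registerInstance(registration.getServiceId(), instance);
        } catch (NacosException e) {
            throw new IllegalStateException("Nacos registration failed", e);
        }
    }
}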
6.3 Consul Health Checks
spring:
  cloud:
    consul:
      discovery:
        healthCheckPath: /actuator/health
        healthCheckInterval: 10s
        deregister: true
        fail-fast: true
7. Custom Health Indicators
7.1 Implementing HealthIndicator
@Component
public class CustomServiceHealthIndicator implements HealthIndicator {
    @Override
    public Health health() {
        try {
            // Custom check logic
            boolean healthy = checkCustomService();
            if (healthy) {
                return Health.up()
                        .withDetail("service", "custom-service")
                        .withDetail("version", "1.0")
                        .build();
            } else {
                return Health.down()
                        .withDetail("service", "custom-service")
                        .withDetail("reason", "service unavailable")
                        .build();
            }
        } catch (Exception e) {
            return Health.down(e)
                    .withDetail("service", "custom-service")
                    .build();
        }
    }
    private boolean checkCustomService() {
        // Implement the actual health check here
        return true;
    }
}
7.2 Implementing ReactiveHealthIndicator
@Component
public class CustomReactiveHealthIndicator
        implements ReactiveHealthIndicator {
    private final WebClient webClient;
    public CustomReactiveHealthIndicator(WebClient webClient) {
        this.webClient = webClient;
    }
    @Override
    public Mono<Health> health() {
        return webClient.get()
                .uri("/health")
                .retrieve()
                .bodyToMono(JsonNode.class)
                .map(response -> Health.up()
                        .withDetail("custom", "OK")
                        .build())
                // Any error (timeout, 5xx, connection refused) is reported as DOWN
                .onErrorResume(ex ->
                        Mono.just(Health.down(ex).build())
                );
    }
}
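7.3 A Health Indicator That Checks Dependencies
@Component
public class DependencyHealthIndicator implements HealthIndicator {
    private final JdbcTemplate jdbcTemplate;
    private final RestTemplate restTemplate;
    public DependencyHealthIndicator(DataSource dataSource, RestTemplate restTemplate) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
        this.restTemplate = restTemplate;
    }
    @Override
    public Health health() {
        // Start as UP and downgrade as soon as any dependency fails
        Health.Builder builder = Health.up();
        // Check the database
        try {
            jdbcTemplate.execute("SELECT 1");
            builder.withDetail("database", "OK");
        } catch (Exception e) {
            // String.valueOf guards against a null message; detail values must not be null
            builder.down().withDetail("database", String.valueOf(e.getMessage()));
        }
        // Check the external service
        try {
            restTemplate.getForObject(
                    "http://external-service/health",
                    String.class
            );
            builder.withDetail("external", "OK");
        } catch (Exception e) {
            builder.down().withDetail("external", String.valueOf(e.getMessage()));
        }
        return builder.build();
    }
}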
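A dependency check that calls a database or a remote service can hang, and a hanging indicator stalls the whole health endpoint. One possible mitigation, which is an addition here and not part of the indicator above, is to bound each check with a timeout. The sketch below uses only the JDK; `TimedCheck` and its method are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper: runs one check and reports DOWN if it fails or takes too long.
final class TimedCheck {
    static String run(Supplier<Boolean> check, long timeoutMillis) {
        // One executor per call keeps the sketch short; a real indicator would reuse one.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "health-check");
            thread.setDaemon(true); // a stuck check must not keep the JVM alive
            return thread;
        });
        Callable<Boolean> task = check::get;
        Future<Boolean> future = executor.submit(task);
        try {
            Boolean ok = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Boolean.TRUE.equals(ok) ? "UP" : "DOWN";
        } catch (TimeoutException | ExecutionException e) {
            return "DOWN"; // too slow, or the check itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "DOWN";
        } finally {
            executor.shutdownNow(); // interrupts the check if it is still running
        }
    }

    public static void main(String[] args) {
        System.out.println(run(() -> true, 500));
        System.out.println(run(() -> {
            try {
                Thread.sleep(2000); // simulates a dependency that does not answer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return true;
        }, 100));
    }
}
```

The timeout should stay well below the probe or registry check timeout, otherwise the caller gives up before the indicator does.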
8. Hands-on Examples
8.1 Configuring the Health Endpoint
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics
      base-path: /actuator
  endpoint:
    health:
      show-details: always
      probes:
        enabled: true
  health:
    livenessState:
      enabled: true
    readinessState:
      enabled: true
8.2 Custom Aggregation Strategy
// HealthAggregator was deprecated in Spring Boot 2.2 in favor of StatusAggregator,
// so this example targets versions that still provide it.
@Configuration
public class CustomHealthAggregatorConfig {
    @Bean
    public HealthAggregator customHealthAggregator() {
        return new CustomHealthAggregator();
    }
}
public class CustomHealthAggregator implements HealthAggregator {
    @Override
    public Health aggregate(Map<String, Health> healths) {
        // Custom aggregation logic: compute an average score
        int totalHealth = 0;
        int count = 0;
        for (Map.Entry<String, Health> entry : healths.entrySet()) {
            Status status = entry.getValue().getStatus();
            totalHealth += getScore(status);
            count++;
        }
        double avgScore = count > 0 ? totalHealth / (double) count : 0;
        Status overallStatus = avgScore >= 80 ? Status.UP :
                avgScore >= 50 ? Status.UNKNOWN : Status.DOWN;
        return Health.status(overallStatus)
                .withDetail("averageScore", avgScore)
                .withDetail("healths", healths)
                .build();
    }
    private int getScore(Status status) {
        // Status is a class, not an enum, so switch on its code.
        // UPGRADED is a custom status, not one of the predefined constants.
        return switch (status.getCode()) {
            case "UP" -> 100;
            case "UPGRADED" -> 90;
            case "UNKNOWN" -> 50;
            case "OUT_OF_SERVICE" -> 20;
            default -> 0; // DOWN and any unrecognized status
        };
    }
}
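The thresholds are easier to judge with numbers. The following dependency-free sketch reproduces the scoring arithmetic above (100/90/50/20/0, UP from an average of 80, UNKNOWN from 50); `AverageScoreExample` is a hypothetical name and plain status codes stand in for `Status`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical worked example of the average-score aggregation.
final class AverageScoreExample {
    static int score(String statusCode) {
        switch (statusCode) {
            case "UP": return 100;
            case "UPGRADED": return 90;
            case "UNKNOWN": return 50;
            case "OUT_OF_SERVICE": return 20;
            default: return 0; // DOWN and anything unrecognized
        }
    }

    static double average(List<String> statusCodes) {
        if (statusCodes.isEmpty()) {
            return 0;
        }
        int total = 0;
        for (String code : statusCodes) {
            total += score(code);
        }
        return total / (double) statusCodes.size();
    }

    static String overall(List<String> statusCodes) {
        double avg = average(statusCodes);
        return avg >= 80 ? "UP" : avg >= 50 ? "UNKNOWN" : "DOWN";
    }

    public static void main(String[] args) {
        // Two UP and one DOWN: (100 + 100 + 0) / 3 is about 66.7, so UNKNOWN.
        System.out.println(overall(Arrays.asList("UP", "UP", "DOWN")));
        // Four UP and one DOWN: 400 / 5 = 80, so the overall status is UP.
        System.out.println(overall(Arrays.asList("UP", "UP", "UP", "UP", "DOWN")));
    }
}
```

The second case shows the trade-off of averaging: a single failed component, even a critical one such as the database, can be masked once enough other components are healthy.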
8.3 Kubernetes Probes Integration
# application.yml
spring:
  cloud:
    kubernetes:
      discovery:
        health-endpoint: /actuator/health/liveness
management:
  health:
    livenessState:
      enabled: true
    readinessState:
      enabled: true
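// Customizing liveness and readiness.
// LivenessStateHealthIndicator and ReadinessStateHealthIndicator are classes, not
// interfaces; the state they report is changed by publishing an AvailabilityChangeEvent.
@Component
public class KubernetesAvailabilityPublisher {
    private final ApplicationEventPublisher publisher;
    public KubernetesAvailabilityPublisher(ApplicationEventPublisher publisher) {
        this.publisher = publisher;
    }
    // Mark the application as broken: the liveness probe starts failing
    public void markBroken() {
        AvailabilityChangeEvent.publish(publisher, this, LivenessState.BROKEN);
    }
    // Toggle whether the readiness probe reports the application as ready
    public void setReady(boolean ready) {
        AvailabilityChangeEvent.publish(publisher, this,
                ready ? ReadinessState.ACCEPTING_TRAFFIC : ReadinessState.REFUSING_TRAFFIC);
    }
}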
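9. Pitfalls to Avoid
Pitfall 1: Health checks hurt application performance
Problem: frequent health checks degrade performance
Solution:
# Adjust the check interval
spring:
  cloud:
    consul:
      discovery:
        healthCheckInterval: 30s  # lengthen the check interval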
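Lengthening the interval reduces how often the registry calls in. A complementary technique, which is an addition here and not part of the configuration above, is to cache the result of an expensive check for a short time so that several callers (registry, probes, monitoring) share one evaluation. The sketch below is dependency-free; `CachedCheck` is a hypothetical name, and the clock is injected so the behavior can be tested.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical wrapper: re-runs the expensive check at most once per TTL window.
final class CachedCheck {
    private final Supplier<String> delegate;
    private final long ttlMillis;
    private final LongSupplier clock;
    private String cached;
    private long cachedAt;

    CachedCheck(Supplier<String> delegate, long ttlMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    synchronized String status() {
        long now = clock.getAsLong();
        if (cached == null || now - cachedAt >= ttlMillis) {
            cached = delegate.get(); // the expensive part runs only here
            cachedAt = now;
        }
        return cached;
    }

    public static void main(String[] args) {
        CachedCheck check = new CachedCheck(() -> {
            System.out.println("running expensive check");
            return "UP";
        }, 5_000, System::currentTimeMillis);
        System.out.println(check.status());
        System.out.println(check.status()); // served from the cache
    }
}
```

The price is staleness: a failure is reported up to one TTL late, so the TTL should be shorter than the check interval it is protecting.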
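Pitfall 2: Registry heartbeat timeout evicts the instance
Problem: network jitter causes instances to be removed by mistake
Solution:
spring:
  cloud:
    nacos:
      discovery:
        heartBeatInterval: 5000   # milliseconds between heartbeats
        heartBeatTimeout: 15000   # milliseconds without a heartbeat before the instance is marked unhealthy
        ipDeleteTimeout: 30000    # milliseconds without a heartbeat before the instance is removed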
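The three values form a timeline: heartbeats are sent every 5 s, an instance that stays silent for more than 15 s is marked unhealthy, and after more than 30 s it is removed. The following sketch models that timeline as it is commonly described for ephemeral instances; `HeartbeatTimeline` and the state names are hypothetical, and the actual server logic may differ between Nacos versions.

```java
// Hypothetical model of how silence translates into instance state.
final class HeartbeatTimeline {
    static String stateAfterSilence(long silenceMillis,
                                    long heartBeatTimeoutMillis,
                                    long ipDeleteTimeoutMillis) {
        if (silenceMillis > ipDeleteTimeoutMillis) {
            return "REMOVED";   // dropped from the instance list
        }
        if (silenceMillis > heartBeatTimeoutMillis) {
            return "UNHEALTHY"; // still listed, but no longer routed to
        }
        return "HEALTHY";
    }

    public static void main(String[] args) {
        long timeout = 15_000;
        long deleteTimeout = 30_000;
        for (long silence : new long[] {10_000, 20_000, 31_000}) {
            System.out.println(silence + " ms -> "
                    + stateAfterSilence(silence, timeout, deleteTimeout));
        }
    }
}
```

Raising `heartBeatTimeout` tolerates longer network jitter, at the cost of routing to a dead instance for longer.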
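Pitfall 3: Health indicator not refreshed under @RefreshScope
Problem: the indicator's state does not change after a configuration update
Cause: a HealthIndicator bean is a singleton by default, so configuration injected at startup is not re-read
Solution:
@RefreshScope
@Component
public class RefreshableHealthIndicator implements HealthIndicator {
    // The bean is re-created on first access after a refresh event,
    // so it picks up the updated configuration
    @Override
    public Health health() {
        return Health.up().build();
    }
}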
Pitfall 4: Health check conflicts with multiple registries
Problem: using Nacos and Consul at the same time causes conflicts
Solution:
spring:
  cloud:
    nacos:
      discovery:
        enabled: true
    consul:
      discovery:
        enabled: false  # disable Consul
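Pitfall 5: Custom HealthIndicator not registered
Problem: the custom health indicator has no effect
Cause: the bean is not picked up by component scanning, or it does not implement the right interface
Solution:
// Make sure @Component is present
@Component
public class MyHealthIndicator implements HealthIndicator {
    @Override
    public Health health() {
        return Health.up().build();
    }
}
// Or register it explicitly (use one approach only, to avoid a duplicate bean)
@Configuration
public class HealthConfig {
    @Bean
    public HealthIndicator myHealthIndicator() {
        return new MyHealthIndicator();
    }
}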
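10. Kubernetes Probe Integration
10.1 The Three Probe Types
| Probe | Purpose | Consequence of failure |
|---|---|---|
| LivenessProbe | Is the container alive? | The container is restarted |
| ReadinessProbe | Is the container ready for traffic? | The pod is removed from traffic |
| StartupProbe | Has startup completed? | Other probes stay disabled until it succeeds; if it keeps failing, the container is restarted |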
10.2 Spring Boot Actuator Probe Configuration
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
      base-path: /actuator
  endpoint:
    health:
      # Show full health details
      show-details: always
      # Enable the Kubernetes probes
      probes:
        enabled: true
      # Custom probe groups
      group:
        liveness:
          # Caution: with db/redis in the liveness group, an outage of a shared
          # dependency makes the probe fail and restarts every pod
          include: livenessState,db,redis
        readiness:
          include: readinessState,db,redis,consul
  health:
    # Liveness state indicator
    livenessState:
      enabled: true
    # Readiness state indicator
    readinessState:
      enabled: true
10.3 Kubernetes Pod Configuration
apiVersion: v1
kind: Pod
metadata:
  name: user-service
spec:
  containers:
    - name: user-service
      image: user-service:latest
      ports:
        - containerPort: 8080
      livenessProbe:
        httpGet:
          path: /actuator/health/liveness
          port: 8080
        initialDelaySeconds: 60
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /actuator/health/readiness
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 5
        failureThreshold: 3
      startupProbe:
        httpGet:
          path: /actuator/health
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
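The numbers in this manifest translate into time budgets. A probe acts only after `failureThreshold` consecutive failures, so the budget is roughly `periodSeconds × failureThreshold`: about 300 s for startup, 30 s before a liveness restart, and 15 s before a pod is taken out of traffic. The sketch below works through that arithmetic and the consecutive-failure rule; `ProbeBudget` is a hypothetical name, and `timeoutSeconds` and `initialDelaySeconds` are deliberately ignored.

```java
// Hypothetical model of probe timing; not Kubernetes code.
final class ProbeBudget {
    // Rough time until the probe's failure action triggers.
    static int worstCaseSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }

    // True once failureThreshold failures occur in a row; a success resets the count.
    static boolean thresholdReached(boolean[] probeResults, int failureThreshold) {
        int consecutiveFailures = 0;
        for (boolean success : probeResults) {
            consecutiveFailures = success ? 0 : consecutiveFailures + 1;
            if (consecutiveFailures >= failureThreshold) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("startup budget:   " + worstCaseSeconds(10, 30) + " s");
        System.out.println("liveness budget:  " + worstCaseSeconds(10, 3) + " s");
        System.out.println("readiness budget: " + worstCaseSeconds(5, 3) + " s");
        // A single success in between resets the failure count.
        System.out.println(thresholdReached(new boolean[] {false, false, true, false}, 3));
    }
}
```

If the application can take longer than the startup budget to boot, the container is restarted in a loop, so the startup probe values should be sized from measured startup times.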
10.4 Probe Execution Flow
10.5 Custom Probe Groups
// Each bean becomes a health contributor; add it to a probe group through
// management.endpoint.health.group.<group>.include
@Configuration
public class HealthGroupConfig {
    @Bean
    public HealthIndicator customLivenessIndicator() {
        return () -> {
            // Custom liveness check logic
            boolean healthy = checkCriticalResources();
            return healthy ?
                    Health.up().build() :
                    Health.down().build();
        };
    }
    @Bean
    public HealthIndicator customReadinessIndicator() {
        return () -> {
            // Custom readiness check logic
            boolean ready = checkDependencies();
            return ready ?
                    Health.up().build() :
                    Health.down().build();
        };
    }
    private boolean checkCriticalResources() {
        // Check critical resources
        return true;
    }
    private boolean checkDependencies() {
        // Check dependent services
        return true;
    }
}
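10.6 Nacos Health Check Integration
# Confirm these keys against the discovery properties of your Spring Cloud Alibaba version
spring:
  cloud:
    nacos:
      discovery:
        # Enable the health check
        health-check:
          enabled: true
          # Health check path
          path: /actuator/health
          # Check interval
          interval: 5000
          # Failure threshold
          failure-threshold: 3
10.7 Linking the Registry with Kubernetes Probes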
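11. Summary
The Spring Cloud health check mechanism is a key component for keeping services available:
| Dimension | Content |
|---|---|
| Core interfaces | HealthIndicator, ReactiveHealthIndicator |
| Aggregation | CompositeHealthIndicator + HealthAggregator |
| Built-in indicators | DiskSpace, DataSource, Redis, Consul, Nacos |
| Status propagation | Registry → load balancer → request routing |
| Custom extension | Implement the interfaces or extend the abstract classes |
| Kubernetes probes | Liveness + Readiness + Startup Probe |
| Probe groups | livenessState + readinessState, customizable |
Health checks are not only the basis of operations monitoring; they are also a prerequisite for advanced capabilities such as service mesh, elastic scaling, and failure eviction. Only with a solid understanding of how they work can a truly reliable service governance system be built.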