A Deep Dive into Spring Cloud Health Check Mechanisms

Author: 薪火铺子

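1. Background

In a microservice architecture, the health status of each service instance is the key input for traffic scheduling and fault eviction:

  • The registry needs to know which instances are available
  • The load balancer must avoid sending requests to failed instances
  • The operations platform needs to monitor service status in real time
  • Autoscaling decisions depend on the number of healthy instances

Spring Cloud, building on Spring Boot Actuator, provides a complete health check system that covers application health, service discovery health, and custom health checks.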

2. Overall Architecture of Health Checks

2.1 Component Relationships

[Diagram: component relationships in three groups. Actuator: Endpoint Handler, HealthAggregator, CompositeHealthIndicator, HealthIndicator. Registry: DiscoveryHealthIndicator, ServiceRegistryHealthIndicator. LB: LoadBalancerClient, DiscoveryClientServiceInstanceListSupplier.]

2.2 Health Check Endpoints

GET /actuator/health
GET /actuator/health/liveness
GET /actuator/health/readiness

Sample response:

{
  "status": "UP",
  "components": {
    "discovery": {
      "status": "UP",
      "details": {
        "clients": {
          "consul": {
            "status": "UP"
          }
        }
      }
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 500000000000,
        "free": 450000000000,
        "threshold": 10485760
      }
    },
    "ping": {
      "status": "UP"
    }
  }
}
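
Probes and load balancers generally act on the HTTP status code of this endpoint rather than on the JSON body. As a minimal, framework-free sketch (the class and method names are hypothetical), the mapping below mirrors Spring Boot's default behavior, in which DOWN and OUT_OF_SERVICE are served as 503 and every other status as 200:

```java
import java.util.Map;

// Hypothetical helper, not part of Spring Boot: models the default
// status-to-HTTP-code mapping of the health endpoint.
class HealthHttpStatusSketch {

    // Statuses that are served with 503 Service Unavailable
    private static final Map<String, Integer> UNAVAILABLE =
            Map.of("DOWN", 503, "OUT_OF_SERVICE", 503);

    // Any status that is not listed above is served with 200 OK
    static int httpStatusFor(String status) {
        return UNAVAILABLE.getOrDefault(status, 200);
    }

    public static void main(String[] args) {
        for (String status : new String[] {"UP", "UNKNOWN", "OUT_OF_SERVICE", "DOWN"}) {
            System.out.println(status + " -> " + httpStatusFor(status));
        }
    }
}
```

Note that UNKNOWN still yields 200 under this mapping, so a consumer that only looks at the status code will treat it as healthy.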

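3. Core Interfaces

3.1 The HealthIndicator Interface

public interface HealthIndicator {
    /**
     * Returns the health status.
     * @return the health information
     */
    Health health();
}

This is a functional interface that every blocking health indicator implements.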

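3.2 The Health Response Model

// Simplified view of Spring Boot's Health class (constructors and accessors omitted)
public final class Health {
    // Health status
    private final Status status;

    // Additional details
    private final Map<String, Object> details;

    // Instances are created through the builder
    public static Builder status(Status status) {
        return new Builder(status);
    }
}

// Status is a separate class with predefined constants, not an enum
public final class Status {
    public static final Status UNKNOWN = new Status("UNKNOWN");               // unknown
    public static final Status UP = new Status("UP");                         // healthy
    public static final Status DOWN = new Status("DOWN");                     // failed
    public static final Status OUT_OF_SERVICE = new Status("OUT_OF_SERVICE"); // taken out of service
}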

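3.3 The ReactiveHealthIndicator Interface

public interface ReactiveHealthIndicator {
    /**
     * Returns the health status asynchronously.
     * @return a Mono that emits the health information
     */
    Mono<Health> health();

    // Default method: strips the details when the caller is not allowed to see them
    default Mono<Health> getHealth(boolean includeDetails) {
        Mono<Health> health = health();
        return includeDetails ? health : health.map(Health::withoutDetails);
    }
}

Error handling is not part of the interface itself. Implementations usually add an onErrorResume fallback that returns Health.down(ex).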

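4. Aggregation

4.1 The Composite Pattern in CompositeHealthIndicator

[Class diagram: CompositeHealthIndicator implements the HealthIndicator interface (+health() : Health) and contains a HealthAggregator (-aggregator) and a Map<String, HealthIndicator> (-indicators). The HealthAggregator interface declares +aggregate(Map<String, Health>) : Health. An OrderedAggregators enumeration lists MACRO_ORDERED (macro ordering), LOWER_MURDEROUS (worst first), and FIRST_IS_CRITICAL (first takes priority).]

CompositeHealthIndicator and HealthAggregator are deprecated since Spring Boot 2.2, where CompositeHealthContributor and StatusAggregator take over the same roles. The OrderedAggregators names in the diagram are illustrative and do not correspond to Spring Boot types.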

4.2 Implementing the Aggregation Strategy

// Simplified illustration of ordered aggregation, not the Spring Boot source.
// Spring Boot's own default severity order is DOWN, OUT_OF_SERVICE, UP, UNKNOWN.
public class OrderedHealthAggregator implements HealthAggregator {

    @Override
    public Health aggregate(Map<String, Health> healths) {
        // Check DOWN, then OUT_OF_SERVICE, then UNKNOWN; the first match wins
        for (Status candidate : List.of(Status.DOWN, Status.OUT_OF_SERVICE, Status.UNKNOWN)) {
            for (Health health : healths.values()) {
                // Status is a class, so compare with equals rather than ==
                if (candidate.equals(health.getStatus())) {
                    return build(candidate, healths);
                }
            }
        }

        // Everything is UP
        return build(Status.UP, healths);
    }

    // Builds the aggregate result and attaches each indicator's Health as a detail
    private Health build(Status status, Map<String, Health> healths) {
        Health.Builder builder = Health.status(status);
        healths.forEach(builder::withDetail);
        return builder.build();
    }
}
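
The "most severe status wins" idea can also be tried without any Spring dependency. The following is a minimal sketch under stated assumptions: statuses are plain strings, the class name is hypothetical, and the severity order (DOWN, OUT_OF_SERVICE, UNKNOWN, UP) is chosen for illustration rather than taken from Spring Boot.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical, framework-free model of "most severe status wins" aggregation.
class StatusAggregationSketch {

    // Assumed severity order, most severe first; unlisted statuses rank last.
    static final List<String> ORDER = List.of("DOWN", "OUT_OF_SERVICE", "UNKNOWN", "UP");

    static int rank(String status) {
        int index = ORDER.indexOf(status);
        return index >= 0 ? index : ORDER.size();
    }

    // Returns the most severe status, or UNKNOWN when nothing was reported.
    static String aggregate(List<String> statuses) {
        return statuses.stream()
                .min(Comparator.comparingInt(StatusAggregationSketch::rank))
                .orElse("UNKNOWN");
    }

    public static void main(String[] args) {
        System.out.println(aggregate(List.of("UP", "UNKNOWN", "UP")));
        System.out.println(aggregate(List.of("UP", "DOWN", "OUT_OF_SERVICE")));
    }
}
```

Keeping the order in a single list means a different policy, for example ranking UP above UNKNOWN, only requires reordering that list.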

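4.3 Aggregation Flow

[Flowchart: start aggregation → iterate over the health indicators → branch on the result. Any DOWN: build the DOWN status and return DOWN. Any UNKNOWN: build the UNKNOWN status and return UNKNOWN. Any OUT_OF_SERVICE: build the OUT_OF_SERVICE status and return OUT_OF_SERVICE. All UP: build the UP status and return UP. Every branch adds the details to the result.]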

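5. Built-in Health Indicators

5.1 Overview of Built-in Indicators

Indicator                       Description
PingHealthIndicator             Always reports UP; confirms that the endpoint itself responds
DiskSpaceHealthIndicator        Free disk space against a threshold
DataSourceHealthIndicator       Database connection
RedisHealthIndicator            Redis connection
RabbitHealthIndicator           RabbitMQ connection
MongoHealthIndicator            MongoDB connection
ElasticsearchHealthIndicator    Elasticsearch cluster status
DiscoveryClientHealthIndicator  Service discovery status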

5.2 DiscoveryClientHealthIndicator

// Simplified sketch; the class shipped with Spring Cloud differs in detail
public class DiscoveryClientHealthIndicator
    implements HealthIndicator, Ordered {

    private final DiscoveryClient discoveryClient;

    public DiscoveryClientHealthIndicator(DiscoveryClient discoveryClient) {
        this.discoveryClient = discoveryClient;
    }

    @Override
    public Health health() {
        try {
            // Try to look up this application's own instances in the registry
            List<ServiceInstance> instances =
                discoveryClient.getInstances("sc-cloud-discovery");

            if (instances.isEmpty()) {
                return Health.unknown()
                    .withDetail("error", "no instances found")
                    .build();
            }

            return Health.up()
                .withDetail("description", discoveryClient.description())
                .withDetail("services", discoveryClient.getServices().size())
                .build();

        } catch (Exception e) {
            // Health.down(e) already records the exception as the "error" detail
            return Health.down(e).build();
        }
    }

    @Override
    public int getOrder() {
        return Ordered.HIGHEST_PRECEDENCE;
    }
}

5.3 ServiceRegistryHealthIndicator

// Illustrative sketch: reports whether this application holds a registration.
// The Registration is injected directly, because ServiceRegistry does not expose one.
public class ServiceRegistryHealthIndicator
    implements HealthIndicator, Ordered {

    private final Registration registration;

    public ServiceRegistryHealthIndicator(@Nullable Registration registration) {
        this.registration = registration;
    }

    @Override
    public Health health() {
        if (registration == null) {
            return Health.unknown()
                .withDetail("error", "No registration found")
                .build();
        }

        return Health.up()
            .withDetail("status", "REGISTERED")
            .withDetail("serviceId", registration.getServiceId())
            .withDetail("instanceId", registration.getInstanceId())
            .build();
    }

    @Override
    public int getOrder() {
        return Ordered.LOWEST_PRECEDENCE;
    }
}

6. Health Status Propagation

6.1 Registry Health Checks

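[Sequence diagram, participants: Service Consumer, LoadBalancer, Service Registry, Service Provider, Health Check]
  1. Service Consumer → LoadBalancer: request user-service
  2. LoadBalancer → Service Registry: fetch the available instances
  3. Service Registry → Health Check: query the health status
  4. Health Check → Service Registry: return UP/DOWN
  5. Service Registry → LoadBalancer: filtered instance list
  6. LoadBalancer → Service Provider: route the request
  7. Service Provider → Service Consumer: return the response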
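
The effect of this propagation on routing can be sketched without any framework. The example below is a minimal model under stated assumptions: the Instance type, the addresses, and the round-robin policy are all hypothetical and only illustrate that instances not reported as UP never receive traffic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a load balancer that only routes to healthy instances.
class HealthAwareBalancerSketch {

    // Minimal stand-in for a registry entry; "status" is the reported health.
    static final class Instance {
        final String address;
        final String status;

        Instance(String address, String status) {
            this.address = address;
            this.status = status;
        }
    }

    private final AtomicInteger position = new AtomicInteger();

    // Keeps only the instances whose status is UP.
    static List<Instance> healthyOnly(List<Instance> all) {
        List<Instance> healthy = new ArrayList<>();
        for (Instance instance : all) {
            if ("UP".equals(instance.status)) {
                healthy.add(instance);
            }
        }
        return healthy;
    }

    // Round-robin over the healthy instances; returns null when none is available.
    String choose(List<Instance> all) {
        List<Instance> healthy = healthyOnly(all);
        if (healthy.isEmpty()) {
            return null;
        }
        int index = Math.floorMod(position.getAndIncrement(), healthy.size());
        return healthy.get(index).address;
    }

    public static void main(String[] args) {
        List<Instance> instances = List.of(
                new Instance("10.0.0.1:8080", "UP"),
                new Instance("10.0.0.2:8080", "DOWN"),
                new Instance("10.0.0.3:8080", "UP"));
        HealthAwareBalancerSketch balancer = new HealthAwareBalancerSketch();
        for (int i = 0; i < 4; i++) {
            System.out.println(balancer.choose(instances));
        }
    }
}
```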

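6.2 Nacos Health Checks

// Simplified sketch of client-side registration, not the Spring Cloud Alibaba source.
// In Nacos 1.x the client then sends periodic heartbeats for ephemeral instances.
public class NacosAutoServiceRegistration
    extends AbstractAutoServiceRegistration<Registration> {

    @Override
    public void start() {
        // Describe this instance for the registry
        Instance instance = new Instance();
        instance.setIp(registration.getHost());
        instance.setPort(registration.getPort());
        instance.setMetadata(registration.getMetadata());
        try {
            // Register the instance with Nacos
            namingService.registerInstance(registration.getServiceId(), instance);
        } catch (NacosException e) {
            throw new IllegalStateException("Nacos registration failed", e);
        }
    }
}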

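[Diagram: on the Client side, a heartbeat thread sends heartbeats through NamingService. On the Server side, the Cluster Manager stores the instance status and Health Check inspects it.]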

6.3 Consul Health Checks

spring:
  cloud:
    consul:
      discovery:
        healthCheckPath: /actuator/health
        healthCheckInterval: 10s
        deregister: true
        fail-fast: true

7. Custom Health Indicators

7.1 Implementing HealthIndicator

@Component
public class CustomServiceHealthIndicator implements HealthIndicator {
    
    @Override
    public Health health() {
        try {
            // Custom check logic
            boolean healthy = checkCustomService();
            
            if (healthy) {
                return Health.up()
                    .withDetail("service", "custom-service")
                    .withDetail("version", "1.0")
                    .build();
            } else {
                return Health.down()
                    .withDetail("service", "custom-service")
                    .withDetail("reason", "service unavailable")
                    .build();
            }
        } catch (Exception e) {
            return Health.down(e)
                .withDetail("service", "custom-service")
                .build();
        }
    }
    
    private boolean checkCustomService() {
        // Implement the health check logic here
        return true;
    }
}

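7.2 Implementing ReactiveHealthIndicator

@Component
public class CustomReactiveHealthIndicator
    implements ReactiveHealthIndicator {

    private final WebClient webClient;

    // Assumes a WebClient bean that is configured with the target base URL
    public CustomReactiveHealthIndicator(WebClient webClient) {
        this.webClient = webClient;
    }

    @Override
    public Mono<Health> health() {
        return webClient.get()
            .uri("/health")
            .retrieve()
            .bodyToMono(JsonNode.class)
            .map(response -> Health.up()
                .withDetail("custom", "OK")
                .build())
            // Any error (timeout, 5xx, connection refused) is reported as DOWN
            .onErrorResume(ex ->
                Mono.just(Health.down(ex).build())
            );
    }
}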

7.3 A Health Indicator with Dependency Checks

@Component
public class DependencyHealthIndicator implements HealthIndicator {

    private final JdbcTemplate jdbcTemplate;
    private final RestTemplate restTemplate;

    public DependencyHealthIndicator(DataSource dataSource, RestTemplate restTemplate) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
        this.restTemplate = restTemplate;
    }

    @Override
    public Health health() {
        // Start from UP; any failing dependency switches the result to DOWN
        Health.Builder builder = Health.up();

        // Check the database
        try {
            jdbcTemplate.execute("SELECT 1");
            builder.withDetail("database", "OK");
        } catch (Exception e) {
            builder.down().withDetail("database", String.valueOf(e.getMessage()));
        }

        // Check the external service
        try {
            restTemplate.getForObject(
                "http://external-service/health",
                String.class
            );
            builder.withDetail("external", "OK");
        } catch (Exception e) {
            builder.down().withDetail("external", String.valueOf(e.getMessage()));
        }

        return builder.build();
    }
}
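
A slow dependency can stall an indicator like the one above, because the checks run one after another with no time limit. One possible mitigation, sketched here with the JDK only, is to run the checks concurrently and treat a timeout as a failure. The class name, the check names, and the timeout values are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: dependency checks with a per-check time limit.
class DependencyCheckSketch {

    // Runs every check asynchronously. A check that throws or exceeds
    // the timeout is recorded as failed (false).
    static Map<String, Boolean> runChecks(Map<String, Supplier<Boolean>> checks,
                                          long timeoutMillis) {
        Map<String, CompletableFuture<Boolean>> futures = new LinkedHashMap<>();
        checks.forEach((name, check) -> futures.put(name,
                CompletableFuture.supplyAsync(check)
                        .completeOnTimeout(false, timeoutMillis, TimeUnit.MILLISECONDS)
                        .exceptionally(ex -> false)));

        Map<String, Boolean> results = new LinkedHashMap<>();
        futures.forEach((name, future) -> results.put(name, future.join()));
        return results;
    }

    // UP only when every dependency check succeeded.
    static String overallStatus(Map<String, Boolean> results) {
        return results.values().stream().allMatch(Boolean::booleanValue) ? "UP" : "DOWN";
    }

    public static void main(String[] args) {
        Map<String, Supplier<Boolean>> checks = new LinkedHashMap<>();
        checks.put("database", () -> true);
        checks.put("external", () -> {
            throw new IllegalStateException("connection refused");
        });
        Map<String, Boolean> results = runChecks(checks, 500);
        System.out.println(results + " -> " + overallStatus(results));
    }
}
```

The total latency is then bounded by roughly one timeout instead of the sum of all checks, at the cost of using threads from the common pool.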

8. Hands-on Examples

8.1 Configuring the Health Endpoint

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics
      base-path: /actuator
  endpoint:
    health:
      show-details: always
      probes:
        enabled: true
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true

8.2 Custom Aggregation Strategy

// Note: HealthAggregator is deprecated since Spring Boot 2.2; StatusAggregator replaces it
@Configuration
public class CustomHealthAggregatorConfig {

    @Bean
    public HealthAggregator customHealthAggregator() {
        return new CustomHealthAggregator();
    }
}

public class CustomHealthAggregator implements HealthAggregator {

    @Override
    public Health aggregate(Map<String, Health> healths) {
        // Custom aggregation logic: average the per-indicator scores
        int totalHealth = 0;
        int count = 0;

        for (Map.Entry<String, Health> entry : healths.entrySet()) {
            Status status = entry.getValue().getStatus();
            totalHealth += getScore(status);
            count++;
        }

        double avgScore = count > 0 ? totalHealth / (double) count : 0;

        Status overallStatus = avgScore >= 80 ? Status.UP :
                               avgScore >= 50 ? Status.UNKNOWN : Status.DOWN;

        return Health.status(overallStatus)
            .withDetail("averageScore", avgScore)
            .withDetail("healths", healths)
            .build();
    }

    // Status is a class rather than an enum, so switch on its code
    private int getScore(Status status) {
        return switch (status.getCode()) {
            case "UP" -> 100;
            case "UNKNOWN" -> 50;
            case "OUT_OF_SERVICE" -> 20;
            default -> 0; // DOWN and any custom status
        };
    }
}
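
To see how the averaging behaves, the scoring can be replayed on plain strings. This is a minimal sketch with a hypothetical class name; the scores (UP 100, UNKNOWN 50, OUT_OF_SERVICE 20, DOWN 0) and the thresholds (80 and 50) are the ones used in the code above.

```java
import java.util.List;

// Hypothetical, framework-free replay of the score-averaging strategy.
class ScoreAggregationSketch {

    static int score(String status) {
        switch (status) {
            case "UP": return 100;
            case "UNKNOWN": return 50;
            case "OUT_OF_SERVICE": return 20;
            default: return 0; // DOWN and anything unrecognized
        }
    }

    // Average score over all indicators; 0 when there are none.
    static double averageScore(List<String> statuses) {
        return statuses.stream()
                .mapToInt(ScoreAggregationSketch::score)
                .average()
                .orElse(0);
    }

    static String overall(List<String> statuses) {
        double avg = averageScore(statuses);
        return avg >= 80 ? "UP" : avg >= 50 ? "UNKNOWN" : "DOWN";
    }

    public static void main(String[] args) {
        // (100 + 100 + 0) / 3 = 66.7, so the result is UNKNOWN
        System.out.println(overall(List.of("UP", "UP", "DOWN")));
        // (4 * 100 + 0) / 5 = 80, so the result is UP despite one DOWN
        System.out.println(overall(List.of("UP", "UP", "UP", "UP", "DOWN")));
    }
}
```

The second case shows the trade-off of averaging: with enough healthy indicators, a single DOWN dependency no longer affects the overall status, which may or may not be acceptable depending on how critical that dependency is.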

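8.3 Kubernetes Probes Integration

# application.yml
spring:
  cloud:
    kubernetes:
      discovery:
        health-endpoint: /actuator/health/liveness

management:
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true

// Updating the Liveness and Readiness states from application code
@Component
public class KubernetesAvailabilityPublisher {

    private final ApplicationEventPublisher publisher;

    public KubernetesAvailabilityPublisher(ApplicationEventPublisher publisher) {
        this.publisher = publisher;
    }

    // Reflected by /actuator/health/liveness
    public void setLive(boolean live) {
        AvailabilityChangeEvent.publish(publisher, this,
            live ? LivenessState.CORRECT : LivenessState.BROKEN);
    }

    // Reflected by /actuator/health/readiness
    public void setReady(boolean ready) {
        AvailabilityChangeEvent.publish(publisher, this,
            ready ? ReadinessState.ACCEPTING_TRAFFIC : ReadinessState.REFUSING_TRAFFIC);
    }
}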

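9. Pitfalls

Pitfall 1: Health checks degrade application performance

Problem: frequent health checks cause a drop in performance

Solution

# Adjust the check interval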
spring:
  cloud:
    consul:
      discovery:
        healthCheckInterval: 30s  # Lengthen the check interval
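
Besides lengthening the interval, an expensive check can be shielded by caching its last result for a short time. The following is a minimal JDK-only sketch of that idea. The class name, the TTL value, and the injected clock are assumptions made for illustration and are not part of Spring Boot or Consul.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: caches the result of an expensive health check for a fixed TTL.
class CachedHealthCheckSketch {

    private final Supplier<String> delegate;
    private final long ttlMillis;
    private final LongSupplier clock;

    private String cachedStatus;
    private long cachedAt;

    CachedHealthCheckSketch(Supplier<String> delegate, long ttlMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Runs the real check only when no result is cached or the cached one has expired.
    synchronized String status() {
        long now = clock.getAsLong();
        if (cachedStatus == null || now - cachedAt >= ttlMillis) {
            cachedStatus = delegate.get();
            cachedAt = now;
        }
        return cachedStatus;
    }

    public static void main(String[] args) {
        CachedHealthCheckSketch check = new CachedHealthCheckSketch(
                () -> {
                    System.out.println("running expensive check");
                    return "UP";
                },
                5_000,
                System::currentTimeMillis);
        System.out.println(check.status()); // runs the delegate
        System.out.println(check.status()); // served from the cache
    }
}
```

The TTL should stay below the registry's check interval, otherwise the registry may keep seeing a stale result after the real status has changed.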

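Pitfall 2: Registry heartbeat timeouts cause instances to be evicted

Problem: network jitter causes instances to be removed by mistake

Solution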

spring:
  cloud:
    nacos:
      discovery:
        heartBeatInterval: 5000
        heartBeatTimeout: 15000
        ipDeleteTimeout: 30000
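
How the three values interact can be illustrated with a small timeline model. This is a sketch under a stated assumption: an instance is marked unhealthy once the silence since its last heartbeat exceeds heartBeatTimeout, and it is removed once the silence exceeds ipDeleteTimeout. The class and method names are hypothetical.

```java
// Hypothetical model of how a registry could classify an instance
// based on the time elapsed since its last heartbeat.
class HeartbeatTimelineSketch {

    static String stateAfterSilence(long silentMillis,
                                    long heartBeatTimeout,
                                    long ipDeleteTimeout) {
        if (silentMillis > ipDeleteTimeout) {
            return "REMOVED";
        }
        if (silentMillis > heartBeatTimeout) {
            return "UNHEALTHY";
        }
        return "HEALTHY";
    }

    public static void main(String[] args) {
        long heartBeatTimeout = 15_000;
        long ipDeleteTimeout = 30_000;
        // With a 5 s heartbeat interval, 10 s of silence means two missed heartbeats
        for (long silent : new long[] {10_000, 16_000, 31_000}) {
            System.out.println(silent + " ms -> "
                    + stateAfterSilence(silent, heartBeatTimeout, ipDeleteTimeout));
        }
    }
}
```

With the values above, two consecutive missed heartbeats are tolerated, so short network jitter does not change the instance state.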

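Pitfall 3: Health indicator is not refreshed under @RefreshScope

Problem: the health indicator keeps its old state after a configuration update

Cause: a HealthIndicator is a singleton by default, so values injected at startup are never re-read

Solution

@RefreshScope
@Component
public class RefreshableHealthIndicator implements HealthIndicator {
    // The bean is recreated on first use after a refresh event and picks up the new configuration
    @Override
    public Health health() {
        return Health.up().build();
    }
}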

Pitfall 4: Health check conflicts with multiple registries

Problem: using Nacos and Consul at the same time causes conflicts

Solution

spring:
  cloud:
    nacos:
      discovery:
        enabled: true
    consul:
      discovery:
        enabled: false  # Disable Consul

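Pitfall 5: Custom HealthIndicator is not registered

Problem: the custom health indicator has no effect

Cause: the bean is not picked up by component scanning, or it does not implement the correct interface

Solution

// Make sure @Component is present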
@Component
public class MyHealthIndicator implements HealthIndicator {
    @Override
    public Health health() {
        return Health.up().build();
    }
}

// Or register it explicitly (in that case, omit @Component on the class)
@Configuration
public class HealthConfig {
    @Bean
    public HealthIndicator myHealthIndicator() {
        return new MyHealthIndicator();
    }
}

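10. Kubernetes Probe Integration

10.1 The Three Probe Types

Probe           Purpose                   On failure
LivenessProbe   Is the container alive    The container is restarted
ReadinessProbe  Is the container ready    Traffic is no longer routed to the Pod
StartupProbe    Has startup completed     Other probes stay disabled until it succeeds; the container is restarted if it never does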

10.2 Spring Boot Actuator Probe Configuration

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
      base-path: /actuator
  endpoint:
    health:
      # Show detailed health information
      show-details: always
      # Enable the Kubernetes probes
      probes:
        enabled: true
      # Custom probe groups
      group:
        liveness:
          include: livenessState,db,redis
        readiness:
          include: readinessState,db,redis,consul
  health:
    # Liveness probe configuration
    livenessstate:
      enabled: true
    # Readiness probe configuration
    readinessstate:
      enabled: true

10.3 Kubernetes Pod Configuration

apiVersion: v1
kind: Pod
metadata:
  name: user-service
spec:
  containers:
    - name: user-service
      image: user-service:latest
      ports:
        - containerPort: 8080
      livenessProbe:
        httpGet:
          path: /actuator/health/liveness
          port: 8080
        initialDelaySeconds: 60
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /actuator/health/readiness
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 5
        failureThreshold: 3
      startupProbe:
        httpGet:
          path: /actuator/health
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
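
The probe settings above translate into time budgets that are worth computing before deployment. As a rough sketch (the class name is hypothetical, and the result ignores timeoutSeconds and scheduling jitter), the window before Kubernetes acts on consecutive failures is approximately periodSeconds multiplied by failureThreshold:

```java
// Hypothetical helper: approximates how long a probe may keep failing
// before Kubernetes acts on it.
class ProbeBudgetSketch {

    // periodSeconds * failureThreshold, ignoring timeoutSeconds and jitter
    static int failureWindowSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }

    public static void main(String[] args) {
        // Values taken from the Pod configuration above
        System.out.println("startup budget:   " + failureWindowSeconds(10, 30) + " s");
        System.out.println("liveness window:  " + failureWindowSeconds(10, 3) + " s");
        System.out.println("readiness window: " + failureWindowSeconds(5, 3) + " s");
    }
}
```

With these values the application has about 300 seconds to start, a hung container is restarted after about 30 seconds of failed liveness checks, and a Pod stops receiving traffic after about 15 seconds of failed readiness checks.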

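10.4 Probe Execution Flow

[Sequence diagram, participants: Kubernetes, Spring Actuator, Application]
  1. The Pod starts and Kubernetes runs the StartupProbe check against Spring Actuator
  2. Spring Actuator obtains the application's health status and returns UP/DOWN; Kubernetes keeps checking
  3. After startup has completed, Kubernetes runs the LivenessProbe and ReadinessProbe checks
  4. If Liveness fails, the Pod is restarted
  5. If Readiness fails, the Pod is removed from the Service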

10.5 Custom Probe Groups

@Configuration
public class HealthGroupConfig {
    
    @Bean
    public HealthIndicator customLivenessIndicator() {
        return () -> {
            // Custom Liveness check logic
            boolean healthy = checkCriticalResources();
            return healthy ? 
                Health.up().build() : 
                Health.down().build();
        };
    }
    
    @Bean
    public HealthIndicator customReadinessIndicator() {
        return () -> {
            // Custom Readiness check logic
            boolean ready = checkDependencies();
            return ready ? 
                Health.up().build() : 
                Health.down().build();
        };
    }
    
    private boolean checkCriticalResources() {
        // Check critical resources
        return true;
    }
    
    private boolean checkDependencies() {
        // Check dependent services
        return true;
    }
}

10.6 Nacos Health Check Integration

spring:
  cloud:
    nacos:
      discovery:
        # Enable health checks
        health-check:
          enabled: true
          # Health check path
          path: /actuator/health
          # Check interval (milliseconds)
          interval: 5000
          # Failure count threshold
          failure-threshold: 3

10.7 Linking the Registry with Kubernetes Probes

[Diagram: two groups. K8s: Kubernetes Probes, Kubernetes Service. Spring Cloud: Actuator Health, Service Registry. Edge labels: Liveness, Readiness, instance list, read-only traffic.]

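11. Summary

The Spring Cloud health check mechanism is a key component for keeping services available:

Dimension            Content
Core interfaces      HealthIndicator, ReactiveHealthIndicator
Aggregation          CompositeHealthIndicator + HealthAggregator
Built-in indicators  DiskSpace, DataSource, Redis, Consul, Nacos
State propagation    Registry → load balancer → request routing
Custom extension     Implement the interface or extend the abstract class
K8s probes           Liveness + Readiness + Startup Probe
Probe groups         livenessState + readinessState, customizable

Health checks are not only the foundation of operational monitoring but also a prerequisite for advanced capabilities such as service mesh, elastic scaling, and fault eviction. Only with a solid understanding of how they work can you build a truly reliable service governance system.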
