From High Costs to Millisecond Responses: Rebuilding an Unmanned Retail Terminal System with Spring Cloud Alibaba and YOLO

Java程序员威哥

Published 2026-03-23 21:35:35

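Last month I went down to the convenience store downstairs to buy water and found the owner, Brother Li, cursing at the admin dashboard of his unmanned shelves.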

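"This junk system takes 3 seconds per recognition at peak. Customers pay and the order doesn't come through for ages; just now three more customers grabbed their stuff and walked straight out! And this cloud server costs over 5,000 yuan a month, more than hiring a shop assistant!"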

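I leaned over to look at the architecture diagram in the back office and couldn't help laughing. The terminals were Android devices, and all the recognition logic lived in a Python Flask service in the cloud, called over HTTP RPC. At peak the Flask service was saturated and timeouts triggered piles of retries. No wonder it lagged.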

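"Brother Li, let me rebuild this system for you. I'll switch it to Java microservices and run YOLO directly in Java. I guarantee each recognition finishes within 200 ms and server costs drop to 1,500 yuan a month."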

Brother Li was skeptical: "Really? Don't go saddling me with a pile of new problems."

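Two weeks later the rebuilt system went live. With 20 unmanned shelves running at the same time, average recognition response time was 180 ms with zero timeouts at peak. The cloud server went from 4 cores/8 GB down to 2 cores/4 GB, and the monthly cost fell to 1,200 yuan. Brother Li even brought me a carton of cigarettes as thanks.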

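Today I'm sharing the whole rebuild of this unmanned retail terminal system: the architecture design, the core code, and a log of the pitfalls I hit. All the code is the stable version that has run in production.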

1. Three fatal flaws in the old system that forced the rebuild

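The old system was a typical "small workshop" architecture. The Android terminal photographed the goods and sent the image over HTTP to a Python Flask service in the cloud. Flask called YOLOv5 for recognition and returned product IDs, and the terminal then called the order service to place the order. This held up with 3 shelves, but once it scaled to 20 it collapsed.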

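The first flaw was slow response. Every HTTP RPC round trip adds network latency, and Python's GIL kept the Flask service's concurrency low. With 20 shelves sending requests at once, the Flask queue grew past 100. A recognition took 3 seconds on average, and at peak there was still no result after 5 seconds. Customers wouldn't wait and simply left, and Brother Li was writing off several hundred yuan of stock every day.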

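The second flaw was cost. To handle the load from 20 shelves, he rented a 4-core/8 GB cloud server for 5,200 yuan a month. Adding a GPU instance for the Python inference service brought the monthly total close to 7,000. Brother Li said that was more than two full-time shop assistants, and at this rate his unmanned shelves would turn into staffed shelves.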

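The third flaw was poor stability. The Flask service was a monolith, so restarting it took every shelf offline. The Python environment had so many dependencies that after the last server restart, Lao Wang from ops spent three days configuring it before the service ran again. There was also no rate limiting or circuit breaking, so a burst of malicious requests at peak could knock the service over.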

2. New architecture: Spring Cloud Alibaba + Java YOLO, decoupled, cheaper, faster

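The core idea of the rebuild was to decouple, cut costs, and improve efficiency:

Spring Cloud Alibaba's microservice architecture splits the modules apart.
The YOLO detection service is deployed on its own and runs on Java ONNX Runtime. It needs no GPU, because the CPU can handle the load.
RocketMQ is the message queue that decouples the terminals from the detection service.
Nacos provides service discovery.
Sentinel provides rate limiting and circuit breaking.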

Core modules of the new architecture:

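Terminal access layer: Spring Cloud Gateway. It handles authentication, routing, and rate limiting for terminal devices, and forwards terminal requests to the right microservices.
Product management service: CRUD for product information and the mapping between product IDs and names. Nacos is the configuration center, so the product list can be updated dynamically.
YOLO detection service: the core compute service, deployed on its own. It runs YOLO11n on Java ONNX Runtime, consumes image messages from RocketMQ, and publishes the recognized product IDs back to the queue. It scales horizontally: add instances as you add shelves.
Order service: consumes detection-result messages, creates orders, and integrates with the payment system. Seata provides distributed transactions to keep orders and inventory consistent.
Nacos: service discovery and configuration center. All microservices register with Nacos and can come online or go offline dynamically. Configuration such as the product list and detection thresholds lives in Nacos and can be updated without restarting services.
RocketMQ: the message queue. It decouples the terminals from the detection service and smooths out traffic peaks: at peak, image messages are buffered in the queue and the detection service consumes them at its own pace, so it doesn't get overwhelmed.
Sentinel: rate limiting and circuit breaking. Rate-limit rules are configured for the Gateway and for every microservice, and requests above the threshold at peak are rejected outright to protect the backend services.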

3. Production-grade implementation of the core modules

3.1 YOLO detection service: singleton + resource release to avoid memory leaks

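The YOLO detection service is the heart of the system, and its most common failure is a memory leak. We got burned by this before: ONNX resources weren't released, and after a day of running, the service's memory grew from 1 GB to 4 GB.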

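So we rely on Spring's default singleton bean scope. The Session is created only once for the whole application lifetime, and every per-inference ONNX resource is wrapped in try-with-resources so it is released automatically.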

Core code:

import ai.onnxruntime.*;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import java.awt.*;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.nio.FloatBuffer;
import java.util.*;
import java.util.List;
import java.util.stream.Collectors;

@Component
public class YoloProductDetector {
    private OrtEnvironment env;
    private OrtSession session;
    private String inputName;

    // Note: @Value fields in this (non-@RefreshScope) bean are read once at startup;
    // to make thresholds refreshable too, expose them through a @RefreshScope bean
    @Value("${yolo.model-path}")
    private String modelPath;
    @Value("${yolo.input-width}")
    private int inputWidth;
    @Value("${yolo.input-height}")
    private int inputHeight;
    @Value("${yolo.conf-threshold}")
    private float confThreshold;
    @Value("${yolo.nms-threshold}")
    private float nmsThreshold;

    // Product class names come from ProductConfig (@RefreshScope, backed by Nacos), so a
    // config change is picked up without a restart. The list must follow the class order
    // the model was trained with.
    @Autowired
    private ProductConfig productConfig;

    @PostConstruct
    public void init() throws OrtException {
        env = OrtEnvironment.getEnvironment();
        OrtSession.SessionOptions options = new OrtSession.SessionOptions();
        options.setIntraOpNumThreads(Runtime.getRuntime().availableProcessors());
        options.setInterOpNumThreads(2);
        options.setOptimizationLevel(OrtSession.SessionOptions.OptLevel.ALL_OPT);
        // Created once for the whole application lifetime; OrtSession.run is thread-safe,
        // so concurrent consumer threads can share this session
        session = env.createSession(modelPath, options);
        inputName = session.getInputNames().iterator().next();
    }

    @PreDestroy
    public void destroy() throws OrtException {
        session.close();
        env.close();
    }

    public List<ProductDetectionResult> detect(BufferedImage image) throws Exception {
        float[] scalePad = new float[3];
        BufferedImage processed = letterbox(image, scalePad);
        float scale = scalePad[0];
        int padW = (int) scalePad[1];
        int padH = (int) scalePad[2];

        // The tensor and the result both hold off-heap memory: try-with-resources frees them
        try (OnnxTensor input = createInputTensor(processed);
             OrtSession.Result result = session.run(Collections.singletonMap(inputName, input))) {
            // YOLOv8/YOLO11 detection output has shape [1, 4 + numClasses, numAnchors].
            // Result.get(int) returns the OnnxValue directly (get(String) returns an Optional)
            float[][][] output = (float[][][]) result.get(0).getValue();
            return postProcess(output[0], image.getWidth(), image.getHeight(), scale, padW, padH);
        }
    }

    private BufferedImage letterbox(BufferedImage image, float[] scalePad) {
        int ow = image.getWidth(), oh = image.getHeight();
        float scale = Math.min((float) inputWidth / ow, (float) inputHeight / oh);
        int nw = Math.round(ow * scale), nh = Math.round(oh * scale);
        int padW = (inputWidth - nw) / 2, padH = (inputHeight - nh) / 2;

        BufferedImage scaled = new BufferedImage(nw, nh, BufferedImage.TYPE_3BYTE_BGR);
        Graphics2D g = scaled.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(image, 0, 0, nw, nh, null);
        g.dispose();

        // Pad to the model input size with grey (114, 114, 114), as in Ultralytics preprocessing
        BufferedImage letterboxed = new BufferedImage(inputWidth, inputHeight, BufferedImage.TYPE_3BYTE_BGR);
        g = letterboxed.createGraphics();
        g.setColor(new Color(114, 114, 114));
        g.fillRect(0, 0, inputWidth, inputHeight);
        g.drawImage(scaled, padW, padH, null);
        g.dispose();

        scalePad[0] = scale;
        scalePad[1] = padW;
        scalePad[2] = padH;
        return letterboxed;
    }

    private OnnxTensor createInputTensor(BufferedImage image) throws OrtException {
        // TYPE_3BYTE_BGR stores each pixel as B, G, R; the model expects RGB planes (NCHW)
        byte[] pixels = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
        int channelSize = inputWidth * inputHeight;
        float[] data = new float[3 * channelSize];

        for (int i = 0; i < channelSize; i++) {
            data[i] = (pixels[i * 3 + 2] & 0xFF) / 255.0f;                 // R
            data[i + channelSize] = (pixels[i * 3 + 1] & 0xFF) / 255.0f;   // G
            data[i + 2 * channelSize] = (pixels[i * 3] & 0xFF) / 255.0f;   // B
        }

        long[] shape = {1, 3, inputHeight, inputWidth};
        return OnnxTensor.createTensor(env, FloatBuffer.wrap(data), shape);
    }

    private List<ProductDetectionResult> postProcess(float[][] output, int ow, int oh, float scale, int padW, int padH) {
        List<ProductDetectionResult> results = new ArrayList<>();
        String[] classNames = productConfig.getClassNames();
        // output is [4 + numClasses][numAnchors]: rows 0-3 are cx, cy, w, h in model-input
        // pixels, the remaining rows hold one score per class for every anchor
        int numClasses = output.length - 4;
        int numAnchors = output[0].length; // 8400 for a 640x640 input
        if (numClasses != classNames.length) {
            throw new IllegalStateException("Model outputs " + numClasses
                    + " classes but product.class-names lists " + classNames.length);
        }

        for (int i = 0; i < numAnchors; i++) {
            float maxConf = 0;
            int classId = -1;
            for (int j = 0; j < numClasses; j++) {
                float conf = output[4 + j][i];
                if (conf > maxConf) {
                    maxConf = conf;
                    classId = j;
                }
            }
            if (classId < 0 || maxConf < confThreshold) continue;

            float cx = output[0][i], cy = output[1][i];
            float w = output[2][i], h = output[3][i];
            float x1 = cx - w / 2, y1 = cy - h / 2;
            float x2 = cx + w / 2, y2 = cy + h / 2;

            // Undo letterbox padding, then scaling, to get original-image coordinates
            x1 = (x1 - padW) / scale;
            y1 = (y1 - padH) / scale;
            x2 = (x2 - padW) / scale;
            y2 = (y2 - padH) / scale;

            x1 = Math.max(0, Math.min(x1, ow - 1));
            y1 = Math.max(0, Math.min(y1, oh - 1));
            x2 = Math.max(0, Math.min(x2, ow - 1));
            y2 = Math.max(0, Math.min(y2, oh - 1));

            results.add(new ProductDetectionResult(x1, y1, x2, y2, maxConf, classId, classNames[classId]));
        }

        return nms(results);
    }

    private List<ProductDetectionResult> nms(List<ProductDetectionResult> detections) {
        List<ProductDetectionResult> finalResults = new ArrayList<>();
        Map<Integer, List<ProductDetectionResult>> groupByClass = detections.stream()
                .collect(Collectors.groupingBy(ProductDetectionResult::getClassId));

        for (List<ProductDetectionResult> classDetections : groupByClass.values()) {
            classDetections.sort((a, b) -> Float.compare(b.getConfidence(), a.getConfidence()));
            boolean[] suppressed = new boolean[classDetections.size()];

            for (int i = 0; i < classDetections.size(); i++) {
                if (suppressed[i]) continue;
                ProductDetectionResult maxBox = classDetections.get(i);
                finalResults.add(maxBox);

                for (int j = i + 1; j < classDetections.size(); j++) {
                    if (suppressed[j]) continue;
                    float iou = calculateIoU(maxBox, classDetections.get(j));
                    if (iou > nmsThreshold) suppressed[j] = true;
                }
            }
        }
        return finalResults;
    }

    private float calculateIoU(ProductDetectionResult a, ProductDetectionResult b) {
        float areaA = (a.getX2() - a.getX1()) * (a.getY2() - a.getY1());
        float areaB = (b.getX2() - b.getX1()) * (b.getY2() - b.getY1());
        if (areaA <= 0 || areaB <= 0) return 0;

        float interX1 = Math.max(a.getX1(), b.getX1());
        float interY1 = Math.max(a.getY1(), b.getY1());
        float interX2 = Math.min(a.getX2(), b.getX2());
        float interY2 = Math.min(a.getY2(), b.getY2());

        float interW = Math.max(0, interX2 - interX1);
        float interH = Math.max(0, interY2 - interY1);
        float interArea = interW * interH;

        return interArea / (areaA + areaB - interArea);
    }
}
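Two steps in the code above are easy to get wrong: the letterbox scale/padding, and reading the detection output. Ultralytics YOLOv8/YOLO11 ONNX exports produce that output as [1, 4 + numClasses, numAnchors], i.e. attribute-major. The JDK-only sketch below works through both with toy numbers, so no ONNX model is needed. The class name YoloDecodeSketch and the numbers are hypothetical, and the formulas mirror letterbox() and postProcess().

```java
import java.util.Arrays;

/** Hypothetical helper illustrating letterbox math and YOLOv8/YOLO11-style output decoding. */
final class YoloDecodeSketch {

    /** Letterbox parameters: uniform scale plus left/top padding. */
    static final class Letterbox {
        final float scale;
        final int padW;
        final int padH;

        Letterbox(float scale, int padW, int padH) {
            this.scale = scale;
            this.padW = padW;
            this.padH = padH;
        }
    }

    static Letterbox letterbox(int ow, int oh, int inW, int inH) {
        float scale = Math.min((float) inW / ow, (float) inH / oh);
        int nw = Math.round(ow * scale), nh = Math.round(oh * scale);
        return new Letterbox(scale, (inW - nw) / 2, (inH - nh) / 2);
    }

    /**
     * Decodes anchor i from output[attr][anchor]: rows 0-3 hold cx, cy, w, h in model-input
     * pixels, rows 4.. hold one score per class. Maps the box back to original-image pixels
     * and returns {x1, y1, x2, y2, score, classId}.
     */
    static float[] decodeAnchor(float[][] output, int i, Letterbox lb, int ow, int oh) {
        int numClasses = output.length - 4;
        int classId = 0;
        for (int j = 1; j < numClasses; j++) {
            if (output[4 + j][i] > output[4 + classId][i]) classId = j;
        }
        float cx = output[0][i], cy = output[1][i], w = output[2][i], h = output[3][i];
        // Undo the padding first, then the scaling, and clamp to the original image
        float x1 = clamp((cx - w / 2 - lb.padW) / lb.scale, ow - 1);
        float y1 = clamp((cy - h / 2 - lb.padH) / lb.scale, oh - 1);
        float x2 = clamp((cx + w / 2 - lb.padW) / lb.scale, ow - 1);
        float y2 = clamp((cy + h / 2 - lb.padH) / lb.scale, oh - 1);
        return new float[] {x1, y1, x2, y2, output[4 + classId][i], classId};
    }

    private static float clamp(float v, float max) {
        return Math.max(0, Math.min(v, max));
    }

    public static void main(String[] args) {
        // A 1280x720 frame letterboxed into 640x640: scale 0.5, 140 px grey bars top and bottom
        Letterbox lb = letterbox(1280, 720, 640, 640);
        System.out.println(lb.scale + " " + lb.padW + " " + lb.padH); // 0.5 0 140

        float[][] out = new float[6][3];    // 4 box rows + 2 class rows, 3 anchors
        out[0][1] = 320; out[1][1] = 320;   // anchor 1 sits at the centre of the input
        out[2][1] = 100; out[3][1] = 50;    // a 100x50 box
        out[4][1] = 0.1f; out[5][1] = 0.9f; // class 1 has the highest score
        System.out.println(Arrays.toString(decodeAnchor(out, 1, lb, 1280, 720)));
        // [540.0, 310.0, 740.0, 410.0, 0.9, 1.0]
    }
}
```

The values for one anchor are read down a column (output[attr][i]), not along a row. Flattening the tensor and stepping by i * (4 + numClasses) would silently produce garbage boxes.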

3.2 RocketMQ decoupling: smoothing peaks so the service isn't overwhelmed

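The old system used HTTP RPC, so when terminals fired requests concurrently at peak, the Flask service was instantly saturated. In the new system, images from the terminals go into RocketMQ first, and the YOLO detection service consumes them at its own pace. This smooths out peaks. It also decouples the terminals from the detection service, so restarting the detection service doesn't stop terminals from capturing images.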

Image message producer (terminal access layer):

import org.apache.rocketmq.spring.core.RocketMQTemplate;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.util.UUID;

@Component
public class ProductImageProducer {
    @Autowired
    private RocketMQTemplate rocketMQTemplate;

    public void sendImageMessage(String deviceId, BufferedImage image) throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageIO.write(image, "jpg", baos);
        byte[] imageBytes = baos.toByteArray();

        ProductImageMessage message = new ProductImageMessage();
        message.setDeviceId(deviceId);
        message.setMessageId(UUID.randomUUID().toString());
        message.setTimestamp(System.currentTimeMillis());
        message.setImageBytes(imageBytes);

        rocketMQTemplate.syncSend("product-images", message);
    }
}

Image message consumer (YOLO detection service):

import org.apache.rocketmq.spring.annotation.RocketMQMessageListener;
import org.apache.rocketmq.spring.core.RocketMQListener;
import org.apache.rocketmq.spring.core.RocketMQTemplate;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.util.List;

@Component
@RocketMQMessageListener(
    topic = "product-images",
    consumerGroup = "yolo-consumer-group",
    consumeMode = org.apache.rocketmq.spring.annotation.ConsumeMode.CONCURRENTLY,
    messageModel = org.apache.rocketmq.spring.annotation.MessageModel.CLUSTERING,
    maxReconsumeTimes = 2
)
public class ProductImageConsumer implements RocketMQListener<ProductImageMessage> {
    private static final Logger log = LoggerFactory.getLogger(ProductImageConsumer.class);

    @Autowired
    private YoloProductDetector detector;
    @Autowired
    private RocketMQTemplate rocketMQTemplate;

    @Override
    public void onMessage(ProductImageMessage message) {
        try {
            ByteArrayInputStream bais = new ByteArrayInputStream(message.getImageBytes());
            BufferedImage image = ImageIO.read(bais);
            if (image == null) {
                // Undecodable image: retrying won't help, so log it and let the message be acked
                log.warn("Unreadable image, deviceId={}, messageId={}", message.getDeviceId(), message.getMessageId());
                return;
            }

            List<ProductDetectionResult> results = detector.detect(image);

            ProductDetectionResultMessage resultMessage = new ProductDetectionResultMessage();
            resultMessage.setDeviceId(message.getDeviceId());
            resultMessage.setMessageId(message.getMessageId());
            resultMessage.setTimestamp(message.getTimestamp());
            resultMessage.setResults(results);

            rocketMQTemplate.syncSend("product-detection-results", resultMessage);
        } catch (Exception e) {
            log.error("Detection failed, messageId={}", message.getMessageId(), e);
            // Rethrowing makes RocketMQ redeliver (up to maxReconsumeTimes, then to the DLQ)
            throw new RuntimeException("Detection failed", e);
        }
    }
}
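The peak-smoothing effect can be seen in miniature with the JDK alone. In this hypothetical sketch, a LinkedBlockingQueue stands in for the product-images topic. This is an illustration only: RocketMQ persists messages on the broker rather than in process memory. A burst of "terminal" messages is enqueued all at once while a single "detector" thread drains them at its own pace, so no request is rejected and the backlog absorbs the peak.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-process stand-in for MQ peak smoothing: producers burst, one consumer drains. */
final class PeakShavingSketch {
    private static final String POISON = "__stop__";

    /** Enqueues `burst` messages at once and returns {processed, peakBacklogSeen}. */
    static int[] run(int burst) throws InterruptedException {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>(); // unbounded, like a broker backlog
        AtomicInteger processed = new AtomicInteger();

        Thread detector = new Thread(() -> {
            try {
                while (true) {
                    String msg = topic.take();
                    if (POISON.equals(msg)) break;
                    Thread.sleep(1);              // simulated per-image inference time
                    processed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        detector.start();

        int peakBacklog = 0;
        for (int i = 0; i < burst; i++) {        // the peak: all terminals send at once
            topic.put("image-" + i);
            peakBacklog = Math.max(peakBacklog, topic.size());
        }
        topic.put(POISON);                       // tell the detector to stop once drained
        detector.join();
        return new int[] {processed.get(), peakBacklog};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run(200);
        System.out.println("processed=" + r[0] + ", peak backlog=" + r[1]);
    }
}
```

Every message is eventually processed. What the burst costs is waiting time in the queue, which shows up as higher latency whenever consumers can't keep up.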

3.3 Nacos configuration center: updating the product list dynamically, no restart needed

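Unmanned shelves get new products all the time. In the old system, every new product meant retraining the model and restarting the Python service, which was a hassle. The new system uses Nacos as the configuration center. Product classes, detection thresholds, and similar settings all live in Nacos, and after you change them there the YOLO detection service refreshes automatically without a restart. Keep in mind that the class-name list only labels the model's outputs. It must match, in order, the classes the deployed model was trained on, so a product the model has never seen still needs a model that includes it.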

Nacos configuration example:

yolo:
  conf-threshold: 0.4
  nms-threshold: 0.45
product:
  class-names: mineral-water,cola,chips,biscuits,bread,milk

Dynamic config refresh in Spring Boot, using @RefreshScope:

import org.springframework.beans.factory.annotation.Value;
import org.springframework.cloud.context.config.annotation.RefreshScope;
import org.springframework.stereotype.Component;

@Component
@RefreshScope
public class ProductConfig {
    @Value("${product.class-names}")
    private String classNamesStr;

    public String[] getClassNames() {
        return classNamesStr.split(",");
    }
}
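The point of @RefreshScope here is that readers always see the latest value without a restart. Below is a plain-Java approximation, assuming that some listener (for example a Nacos config listener, not shown) calls refresh() with the new raw string:

- It parses the list once per change and publishes the result atomically.
- It trims whitespace from each name.
- It checks the count against the model's class count, because the labels must line up one-to-one with the model's output rows.

RefreshableClassNames is a hypothetical name.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder mimicking a refreshable product.class-names property. */
final class RefreshableClassNames {
    private final int modelClassCount;
    private final AtomicReference<String[]> current = new AtomicReference<>(new String[0]);

    RefreshableClassNames(int modelClassCount, String initialRaw) {
        this.modelClassCount = modelClassCount;
        refresh(initialRaw);
    }

    /** Parses "a, b,c" into trimmed labels and swaps them in atomically. */
    void refresh(String raw) {
        String[] parsed = Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
        if (parsed.length != modelClassCount) {
            // Reject the change and keep the old labels: a mismatch would mislabel every detection
            throw new IllegalArgumentException("expected " + modelClassCount
                    + " class names, got " + parsed.length);
        }
        current.set(parsed);
    }

    String label(int classId) {
        return current.get()[classId];
    }

    public static void main(String[] args) {
        RefreshableClassNames names = new RefreshableClassNames(3, "water,cola,chips");
        names.refresh("mineral-water, cola, potato-chips"); // relabel after a config change
        System.out.println(names.label(0) + " / " + names.label(2)); // mineral-water / potato-chips
    }
}
```

Rejecting a mismatched list while keeping the previous labels is the safer failure mode for a detector that is serving live traffic.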

3.4 Sentinel rate limiting and circuit breaking: protecting the backend from overload at peak

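The old system had no rate limiting. At peak, malicious requests, or a malfunctioning terminal spamming requests, could take the service down. The new system uses Sentinel to set rate-limit rules on the Gateway and on each microservice, for example at most 2 requests per second per terminal at the Gateway and at most 50 requests per second for the YOLO detection service. Anything above the threshold is rejected outright to protect the backend.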

Sentinel rate-limit rules (Gateway):

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;
import org.springframework.stereotype.Component;
import javax.annotation.PostConstruct;
import java.util.ArrayList;
import java.util.List;

@Component
public class SentinelFlowRuleConfig {
    @PostConstruct
    public void initFlowRules() {
        List<FlowRule> rules = new ArrayList<>();

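        // Gateway rule: the goal is at most 2 requests per second per terminal.
        // A plain FlowRule caps the whole "gateway-flow" resource (all terminals combined);
        // a true per-terminal limit needs a parameter-based rule, e.g. Sentinel's
        // GatewayFlowRule with a paramItem on a device-ID header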
        FlowRule gatewayRule = new FlowRule();
        gatewayRule.setResource("gateway-flow");
        gatewayRule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        gatewayRule.setCount(2);
        gatewayRule.setLimitApp("default");
        rules.add(gatewayRule);

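        // YOLO detection service rule: at most 50 requests per second. In practice this rule
        // is loaded in the detection service itself, and it only applies if detection runs
        // inside a Sentinel resource named "yolo-detect" (e.g. @SentinelResource("yolo-detect"))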
        FlowRule yoloRule = new FlowRule();
        yoloRule.setResource("yolo-detect");
        yoloRule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        yoloRule.setCount(50);
        yoloRule.setLimitApp("default");
        rules.add(yoloRule);

        FlowRuleManager.loadRules(rules);
    }
}
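Because a plain FlowRule applies to the resource as a whole, the Gateway's "2 requests per second" is a cap shared by all terminals. To show what a per-terminal limit means, here is a JDK-only fixed-window counter keyed by device ID. It is a hypothetical sketch, not how Sentinel works internally: Sentinel uses sliding-window statistics, and per-parameter limits on Spring Cloud Gateway are configured with its gateway flow rules. The injectable clock is there only to make the sketch testable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical per-device limiter: at most `limit` requests per device per 1-second window. */
final class PerDeviceRateLimiter {
    private final int limit;
    private final LongSupplier clockMillis;
    // deviceId -> {windowStartMillis, requestsInWindow}
    private final Map<String, long[]> windows = new ConcurrentHashMap<>();

    PerDeviceRateLimiter(int limit, LongSupplier clockMillis) {
        this.limit = limit;
        this.clockMillis = clockMillis;
    }

    /** Returns true if the request is admitted, false if it should be rejected. */
    boolean tryAcquire(String deviceId) {
        long now = clockMillis.getAsLong();
        long[] w = windows.computeIfAbsent(deviceId, k -> new long[] {now, 0});
        synchronized (w) {
            if (now - w[0] >= 1000) {   // this device's current window has expired: start a new one
                w[0] = now;
                w[1] = 0;
            }
            if (w[1] < limit) {
                w[1]++;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        long[] fakeNow = {0};
        PerDeviceRateLimiter limiter = new PerDeviceRateLimiter(2, () -> fakeNow[0]);
        System.out.println(limiter.tryAcquire("shelf-01")); // true
        System.out.println(limiter.tryAcquire("shelf-01")); // true
        System.out.println(limiter.tryAcquire("shelf-01")); // false: third request in the same second
        System.out.println(limiter.tryAcquire("shelf-02")); // true: other devices are unaffected
        fakeNow[0] = 1000;
        System.out.println(limiter.tryAcquire("shelf-01")); // true: a new window has started
    }
}
```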

4. Pitfall log: five big traps on the road from crashes to stability

Pitfall 1: memory leak in the YOLO detection service, OOM after one day

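Symptom: after a day of running, memory grew from 1 GB to 4 GB until the OS OOM killer terminated the process. The JVM heap was using only 800 MB; the rest was off-heap native memory.
Root cause: OnnxTensor and OrtSession.Result objects were never closed after use. The off-heap native memory they hold is outside the JVM garbage collector's control, so it kept leaking.
Fix: wrap all per-inference ONNX resources in try-with-resources, keep the Session as a singleton, and release it explicitly when the application shuts down.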
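Why try-with-resources matters here can be shown without ONNX Runtime. In this hypothetical sketch, NativeBuffer stands in for an OnnxTensor or OrtSession.Result. It counts live "native" allocations that only close() releases, mirroring how ORT's off-heap memory is freed when the Java object is closed, not when it is garbage-collected. The try-with-resources version releases the allocation even when inference throws.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an ORT object that owns off-heap memory. */
final class NativeBuffer implements AutoCloseable {
    static final AtomicInteger LIVE = new AtomicInteger(); // allocations not yet released
    private boolean closed;

    NativeBuffer() {
        LIVE.incrementAndGet();
    }

    @Override
    public void close() {
        if (!closed) {                                     // make close() idempotent
            closed = true;
            LIVE.decrementAndGet();
        }
    }

    /** Leaky pattern: the buffer is never closed, on either the normal or the failing path. */
    static void leakyInference(boolean fail) {
        NativeBuffer tensor = new NativeBuffer();
        if (fail) throw new IllegalStateException("inference failed");
    }

    /** Safe pattern: close() runs on both the normal and the exceptional path. */
    static void safeInference(boolean fail) {
        try (NativeBuffer tensor = new NativeBuffer()) {
            if (fail) throw new IllegalStateException("inference failed");
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            try { leakyInference(i == 1); } catch (IllegalStateException ignored) { }
        }
        System.out.println("after leaky calls: " + LIVE.get()); // 3
        LIVE.set(0);
        for (int i = 0; i < 3; i++) {
            try { safeInference(i == 1); } catch (IllegalStateException ignored) { }
        }
        System.out.println("after safe calls: " + LIVE.get());  // 0
    }
}
```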

Pitfall 2: RocketMQ message backlog, latency climbing to 1 second at peak

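Symptom: with 20 shelves capturing images at once, the backlog on RocketMQ's product-images topic grew past 2,000 messages, and detection latency rose from 180 ms to 1 second.
Root cause: only one instance of the YOLO detection service was deployed, so consumption couldn't keep up with production.
Fix: deploy two more YOLO detection service instances. In clustering consumption mode, RocketMQ automatically spreads the topic's message queues across the instances; the topic needs at least as many queues as consumer instances for every instance to get work. The backlog cleared almost immediately and latency went back to 180 ms.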
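The underlying rule is simple queueing arithmetic. Let λ be the images per second arriving from all shelves, μ the images per second one detector instance can process, and n the number of instances. A backlog grows whenever λ > nμ, at a rate of λ - nμ messages per second. The hypothetical sketch below computes the minimum instance count and the time needed to drain an existing backlog. The rates in the example are placeholders, not measurements from this system.

```java
/** Hypothetical capacity helper: arrival rate vs. per-instance consumption rate. */
final class ConsumerCapacity {

    /** Smallest n with n * perInstanceRate >= arrivalRate (backlog stops growing). */
    static int minInstances(double arrivalPerSec, double perInstancePerSec) {
        return (int) Math.ceil(arrivalPerSec / perInstancePerSec);
    }

    /** Seconds to drain `backlog` messages; infinite if consumers can't outpace arrivals. */
    static double drainSeconds(long backlog, double arrivalPerSec, double perInstancePerSec, int instances) {
        double net = instances * perInstancePerSec - arrivalPerSec;
        return net <= 0 ? Double.POSITIVE_INFINITY : backlog / net;
    }

    public static void main(String[] args) {
        double arrival = 10.0;     // placeholder: images/s from all shelves at peak
        double perInstance = 5.0;  // placeholder: images/s one CPU instance can detect
        System.out.println(minInstances(arrival, perInstance));          // 2
        System.out.println(drainSeconds(100, arrival, perInstance, 2));  // Infinity
        System.out.println(drainSeconds(100, arrival, perInstance, 3));  // 20.0
    }
}
```

Running exactly minInstances only stops the backlog from growing. Draining a backlog that already exists needs headroom beyond that, which is why adding more than the bare minimum makes sense.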

Pitfall 3: Nacos config refresh not taking effect, so new products still required a restart

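Symptom: after the product classes were changed in Nacos, the YOLO detection service kept using the old classes until it was restarted.
Root cause: YoloProductDetector wasn't annotated with @RefreshScope, so after Nacos refreshed the configuration, the CLASS_NAMES array in the class was never updated.
Fix: move the product classes into a separate ProductConfig class annotated with @RefreshScope, and have YoloProductDetector read the classes from ProductConfig so they update automatically after a refresh.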

Pitfall 4: oversized terminal images making RocketMQ sends fail

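Symptom: the terminals shot 4K originals that were still about 2 MB after compression. RocketMQ's default maximum message size is 4 MB, so they could in principle be sent, but with network jitter at peak the sends frequently timed out.
Fix: on the terminal, scale the image down to 1080p first, then compress it as JPEG with a quality factor of 0.7, keeping it under 300 KB. The send success rate reached 100%, with almost no loss in recognition accuracy.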
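The terminals in this article run Android, where this step would use Bitmap scaling and Bitmap.compress. For server-side tooling or tests, a JDK-only equivalent with javax.imageio might look like the sketch below. It fits the frame inside 1920x1080 (1080p) while keeping the aspect ratio, never upscaling, and then writes a JPEG at quality 0.7 through ImageWriteParam. The class name ImageShrinker is hypothetical, and the resulting byte size depends on image content.

```java
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

/** Hypothetical helper: downscale to fit a bounding box, then encode as JPEG with explicit quality. */
final class ImageShrinker {

    static BufferedImage fitWithin(BufferedImage src, int maxW, int maxH) {
        double scale = Math.min(1.0, Math.min((double) maxW / src.getWidth(), (double) maxH / src.getHeight()));
        int w = (int) Math.round(src.getWidth() * scale);
        int h = (int) Math.round(src.getHeight() * scale);
        // JPEG has no alpha channel, so draw into a plain 3-byte raster
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_3BYTE_BGR);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }

    static byte[] toJpeg(BufferedImage img, float quality) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);            // 0.0 = smallest file, 1.0 = best quality
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(baos)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(img, null, null), param);
        } finally {
            writer.dispose();
        }
        return baos.toByteArray();                       // stream is closed and flushed here
    }

    public static void main(String[] args) throws IOException {
        BufferedImage frame4k = new BufferedImage(3840, 2160, BufferedImage.TYPE_3BYTE_BGR);
        BufferedImage frame1080p = fitWithin(frame4k, 1920, 1080);
        byte[] jpeg = toJpeg(frame1080p, 0.7f);
        System.out.println(frame1080p.getWidth() + "x" + frame1080p.getHeight() + ", " + jpeg.length + " bytes");
    }
}
```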

Pitfall 5: distributed transactions, with orders created but inventory not deducted

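Symptom: at peak, an order was occasionally created without the product's inventory being deducted, so Brother Li's stock counts didn't match.
Root cause: the order service and the inventory service are two separate microservices with no distributed transaction between them. The order service succeeded, the inventory service failed, and the data became inconsistent.
Fix: use Seata for distributed transactions. Annotating the order-creation method with @GlobalTransactional keeps orders and inventory consistent. In Seata's default AT mode, each participating business database also needs an undo_log table.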

5. Cost and performance comparison: from 7,000 to 1,200 yuan, from 3 seconds to 180 ms

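Metric	Old system	New system
Average recognition response time	3.2 s	180 ms
Maximum latency at peak	5.8 s	260 ms
Monthly cloud server cost	7,000 yuan	1,200 yuan
Concurrency capacity	20 shelves, laggy	50 shelves, smooth
Service availability	92%	99.9%
Time to launch a new product	2 hours (service restart)	1 minute (Nacos config change)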

6. Summary

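My biggest takeaway from rebuilding this unmanned retail terminal system is that there is no best technology, only the one that fits best. Python is genuinely convenient for training models. For production deployment, though, and especially for microservice architectures and high-concurrency scenarios, Java + Spring Cloud Alibaba + ONNX Runtime really is the best fit: enough performance, low cost, good stability, and low operational overhead.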

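The system has now been running stably for a month, and Brother Li's 20 unmanned shelves haven't had a single problem. His daily stock losses have dropped from several hundred yuan to a few dozen, and he has saved a big chunk on cloud servers. He's now planning to add another 30 shelves.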
