[Pitfall Guide] YOLOv8 Inference Returns All Zeros After Converting to ONNX? A 5-Step Fix Guide That Walks You Through Troubleshooting!

m0_73076076

2026-05-10 13:57:26



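Abstract: You worked hard to train a YOLOv8 model, and after converting it to ONNX it "goes on strike". The inference results are all zeros, with no boxes and no classes. This article walks through preprocessing, output parsing, post-processing, version compatibility and export parameters, and gives a complete, reproducible solution. Whether you are new to object detection or an engineer preparing to deploy YOLO in production, it should help you locate the problem quickly and avoid detours.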

Introduction

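"The results were amazing during training, so why is everything broken after exporting to ONNX?"
This is a nightmare many people new to YOLO deployment run into. You stare at the empty array in the console. You check the code again, swap models, reinstall dependencies... and after hours of fiddling the problem is still there. Don't panic. In this post I share everything I learned, from falling into the pit to climbing out, so your ONNX inference stops going off the rails.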

Environment

Item	Version / configuration
Operating system	Windows 10 / Ubuntu 20.04
Python	3.8 to 3.11
Ultralytics	8.0.224+
ONNX Runtime	1.17.0 (CPU) / 1.17.0 (GPU)
OpenCV	4.8.0+
Model	YOLOv8n / YOLOv8s / custom-trained model
Input size	640×640 (typical)
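To compare your own setup against the table, a small sketch like the following prints the installed versions. It uses only the standard library (`importlib.metadata`, Python 3.8+). The package list is my assumption of what matters here. Note that the GPU build of ONNX Runtime is published as `onnxruntime-gpu`, and OpenCV is usually installed as `opencv-python`. `collect_versions` is a hypothetical helper name.

```python
from importlib import metadata


def collect_versions(packages):
    """Return {distribution name: installed version, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # PyPI distribution names (assumed list of the packages relevant to this article)
    packages = ["ultralytics", "onnx", "onnxruntime", "onnxruntime-gpu", "opencv-python", "numpy"]
    for pkg, ver in collect_versions(packages).items():
        print(f"{pkg:18s} {ver or 'not installed'}")
```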

The symptom

Suppose you run your inference code and get output like this:

# Print the detection results after inference
for det in detections:
    print(det)  # prints nothing, or only zeros

Or, more directly:

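# Netron shows the expected ONNX output shape (e.g. [1,84,8400]), but the decoded results are empty
boxes = outputs[0, :, :4]  # all zeros (on a [1,84,8400] tensor this slice takes the first 4 candidates, not the 4 box coordinates)
scores = ...              # all zeros, or tiny values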

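What does this mean?
Inference itself runs without errors, but the output is not what you expect. That means the preprocessing, output parsing or post-processing does not match the actual structure of the ONNX model, so the detections can't be decoded correctly.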
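Before touching post-processing, it helps to look at the raw tensor the session returns. The helper below prints a few statistics. `summarize_raw_output` is a hypothetical name, and the 0.25 threshold simply mirrors the confidence threshold used later.

As a rough rule of thumb, assuming a standard Ultralytics export:

- An all-zero tensor points at the input or the export.
- Box values in the hundreds with scores in [0, 1] mean the model is fine and the bug is in decoding.
- Near-zero scores on an image with obvious objects point back at preprocessing.

```python
import numpy as np


def summarize_raw_output(raw):
    """Print simple statistics about a raw YOLOv8 detection output.

    raw: array shaped [1, 4 + nc, N], the layout Netron shows (e.g. [1, 84, 8400]).
    Returns the statistics as a dict so they can also be checked in code.
    """
    pred = raw[0].T  # [N, 4 + nc], one row per candidate box
    boxes, scores = pred[:, :4], pred[:, 4:]
    stats = {
        "shape": raw.shape,
        "all_zero": bool(np.all(raw == 0)),
        "box_min": float(boxes.min()),
        "box_max": float(boxes.max()),
        "score_max": float(scores.max()),
        # candidates whose best class score exceeds the confidence threshold
        "n_above_0.25": int((scores.max(axis=1) > 0.25).sum()),
    }
    for key, value in stats.items():
        print(f"{key:>14}: {value}")
    return stats
```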

Troubleshooting process

Here are the approaches I tried, in chronological order. Each step reflects what I was actually thinking at the time:

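1️⃣ Check the preprocessing pipeline

I knew that when deploying an ONNX model, preprocessing must match training exactly. So I went through the Ultralytics source and checked three key points:

✅ Is the image resized with Letterbox (aspect ratio preserved, then padded) rather than simply stretched?
✅ Is it normalized by dividing by 255 into [0,1]?
✅ Is the channel order RGB (OpenCV loads images as BGR, so they need converting)?

👉 Result: the code already did all of this, and the problem remained.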
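Checklists like this are easy to get "almost" right, so I find it useful to assert them in code. Below is a minimal sketch, assuming the default Ultralytics export input: float32 NCHW `[1, 3, 640, 640]` scaled to [0, 1]. The function name is hypothetical. Channel order can't be detected from the tensor itself, so RGB vs BGR still has to be checked by eye. The contiguity check is only a precaution.

```python
import numpy as np


def check_input_tensor(x, imgsz=640):
    """Return a list of likely preprocessing problems for an ONNX input tensor.

    Assumes the model expects float32 NCHW input [1, 3, imgsz, imgsz] in [0, 1].
    """
    problems = []
    if x.dtype != np.float32:
        problems.append(f"dtype is {x.dtype}, expected float32")
    if x.shape != (1, 3, imgsz, imgsz):
        problems.append(f"shape is {x.shape}, expected (1, 3, {imgsz}, {imgsz})")
    if x.size and (x.min() < 0.0 or x.max() > 1.0):
        problems.append(
            f"values span [{float(x.min()):.3f}, {float(x.max()):.3f}], expected [0, 1] (forgot / 255?)"
        )
    if not x.flags["C_CONTIGUOUS"]:
        # precaution: make the memory layout explicit with np.ascontiguousarray
        problems.append("array is not C-contiguous")
    return problems
```

For example, passing the raw `uint8` HWC image straight from `cv2.imread` would report three problems: dtype, shape and value range.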

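2️⃣ Visualize the ONNX output layer with Netron

I opened Netron, dragged in the model file, and looked at the output node: [1,84,8400].
For YOLOv8, the 84 values are 4 bounding-box coordinates + 80 class scores (for a custom model, 4 + its number of classes).
So I adjusted the transpose logic in my code: outputs.transpose(0,2,1) gives [1,8400,84], and I then took the first 4 columns and the remaining 80.
👉 Result: the probabilities still looked wrong, and I suspected the sigmoid was being applied in the wrong place.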
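The transpose step is where the "all zeros" slice in the symptom section comes from. On a `[1, 84, 8400]` tensor, `outputs[0, :, :4]` takes the first 4 candidates across all 84 channels, not the 4 box coordinates. The sketch below normalizes either layout to one row per candidate. `to_rows` is a hypothetical helper. The rule "the candidate axis is the longer one" is my assumption: it holds for 8400 candidates vs 84 channels, but not for arbitrarily small toy inputs.

```python
import numpy as np


def to_rows(raw):
    """Return a raw YOLOv8 output as [N, 4 + nc], one row per candidate box.

    Accepts [1, 4 + nc, N] (the default export, e.g. [1, 84, 8400]) or [1, N, 4 + nc].
    Assumption: the candidate axis N is longer than the channel axis 4 + nc.
    """
    out = raw[0]
    return out.T if out.shape[0] < out.shape[1] else out


# Toy output: 4 box channels + 2 classes = 6 channels, 8 candidates
raw = np.arange(48, dtype=np.float32).reshape(1, 6, 8)
rows = to_rows(raw)
print(rows.shape)           # (8, 6): one row per candidate
print(rows[0, :4])          # box of candidate 0 = channels 0-3 of column 0
print(raw[0, :, :4].shape)  # the tempting slice: all 6 channels of the first 4 candidates
```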

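3️⃣ Check NMS and the coordinate rescaling in post-processing

I confirmed the NMS thresholds (IoU = 0.45, confidence = 0.25) and checked that coordinates were correctly mapped from the letterboxed image back to the original.
👉 Result: still no boxes, so the problem was not in NMS itself.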

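4️⃣ Verify ONNX Runtime version compatibility

Each ONNX Runtime release only supports ONNX opsets up to a certain version. I was still on 1.14.0, so I suspected it might not handle a model exported with opset=17.
👉 After running pip install onnxruntime==1.17.0 the problem was still there, so the version alone was not the cause.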

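5️⃣ Re-export the model

Suspecting the export parameters, I re-exported the model the officially recommended way:

from ultralytics import YOLO

model = YOLO('best.pt')  # path to your trained weights
model.export(format='onnx', imgsz=640, opset=17, simplify=True)

👉 Result: finally some boxes! But their positions were way off. It looks like the earlier export had probably dropped some operators.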
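Once boxes show up, it's worth confirming that the ONNX model reproduces the PyTorch model numerically before debugging post-processing any further. The sketch below only does the comparison. How you obtain the raw PyTorch output for the same preprocessed tensor depends on your Ultralytics version, so that part is left out. `compare_outputs` is a hypothetical name, and the tolerances are assumptions to tune (box values are in pixels, so a small absolute tolerance is reasonable).

```python
import numpy as np


def compare_outputs(reference, candidate, atol=1e-2, rtol=1e-3):
    """Compare a reference output (e.g. PyTorch) with the ONNX Runtime output for the same input.

    Both arrays must have the same shape, e.g. [1, 84, 8400]. Returns (max_abs_diff, ok).
    """
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    return max_abs_diff, bool(np.allclose(reference, candidate, atol=atol, rtol=rtol))
```

A shape mismatch here (for example `[1, 84, 8400]` vs `[1, 84, 2100]`) usually means the export and the reference used different `imgsz` values.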

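The final solution ✅

After all of the above, it turned out that every one of the following steps had to be fixed together before inference was stable. The complete steps and code are below.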

Step 1: Consistent preprocessing (Letterbox + normalization + RGB)

import cv2
import numpy as np

def letterbox(img, new_shape=(640, 640), color=(114, 114, 114)):
    """Resize keeping the aspect ratio, then pad to new_shape (H, W)."""
    shape = img.shape[:2]  # (H, W)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])  # scale ratio
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))  # (W, H) after resizing
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # total padding
    img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    # Split the padding so that left + right == dw and top + bottom == dh;
    # otherwise the padded image ends up larger than new_shape
    top, left = dh // 2, dw // 2
    bottom, right = dh - top, dw - left
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)
    return img, (left, top, right, bottom)

# Usage
img0 = cv2.imread('test.jpg')  # original image, BGR
img, (left, top, right, bottom) = letterbox(img0, (640, 640))

# Channel conversion, normalization and batch dimension
img = img[:, :, ::-1].astype(np.float32) / 255.0  # BGR -> RGB, scale to [0, 1]
img = np.ascontiguousarray(np.transpose(img, (2, 0, 1))[None])  # HWC -> CHW, add batch: [1, 3, 640, 640]
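A quick way to sanity-check any letterbox implementation is to compute the padding by hand. The resized image plus the padding on both sides must add up to exactly 640 in each direction. Below is a minimal pure-Python sketch (`letterbox_params` is a hypothetical helper). It splits the total padding so the two halves always add up to it, with any odd pixel going to the bottom/right.

```python
def letterbox_params(orig_hw, new_hw=(640, 640)):
    """Scale ratio, (left, top, right, bottom) padding and resized (h, w) for a letterbox resize.

    orig_hw and new_hw are (height, width).
    """
    h, w = orig_hw
    r = min(new_hw[0] / h, new_hw[1] / w)
    new_h, new_w = int(round(h * r)), int(round(w * r))
    dh, dw = new_hw[0] - new_h, new_hw[1] - new_w  # total padding per direction
    top, left = dh // 2, dw // 2
    return r, (left, top, dw - left, dh - top), (new_h, new_w)
```

Two worked cases:

- A 480×640 image: the ratio is 1.0, and 80 pixels go on both top and bottom.
- A 1080×1920 image: the ratio is 1/3, the image is resized to 360×640, and 140 pixels go on top and bottom.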

Step 2: Confirm the ONNX output structure and parse it correctly

Check the output tensor shape in Netron. Assuming it is [1,84,8400]:

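import onnxruntime
import numpy as np

# Create the inference session (use CUDAExecutionProvider with the GPU build)
session = onnxruntime.InferenceSession('best.onnx', providers=['CPUExecutionProvider'])
input_name = session.get_inputs()[0].name

# img is the preprocessed array from Step 1: float32, shape [1, 3, 640, 640]
outputs = session.run(None, {input_name: img})[0]  # shape [1, 84, 8400]

# Transpose to [1, 8400, 84] so that each row is one candidate box
outputs = outputs.transpose(0, 2, 1)  # now [1, 8400, 84]

# In each row, the first 4 values are the box (cx, cy, w, h) in pixels of the 640x640 input,
# and the remaining 80 are class scores (4 + number of classes for a custom model).
# The exported YOLOv8 graph already decodes the boxes and applies sigmoid to the class
# scores, so do NOT apply sigmoid again here: that would squash every box coordinate to ~1
# and push every score to >= 0.5.

predictions = outputs[0]  # [8400, 84]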
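To see why a client-side sigmoid breaks everything, apply it to one made-up row in the exported layout. The row holds 4 box values in input pixels, followed by class scores that are already in [0, 1], shortened to 3 classes. The numbers are illustrative, not taken from a real model.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Made-up row: (cx, cy, w, h) in pixels of the 640x640 input, then 3 class scores in [0, 1]
row = np.array([320.0, 240.0, 100.0, 80.0, 0.02, 0.91, 0.0])

twice = sigmoid(row)  # what an extra sigmoid on top of the exported graph does
print(np.round(twice, 3))
```

Every box coordinate collapses to 1.0, so each box becomes a 1×1-pixel square in the top-left corner. Every class score is pushed into roughly [0.5, 0.73], so a 0.25 confidence threshold no longer filters anything.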

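Step 3: Convert the boxes from (cx, cy, w, h) to (x1, y1, x2, y2)

In a YOLOv8 model exported by Ultralytics, the box outputs are already decoded. (cx, cy, w, h) are absolute pixel values in the 640×640 input image, because the grid offsets and strides are applied inside the exported graph. They are not normalized to [0, 1] and must not be multiplied by the input size. All that is left is converting center format to corner format.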

def decode_boxes(predictions):
    """
    Split the predictions into boxes and class scores, converting boxes to corner format
    predictions: [8400, 84] array; in each row the first 4 values are (cx, cy, w, h)
                 in pixels of the model input, the rest are class scores
    returns: boxes [8400, 4] as (x1, y1, x2, y2) in input pixels, scores [8400, 80]
    """
    # YOLOv8 is anchor-free, with 3 detection heads on 80x80, 40x40 and 20x20 grids
    # (80*80 + 40*40 + 20*20 = 8400 candidates). The exported ONNX graph already adds the
    # grid offsets and multiplies by the strides, so there is no grid to rebuild, no anchor
    # formula to apply, and no need to multiply by the input size here.
    xywh = predictions[:, :4]    # (cx, cy, w, h)
    scores = predictions[:, 4:]  # class scores

    # Convert (cx, cy, w, h) to (x1, y1, x2, y2) in a new array (predictions stays unchanged)
    boxes = np.empty_like(xywh)
    boxes[:, 0] = xywh[:, 0] - xywh[:, 2] / 2  # x1 = cx - w/2
    boxes[:, 1] = xywh[:, 1] - xywh[:, 3] / 2  # y1 = cy - h/2
    boxes[:, 2] = xywh[:, 0] + xywh[:, 2] / 2  # x2 = cx + w/2
    boxes[:, 3] = xywh[:, 1] + xywh[:, 3] / 2  # y2 = cy + h/2

    return boxes, scores

boxes, scores = decode_boxes(predictions)
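The 8400 in the output shape follows directly from the input size. Assuming the default YOLOv8 detection heads at strides 8, 16 and 32, each stride contributes one candidate per grid cell. If your N differs from what this predicts, the `imgsz` used at export and at preprocessing probably disagree. `num_candidates` is a hypothetical helper.

```python
def num_candidates(imgsz=(640, 640), strides=(8, 16, 32)):
    """Number of candidate boxes from anchor-free heads at the given strides (one per grid cell).

    imgsz is the model input (height, width); the strides are assumed YOLOv8 defaults.
    """
    h, w = imgsz
    return sum((h // s) * (w // s) for s in strides)


if __name__ == "__main__":
    for size in [(640, 640), (320, 320), (480, 640)]:
        print(size, num_candidates(size))
```

For 640×640 this gives 6400 + 1600 + 400 = 8400. A 320×320 export would give 2100.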

Step 4: Filter redundant boxes with non-maximum suppression (NMS)

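def nms(boxes, scores, iou_threshold=0.45, score_threshold=0.25):
    """
    Class-aware non-maximum suppression
    boxes: [N, 4] array of boxes (x1, y1, x2, y2)
    scores: [N, C] array of per-class scores for each box
    iou_threshold: IoU threshold
    score_threshold: score threshold
    returns: indices into the ORIGINAL boxes/scores arrays of the boxes to keep
    """
    # Give each box one class (its best-scoring class) and drop low-scoring boxes.
    # Work with indices into the original arrays so the returned indices stay valid.
    class_ids = scores.argmax(axis=1)
    confidences = scores.max(axis=1)
    candidates = np.where(confidences > score_threshold)[0]

    indices = []
    # Run NMS separately for each class that actually occurs
    for cls in np.unique(class_ids[candidates]):
        order = candidates[class_ids[candidates] == cls]
        # Sort this class's boxes by score, highest first
        order = order[np.argsort(-confidences[order])]

        while order.size > 0:
            # Keep the highest-scoring box
            best = order[0]
            indices.append(best)
            if order.size == 1:
                break
            # IoU between the kept box and the remaining ones
            ious = compute_iou(boxes[best:best + 1], boxes[order[1:]])
            # Only keep boxes whose IoU is below the threshold
            order = order[1:][ious < iou_threshold]

    # Sort the final result by score, highest first
    indices = np.array(indices, dtype=int)
    return indices[np.argsort(-confidences[indices])]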

def compute_iou(box, boxes):
    """
    IoU between one box and a set of boxes
    box: [1, 4]  (x1, y1, x2, y2)
    boxes: [N, 4] (x1, y1, x2, y2)
    """
    # Intersection
    inter_x1 = np.maximum(box[0, 0], boxes[:, 0])
    inter_y1 = np.maximum(box[0, 1], boxes[:, 1])
    inter_x2 = np.minimum(box[0, 2], boxes[:, 2])
    inter_y2 = np.minimum(box[0, 3], boxes[:, 3])
    inter_w = np.maximum(0, inter_x2 - inter_x1)
    inter_h = np.maximum(0, inter_y2 - inter_y1)
    inter_area = inter_w * inter_h

    # Union
    area_box = (box[0, 2] - box[0, 0]) * (box[0, 3] - box[0, 1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union_area = area_box + area_boxes - inter_area

    # Guard against division by zero for degenerate (zero-area) boxes
    return inter_area / np.maximum(union_area, 1e-9)

# Run NMS
indices = nms(boxes, scores, iou_threshold=0.45, score_threshold=0.25)
boxes = boxes[indices]
scores = scores[indices]
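An alternative to looping over classes is the "class offset" trick. As far as I know, it is roughly what Ultralytics' own post-processing does. Shift every box by `class_id * max_wh` so that boxes of different classes can never overlap, then run plain NMS once. Below is a sketch under that assumption. `max_wh=7680` is simply a value comfortably larger than any coordinate in a 640×640 input, and both function names are hypothetical.

```python
import numpy as np


def greedy_nms(boxes, confidences, iou_threshold=0.45):
    """Single-class greedy NMS on (x1, y1, x2, y2) boxes; returns kept indices, best first."""
    order = np.argsort(-confidences)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / np.maximum(areas[i] + areas[rest] - inter, 1e-9)
        order = rest[iou < iou_threshold]  # drop boxes overlapping the kept one too much
    return np.array(keep, dtype=int)


def class_offset_nms(boxes, scores, iou_threshold=0.45, score_threshold=0.25, max_wh=7680):
    """Class-aware NMS in one pass; returns indices into the original boxes/scores."""
    class_ids = scores.argmax(axis=1)
    confidences = scores.max(axis=1)
    candidates = np.where(confidences > score_threshold)[0]
    # Same offset on all 4 coordinates: boxes of different classes end up far apart
    shifted = boxes[candidates] + (class_ids[candidates] * max_wh)[:, None]
    kept = greedy_nms(shifted, confidences[candidates], iou_threshold)
    return candidates[kept]
```

The offset boxes are only used for the overlap test. The returned indices point back into the original, unshifted arrays.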

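Step 5: Map the box coordinates back to the original image

Because preprocessing used Letterbox, the boxes have to be mapped from the preprocessed image size back to the original image size.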

def scale_boxes(boxes, orig_shape, new_shape, padding):
    """
    Map boxes from the preprocessed image size back to the original image size
    boxes: [N, 4] boxes (x1, y1, x2, y2) in preprocessed-image coordinates (modified in place)
    orig_shape: original image size (height, width)
    new_shape: preprocessed image size (height, width)
    padding: Letterbox padding (left, top, right, bottom)
    """
    left, top, right, bottom = padding
    # Scale ratio, computed the same way as in letterbox()
    r = min(new_shape[0] / orig_shape[0], new_shape[1] / orig_shape[1])

    # Remove the padding
    boxes[:, 0] -= left
    boxes[:, 1] -= top
    boxes[:, 2] -= left
    boxes[:, 3] -= top

    # Scale back to the original image size
    boxes[:, 0] /= r
    boxes[:, 1] /= r
    boxes[:, 2] /= r
    boxes[:, 3] /= r

    # Keep the boxes inside the original image
    boxes[:, 0] = np.clip(boxes[:, 0], 0, orig_shape[1])
    boxes[:, 1] = np.clip(boxes[:, 1], 0, orig_shape[0])
    boxes[:, 2] = np.clip(boxes[:, 2], 0, orig_shape[1])
    boxes[:, 3] = np.clip(boxes[:, 3], 0, orig_shape[0])

    return boxes

# Use the padding and original image size saved in Step 1
orig_shape = img0.shape[:2]  # (height, width)
new_shape = (640, 640)
padding = (left, top, right, bottom)  # returned by letterbox() in Step 1

boxes = scale_boxes(boxes, orig_shape, new_shape, padding)
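When boxes come out shifted or scaled on the original image, a round-trip test of the mapping usually finds the culprit. The sketch below maps boxes into letterboxed coordinates and back; both helper names are hypothetical. With the ratio and padding from Step 1, a box should come back unchanged (apart from clipping).

```python
import numpy as np


def to_letterbox_coords(boxes, ratio, left, top):
    """Original-image (x1, y1, x2, y2) -> coordinates in the letterboxed model input."""
    out = np.array(boxes, dtype=np.float32)  # copy
    out[:, [0, 2]] = out[:, [0, 2]] * ratio + left
    out[:, [1, 3]] = out[:, [1, 3]] * ratio + top
    return out


def from_letterbox_coords(boxes, ratio, left, top):
    """Inverse mapping: remove the padding, then undo the scaling (no clipping)."""
    out = np.array(boxes, dtype=np.float32)  # copy
    out[:, [0, 2]] = (out[:, [0, 2]] - left) / ratio
    out[:, [1, 3]] = (out[:, [1, 3]] - top) / ratio
    return out
```

For a 1080×1920 image, the ratio is 640 / 1920 = 1/3 and the top padding is 140. The box (300, 300, 600, 900) therefore lands at (100, 240, 200, 440) in the 640×640 input.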

Step 6: Visualize the results

Finally, draw the detections on the image to check that they are correct.

def draw_boxes(image, boxes, scores, class_names, score_threshold=0.5):
    """
    Draw boxes and class labels on an image (in place)
    image: original image (BGR)
    boxes: [N, 4] boxes (x1, y1, x2, y2)
    scores: [N, 80] class scores
    class_names: list of class names
    score_threshold: score threshold
    """
    for i in range(len(boxes)):
        # Score and index of the best class for this box
        class_score = np.max(scores[i])
        class_index = np.argmax(scores[i])
        if class_score < score_threshold:
            continue
        # Box coordinates as integers
        x1, y1, x2, y2 = boxes[i].astype(int)
        # Draw the box
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        # Draw the label, keeping it inside the image when the box touches the top edge
        label = f"{class_names[class_index]}: {class_score:.2f}"
        cv2.putText(image, label, (x1, max(y1 - 10, 10)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    return image

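# Assume a list of class names (the 80 COCO classes, or your own)
class_names = [...]  # fill in your list of class names

# Draw on a copy so the original image stays untouched
result_img = draw_boxes(img0.copy(), boxes, scores, class_names, score_threshold=0.5)

# Save (or display) the image
cv2.imwrite('result.jpg', result_img)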
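Instead of typing the class names by hand, you may be able to read them from the model. In the Ultralytics exports I've looked at, the class names are stored in the ONNX model's custom metadata under the key `names`, as the string form of a Python dict. Treat that as an assumption and keep a fallback list. The hypothetical `parse_names` below converts that string into a list ordered by class index. The ONNX Runtime call in the comment (`session.get_modelmeta().custom_metadata_map`) is the standard way to read custom metadata.

```python
import ast


def parse_names(metadata_value):
    """Parse a class-name mapping stored as a dict string, e.g. "{0: 'person', 1: 'bicycle'}",
    into a list ordered by class index."""
    names = ast.literal_eval(metadata_value)  # safe literal parsing, no code execution
    return [names[i] for i in sorted(names)]


# Possible usage with the session from Step 2 (assumes the exporter wrote a "names" entry):
# meta = session.get_modelmeta().custom_metadata_map
# class_names = parse_names(meta["names"]) if "names" in meta else [...]
```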